答え公開済み2 か月前Last edited 先月15 ソース

AIの「モデル崩壊」とは何か

AIが生成したデータでAIを繰り返し訓練すると「モデル崩壊」が起こり、データ分布の珍しいパターンが消えてモデルの表現力が低下することが研究で示されている。[1][4] 再帰的な学習ではサンプリングの偏りが増幅され、分布の“裾（まれな事例）”が徐々に消えていくため、後続モデルはそれらを学習できなくなる。[1][9] 分析研究では、少量の実世界データやモデルに組み込まれた事前知識がアンカーとして機能し、合成データが大多数でも崩壊を防ぐ可能性が示唆されている。[7][33]

Studio Global AIで検索して事実確認さらにトレンドページを見る

Concept illustration of AI model collapse showing synthetic data loops shrinking a distribution and removing rare patterns — What does the new study on AI model collapse find about preventing degradation when models are trained on synthetic data, why does recursiveRecursive training on AI‑generated data can gradually erase rare patterns from a model’s learned distribution, a phenomenon researchers call model collapse.
AI プロンプト
Create a landscape editorial hero image for this Studio Global article: What does the new study on AI model collapse find about preventing degradation when models are trained on synthetic data, why does recursive. Article summary: The study describes model collapse as a failure mode where recursively trained generative models lose information about the original data distribution, especially its rare or low-probability regions.. Topic tags: general, government, education, academic, general web. Reference image context from search candidates: Reference image 1: visual subject "However, as AI-generated data increasingly populates the internet, an important question arises: What happens when new AI models are trained on datasets containing their previous o" source context "Avoiding Model Collapse in AI Training - Risk Insight" Reference image 2: visual subject "Artificial intelligence models
openai.com

生成AIの学習では、**AIが作ったデータ（合成データ）**を使うケースが増えています。効率的にデータ量を増やせる一方で、研究者が警告しているのが「モデル崩壊（model collapse）」と呼ばれる現象です。これは、AIがAIの出力を学習し続けることで、現実のデータ分布の多様性を徐々に失っていく問題を指します。

研究によると、モデルを合成データだけで再帰的に訓練すると、元のデータに存在していた珍しいパターンが次第に消えていくことが分かっています。こうした欠落が積み重なると、モデルが世界を表現する能力そのものが歪んでしまいます。

インターネット上にAI生成コンテンツが急増している現在、この問題は次世代のAI開発にとって重要な課題と見られています。

モデル崩壊とは何か

モデル崩壊とは、生成モデルが人間などの実データではなく、過去のモデルが作ったデータを主に学習することで性能が劣化する現象です。

研究では、こうした再帰的な学習が不可逆的な欠陥を生む可能性が指摘されています。特に影響を受けるのが、データ分布の「テール（尾部）」と呼ばれる領域です。これは、頻度は低いものの現実を正確に表すために重要な珍しい事例を指します。

こうした事例が消えていくと、モデルの出力は次第に似通い、平均的で単調なパターンだけを再現するようになると考えられています。

Studio Global AI

Search, cite, and publish your own answer

Use this topic as a starting point for a fresh source-backed answer, then compare citations before you share it.

Studio Global AIで検索して事実確認

人々も尋ねます