Answer published 2 months ago · Last edited last month · 15 sources

What Is AI Model Collapse? Why Training AI on AI Data Makes It Increasingly Skewed

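Research has found that if an AI model is repeatedly trained on AI-generated data, "model collapse" occurs: rare patterns gradually disappear, and the model's description of reality becomes narrower and narrower.[1][4] Recursive training amplifies sampling bias: high-probability patterns become more and more dominant, while rare events in the tail of the distribution are weakened generation by generation and may eventually vanish altogether.[1][9] Analysis suggests that adding even a very small amount of real-world data, or constraining the model with prior knowledge, may be enough to prevent model collapse, even when synthetic data makes up the vast majority.[7][33]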


Figure: Concept illustration of AI model collapse, showing synthetic data loops shrinking a distribution and removing rare patterns. Recursive training on AI-generated data can gradually erase rare patterns from a model's learned distribution, a phenomenon researchers call model collapse.

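Generative AI systems such as large language models have increasingly been trained on synthetic data in recent years, that is, content produced by earlier generations of AI. But research indicates that if this process keeps repeating, a problem can emerge: model collapse.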

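Model collapse means that a model gradually loses its grasp of the diversity of the original data, especially rare or unusual patterns. When the training loop repeatedly relies on AI-generated content rather than real-world data, these patterns slowly disappear, and the model's representation of reality ends up distorted.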

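As AI-generated content becomes more and more common on the web, this problem is increasingly seen as a significant risk to long-term AI development.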

What Is "Model Collapse"?

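Model collapse is a failure mode of generative models: when a model no longer learns mainly from human or real-world data but from the outputs of older models, its performance and diversity decline.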

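Research has found that this kind of recursive training introduces irreversible defects. The model gradually loses information about the "tails" of the data distribution, that is, the examples that occur rarely but matter a great deal for describing reality accurately.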
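To make the tail-loss argument concrete, here is a small worked calculation. It is only a sketch under a simplifying assumption that does not come from the cited studies: the model is a plain categorical distribution that is refit each generation by counting how often each pattern appears in its training sample. The symbols p (true probability of a rare pattern) and n (samples per generation) are introduced here for illustration.

```latex
% Illustrative assumption: a categorical model refit by empirical frequencies.
% p = true probability of a rare pattern, n = samples drawn per generation.
\[
  \Pr(\text{pattern absent from } n \text{ samples}) = (1 - p)^{n} \approx e^{-np}
\]
% Example: p = 0.001, n = 1000
\[
  (1 - 0.001)^{1000} \approx e^{-1} \approx 0.37
\]
% If the pattern is absent, its refit probability is \hat{p} = 0 / n = 0,
% so the next generation can never sample it again.
```

Under this toy assumption, a pattern that occurs once in a thousand examples has roughly a one-in-three chance of being missed entirely in a single generation, and once it is missed it cannot come back from synthetic data alone. That is one simple way to see why the loss can be irreversible.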

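As training proceeds generation after generation:

Outputs become more and more uniform
The model leans further toward common patterns
Its ability to handle rare or unusual cases declines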
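The effects listed above can be reproduced in a toy simulation. The sketch below is not the setup used in the cited studies. It assumes a simple categorical "model" over 50 patterns with a Zipf-like true distribution, refit each generation by counting samples. The function names `zipf_distribution` and `simulate_generations` and the `real_fraction` parameter are hypothetical, introduced only for this illustration. Setting `real_fraction` above zero mixes fresh real samples into every generation, which loosely mirrors the mitigation mentioned in the summary.

```python
import random
from collections import Counter


def zipf_distribution(num_categories):
    """Return a Zipf-like distribution where p(k) is proportional to 1/k."""
    weights = [1.0 / k for k in range(1, num_categories + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_generations(true_probs, samples_per_generation, generations,
                         real_fraction=0.0, seed=0):
    """Refit a categorical model on its own samples, generation after generation.

    Returns the support size (number of patterns with non-zero estimated
    probability) for the true distribution and after each generation.
    """
    rng = random.Random(seed)
    categories = list(range(len(true_probs)))
    model_probs = list(true_probs)
    support_sizes = [sum(p > 0 for p in model_probs)]

    # Assumption: a fixed share of each training set comes from real data.
    n_real = round(real_fraction * samples_per_generation)
    n_synthetic = samples_per_generation - n_real

    for _ in range(generations):
        data = []
        if n_synthetic:
            # Synthetic data: sampled from the current model.
            data += rng.choices(categories, weights=model_probs, k=n_synthetic)
        if n_real:
            # Real data: sampled from the true distribution.
            data += rng.choices(categories, weights=true_probs, k=n_real)
        counts = Counter(data)
        # Refit by empirical frequency; unseen patterns get probability 0.
        model_probs = [counts[c] / len(data) for c in categories]
        support_sizes.append(sum(p > 0 for p in model_probs))
    return support_sizes


if __name__ == "__main__":
    true_probs = zipf_distribution(50)
    pure = simulate_generations(true_probs, 200, 60)
    mixed = simulate_generations(true_probs, 200, 60, real_fraction=0.3)
    print("synthetic only:", pure[::10])
    print("30% real data: ", mixed[::10])
```

With synthetic data only, the number of surviving patterns can never increase in this toy model, because a pattern with estimated probability zero is never sampled again. With some real data mixed in, lost patterns can reappear, so the support typically stays much larger. The exact numbers depend on the seed and on the assumed distribution, so treat the output as qualitative.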

This phenomenon has been observed in several kinds of generative models, for example:

Large language models (LLMs)
