Gemini Omni: The Multimodal Video Generation AI Announced at Google I/O 2026

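At I/O 2026, Google announced Gemini Omni, a multimodal AI model that takes mixed text, image, audio, and video input and generates high-quality video; the first version, Gemini Omni Flash, has begun rolling out.[8][9] Unlike the earlier video model Veo, Gemini Omni integrates video generation and Gemini's reasoning capabilities into a single unified model, so videos can be edited conversationally in natural language.[20][23] Google is also expanding its SynthID watermarking system, which labels AI-generated content: detection features are coming to products such as Search and Chrome, and companies including OpenAI, Kakao, ElevenLabs, and Nvidia are adopting it.[17][24][39]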

[Illustration: Gemini Omni, Google's new multimodal AI model, generating video from combined text, image, audio, and video inputs.]

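Google announced a new family of AI models, Gemini Omni, at Google I/O 2026. The system centers on multimodal generation: it can understand and process several kinds of input at once, such as text, images, audio, and video, and then generate new content. Its first step is video generation.

The first version to launch is Gemini Omni Flash, which began rolling out gradually across Google's AI ecosystem on the day of the announcement.

The sections below cover the key points about Gemini Omni: its features, how it differs from Veo, what it can actually do, where it is available, and how Google pairs it with SynthID for identifying AI content.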

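What is Gemini Omni?

Gemini Omni is a family of multimodal generative models. Google describes it as combining Gemini's reasoning capabilities with generative media models.

At launch it focuses on video generation:

It accepts text, images, audio, and video as input at the same time
It generates high-quality video from those inputs
Videos can be revised conversationally (for example, changing the scene, objects, or style)

Google says Omni is designed to improve motion, physical effects, and object interactions in generated video, so footage looks more natural and coherent.

Video is the main output for now, but Google has indicated that future versions will gradually add capabilities such as image and text generation.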

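How does it differ from Veo?

Before Gemini Omni, Google's main video generation model was Veo.

The biggest difference between the two is how they are positioned.

Veo:

Built specifically for generating video
A standalone media generation model

Gemini Omni:

A unified multimodal model
Accepts text, images, audio, and video together
Integrates Gemini's reasoning capabilities with generative models

In other words, Google is consolidating capabilities that were previously spread across separate tools (media models such as Veo) into a single foundation model platform.

The benefit is that the AI can understand how different media relate to one another, for example by using a reference image, spoken instructions, and video footage together to generate content.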

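What can Gemini Omni Flash do?

Gemini Omni Flash is the first model in the Omni family to be officially released.

It supports mixing several kinds of input in a single prompt:

Text
Images
Audio
Video

The system then generates fairly realistic video, which can be revised further in natural language.

Google has demonstrated several typical uses, for example:

Generating a complete video scene from text plus a reference image
Uploading an existing video and modifying it with text
Changing what appears in a video with spoken instructions

The model also attempts to better understand object movement, gravity, and interactions, so that the physics in generated video is more plausible.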

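Launch timing and availability

Google began rolling out Gemini Omni Flash on May 19, 2026, the day of the I/O keynote.

The first supported platforms include:

Gemini App
Google Flow (AI creation tool)
YouTube Shorts and YouTube Create (creator tools)

Within the Gemini ecosystem, Omni features are tied to Google AI subscription plans.

Supported plans include:

Google AI Plus
Google AI Pro
Google AI Ultra

Higher tiers generally come with higher usage limits and more advanced features.

At I/O 2026, Google also introduced a US$100-per-month AI Ultra subscription plan, aimed mainly at developers and advanced creators who need more compute resources.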

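SynthID: watermarking and detection for AI content

As generative AI grows more capable, Google is also pushing content transparency tools in parallel.

The core technology is SynthID.

SynthID is an invisible digital watermarking technology that is embedded in AI-generated content, including:

Images
Video
Audio
Text

These watermarks are invisible to the human eye but can be recognized by software, which makes it possible to verify whether content was AI-generated.

At I/O 2026, Google announced several major expansions.

1. Built-in detection in Search and Chrome

Google will add identification tools to Google Search and the Chrome browser to help users judge whether an image found online was generated or modified by AI.

2. Adoption of SynthID by multiple companies

Google also announced that several technology companies are starting to use SynthID, for example:

OpenAI
Kakao
ElevenLabs
Nvidia

The goal is to establish a cross-company standard for identifying AI content.

3. SynthID Detector verification tool

Google has also launched SynthID Detector, a web-based tool.

Users can upload an image, video, audio clip, or text, and the system checks whether it contains a SynthID watermark, helping journalists, researchers, and platforms verify where content came from.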

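Why does Gemini Omni matter?

Gemini Omni marks a major shift in the direction of AI media generation.

In the past, AI tools were usually separate:

One for text
One for images
One for video

Google's strategy now is to build a unified multimodal model, a single system that understands and generates different media together.

The first step today is generating video from multiple inputs, but the long-term goal is:

Any input → any output.

At the same time, Google is pairing these generation capabilities with its watermarking and detection system (SynthID), in the hope of reducing the risks posed by fake AI content and deepfakes.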
