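Answer published 8 May 2026 · Last edited 8 May 2026 · 3 sources

Tencent open-sources OpenSearch-VL: a new "open recipe" for multimodal AI search agents

Tencent has released OpenSearch-VL, an open-source framework, or "recipe", for building multimodal AI search agents. OpenSearch-VL can combine image understanding with tools such as web search, OCR, reverse image search, cropping, sharpening, and super-resolution to carry out multi-step evidence gathering and reasoning. The project is led by Tencent Hunyuan, with collaborators including UCLA and The Chinese University of Hong Kong; the paper was submitted to arXiv on 6 May 2026.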

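In short, the new framework Tencent has released is called OpenSearch-VL. The paper describes it as an open recipe for building frontier multimodal search agents, and arXiv shows the paper was submitted on 6 May 2026 ^[2].

It is not simply a model that "looks at a picture and answers a question". The aim is to train an AI agent that actively goes looking for information: it interprets the image while calling external tools, then searches, cross-checks, and reasons step by step.

What does OpenSearch-VL do?

At its core, OpenSearch-VL moves a multimodal model from "passively understanding an image" to "actively seeking evidence". According to early coverage, the tools it can use include web search, reverse image search, OCR text recognition, image cropping, sharpening, super-resolution, and perspective correction ^[3].

Think of it this way: faced with an image that is small, blurry, and shot at a skewed angle, the model does not have to guess an answer straight away. It can first crop the region that matters, improve clarity, run OCR to extract the text, or even use reverse image search to find related material, and only then pull everything together into an answer. This multi-step evidence gathering is exactly what separates a multimodal search agent from a traditional visual question answering model.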
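The crop, sharpen, OCR, and reverse-search sequence described above can be sketched as a small tool loop. This is only an illustration under stated assumptions: the `Evidence` class, the stub tools, the `TOOLS` registry, and `gather_evidence` are all hypothetical names invented here, not OpenSearch-VL's actual interface, and the plan is fixed in advance, whereas a real agent would let the model choose each next tool from what it has observed so far.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Evidence:
    """Running record of the current image and what each tool reported."""
    image: str  # stand-in for real image data
    notes: List[str] = field(default_factory=list)


# Hypothetical tool stubs. Real tools would edit pixels or call a search API.
def crop(ev: Evidence) -> str:
    ev.image = f"crop({ev.image})"
    return "kept only the region of interest"


def sharpen(ev: Evidence) -> str:
    ev.image = f"sharpen({ev.image})"
    return "improved clarity"


def ocr(ev: Evidence) -> str:
    return f"text extracted from {ev.image}"


def reverse_image_search(ev: Evidence) -> str:
    return f"pages related to {ev.image}"


TOOLS: Dict[str, Callable[[Evidence], str]] = {
    "crop": crop,
    "sharpen": sharpen,
    "ocr": ocr,
    "reverse_image_search": reverse_image_search,
}


def gather_evidence(image: str, plan: List[str]) -> Evidence:
    """Run each tool in the plan in order and log its observation."""
    ev = Evidence(image=image)
    for name in plan:
        tool = TOOLS.get(name)
        if tool is None:
            # An unknown tool is recorded and skipped, not treated as fatal.
            ev.notes.append(f"{name}: unknown tool, skipped")
            continue
        ev.notes.append(f"{name}: {tool(ev)}")
    return ev


ev = gather_evidence("sign.jpg", ["crop", "sharpen", "ocr", "reverse_image_search"])
for note in ev.notes:
    print(note)
```

The notes list is what a final answering step would read from, so the answer rests on the gathered observations and not on a first-glance guess.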

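Who released it?

The work comes from Tencent Hunyuan, with collaborators including UCLA (University of California, Los Angeles) and The Chinese University of Hong Kong; the authors and institutions appear in both the paper and early coverage ^[1]^[3].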

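What is special about the training method?

OpenSearch-VL is not just a model name but a full training recipe. The project includes data for supervised fine-tuning and reinforcement learning, for example 36,000 trajectories in SearchVL-SFT and 8,000 trajectories in SearchVL-RL ^[3].

Coverage also mentions a training method called Multi-round Fault-Aware GRPO, which aims to let the model learn from tool-use trajectories that partially failed, rather than having the whole reasoning chain collapse as soon as a search or tool call goes wrong ^[3].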
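To make "learning from partially failed trajectories" more concrete, here is a rough sketch. The first function is the group-relative advantage normalisation that GRPO-style training is generally built on: each rollout's reward is compared with the mean of its group. The second function is purely an assumption for illustration: the sources do not describe how the fault-aware part works, so zeroing the weight of failed tool steps is a guess at one plausible mechanism, not the paper's algorithm. All function names and the sample rewards are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalise each rollout's reward against its own sampling group.

    Population standard deviation is used here; implementations differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def per_step_advantages(advantage: float, step_ok: List[bool]) -> List[float]:
    """ASSUMPTION, not from the paper: give failed tool steps zero weight,
    so the successful steps of the same trajectory still contribute."""
    return [advantage if ok else 0.0 for ok in step_ok]


# Hypothetical rewards for four rollouts of the same question.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)

# First rollout: the second of its three tool calls failed.
step_weights = per_step_advantages(advantages[0], [True, False, True])
print(advantages)
print(step_weights)
```

Under this assumed scheme, one broken search call does not erase the credit earned by the steps that did work in the same trajectory.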

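How does it compare with closed-source systems from OpenAI and Google?

The biggest difference cannot be summed up as "which one is definitely stronger". It is the degree of openness.

Comparable multimodal search and research agents from OpenAI and Google are mostly closed-source commercial systems. OpenSearch-VL is positioned as an open-source alternative that aims to release its training data, code, and model weights so that researchers can reproduce, inspect, and improve on it ^[3].

On performance, Tencent reports that OpenSearch-VL improves average results across seven multimodal deep-search benchmarks by more than 10 percentage points, and that it is comparable to leading closed-source commercial models on some tasks ^[3].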

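A word of caution

For now, the safer reading is this: OpenSearch-VL is an ambitious open-source framework for multimodal search agents, clearly aimed at matching closed-source systems from OpenAI, Google, and others; but claims that it has "already caught up" or "surpassed them across the board" are not yet backed by enough independent evidence.

The public information so far comes mainly from the arXiv paper and early launch coverage, so the benchmark results should be treated as preliminary, pending third-party reproduction and testing ^[1]^[2]^[3].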

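Key points

Tencent has released OpenSearch-VL, an open-source framework, or "recipe", for building multimodal AI search agents.
OpenSearch-VL can combine image understanding with tools such as web search, OCR, reverse image search, cropping, sharpening, and super-resolution to carry out multi-step evidence gathering and reasoning.
The project is led by Tencent Hunyuan, with collaborators including UCLA and The Chinese University of Hong Kong; the paper was submitted to arXiv on 6 May 2026.
Compared with the closed-source multimodal search and research systems from OpenAI and Google, OpenSearch-VL's selling point is its plan to open up training data, code, and model weights.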

Sources

[1] OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents. arxiv.org. Authors as listed: Shawn Chen, Kaituo Feng, Hangting Chen, Wenxuan Huang, Dasen Dai, Quanxin Shou, Yunlong Lin, Xiangyu Yue, Shenghua Gao, Tianyu Pang.
[2] An Open Recipe for Frontier Multimodal Search Agents. arXiv:2605.05185 (cs, Computer Vision and Pattern Recognition), submitted 6 May 2026. arxiv.org.
[3] Tencent Releases OpenSearch-VL: A Comprehensive Solution for Open-Source Multimodal Deep Search Agent. news.aibase.com, published 7 May 2026.
