Answer · Published 8 May 2026 · Last edited 8 May 2026 · 3 sources

Tencent OpenSearch-VL: An open-source answer to the closed systems of OpenAI and Google?

Tencent has released OpenSearch-VL, presented as an open-source recipe and framework for building frontier multimodal search agents [1][2][3]. It is not just an image question-answering model; it is built to gather evidence through web search, OCR, reverse image search, and image-processing tools [3].

Tencent has released an open-source framework called OpenSearch-VL. It is presented as an "open recipe" for building multimodal AI search agents that do not simply answer from a single look at an image, but reason over evidence gathered in multiple steps using web search, OCR, reverse image search, and image-processing tools [1][2][3].

What is OpenSearch-VL?

OpenSearch-VL was introduced in a paper titled "An Open Recipe for Frontier Multimodal Search Agents", submitted to arXiv on 6 May 2026 [2]. The work comes from Tencent Hunyuan, and according to early reporting and the paper listing, collaborators include UCLA and The Chinese University of Hong Kong [1][3].

Put simply, this is not an ordinary chatbot or a pure image-captioning model. The goal is an agent that can choose its own tools to solve a visual question: OCR to read text in the picture, reverse image search to find its source, or cropping, sharpening, super-resolution, and perspective correction to clean the image up [3].

What is new here?

Many multimodal models try to "understand" an image, but OpenSearch-VL is designed for "active evidence seeking": before answering, the model can gather additional information from external sources and tools [3].
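To make the idea of active evidence seeking concrete, here is a minimal Python sketch of a multi-step loop in which an agent picks a tool, stores what it returns, and stops when nothing useful is left to try. It is an illustration only, not OpenSearch-VL's code: the names `AgentState`, `Evidence`, `choose_tool`, `run_agent`, and the three `fake_*` tools are hypothetical, and the fixed-order tool policy stands in for the decision a trained model would make itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Evidence:
    tool: str      # which tool produced this piece of evidence
    content: str   # what the tool returned


@dataclass
class AgentState:
    question: str
    evidence: List[Evidence] = field(default_factory=list)


# Hypothetical stand-in tools. Real tools would call OCR or search services.
def fake_ocr(state: AgentState) -> str:
    return "text on sign: 'Hotel Aurora'"


def fake_reverse_image_search(state: AgentState) -> str:
    return "visually similar image found on a travel page"


def fake_web_search(state: AgentState) -> str:
    return "web page describing a place called Hotel Aurora"


TOOLS: Dict[str, Callable[[AgentState], str]] = {
    "ocr": fake_ocr,
    "reverse_image_search": fake_reverse_image_search,
    "web_search": fake_web_search,
}


def choose_tool(state: AgentState) -> Optional[str]:
    """Toy policy (an assumption): try each tool once, in registry order.

    In a trained agent, the model would choose the next tool from the
    question and the evidence collected so far.
    """
    used = {e.tool for e in state.evidence}
    for name in TOOLS:
        if name not in used:
            return name
    return None


def run_agent(question: str, max_steps: int = 5) -> AgentState:
    """Collect evidence step by step until the policy stops or steps run out."""
    state = AgentState(question)
    for _ in range(max_steps):
        tool = choose_tool(state)
        if tool is None:  # nothing left to try, so stop and answer
            break
        state.evidence.append(Evidence(tool, TOOLS[tool](state)))
    return state
```

Calling `run_agent("Where was this photo taken?")` returns a state whose `evidence` list records one entry per tool call, which is the kind of step-by-step record an answer could then cite.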

The framework also puts weight on training data and method. According to the report, it includes supervised fine-tuning and reinforcement-learning data: SearchVL-SFT with 36,000 trajectories and SearchVL-RL with 8,000 trajectories [3]. A trajectory here means the full sequence of tool-use and reasoning steps the model takes while solving a task. Tencent also describes a training method called Multi-round Fault-Aware GRPO, which aims to learn even from tool-use trajectories that partially failed [3].
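The sketch below shows one way a trajectory could be represented and how a GRPO-style, group-relative advantage is commonly computed: each trajectory's reward is normalized against the mean and standard deviation of its own sampling group. The fault-handling part is purely an assumption made for illustration, since the sources here do not describe how Multi-round Fault-Aware GRPO treats failed rounds. In this sketch, rounds whose tool call failed get zero weight so that the remaining rounds of a partially failed trajectory can still contribute. The names `Round`, `Trajectory`, `group_relative_advantages`, and `per_round_weights` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Round:
    tool: str       # tool called in this round
    tool_ok: bool   # False if the tool call failed (timeout, bad arguments, ...)


@dataclass
class Trajectory:
    rounds: List[Round]
    reward: float   # scalar outcome reward for the whole trajectory


def group_relative_advantages(group: List[Trajectory], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: normalize each reward within its own group.

    Population standard deviation is used here; implementations differ.
    """
    rewards = [t.reward for t in group]
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def per_round_weights(traj: Trajectory, advantage: float) -> List[float]:
    """ASSUMPTION, not the paper's method: failed rounds get weight 0.0,
    all other rounds keep the trajectory-level advantage."""
    return [advantage if r.tool_ok else 0.0 for r in traj.rounds]
```

With a group of two trajectories rewarded 1.0 and 0.0, the advantages come out close to +1 and -1, and a trajectory whose middle round failed would receive weights of the form `[a, 0.0, a]`.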

How does it compare with OpenAI and Google?

The main difference is openness. Comparable multimodal search or research agents from OpenAI and Google are largely proprietary systems, meaning outside researchers do not have full access to their training data, code, and model weights. OpenSearch-VL, by contrast, is positioned as an open approach that releases training data, code, and weights so that researchers can reproduce and improve the system [3].

Aspect	OpenSearch-VL	Proprietary systems such as those from OpenAI and Google
Access	Open-source recipe and framework; positioned to release data, code, and weights [3]	Generally closed, proprietary systems [3]
Tool use	Web search, OCR, reverse image search, cropping, sharpening, super-resolution, perspective correction [3]	May have similar multimodal search and research capabilities, but internal methods are often not public
Reported performance	In Tencent's tests, an average improvement of more than 10 percentage points across seven multimodal deep-search benchmarks; described as comparable to leading closed-source commercial models on some tasks [3]	Strong commercial systems, but the full evaluation setup is not always open

A big claim, but caution is warranted

According to Tencent's reported evaluation, OpenSearch-VL performed strongly on multimodal deep-search benchmarks and was described as on par with top closed-source commercial models on some tasks [3]. But the public evidence available so far rests mainly on Tencent's paper, the arXiv listing, and early coverage [1][2][3]. The claim of "matching" systems such as those from OpenAI or Google should therefore be treated as preliminary until independent evaluators verify it on a range of benchmarks and real-world tasks.
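When checking a claim such as "more than 10 percentage points on average across several benchmarks", it helps to be clear about the arithmetic. The Python sketch below computes an average gain in percentage points (absolute difference in scores) and contrasts it with an average relative gain (percent of the baseline). All scores and benchmark names in it are invented placeholders, not results from the paper or from any real benchmark, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict

# Placeholder accuracies in percent. These numbers are invented for the
# example and are NOT results from the paper or any real benchmark.
baseline: Dict[str, float] = {"bench_a": 40.0, "bench_b": 55.0, "bench_c": 30.0}
improved: Dict[str, float] = {"bench_a": 52.0, "bench_b": 63.0, "bench_c": 43.0}


def average_gain_points(before: Dict[str, float], after: Dict[str, float]) -> float:
    """Mean absolute difference in accuracy, in percentage points."""
    return mean(after[name] - before[name] for name in before)


def average_relative_gain(before: Dict[str, float], after: Dict[str, float]) -> float:
    """Mean relative change, in percent of each baseline score."""
    return mean(100.0 * (after[n] - before[n]) / before[n] for n in before)
```

The two measures can differ a lot on the same data: a jump from 30 to 43 is 13 percentage points but more than 40 percent relative to the baseline, so an independent check needs to know which one a reported figure uses.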

For now, the most important point is this: OpenSearch-VL is an attempt to take multimodal AI search agents out of closed labs and make them more reproducible for the open research community. If its code, data, and weights become available in usable form as promised, it could become an important foundation for researchers and developers building multimodal search agents [3].

Key takeaways

Tencent has released OpenSearch-VL, presented as an open-source recipe and framework for building frontier multimodal search agents [1][2][3].
It is not just an image question-answering model; it is built to gather evidence through web search, OCR, reverse image search, and image-processing tools [3].
The training setup is reported to include 36,000 trajectories in SearchVL-SFT and 8,000 trajectories in SearchVL-RL [3].
According to Tencent's reported benchmarks, OpenSearch-VL showed an average improvement of more than 10 percentage points across seven multimodal deep-search benchmarks [3].

Sources

[1] OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents. arxiv.org. Authors: Shawn Chen, Kaituo Feng, Hangting Chen, Wenxuan Huang, Dasen Dai, Quanxin Shou, Yunlong Lin, Xiangyu Yue, Shenghua Gao, Tianyu Pang.
[2] An Open Recipe for Frontier Multimodal Search Agents. arXiv:2605.05185 (Computer Science, Computer Vision and Pattern Recognition). Submitted 6 May 2026. arxiv.org.
[3] Tencent Releases OpenSearch-VL: A Comprehensive Solution for Open-Source Multimodal Deep Search Agent. Published 7 May 2026. news.aibase.com.
