What should I do next in practice?

Oracle routing — a system that assigns each query to the smallest capable model — could reduce cloud inference costs by 60.2% while maintaining comparable accuracy.


Answers · Published 5 days ago · Last edited 5 days ago · 29 sources

Local AI Is Already Handling Nearly 89% of Real-World Queries: The Stanford 'Intelligence Per Watt' Study

Small local language models (≤20B parameters) running on consumer laptops and desktops can now accurately answer 88.7% of single-turn chat and reasoning queries, according to a large-scale November 2025 Stanford prepr... The share of queries that local models can competently handle rose from just 23.2% in 2023 to 71...


Image: AI-generated conceptual illustration of the Stanford 'Intelligence Per Watt' study, showing local AI inference on a laptop outperforming cloud data-center models in efficiency for most tasks.

The economics of artificial intelligence may be on the verge of a major shift. A comprehensive study from Stanford University, published as a preprint in November 2025, finds that small language models running on consumer-grade desktop and laptop hardware can now competently handle the vast majority of tasks that previously required expensive cloud-based AI systems.

The research, led by Jon Saad-Falcon, Avanika Narayan, and colleagues at Stanford's Hazy Research group and Together AI, introduces a new metric called Intelligence Per Watt (IPW). IPW is defined as mean task accuracy divided by mean power draw during inference, and it is meant to provide a single yardstick for comparing local and cloud-based AI systems.
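To make the definition concrete, here is a minimal sketch of how the ratio could be computed from per-query results. The function name `intelligence_per_watt`, the input layout, and the sample numbers are illustrative assumptions, not code or data from the study, and the study's actual measurement pipeline may differ.

```python
from statistics import mean


def intelligence_per_watt(correct, power_watts):
    """Mean task accuracy divided by mean power draw (in watts).

    correct:      one boolean per query (True if the answer was judged correct)
    power_watts:  power samples taken during inference (hypothetical layout)
    """
    if not correct or not power_watts:
        raise ValueError("need at least one query result and one power sample")
    mean_accuracy = mean(1.0 if c else 0.0 for c in correct)
    mean_power = mean(power_watts)
    if mean_power <= 0:
        raise ValueError("mean power draw must be positive")
    return mean_accuracy / mean_power


# Made-up illustrative values, NOT results from the study.
example_correct = [True, True, False, True]        # 75% accuracy
example_power_watts = [30.0, 34.0, 28.0, 28.0]     # 30 W on average
example_ipw = intelligence_per_watt(example_correct, example_power_watts)
print(f"IPW = {example_ipw:.4f} accuracy per watt")
```

Because the metric is a ratio, a model can raise its score either by answering more queries correctly or by drawing less power on the same hardware, which is what lets one number compare a laptop with a data-center accelerator.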

What the Stanford Study Found: Key Numbers

The study's empirical work is extensive: it benchmarks more than 20 local language models across 8 accelerators (including those from Apple, AMD, and NVIDIA) using 1 million real-world single-turn chat and reasoning queries. The headline results are striking:


The 'Intelligence Per Watt' Metric Explained

Hybrid Routing: A 60% Cost Reduction

What This Means for OpenAI, Anthropic, and xAI

The Broader Trend: AI Is Getting Cheaper, Faster

A Caveat: The Study Has Limits

The Bottom Line