Google DeepMind released a significant upgrade to Gemini 3's Deep Think reasoning mode on February 12, 2026, posting benchmark scores that surpass both OpenAI's GPT-5.2 and Anthropic's Claude Opus 4.6 on several key evaluations.
The standout result is a 48.4% score on Humanity's Last Exam — a notoriously difficult benchmark designed to test the absolute limits of AI reasoning across mathematics, science, philosophy, and logic. For comparison, GPT-5.2 Pro scored 42.1% and Claude Opus 4.6 scored 39.8% on the same benchmark.
Additional benchmark highlights include:
GPQA Diamond (graduate-level science): 78.2% (vs GPT-5.2's 74.6% and Opus 4.6's 71.3%)
MMLU-Pro (massive multitask understanding): 91.7% (vs GPT-5.2's 89.4% and Opus 4.6's 88.1%)
MATH-500 (competition mathematics): 96.8% (vs GPT-5.2's 94.2% and Opus 4.6's 93.5%)
SWE-bench Verified (software engineering): 72.4% (below Opus 4.6's 80.8% and GPT-5.2's 75.1%)
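To make the size of each gap easier to compare, the reported figures can be tabulated and the margin over the strongest rival computed per benchmark. The snippet below is only an illustrative sketch: the scores are copied from this article, while the dictionary layout and the `margin_over_best_rival` helper are assumptions introduced here, not part of any vendor's tooling.

```python
# Scores (percent) as reported in this article; nothing here is measured independently.
# Note: the Humanity's Last Exam figure for OpenAI is for GPT-5.2 Pro.
GEMINI = "Gemini 3 Deep Think"

SCORES = {
    "Humanity's Last Exam": {GEMINI: 48.4, "GPT-5.2": 42.1, "Claude Opus 4.6": 39.8},
    "GPQA Diamond": {GEMINI: 78.2, "GPT-5.2": 74.6, "Claude Opus 4.6": 71.3},
    "MMLU-Pro": {GEMINI: 91.7, "GPT-5.2": 89.4, "Claude Opus 4.6": 88.1},
    "MATH-500": {GEMINI: 96.8, "GPT-5.2": 94.2, "Claude Opus 4.6": 93.5},
    "SWE-bench Verified": {GEMINI: 72.4, "GPT-5.2": 75.1, "Claude Opus 4.6": 80.8},
}


def margin_over_best_rival(scores, model=GEMINI):
    """Return (best rival name, model score minus that rival's score).

    A negative margin means the model trails the best rival.
    This helper is hypothetical and exists only for this example.
    """
    rivals = {name: value for name, value in scores.items() if name != model}
    best_rival = max(rivals, key=rivals.get)
    # Round to one decimal place to hide floating-point noise.
    return best_rival, round(scores[model] - rivals[best_rival], 1)


for benchmark, scores in SCORES.items():
    rival, margin = margin_over_best_rival(scores)
    print(f"{benchmark}: {margin:+.1f} points vs {rival}")
```

On these numbers, the lead over the nearest rival ranges from 2.3 points (MMLU-Pro) to 6.3 points (Humanity's Last Exam), while the SWE-bench Verified deficit to Opus 4.6 is 8.4 points.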
The Deep Think mode uses iterative rounds of reasoning that explore multiple hypotheses simultaneously before converging on a solution. Unlike standard chain-of-thought approaches, Deep Think can backtrack, reconsider assumptions, and synthesize across different reasoning paths.
Google AI Ultra subscribers ($20/month) get access to Deep Think mode in the Gemini app, while developers can access it through the Vertex AI API and Google AI Studio. The model retains Gemini 3 Pro's 1M token context window and multimodal capabilities.
However, the results tell a more nuanced story. While Gemini 3 Deep Think excels at pure reasoning and academic benchmarks, it trails both competitors on practical software engineering (SWE-bench Verified) and in agentic workflows. Anthropic's Opus 4.6 maintains a clear lead in coding with its 80.8% SWE-bench Verified score.
Jeff Dean, Google's Chief Scientist, said that Deep Think represents a "qualitative shift in how models approach complex problems" and hinted at further improvements coming with the anticipated Gemini 3 Ultra release later this quarter.
The three-way competition between Google, OpenAI, and Anthropic continues to intensify, with each company now holding clear advantages in different capability domains — a pattern that benefits the broader AI ecosystem and pushes all players to improve.