As competition among large language models shifts from “who responds faster” to “who thinks more deeply,” Google has unveiled another core weapon. On February 19, Google officially announced Gemini 3.1 Pro, not merely a version bump of the Gemini 3 series but a comprehensive upgrade focused on advanced reasoning. The company says 3.1 Pro is designed specifically for “complex tasks without standard answers,” targeting scientific research, engineering development, and long-chain decision-making scenarios.
Publicly available benchmark data suggest the upgrade is not just theoretical: the model posts breakthrough results on multiple high-difficulty assessments.
Core Upgrade for Complex Tasks
In its announcement, Google positions Gemini 3.1 Pro as a “smarter, more capable foundational model,” emphasizing a leap in core reasoning ability. The model builds on the research results of Gemini 3 Deep Think, further strengthening its underlying intelligence so that it performs more maturely in multi-step logical reasoning, abstract thinking, and professional problem decomposition.
Compared to Gemini 3 Pro, released in November 2025, 3.1 Pro represents not just an efficiency optimization but a structural advance in reasoning ability.
ARC-AGI-2 jumps to 77.1%: Abstract reasoning capability more than doubles
The most notable result comes from ARC-AGI-2, widely regarded as a high-level AI reasoning benchmark. The assessment specifically tests a model’s ability to solve novel logical patterns, preventing reliance on memorized knowledge.
According to publicly available data:
Gemini 3.1 Pro: 77.1% (ARC Prize verified)
Gemini 3 Pro: 31.1%
Sonnet 4.6: 58.3%
Opus 4.6: 68.8%
GPT-5.2: 52.9%
Against the previous 31.1%, 3.1 Pro more than doubles its score, reaching roughly 2.5 times the earlier result. This indicates stronger abstract reasoning and pattern-induction abilities when the model faces unknown problems.
Simultaneous Enhancement of Professional Knowledge and Scientific Reasoning
In the scientific knowledge assessment GPQA Diamond, Gemini 3.1 Pro scored 94.3%, surpassing GPT-5.2’s 92.4%, Opus 4.6’s 91.3%, and Sonnet 4.6’s 89.9%.
This demonstrates that 3.1 Pro not only handles abstract logic but also maintains top-tier performance in integrating professional knowledge and scientific reasoning.
Significant Evolution in Programming Capabilities: Competitive-Level Performance
In programming and agent-based task assessments, Gemini 3.1 Pro also delivers impressive results.
LiveCodeBench Pro: Elo 2887 (GPT-5.2: 2393, Gemini 3 Pro: 2439)
SWE-Bench Verified: 80.6% (GPT-5.2: 80.0%, Opus 4.6: 80.8%)
Terminal-Bench 2.0: 68.5% (GPT-5.2: 54.0%, Sonnet 4.6: 59.1%)
SciCode: 59% (GPT-5.2: 52%, Sonnet 4.6: 47%)
The LiveCodeBench Pro Elo of 2887 in particular signals a clear advantage in high-difficulty algorithms and multi-step programming logic.
High-Performance Multimodal and Long-Text Capabilities
In multimodal understanding and long-text processing, Gemini 3.1 Pro also demonstrates stable performance:
MMMU Pro: 80.5%
MMLU: 92.6%
MRCR v2 (128k): 84.9%
1M-token long-context (pointwise): 26.3%
This indicates that the model can not only reason but also maintain consistency and accuracy within large contexts.
From Answering Questions to Directly Producing Results
Google emphasizes that the value of 3.1 Pro is not just reflected in scores but in practical application capabilities.
For example, the model can directly generate deployable animated SVG code. Because these outputs are markup rather than pixel images, they scale infinitely without losing clarity, and their file sizes are far smaller than traditional video formats, making them suitable for embedding directly into websites.
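To make “purely code-based” concrete, here is a minimal, hand-written sketch of the kind of animated SVG output this describes. The markup is an illustrative example of the format, not actual Gemini 3.1 Pro output.

```python
# Illustrative only: a hand-written example of the kind of animated SVG
# markup described above -- not actual model output.
svg = """<svg xmlns="http://www.w3.org/2000/svg" width="120" height="120">
  <circle cx="60" cy="60" r="20" fill="#4285F4">
    <!-- SMIL animation: the radius pulses between 20 and 45 forever -->
    <animate attributeName="r" values="20;45;20" dur="2s"
             repeatCount="indefinite"/>
  </circle>
</svg>"""

# Because the output is plain text markup, it can be saved directly and
# embedded in a web page, where it scales without pixelation.
with open("pulse.svg", "w", encoding="utf-8") as f:
    f.write(svg)
```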
This capability shows that the model is shifting from a “response tool” to a “creation and development tool.”
Simultaneous Launch Across Multiple Platforms for Enterprise and Developer Early Access
Currently, Gemini 3.1 Pro is available in preview:
Developers
Gemini API (Google AI Studio; a minimal call sketch follows below)
Gemini CLI
Google Antigravity
Android Studio
Enterprises
Vertex AI
Gemini Enterprise
Consumers
Gemini App (Pro and Ultra users enjoy higher usage limits)
NotebookLM (limited to Pro and Ultra users)
Google states that it will continue optimizing the model during the preview phase, especially for advanced applications such as agentic workflows, before a full release.
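For developers on the Gemini API, a call would look roughly like the sketch below, using the google-genai Python SDK. The model identifier “gemini-3.1-pro-preview” is an assumption for illustration; check Google AI Studio for the exact id exposed during the preview.

```python
# Minimal sketch of calling the preview model via the Gemini API with the
# google-genai Python SDK (pip install google-genai).
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    # Assumed preview identifier -- confirm the exact model id in
    # Google AI Studio before use.
    model="gemini-3.1-pro-preview",
    contents="Plan, step by step, how to isolate a flaky integration test.",
)
print(response.text)
```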
AI Competition Enters the “Deep Thinking” Era
Across the benchmark results, Gemini 3.1 Pro clearly emphasizes higher-level reasoning and professional application scenarios. The 77.1% ARC-AGI-2 score is particularly significant, marking a breakthrough in handling previously unseen logical problems.
As the competition among large models intensifies, Google appears to be betting on “deeper intelligence” rather than merely improving response speed or conversational fluency.
As enterprises and developers begin testing this model, its true value will gradually emerge through practical applications. The focus of AI competition may be shifting from generative capabilities to more comprehensive thinking skills.
This article, “Gemini 3.1 Pro debut: From abstract reasoning to competitive programming, Google sets a new high standard for advanced AI,” was originally published on Chain News ABMedia.