Google Research Releases ReasoningBank: AI Agents Learn Reasoning Strategies from Success and Failure

Gate News, April 22: Google Research has released ReasoningBank, an agent memory framework that lets agents driven by large language models keep learning after deployment. The framework distills generalizable reasoning strategies from both successful and failed task attempts and stores them in a memory bank, where they can be retrieved and applied to similar future tasks. The associated paper was published at ICLR, and the code has been open-sourced on GitHub.

ReasoningBank improves on two existing approaches: Synapse, which records complete action trajectories that transfer poorly because they are too fine-grained, and Agent Workflow Memory, which learns only from successful cases. ReasoningBank makes two key changes. First, it stores “reasoning patterns” instead of “action sequences,” with each memory item containing structured title, description, and content fields. Second, it incorporates failure trajectories into learning. The framework uses a model to self-evaluate execution trajectories and turns failures into rules for avoiding known pitfalls. For example, the rule “click Load More button when seen” evolves into “verify current page identifier first, avoid infinite scrolling loops, then click load more.”
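The memory structure described above can be illustrated with a minimal Python sketch. Only the three fields (title, description, content) come from the article. The `MemoryItem` and `MemoryBank` names, the `from_failure` flag, and the word-overlap retrieval are assumptions made for illustration and are not taken from the ReasoningBank code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemoryItem:
    # The three structured fields named in the article.
    title: str
    description: str
    content: str
    # Hypothetical extra field: marks items distilled from a failed trajectory.
    from_failure: bool = False


@dataclass
class MemoryBank:
    items: List[MemoryItem] = field(default_factory=list)

    def add(self, item: MemoryItem) -> None:
        self.items.append(item)

    def retrieve(self, task: str, top_k: int = 1) -> List[MemoryItem]:
        """Rank items by word overlap with the task.

        Word overlap is a simple stand-in chosen for this sketch; a real
        system would likely use a stronger retriever.
        """
        task_words = set(task.lower().split())

        def score(item: MemoryItem) -> int:
            text = f"{item.title} {item.description}".lower()
            return len(task_words & set(text.split()))

        ranked = sorted(self.items, key=score, reverse=True)
        return [m for m in ranked[:top_k] if score(m) > 0]


bank = MemoryBank()
bank.add(MemoryItem(
    title="Check pagination before loading more",
    description="Use when a listing page shows a Load More button",
    content=("Verify current page identifier first, avoid infinite "
             "scrolling loops, then click load more."),
    from_failure=True,
))
bank.add(MemoryItem(
    title="Confirm form submission",
    description="Use after submitting a checkout form",
    content="Look for a confirmation message before moving on.",
))

hits = bank.retrieve("find all orders on a listing page with a load more button")
for hit in hits:
    print(hit.title, "|", hit.content)
```

Storing the strategy as free text in `content`, instead of a recorded click sequence, is what allows the same item to be reused on a different site with a similar layout.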

The paper also introduces Memory-aware Test-time Scaling (MaTTS), which spends additional compute at inference time to explore multiple trajectories and stores what it finds in the memory bank. Parallel expansion runs multiple distinct trajectories for the same task and distills more robust strategies by comparing them against one another; sequential expansion iteratively refines a single trajectory and stores the intermediate reasoning in memory.
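A rough sketch of the parallel expansion idea, under stated assumptions: the article does not describe how self-comparison is implemented, so the set-based contrast below (steps shared by every successful rollout and absent from every failed one) is a simplified stand-in. `Trajectory`, `toy_agent`, and `parallel_expansion` are hypothetical names, and the toy agent is deterministic so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    steps: List[str]
    success: bool


def toy_agent(task: str, seed: int) -> Trajectory:
    # Hypothetical agent: even seeds verify the page first and succeed,
    # odd seeds skip the check and fail.
    if seed % 2 == 0:
        return Trajectory(["open listing", "verify page id", "click load more"], True)
    return Trajectory(["open listing", "click load more"], False)


def parallel_expansion(
    task: str,
    run_agent: Callable[[str, int], Trajectory],
    k: int = 5,
) -> dict:
    """Run k rollouts of one task and contrast successes with failures."""
    rollouts = [run_agent(task, seed) for seed in range(k)]
    wins = [t for t in rollouts if t.success]
    losses = [t for t in rollouts if not t.success]

    # Steps present in every successful rollout.
    if wins:
        shared = set(wins[0].steps).intersection(*(t.steps for t in wins[1:]))
    else:
        shared = set()

    # Remove steps that also appear in failures; what remains is a
    # candidate strategy worth writing to the memory bank.
    seen_in_failures = {s for t in losses for s in t.steps}
    return {
        "task": task,
        "success_rate": len(wins) / k,
        "distinguishing_steps": sorted(shared - seen_in_failures),
    }


result = parallel_expansion("list all orders", toy_agent, k=5)
print(result)
```

Sequential expansion would instead feed one trajectory back to the agent for repeated revision; it is omitted here to keep the sketch short.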

On WebArena browser tasks and SWE-Bench-Verified coding tasks, with Gemini 2.5 Flash as a ReAct agent, ReasoningBank achieved an 8.3% higher success rate on WebArena and a 4.6% higher success rate on SWE-Bench-Verified than a baseline without memory, and reduced the average number of steps per task by approximately 3. Adding MaTTS with parallel expansion (k=5) raised the WebArena success rate by a further 3 percentage points and cut steps by an additional 0.4.

