Skip to content

AgentTool.run_async extracts only p.text, loses code_execution_result.output from inner code-executor agents #5481

@Number531

Description

@Number531

Summary

AgentTool.run_async silently drops code_execution_result.output when wrapping an inner LlmAgent that uses a code_executor (BuiltInCodeExecutor, AgentEngineSandboxCodeExecutor). Parent agents receive response={'result': ''} on a substantial fraction of invocations even though the sandbox successfully ran code and produced output. Observed rates across 30+ production dispatches on gemini-3-flash-preview: 38% baseline, up to 62% under some configurations. After the local subclass fix described below: 0% on two consecutive clean runs.

Affected file: google/adk/tools/agent_tool.py:267-270 (as of v1.27.2; confirmed unchanged through v1.31.1 at time of writing).

Relation to prior issue #180: #180 (closed) reported the same underlying behavior on gemini-2.0-flash + BuiltInCodeExecutor. It was closed with a prompt-level workaround — instructing the inner agent to "present EXACTLY what was returned." That workaround is fragile because (a) it relies on the inner LLM's self-consistency under compaction, (b) it doesn't help when the inner LLM emits no text part at all (the dominant failure mode with Gemini 3's code-exec path), and (c) it shifts the bug burden onto every team using the pattern. This issue proposes a structural code fix that removes the responsibility from users.

Current behavior

# google/adk/tools/agent_tool.py
if last_content is None or last_content.parts is None:
    return ''
merged_text = '\n'.join(
    p.text for p in last_content.parts if p.text and not p.thought
)

The extraction inspects p.text only. When the inner Gemini model finishes a turn with only executable_code + code_execution_result parts (no wrapping text summary — a documented Gemini code-exec behavior), merged_text = '' and the parent tool sees an empty result.

The parent LLM has no way to recover — it sees an empty string, not the sandbox's stdout or tracebacks. This prevents intelligent retry behavior and forces either (a) silent loss of computational output or (b) manual code-path workarounds.

Reproducer

from google.adk.agents import LlmAgent
from google.adk.models.google_llm import Gemini
from google.adk.code_executors import AgentEngineSandboxCodeExecutor
from google.adk.tools.agent_tool import AgentTool

inner = LlmAgent(
    name="calculator",
    model=Gemini(model="gemini-3-flash-preview",
                 generation_config={"max_output_tokens": 16384}),
    tools=[],
    code_executor=AgentEngineSandboxCodeExecutor(
        agent_engine_resource_name="projects/.../reasoningEngines/..."
    ),
    include_contents="default",
)
tool = AgentTool(agent=inner, skip_summarization=False)

# Call via parent agent with request='Compute IRR for these cash flows: [...]'
# Observe response['result'] == '' on ~30% of runs despite audit logs showing
# successful Python execution with stdout containing the IRR value.

Proposed fix

One-line extension to the extraction — also capture code_execution_result.output when it exists:

if last_content is None or last_content.parts is None:
    return ''
merged_text = '\n'.join(
    p.text for p in last_content.parts if p.text and not p.thought
)
# NEW: fall back to code_execution_result.output when text parts are empty
if not merged_text.strip():
    code_outputs = [
        p.code_execution_result.output
        for p in last_content.parts
        if p.code_execution_result and p.code_execution_result.output
    ]
    if code_outputs:
        merged_text = '\n\n'.join(code_outputs)

Reference workaround

Subclass approach that preserves base-class semantics and adds the fallback: CodeExecutionAgentTool in utils/code_execution_audit.py.

Measured impact across 30+ dispatches: empty-response rate on the "code-ran-no-text" class dropped from 38-62% to 0% on the two most recent runs; parent LLMs now receive actionable diagnostic content (including sandbox tracebacks) instead of empty strings, enabling intelligent retry/recovery.

Request for maintainers

Would you prefer:
(a) A PR adding the one-block fix above directly to AgentTool.run_async?
(b) A new CodeExecutionAgentTool subclass shipped alongside AgentTool?
(c) Documentation of the subclass pattern as the recommended approach for code-executor-bearing sub-agents?

Happy to contribute in whichever form best fits the project's direction.

Environment

  • ADK: v1.27.2 (also reproduces on v1.28.x-v1.31.1 — no change to affected path)
  • google-genai: 1.x
  • Model: gemini-3-flash-preview (also observed on gemini-2.5-flash)
  • Executor: AgentEngineSandboxCodeExecutor, BuiltInCodeExecutor
  • Python: 3.13

Impact

Without the fix, any production workflow using AgentTool to wrap a code-executor sub-agent loses 25-60% of computational output silently. Parent LLMs cannot recover because they see no diagnostic content. This particularly affects multi-agent patterns that work around the Gemini API's code_execution + function_calling mutual-exclusion constraint via AgentTool decomposition.

Metadata

Metadata

Assignees

Labels

tools[Component] This issue is related to tools

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions