Summary
AgentTool.run_async silently drops code_execution_result.output when wrapping an inner LlmAgent that uses a code_executor (BuiltInCodeExecutor, AgentEngineSandboxCodeExecutor). Parent agents receive response={'result': ''} on a substantial fraction of invocations even though the sandbox successfully ran code and produced output. Observed rates across 30+ production dispatches on gemini-3-flash-preview: 38% baseline, up to 62% under some configurations. After the local subclass fix described below: 0% on two consecutive clean runs.
Affected file: google/adk/tools/agent_tool.py:267-270 (as of v1.27.2; confirmed unchanged through v1.31.1 at time of writing).
Relation to prior issue #180: #180 (closed) reported the same underlying behavior on gemini-2.0-flash + BuiltInCodeExecutor. It was closed with a prompt-level workaround — instructing the inner agent to "present EXACTLY what was returned." That workaround is fragile because (a) it relies on the inner LLM's self-consistency under compaction, (b) it doesn't help when the inner LLM emits no text part at all (the dominant failure mode with Gemini 3's code-exec path), and (c) it shifts the bug burden onto every team using the pattern. This issue proposes a structural code fix that removes the responsibility from users.
Current behavior
# google/adk/tools/agent_tool.py
if last_content is None or last_content.parts is None:
  return ''
merged_text = '\n'.join(
    p.text for p in last_content.parts if p.text and not p.thought
)
The extraction inspects p.text only. When the inner Gemini model finishes a turn with only executable_code + code_execution_result parts (no wrapping text summary — a documented Gemini code-exec behavior), merged_text = '' and the parent tool sees an empty result.
The parent LLM has no way to recover — it sees an empty string, not the sandbox's stdout or tracebacks. This prevents intelligent retry behavior and forces either (a) silent loss of computational output or (b) manual code-path workarounds.
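To make the failure mode concrete, here is a minimal self-contained sketch. It uses simple stand-in dataclasses rather than the real google.genai types (which carry more fields), but mirrors the shape the extraction operates on: a final turn containing only a code_execution_result part and no text part.

```python
from dataclasses import dataclass
from typing import Optional

# Stand-ins for the relevant fields of google.genai content parts
# (simplified for illustration; not the real SDK types).
@dataclass
class CodeExecutionResult:
    output: str

@dataclass
class Part:
    text: Optional[str] = None
    thought: bool = False
    code_execution_result: Optional[CodeExecutionResult] = None

# A final turn with no text part -- only the executed code's result,
# the dominant shape observed on Gemini 3's code-exec path.
parts = [
    Part(code_execution_result=CodeExecutionResult(output="IRR = 0.1432\n")),
]

# The current extraction logic, applied to these parts:
merged_text = '\n'.join(p.text for p in parts if p.text and not p.thought)
print(repr(merged_text))  # -> '' -- the sandbox output is dropped
```

Every part is filtered out because p.text is None, so the join produces an empty string even though the sandbox output sits right there in the part list.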
Reproducer
from google.adk.agents import LlmAgent
from google.adk.models.google_llm import Gemini
from google.adk.code_executors import AgentEngineSandboxCodeExecutor
from google.adk.tools.agent_tool import AgentTool
inner = LlmAgent(
    name="calculator",
    model=Gemini(
        model="gemini-3-flash-preview",
        generation_config={"max_output_tokens": 16384},
    ),
    tools=[],
    code_executor=AgentEngineSandboxCodeExecutor(
        agent_engine_resource_name="projects/.../reasoningEngines/..."
    ),
    include_contents="default",
)
tool = AgentTool(agent=inner, skip_summarization=False)
# Call via parent agent with request='Compute IRR for these cash flows: [...]'
# Observe response['result'] == '' on ~30% of runs despite audit logs showing
# successful Python execution with stdout containing the IRR value.
Proposed fix
A small extension to the extraction, which additionally captures code_execution_result.output when the text parts are empty:
if last_content is None or last_content.parts is None:
  return ''
merged_text = '\n'.join(
    p.text for p in last_content.parts if p.text and not p.thought
)
# NEW: fall back to code_execution_result.output when text parts are empty
if not merged_text.strip():
  code_outputs = [
      p.code_execution_result.output
      for p in last_content.parts
      if p.code_execution_result and p.code_execution_result.output
  ]
  if code_outputs:
    merged_text = '\n\n'.join(code_outputs)
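Applied to the same kind of part list, the fallback recovers the sandbox output. A self-contained check using stand-in dataclasses (again, not the real google.genai types):

```python
from dataclasses import dataclass
from typing import Optional

# Simplified stand-ins for the content-part fields the fix touches.
@dataclass
class CodeExecutionResult:
    output: str

@dataclass
class Part:
    text: Optional[str] = None
    thought: bool = False
    code_execution_result: Optional[CodeExecutionResult] = None

# Two code-execution parts, no text parts.
parts = [
    Part(code_execution_result=CodeExecutionResult(output="IRR = 0.1432")),
    Part(code_execution_result=CodeExecutionResult(output="done")),
]

# Existing extraction: yields '' because there are no text parts.
merged_text = '\n'.join(p.text for p in parts if p.text and not p.thought)

# Proposed fallback: surface code_execution_result.output instead.
if not merged_text.strip():
    code_outputs = [
        p.code_execution_result.output
        for p in parts
        if p.code_execution_result and p.code_execution_result.output
    ]
    if code_outputs:
        merged_text = '\n\n'.join(code_outputs)

print(merged_text)  # -> 'IRR = 0.1432\n\ndone'
```

When any non-empty text part exists, merged_text.strip() is truthy and the fallback never runs, so existing behavior is unchanged.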
Reference workaround
Subclass approach that preserves base-class semantics and adds the fallback: CodeExecutionAgentTool in utils/code_execution_audit.py.
Measured impact across 30+ dispatches: empty-response rate on the "code-ran-no-text" class dropped from 38-62% to 0% on the two most recent runs; parent LLMs now receive actionable diagnostic content (including sandbox tracebacks) instead of empty strings, enabling intelligent retry/recovery.
Request for maintainers
Would you prefer:
(a) A PR adding the one-block fix above directly to AgentTool.run_async?
(b) A new CodeExecutionAgentTool subclass shipped alongside AgentTool?
(c) Documentation of the subclass pattern as the recommended approach for code-executor-bearing sub-agents?
Happy to contribute in whichever form best fits the project's direction.
Environment
- ADK: v1.27.2 (also reproduces on v1.28.x-v1.31.1 — no change to affected path)
- google-genai: 1.x
- Model: gemini-3-flash-preview (also observed on gemini-2.5-flash)
- Executor: AgentEngineSandboxCodeExecutor, BuiltInCodeExecutor
- Python: 3.13
Impact
Without the fix, any production workflow using AgentTool to wrap a code-executor sub-agent silently loses 38-62% of computational output (per the measurements above). Parent LLMs cannot recover because they see no diagnostic content. This particularly affects multi-agent patterns that work around the Gemini API's code_execution + function_calling mutual-exclusion constraint via AgentTool decomposition.