Gemini's built-in code execution capability is unique among frontier LLMs. Gemini can write Python code, execute it in a sandboxed environment, observe the output, and use that output to refine its answer — all within a single API response. This transforms Gemini from a text generator into a computation engine.

The use cases are substantial: data analysis with real computation, self-verifying mathematical reasoning, chart generation, file processing, and iterative problem-solving where Gemini tests its own hypotheses against actual results.

But the code execution sandbox has limits. It can't access the network, can't read your files, can't install arbitrary packages, and has a timeout. Understanding these boundaries — and how to prompt within them — is essential.

Enabling Code Execution

Code execution must be enabled at the API level. It is not on by default.

{
  "tools": [
    {
      "codeExecution": {}
    }
  ]
}

When enabled, Gemini can decide whether to use code execution for a given prompt. You can also instruct it explicitly in your prompt.

Core Patterns

Pattern 1: Self-Verification

The fundamental pattern: generate code → execute → verify output → correct.

Solve this problem step by step:

1. Write Python code to solve the problem
2. Execute the code and report the output
3. If the output matches expectations, explain why it's correct
4. If the output is wrong, analyze what went wrong, fix the code,
   and re-execute
5. Repeat until correct, or report that you can't solve it after 3 attempts

PROBLEM: Calculate the probability of drawing at least two aces
in a 5-card poker hand dealt from a standard 52-card deck.
Show your work.

Note:

Always set a maximum number of attempts (3 is a good default). Without a limit, Gemini can get stuck in loops where it repeatedly generates slightly different wrong answers without converging.

Pattern 2: Data Analysis

I'll provide a dataset. Use code execution to analyze it.

1. Load the data into a pandas DataFrame
2. Run descriptive statistics: mean, median, std, quartiles for all
   numeric columns
3. Identify and report any outliers (values > 3 std from mean)
4. Generate a correlation matrix for numeric columns
5. Create visualizations (matplotlib):
   - Histogram of the primary metric
   - Scatter plot of the two most correlated variables
   - Box plot by category if categorical columns exist
6. Summarize the 3 most important findings from the analysis

DATA:
[your dataset]

Pattern 3: Iterative Problem Solving

APPROACH: Iterative improvement

1. Start with the simplest approach that might work
2. Write code, execute, observe results
3. Based on results, refine the approach
4. Repeat until the solution is optimal or 5 iterations pass

At each iteration, report:
- Approach: what you're trying
- Code: the implementation
- Results: what happened when executed
- Insights: what you learned
- Next step: what you'll try differently

PROBLEM: Find the shortest path through a 50-city traveling
salesman problem using heuristic approaches. Cities are at
random coordinates in a 100x100 grid.

Pattern 4: Calculation Verification

For any answer that involves computation, prompt Gemini to verify:

For any calculation in your response:
1. Show the formula
2. Implement it in Python and execute
3. Report the computed result
4. Compare with your text answer

If there's a discrepancy, the computed result is authoritative.

This pattern has caught calculation errors that would otherwise go unnoticed, especially in financial and statistical responses.

What the Sandbox Can and Cannot Do

Capability	Supported?	Notes
Pure Python computation	Yes	All standard library modules
pandas, numpy, matplotlib	Yes	Pre-installed
scipy, scikit-learn	Varies	Check current availability
Network access	No	No HTTP, no socket connections
File I/O	Sandbox only	Can write/read within sandbox; cannot access your files
Subprocess / OS commands	No	No shell access
External packages (pip install)	No	Only pre-installed libraries
Long-running computation	Limited	Timeout applies (typically 30-60 seconds)
GPU computation	No	CPU only

Note:

The sandbox is stateless between API calls. Data you generate in one call is not available in the next. If you need persistent computation across calls, generate the code in Gemini but execute it in your own environment.

Prompting for Code Execution

Good Prompt (triggers execution)

Calculate the compound annual growth rate of an investment
that grew from $10,000 to $52,000 over 8.5 years. Use Python
to compute the exact value and show the formula.

Bad Prompt (may not trigger execution)

What's the CAGR for $10K to $52K over 8.5 years?

Explicit Trigger

If Gemini doesn't use code execution when you want it to:

Please use the code_execution tool to compute this.

Error Recovery Patterns

Code in the sandbox can fail — syntax errors, runtime exceptions, timeout. Prompt Gemini to handle failures gracefully:

For any code execution:

1. Wrap your main logic in a try/except block
2. Catch specific exceptions (ValueError, ZeroDivisionError, etc.)
3. If execution fails:
   a. Report the exact error message
   b. Explain what caused it in plain language
   c. Attempt a fix and re-execute (max 2 retries)
4. If execution times out:
   a. Report that the computation was too expensive
   b. Suggest an optimized or approximate approach

Never silently fail. Always report errors.

Data Visualization

Gemini can generate charts using matplotlib directly in the sandbox:

Create a visualization of this sales data:

Month,Product A,Product B,Product C
Jan,1200,900,1500
Feb,1350,950,1400
...

Generate:
1. A line chart showing all three products over time
2. A stacked bar chart showing monthly composition
3. A pie chart of total annual sales by product

For each chart:
- Include title, axis labels, and legend
- Use a professional color scheme
- Display the chart
- Provide 2-3 sentences interpreting what the chart shows

Note:

When generating charts, always ask for interpretation alongside the visual. The chart image itself doesn't appear in the API response text — you get a rendering, but asking for textual interpretation ensures you'll have the analysis even when viewing the raw response.

Common Failures

Failure	Cause	Fix
No execution triggered	Prompt doesn't signal computation needed	Add "Use Python to calculate" explicitly
Infinite computation loops	No iteration limit	Set max attempts (3-5)
Massive data in prompt	Including full datasets as text	Summarize data shape; feed essential rows
Timeout on large computation	Algorithm too expensive for sandbox	Ask for approximate or optimized approach
Missing library	Needed package not pre-installed	Use standard library fallback or pre-installed alternatives

Code Generation Patterns — Writing effective code prompts for Gemini
Grounding with Search — Combine execution with live fact-checking

Gemini Built-in Code Execution: Python Sandbox Mastery

Enabling Code Execution

Core Patterns

Pattern 1: Self-Verification

Pattern 2: Data Analysis

Pattern 3: Iterative Problem Solving

Pattern 4: Calculation Verification

What the Sandbox Can and Cannot Do

Prompting for Code Execution

Good Prompt (triggers execution)

Bad Prompt (may not trigger execution)

Explicit Trigger

Error Recovery Patterns

Data Visualization

Common Failures

Related Articles

DeepSeek Flash vs Pro: Model Selection Guide

Claude Artifacts: Creation & Iteration Strategies

Claude Computer Use: Prompting for GUI Automation

On this page

Gemini Built-in Code Execution: Python Sandbox Mastery

Enabling Code Execution

Core Patterns

Pattern 1: Self-Verification

Pattern 2: Data Analysis

Pattern 3: Iterative Problem Solving

Pattern 4: Calculation Verification

What the Sandbox Can and Cannot Do

Prompting for Code Execution

Good Prompt (triggers execution)

Bad Prompt (may not trigger execution)

Explicit Trigger

Error Recovery Patterns

Data Visualization

Common Failures

Related Pages

Related Articles

DeepSeek Flash vs Pro: Model Selection Guide

Claude Artifacts: Creation & Iteration Strategies

Claude Computer Use: Prompting for GUI Automation

On this page