Gemini Built-in Code Execution: Python Sandbox Mastery
Harness Gemini's Python code execution sandbox. Learn self-verification patterns, data analysis with pandas, iterative problem-solving, and error recovery techniques.
Gemini's built-in code execution lets the model write Python code, execute it in a sandboxed environment, observe the output, and use that output to refine its answer, all within a single API response. This turns Gemini from a text generator into a computation engine.
The use cases are substantial: data analysis with real computation, self-verifying mathematical reasoning, chart generation, file processing, and iterative problem-solving where Gemini tests its own hypotheses against actual results.
But the code execution sandbox has limits. It can't access the network, can't read your files, can't install arbitrary packages, and has a timeout. Understanding these boundaries — and how to prompt within them — is essential.
Enabling Code Execution
Code execution must be enabled at the API level. It is not on by default.
{
  "tools": [
    {
      "codeExecution": {}
    }
  ]
}
When enabled, Gemini can decide whether to use code execution for a given prompt. You can also instruct it explicitly in your prompt.
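To show where the tools entry sits in a full request, here is a minimal sketch that builds a request body in Python. The `contents`/`parts`/`text` layout is an assumption based on the public REST request shape, and `build_request` is a hypothetical helper name; check the current API reference before relying on either.

```python
import json


def build_request(prompt: str) -> dict:
    """Return a request body with code execution enabled (assumed REST shape)."""
    return {
        # Assumed layout for the user prompt; not taken from this article.
        "contents": [{"parts": [{"text": prompt}]}],
        # Same tools entry as the JSON fragment above.
        "tools": [{"codeExecution": {}}],
    }


body = build_request("Use Python to compute the sum of the first 50 primes.")
print(json.dumps(body, indent=2))
```

The sketch only constructs the payload; sending it (endpoint, authentication, model name) is left out on purpose.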
Core Patterns
Pattern 1: Self-Verification
The fundamental pattern: generate code → execute → verify output → correct.
Solve this problem step by step:
1. Write Python code to solve the problem
2. Execute the code and report the output
3. If the output matches expectations, explain why it's correct
4. If the output is wrong, analyze what went wrong, fix the code,
and re-execute
5. Repeat until correct, or report that you can't solve it after 3 attempts
PROBLEM: Calculate the probability of drawing at least two aces
in a 5-card poker hand dealt from a standard 52-card deck.
Show your work.
Note:
Always set a maximum number of attempts (3 is a good default). Without a limit, Gemini can get stuck in loops where it repeatedly generates slightly different wrong answers without converging.
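For the poker problem above, the code Gemini writes in the sandbox might look something like the following sketch. It counts the hands two independent ways (directly and via the complement) so the result checks itself; the variable names are illustrative, not anything Gemini is guaranteed to produce.

```python
from math import comb

total_hands = comb(52, 5)

# Direct count: choose k aces from 4, and the other 5 - k cards from the 48 non-aces.
direct = sum(comb(4, k) * comb(48, 5 - k) for k in range(2, 5))

# Complement count: all hands minus hands with zero aces or exactly one ace.
complement = total_hands - comb(48, 5) - comb(4, 1) * comb(48, 4)

# Self-verification: both methods must agree before the answer is reported.
assert direct == complement, "the two methods disagree; recheck the setup"

probability = direct / total_hands
print(f"{direct} / {total_hands} = {probability:.6f}")
```

Both counts come to 108,336 hands out of 2,598,960, a probability of roughly 4.17%.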
Pattern 2: Data Analysis
I'll provide a dataset. Use code execution to analyze it.
1. Load the data into a pandas DataFrame
2. Run descriptive statistics: mean, median, std, quartiles for all
numeric columns
3. Identify and report any outliers (values > 3 std from mean)
4. Generate a correlation matrix for numeric columns
5. Create visualizations (matplotlib):
- Histogram of the primary metric
- Scatter plot of the two most correlated variables
- Box plot by category if categorical columns exist
6. Summarize the 3 most important findings from the analysis
DATA:
[your dataset]
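Steps 2 and 3 of this prompt can be sketched as follows. To keep the example dependency-free it uses Python's standard `statistics` module rather than pandas, so it illustrates the logic, not the exact code Gemini would generate; the sample values are made up for the demonstration.

```python
import statistics


def describe(values):
    """Summary statistics for one numeric column (step 2 of the prompt)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "q3": q3,
    }


def find_outliers(values, threshold=3.0):
    """Values more than `threshold` standard deviations from the mean (step 3)."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > threshold * std]


# Made-up sample: 19 ordinary readings and one extreme value.
sample = [10, 11, 9, 10, 12, 10, 11, 9, 10, 10,
          11, 9, 10, 12, 10, 11, 9, 10, 10, 95]

summary = describe(sample)
outliers = find_outliers(sample)
print(summary)
print("outliers:", outliers)
```

One caveat: the 3-standard-deviation rule needs a reasonable sample size, because with only a handful of rows no single value can sit that far from the mean.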
Pattern 3: Iterative Problem Solving
APPROACH: Iterative improvement
1. Start with the simplest approach that might work
2. Write code, execute, observe results
3. Based on results, refine the approach
4. Repeat until the solution is optimal or 5 iterations pass
At each iteration, report:
- Approach: what you're trying
- Code: the implementation
- Results: what happened when executed
- Insights: what you learned
- Next step: what you'll try differently
PROBLEM: Find the shortest path through a 50-city traveling
salesman problem using heuristic approaches. Cities are at
random coordinates in a 100x100 grid.
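As an illustration of what the first two iterations might produce, here is a sketch that starts with a nearest-neighbor tour and then refines it with 2-opt segment reversals. These two heuristics are one plausible choice, not the only route Gemini could take, and the pass cap mirrors the prompt's iteration limit.

```python
import math
import random


def tour_length(tour, cities):
    """Total length of the closed tour that visits cities in the given order."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def nearest_neighbor(cities, start=0):
    """Iteration 1: always move to the closest unvisited city."""
    unvisited = set(range(len(cities))) - {start}
    tour = [start]
    while unvisited:
        last = cities[tour[-1]]
        nxt = min(unvisited, key=lambda c: math.dist(last, cities[c]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def two_opt(tour, cities, max_passes=5):
    """Iteration 2: reverse a segment whenever that shortens the tour."""
    best = tour[:]
    best_len = tour_length(best, cities)
    for _ in range(max_passes):  # hard cap, like the prompt's iteration limit
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = tour_length(candidate, cities)
                if cand_len < best_len - 1e-9:
                    best, best_len = candidate, cand_len
                    improved = True
        if not improved:
            break
    return best


random.seed(0)  # fixed seed so reruns are comparable
cities = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(50)]

nn_tour = nearest_neighbor(cities)
opt_tour = two_opt(nn_tour, cities)
print(f"nearest neighbor: {tour_length(nn_tour, cities):.1f}")
print(f"after 2-opt:      {tour_length(opt_tour, cities):.1f}")
```

This version recomputes the full tour length for every candidate, which is simple but wasteful; at 50 cities it stays well within a short time budget.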
Pattern 4: Calculation Verification
For any answer that involves computation, prompt Gemini to verify:
For any calculation in your response:
1. Show the formula
2. Implement it in Python and execute
3. Report the computed result
4. Compare with your text answer
If there's a discrepancy, the computed result is authoritative.
This pattern has caught calculation errors that would otherwise go unnoticed, especially in financial and statistical responses.
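The comparison in step 4 can be sketched as a small helper. `reconcile` is a hypothetical name, and the simple-interest figures are invented purely to show a discrepancy being caught.

```python
import math


def reconcile(stated: float, computed: float, rel_tol: float = 1e-6) -> dict:
    """Compare a figure from the prose answer with the executed result."""
    return {
        "matches": math.isclose(stated, computed, rel_tol=rel_tol),
        "authoritative": computed,  # the computed result wins on any discrepancy
        "difference": computed - stated,
    }


# Formula: simple interest = principal * rate * years
principal, rate, years = 2_000.0, 0.05, 3
computed_interest = principal * rate * years

# Suppose the prose answer claimed 310.0 (a deliberately wrong, made-up figure).
report = reconcile(stated=310.0, computed=computed_interest)
print(report)
```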
What the Sandbox Can and Cannot Do
| Capability | Supported? | Notes |
|---|---|---|
| Pure Python computation | Yes | Standard library modules that don't need network or shell access |
| pandas, numpy, matplotlib | Yes | Pre-installed |
| scipy, scikit-learn | Varies | Check current availability |
| Network access | No | No HTTP, no socket connections |
| File I/O | Sandbox only | Can write/read within sandbox; cannot access your files |
| Subprocess / OS commands | No | No shell access |
| External packages (pip install) | No | Only pre-installed libraries |
| Long-running computation | Limited | Timeout applies (Google's documentation cites a 30-second maximum runtime; verify the current limit) |
| GPU computation | No | CPU only |
Note:
The sandbox is stateless between API calls. Data you generate in one call is not available in the next. If you need persistent computation across calls, generate the code in Gemini but execute it in your own environment.
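To run generated code in your own environment, you first have to pull it out of the response. The sketch below assumes the REST response carries `executableCode` and `codeExecutionResult` parts, as the public API reference describes; the sample response is hand-written for illustration and was not captured from the API.

```python
def extract_code_and_output(response: dict):
    """Collect generated code and sandbox output from a response-shaped dict."""
    code_blocks, outputs = [], []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            if "executableCode" in part:
                code_blocks.append(part["executableCode"].get("code", ""))
            elif "codeExecutionResult" in part:
                outputs.append(part["codeExecutionResult"].get("output", ""))
    return code_blocks, outputs


# Hand-written stand-in for a real response (field names are assumptions).
sample_response = {
    "candidates": [{
        "content": {
            "parts": [
                {"text": "I'll compute this with Python."},
                {"executableCode": {"language": "PYTHON", "code": "print(2 + 2)"}},
                {"codeExecutionResult": {"outcome": "OUTCOME_OK", "output": "4\n"}},
                {"text": "The result is 4."},
            ]
        }
    }]
}

code_blocks, outputs = extract_code_and_output(sample_response)
script = "\n\n".join(code_blocks)  # save this to a file and run it locally
print(script)
print(outputs)
```

Review extracted code before running it locally; outside the sandbox it has whatever file and network access your environment grants.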
Prompting for Code Execution
Good Prompt (triggers execution)
Calculate the compound annual growth rate of an investment
that grew from $10,000 to $52,000 over 8.5 years. Use Python
to compute the exact value and show the formula.
Bad Prompt (may not trigger execution)
What's the CAGR for $10K to $52K over 8.5 years?
Explicit Trigger
If Gemini doesn't use code execution when you want it to:
Please use the code_execution tool to compute this.
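For reference, the computation the CAGR prompt asks for is small. A sketch of what the executed code could look like, with a round-trip check added as an extra safeguard:

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


rate = cagr(10_000, 52_000, 8.5)
print(f"CAGR: {rate:.4%}")

# Sanity check: compounding the start value at this rate should recover the end value.
recovered = 10_000 * (1 + rate) ** 8.5
print(f"recovered end value: {recovered:,.2f}")
```

The rate comes out at a little over 21% per year, which is easy to get wrong by mental arithmetic and is why an explicit "use Python" instruction is worth the extra words.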
Error Recovery Patterns
Code in the sandbox can fail with syntax errors, runtime exceptions, or timeouts. Prompt Gemini to handle failures gracefully:
For any code execution:
1. Wrap your main logic in a try/except block
2. Catch specific exceptions (ValueError, ZeroDivisionError, etc.)
3. If execution fails:
a. Report the exact error message
b. Explain what caused it in plain language
c. Attempt a fix and re-execute (max 2 retries)
4. If execution times out:
a. Report that the computation was too expensive
b. Suggest an optimized or approximate approach
Never silently fail. Always report errors.
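Code that follows these instructions tends to have a recognizable shape. Here is a minimal sketch, in which `run_with_retries` and the two attempt functions are hypothetical names: specific exceptions are caught, the exact message is recorded, and a fixed version is tried within the retry cap.

```python
import math


def run_with_retries(attempts, max_retries=2):
    """Try each callable in turn; record every failure, never fail silently."""
    errors = []
    for number, attempt in enumerate(attempts[: max_retries + 1], start=1):
        try:
            return {"ok": True, "result": attempt(), "errors": errors}
        except (ValueError, ZeroDivisionError) as exc:
            # Keep the exact error message before moving on to the fixed version.
            errors.append(f"attempt {number}: {type(exc).__name__}: {exc}")
    return {"ok": False, "result": None, "errors": errors}


values = [4.0, 0.0, -9.0]


def naive():
    # Fails: math.sqrt raises ValueError for negative input.
    return [math.sqrt(v) for v in values]


def fixed():
    # Fix: skip values outside the function's domain.
    return [math.sqrt(v) for v in values if v >= 0]


outcome = run_with_retries([naive, fixed])
print(outcome)
```

Timeouts are different: the sandbox stops the code from outside, so they cannot be caught in a try/except block and have to be handled by the prompt-level instructions in step 4.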
Data Visualization
Gemini can generate charts using matplotlib directly in the sandbox:
Create a visualization of this sales data:
Month,Product A,Product B,Product C
Jan,1200,900,1500
Feb,1350,950,1400
...
Generate:
1. A line chart showing all three products over time
2. A stacked bar chart showing monthly composition
3. A pie chart of total annual sales by product
For each chart:
- Include title, axis labels, and legend
- Use a professional color scheme
- Display the chart
- Provide 2-3 sentences interpreting what the chart shows
Note:
When generating charts, always ask for interpretation alongside the visual. The chart is returned as a rendered image, not as part of the response text, so asking for a textual interpretation ensures you have the analysis even when you only look at the raw text output.
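Since the chart arrives as image data, client code has to collect it separately from the text. A minimal sketch, assuming image parts carry an `inlineData` object with `mimeType` and base64-encoded `data` fields (an assumption based on the public REST shape); the sample payload is a placeholder, not a real PNG.

```python
import base64


def extract_images(response: dict):
    """Return (mime_type, raw_bytes) pairs for every inline image part."""
    images = []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            blob = part.get("inlineData")
            if blob and blob.get("mimeType", "").startswith("image/"):
                images.append((blob["mimeType"], base64.b64decode(blob["data"])))
    return images


# Hand-written stand-in for a real response; the bytes are a placeholder.
placeholder = base64.b64encode(b"not-a-real-png").decode("ascii")
sample_response = {
    "candidates": [{
        "content": {
            "parts": [
                {"text": "Product C leads in January."},
                {"inlineData": {"mimeType": "image/png", "data": placeholder}},
            ]
        }
    }]
}

images = extract_images(sample_response)
for mime_type, raw in images:
    print(mime_type, len(raw), "bytes")
```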
Common Failures
| Failure | Cause | Fix |
|---|---|---|
| No execution triggered | Prompt doesn't signal computation needed | Add "Use Python to calculate" explicitly |
| Infinite computation loops | No iteration limit | Set max attempts (3-5) |
| Massive data in prompt | Including full datasets as text | Summarize data shape; feed essential rows |
| Timeout on large computation | Algorithm too expensive for sandbox | Ask for approximate or optimized approach |
| Missing library | Needed package not pre-installed | Use standard library fallback or pre-installed alternatives |
Related Pages
- Code Generation Patterns — Writing effective code prompts for Gemini
- Grounding with Search — Combine execution with live fact-checking