Gemini Structured Output & JSON Mode: Reliable Data Extraction

Master Gemini's structured output capabilities. Learn JSON schema enforcement, prompt patterns for reliable extraction, handling nested objects, and common schema design antipatterns.

June 14, 2026
Gemini, Structured Output, JSON Schema, Data Extraction, Prompt Engineering

Getting structured data out of LLMs reliably is hard. Without constraints, Gemini will occasionally deviate from your requested format — wrapping JSON in markdown fences, adding commentary, using inconsistent field names, or hallucinating fields that don't exist in your schema.

Gemini's structured output mode, configured through response_mime_type and optionally response_schema, addresses this at the API level. Prompting still matters: schema enforcement prevents malformed output, but good prompting determines whether the extracted data is actually correct.

Enabling Structured Output

JSON Mode (response_mime_type)

{
  "generationConfig": {
    "responseMimeType": "application/json"
  }
}

This ensures Gemini returns valid JSON with no markdown wrapping, no commentary, no explanatory text. But it doesn't enforce a specific schema — the JSON structure is whatever Gemini generates.
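Because JSON mode fixes the syntax but not the shape, the caller still has to check what came back. Here is a minimal sketch using only the Python standard library. The function name parse_json_mode_text and its expected_keys parameter are hypothetical, and response_text stands for whatever text you read out of the API response.

```python
import json


def parse_json_mode_text(response_text, expected_keys):
    """Parse JSON-mode output and report which expected top-level keys are missing."""
    data = json.loads(response_text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    missing = [key for key in expected_keys if key not in data]
    return data, missing


# Valid JSON, but the model chose "items" instead of the "products" key we wanted.
data, missing = parse_json_mode_text('{"items": []}', ["products"])
print(missing)  # ['products']
```

A non-empty missing list is the signal that JSON mode alone is not enough for the task and a response schema is worth adding.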

Schema Enforcement (response_schema)

{
  "generationConfig": {
    "responseMimeType": "application/json",
    "responseSchema": {
      "type": "OBJECT",
      "properties": {
        "products": {
          "type": "ARRAY",
          "items": {
            "type": "OBJECT",
            "properties": {
              "name": { "type": "STRING" },
              "price": { "type": "NUMBER" },
              "in_stock": { "type": "BOOLEAN" },
              "categories": {
                "type": "ARRAY",
                "items": { "type": "STRING" }
              }
            },
            "required": ["name", "price", "in_stock"]
          }
        }
      },
      "required": ["products"]
    }
  }
}

This enforces the schema at the API level: Gemini's output is constrained to the specified structure, so required fields are present and values have the declared types.

Note:

Schema enforcement doesn't guarantee the data is correct — only that it's well-formed. Gemini can fill price: 999999 into a required number field without the price being accurate. Always validate extracted data against source material.
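One way to act on this is a cheap application-side check that every extracted value can actually be found in the source. The sketch below assumes the products schema shown earlier and plain-text source material; unsupported_products is a hypothetical helper, and the sample product names are made up for illustration.

```python
import re


def unsupported_products(products, source_text):
    """Return (product name, field) pairs whose values cannot be found in the source."""
    # Collect every number that appears in the source, ignoring thousands separators.
    numbers_in_source = {
        float(match.replace(",", ""))
        for match in re.findall(r"\d[\d,]*(?:\.\d+)?", source_text)
    }
    problems = []
    for product in products:
        if product["name"] not in source_text:
            problems.append((product["name"], "name"))
        if float(product["price"]) not in numbers_in_source:
            problems.append((product["name"], "price"))
    return problems


source = "The Aero Kettle costs $49.99 and is in stock."
extracted = [
    {"name": "Aero Kettle", "price": 49.99, "in_stock": True},
    {"name": "Aero Kettle Pro", "price": 999999, "in_stock": True},  # well-formed but wrong
]
print(unsupported_products(extracted, source))
# [('Aero Kettle Pro', 'name'), ('Aero Kettle Pro', 'price')]
```

Exact substring and number matching is deliberately strict. It will flag legitimate paraphrases, so treat the result as a review queue rather than a verdict.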

Prompting for Reliable Extraction

Schema enforcement solves formatting. The prompt solves accuracy.

The Extraction Prompt Pattern

Extract the following information from the provided [DOCUMENT/IMAGE/TEXT].

OUTPUT SCHEMA:
[Describe your schema in plain language alongside the formal schema]

EXTRACTION RULES:
1. If a field's value is not present in the source, use null — never
   fabricate data to fill required fields
2. For numeric fields: extract exactly what's stated. Do not convert
   units or perform calculations unless explicitly instructed
3. For boolean fields: only return true if the source explicitly
   states or unambiguously shows the condition. Default to false
4. For text fields: extract verbatim, not paraphrased
5. If you're uncertain about any value, report it through the field's
   confidence property (the schema must define one):
   { "value": ..., "confidence": "high|medium|low" }

SOURCE:
[Your content to extract from]
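If you reuse this pattern across many extraction jobs, it helps to assemble the prompt in code so the rules stay identical between calls. A minimal sketch follows; build_extraction_prompt and its parameter names are hypothetical, and the rules are the ones listed above in lightly condensed form.

```python
EXTRACTION_RULES = [
    "If a field's value is not present in the source, use null; never fabricate data to fill required fields.",
    "For numeric fields: extract exactly what's stated. Do not convert units or perform calculations unless explicitly instructed.",
    "For boolean fields: only return true if the source explicitly states or unambiguously shows the condition. Default to false.",
    "For text fields: extract verbatim, not paraphrased.",
    "If you're uncertain about any value, say so in its confidence field.",
]


def build_extraction_prompt(source_kind, schema_description, source):
    """Fill the extraction template; source_kind is e.g. "DOCUMENT", "IMAGE" or "TEXT"."""
    rules = "\n".join(
        f"{number}. {rule}" for number, rule in enumerate(EXTRACTION_RULES, start=1)
    )
    return (
        f"Extract the following information from the provided {source_kind}.\n\n"
        f"OUTPUT SCHEMA:\n{schema_description}\n\n"
        f"EXTRACTION RULES:\n{rules}\n\n"
        f"SOURCE:\n{source}"
    )


prompt = build_extraction_prompt(
    "TEXT",
    "name: product name exactly as written; price: numeric price only",
    "The Aero Kettle costs $49.99.",
)
print(prompt)
```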

Example: Extracting from Unstructured Text

Extract structured data from the following product descriptions.

SCHEMA (for each product):
{
  "name": string,        // Product name exactly as written
  "prices": number[],    // Every price stated (sale, MSRP), numbers only
  "currency": string,    // USD, EUR, GBP, etc.
  "features": string[],  // Distinct features mentioned
  "warranty_years": number | null  // Warranty duration if mentioned
}

RULES:
- If multiple prices appear (sale, MSRP), extract all of them into the
  prices array
- Features should be individual, distinct items, not comma-separated
  strings
- If no warranty is mentioned, use null (not 0)

PRODUCT DESCRIPTIONS:
[Your text]

Schema Design Best Practices

| Do | Don't |
| --- | --- |
| Use null for missing data | Use 0 or "" as sentinel for missing |
| Keep nesting ≤ 3 levels deep | Deeply nest objects (Gemini accuracy drops past 3-4 levels) |
| Use arrays of objects for lists | Use comma-separated strings inside a single field |
| Include required field lists | Leave all fields optional (you'll get inconsistent output) |
| Add confidence fields for ambiguous extraction | Assume extraction is always certain |
| Use enums for constrained values | Use free-text for fields with a known set of values |
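To make the first and fourth rows concrete, the sketch below builds a schema in which a field is required (so it always appears) yet may hold null (so nothing has to be invented). It assumes the schema object accepts nullable and description keys alongside the ones used in the earlier examples; confirm that against the API reference for the version you use. The field helper is hypothetical.

```python
import json


def field(type_name, description, nullable=False):
    """Build one schema property; nullable fields may hold null when the source is silent."""
    prop = {"type": type_name, "description": description}
    if nullable:
        prop["nullable"] = True  # assumed schema key, see the note above
    return prop


product_schema = {
    "type": "OBJECT",
    "properties": {
        "name": field("STRING", "Product name exactly as written"),
        "warranty_years": field(
            "NUMBER", "Warranty duration in years; null if not mentioned", nullable=True
        ),
        "features": {
            "type": "ARRAY",
            "description": "Distinct features, one per array item",
            "items": {"type": "STRING"},
        },
    },
    # Required and nullable together: the key is always present, the value may be null.
    "required": ["name", "warranty_years", "features"],
}

print(json.dumps(product_schema, indent=2))
```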

Enum Pattern

{
  "sentiment": {
    "type": "STRING",
    "enum": ["positive", "negative", "neutral", "mixed"]
  }
}

Enums produce far more consistent results than free-text fields for categorical data. Gemini is less likely to return "somewhat positive" or "mostly good" when those aren't valid enum values.
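On the application side, the same list of values can live in one place and drive both the schema and the parsing of the result. A small sketch with Python's standard enum module follows; Sentiment and enum_schema are hypothetical names.

```python
from enum import Enum


class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    MIXED = "mixed"


def enum_schema(enum_cls, description):
    """Build a STRING schema property whose allowed values come from a Python Enum."""
    return {
        "type": "STRING",
        "description": description,
        "enum": [member.value for member in enum_cls],
    }


schema = {"sentiment": enum_schema(Sentiment, "Overall sentiment of the review")}
print(schema["sentiment"]["enum"])  # ['positive', 'negative', 'neutral', 'mixed']

# Parsing a returned value: anything outside the enum raises ValueError.
label = Sentiment("mixed")
```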

Handling Ambiguity

For fields where extraction might legitimately be uncertain:

{
  "extracted_name": {
    "type": "OBJECT",
    "properties": {
      "value": { "type": "STRING" },
      "confidence": {
        "type": "STRING",
        "enum": ["high", "medium", "low"]
      },
      "evidence": { "type": "STRING" }
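    },
    "required": ["value", "confidence", "evidence"]
  }
}

Listing evidence as required makes Gemini quote where in the source it found the information, which tends to reduce hallucination and gives you a string to check against the source.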
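The evidence string is only useful if something checks it. A minimal sketch, assuming evidence is meant to be a verbatim quote from a plain-text source; check_evidence is a hypothetical helper that downgrades confidence when the quote cannot be found.

```python
def _normalize(text):
    """Lower-case and collapse whitespace so line wraps do not defeat the match."""
    return " ".join(text.lower().split())


def check_evidence(extracted, source_text):
    """Return a copy of the field, with confidence forced to "low" if the evidence is not in the source."""
    evidence = _normalize(extracted.get("evidence") or "")
    if evidence and evidence in _normalize(source_text):
        return dict(extracted)
    return {**extracted, "confidence": "low"}


source = "Contact: Alice Chen,\n  Head of Engineering"
field = {
    "value": "Bob Smith",
    "confidence": "high",
    "evidence": "Bob Smith, CEO",  # not present in the source
}
print(check_evidence(field, source)["confidence"])  # low
```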

Nested Object Extraction

For complex documents, flatten where possible:

// HARD for Gemini (deeply nested, implicit relationships)
{
  "company": {
    "departments": [
      {
        "name": "...",
        "employees": [
          {
            "name": "...",
            "manager": { "name": "..." }  // Object reference
          }
        ]
      }
    ]
  }
}

// EASIER for Gemini (flattened, explicit relationships)
{
  "departments": [
    { "id": "dept-1", "name": "Engineering" }
  ],
  "employees": [
    {
      "name": "Alice Chen",
      "department_id": "dept-1",
      "manager_name": "Bob Smith"
    }
  ]
}

Note:

Deeply nested schemas with embedded object references (one extracted entity nested inside another, such as a manager object inside each employee) are a common cause of extraction failures. Gemini loses track of cross-references in complex hierarchies. Flatten to arrays with explicit foreign keys whenever possible.
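Flattening the schema does not mean giving up the nested view. It can be rebuilt application-side once extraction is done. The sketch below assumes the flattened shape shown above (departments with id, employees with department_id); nest_employees is a hypothetical helper that also reports employees whose department_id matches no department, which is a cheap way to catch a lost cross-reference.

```python
from collections import defaultdict


def nest_employees(flat):
    """Rebuild departments -> employees from the flattened extraction output."""
    by_department = defaultdict(list)
    for employee in flat["employees"]:
        by_department[employee["department_id"]].append(employee)

    nested = [
        {**department, "employees": by_department.get(department["id"], [])}
        for department in flat["departments"]
    ]
    known_ids = {department["id"] for department in flat["departments"]}
    orphans = [e for e in flat["employees"] if e["department_id"] not in known_ids]
    return nested, orphans


flat = {
    "departments": [{"id": "dept-1", "name": "Engineering"}],
    "employees": [
        {"name": "Alice Chen", "department_id": "dept-1", "manager_name": "Bob Smith"}
    ],
}
nested, orphans = nest_employees(flat)
print(nested[0]["employees"][0]["name"])  # Alice Chen
print(orphans)  # []
```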

Common Schema Antipatterns

| Antipattern | Problem | Fix |
| --- | --- | --- |
| Overly permissive types | "type": "STRING" for everything | Use specific types, enums, and constraints |
| Missing required array | Gemini omits fields unpredictably | Always specify required for critical fields |
| Deep nesting | Accuracy decays past 3 levels | Flatten with explicit IDs |
| No null handling | Required fields force fabrication | Allow null for fields that may be absent |
| Vague field descriptions | Gemini guesses what you want | Add description to every schema property |
| Too many fields | Accuracy per field drops past ~15 fields | Break into multiple extraction passes |
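Several of these antipatterns can be caught mechanically before a schema is ever sent. The following is a rough sketch of such a check, not an official tool: lint_schema is a hypothetical helper, the thresholds come from the table above, and counting the root object as level 1 (with arrays not adding a level) is an assumption about how depth should be measured.

```python
def lint_schema(schema, max_depth=3, max_fields=15):
    """Return warnings for deep nesting, missing required lists, missing descriptions, too many fields."""
    warnings = []
    field_count = 0

    def walk(node, path, depth):
        nonlocal field_count
        if node.get("type") == "ARRAY":
            # Arrays are transparent: check their items at the same depth.
            walk(node.get("items", {}), path + "[]", depth)
            return
        if node.get("type") != "OBJECT":
            return
        if depth > max_depth:
            warnings.append(f"{path}: nesting deeper than {max_depth} levels")
        properties = node.get("properties", {})
        if properties and not node.get("required"):
            warnings.append(f"{path}: no required list")
        for name, prop in properties.items():
            field_count += 1
            child_path = f"{path}.{name}"
            if "description" not in prop:
                warnings.append(f"{child_path}: no description")
            walk(prop, child_path, depth + 1)

    walk(schema, "$", 1)
    if field_count > max_fields:
        warnings.append(f"$: {field_count} fields; consider multiple passes")
    return warnings


schema = {"type": "OBJECT", "properties": {"name": {"type": "STRING"}}}
for warning in lint_schema(schema):
    print(warning)
# $: no required list
# $.name: no description
```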

Multi-Pass Extraction

For schemas with 15+ fields, split extraction into passes:

// PASS 1: Extract entity list
Extract all product names and IDs from this catalog.
Schema: { products: [{ id: string, name: string }] }

// PASS 2: Extract details per entity
For each product from Pass 1, extract:
Schema: { details: [{ id: string, price: number, ... }] }

// PASS 3: Merge (application-side, not Gemini)
Merge details into products by matching IDs.

This costs more API calls but tends to produce more accurate results than a single massive extraction pass.
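The merge in Pass 3 is ordinary application code. A minimal sketch, assuming both passes return lists of objects that share an id field as in the schemas above; merge_passes is a hypothetical helper and the sample IDs are invented. It also returns the IDs for which Pass 2 produced nothing, so gaps are visible instead of silently dropped.

```python
def merge_passes(products, details):
    """Merge Pass 2 details into Pass 1 products by id; also return ids with no details."""
    details_by_id = {detail["id"]: detail for detail in details}
    merged, missing = [], []
    for product in products:
        detail = details_by_id.get(product["id"])
        if detail is None:
            missing.append(product["id"])
            merged.append(dict(product))
        else:
            merged.append({**product, **detail})
    return merged, missing


pass1 = [{"id": "p-1", "name": "Aero Kettle"}, {"id": "p-2", "name": "Aero Toaster"}]
pass2 = [{"id": "p-1", "price": 49.99}]
merged, missing = merge_passes(pass1, pass2)
print(merged[0])  # {'id': 'p-1', 'name': 'Aero Kettle', 'price': 49.99}
print(missing)    # ['p-2']
```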

Common Failures

| Failure | Cause | Fix |
| --- | --- | --- |
| Fabricated values | Required field with no data in source | Allow null; add explicit "use null if not found" rule |
| Type mismatches | Gemini returns "N/A" for a number field | Schema enforcement catches this; add extraction rules |
| Inconsistent enums | Free-text fields for categorical data | Use enum constraints |
| Missing nested fields | Deep nesting confuses extraction | Flatten to ≤ 3 levels |
| Commentary in output | Gemini adds explanations | Use responseMimeType: "application/json" to suppress |