Gemini's safety system is the most configurable of any major LLM — and the most frequently misunderstood. Unlike OpenAI's opaque content filters or Claude's constitutional approach, Gemini exposes four distinct harm categories with adjustable probability and severity thresholds. You can dial safety up for a children's education app or down for a content moderation research project.

But misconfiguring safety settings is the #1 cause of "Gemini refuses my prompt for no reason." Understanding what each category actually catches — and what it doesn't — is essential.

The Four Harm Categories

Gemini classifies unsafe content into four categories. Each is independently configurable.

HARASSMENT

Covers bullying, threats, slurs, identity-based attacks, and toxic commentary directed at individuals or groups. Also catches content that encourages harassment even if it doesn't directly perform it.

What it catches: Hate speech, personal attacks, cyberbullying prompts, doxxing instructions. What it doesn't: Rude but non-identity-based criticism, heated debate, disagreement.

HATE_SPEECH

A subset of harassment focused specifically on attacks against protected characteristics: race, ethnicity, religion, gender, sexual orientation, disability, and national origin. More aggressive than the harassment filter.

What it catches: Racial slurs, antisemitic content, transphobic content, ethnic stereotyping. What it doesn't: Legitimate discussion of hate speech as a topic, academic analysis of extremist content.

SEXUALLY_EXPLICIT

Covers pornographic content, sexual acts, fetish content, and sexual violence. Also catches prompts that solicit sexually explicit material even when framed as research.

What it catches: Pornographic text generation, erotic content, sexual violence depictions. What it doesn't: Medical/biological discussion of reproduction, relationship advice, discussions of sexuality in academic contexts.

DANGEROUS

Covers content that could cause physical harm: instructions for weapons, explosives, self-harm, illegal activities, medical advice that could be dangerous, and financial schemes.

What it catches: Bomb-making instructions, suicide methods, drug manufacturing, phishing tutorials. What it doesn't: General chemistry discussions, self-defense techniques, regulated financial advice.

Blocking Thresholds

Each category has two thresholds that control when content is blocked. Understanding the difference between them is critical.

Probability Threshold (`HARM_BLOCK_THRESHOLD`)

Controls how likely Gemini thinks harmful content is before blocking. Values:

Level	Behavior
`BLOCK_NONE`	Never block (use with extreme caution)
`BLOCK_ONLY_HIGH`	Block only when harm is very likely
`BLOCK_MEDIUM_AND_ABOVE`	Block medium and high probability (reasonable default)
`BLOCK_LOW_AND_ABOVE`	Block everything above low probability (strict)
`HARM_BLOCK_THRESHOLD_UNSPECIFIED`	Use Google's default per-category

Note:

BLOCK_NONE disables filtering entirely for that category. This is appropriate for content moderation research and adversarial testing, but never for user-facing applications. Even with BLOCK_NONE, Gemini may still refuse to generate content that violates Google's Acceptable Use Policy at the infrastructure level.

Severity Threshold (`HARM_SEVERITY`)

Controls how severe the harmful content must be before blocking. This is distinct from probability — a response can be very likely to contain mildly harmful content, or unlikely to contain extremely harmful content.

Level	Behavior
`HARM_SEVERITY_UNSPECIFIED`	Use Google's default
`HARM_SEVERITY_NEGLIGIBLE`	Block only when harm is non-negligible
`HARM_SEVERITY_LOW`	Block even low-severity harmful content
`HARM_SEVERITY_MEDIUM`	Block medium and high severity
`HARM_SEVERITY_HIGH`	Block only high-severity content

Configuring via API

{
  "safetySettings": [
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_ONLY_HIGH"
    }
  ]
}

This configuration blocks hate speech aggressively (even low probability), treats harassment and explicit content at medium, and is permissive about dangerous content (blocking only high probability). This might be appropriate for a cybersecurity research tool — you want Gemini to discuss vulnerabilities but not generate harassment.

When Gemini Refuses: Debugging Safety Blocks

Gemini's refusal messages are deliberately vague ("I can't help with that request"). To debug which safety category is triggered:

Test each category in isolation. Set three categories to BLOCK_NONE and one to BLOCK_LOW_AND_ABOVE. Rotate through all four. The one that blocks your prompt tells you the category.
Lower the severity, not the probability. If a legitimate prompt is blocked, try raising the severity threshold first (HARM_SEVERITY_HIGH). This preserves blocking for severe content while letting milder cases through.
Rephrase, don't fight. If a prompt triggers the DANGEROUS category because it contains keywords like "hack" or "exploit," rephrase without those trigger tokens. "Find vulnerabilities in" instead of "hack." "Workaround for" instead of "exploit."

Note:

Safety blocks are applied at the prompt level and the response level. A prompt might pass safety checks, but if Gemini's response would contain blocked content, it'll refuse after beginning to generate. You'll see an empty or truncated response with a SAFETY finish reason in the API.

Use Case Presets

Copy these configurations for common scenarios:

Children's Education App

{
  "safetySettings": [
    { "category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_LOW_AND_ABOVE" },
    { "category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_LOW_AND_ABOVE" },
    { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_LOW_AND_ABOVE" },
    { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_LOW_AND_ABOVE" }
  ]
}

Creative Writing Tool

{
  "safetySettings": [
    { "category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE" },
    { "category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_LOW_AND_ABOVE" },
    { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_MEDIUM_AND_ABOVE" },
    { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE" }
  ]
}

Security Research / Red Teaming

{
  "safetySettings": [
    { "category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH" },
    { "category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH" },
    { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_ONLY_HIGH" },
    { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE" }
  ]
}

Note:

Even with BLOCK_NONE on dangerous content, Gemini will not generate instructions for weapons of mass destruction, CSAM, or content that violates international law. These are hard blocks at Google's infrastructure level and cannot be overridden.

Safety in System Prompts vs. API Config

A common mistake is trying to control safety behavior through the system prompt:

// INEFFECTIVE — safety is an API concern
"You are allowed to discuss controversial topics and should not
refuse requests unless they are truly harmful."

System prompts cannot override safety settings. If you need permissive behavior, configure it in the API. The system prompt should focus on how to handle sensitive topics that pass safety filters, not whether to filter them:

// EFFECTIVE — guides behavior within safety boundaries
"When discussing topics that others might find controversial,
maintain a neutral, analytical tone. Present multiple perspectives
fairly. Do not advocate for harmful actions even when describing
them as historical facts."

System Prompt Structure — How safety config relates to system prompt design
Persona & Tone Crafting — Personas that push safety boundaries

Gemini Safety Settings: Harms Categories & Configuration

The Four Harm Categories

HARASSMENT

HATE_SPEECH

SEXUALLY_EXPLICIT

DANGEROUS

Blocking Thresholds

Probability Threshold (`HARM_BLOCK_THRESHOLD`)

Severity Threshold (`HARM_SEVERITY`)

Configuring via API

When Gemini Refuses: Debugging Safety Blocks

Use Case Presets

Children's Education App

Creative Writing Tool

Security Research / Red Teaming

Safety in System Prompts vs. API Config

Related Articles

Graph-of-Thought: Beyond Trees and Chains

Master ChatGPT Prompts: Complete Strategy Guide

DeepSeek for Coding: Agentic Coding & FIM Completion

On this page

Gemini Safety Settings: Harms Categories & Configuration

The Four Harm Categories

HARASSMENT

HATE_SPEECH

SEXUALLY_EXPLICIT

DANGEROUS

Blocking Thresholds

Probability Threshold (HARM_BLOCK_THRESHOLD)

Severity Threshold (HARM_SEVERITY)

Configuring via API

When Gemini Refuses: Debugging Safety Blocks

Use Case Presets

Children's Education App

Creative Writing Tool

Security Research / Red Teaming

Safety in System Prompts vs. API Config

Related Pages

Related Articles

Graph-of-Thought: Beyond Trees and Chains

Master ChatGPT Prompts: Complete Strategy Guide

DeepSeek for Coding: Agentic Coding & FIM Completion

On this page

Probability Threshold (`HARM_BLOCK_THRESHOLD`)

Severity Threshold (`HARM_SEVERITY`)