Gemini Safety Settings: Harms Categories & Configuration
Master Gemini's configurable safety filters. Learn how to balance harm prevention with creative freedom by tuning harm categories, probability thresholds, and severity levels.
Gemini's safety system is the most configurable of any major LLM — and the most frequently misunderstood. Unlike OpenAI's opaque content filters or Claude's constitutional approach, Gemini exposes four distinct harm categories with adjustable probability and severity thresholds. You can dial safety up for a children's education app or down for a content moderation research project.
But misconfiguring safety settings is the #1 cause of "Gemini refuses my prompt for no reason." Understanding what each category actually catches — and what it doesn't — is essential.
The Four Harm Categories
Gemini classifies unsafe content into four categories. Each is independently configurable.
HARASSMENT
Covers bullying, threats, slurs, identity-based attacks, and toxic commentary directed at individuals or groups. Also catches content that encourages harassment even if it doesn't directly perform it.
What it catches: Hate speech, personal attacks, cyberbullying prompts, doxxing instructions. What it doesn't: Rude but non-identity-based criticism, heated debate, disagreement.
HATE_SPEECH
A subset of harassment focused specifically on attacks against protected characteristics: race, ethnicity, religion, gender, sexual orientation, disability, and national origin. More aggressive than the harassment filter.
What it catches: Racial slurs, antisemitic content, transphobic content, ethnic stereotyping. What it doesn't: Legitimate discussion of hate speech as a topic, academic analysis of extremist content.
SEXUALLY_EXPLICIT
Covers pornographic content, sexual acts, fetish content, and sexual violence. Also catches prompts that solicit sexually explicit material even when framed as research.
What it catches: Pornographic text generation, erotic content, sexual violence depictions. What it doesn't: Medical/biological discussion of reproduction, relationship advice, discussions of sexuality in academic contexts.
DANGEROUS
Covers content that could cause physical harm: instructions for weapons, explosives, self-harm, illegal activities, medical advice that could be dangerous, and financial schemes.
What it catches: Bomb-making instructions, suicide methods, drug manufacturing, phishing tutorials. What it doesn't: General chemistry discussions, self-defense techniques, regulated financial advice.
Blocking Thresholds
Each category has two thresholds that control when content is blocked. Understanding the difference between them is critical.
Probability Threshold (HARM_BLOCK_THRESHOLD)
Controls how likely Gemini thinks harmful content is before blocking. Values:
| Level | Behavior |
|---|---|
BLOCK_NONE | Never block (use with extreme caution) |
BLOCK_ONLY_HIGH | Block only when harm is very likely |
BLOCK_MEDIUM_AND_ABOVE | Block medium and high probability (reasonable default) |
BLOCK_LOW_AND_ABOVE | Block everything above low probability (strict) |
HARM_BLOCK_THRESHOLD_UNSPECIFIED | Use Google's default per-category |
Note:
BLOCK_NONE disables filtering entirely for that category. This is appropriate for content moderation research and adversarial testing, but never for user-facing applications. Even with BLOCK_NONE, Gemini may still refuse to generate content that violates Google's Acceptable Use Policy at the infrastructure level.
Severity Threshold (HARM_SEVERITY)
Controls how severe the harmful content must be before blocking. This is distinct from probability — a response can be very likely to contain mildly harmful content, or unlikely to contain extremely harmful content.
| Level | Behavior |
|---|---|
HARM_SEVERITY_UNSPECIFIED | Use Google's default |
HARM_SEVERITY_NEGLIGIBLE | Block only when harm is non-negligible |
HARM_SEVERITY_LOW | Block even low-severity harmful content |
HARM_SEVERITY_MEDIUM | Block medium and high severity |
HARM_SEVERITY_HIGH | Block only high-severity content |
Configuring via API
{
"safetySettings": [
{
"category": "HARM_CATEGORY_HARASSMENT",
"threshold": "BLOCK_MEDIUM_AND_ABOVE"
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"threshold": "BLOCK_LOW_AND_ABOVE"
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"threshold": "BLOCK_MEDIUM_AND_ABOVE"
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"threshold": "BLOCK_ONLY_HIGH"
}
]
}
This configuration blocks hate speech aggressively (even low probability), treats harassment and explicit content at medium, and is permissive about dangerous content (blocking only high probability). This might be appropriate for a cybersecurity research tool — you want Gemini to discuss vulnerabilities but not generate harassment.
When Gemini Refuses: Debugging Safety Blocks
Gemini's refusal messages are deliberately vague ("I can't help with that request"). To debug which safety category is triggered:
-
Test each category in isolation. Set three categories to
BLOCK_NONEand one toBLOCK_LOW_AND_ABOVE. Rotate through all four. The one that blocks your prompt tells you the category. -
Lower the severity, not the probability. If a legitimate prompt is blocked, try raising the severity threshold first (
HARM_SEVERITY_HIGH). This preserves blocking for severe content while letting milder cases through. -
Rephrase, don't fight. If a prompt triggers the DANGEROUS category because it contains keywords like "hack" or "exploit," rephrase without those trigger tokens. "Find vulnerabilities in" instead of "hack." "Workaround for" instead of "exploit."
Note:
Safety blocks are applied at the prompt level and the response level. A prompt might pass safety checks, but if Gemini's response would contain blocked content, it'll refuse after beginning to generate. You'll see an empty or truncated response with a SAFETY finish reason in the API.
Use Case Presets
Copy these configurations for common scenarios:
Children's Education App
{
"safetySettings": [
{ "category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_LOW_AND_ABOVE" },
{ "category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_LOW_AND_ABOVE" },
{ "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_LOW_AND_ABOVE" },
{ "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_LOW_AND_ABOVE" }
]
}
Creative Writing Tool
{
"safetySettings": [
{ "category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE" },
{ "category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_LOW_AND_ABOVE" },
{ "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_MEDIUM_AND_ABOVE" },
{ "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE" }
]
}
Security Research / Red Teaming
{
"safetySettings": [
{ "category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH" },
{ "category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH" },
{ "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_ONLY_HIGH" },
{ "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE" }
]
}
Note:
Even with BLOCK_NONE on dangerous content, Gemini will not generate instructions for weapons of mass destruction, CSAM, or content that violates international law. These are hard blocks at Google's infrastructure level and cannot be overridden.
Safety in System Prompts vs. API Config
A common mistake is trying to control safety behavior through the system prompt:
// INEFFECTIVE — safety is an API concern
"You are allowed to discuss controversial topics and should not
refuse requests unless they are truly harmful."
System prompts cannot override safety settings. If you need permissive behavior, configure it in the API. The system prompt should focus on how to handle sensitive topics that pass safety filters, not whether to filter them:
// EFFECTIVE — guides behavior within safety boundaries
"When discussing topics that others might find controversial,
maintain a neutral, analytical tone. Present multiple perspectives
fairly. Do not advocate for harmful actions even when describing
them as historical facts."
Related Pages
- System Prompt Structure — How safety config relates to system prompt design
- Persona & Tone Crafting — Personas that push safety boundaries
Related Articles
Multi-Artifact Workflows: Building Apps Across Artifacts
Build applications that span multiple Claude Artifacts. Learn to orchestrate dependencies, maintain consistent design systems, share types across artifacts, and manage complex multi-file projects.
Portrait Photography SREF Codes
Professional portrait photography SREF codes for Midjourney including studio portraits, natural light, headshots, and character studies.
DeepSeek Tool Calls with Thinking: reasoning_content Management
Combine DeepSeek's tool calls with thinking mode. The mandatory reasoning_content passback in tool loops, strict mode for JSON schema enforcement, and error handling patterns to avoid 400 errors.