Configuring Generative AI Safety: Content Filters on Vertex AI
This article provides an overview of the safety and content filters available in the Gemini API within Vertex AI. It explains how to configure these filters to block harmful responses, details the types of unsafe prompts and responses, and offers best practices for using safety filters effectively.
• Main points
  1. Comprehensive coverage of safety filter configurations
  2. Clear explanations of unsafe prompts and responses
  3. Practical examples of API usage for content filtering
• Unique insights
  1. Detailed breakdown of harm categories and their definitions
  2. Insights into the balance between safety and content generation
• Practical applications
  The article provides actionable guidance on configuring content filters, making it highly valuable for developers looking to implement safety measures in their applications.
• Key topics
  1. Safety filters in AI
  2. Configurable content filters
  3. Harm categories and their implications
• Key insights
  1. In-depth exploration of safety measures in generative AI
  2. Practical API examples for real-world implementation
  3. Guidance on balancing safety and content generation
• Learning outcomes
  1. Understand the importance of safety filters in AI applications
  2. Learn how to configure content filters using the Gemini API
  3. Gain insights into best practices for managing harmful content
Introduction to Safety and Content Filters in Generative AI
Generative AI models such as Gemini on Vertex AI are designed with safety in mind but can still produce harmful responses. Content filters block potentially harmful outputs according to configurable blocking thresholds; they act as a barrier around the model but do not directly influence its behavior. To steer the model's output itself, system instructions for safety are recommended instead. This article provides a comprehensive guide to understanding and configuring these filters for optimal safety and responsible AI practices.
Understanding Unsafe Prompts and Responses
The Gemini API on Vertex AI can reject prompts for various reasons, indicated by enum codes such as `PROHIBITED_CONTENT` (usually CSAM), `BLOCKED_REASON_UNSPECIFIED`, and `OTHER`. When a prompt is blocked, the API returns feedback containing a `blockReason`. Unsafe responses are detected and blocked by non-configurable safety filters (CSAM, PII), configurable content filters (harm categories), and citation filters. The API uses enum codes like `SAFETY`, `RECITATION`, `SPII`, and `PROHIBITED_CONTENT` to explain why token generation stopped. If a filter blocks a response, the `Candidate.content` field is returned empty, and no further detail about the blocked content is provided.
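The checks above can be sketched as a small helper. This is a hedged illustration, not SDK code: the dict shape mirrors the REST response fields named in the article (`promptFeedback.blockReason`, `candidates[].finishReason`), and the helper name `explain_block` is our own.

```python
def explain_block(response: dict) -> str:
    """Summarize why a prompt or response was blocked, if it was."""
    feedback = response.get("promptFeedback", {})
    if "blockReason" in feedback:
        # The prompt itself was rejected before generation.
        return f"Prompt blocked: {feedback['blockReason']}"
    for candidate in response.get("candidates", []):
        reason = candidate.get("finishReason")
        if reason in {"SAFETY", "RECITATION", "SPII", "PROHIBITED_CONTENT"}:
            # A blocked response arrives with empty candidate content.
            return f"Response blocked: {reason}"
    return "not blocked"

# Mocked blocked-prompt response for illustration:
mock = {"promptFeedback": {"blockReason": "PROHIBITED_CONTENT"}}
print(explain_block(mock))  # Prompt blocked: PROHIBITED_CONTENT
```

In a real application the same logic would run on the parsed JSON of a `generateContent` response.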
Configurable Content Filters: Harm Categories and Scoring
Configurable content filters assess content against a list of harms, assigning probability and severity scores for each harm category. Harm categories include Hate Speech, Harassment, Sexually Explicit content, and Dangerous Content. Probability scores reflect the likelihood of harm, discretized into NEGLIGIBLE, LOW, MEDIUM, and HIGH levels. Severity scores reflect the magnitude of potential harm, also discretized into four levels. Content can have varying combinations of probability and severity scores, requiring careful configuration of filters.
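To make the scoring concrete, here is a hedged sketch of reading per-category safety ratings. The field names (`probability`, `probabilityScore`, `severity`, `severityScore`) follow the Vertex AI REST response; the sample values and the `flag_ratings` helper are our own illustration.

```python
# Mocked safety ratings, as they might appear on a response candidate:
mock_ratings = [
    {"category": "HARM_CATEGORY_HATE_SPEECH",
     "probability": "NEGLIGIBLE", "probabilityScore": 0.02,
     "severity": "HARM_SEVERITY_NEGLIGIBLE", "severityScore": 0.01},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
     "probability": "LOW", "probabilityScore": 0.27,
     "severity": "HARM_SEVERITY_MEDIUM", "severityScore": 0.42},
]

def flag_ratings(ratings, min_level="MEDIUM"):
    """Return categories whose probability OR severity reaches min_level."""
    levels = ["NEGLIGIBLE", "LOW", "MEDIUM", "HIGH"]
    cutoff = levels.index(min_level)
    flagged = []
    for r in ratings:
        prob = levels.index(r["probability"])
        sev = levels.index(r["severity"].replace("HARM_SEVERITY_", ""))
        if prob >= cutoff or sev >= cutoff:
            flagged.append(r["category"])
    return flagged

print(flag_ratings(mock_ratings))  # ['HARM_CATEGORY_DANGEROUS_CONTENT']
```

Note how the second rating has LOW probability but MEDIUM severity, which is exactly the kind of mixed combination the article says filters must be configured around.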
Configuring Content Filters via Gemini API and Google Cloud Console
Content filters can be configured using the Gemini API in Vertex AI or the Google Cloud console. The Gemini API offers fine-grained control with `SEVERITY` and `PROBABILITY` methods and multiple threshold levels like `BLOCK_LOW_AND_ABOVE`, `BLOCK_MEDIUM_AND_ABOVE`, `BLOCK_ONLY_HIGH`, `HARM_BLOCK_THRESHOLD_UNSPECIFIED`, `OFF`, and `BLOCK_NONE`. The Google Cloud console provides a simpler UI-based approach with predefined threshold levels: Off, Block few, Block some, and Block most, using only probability scores. Examples in Python, Node.js, Java, Go, C#, and REST are available for Gemini API configuration.
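A minimal sketch of what the `safetySettings` portion of a `generateContent` request body might look like, built as a plain Python dict. The category, threshold, and method enum names follow those listed above; the prompt text and exact body layout are illustrative.

```python
import json

body = {
    "contents": [{"role": "user", "parts": [{"text": "Write a campfire story."}]}],
    "safetySettings": [
        {"category": "HARM_CATEGORY_HATE_SPEECH",
         "threshold": "BLOCK_MEDIUM_AND_ABOVE",
         "method": "SEVERITY"},          # block based on the severity score
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
         "threshold": "BLOCK_ONLY_HIGH",
         "method": "PROBABILITY"},       # block based on the probability score
    ],
}

# This JSON would be POSTed to the model's generateContent endpoint.
print(json.dumps(body, indent=2))
```

Each category can be tuned independently, which is the fine-grained control the API offers over the console's single Off/few/some/most slider.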
Citation and Civic Integrity Filters
The citation filter in Vertex AI's generative AI features cites sources when the model quotes extensively from a web page, supporting originality and compliance with license requirements. The civic integrity filter, currently in preview, detects and blocks prompts related to political elections and candidates. It is disabled by default and can be enabled by setting the blocking threshold for `CIVIC_INTEGRITY` to `BLOCK_LOW_AND_ABOVE`, `BLOCK_MEDIUM_AND_ABOVE`, or `BLOCK_ONLY_HIGH`.
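Enabling the civic integrity filter would then amount to one extra entry in `safetySettings`. A hedged sketch follows; the full enum name `HARM_CATEGORY_CIVIC_INTEGRITY` is an assumption based on the `CIVIC_INTEGRITY` category named above, so verify it against the current API reference.

```python
# Assumed enum name for the preview civic integrity category:
civic_setting = {
    "category": "HARM_CATEGORY_CIVIC_INTEGRITY",
    "threshold": "BLOCK_MEDIUM_AND_ABOVE",  # any of the three thresholds enables it
}
print(civic_setting)
```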
Best Practices for Using Content Filters
While content filters are essential for preventing unsafe content, they may occasionally block benign content or miss harmful content. Testing different filter settings is crucial to find the right balance between safety and allowing appropriate content. Advanced models like Gemini 2.5 Flash are designed to generate safe responses even without filters, emphasizing the importance of continuous monitoring and adjustment of safety settings.
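One cheap way to compare settings before changing production config is to simulate, for a given discretized score level, which thresholds would block it. The threshold names are from the Gemini API; the helper itself is our own offline simulation, not an SDK call.

```python
LEVELS = ["NEGLIGIBLE", "LOW", "MEDIUM", "HIGH"]
CUTOFFS = {
    "BLOCK_LOW_AND_ABOVE": "LOW",
    "BLOCK_MEDIUM_AND_ABOVE": "MEDIUM",
    "BLOCK_ONLY_HIGH": "HIGH",
}

def would_block(level: str, threshold: str) -> bool:
    """Would a score at `level` trip `threshold`? (OFF/BLOCK_NONE never block.)"""
    if threshold in ("OFF", "BLOCK_NONE"):
        return False
    return LEVELS.index(level) >= LEVELS.index(CUTOFFS[threshold])

# A MEDIUM-level score trips the two stricter thresholds but not BLOCK_ONLY_HIGH:
for t in CUTOFFS:
    print(t, would_block("MEDIUM", t))
```

Running sample prompts through the model and feeding their rating levels into a table like this makes the safety/permissiveness trade-off visible at a glance.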
Examples of Content Filter Configuration
The article provides examples of how to configure content filters using the Gemini API in Vertex AI, including Python and REST examples. These examples demonstrate how to set thresholds for different harm categories, such as sexually explicit content, hate speech, harassment, and dangerous content. The REST example shows how to send a request to the publisher model endpoint with specific safety settings.
Conclusion
Configuring safety and content filters in Generative AI models like Gemini on Vertex AI is crucial for responsible AI development. By understanding unsafe prompts and responses, utilizing configurable content filters, and following best practices, developers can create safer and more reliable AI applications. Regular monitoring and adjustments are essential to maintain an optimal balance between safety and functionality.