Configuring Generative AI Safety: Content Filters on Vertex AI
This article provides an overview of the safety and content filters available in the Gemini API within Vertex AI. It explains how to configure these filters to block harmful responses, details the types of unsafe prompts and responses, and offers best practices for using safety filters effectively.
• Main points
  1. Comprehensive coverage of safety filter configurations
  2. Clear explanations of unsafe prompts and responses
  3. Practical examples of API usage for content filtering
• Unique insights
  1. Detailed breakdown of harm categories and their definitions
  2. Insights into the balance between safety and content generation
• Practical applications
  The article provides actionable guidance on configuring content filters, making it highly valuable for developers looking to implement safety measures in their applications.
• Key topics
  1. Safety filters in AI
  2. Configurable content filters
  3. Harm categories and their implications
• Key insights
  1. In-depth exploration of safety measures in generative AI
  2. Practical API examples for real-world implementation
  3. Guidance on balancing safety and content generation
• Learning outcomes
  1. Understand the importance of safety filters in AI applications
  2. Learn how to configure content filters using the Gemini API
  3. Gain insights into best practices for managing harmful content
Introduction to Safety and Content Filters in Generative AI
Generative AI models such as Gemini on Vertex AI are designed with safety in mind but can still produce harmful responses. Content filters block potentially harmful outputs according to configurable blocking thresholds; they act as a barrier around the model but do not directly influence its behavior. To steer the model's output itself, system instructions for safety are recommended instead. This article provides a comprehensive guide to understanding and configuring these filters for optimal safety and responsible AI practices.
Understanding Unsafe Prompts and Responses
The Gemini API on Vertex AI can reject prompts for various reasons, indicated by enum codes such as `PROHIBITED_CONTENT` (usually CSAM), `BLOCKED_REASON_UNSPECIFIED`, and `OTHER`. When a prompt is blocked, the API returns feedback containing a `blockReason`. Unsafe responses are detected and blocked by non-configurable safety filters (CSAM, PII), configurable content filters (harm categories), and citation filters. The API uses enum codes like `SAFETY`, `RECITATION`, `SPII`, and `PROHIBITED_CONTENT` to explain why token generation stopped. If a filter blocks a response, the `Candidate.content` field is returned empty, and no further detail about the blocked content is provided.
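The checks above can be sketched as a small helper. This is a hedged illustration, not SDK code: the dict shape mirrors the REST response fields named in the article (`promptFeedback.blockReason`, `candidates[].finishReason`), and the helper name `explain_block` is our own.

```python
def explain_block(response: dict) -> str:
    """Summarize why a prompt or response was blocked, if it was."""
    feedback = response.get("promptFeedback", {})
    if "blockReason" in feedback:
        # The prompt itself was rejected before generation.
        return f"Prompt blocked: {feedback['blockReason']}"
    for candidate in response.get("candidates", []):
        reason = candidate.get("finishReason")
        if reason in {"SAFETY", "RECITATION", "SPII", "PROHIBITED_CONTENT"}:
            # A blocked response arrives with empty candidate content.
            return f"Response blocked: {reason}"
    return "not blocked"

# Mocked blocked-prompt response for illustration:
mock = {"promptFeedback": {"blockReason": "PROHIBITED_CONTENT"}}
print(explain_block(mock))  # Prompt blocked: PROHIBITED_CONTENT
```

In a real application the same logic would run on the parsed JSON of a `generateContent` response.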
Configurable Content Filters: Harm Categories and Scoring
Configurable content filters assess content against a list of harms, assigning probability and severity scores for each harm category. Harm categories include Hate Speech, Harassment, Sexually Explicit content, and Dangerous Content. Probability scores reflect the likelihood of harm, discretized into NEGLIGIBLE, LOW, MEDIUM, and HIGH levels. Severity scores reflect the magnitude of potential harm, also discretized into four levels. Content can have varying combinations of probability and severity scores, requiring careful configuration of filters.
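To make the scoring concrete, here is a hedged sketch of reading per-category safety ratings. The field names (`probability`, `probabilityScore`, `severity`, `severityScore`) follow the Vertex AI REST response; the sample values and the `flag_ratings` helper are our own illustration.

```python
# Mocked safety ratings, as they might appear on a response candidate:
mock_ratings = [
    {"category": "HARM_CATEGORY_HATE_SPEECH",
     "probability": "NEGLIGIBLE", "probabilityScore": 0.02,
     "severity": "HARM_SEVERITY_NEGLIGIBLE", "severityScore": 0.01},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
     "probability": "LOW", "probabilityScore": 0.27,
     "severity": "HARM_SEVERITY_MEDIUM", "severityScore": 0.42},
]

def flag_ratings(ratings, min_level="MEDIUM"):
    """Return categories whose probability OR severity reaches min_level."""
    levels = ["NEGLIGIBLE", "LOW", "MEDIUM", "HIGH"]
    cutoff = levels.index(min_level)
    flagged = []
    for r in ratings:
        prob = levels.index(r["probability"])
        sev = levels.index(r["severity"].replace("HARM_SEVERITY_", ""))
        if prob >= cutoff or sev >= cutoff:
            flagged.append(r["category"])
    return flagged

print(flag_ratings(mock_ratings))  # ['HARM_CATEGORY_DANGEROUS_CONTENT']
```

Note how the second rating has LOW probability but MEDIUM severity, which is exactly the kind of mixed combination the article says filters must be configured around.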
Configuring Content Filters via Gemini API and Google Cloud Console
Content filters can be configured using the Gemini API in Vertex AI or the Google Cloud console. The Gemini API offers fine-grained control with `SEVERITY` and `PROBABILITY` methods and multiple threshold levels like `BLOCK_LOW_AND_ABOVE`, `BLOCK_MEDIUM_AND_ABOVE`, `BLOCK_ONLY_HIGH`, `HARM_BLOCK_THRESHOLD_UNSPECIFIED`, `OFF`, and `BLOCK_NONE`. The Google Cloud console provides a simpler UI-based approach with predefined threshold levels: Off, Block few, Block some, and Block most, using only probability scores. Examples in Python, Node.js, Java, Go, C#, and REST are available for Gemini API configuration.
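A minimal sketch of what the `safetySettings` portion of a `generateContent` request body might look like, built as a plain Python dict. The category, threshold, and method enum names follow those listed above; the prompt text and exact body layout are illustrative.

```python
import json

body = {
    "contents": [{"role": "user", "parts": [{"text": "Write a campfire story."}]}],
    "safetySettings": [
        {"category": "HARM_CATEGORY_HATE_SPEECH",
         "threshold": "BLOCK_MEDIUM_AND_ABOVE",
         "method": "SEVERITY"},          # block based on the severity score
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
         "threshold": "BLOCK_ONLY_HIGH",
         "method": "PROBABILITY"},       # block based on the probability score
    ],
}

# This JSON would be POSTed to the model's generateContent endpoint.
print(json.dumps(body, indent=2))
```

Each category can be tuned independently, which is the fine-grained control the API offers over the console's single Off/few/some/most slider.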
Citation and Civic Integrity Filters
The citation filter in Vertex AI's generative AI features cites sources when the model quotes extensively from a web page, supporting originality and compliance with license requirements. The civic integrity filter, currently in preview, detects and blocks prompts related to political elections and candidates. It is disabled by default and can be enabled by setting the blocking threshold for `CIVIC_INTEGRITY` to `BLOCK_LOW_AND_ABOVE`, `BLOCK_MEDIUM_AND_ABOVE`, or `BLOCK_ONLY_HIGH`.
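Enabling the civic integrity filter would then amount to one extra entry in `safetySettings`. A hedged sketch follows; the full enum name `HARM_CATEGORY_CIVIC_INTEGRITY` is an assumption based on the `CIVIC_INTEGRITY` category named above, so verify it against the current API reference.

```python
# Assumed enum name for the preview civic integrity category:
civic_setting = {
    "category": "HARM_CATEGORY_CIVIC_INTEGRITY",
    "threshold": "BLOCK_MEDIUM_AND_ABOVE",  # any of the three thresholds enables it
}
print(civic_setting)
```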
Best Practices for Using Content Filters
While content filters are essential for preventing unsafe content, they may occasionally block benign content or miss harmful content. Testing different filter settings is crucial to find the right balance between safety and allowing appropriate content. Advanced models like Gemini 2.5 Flash are designed to generate safe responses even without filters, emphasizing the importance of continuous monitoring and adjustment of safety settings.
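One cheap way to compare settings before changing production config is to simulate, for a given discretized score level, which thresholds would block it. The threshold names are from the Gemini API; the helper itself is our own offline simulation, not an SDK call.

```python
LEVELS = ["NEGLIGIBLE", "LOW", "MEDIUM", "HIGH"]
CUTOFFS = {
    "BLOCK_LOW_AND_ABOVE": "LOW",
    "BLOCK_MEDIUM_AND_ABOVE": "MEDIUM",
    "BLOCK_ONLY_HIGH": "HIGH",
}

def would_block(level: str, threshold: str) -> bool:
    """Would a score at `level` trip `threshold`? (OFF/BLOCK_NONE never block.)"""
    if threshold in ("OFF", "BLOCK_NONE"):
        return False
    return LEVELS.index(level) >= LEVELS.index(CUTOFFS[threshold])

# A MEDIUM-level score trips the two stricter thresholds but not BLOCK_ONLY_HIGH:
for t in CUTOFFS:
    print(t, would_block("MEDIUM", t))
```

Running sample prompts through the model and feeding their rating levels into a table like this makes the safety/permissiveness trade-off visible at a glance.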
Examples of Content Filter Configuration
The article provides examples of how to configure content filters using the Gemini API in Vertex AI, including Python and REST examples. These examples demonstrate how to set thresholds for different harm categories, such as sexually explicit content, hate speech, harassment, and dangerous content. The REST example shows how to send a request to the publisher model endpoint with specific safety settings.
Conclusion
Configuring safety and content filters in Generative AI models like Gemini on Vertex AI is crucial for responsible AI development. By understanding unsafe prompts and responses, utilizing configurable content filters, and following best practices, developers can create safer and more reliable AI applications. Regular monitoring and adjustments are essential to maintain an optimal balance between safety and functionality.