Managing Guardrails in Airia
Welcome to the guide to Managing Guardrails within your Airia platform. Airia Guardrails provides configurable safeguards to help safely build generative AI applications, agnostic of foundation models, at scale. The Airia platform allows you to implement these safeguards and customize them to your use cases and responsible AI policies. You can create multiple Guardrails tailored to different use cases, apply them to all agents in your organization, all agents in a project, or even granularly to specific agents. Guardrail filters for Security, Responsible AI, or Data Leakage prevention can be targeted on the input and output of models, agents, or data retreival using RAG techniques.
Here's how you can create and update your Guardrails to moderate model responses, prevent secrets and sensitive data Leakage, and block prompt injection or jailbreak attempts. We place particular focus on enabling you to leverage AI quickly, responsibily and securely. This guide explains how to set these features up in your platform and start tracking possible violations.
Applying Guardrails
- Log into your platform and navigate to the Guardrails heading on the left-hand menu. You will see a list of all the Guardrails in your tenant.
- Click "Create Guardrail"
- Add a name and description for your Guardrail before proceeding to select your desired filters. A Guardrail can contain one or multiple filters.
- Click "Target and Scope" to select what traffic and which agents the Guardrails should be applied to.
- Guardrail filters can target the model input, output, or both inside an agent, to AI gateway traffic, and to data chunks sent to a model.
- Select the scope to decide whether the Guardrail should be applied to all agents built in the Airia tenant, all agents in a project, or specific agents.
- Click "save draft" to save your progress or "Apply Guardrail" to activate.
Note: Guardrails can also be created from the Agent builder canvas.
Data Leakage Prevention Filters
Airia's DLP filters allow out of the box detection of 100+ sensitive data attributes such as credit card numbers and Social Security Numbers, in addition to allowing the creation of custom detectors. Organizations can define the default guardrail behavior and opt to block the request or response containing a detected attribute, redact and replace sensitive attributes to prevent sending those attributes to a model hosted by a third party, or simply audit and record possible violations.
Responsible AI Filters
Airia's keyword detection and content moderation filters allow organizations to detect and block references to competitors, inappropriate requests, and inappropriate model responses based on configurable threshholds. The Airia platform allows choice of moderation models between an Airia hosted RoBERTa model for text moderation and OpenAI's Omni moderation endpoint for text and image moderation.
Security Filters
Airia's Security filters enable enterprises to detect and block direct and indirect prompt injection and jailbreak attempts out of the box using fine tuned industry leading maodels like Llama's prompt guard.
Conflict resolution
When multiple Guardrails apply to an entity, conflicts may arise in guardrail action type, action message, and confidence score. Airia calculates a resulting set of Guardrails and prioritizes its behavior in this order of priority: Block (Highest priority) > Redact > Audit
Tracking Violations
-
Log onto your platform and navigate to the “Feeds” heading on the left-hand menu. You will see a list of all the Feeds Airia provides but we will focus on Security Violations and DLP Violations at the bottom of the list.
-
Click “Security Violations” or “DLP Violations” to access the list view of all the recorded violations.
-
You can use the filters across the top of the Feed to narrow the view to specific data you are looking for or keep the whole list.
-
Exporting this data can be done by clicking the “Export” button in the top right of the Feed. This will download a CSV file of your records.