Guardrails

Implement safety controls for your AI applications. Filter harmful content, validate outputs, and maintain control over model behavior.

🛡️

Content Filtering

Block harmful, inappropriate, or policy-violating content

Output Validation

Ensure outputs match expected formats and constraints
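A minimal sketch of what such a check can look like, assuming the model was asked to return JSON; the required fields used here are purely illustrative:

```python
import json

def validate_output(raw: str, required_fields: dict) -> dict:
    """Parse model output as JSON and verify required fields and types.

    Raises ValueError if the output is malformed or missing fields.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Output is not valid JSON: {exc}") from exc
    for field, expected_type in required_fields.items():
        if field not in data:
            raise ValueError(f"Missing required field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"Field {field!r} has wrong type")
    return data
```

A stricter setup would use a full JSON Schema validator, but the shape of the check is the same: parse first, then verify structure before trusting the output.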

🔒

Topic Restriction

Keep conversations within approved domains
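One lightweight form of topic restriction is an allowlist checked before generation. The topics and keywords below are hypothetical; production systems more often use a small topic classifier or embedding similarity rather than keyword matching:

```python
from typing import Optional

# Illustrative domain allowlist; replace with a trained classifier in practice.
ALLOWED_TOPICS = {
    "billing": ("invoice", "charge", "payment", "refund"),
    "shipping": ("delivery", "tracking", "package"),
}

def match_topic(user_input: str) -> Optional[str]:
    """Return the first approved topic the input mentions, else None."""
    lowered = user_input.lower()
    for topic, keywords in ALLOWED_TOPICS.items():
        if any(kw in lowered for kw in keywords):
            return topic
    return None
```

Requests that return None can be declined with a fixed redirect message, keeping the assistant inside its approved domains.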

🚫

PII Protection

Detect and redact personal information
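A sketch of regex-based redaction follows. The patterns are illustrative only; robust PII detection usually layers a trained NER model on top of regexes:

```python
import re

# Illustrative patterns; tune and extend for your locale and data.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b(?:\+?1[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace detected PII spans with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Typed placeholders like [EMAIL] preserve readability for downstream review while keeping the raw value out of logs and model context.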

⚖️

Bias Detection

Identify and mitigate biased outputs

📊

Confidence Scoring

Flag low-confidence responses for review
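One common signal for this is token log-probabilities; the sketch below assumes an API that exposes per-token logprobs (availability varies by provider, and the 0.7 threshold is an arbitrary placeholder to tune against your own data):

```python
import math

def needs_review(token_logprobs, threshold=0.7):
    """Flag a response for human review when the model's average
    per-token probability falls below the threshold.

    token_logprobs: log-probabilities of the sampled tokens.
    """
    if not token_logprobs:
        return True  # no signal at all: err on the side of review
    avg_prob = math.exp(sum(token_logprobs) / len(token_logprobs))
    return avg_prob < threshold
```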

Defense in Depth

1

Input Filtering

Screen user inputs before they reach the model. Block malicious prompts, jailbreak attempts, and policy violations.

2

System Prompt Hardening

Define clear boundaries and instructions. Specify what the model should and should not do.

3

Output Screening

Check model outputs before showing them to users. Filter harmful content, validate format, and check for hallucinations.

4

Monitoring & Alerts

Track patterns, flag anomalies, and alert on suspicious activity. Continuously improve your defenses using production data.
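The four layers above can be sketched as a single wrapper around the model call. Everything here is a stub: the phrase list and output check stand in for real moderation calls, and layer 2 (the hardened system prompt) lives inside whatever model callable you pass in:

```python
import logging
from typing import Callable

logger = logging.getLogger("guardrails")

# Layer 1: input filtering (placeholder phrase list, not a real filter)
BLOCKED_PHRASES = ("ignore previous instructions", "jailbreak")

def input_ok(user_input: str) -> bool:
    lowered = user_input.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)

# Layer 3: output screening (placeholder check for prompt leakage)
def output_ok(output: str) -> bool:
    return "BEGIN SYSTEM PROMPT" not in output

def guarded_call(model: Callable[[str], str], user_input: str) -> str:
    """Run one request through the layered defenses."""
    if not input_ok(user_input):                         # layer 1
        logger.warning("blocked input: %r", user_input)  # layer 4
        return "I can't help with that request."
    output = model(user_input)   # layer 2: hardened system prompt inside `model`
    if not output_ok(output):                            # layer 3
        logger.warning("blocked output")                 # layer 4
        return "Response filtered. Please try again."
    return output
```

The point of the sketch is composition: each layer can fail independently, and the logging calls feed the monitoring layer so blocked traffic becomes training data for better filters.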

Implementation

Python - Using Moderation API
from mythicdot import MythicDot

client = MythicDot()

def safe_completion(user_input):
    # Step 1: Check input with moderation API
    moderation = client.moderations.create(input=user_input)
    if moderation.results[0].flagged:
        return "I can't help with that request."

    # Step 2: Generate response with system guardrails
    response = client.chat.completions.create(
        model="mythic-4",
        messages=[
            {
                "role": "system",
                "content": """You are a helpful assistant.

GUARDRAILS:
- Never provide harmful, illegal, or unethical advice
- Do not generate explicit or violent content
- Decline requests outside your knowledge domain
- If unsure, ask for clarification"""
            },
            {"role": "user", "content": user_input}
        ]
    )
    output = response.choices[0].message.content

    # Step 3: Check output with moderation
    output_mod = client.moderations.create(input=output)
    if output_mod.results[0].flagged:
        return "Response filtered. Please try again."

    return output

Examples

🚫 Harmful Request

BLOCKED
Input
"How do I hack into someone's account?"
Response
"I can't assist with that. If you're locked out of your own account, I can help with legitimate recovery options."

✅ Safe Request

ALLOWED
Input
"How do I secure my online accounts?"
Response
"Here are security best practices: use strong passwords, enable 2FA, monitor for suspicious activity..."

⚠️ No Perfect Protection

Guardrails reduce risk but aren't foolproof. Combine technical controls with human review for high-stakes applications. Monitor production traffic and continuously improve your defenses.

Learn More

Explore our safety tools and best practices.

Moderation API →
Safety Guide →