Implement safety controls for your AI applications. Filter harmful content, validate outputs, and maintain control over model behavior.
- Content filtering: block harmful, inappropriate, or policy-violating content
- Output validation: ensure outputs match expected formats and constraints
- Topic control: keep conversations within approved domains
- PII protection: detect and redact personal information
- Bias mitigation: identify and mitigate biased outputs
- Confidence flagging: flag low-confidence responses for review
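The PII item above can be sketched with simple pattern matching. This is an illustrative sketch only: the `PII_PATTERNS` table and `redact_pii` helper are hypothetical names, and real systems should prefer a dedicated PII-detection service over hand-rolled regexes.

```python
import re

# Illustrative patterns only; they will miss many real-world PII formats.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_pii(text):
    """Replace detected PII with typed placeholders like [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text
```

Redaction can run on user input before it reaches the model, on model output before it reaches the user, or both.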
Screen user inputs before they reach the model. Block malicious prompts, jailbreak attempts, and policy violations.
Define clear boundaries and instructions. Specify what the model should and should not do.
Check model outputs before showing them to users. Filter harmful content, validate formats, and check for hallucinations.
Track patterns, flag anomalies, and alert on suspicious activity. Feed production data back into your defenses.
from mythicdot import MythicDot

client = MythicDot()

def safe_completion(user_input):
    # Step 1: Check input with moderation API
    moderation = client.moderations.create(input=user_input)
    if moderation.results[0].flagged:
        return "I can't help with that request."

    # Step 2: Generate response with system guardrails
    response = client.chat.completions.create(
        model="mythic-4",
        messages=[
            {
                "role": "system",
                "content": """You are a helpful assistant.

GUARDRAILS:
- Never provide harmful, illegal, or unethical advice
- Do not generate explicit or violent content
- Decline requests outside your knowledge domain
- If unsure, ask for clarification""",
            },
            {"role": "user", "content": user_input},
        ],
    )
    output = response.choices[0].message.content

    # Step 3: Check output with moderation
    output_mod = client.moderations.create(input=output)
    if output_mod.results[0].flagged:
        return "Response filtered. Please try again."

    return output
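The monitoring stage can be sketched as a thin logging wrapper around the pipeline above. The `record_event` helper, event names, and alert threshold are illustrative assumptions, not part of the SDK.

```python
import logging
from collections import Counter

logger = logging.getLogger("guardrails")
stats = Counter()  # running tally of guardrail events

ALERT_THRESHOLD = 10  # illustrative: warn after 10 flagged inputs

def record_event(event):
    """Count guardrail outcomes and surface suspicious activity patterns."""
    stats[event] += 1
    logger.info("guardrail event: %s (total %d)", event, stats[event])
    if event == "input_flagged" and stats[event] >= ALERT_THRESHOLD:
        logger.warning("possible abuse: %d flagged inputs", stats[event])
```

Calling `record_event("input_flagged")` or `record_event("output_flagged")` at each filtering branch gives you the production data this section recommends monitoring.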
Guardrails reduce risk but aren't foolproof. Combine technical controls with human review for high-stakes applications. Monitor production traffic and continuously improve your defenses.
Explore our safety tools and best practices.