AI/ML Model Scanning
The AI/ML Model Scanning feature lets you automatically probe and test AI/ML model endpoints for security vulnerabilities using techniques from the MITRE ATLAS framework. Instead of manually crafting adversarial inputs, RTF automates the process.
What It Does
The scanner sends a series of adversarial inputs to an AI/ML model API and evaluates how the model responds. It tests for things like:
- Prompt injection — can you override the model's instructions?
- Jailbreaking — can you bypass safety guidelines?
- Information extraction — can you get the model to reveal system prompts or training data?
- Harmful content generation — does the model refuse appropriately?
- Behavioral inconsistencies — does the model behave differently under adversarial conditions?
Before You Start
You need:
- API access to the AI/ML model you want to test
- An API key (if the target uses authentication)
- The model's endpoint URL
- An active RTF engagement (profile)
Only scan AI/ML systems you are authorized to test. Unauthorized scanning of AI systems is both unethical and potentially illegal.
Step 1 — Create a Scan Configuration
A Scan Configuration defines the target model and how the scan should run.
-
Go to AI/ML Scanning → Configurations
-
Click New Configuration
-
Fill in the settings:
Setting Required Description Name Yes Unique name for this configuration Target Model Yes The model to test (e.g., gpt-4,llama-3-8b)Target Model Type Yes OpenAI-compatible, Ollama, or Anthropic API Key If needed API key for the target model Controller Model Yes The AI model that generates attack prompts Iterations Yes How many attack attempts to run (1–100) Intensity Yes Low, Medium, or High attack intensity Security Level Yes How strictly the scanner evaluates responses Firewall Toggle Simulate WAF/content filtering during scan Custom Rules No Additional instructions for the attack controller -
Click Save Configuration
Step 2 — Configure HTTP Settings (Optional)
If the target model is behind a custom HTTP endpoint (not a standard API), you can configure:
- Custom headers
- Base URL overrides
- Authentication tokens
Go to AI/ML Scanning → HTTP Configurations to set this up.
Step 3 — Run a Scan
- Go to AI/ML Scanning → Scans
- Select your saved configuration
- Click Start Scan
- The scan runs automatically — each iteration sends attack prompts and records responses
You can monitor the scan in real time as results come in.
Understanding Scan Results
After the scan completes, you'll see:
| Result Field | Meaning |
|---|---|
| Iteration | Which attack attempt this was |
| Attack Prompt | What the controller sent to the target |
| Target Response | What the target model replied |
| Evaluation | Whether the attack succeeded or the model defended properly |
| Technique | Which ATLAS technique this attempt relates to |
Scan Intensity Levels
| Intensity | Meaning |
|---|---|
| Low | Gentle probing — basic prompt injections and jailbreak attempts |
| Medium | Moderate testing — multiple attack strategies |
| High | Aggressive testing — extensive adversarial inputs, repeated attempts |
Start with Low for initial scans on unknown targets. Use High only for comprehensive assessments where you have explicit authorization.
Using a Local Model for Testing
If you want to test an AI model running locally (not a cloud API), RTF integrates with Ollama. Set the model type to Ollama and provide the Ollama endpoint URL.
This is useful for:
- Testing open-source models (Llama, Mistral, etc.)
- Running scans without sending data to external APIs
- Development and research environments
Saving Results as Findings
After reviewing scan results, you can create Findings from significant discoveries:
- Identify a result where the model's defense failed
- Click Create Finding on that result
- The finding is pre-filled with the ATLAS technique and attack details
- Add your description and evidence
- Save — it appears in your Findings list and syncs to the ATLAS Navigator
Tips
- Start small — run 5–10 iterations first to understand how the target model behaves
- Vary intensity — low intensity finds obvious issues, high intensity finds edge cases
- Document everything — even failed attacks are useful evidence of the model's defenses
- Use the controller model wisely — the default controller is optimized for ATLAS attacks; you can override it with a different model if needed
Next Steps
- ATLAS Navigator → — see your scan findings reflected in the navigator
- Findings → — record and document what the scan revealed
- Analytics Dashboard → — see coverage across your AI/ML assessment