AI/ML Model Scanning

The AI/ML Model Scanning feature lets you automatically probe and test AI/ML model endpoints for security vulnerabilities using techniques from the MITRE ATLAS framework. Instead of manually crafting adversarial inputs, RTF automates the process.

What It Does

The scanner sends a series of adversarial inputs to an AI/ML model API and evaluates how the model responds. It tests for things like:

Prompt injection — can you override the model's instructions?
Jailbreaking — can you bypass safety guidelines?
Information extraction — can you get the model to reveal system prompts or training data?
Harmful content generation — does the model refuse appropriately?
Behavioral inconsistencies — does the model behave differently under adversarial conditions?

Before You Start

You need:

API access to the AI/ML model you want to test
An API key (if the target uses authentication)
The model's endpoint URL
An active RTF engagement (profile)

Authorization Required

Only scan AI/ML systems you are authorized to test. Unauthorized scanning of AI systems is both unethical and potentially illegal.

Step 1 — Create a Scan Configuration

A Scan Configuration defines the target model and how the scan should run.

Go to AI/ML Scanning → Configurations
Click New Configuration

Fill in the settings:

Setting	Required	Description
Name	Yes	Unique name for this configuration
Target Model	Yes	The model to test (e.g., `gpt-4`, `llama-3-8b`)
Target Model Type	Yes	OpenAI-compatible, Ollama, or Anthropic
API Key	If needed	API key for the target model
Controller Model	Yes	The AI model that generates attack prompts
Iterations	Yes	How many attack attempts to run (1–100)
Intensity	Yes	Low, Medium, or High attack intensity
Security Level	Yes	How strictly the scanner evaluates responses
Firewall	Toggle	Simulate WAF/content filtering during scan
Custom Rules	No	Additional instructions for the attack controller

Click Save Configuration

Step 2 — Configure HTTP Settings (Optional)

If the target model is behind a custom HTTP endpoint (not a standard API), you can configure:

Custom headers
Base URL overrides
Authentication tokens

Go to AI/ML Scanning → HTTP Configurations to set this up.

Step 3 — Run a Scan

Go to AI/ML Scanning → Scans
Select your saved configuration
Click Start Scan
The scan runs automatically — each iteration sends attack prompts and records responses

You can monitor the scan in real time as results come in.

Understanding Scan Results

After the scan completes, you'll see:

Result Field	Meaning
Iteration	Which attack attempt this was
Attack Prompt	What the controller sent to the target
Target Response	What the target model replied
Evaluation	Whether the attack succeeded or the model defended properly
Technique	Which ATLAS technique this attempt relates to

Scan Intensity Levels

Intensity	Meaning
Low	Gentle probing — basic prompt injections and jailbreak attempts
Medium	Moderate testing — multiple attack strategies
High	Aggressive testing — extensive adversarial inputs, repeated attempts

Start with Low for initial scans on unknown targets. Use High only for comprehensive assessments where you have explicit authorization.

Using a Local Model for Testing

If you want to test an AI model running locally (not a cloud API), RTF integrates with Ollama. Set the model type to Ollama and provide the Ollama endpoint URL.

This is useful for:

Testing open-source models (Llama, Mistral, etc.)
Running scans without sending data to external APIs
Development and research environments

Saving Results as Findings

After reviewing scan results, you can create Findings from significant discoveries:

Identify a result where the model's defense failed
Click Create Finding on that result
The finding is pre-filled with the ATLAS technique and attack details
Add your description and evidence
Save — it appears in your Findings list and syncs to the ATLAS Navigator

Tips

Start small — run 5–10 iterations first to understand how the target model behaves
Vary intensity — low intensity finds obvious issues, high intensity finds edge cases
Document everything — even failed attacks are useful evidence of the model's defenses
Use the controller model wisely — the default controller is optimized for ATLAS attacks; you can override it with a different model if needed

Next Steps

ATLAS Navigator → — see your scan findings reflected in the navigator
Findings → — record and document what the scan revealed
Analytics Dashboard → — see coverage across your AI/ML assessment

What It Does​

Before You Start​

Step 1 — Create a Scan Configuration​

Step 2 — Configure HTTP Settings (Optional)​

Step 3 — Run a Scan​

Understanding Scan Results​

Scan Intensity Levels​

Using a Local Model for Testing​

Saving Results as Findings​

Tips​

Next Steps​