Skip to main content

AI/ML Model Scanning

The AI/ML Model Scanning feature lets you automatically probe and test AI/ML model endpoints for security vulnerabilities using techniques from the MITRE ATLAS framework. Instead of manually crafting adversarial inputs, RTF automates the process.


What It Does

The scanner sends a series of adversarial inputs to an AI/ML model API and evaluates how the model responds. It tests for things like:

  • Prompt injection — can you override the model's instructions?
  • Jailbreaking — can you bypass safety guidelines?
  • Information extraction — can you get the model to reveal system prompts or training data?
  • Harmful content generation — does the model refuse appropriately?
  • Behavioral inconsistencies — does the model behave differently under adversarial conditions?

Before You Start

You need:

  1. API access to the AI/ML model you want to test
  2. An API key (if the target uses authentication)
  3. The model's endpoint URL
  4. An active RTF engagement (profile)
Authorization Required

Only scan AI/ML systems you are authorized to test. Unauthorized scanning of AI systems is both unethical and potentially illegal.


Step 1 — Create a Scan Configuration

A Scan Configuration defines the target model and how the scan should run.

  1. Go to AI/ML Scanning → Configurations

  2. Click New Configuration

  3. Fill in the settings:

    SettingRequiredDescription
    NameYesUnique name for this configuration
    Target ModelYesThe model to test (e.g., gpt-4, llama-3-8b)
    Target Model TypeYesOpenAI-compatible, Ollama, or Anthropic
    API KeyIf neededAPI key for the target model
    Controller ModelYesThe AI model that generates attack prompts
    IterationsYesHow many attack attempts to run (1–100)
    IntensityYesLow, Medium, or High attack intensity
    Security LevelYesHow strictly the scanner evaluates responses
    FirewallToggleSimulate WAF/content filtering during scan
    Custom RulesNoAdditional instructions for the attack controller
  4. Click Save Configuration


Step 2 — Configure HTTP Settings (Optional)

If the target model is behind a custom HTTP endpoint (not a standard API), you can configure:

  • Custom headers
  • Base URL overrides
  • Authentication tokens

Go to AI/ML Scanning → HTTP Configurations to set this up.


Step 3 — Run a Scan

  1. Go to AI/ML Scanning → Scans
  2. Select your saved configuration
  3. Click Start Scan
  4. The scan runs automatically — each iteration sends attack prompts and records responses

You can monitor the scan in real time as results come in.


Understanding Scan Results

After the scan completes, you'll see:

Result FieldMeaning
IterationWhich attack attempt this was
Attack PromptWhat the controller sent to the target
Target ResponseWhat the target model replied
EvaluationWhether the attack succeeded or the model defended properly
TechniqueWhich ATLAS technique this attempt relates to

Scan Intensity Levels

IntensityMeaning
LowGentle probing — basic prompt injections and jailbreak attempts
MediumModerate testing — multiple attack strategies
HighAggressive testing — extensive adversarial inputs, repeated attempts

Start with Low for initial scans on unknown targets. Use High only for comprehensive assessments where you have explicit authorization.


Using a Local Model for Testing

If you want to test an AI model running locally (not a cloud API), RTF integrates with Ollama. Set the model type to Ollama and provide the Ollama endpoint URL.

This is useful for:

  • Testing open-source models (Llama, Mistral, etc.)
  • Running scans without sending data to external APIs
  • Development and research environments

Saving Results as Findings

After reviewing scan results, you can create Findings from significant discoveries:

  1. Identify a result where the model's defense failed
  2. Click Create Finding on that result
  3. The finding is pre-filled with the ATLAS technique and attack details
  4. Add your description and evidence
  5. Save — it appears in your Findings list and syncs to the ATLAS Navigator

Tips

  • Start small — run 5–10 iterations first to understand how the target model behaves
  • Vary intensity — low intensity finds obvious issues, high intensity finds edge cases
  • Document everything — even failed attacks are useful evidence of the model's defenses
  • Use the controller model wisely — the default controller is optimized for ATLAS attacks; you can override it with a different model if needed

Next Steps