Testing Methodology

How we validate that Vigile catches real vulnerabilities without false alarms.

Security tools have two failure modes: missing real vulnerabilities (false negatives) and flagging safe configurations (false positives). Both erode trust. We test for both using a five-layer testing pyramid adapted from established security testing standards.

Security Tool Testing Pyramid

L5: Adversarial Testing (Status: Planned)

Can an attacker evade detection? We test obfuscation, encoding tricks, and novel attack patterns that attempt to bypass the scanner.

  • Multi-layer obfuscation and encoding bypass attempts
  • Novel exfiltration channels not covered by existing patterns
  • Context-dependent payloads that change behavior based on parser
  • Time-delayed activation designed to evade initial scans
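One way to probe encoding bypasses is to re-encode a known-bad payload several ways and record which variants a detector still catches. The sketch below is illustrative only: `naive_detector` is a hypothetical stand-in for a real detection pattern, and the set of encodings is an assumption, not Vigile's actual test matrix.

```python
import base64
import urllib.parse
from typing import Callable


def encoded_variants(payload: str) -> dict[str, str]:
    """Return the payload re-encoded in a few common ways."""
    raw = payload.encode("utf-8")
    return {
        "plain": payload,
        "base64": base64.b64encode(raw).decode("ascii"),
        "hex": raw.hex(),
        "url": urllib.parse.quote(payload, safe=""),
        "reversed": payload[::-1],
    }


def naive_detector(text: str) -> bool:
    # Hypothetical detector: a plain substring match with no decoding step.
    return "service_role" in text


def evasion_report(payload: str, detector: Callable[[str], bool]) -> dict[str, bool]:
    """Map each variant name to whether the detector still fires on it."""
    return {name: detector(variant) for name, variant in encoded_variants(payload).items()}
```

Any variant that maps to `False` is a successful evasion. It points to either a missing normalization step (decode before matching) or a pattern that needs to be added.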

L4: Real-World Validation (Status: Planned)

Does the scanner work on unknown targets? We test against real-world MCP servers and deployed applications to measure accuracy at scale.

  • Scan public MCP servers from community registries
  • Validate findings against manually audited ground truth
  • Measure detection rate vs. false positive rate at scale
  • Cross-reference with other security tools for coverage gaps
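The two rates named above can be computed directly once a manually audited ground truth exists. This is a minimal sketch, assuming each target is labeled simply as vulnerable or not; the `Accuracy` type and `score` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accuracy:
    detection_rate: float        # TP / (TP + FN)
    false_positive_rate: float   # FP / (FP + TN)


def score(ground_truth: dict[str, bool], flagged: set[str]) -> Accuracy:
    """Compare scanner output against audited labels.

    ground_truth maps a target id to True when the audit found it vulnerable.
    flagged is the set of target ids the scanner reported.
    """
    tp = sum(1 for t, vuln in ground_truth.items() if vuln and t in flagged)
    fn = sum(1 for t, vuln in ground_truth.items() if vuln and t not in flagged)
    fp = sum(1 for t, vuln in ground_truth.items() if not vuln and t in flagged)
    tn = sum(1 for t, vuln in ground_truth.items() if not vuln and t not in flagged)
    return Accuracy(
        detection_rate=tp / (tp + fn) if tp + fn else 0.0,
        false_positive_rate=fp / (fp + tn) if fp + tn else 0.0,
    )
```

For example, with three vulnerable and two safe targets, a scanner that flags two of the vulnerable ones and one safe one has a detection rate of 2/3 and a false positive rate of 1/2.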

L3: Regression Corpus (Status: Active)

Does fixing one thing break another? Every confirmed vulnerability becomes a permanent test case. The corpus only grows.

  • Confirmed findings become permanent regression test cases
  • Pattern changes must pass the full regression suite before merge
  • Historical findings re-tested against every scanner release
  • Corpus includes both true positives and confirmed false positives
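A regression corpus of this kind can be modeled as a list of samples paired with the exact findings each one must produce, where a confirmed false positive is simply a case whose expected set is empty. The sketch below assumes a scanner exposed as a function from sample text to rule ids; `CorpusCase`, `run_corpus`, `toy_scan`, and the `cors-wildcard` rule id are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class CorpusCase:
    case_id: str
    sample: str
    expected_rules: frozenset[str]  # empty for confirmed false positives


def run_corpus(cases: Iterable[CorpusCase], scan: Callable[[str], set[str]]) -> list[str]:
    """Return one failure message per case whose findings differ from expectations."""
    failures = []
    for case in cases:
        actual = frozenset(scan(case.sample))
        if actual != case.expected_rules:
            failures.append(
                f"{case.case_id}: expected {sorted(case.expected_rules)}, got {sorted(actual)}"
            )
    return failures


def toy_scan(sample: str) -> set[str]:
    # Hypothetical scanner with a single rule, used only to exercise the runner.
    findings = set()
    if re.search(r"Access-Control-Allow-Origin:\s*\*", sample):
        findings.add("cors-wildcard")
    return findings


CASES = [
    CorpusCase("tp-001", "Access-Control-Allow-Origin: *", frozenset({"cors-wildcard"})),
    CorpusCase("fp-001", "Access-Control-Allow-Origin: https://app.example.com", frozenset()),
]
```

A merge gate would then require `run_corpus(CASES, scan)` to return an empty list. Comparing whole sets rather than checking membership means a pattern change that adds an unexpected finding fails just as a missed finding does.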

L2: Known-Vulnerable Fixtures (Status: Active)

Does the scanner catch what it claims to catch? We maintain deliberately misconfigured targets that trigger every detection rule, alongside secure baselines that must produce zero findings.

  • Purpose-built vulnerable targets covering all BaaS detection categories
  • Secure baseline projects validated for zero false positives
  • Both Supabase and Firebase configurations tested end-to-end
  • Tests run against deployed cloud infrastructure, not mocks
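The two guarantees in this layer (every rule fires on some vulnerable fixture, and no rule fires on a secure baseline) can be checked mechanically from scan results. This is a sketch under assumed inputs: the rule ids and the shape of the findings dictionaries are hypothetical, and the scans themselves are taken as already completed.

```python
def fixture_gaps(
    all_rules: set[str],
    vulnerable_findings: dict[str, set[str]],
    baseline_findings: dict[str, set[str]],
) -> tuple[list[str], list[str]]:
    """Return (rules no vulnerable fixture triggered, baselines with any finding)."""
    triggered = set().union(*vulnerable_findings.values())
    untriggered_rules = sorted(all_rules - triggered)
    noisy_baselines = sorted(name for name, found in baseline_findings.items() if found)
    return untriggered_rules, noisy_baselines
```

Both lists must be empty for the layer to pass. A non-empty first list means a detection rule has no fixture proving it works; a non-empty second list means a secure baseline produced a false positive.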

L1: Unit Tests (Status: Complete)

Does the detection logic work in isolation? Every pattern, parser, and scanner module has dedicated unit tests covering both positive and negative cases.

  • Comprehensive coverage across pattern matching, probe logic, and bundle analysis
  • Per-module test isolation with mocked API responses
  • All tests run in CI on every commit, under a zero-flaky-tests policy
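A unit test with a mocked API response might look like the following sketch. The probe function `table_exposed` is hypothetical, and its heuristic (an anonymous request that returns HTTP 200 with at least one row counts as exposure) is an assumption made for illustration, not a description of Vigile's actual probe logic. The HTTP client is injected so that no network call happens in the test.

```python
from unittest.mock import Mock


def table_exposed(http_get, base_url: str, table: str, anon_key: str) -> bool:
    """Hypothetical probe: does an anonymous read of the table return any rows?"""
    response = http_get(
        f"{base_url}/rest/v1/{table}?select=*&limit=1",
        headers={"apikey": anon_key, "Authorization": f"Bearer {anon_key}"},
    )
    if response.status_code != 200:
        return False
    rows = response.json()
    return isinstance(rows, list) and len(rows) > 0


def fake_response(status_code: int, body) -> Mock:
    """Build a mock object that looks like an HTTP response."""
    response = Mock()
    response.status_code = status_code
    response.json.return_value = body
    return response


def test_flags_table_that_returns_rows():
    # Positive case: rows come back for an anonymous caller.
    http_get = Mock(return_value=fake_response(200, [{"id": 1}]))
    assert table_exposed(http_get, "https://example.supabase.co", "profiles", "anon-key")
    http_get.assert_called_once()


def test_ignores_empty_and_denied_responses():
    # Negative cases: an empty result set and an authorization failure.
    for response in (fake_response(200, []), fake_response(401, {"message": "denied"})):
        http_get = Mock(return_value=response)
        assert not table_exposed(http_get, "https://example.supabase.co", "profiles", "anon-key")
```

Because the mock returns a fixed response every time, the test has no timing or network dependency, which is what makes a zero-flaky-tests policy practical at this layer.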

BaaS Scanner Coverage

Vigile tests Supabase and Firebase projects for security misconfigurations that expose data or allow unauthorized access.

Supabase — 7 security checks

Category | Description | Severity
Data Access Control | Tables with RLS disabled, exposing data to anonymous reads | critical
Credential Exposure | Privileged keys leaked in client-side bundles | critical
Write Access | Unauthorized mutations allowed through the public API | critical
Authentication Config | Weak signup and confirmation settings | medium
Transport Security | Overly permissive CORS policies | medium
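As an illustration of the Credential Exposure category: JWT-style Supabase API keys carry a `role` claim, so a bundle scan can decode each candidate token and flag any whose role is `service_role`. The sketch below is a simplified assumption about how such a check could work. It does not verify signatures, it only covers JWT-style keys, and the function names are hypothetical.

```python
import base64
import json
import re
from typing import Optional

# A JWT is three base64url segments; the header of a JSON object starts with "eyJ".
JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")


def jwt_role(token: str) -> Optional[str]:
    """Return the `role` claim from a JWT payload, or None if it cannot be read."""
    try:
        payload_b64 = token.split(".")[1]
        padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped padding
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except (IndexError, ValueError):
        return None
    return claims.get("role") if isinstance(claims, dict) else None


def privileged_keys(bundle_source: str) -> list[str]:
    """Return every JWT in the bundle whose role claim is `service_role`."""
    return [t for t in JWT_RE.findall(bundle_source) if jwt_role(t) == "service_role"]
```

A key with the `anon` role is expected in client-side code and is not flagged; only the privileged role is treated as a finding.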

Firebase — 5 security checks

Category | Description | Severity
Database Rules | Firestore and Realtime Database with public read/write access | critical
Config Exposure | Firebase project config leaked in client-side JavaScript | high
Storage Security | Cloud Storage buckets with public listing or access | high
Hosting Headers | Missing security headers on Firebase Hosting | medium

Standards Alignment

Our methodology draws from established security testing practices. Where no standard exists for AI agent security tooling, we derive from the closest analogue.

Established Standard | Vigile Equivalent
OWASP WebGoat / Juice Shop | Purpose-built vulnerable fixtures for BaaS platforms
Nuclei Template Testing | Per-pattern unit tests with fixture matching
Snyk Vulnerability DB Methodology | Regression corpus with ground truth validation
NIST SP 800-53 (CA-7) | Continuous monitoring via automated test suite

Transparency

The Vigile CLI scanner is open source under Apache 2.0. Security tools earn trust through transparency — you can verify how the scanner operates on your infrastructure. Our detection engine is continuously refined using proprietary threat intelligence and real-world validation data.