Four failure modes. Continuous detection.
Traditional monitoring tracks latency and errors. Guardian detects the silent failures that destroy trust in AI systems.
Sandbagging Detection
Models sometimes deliberately hide capabilities, underperforming on evaluation tasks while behaving normally in production. Guardian uses metacognitive probing techniques to detect this deceptive behavior with 96% accuracy, based on peer-reviewed research from Rotalabs.
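The probing techniques themselves are Rotalabs' research; purely as an illustration of the underlying statistical idea, here is a minimal sketch that flags a suspicious accuracy gap between evaluation-framed and neutral-framed runs of the same task set. The function, test, and threshold are our illustrative choices, not Guardian's implementation.

```python
import math

def sandbagging_gap_test(eval_correct: list[bool],
                         neutral_correct: list[bool],
                         z_threshold: float = 2.58) -> dict:
    """Two-proportion z-test: does the model score significantly worse
    when a task is framed as an evaluation than when framed neutrally?
    Illustrative sketch only, not Guardian's metacognitive probe."""
    n1, n2 = len(eval_correct), len(neutral_correct)
    p_eval = sum(eval_correct) / n1
    p_neutral = sum(neutral_correct) / n2
    pooled = (sum(eval_correct) + sum(neutral_correct)) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    # Positive z means accuracy drops under evaluation framing.
    z = (p_neutral - p_eval) / se if se > 0 else 0.0
    return {"eval_acc": p_eval, "neutral_acc": p_neutral,
            "z": z, "flagged": z > z_threshold}
```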
Hallucination Monitoring
Track confidence calibration across all outputs. Detect when models express high confidence in incorrect information. Alert on sudden increases in hallucination rates. Distinguish between uncertainty and confident confabulation.
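A standard quantity behind this kind of calibration tracking is expected calibration error (ECE): the gap between stated confidence and observed accuracy. A minimal sketch, assuming per-output confidence scores and correctness labels are logged; this shows the general technique, not Guardian's internal implementation.

```python
import numpy as np

def expected_calibration_error(confidences: np.ndarray,
                               correct: np.ndarray,
                               n_bins: int = 10) -> float:
    """ECE: bin outputs by stated confidence, then average the
    |accuracy - confidence| gap, weighted by bin occupancy.
    Rising ECE alongside high mean confidence is the
    'confident confabulation' signal."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return float(ece)
```

Tracked on a rolling window, a sudden jump in ECE is exactly the alert condition described above.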
Drift Detection
Automatically establish behavioral baselines during initial deployment. Detect when model behavior deviates, whether from provider updates, distribution shift, or prompt injection. Track performance degradation before it impacts users.
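One common way to quantify deviation from a baseline is the population stability index (PSI) over a scalar behavioral metric (confidence, response length, refusal rate). The metric choice, binning, and thresholds below are our assumptions for illustration, not Guardian's baselining method.

```python
import numpy as np

def population_stability_index(baseline: np.ndarray,
                               current: np.ndarray,
                               n_bins: int = 10) -> float:
    """PSI between the baseline window and the current window.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 alert.
    Assumes a non-constant baseline metric."""
    edges = np.linspace(baseline.min(), baseline.max(), n_bins + 1)
    b = np.histogram(baseline, bins=edges)[0] / len(baseline)
    # Clip current values into the baseline range so every point is binned.
    c = np.histogram(np.clip(current, edges[0], edges[-1]),
                     bins=edges)[0] / len(current)
    b = np.clip(b, 1e-6, None)  # avoid log(0) for empty bins
    c = np.clip(c, 1e-6, None)
    return float(np.sum((c - b) * np.log(c / b)))
```

In practice this runs per metric on a rolling window, alerting before drift is large enough to show up in user-facing errors.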
Compliance Reporting
Generate audit-ready reports for regulators. Document model behavior, decision rationale, and reliability metrics over time. Meet EU AI Act, Federal Reserve SR 11-7, and MAS FEAT requirements for AI transparency.
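As a hypothetical illustration of what a single audit-trail entry might capture (the field names are ours, not Guardian's actual report schema):

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class AuditRecord:
    """One audit-trail entry: what the model decided, why, and how
    reliable it was at the time. Illustrative schema only."""
    model_id: str
    decision: str
    rationale: str            # logged decision rationale
    confidence: float         # model's stated confidence
    drift_psi: float          # reliability metrics at decision time
    hallucination_ece: float
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        return json.dumps(asdict(self))
```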
Research Foundation: Rotalabs
Guardian's detection methods are built on peer-reviewed research from Rotalabs, our open source AI safety research division. The sandbagging detection techniques have been validated against frontier models and published for community verification.
Open source: rotalabs-probe toolkit available at rotalabs.ai. Verify our methods. Contribute improvements.