We’ve covered the why (Part 1), the technical pillars (Part 2), the governance model (Part 3), and the economics (Part 4). Now let’s make it concrete.
This post provides a reference architecture for enterprise agent governance. Not concepts - specifications. The goal: something you can actually build.
Architecture Overview
The Agent Watchtower consists of five core layers:
- Platform Adapters - Connect to AWS, Azure, GCP, OSS frameworks
- Core Services - Registry, policy engine, trust scoring
- Observability Pipeline - Telemetry collection, processing, storage
- Control Plane - Runtime enforcement, intervention capabilities
- Interface Layer - APIs, dashboards, integrations
Each layer is independent and can be implemented incrementally.
Layer 1: Platform Adapters
Adapters translate between platform-specific APIs and the unified control plane.
Adapter Responsibilities
Each adapter must implement the following (a minimal interface sketch appears after this list):
- Discovery: Find agents on the platform
- Registration: Sync agent metadata to registry
- Telemetry: Collect and forward observability data
- Policy: Translate and apply policies
- Control: Execute runtime interventions
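To make that contract concrete, here is a minimal sketch of a platform adapter interface in Python. The class and method names are illustrative, not part of any vendor SDK; each concrete adapter (Bedrock, Azure AI, OSS) would implement these five responsibilities.

```python
from abc import ABC, abstractmethod
from typing import Iterable


class PlatformAdapter(ABC):
    """Hypothetical base contract that each platform adapter would implement."""

    @abstractmethod
    def discover_agents(self) -> Iterable[dict]:
        """Find agents on the platform and return their raw metadata."""

    @abstractmethod
    def register(self, agent_metadata: dict) -> str:
        """Sync agent metadata to the central registry; return the registry ID."""

    @abstractmethod
    def collect_telemetry(self) -> Iterable[dict]:
        """Collect platform telemetry for the observability pipeline."""

    @abstractmethod
    def apply_policy(self, agent_id: str, policy: dict) -> None:
        """Translate a Watchtower policy into platform-native controls."""

    @abstractmethod
    def intervene(self, agent_id: str, action: str) -> None:
        """Execute a runtime intervention (e.g. suspend, throttle, kill)."""
```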
AWS Bedrock Adapter
Discovery:
- List Bedrock agents via AWS SDK
- Poll for changes (or use EventBridge)
- Extract agent configuration, guardrails
Telemetry:
- Enable Bedrock tracing
- Forward to observability pipeline
- Parse Bedrock-specific trace format
Policy:
- Map Watchtower policies to Bedrock guardrails
- Configure content filters, denied topics
- Set up CloudWatch alarms
Control:
- Invoke UpdateAgent for config changes
- Use CloudWatch for alerts
- Lambda for kill switch execution
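As a hedged illustration of the discovery step, the boto3 bedrock-agent client exposes a list_agents operation that can seed the registry. Field names and pagination handling are simplified here and should be verified against your SDK version.

```python
import boto3


def discover_bedrock_agents(region: str = "us-east-1"):
    """Sketch: enumerate Bedrock agents so they can be synced into the registry.

    Assumes standard AWS credentials are configured; response field names
    should be checked against the installed boto3 version.
    """
    client = boto3.client("bedrock-agent", region_name=region)
    next_token = None
    while True:
        kwargs = {"nextToken": next_token} if next_token else {}
        page = client.list_agents(**kwargs)
        for summary in page.get("agentSummaries", []):
            yield {
                "external_id": summary["agentId"],
                "platform": "AWS",
                "name": summary.get("agentName"),
                "status": summary.get("agentStatus"),
            }
        next_token = page.get("nextToken")
        if not next_token:
            break
```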
Azure AI Adapter
Discovery:
- List AI deployments via Azure SDK
- Monitor via Azure Resource Graph
- Extract deployment configuration
Telemetry:
- Enable Azure AI tracing
- Forward via Event Hubs
- Parse Azure-specific format
Policy:
- Map to Azure AI Content Safety
- Configure responsible AI settings
- Integrate with Azure Policy
Control:
- Azure SDK for deployment updates
- Azure Monitor for alerts
- Azure Functions for interventions
Open Source Adapter (LangChain/LangGraph)
Discovery:
- Service mesh integration (Kubernetes)
- Process registration on startup
- Configuration from environment
Telemetry:
- LangChain callbacks/LangSmith
- OpenTelemetry instrumentation
- Custom middleware for traces
Policy:
- SDK-level policy enforcement
- Proxy for pre/post processing
- Custom guardrail implementations
Control:
- Kubernetes for deployments
- Feature flags for behavior
- Service mesh for traffic control
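For the open source path, a minimal telemetry hook can sit on LangChain's callback interface. The forward_event function below is a placeholder for the adapter's pipeline producer, and the event fields loosely mirror the telemetry schema in Layer 3.

```python
import time

from langchain_core.callbacks import BaseCallbackHandler


def forward_event(event: dict) -> None:
    """Placeholder: send the event to the Watchtower observability pipeline."""
    print(event)  # replace with a Kafka/Kinesis producer in practice


class WatchtowerCallback(BaseCallbackHandler):
    """Sketch of a LangChain callback that emits AgentEvent-style telemetry."""

    def __init__(self, agent_id: str):
        self.agent_id = agent_id
        self._started_at = None

    def on_llm_start(self, serialized, prompts, **kwargs):
        self._started_at = time.time()

    def on_llm_end(self, response, **kwargs):
        latency_ms = (
            int((time.time() - self._started_at) * 1000) if self._started_at else None
        )
        forward_event({
            "agent_id": self.agent_id,
            "event_type": "Response",
            "latency_ms": latency_ms,
        })

    def on_tool_start(self, serialized, input_str, **kwargs):
        forward_event({
            "agent_id": self.agent_id,
            "event_type": "ToolCall",
            "tool_name": serialized.get("name"),
        })
```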
Layer 2: Core Services
Agent Registry Service
The single source of truth for all agents.
Data Model:
Agent {
id: UUID (unique across platforms)
external_id: String (platform-specific ID)
platform: Enum (AWS, Azure, GCP, OSS, Internal)
name: String
version: String
description: String
owner_team: String
owner_bu: String
contacts: [Contact]
risk_tier: Enum (Critical, High, Medium, Low)
data_classification: Enum
regulatory_scope: [String]
capabilities: [Capability]
tools: [Tool]
data_sources: [DataSource]
status: Enum (Active, Suspended, Deprecated)
autonomy_level: Enum (L1-L5)
created_at: Timestamp
updated_at: Timestamp
last_active: Timestamp
}
API Operations:
- POST /agents - Register new agent
- GET /agents/{id} - Get agent details
- PUT /agents/{id} - Update agent
- DELETE /agents/{id} - Deregister agent
- GET /agents?filters - Search/list agents
- POST /agents/{id}/suspend - Suspend agent
- POST /agents/{id}/activate - Activate agent
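For example, registering a discovered agent against this API might look like the following sketch. The hostname, token handling, and payload values are illustrative.

```python
import requests

WATCHTOWER_URL = "https://watchtower.example.internal"  # illustrative hostname


def register_agent(token: str) -> str:
    """Sketch: register a newly discovered agent with the registry service."""
    payload = {
        "external_id": "bedrock-agent-123",
        "platform": "AWS",
        "name": "claims-triage-agent",
        "owner_team": "claims-automation",
        "risk_tier": "High",
        "autonomy_level": "L2",
    }
    resp = requests.post(
        f"{WATCHTOWER_URL}/agents",
        json=payload,
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["id"]  # the registry-assigned UUID
```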
Policy Engine Service
Evaluates policies and returns decisions.
Policy Structure:
Policy {
id: UUID
name: String
scope: Enum (Enterprise, Domain, Agent)
scope_target: String (domain name or agent ID)
rules: [Rule]
priority: Integer
enabled: Boolean
created_by: String
created_at: Timestamp
updated_at: Timestamp
}
Rule {
condition: Expression
action: Enum (Allow, Deny, Escalate, Modify)
parameters: Map
}
Evaluation Flow:
- Collect applicable policies (enterprise + domain + agent)
- Order by priority
- Evaluate conditions against context
- Return the first matching rule's action; if no rule matches, deny by default (see the sketch below)
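Here is a minimal sketch of that evaluation loop, assuming rule conditions are callables over a request context. A production engine would more likely use a declarative expression language (OPA/Rego or CEL) than Python callables.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    condition: Callable[[dict], bool]  # expression evaluated against the request context
    action: str                        # Allow | Deny | Escalate | Modify
    parameters: dict = field(default_factory=dict)


@dataclass
class Policy:
    name: str
    priority: int
    rules: list[Rule]
    enabled: bool = True


def evaluate(policies: list[Policy], context: dict) -> str:
    """Return the first matching action across applicable policies; deny by default."""
    # Assumption: lower priority numbers are evaluated first (the spec only says "order by priority").
    applicable = sorted((p for p in policies if p.enabled), key=lambda p: p.priority)
    for policy in applicable:
        for rule in policy.rules:
            if rule.condition(context):
                return rule.action
    return "Deny"  # deny-by-default when nothing matches
```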
API Operations:
- POST /policies - Create policy
- GET /policies/{id} - Get policy
- PUT /policies/{id} - Update policy
- DELETE /policies/{id} - Delete policy
- POST /evaluate - Evaluate request against policies
Trust Scoring Service
Calculates and maintains trust scores for agents and teams.
Trust Score Components:
- Behavioral score: Based on observed behavior (hallucination rate, policy compliance, escalation patterns)
- Performance score: Reliability, latency, error rates
- Compliance score: Audit findings, violation history
- Maturity score: Team certifications, operational capability
Scoring Algorithm:
trust_score = (
w1 * behavioral_score +
w2 * performance_score +
w3 * compliance_score +
w4 * maturity_score
) * decay_factor(time_since_last_incident)
Weights (default): w1=0.35, w2=0.25, w3=0.25, w4=0.15
Decay: 0.95^(weeks_since_incident)
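Translated directly into code, with the default weights and decay factor from above (component scores are assumed to be normalized to the 0-1 range):

```python
DEFAULT_WEIGHTS = {
    "behavioral": 0.35,
    "performance": 0.25,
    "compliance": 0.25,
    "maturity": 0.15,
}


def trust_score(
    behavioral: float,
    performance: float,
    compliance: float,
    maturity: float,
    weeks_since_incident: float,
    weights: dict = DEFAULT_WEIGHTS,
) -> float:
    """Weighted trust score; all component scores assumed normalized to [0, 1]."""
    base = (
        weights["behavioral"] * behavioral
        + weights["performance"] * performance
        + weights["compliance"] * compliance
        + weights["maturity"] * maturity
    )
    decay = 0.95 ** weeks_since_incident  # decay factor as specified above
    return base * decay
```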
API Operations:
- GET /trust/{agent_id} - Get agent trust score
- GET /trust/team/{team_id} - Get team trust score
- POST /trust/{agent_id}/incident - Record incident (lowers score)
- GET /trust/{agent_id}/history - Get score history
Layer 3: Observability Pipeline
Data Flow
- Collection: Adapters collect platform telemetry
- Ingestion: Kafka/Kinesis for high-throughput ingestion
- Processing: Stream processing for real-time analytics
- Storage: Time-series DB + object storage + search index
- Analysis: ML models for anomaly detection
Telemetry Schema
AgentEvent {
event_id: UUID
agent_id: UUID
timestamp: Timestamp
event_type: Enum (Request, Response, ToolCall, Error, Escalation)
request: {
input: String (masked)
input_tokens: Integer
metadata: Map
}
response: {
output: String (masked)
output_tokens: Integer
latency_ms: Integer
confidence: Float
}
tool_calls: [{
tool_name: String
parameters: Map (masked)
result: String (masked)
success: Boolean
}]
policy_evaluation: {
policies_evaluated: [String]
decision: Enum
escalated: Boolean
}
cost: {
inference_cost: Decimal
tool_cost: Decimal
total_cost: Decimal
}
}
PII Masking
All telemetry passes through PII detection and masking before storage (a simplified sketch follows this list):
- Named entity recognition for names, addresses
- Pattern matching for SSN, credit cards, etc.
- Configurable masking (hash, redact, tokenize)
- Reversible tokenization for authorized access
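A simplified sketch of the pattern-matching portion is below. The regexes are illustrative, US-centric examples; named entity recognition and reversible tokenization would layer on top of this.

```python
import hashlib
import re

# Simplified, US-centric patterns for illustration only.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}


def mask_pii(text: str, mode: str = "redact") -> str:
    """Mask matched patterns before telemetry is stored.

    mode: 'redact' replaces with a placeholder; 'hash' substitutes a stable
    digest that supports correlation without exposing the raw value.
    """
    def replace(match: re.Match, label: str) -> str:
        if mode == "hash":
            digest = hashlib.sha256(match.group().encode()).hexdigest()[:10]
            return f"<{label}:{digest}>"
        return f"<{label}:REDACTED>"

    for label, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, l=label: replace(m, l), text)
    return text
```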
Anomaly Detection
ML models running on the telemetry stream (a simple drift-detection sketch follows this list):
- Behavioral drift: Agent responses changing over time
- Sandbagging detection: Agent performing differently under observation
- Topic clustering: Detecting out-of-scope conversations
- Confidence calibration: Are confidence scores predictive?
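As one simple illustration of behavioral drift detection, a rolling z-score over a per-agent metric (latency, confidence, cost per request) can flag deviations for review. Production systems would use richer models; the window size and threshold below are arbitrary.

```python
from collections import deque
from statistics import mean, pstdev


class DriftDetector:
    """Sketch: flag values that deviate sharply from an agent's recent baseline."""

    def __init__(self, window: int = 200, threshold: float = 3.0):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Return True if the new observation looks anomalous versus the window."""
        anomalous = False
        if len(self.history) >= 30:  # require a minimal baseline first
            mu, sigma = mean(self.history), pstdev(self.history)
            anomalous = sigma > 0 and abs(value - mu) / sigma > self.threshold
        self.history.append(value)
        return anomalous
```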
Layer 4: Control Plane
Runtime Enforcement
Pre-invocation checks:
- Policy evaluation (should this request proceed?)
- Rate limiting (quota check)
- Circuit breaker (is agent healthy?)
During execution:
- Tool call interception (are tools allowed?)
- Data access monitoring (what’s being accessed?)
- Timeout enforcement
Post-execution checks:
- Output validation (policy compliance)
- PII scan (no leakage)
- Confidence threshold (escalation needed?)
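Putting the checks together, here is a sketch of how the control plane might wrap an agent invocation. The injected callables and the confidence threshold are illustrative; rate limiting and circuit breaking are omitted for brevity.

```python
from typing import Callable


class PolicyViolation(Exception):
    """Raised when a request is blocked before the agent is invoked."""


def governed_invoke(
    agent_call: Callable[[dict], dict],
    context: dict,
    evaluate_policies: Callable[[dict], str],  # e.g. the policy engine sketch above
    mask: Callable[[str], str],                # e.g. the PII masking sketch above
    confidence_floor: float = 0.7,             # illustrative threshold
) -> dict:
    """Sketch: wrap an agent invocation with pre- and post-execution checks."""
    # Pre-invocation: policy check
    decision = evaluate_policies(context)
    if decision == "Deny":
        raise PolicyViolation("request blocked by policy")
    if decision == "Escalate":
        return {"status": "escalated", "reason": "policy"}

    # Execution: the platform adapter performs the actual call
    result = agent_call(context)

    # Post-execution: PII scan and confidence threshold
    result["output"] = mask(result.get("output", ""))
    if result.get("confidence", 1.0) < confidence_floor:
        return {"status": "escalated", "reason": "low_confidence", "result": result}
    return result
```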
Intervention Capabilities
Kill Switch:
- Immediately halt agent
- Options: single agent, agent type, all agents in domain
- Configurable: hard stop vs. graceful drain
Behavior Modification:
- Update confidence thresholds
- Enable/disable specific tools
- Adjust escalation rules
- Modify prompt templates
Traffic Control:
- Route to different agent versions
- Canary deployments
- A/B testing
- Gradual rollout
Emergency Response
Automated response (configurable):
| Trigger | Response |
|---|---|
| Trust score below threshold | Increase human oversight |
| Anomaly score spike | Alert on-call, reduce autonomy |
| Policy violation | Suspend agent, notify owner |
| Confidence consistently low | Escalate all requests |
Manual response:
- SOC dashboard for real-time status
- One-click kill switch per agent/domain
- Incident workflow integration
Layer 5: Interface Layer
REST API
All services expose REST APIs with:
- OpenAPI specifications
- JWT authentication
- RBAC authorization
- Rate limiting
- Audit logging
Event Streaming
Kafka/EventBridge topics for:
- Agent registration events
- Policy changes
- Trust score updates
- Anomaly alerts
- Incident notifications
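A sketch of publishing one of these events with confluent-kafka; the broker address and topic name are illustrative.

```python
import json

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "kafka.example.internal:9092"})  # illustrative broker


def publish_registration(agent: dict) -> None:
    """Sketch: emit an agent registration event for downstream consumers."""
    producer.produce(
        "watchtower.agent-registrations",  # illustrative topic name
        key=agent["id"],
        value=json.dumps(agent).encode("utf-8"),
    )
    producer.flush()
```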
Dashboard
Executive view:
- Agent inventory summary
- Risk distribution
- Cost trends
- Incident summary
Operations view:
- Real-time agent status
- Performance metrics
- Anomaly alerts
- Intervention controls
Governance view:
- Policy compliance
- Audit trail
- Trust score trends
- Autonomy level distribution
Integrations
- SIEM: Forward security events
- ITSM: Incident creation
- IAM: Authorization sync
- CI/CD: Deployment gates
- Cost Management: Chargeback data
Implementation Roadmap
Phase 1: Foundation (Weeks 1-4)
Week 1-2:
- Deploy registry service
- Implement manual agent registration
- Basic API and authentication
Week 3-4:
- Deploy first adapter (start with most common platform)
- Automated agent discovery
- Basic telemetry collection
Deliverable: Registry of all agents with manual classification
Phase 2: Observability (Weeks 5-8)
Week 5-6:
- Deploy observability pipeline
- Telemetry storage and search
- Basic dashboards
Week 7-8:
- PII masking
- Cost attribution
- Performance metrics
Deliverable: Visibility into agent behavior and costs
Phase 3: Policy (Weeks 9-12)
Week 9-10:
- Deploy policy engine
- Define enterprise policies
- Basic policy evaluation
Week 11-12:
- Domain-specific policies
- Pre-invocation enforcement
- Policy violation alerts
Deliverable: Policy enforcement for high-risk scenarios
Phase 4: Control (Weeks 13-16)
Week 13-14:
- Runtime control plane
- Kill switch implementation
- Manual interventions
Week 15-16:
- Automated responses
- Trust scoring service
- Autonomy level enforcement
Deliverable: Full runtime control capability
Phase 5: Optimization (Weeks 17-20)
Week 17-18:
- Anomaly detection models
- Trust Cascade implementation
- Cost optimization routing
Week 19-20:
- Advanced analytics
- Self-service onboarding
- Full integration suite
Deliverable: Production-ready agent governance platform
Key Decisions
Decisions you’ll need to make during implementation:
Build vs. Buy:
- Control plane core: Build (competitive differentiation)
- Observability storage: Buy (commodity infrastructure)
- Policy engine: Consider OPA (open source, mature)
- Adapters: Build (platform-specific)
Deployment:
- Where does control plane run? (Own cloud, multi-cloud, vendor-hosted)
- Latency requirements? (Real-time vs. near-real-time)
- Data residency requirements? (Regional deployment)
Organizational:
- Who owns the platform? (AI CoE, Platform team, Security)
- Who defines policies? (Federated model recommended)
- Who operates it? (SRE, dedicated team)
The Bottom Line
This reference architecture provides a blueprint, not a prescription. Your implementation will differ based on:
- Which platforms you use
- Your existing infrastructure
- Your risk appetite
- Your team capabilities
The principles remain constant:
- Know what agents exist (Registry)
- See what they do (Observability)
- Define what they should do (Policy)
- Enforce it (Control)
Start small. Build incrementally. Optimize continuously.
The Watchtower isn’t a destination. It’s a capability that grows with your agent deployment. Build the foundation now, and you’ll be ready for whatever comes next.
This concludes The Agent Watchtower series. For implementation support, contact the Rotascale team.