The Agent Watchtower, Part 5: Reference Architecture

A complete, implementable design for enterprise agent governance. Concrete specifications, integration patterns, and implementation roadmap.

We’ve covered the why (Part 1), the technical pillars (Part 2), the governance model (Part 3), and the economics (Part 4). Now let’s make it concrete.

This post provides a reference architecture for enterprise agent governance. Not concepts, but specifications. The goal: something you can actually build.

Architecture Overview

The Agent Watchtower consists of five core layers:

  1. Platform Adapters - Connect to AWS, Azure, GCP, OSS frameworks
  2. Core Services - Registry, policy engine, trust scoring
  3. Observability Pipeline - Telemetry collection, processing, storage
  4. Control Plane - Runtime enforcement, intervention capabilities
  5. Interface Layer - APIs, dashboards, integrations

Each layer is independent and can be implemented incrementally.

Layer 1: Platform Adapters

Adapters translate between platform-specific APIs and the unified control plane.

Adapter Responsibilities

Each adapter must implement:

  • Discovery: Find agents on the platform
  • Registration: Sync agent metadata to registry
  • Telemetry: Collect and forward observability data
  • Policy: Translate and apply policies
  • Control: Execute runtime interventions
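These five responsibilities map naturally onto a common interface. A minimal sketch in Python (class and method names are illustrative, not part of any platform SDK):

```python
from abc import ABC, abstractmethod

class PlatformAdapter(ABC):
    """Common contract every platform adapter implements."""

    @abstractmethod
    def discover(self) -> list[dict]:
        """Find agents on the platform and return their raw metadata."""

    @abstractmethod
    def register(self, agent: dict) -> None:
        """Sync agent metadata to the central registry."""

    @abstractmethod
    def collect_telemetry(self) -> list[dict]:
        """Collect observability data to forward to the pipeline."""

    @abstractmethod
    def apply_policy(self, agent_id: str, policy: dict) -> None:
        """Translate a Watchtower policy into platform-native controls."""

    @abstractmethod
    def intervene(self, agent_id: str, action: str) -> None:
        """Execute a runtime intervention (e.g. suspend, kill)."""
```

Each platform-specific adapter below is then a subclass that fills in these five methods with its platform's SDK calls.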

AWS Bedrock Adapter

Discovery:

  • List Bedrock agents via AWS SDK
  • Poll for changes (or use EventBridge)
  • Extract agent configuration, guardrails

Telemetry:

  • Enable Bedrock tracing
  • Forward to observability pipeline
  • Parse Bedrock-specific trace format

Policy:

  • Map Watchtower policies to Bedrock guardrails
  • Configure content filters, denied topics
  • Set up CloudWatch alarms

Control:

  • Invoke UpdateAgent for config changes
  • Use CloudWatch for alerts
  • Lambda for kill switch execution

Azure AI Adapter

Discovery:

  • List AI deployments via Azure SDK
  • Monitor via Azure Resource Graph
  • Extract deployment configuration

Telemetry:

  • Enable Azure AI tracing
  • Forward via Event Hubs
  • Parse Azure-specific format

Policy:

  • Map to Azure AI Content Safety
  • Configure responsible AI settings
  • Integrate with Azure Policy

Control:

  • Azure SDK for deployment updates
  • Azure Monitor for alerts
  • Azure Functions for interventions

Open Source Adapter (LangChain/LangGraph)

Discovery:

  • Service mesh integration (Kubernetes)
  • Process registration on startup
  • Configuration from environment

Telemetry:

  • LangChain callbacks/LangSmith
  • OpenTelemetry instrumentation
  • Custom middleware for traces

Policy:

  • SDK-level policy enforcement
  • Proxy for pre/post processing
  • Custom guardrail implementations

Control:

  • Kubernetes for deployments
  • Feature flags for behavior
  • Service mesh for traffic control

Layer 2: Core Services

Agent Registry Service

The single source of truth for all agents.

Data Model:

Agent {
  id: UUID (unique across platforms)
  external_id: String (platform-specific ID)
  platform: Enum (AWS, Azure, GCP, OSS, Internal)

  name: String
  version: String
  description: String

  owner_team: String
  owner_bu: String
  contacts: [Contact]

  risk_tier: Enum (Critical, High, Medium, Low)
  data_classification: Enum
  regulatory_scope: [String]

  capabilities: [Capability]
  tools: [Tool]
  data_sources: [DataSource]

  status: Enum (Active, Suspended, Deprecated)
  autonomy_level: Enum (L1-L5)

  created_at: Timestamp
  updated_at: Timestamp
  last_active: Timestamp
}

API Operations:

  • POST /agents - Register new agent
  • GET /agents/{id} - Get agent details
  • PUT /agents/{id} - Update agent
  • DELETE /agents/{id} - Deregister agent
  • GET /agents?filters - Search/list agents
  • POST /agents/{id}/suspend - Suspend agent
  • POST /agents/{id}/activate - Activate agent

Policy Engine Service

Evaluates policies and returns decisions.

Policy Structure:

Policy {
  id: UUID
  name: String
  scope: Enum (Enterprise, Domain, Agent)
  scope_target: String (domain name or agent ID)

  rules: [Rule]

  priority: Integer
  enabled: Boolean

  created_by: String
  created_at: Timestamp
  updated_at: Timestamp
}

Rule {
  condition: Expression
  action: Enum (Allow, Deny, Escalate, Modify)
  parameters: Map
}

Evaluation Flow:

  1. Collect applicable policies (enterprise + domain + agent)
  2. Order by priority
  3. Evaluate conditions against context
  4. Return first matching action (deny-by-default)
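The four steps above fit in a few lines. A sketch, assuming rule conditions are predicates over the request context and that lower priority numbers are evaluated first:

```python
def evaluate(policies, context):
    """Return the first matching (action, parameters), deny by default.

    `policies` is the applicable set (enterprise + domain + agent scope).
    """
    # Step 2: order by priority (lower number first; an assumed convention)
    for policy in sorted(policies, key=lambda p: p["priority"]):
        if not policy["enabled"]:
            continue
        # Step 3: evaluate each rule's condition against the request context
        for rule in policy["rules"]:
            if rule["condition"](context):
                # Step 4: first match wins
                return rule["action"], rule.get("parameters", {})
    # No rule matched: deny-by-default
    return "Deny", {}
```

For example, an enterprise-scope rule escalating wire transfers will fire before a lower-priority domain rule that allows everything else.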

API Operations:

  • POST /policies - Create policy
  • GET /policies/{id} - Get policy
  • PUT /policies/{id} - Update policy
  • DELETE /policies/{id} - Delete policy
  • POST /evaluate - Evaluate request against policies

Trust Scoring Service

Calculates and maintains trust scores for agents and teams.

Trust Score Components:

  • Behavioral score: Based on observed behavior (hallucination rate, policy compliance, escalation patterns)
  • Performance score: Reliability, latency, error rates
  • Compliance score: Audit findings, violation history
  • Maturity score: Team certifications, operational capability

Scoring Algorithm:

trust_score = (
  w1 * behavioral_score +
  w2 * performance_score +
  w3 * compliance_score +
  w4 * maturity_score
) * decay_factor(time_since_last_incident)

Weights (default): w1=0.35, w2=0.25, w3=0.25, w4=0.15
Decay: 0.95^(weeks_since_incident)
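The formula translates directly into code. Two conventions here are assumptions, not part of the spec: agents with no recorded incident skip the decay factor entirely, and all component scores are normalized to [0, 1]:

```python
def trust_score(behavioral, performance, compliance, maturity,
                weeks_since_incident=None,
                weights=(0.35, 0.25, 0.25, 0.15)):
    """Weighted trust score, decayed by incident recency as specified above."""
    w1, w2, w3, w4 = weights
    base = (w1 * behavioral + w2 * performance
            + w3 * compliance + w4 * maturity)
    if weeks_since_incident is None:
        # No incident on record: no decay applied (an assumed convention)
        return base
    # As specified: decay_factor = 0.95^(weeks_since_incident)
    return base * 0.95 ** weeks_since_incident
```

With the default weights, an agent scoring 0.9 / 0.8 / 1.0 / 0.7 on the four components has a base score of 0.87 before any incident decay is applied.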

API Operations:

  • GET /trust/{agent_id} - Get agent trust score
  • GET /trust/team/{team_id} - Get team trust score
  • POST /trust/{agent_id}/incident - Record incident (lowers score)
  • GET /trust/{agent_id}/history - Get score history

Layer 3: Observability Pipeline

Data Flow

  1. Collection: Adapters collect platform telemetry
  2. Ingestion: Kafka/Kinesis for high-throughput ingestion
  3. Processing: Stream processing for real-time analytics
  4. Storage: Time-series DB + object storage + search index
  5. Analysis: ML models for anomaly detection

Telemetry Schema

AgentEvent {
  event_id: UUID
  agent_id: UUID
  timestamp: Timestamp

  event_type: Enum (Request, Response, ToolCall, Error, Escalation)

  request: {
    input: String (masked)
    input_tokens: Integer
    metadata: Map
  }

  response: {
    output: String (masked)
    output_tokens: Integer
    latency_ms: Integer
    confidence: Float
  }

  tool_calls: [{
    tool_name: String
    parameters: Map (masked)
    result: String (masked)
    success: Boolean
  }]

  policy_evaluation: {
    policies_evaluated: [String]
    decision: Enum
    escalated: Boolean
  }

  cost: {
    inference_cost: Decimal
    tool_cost: Decimal
    total_cost: Decimal
  }
}

PII Masking

All telemetry passes through PII detection and masking before storage.

  • Named entity recognition for names, addresses
  • Pattern matching for SSN, credit cards, etc.
  • Configurable masking (hash, redact, tokenize)
  • Reversible tokenization for authorized access
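The pattern-matching half of this is straightforward. A sketch covering the redact and hash modes (the regexes are illustrative, not production-grade, and a real deployment would add NER for names and addresses plus reversible tokenization):

```python
import hashlib
import re

# Illustrative patterns for well-known identifier formats
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def mask(text, mode="redact"):
    """Mask matched PII before storage. Modes: 'redact', 'hash'."""
    def replace(match, label):
        if mode == "hash":
            # Deterministic hash lets analysts correlate records
            # across events without ever seeing the raw value
            digest = hashlib.sha256(match.group().encode()).hexdigest()[:8]
            return f"<{label}:{digest}>"
        return f"<{label}:REDACTED>"
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(lambda m, l=label: replace(m, l), text)
    return text
```

The key design point is that masking happens in the pipeline, before storage: the time-series DB, object store, and search index only ever see masked text.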

Anomaly Detection

ML models running on the telemetry stream:

  • Behavioral drift: Agent responses changing over time
  • Sandbagging detection: Agent performing differently under observation
  • Topic clustering: Detecting out-of-scope conversations
  • Confidence calibration: Are confidence scores predictive?
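As a flavor of what behavioral drift detection looks like, here is a deliberately simple rolling z-score detector over a single metric stream. It is a stand-in for the ML models, not a substitute; real deployments would use richer features and learned baselines:

```python
import math
from collections import deque

class DriftDetector:
    """Flag values that leave a metric's rolling baseline."""

    def __init__(self, window=50, z_threshold=3.0, min_baseline=10):
        self.window = deque(maxlen=window)
        self.z_threshold = z_threshold
        self.min_baseline = min_baseline

    def observe(self, value):
        """Record one observation; return True if it looks anomalous."""
        flagged = False
        if len(self.window) >= self.min_baseline:
            mean = sum(self.window) / len(self.window)
            var = sum((x - mean) ** 2 for x in self.window) / len(self.window)
            std = math.sqrt(var)
            # Flag values more than z_threshold standard deviations out
            if std > 0 and abs(value - mean) / std > self.z_threshold:
                flagged = True
        self.window.append(value)
        return flagged
```

Feed it an agent's hallucination rate or escalation rate per hour, and a sudden jump fires an alert while ordinary noise does not.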

Layer 4: Control Plane

Runtime Enforcement

Pre-invocation checks:

  • Policy evaluation (should this request proceed?)
  • Rate limiting (quota check)
  • Circuit breaker (is agent healthy?)

During execution:

  • Tool call interception (are tools allowed?)
  • Data access monitoring (what’s being accessed?)
  • Timeout enforcement

Post-execution checks:

  • Output validation (policy compliance)
  • PII scan (no leakage)
  • Confidence threshold (escalation needed?)
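Pulled together, the three phases wrap every invocation. A sketch, where `agent` and `policy_engine` are illustrative interfaces rather than a real SDK:

```python
def guarded_invoke(agent, request, policy_engine, confidence_floor=0.7):
    """Run one invocation through pre-, during-, and post-execution checks."""
    # Pre-invocation: should this request proceed at all?
    decision, _params = policy_engine.evaluate(request)
    if decision == "Deny":
        return {"status": "denied"}
    if decision == "Escalate":
        return {"status": "escalated_to_human"}

    # During execution: intercept tool calls and enforce a timeout
    # (tool_filter and timeout_s are assumed hooks on the agent runtime)
    response = agent.invoke(
        request,
        tool_filter=lambda tool: tool in agent.allowed_tools,
        timeout_s=30,
    )

    # Post-execution: low-confidence output goes to a human, not the user
    if response["confidence"] < confidence_floor:
        return {"status": "escalated_to_human", "draft": response["output"]}
    return {"status": "ok", "output": response["output"]}
```

Output validation and the PII scan would slot into the post-execution block the same way; they are omitted here to keep the shape visible.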

Intervention Capabilities

Kill Switch:

  • Immediately halt agent
  • Options: single agent, agent type, all agents in domain
  • Configurable: hard stop vs. graceful drain

Behavior Modification:

  • Update confidence thresholds
  • Enable/disable specific tools
  • Adjust escalation rules
  • Modify prompt templates

Traffic Control:

  • Route to different agent versions
  • Canary deployments
  • A/B testing
  • Gradual rollout

Emergency Response

Automated response (configurable):

  • Trust score below threshold → Increase human oversight
  • Anomaly score spike → Alert on-call, reduce autonomy
  • Policy violation → Suspend agent, notify owner
  • Confidence consistently low → Escalate all requests
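These trigger/response pairs can be encoded as a small rule set that the control plane walks whenever monitoring signals update. The thresholds below are placeholders, not recommendations:

```python
def automated_response(signals):
    """Map monitoring signals to the configured automated responses."""
    actions = []
    if signals.get("trust_score", 1.0) < 0.5:       # trust below threshold
        actions.append("increase_human_oversight")
    if signals.get("anomaly_score", 0.0) > 0.9:     # anomaly score spike
        actions.extend(["alert_on_call", "reduce_autonomy"])
    if signals.get("policy_violation"):             # any policy violation
        actions.extend(["suspend_agent", "notify_owner"])
    if signals.get("mean_confidence", 1.0) < 0.6:   # consistently low confidence
        actions.append("escalate_all_requests")
    return actions
```

Keeping the mapping declarative like this makes the automated responses auditable: the same table that governs behavior is the table you show the risk committee.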

Manual response:

  • SOC dashboard for real-time status
  • One-click kill switch per agent/domain
  • Incident workflow integration

Layer 5: Interface Layer

REST API

All services expose REST APIs with:

  • OpenAPI specifications
  • JWT authentication
  • RBAC authorization
  • Rate limiting
  • Audit logging

Event Streaming

Kafka/EventBridge topics for:

  • Agent registration events
  • Policy changes
  • Trust score updates
  • Anomaly alerts
  • Incident notifications

Dashboard

Executive view:

  • Agent inventory summary
  • Risk distribution
  • Cost trends
  • Incident summary

Operations view:

  • Real-time agent status
  • Performance metrics
  • Anomaly alerts
  • Intervention controls

Governance view:

  • Policy compliance
  • Audit trail
  • Trust score trends
  • Autonomy level distribution

Integrations

  • SIEM: Forward security events
  • ITSM: Incident creation
  • IAM: Authorization sync
  • CI/CD: Deployment gates
  • Cost Management: Chargeback data

Implementation Roadmap

Phase 1: Foundation (Weeks 1-4)

Week 1-2:

  • Deploy registry service
  • Implement manual agent registration
  • Basic API and authentication

Week 3-4:

  • Deploy first adapter (start with most common platform)
  • Automated agent discovery
  • Basic telemetry collection

Deliverable: Registry of all agents with manual classification

Phase 2: Observability (Weeks 5-8)

Week 5-6:

  • Deploy observability pipeline
  • Telemetry storage and search
  • Basic dashboards

Week 7-8:

  • PII masking
  • Cost attribution
  • Performance metrics

Deliverable: Visibility into agent behavior and costs

Phase 3: Policy (Weeks 9-12)

Week 9-10:

  • Deploy policy engine
  • Define enterprise policies
  • Basic policy evaluation

Week 11-12:

  • Domain-specific policies
  • Pre-invocation enforcement
  • Policy violation alerts

Deliverable: Policy enforcement for high-risk scenarios

Phase 4: Control (Weeks 13-16)

Week 13-14:

  • Runtime control plane
  • Kill switch implementation
  • Manual interventions

Week 15-16:

  • Automated responses
  • Trust scoring service
  • Autonomy level enforcement

Deliverable: Full runtime control capability

Phase 5: Optimization (Weeks 17-20)

Week 17-18:

  • Anomaly detection models
  • Trust Cascade implementation
  • Cost optimization routing

Week 19-20:

  • Advanced analytics
  • Self-service onboarding
  • Full integration suite

Deliverable: Production-ready agent governance platform

Key Decisions

Decisions you’ll need to make during implementation:

Build vs. Buy:

  • Control plane core: Build (competitive differentiation)
  • Observability storage: Buy (commodity infrastructure)
  • Policy engine: Consider OPA (open source, mature)
  • Adapters: Build (platform-specific)

Deployment:

  • Where does control plane run? (Own cloud, multi-cloud, vendor-hosted)
  • Latency requirements? (Real-time vs. near-real-time)
  • Data residency requirements? (Regional deployment)

Organizational:

  • Who owns the platform? (AI CoE, Platform team, Security)
  • Who defines policies? (Federated model recommended)
  • Who operates it? (SRE, dedicated team)

The Bottom Line

This reference architecture provides a blueprint, not a prescription. Your implementation will differ based on:

  • Which platforms you use
  • Your existing infrastructure
  • Your risk appetite
  • Your team capabilities

The principles remain constant:

  1. Know what agents exist (Registry)
  2. See what they do (Observability)
  3. Define what they should do (Policy)
  4. Enforce it (Control)

Start small. Build incrementally. Optimize continuously.

The Watchtower isn’t a destination. It’s a capability that grows with your agent deployment. Build the foundation now, and you’ll be ready for whatever comes next.


This concludes The Agent Watchtower series. For implementation support, contact the Rotascale team.
