Monitoring & Observability

Comprehensive guide to monitoring the NorthBuilt RAG System.

Overview

The system uses AWS CloudWatch for centralized monitoring, logging, and alerting across all components.

┌─────────────────────────────────────────────────────────────────┐
│                     CloudWatch Dashboard                         │
│  - API Gateway Metrics                                          │
│  - Lambda Performance                                           │
│  - Bedrock Usage                                                │
│  - Cost Tracking                                                │
└─────────────────────────────────────────────────────────────────┘
         │                │                │               │
    ┌────▼────┐      ┌────▼────┐     ┌────▼────┐    ┌────▼────┐
    │  Logs   │      │ Metrics │     │ Alarms  │    │ Insights│
    │ (7-day) │      │(Custom) │     │  (SNS)  │    │ (Query) │
    └─────────┘      └─────────┘     └─────────┘    └─────────┘

Key Metrics

API Gateway Metrics

| Metric | Description | Good | Warning | Critical |
|--------|-------------|------|---------|----------|
| 4XXError | Client errors | <5% | 5-10% | >10% |
| 5XXError | Server errors | <0.1% | 0.1-1% | >1% |
| Latency | p95 response time | <2s | 2-5s | >5s |
| Count | Total requests | N/A | N/A | >1000/min |

CloudWatch Query:

# Get error rate for last hour
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApiGateway \
  --metric-name 4XXError \
  --dimensions Name=ApiName,Value=nb-rag-sys-api \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Sum
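
The thresholds above are percentages, but get-metric-statistics returns raw sums. Metric math can compute the rate directly; a minimal boto3 sketch (assuming the same ApiName dimension as above):

from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client('cloudwatch')

now = datetime.now(timezone.utc)
response = cloudwatch.get_metric_data(
    MetricDataQueries=[
        # Metric math: 4xx errors as a percentage of total requests
        {'Id': 'error_rate', 'Expression': '100 * errors / requests',
         'Label': '4xx Error Rate (%)'},
        {'Id': 'errors', 'ReturnData': False, 'MetricStat': {
            'Metric': {'Namespace': 'AWS/ApiGateway', 'MetricName': '4XXError',
                       'Dimensions': [{'Name': 'ApiName', 'Value': 'nb-rag-sys-api'}]},
            'Period': 300, 'Stat': 'Sum'}},
        {'Id': 'requests', 'ReturnData': False, 'MetricStat': {
            'Metric': {'Namespace': 'AWS/ApiGateway', 'MetricName': 'Count',
                       'Dimensions': [{'Name': 'ApiName', 'Value': 'nb-rag-sys-api'}]},
            'Period': 300, 'Stat': 'Sum'}},
    ],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
)

# Look the result up by query id rather than relying on list order
result = next(r for r in response['MetricDataResults'] if r['Id'] == 'error_rate')
for ts, value in zip(result['Timestamps'], result['Values']):
    print(ts.isoformat(), f'{value:.2f}%')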

Lambda Metrics

| Metric | Function | Good | Warning | Critical |
|--------|----------|------|---------|----------|
| Duration | Chat | <3s | 3-5s | >5s |
| Duration | Classification | <2s | 2-5s | >5s |
| Duration | Webhooks | <5s | 5-10s | >10s |
| Errors | All | <0.1% | 0.1-1% | >1% |
| Throttles | All | 0 | 1-10 | >10 |
| ConcurrentExecutions | Chat | <5 | 5-8 | >8 |

CloudWatch Query:

# Get average duration for Chat Lambda
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Duration \
  --dimensions Name=FunctionName,Value=nb-rag-sys-chat \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Average

Bedrock Metrics

| Metric | Description | Good | Warning | Critical |
|--------|-------------|------|---------|----------|
| Invocations | Model calls | N/A | N/A | >1000/hour |
| InvocationLatency | Response time | <2s | 2-5s | >5s |
| InvocationClientErrors | 4xx errors | 0 | 1-10 | >10 |
| InvocationServerErrors | 5xx errors | 0 | 1-5 | >5 |

CloudWatch Query:

# Get Bedrock invocations
aws cloudwatch get-metric-statistics \
  --namespace AWS/Bedrock \
  --metric-name Invocations \
  --dimensions Name=ModelId,Value=us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 \
  --statistics Sum

Custom Application Metrics

The RAG system emits custom CloudWatch metrics via the RAGMetrics class in lambda/shared/utils/metrics.py.

Namespace: RAG/Retrieval

Metrics Emitted:

| Metric | Unit | Description |
|--------|------|-------------|
| RetrievalLatencyMs | Milliseconds | Time for Bedrock KB vector search |
| CandidatesRetrieved | Count | Raw results from vector search |
| ResultsAfterFilter | Count | Results after post-filtering |
| FilterEffectiveness | None (0.0-1.0) | Ratio of filtered results |
| Errors | Count | Error counts by type |
| LLMGenerationLatencyMs | Milliseconds | Time for Bedrock LLM response |
| LLMInputTokens | Count | Input tokens used |
| LLMOutputTokens | Count | Output tokens generated |

Dimensions:

  • HasFilter - Whether client filter was applied (true/false)
  • RerankingEnabled - Whether reranking was enabled (true/false)
  • ErrorType - Type of error (e.g., RetrievalError, LLMError)
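
These helpers presumably wrap cloudwatch:PutMetricData; a minimal sketch of the emission pattern, assuming the real class in lambda/shared/utils/metrics.py adds batching and error handling on top (the helper name here is hypothetical):

import boto3

cloudwatch = boto3.client('cloudwatch')

def emit_retrieval_latency(latency_ms, has_filter, reranking_enabled):
    # Hypothetical helper: one datapoint in the RAG/Retrieval namespace,
    # dimensioned the same way as the table above
    cloudwatch.put_metric_data(
        Namespace='RAG/Retrieval',
        MetricData=[{
            'MetricName': 'RetrievalLatencyMs',
            'Value': latency_ms,
            'Unit': 'Milliseconds',
            'Dimensions': [
                {'Name': 'HasFilter', 'Value': str(has_filter).lower()},
                {'Name': 'RerankingEnabled', 'Value': str(reranking_enabled).lower()},
            ],
        }],
    )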

Usage in Lambda:

from shared.utils.metrics import RAGMetrics

metrics = RAGMetrics(cloudwatch_client)

# Record retrieval metrics
metrics.record_retrieval(
    latency_ms=150.5,
    candidates_retrieved=15,
    results_after_filter=5,
    has_client_filter=True,
    reranking_enabled=False
)

# Record LLM generation metrics
metrics.record_llm_generation(
    latency_ms=2500,
    input_tokens=500,
    output_tokens=200
)

# Record errors
metrics.record_error('RetrievalError')

Query Custom Metrics:

# Get retrieval latency (last hour)
aws cloudwatch get-metric-statistics \
  --namespace RAG/Retrieval \
  --metric-name RetrievalLatencyMs \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Average,Maximum

# Get filter effectiveness
aws cloudwatch get-metric-statistics \
  --namespace RAG/Retrieval \
  --metric-name FilterEffectiveness \
  --dimensions Name=HasFilter,Value=true \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Average
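
Note that get-metric-statistics returns data only when the requested dimensions exactly match those the metric was emitted with, so an empty result often just means a dimension mismatch. Listing the metrics in the namespace shows which dimension combinations actually exist; a boto3 sketch:

import boto3

cloudwatch = boto3.client('cloudwatch')

# Discover every metric (and its dimension sets) in the custom namespace
paginator = cloudwatch.get_paginator('list_metrics')
for page in paginator.paginate(Namespace='RAG/Retrieval'):
    for metric in page['Metrics']:
        dims = {d['Name']: d['Value'] for d in metric['Dimensions']}
        print(metric['MetricName'], dims)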

Ingestion Metrics (RAG/Ingestion Namespace)

The ingestion pipeline emits custom CloudWatch metrics via multiple classes in lambda/shared/utils/metrics.py.

Namespace: RAG/Ingestion

IngestionMetrics

Tracks webhook events, document ingestion, and sync job completion.

| Metric | Unit | Description |
|--------|------|-------------|
| WebhooksReceived | Count | Webhook events received by Source and Success |
| WebhookProcessingLatencyMs | Milliseconds | Time to process a webhook event |
| DocumentsIngested | Count | Documents saved to S3 by Source, Category, SourceType |
| SyncJobsCompleted | Count | Sync operations completed by Source and Completed |
| SyncDurationSeconds | Seconds | Total sync job duration |
| ItemsSynced | Count | New items added from sync jobs |
| ItemsSkipped | Count | Already existing items skipped |
| ItemsFailed | Count | Items that failed to sync |
| SyncAPICallsTotal | Count | External API calls made during sync |
| SyncS3SavesTotal | Count | S3 save operations during sync |
| SyncProcessingRate | Count/Second | Items processed per second |
| IngestionErrors | Count | Errors by Source, ErrorType, SourceType |

Dimensions:

  • Source - Source system (fathom, helpscout)
  • Success - Whether operation succeeded (true/false)
  • Category - Document category (meeting-transcript, customer-conversation, issue)
  • SourceType - Ingestion trigger (webhook or polling)
  • ErrorType - Type of error (ValidationError, S3Error, APIError, etc.)
  • Completed - Whether sync ran to completion (true/false)

Usage in Lambda:

from shared.utils.metrics import IngestionMetrics

metrics = IngestionMetrics(cloudwatch_client)

# Record webhook received (success)
metrics.record_webhook_received(
    source='fathom',
    processed_successfully=True,
    latency_ms=250.5
)

# Record webhook received (failure with error type)
metrics.record_webhook_received(
    source='fathom',
    processed_successfully=False,
    latency_ms=100.0,
    error_type='ValidationError'
)

# Record document ingested
metrics.record_document_ingested(
    source='fathom',
    category='meeting-transcript',
    source_type='webhook'
)

# Record sync completion with worker stats
metrics.record_sync_completed(
    source='fathom',
    completed=True,
    items_synced=25,
    items_skipped=100,
    items_failed=2,
    duration_seconds=45.5,
    api_calls=150,
    s3_saves=25
)

ClassificationMetrics

Tracks client/project classification operations.

| Metric | Unit | Description |
|--------|------|-------------|
| ClassificationsTotal | Count | Classification attempts by Source and Success |
| ClassificationLatencyMs | Milliseconds | DynamoDB lookup time |
| ClassificationMatched | Count | Whether a match was found (1 or 0) |
| ClassificationErrors | Count | Errors by Source and ErrorType |

Dimensions:

  • Source - Source system (fathom, helpscout)
  • Success - Whether classification completed (true/false)
  • ErrorType - Type of error (ConfigurationError, ValidationError, StrategyError)

Usage in Lambda:

from shared.utils.metrics import ClassificationMetrics

metrics = ClassificationMetrics(cloudwatch_client)

# Record successful classification
metrics.record_classification(
    source='fathom',
    completed=True,
    latency_ms=50.0,
    match_found=True
)

# Record failed classification with error type
metrics.record_classification(
    source='fathom',
    completed=False,
    latency_ms=25.0,
    error_type='StrategyError'
)

OrchestratorMetrics

Tracks sync handler invocations and worker Lambda invocation status.

| Metric | Unit | Description |
|--------|------|-------------|
| SyncHandlerInvocations | Count | Handler invocations by Source and Success |
| SyncHandlerErrors | Count | Errors by Source and ErrorType |

Dimensions:

  • Source - Source system (fathom, helpscout)
  • Success - Whether worker was invoked successfully (true/false)
  • ErrorType - Type of error (ConfigurationError, LambdaInvokeError, HandlerError)

Usage in Lambda:

from shared.utils.metrics import OrchestratorMetrics

metrics = OrchestratorMetrics(cloudwatch_client)

# Record successful handler invocation
metrics.record_handler_invocation(
    source='fathom',
    invoked_successfully=True
)

# Record failed handler invocation
metrics.record_handler_invocation(
    source='fathom',
    invoked_successfully=False,
    error_type='LambdaInvokeError'
)

KBIngestionMetrics

Tracks Bedrock Knowledge Base ingestion jobs (re-indexing operations).

| Metric | Unit | Description |
|--------|------|-------------|
| IngestionJobStarted | Count | KB ingestion job started successfully |
| IngestionJobAlreadyRunning | Count | Job skipped because one is already running |
| IngestionJobErrors | Count | Errors by ErrorType |

Dimensions:

  • ErrorType - Type of error (BedrockAPIError, ConfigurationError, HandlerError)

Usage in Lambda:

from shared.utils.metrics import KBIngestionMetrics

metrics = KBIngestionMetrics(cloudwatch_client)

# Record successful job start
metrics.record_job_started()

# Record job already running (expected, not an error)
metrics.record_job_already_running()

# Record error
metrics.record_error('BedrockAPIError')

Query Ingestion Metrics:

# Count successful webhooks by source (pair with a Success=false query for a rate)
aws cloudwatch get-metric-statistics \
  --namespace RAG/Ingestion \
  --metric-name WebhooksReceived \
  --dimensions Name=Source,Value=fathom Name=Success,Value=true \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Sum

# Get documents ingested by source
aws cloudwatch get-metric-statistics \
  --namespace RAG/Ingestion \
  --metric-name DocumentsIngested \
  --dimensions Name=Source,Value=fathom Name=SourceType,Value=webhook \
  --start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 \
  --statistics Sum

# Get classification match rate
aws cloudwatch get-metric-statistics \
  --namespace RAG/Ingestion \
  --metric-name ClassificationMatched \
  --dimensions Name=Source,Value=fathom \
  --start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 \
  --statistics Sum,Average

# Get sync worker performance
aws cloudwatch get-metric-statistics \
  --namespace RAG/Ingestion \
  --metric-name SyncProcessingRate \
  --dimensions Name=Source,Value=fathom \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 86400 \
  --statistics Average,Maximum

CloudWatch Dashboards

Main Dashboard

Create Dashboard:

aws cloudwatch put-dashboard --dashboard-name nb-rag-sys-main --dashboard-body file://dashboard.json

Dashboard JSON (dashboard.json):

{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "title": "API Gateway Requests",
        "region": "us-east-1",
        "metrics": [
          ["AWS/ApiGateway", "Count", {"stat": "Sum", "label": "Total Requests"}],
          [".", "4XXError", {"stat": "Sum", "label": "4xx Errors"}],
          [".", "5XXError", {"stat": "Sum", "label": "5xx Errors"}]
        ],
        "period": 300,
        "yAxis": {"left": {"min": 0}}
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "Lambda Duration (Chat)",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Lambda", "Duration", {"stat": "Average", "label": "Average"}],
          ["...", {"stat": "p95", "label": "p95"}],
          ["...", {"stat": "Maximum", "label": "Maximum"}]
        ],
        "period": 300,
        "yAxis": {"left": {"min": 0, "max": 10000}}
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "Lambda Errors",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Lambda", "Errors", {"stat": "Sum", "label": "Chat"}, {"dimensions": {"FunctionName": "nb-rag-sys-chat"}}],
          ["...", {"dimensions": {"FunctionName": "nb-rag-sys"}}],
          ["...", {"dimensions": {"FunctionName": "nb-rag-sys-fathom-webhook"}}]
        ],
        "period": 300,
        "yAxis": {"left": {"min": 0}}
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "Bedrock Invocations",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Bedrock", "Invocations", {"stat": "Sum", "label": "Claude Sonnet 4.5"}],
          [".", "ModelInvocationLatency", {"stat": "Average", "label": "Latency (ms)"}]
        ],
        "period": 3600,
        "yAxis": {"left": {"min": 0}}
      }
    },
    {
      "type": "log",
      "properties": {
        "title": "Recent Errors",
        "region": "us-east-1",
        "query": "SOURCE '/aws/lambda/nb-rag-sys-chat'\n| fields @timestamp, @message\n| filter @message like /ERROR/\n| sort @timestamp desc\n| limit 20"
      }
    }
  ]
}
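
put-dashboard accepts a body even when some widgets are malformed and reports problems in DashboardValidationMessages, so it is worth checking those when iterating on widget JSON; a boto3 sketch:

import boto3

cloudwatch = boto3.client('cloudwatch')

with open('dashboard.json') as f:
    body = f.read()

result = cloudwatch.put_dashboard(
    DashboardName='nb-rag-sys-main',
    DashboardBody=body,
)

# An empty list means the dashboard body validated cleanly
for msg in result.get('DashboardValidationMessages', []):
    print(msg['DataPath'], msg['Message'])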

Cost Dashboard

AWS/Billing metrics are published only in us-east-1 and only after the "Receive Billing Alerts" preference is enabled in the account's billing settings.

Track Costs by Service:

{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "title": "Estimated Monthly Cost",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Billing", "EstimatedCharges", {"stat": "Maximum"}]
        ],
        "period": 21600,
        "yAxis": {"left": {"min": 0}}
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "Cost by Service",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Billing", "EstimatedCharges", {"stat": "Maximum"}, {"dimensions": {"ServiceName": "AWS Lambda"}}],
          ["...", {"dimensions": {"ServiceName": "Amazon Bedrock"}}],
          ["...", {"dimensions": {"ServiceName": "Amazon API Gateway"}}],
          ["...", {"dimensions": {"ServiceName": "Amazon S3"}}],
          ["...", {"dimensions": {"ServiceName": "Amazon DynamoDB"}}]
        ],
        "period": 21600,
        "yAxis": {"left": {"min": 0}}
      }
    }
  ]
}
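
Because AWS/Billing metrics refresh only every few hours, the Cost Explorer API is useful for ad-hoc per-service breakdowns; a sketch (assumes Cost Explorer is enabled for the account; the date range is illustrative):

import boto3

# Cost Explorer is a global API served from us-east-1
ce = boto3.client('ce', region_name='us-east-1')

response = ce.get_cost_and_usage(
    TimePeriod={'Start': '2026-01-01', 'End': '2026-01-16'},
    Granularity='MONTHLY',
    Metrics=['UnblendedCost'],
    GroupBy=[{'Type': 'DIMENSION', 'Key': 'SERVICE'}],
)

for group in response['ResultsByTime'][0]['Groups']:
    service = group['Keys'][0]
    amount = float(group['Metrics']['UnblendedCost']['Amount'])
    print(f'{service}: ${amount:.2f}')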

CloudWatch Alarms

Critical Alarms

High Error Rate

API Gateway 5xx Errors:

resource "aws_cloudwatch_metric_alarm" "api_5xx_errors" {
  alarm_name          = "nb-rag-sys-api-5xx-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "5XXError"
  namespace           = "AWS/ApiGateway"
  period              = "300"
  statistic           = "Sum"
  threshold           = "5"
  alarm_description   = "Alert when API Gateway returns more than 5 5xx errors in 5 minutes"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ApiName = "nb-rag-sys-api"
  }
}

Lambda Errors:

resource "aws_cloudwatch_metric_alarm" "lambda_errors" {
  alarm_name          = "nb-rag-sys-lambda-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "Errors"
  namespace           = "AWS/Lambda"
  period              = "300"
  statistic           = "Sum"
  threshold           = "10"
  alarm_description   = "Alert when a Lambda function has more than 10 errors in 5 minutes"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    FunctionName = "nb-rag-sys-chat"
  }
}

High Latency

API Gateway Latency:

resource "aws_cloudwatch_metric_alarm" "api_latency" {
  alarm_name          = "nb-rag-sys-api-latency"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "Latency"
  namespace           = "AWS/ApiGateway"
  period              = "300"
  statistic           = "Average"
  threshold           = "5000"  # 5 seconds
  alarm_description   = "Alert when API Gateway latency exceeds 5 seconds"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ApiName = "nb-rag-sys-api"
  }
}

Lambda Duration:

resource "aws_cloudwatch_metric_alarm" "lambda_duration" {
  alarm_name          = "nb-rag-sys-lambda-duration"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "Duration"
  namespace           = "AWS/Lambda"
  period              = "300"
  statistic           = "Average"
  threshold           = "10000"  # 10 seconds
  alarm_description   = "Alert when Lambda duration exceeds 10 seconds"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    FunctionName = "nb-rag-sys-chat"
  }
}

Lambda Throttling

resource "aws_cloudwatch_metric_alarm" "lambda_throttles" {
  alarm_name          = "nb-rag-sys-lambda-throttles"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "1"
  metric_name         = "Throttles"
  namespace           = "AWS/Lambda"
  period              = "300"
  statistic           = "Sum"
  threshold           = "10"
  alarm_description   = "Alert when Lambda throttles exceed 10 in 5 minutes"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    FunctionName = "nb-rag-sys-chat"
  }
}

Warning Alarms

High Cost

resource "aws_cloudwatch_metric_alarm" "high_cost" {
  alarm_name          = "nb-rag-sys-high-cost"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "1"
  metric_name         = "EstimatedCharges"
  namespace           = "AWS/Billing"
  period              = "21600"  # 6 hours
  statistic           = "Maximum"
  threshold           = "200"  # $200/month
  alarm_description   = "Alert when monthly cost exceeds $200"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    Currency = "USD"
  }
}

Low Traffic (Anomaly Detection)

resource "aws_cloudwatch_metric_alarm" "low_traffic" {
  alarm_name          = "nb-rag-sys-low-traffic"
  comparison_operator = "LessThanLowerThreshold"
  evaluation_periods  = "2"
  threshold_metric_id = "ad1"
  alarm_description   = "Alert when traffic drops below expected baseline"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  metric_query {
    id          = "m1"
    return_data = true

    metric {
      metric_name = "Count"
      namespace   = "AWS/ApiGateway"
      period      = "300"
      stat        = "Sum"

      dimensions = {
        ApiName = "nb-rag-sys-api"
      }
    }
  }

  metric_query {
    id          = "ad1"
    expression  = "ANOMALY_DETECTION_BAND(m1, 2)"
    label       = "Traffic (expected)"
    return_data = true
  }
}

SNS Topic for Alerts

resource "aws_sns_topic" "alerts" {
  name = "nb-rag-sys-alerts"
}

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = "alerts@yourcompany.com"
}

resource "aws_sns_topic_subscription" "slack" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "https"
  endpoint  = "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
}
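
Note that Slack incoming webhooks do not acknowledge SNS subscription confirmations, so the https subscription above typically stays pending. A more reliable pattern is a small Lambda subscribed to the topic that reformats the alarm and posts it; a minimal sketch (the SLACK_WEBHOOK_URL environment variable and the function's SNS wiring are assumptions):

import json
import os
import urllib.request

# Assumption: the Slack webhook URL is supplied via an environment variable
SLACK_WEBHOOK_URL = os.environ['SLACK_WEBHOOK_URL']

def handler(event, context):
    # SNS delivers one alarm notification per record
    for record in event['Records']:
        alarm = json.loads(record['Sns']['Message'])
        text = (
            f":rotating_light: *{alarm.get('AlarmName', 'unknown')}* "
            f"is {alarm.get('NewStateValue', 'unknown')}\n"
            f"{alarm.get('NewStateReason', '')}"
        )
        req = urllib.request.Request(
            SLACK_WEBHOOK_URL,
            data=json.dumps({'text': text}).encode('utf-8'),
            headers={'Content-Type': 'application/json'},
        )
        urllib.request.urlopen(req)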

CloudWatch Logs

Log Groups

| Lambda Function | Log Group | Retention |
|-----------------|-----------|-----------|
| Chat | /aws/lambda/nb-rag-sys-chat | 7 days |
| Classification | /aws/lambda/nb-rag-sys | 7 days |
| Ingest | /aws/lambda/nb-rag-sys-ingest | 7 days |
| Fathom Webhook | /aws/lambda/nb-rag-sys-fathom-webhook | 7 days |
| HelpScout Webhook | /aws/lambda/nb-rag-sys-helpscout-webhook | 7 days |

Log Queries

Find All Errors

fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 100

Find Slow Requests

fields @timestamp, @message, @duration
| filter @duration > 5000
| sort @duration desc
| limit 20

Count Errors by Function

fields @log as log_group
| filter @message like /ERROR/
| stats count() as error_count by log_group
| sort error_count desc

Track User Queries

fields @timestamp, @message
| filter @message like /Query:/
| parse @message 'Query: *' as query
| display @timestamp, query
| sort @timestamp desc
| limit 50

Bedrock Token Usage

fields @timestamp, @message
| filter @message like /Bedrock invocation/
| parse @message 'input_tokens=* output_tokens=*' as input, output
| stats sum(input) as total_input, sum(output) as total_output by bin(5m)

Structured Logging

Lambda Logger Setup:

import json
import logging
import time
from datetime import datetime

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def log_structured(level, message, **kwargs):
    log_entry = {
        'timestamp': datetime.utcnow().isoformat(),
        'level': level,
        'message': message,
        **kwargs
    }
    logger.log(getattr(logging, level), json.dumps(log_entry))

def handler(event, context):
    start = time.time()
    request_id = context.aws_request_id
    user_id = event['requestContext']['authorizer']['claims']['sub']

    log_structured('INFO', 'Chat request received',
                   request_id=request_id,
                   user_id=user_id)

    try:
        result = process_request(event)
        log_structured('INFO', 'Chat request completed',
                       request_id=request_id,
                       user_id=user_id,
                       duration_ms=int((time.time() - start) * 1000))
        return result
    except Exception as e:
        log_structured('ERROR', 'Chat request failed',
                       request_id=request_id,
                       user_id=user_id,
                       error=str(e))
        raise

Query Structured Logs:

fields @timestamp, @message
| parse @message '{"timestamp": "*", "level": "*", "message": "*", "request_id": "*", "user_id": "*"}'
  as timestamp, level, message, request_id, user_id
| filter level = "ERROR"
| stats count() by user_id

Distributed Tracing (Optional)

AWS X-Ray Integration

Enable X-Ray in Lambda:

resource "aws_lambda_function" "chat" {
  tracing_config {
    mode = "Active"
  }
}

Instrument Lambda Code:

import json

import boto3
from aws_xray_sdk.core import patch_all, xray_recorder

# Patch boto3/botocore so AWS SDK calls appear as X-Ray subsegments
patch_all()

bedrock = boto3.client('bedrock-runtime')
bedrock_agent = boto3.client('bedrock-agent-runtime')

def handler(event, context):
    # Subsegment for Bedrock Knowledge Base retrieval
    with xray_recorder.capture('bedrock_kb_retrieve'):
        retrieval_result = bedrock_agent.retrieve(
            knowledgeBaseId=KNOWLEDGE_BASE_ID,
            retrievalQuery={'text': query}
        )

    # Subsegment for Bedrock LLM inference
    with xray_recorder.capture('bedrock_inference'):
        response = bedrock.invoke_model(
            modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
            body=json.dumps(prompt)
        )

    return response

View Traces:

# Get trace IDs for slow requests
aws xray get-trace-summaries \
  --start-time $(date -u -d '1 hour ago' +%s) \
  --end-time $(date -u +%s) \
  --filter-expression 'duration > 5'

Service Map:

  • Visualize request flow: API Gateway → Lambda → Bedrock
  • Identify bottlenecks
  • Track downstream dependencies

Performance Monitoring

Lambda Performance Tuning

Memory vs Duration Tradeoff:

# Test different memory settings
for mem in 512 1024 1536 2048; do
  aws lambda update-function-configuration \
    --function-name nb-rag-sys-chat \
    --memory-size $mem

  # Wait for update
  sleep 10

  # Invoke and measure
  time aws lambda invoke --function-name nb-rag-sys-chat /dev/null
done

Cold Start Monitoring:

fields @timestamp, @message, @initDuration
| filter @initDuration > 1000
| stats count() as cold_starts, avg(@initDuration) as avg_cold_start_ms
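
Cold-start time is dominated by imports and client construction, so creating SDK clients at module scope pays that cost once per execution environment instead of on every invocation; a minimal sketch (the bucket/key event fields are illustrative):

import json

import boto3

# Module scope: runs once during INIT (the cold start) and is
# reused by every warm invocation of this execution environment
s3 = boto3.client('s3')

def handler(event, context):
    # Per-request work only; no client construction here
    obj = s3.get_object(Bucket=event['bucket'], Key=event['key'])
    return json.loads(obj['Body'].read())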

Provisioned Concurrency Analysis:

fields @timestamp, @initDuration
| filter ispresent(@initDuration)
| stats count() as cold_starts, sum(@initDuration) as total_init_time_ms

Bedrock Performance

Track Token Usage:

from shared.utils.metrics import RAGMetrics

metrics = RAGMetrics(cloudwatch_client)

# Record LLM generation metrics (includes token counts)
metrics.record_llm_generation(
    latency_ms=2500,
    input_tokens=input_tokens,
    output_tokens=output_tokens
)

Latency Breakdown:

fields @timestamp, @message
| filter @message like /Bedrock/
| parse @message 'embedding_time=*ms retrieval_time=*ms inference_time=*ms'
  as embed_ms, retrieval_ms, inference_ms
| stats avg(embed_ms) as avg_embed, avg(retrieval_ms) as avg_retrieval, avg(inference_ms) as avg_inference

Health Checks

API Health Endpoint

Create Health Check Lambda:

import json
from datetime import datetime

import boto3

bedrock = boto3.client('bedrock')
bedrock_agent = boto3.client('bedrock-agent')
dynamodb = boto3.client('dynamodb')

def health_check_handler(event, context):
    checks = {}

    # Check Bedrock connectivity
    try:
        bedrock.list_foundation_models()
        checks['bedrock'] = 'ok'
    except Exception as e:
        checks['bedrock'] = f'error: {str(e)}'

    # Check Knowledge Base connectivity
    try:
        bedrock_agent.get_knowledge_base(knowledgeBaseId='[kb-id]')
        checks['knowledge_base'] = 'ok'
    except Exception as e:
        checks['knowledge_base'] = f'error: {str(e)}'

    # Check DynamoDB connectivity
    try:
        dynamodb.describe_table(TableName='nb-rag-sys')
        checks['dynamodb'] = 'ok'
    except Exception as e:
        checks['dynamodb'] = f'error: {str(e)}'

    all_healthy = all(v == 'ok' for v in checks.values())

    return {
        'statusCode': 200 if all_healthy else 503,
        'body': json.dumps({
            'status': 'healthy' if all_healthy else 'degraded',
            'checks': checks,
            'timestamp': datetime.utcnow().isoformat()
        })
    }

Route53 Health Check (optional):

resource "aws_route53_health_check" "api" {
  fqdn              = "api.yourdomain.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = "3"
  request_interval  = "30"

  tags = {
    Name = "nb-rag-sys-api-health"
  }
}

Synthetic Monitoring

CloudWatch Synthetics Canary:

resource "aws_synthetics_canary" "api" {
  name                 = "nb-rag-sys-api-canary"
  artifact_s3_location = "s3://${aws_s3_bucket.canary_artifacts.bucket}"
  execution_role_arn   = aws_iam_role.canary.arn
  handler              = "index.handler"
  zip_file             = "canary.zip"
  runtime_version      = "syn-python-selenium-1.0"

  schedule {
    expression = "rate(5 minutes)"
  }
}

Canary Script (canary.py):

from aws_synthetics.selenium import synthetics_webdriver as webdriver
from aws_synthetics.common import synthetics_logger as logger

def main():
    driver = webdriver.Chrome()
    driver.get("https://yourdomain.com")

    # Wait for page load
    driver.implicitly_wait(10)

    # Check for key elements
    assert "NorthBuilt" in driver.title
    assert driver.find_element_by_id("chat-input")

    logger.info("Page loaded successfully")
    driver.quit()

def handler(event, context):
    return main()

Last updated: 2026-01-16