Monitoring & Observability

Comprehensive guide to monitoring the NorthBuilt RAG System.

Overview

The system uses AWS CloudWatch for centralized monitoring, logging, and alerting across all components.

┌─────────────────────────────────────────────────────────────────┐
│                     CloudWatch Dashboard                         │
│  - API Gateway Metrics                                          │
│  - Lambda Performance                                           │
│  - Bedrock Usage                                                │
│  - Cost Tracking                                                │
└─────────────────────────────────────────────────────────────────┘
         │                │                │               │
    ┌────▼────┐      ┌────▼────┐     ┌────▼────┐    ┌────▼────┐
    │  Logs   │      │ Metrics │     │ Alarms  │    │ Insights│
    │ (7-day) │      │(Custom) │     │  (SNS)  │    │ (Query) │
    └─────────┘      └─────────┘     └─────────┘    └─────────┘

Key Metrics

API Gateway Metrics

| Metric   | Description       | Good  | Warning | Critical  |
|----------|-------------------|-------|---------|-----------|
| 4XXError | Client errors     | <5%   | 5-10%   | >10%      |
| 5XXError | Server errors     | <0.1% | 0.1-1%  | >1%       |
| Latency  | p95 response time | <2s   | 2-5s    | >5s       |
| Count    | Total requests    | N/A   | N/A     | >1000/min |

CloudWatch Query:

# Get error rate for last hour
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApiGateway \
  --metric-name 4XXError \
  --dimensions Name=ApiName,Value=nb-rag-sys-api \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Sum

Lambda Metrics

| Metric               | Function | Good  | Warning | Critical |
|----------------------|----------|-------|---------|----------|
| Duration             | Chat     | <3s   | 3-5s    | >5s      |
| Duration             | Classify | <2s   | 2-5s    | >5s      |
| Duration             | Webhooks | <5s   | 5-10s   | >10s     |
| Errors               | All      | <0.1% | 0.1-1%  | >1%      |
| Throttles            | All      | 0     | 1-10    | >10      |
| ConcurrentExecutions | Chat     | <5    | 5-8     | >8       |

CloudWatch Query:

# Get average duration for Chat Lambda
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Duration \
  --dimensions Name=FunctionName,Value=nb-rag-sys-chat \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Average

Bedrock Metrics

| Metric                 | Description   | Good | Warning | Critical   |
|------------------------|---------------|------|---------|------------|
| Invocations            | Model calls   | N/A  | N/A     | >1000/hour |
| InvocationLatency      | Response time | <2s  | 2-5s    | >5s        |
| InvocationClientErrors | 4xx errors    | 0    | 1-10    | >10        |
| InvocationServerErrors | 5xx errors    | 0    | 1-5     | >5         |

CloudWatch Query:

# Get Bedrock invocations
aws cloudwatch get-metric-statistics \
  --namespace AWS/Bedrock \
  --metric-name Invocations \
  --dimensions Name=ModelId,Value=us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 \
  --statistics Sum

Custom Application Metrics

Publishing Custom Metrics (in Lambda):

import time

import boto3
from datetime import datetime

cloudwatch = boto3.client('cloudwatch')

def publish_metric(metric_name, value, unit='None', dimensions=None):
    cloudwatch.put_metric_data(
        Namespace='NorthBuilt/RAG',
        MetricData=[
            {
                'MetricName': metric_name,
                'Value': value,
                'Unit': unit,
                'Timestamp': datetime.utcnow(),
                'Dimensions': dimensions or []
            }
        ]
    )

# Example usage in Chat Lambda
def handler(event, context):
    start_time = time.time()

    # Process request...
    result = process_chat_request(event)

    # Publish metrics
    duration = time.time() - start_time
    publish_metric('ChatRequestDuration', duration, 'Seconds')
    publish_metric('ChatRequestSuccess', 1 if result else 0, 'Count')
    publish_metric('RetrievedDocuments', len(result.get('sources', [])), 'Count')

    return result

Custom Metrics to Track:

  • ChatRequestDuration - End-to-end chat request time
  • QueryRetrievalTime - Time to retrieve documents
  • BedrockInferenceTime - Time for Bedrock response
  • RetrievedDocuments - Number of documents retrieved
  • CacheHitRate - Query cache hit percentage (if implemented)
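
Publishing each metric with its own `put_metric_data` call adds latency and API cost; the call accepts a batch of metric entries in one request. A small helper can keep the batch shape in one place (a sketch; `build_metric_data` is a hypothetical name):

```python
from datetime import datetime, timezone

def build_metric_data(samples):
    """Convert {name: (value, unit)} samples into a PutMetricData payload."""
    now = datetime.now(timezone.utc)
    return [
        {
            "MetricName": name,
            "Value": value,
            "Unit": unit,
            "Timestamp": now,
        }
        for name, (value, unit) in samples.items()
    ]

# Usage inside a Lambda, reusing the module-level client:
# cloudwatch.put_metric_data(
#     Namespace="NorthBuilt/RAG",
#     MetricData=build_metric_data({
#         "ChatRequestDuration": (1.8, "Seconds"),
#         "RetrievedDocuments": (5, "Count"),
#     }),
# )
```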

CloudWatch Dashboards

Main Dashboard

Create Dashboard:

aws cloudwatch put-dashboard --dashboard-name nb-rag-sys-main --dashboard-body file://dashboard.json

Dashboard JSON (dashboard.json):

{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "title": "API Gateway Requests",
        "region": "us-east-1",
        "metrics": [
          ["AWS/ApiGateway", "Count", {"stat": "Sum", "label": "Total Requests"}],
          [".", "4XXError", {"stat": "Sum", "label": "4xx Errors"}],
          [".", "5XXError", {"stat": "Sum", "label": "5xx Errors"}]
        ],
        "period": 300,
        "yAxis": {"left": {"min": 0}}
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "Lambda Duration (Chat)",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Lambda", "Duration", {"stat": "Average", "label": "Average"}],
          ["...", {"stat": "p95", "label": "p95"}],
          ["...", {"stat": "Maximum", "label": "Maximum"}]
        ],
        "period": 300,
        "yAxis": {"left": {"min": 0, "max": 10000}}
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "Lambda Errors",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Lambda", "Errors", {"stat": "Sum", "label": "Chat"}, {"dimensions": {"FunctionName": "nb-rag-sys-chat"}}],
          ["...", {"dimensions": {"FunctionName": "nb-rag-sys-classify"}}],
          ["...", {"dimensions": {"FunctionName": "nb-rag-sys-fathom-webhook"}}]
        ],
        "period": 300,
        "yAxis": {"left": {"min": 0}}
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "Bedrock Invocations",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Bedrock", "Invocations", {"stat": "Sum", "label": "Claude Sonnet 4.5"}],
          [".", "ModelInvocationLatency", {"stat": "Average", "label": "Latency (ms)"}]
        ],
        "period": 3600,
        "yAxis": {"left": {"min": 0}}
      }
    },
    {
      "type": "log",
      "properties": {
        "title": "Recent Errors",
        "region": "us-east-1",
        "query": "SOURCE '/aws/lambda/nb-rag-sys-chat'\n| fields @timestamp, @message\n| filter @message like /ERROR/\n| sort @timestamp desc\n| limit 20"
      }
    }
  ]
}
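
Before pushing, it can be worth sanity-checking dashboard.json locally, since put-dashboard accepts some malformed bodies and reports problems only as validation messages in its response. A minimal checker (hypothetical helper, not part of any AWS SDK; the accepted widget types here are an assumption):

```python
import json

def validate_dashboard(body):
    """Return a list of problems found in a dashboard body (empty means OK)."""
    problems = []
    widgets = body.get("widgets")
    if not isinstance(widgets, list) or not widgets:
        return ["dashboard body must contain a non-empty 'widgets' list"]
    for i, widget in enumerate(widgets):
        wtype = widget.get("type")
        props = widget.get("properties", {})
        if wtype not in ("metric", "log", "text", "alarm"):
            problems.append(f"widget {i}: unknown type {wtype!r}")
        if wtype == "metric" and "metrics" not in props:
            problems.append(f"widget {i}: metric widget missing 'metrics'")
        if wtype in ("metric", "log") and "region" not in props:
            problems.append(f"widget {i}: missing 'region'")
    return problems

# Usage before `aws cloudwatch put-dashboard`:
# with open("dashboard.json") as f:
#     for problem in validate_dashboard(json.load(f)):
#         print(problem)
```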

Cost Dashboard

Track Costs by Service:

{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "title": "Estimated Monthly Cost",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Billing", "EstimatedCharges", {"stat": "Maximum"}]
        ],
        "period": 21600,
        "yAxis": {"left": {"min": 0}}
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "Cost by Service",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Billing", "EstimatedCharges", {"stat": "Maximum"}, {"dimensions": {"ServiceName": "AWS Lambda"}}],
          ["...", {"dimensions": {"ServiceName": "Amazon Bedrock"}}],
          ["...", {"dimensions": {"ServiceName": "Amazon API Gateway"}}],
          ["...", {"dimensions": {"ServiceName": "Amazon S3"}}],
          ["...", {"dimensions": {"ServiceName": "Amazon DynamoDB"}}]
        ],
        "period": 21600,
        "yAxis": {"left": {"min": 0}}
      }
    }
  ]
}
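
The billing metrics above lag actual usage by several hours, so a direct projection from token counts can be a faster sanity check on the Bedrock share. A sketch using the same assumed Claude Sonnet pricing used elsewhere in this guide ($3 per 1M input tokens, $15 per 1M output tokens; verify current rates):

```python
# Assumed Claude Sonnet pricing in USD per token; adjust to current rates.
INPUT_TOKEN_PRICE = 0.000003    # $3 per 1M input tokens
OUTPUT_TOKEN_PRICE = 0.000015   # $15 per 1M output tokens

def estimate_monthly_bedrock_cost(daily_input_tokens, daily_output_tokens, days=30):
    """Project a monthly Bedrock bill from average daily token usage."""
    daily_cost = (daily_input_tokens * INPUT_TOKEN_PRICE
                  + daily_output_tokens * OUTPUT_TOKEN_PRICE)
    return round(daily_cost * days, 2)
```

For example, 1M input and 200K output tokens per day projects to about $180/month at these rates.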

CloudWatch Alarms

Critical Alarms

High Error Rate

API Gateway 5xx Errors:

resource "aws_cloudwatch_metric_alarm" "api_5xx_errors" {
  alarm_name          = "nb-rag-sys-api-5xx-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "5XXError"
  namespace           = "AWS/ApiGateway"
  period              = "300"
  statistic           = "Sum"
  threshold           = "5"
  alarm_description   = "Alert when API Gateway returns more than 5 5xx errors in 5 minutes"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ApiName = "nb-rag-sys-api"
  }
}

Lambda Errors:

resource "aws_cloudwatch_metric_alarm" "lambda_errors" {
  alarm_name          = "nb-rag-sys-lambda-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "Errors"
  namespace           = "AWS/Lambda"
  period              = "300"
  statistic           = "Sum"
  threshold           = "10"
  alarm_description   = "Alert when Lambda function has more than 10 errors in 5 minutes"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    FunctionName = "nb-rag-sys-chat"
  }
}

High Latency

API Gateway Latency:

resource "aws_cloudwatch_metric_alarm" "api_latency" {
  alarm_name          = "nb-rag-sys-api-latency"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "Latency"
  namespace           = "AWS/ApiGateway"
  period              = "300"
  statistic           = "Average"
  threshold           = "5000"  # 5 seconds
  alarm_description   = "Alert when API Gateway latency exceeds 5 seconds"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ApiName = "nb-rag-sys-api"
  }
}

Lambda Duration:

resource "aws_cloudwatch_metric_alarm" "lambda_duration" {
  alarm_name          = "nb-rag-sys-lambda-duration"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "Duration"
  namespace           = "AWS/Lambda"
  period              = "300"
  statistic           = "Average"
  threshold           = "10000"  # 10 seconds
  alarm_description   = "Alert when Lambda duration exceeds 10 seconds"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    FunctionName = "nb-rag-sys-chat"
  }
}

Lambda Throttling

resource "aws_cloudwatch_metric_alarm" "lambda_throttles" {
  alarm_name          = "nb-rag-sys-lambda-throttles"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "1"
  metric_name         = "Throttles"
  namespace           = "AWS/Lambda"
  period              = "300"
  statistic           = "Sum"
  threshold           = "10"
  alarm_description   = "Alert when Lambda throttles occur"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    FunctionName = "nb-rag-sys-chat"
  }
}

Warning Alarms

High Cost

resource "aws_cloudwatch_metric_alarm" "high_cost" {
  alarm_name          = "nb-rag-sys-high-cost"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "1"
  metric_name         = "EstimatedCharges"
  namespace           = "AWS/Billing"
  period              = "21600"  # 6 hours
  statistic           = "Maximum"
  threshold           = "200"  # $200/month
  alarm_description   = "Alert when monthly cost exceeds $200"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    Currency = "USD"
  }
}

Low Traffic (Anomaly Detection)

resource "aws_cloudwatch_metric_alarm" "low_traffic" {
  alarm_name          = "nb-rag-sys-low-traffic"
  comparison_operator = "LessThanLowerThreshold"
  evaluation_periods  = "2"
  threshold_metric_id = "ad1"
  alarm_description   = "Alert when traffic drops below expected baseline"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  metric_query {
    id          = "m1"
    return_data = true

    metric {
      metric_name = "Count"
      namespace   = "AWS/ApiGateway"
      period      = "300"
      stat        = "Sum"

      dimensions = {
        ApiName = "nb-rag-sys-api"
      }
    }
  }

  metric_query {
    id          = "ad1"
    expression  = "ANOMALY_DETECTION_BAND(m1, 2)"
    label       = "Traffic (expected)"
    return_data = true
  }
}

SNS Topic for Alerts

resource "aws_sns_topic" "alerts" {
  name = "nb-rag-sys-alerts"
}

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = "alerts@yourcompany.com"
}

resource "aws_sns_topic_subscription" "slack" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "https"
  endpoint  = "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
}
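
Slack incoming webhooks expect a JSON body like {"text": ...}, while SNS https delivery posts its own envelope (and first requires the endpoint to confirm the subscription), so in practice a small Lambda usually sits between the topic and Slack. A sketch of the formatting half (function name and message layout are assumptions):

```python
import json

def alarm_to_slack_payload(sns_message):
    """Turn a CloudWatch alarm notification (the SNS Message body) into a Slack payload."""
    alarm = json.loads(sns_message)
    emoji = ":red_circle:" if alarm.get("NewStateValue") == "ALARM" else ":large_green_circle:"
    text = (
        f"{emoji} *{alarm.get('AlarmName', 'unknown alarm')}* "
        f"is {alarm.get('NewStateValue', '?')}\n"
        f"> {alarm.get('NewStateReason', '')}"
    )
    return {"text": text}

# In the Lambda handler, POST the payload as JSON to the Slack webhook URL:
# def handler(event, context):
#     for record in event["Records"]:
#         payload = alarm_to_slack_payload(record["Sns"]["Message"])
#         ...  # e.g. urllib.request.Request(webhook_url, data=json.dumps(payload).encode())
```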

CloudWatch Logs

Log Groups

| Lambda Function   | Log Group                                | Retention |
|-------------------|------------------------------------------|-----------|
| Chat              | /aws/lambda/nb-rag-sys-chat              | 7 days    |
| Classify          | /aws/lambda/nb-rag-sys-classify          | 7 days    |
| Ingest            | /aws/lambda/nb-rag-sys-ingest            | 7 days    |
| Fathom Webhook    | /aws/lambda/nb-rag-sys-fathom-webhook    | 7 days    |
| HelpScout Webhook | /aws/lambda/nb-rag-sys-helpscout-webhook | 7 days    |
| Linear Webhook    | /aws/lambda/nb-rag-sys-linear-webhook    | 7 days    |
| Linear Sync       | /aws/lambda/nb-rag-sys-linear-sync       | 7 days    |
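
New log groups default to never-expire retention, so the 7-day policy has to be applied per group. A sketch that sets it for the functions listed above (names assumed to follow the nb-rag-sys- prefix):

```python
FUNCTIONS = [
    "chat", "classify", "ingest",
    "fathom-webhook", "helpscout-webhook", "linear-webhook", "linear-sync",
]

def log_group_name(function_suffix):
    """Build the CloudWatch log group name for a Lambda function suffix."""
    return f"/aws/lambda/nb-rag-sys-{function_suffix}"

def apply_retention(days=7):
    import boto3
    logs = boto3.client("logs")
    for fn in FUNCTIONS:
        logs.put_retention_policy(
            logGroupName=log_group_name(fn),
            # retentionInDays must be one of the discrete values CloudWatch accepts (1, 3, 5, 7, ...)
            retentionInDays=days,
        )
```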

Log Queries

Find All Errors

fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 100

Find Slow Requests

fields @timestamp, @message, @duration
| filter @duration > 5000
| sort @duration desc
| limit 20

Count Errors by Function

fields @log as log_group
| filter @message like /ERROR/
| stats count() as error_count by log_group
| sort error_count desc

Track User Queries

fields @timestamp, @message
| filter @message like /Query:/
| parse @message 'Query: *' as query
| display @timestamp, query
| sort @timestamp desc
| limit 50

Bedrock Token Usage

fields @timestamp, @message
| filter @message like /Bedrock invocation/
| parse @message 'input_tokens=* output_tokens=*' as input, output
| stats sum(input) as total_input, sum(output) as total_output by bin(5m)
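
That query only works if each invocation logs a usage line in exactly that format; the emitting side can be as simple as the following (the line format and function name are assumptions that must stay in sync with the parse pattern):

```python
import logging

logger = logging.getLogger()

def log_bedrock_usage(input_tokens, output_tokens):
    """Emit the usage line the Logs Insights parse pattern expects, and return it."""
    line = f"Bedrock invocation input_tokens={input_tokens} output_tokens={output_tokens}"
    logger.info(line)
    return line
```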

Structured Logging

Lambda Logger Setup:

import json
import logging
import time
from datetime import datetime

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def log_structured(level, message, **kwargs):
    log_entry = {
        'timestamp': datetime.utcnow().isoformat(),
        'level': level,
        'message': message,
        **kwargs
    }
    logger.log(getattr(logging, level), json.dumps(log_entry))

def handler(event, context):
    start_time = time.time()
    request_id = context.aws_request_id
    user_id = event['requestContext']['authorizer']['claims']['sub']

    log_structured('INFO', 'Chat request received',
                   request_id=request_id,
                   user_id=user_id)

    try:
        result = process_request(event)
        log_structured('INFO', 'Chat request completed',
                       request_id=request_id,
                       user_id=user_id,
                       duration_ms=int((time.time() - start_time) * 1000))
        return result
    except Exception as e:
        log_structured('ERROR', 'Chat request failed',
                       request_id=request_id,
                       user_id=user_id,
                       error=str(e))
        raise

Query Structured Logs:

fields @timestamp, @message
| parse @message '"level": "*"' as level
| parse @message '"user_id": "*"' as user_id
| filter level = "ERROR"
| stats count() by user_id

Distributed Tracing (Optional)

AWS X-Ray Integration

Enable X-Ray in Lambda:

resource "aws_lambda_function" "chat" {
  tracing_config {
    mode = "Active"
  }
}

Instrument Lambda Code:

import json

import boto3
from aws_xray_sdk.core import xray_recorder, patch_all

# Patch AWS SDK calls so each downstream call appears as a trace subsegment
patch_all()

bedrock = boto3.client('bedrock-runtime')
bedrock_agent = boto3.client('bedrock-agent-runtime')

def handler(event, context):
    # Subsegment for Bedrock Knowledge Base retrieval
    with xray_recorder.capture('bedrock_kb_retrieve'):
        retrieval_result = bedrock_agent.retrieve(
            knowledgeBaseId=KNOWLEDGE_BASE_ID,
            retrievalQuery={'text': query}
        )

    # Subsegment for Bedrock LLM inference
    with xray_recorder.capture('bedrock_inference'):
        response = bedrock.invoke_model(
            modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
            body=json.dumps(prompt)
        )

    return response

View Traces:

# Get trace IDs for slow requests
aws xray get-trace-summaries \
  --start-time $(date -u -d '1 hour ago' +%s) \
  --end-time $(date -u +%s) \
  --filter-expression 'duration > 5'

Service Map:

  • Visualize request flow: API Gateway → Lambda → Bedrock
  • Identify bottlenecks
  • Track downstream dependencies

Performance Monitoring

Lambda Performance Tuning

Memory vs Duration Tradeoff:

# Test different memory settings
for mem in 512 1024 1536 2048; do
  aws lambda update-function-configuration \
    --function-name nb-rag-sys-chat \
    --memory-size $mem

  # Wait for the configuration update to complete
  aws lambda wait function-updated --function-name nb-rag-sys-chat

  # Invoke and measure
  time aws lambda invoke --function-name nb-rag-sys-chat /dev/null
done
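
Because Lambda bills by GB-second, a memory increase that proportionally shortens duration is roughly cost-neutral, so the loop above is really measuring where the latency curve flattens. A helper for comparing settings from the measured durations (pricing constants are assumptions for us-east-1 x86; verify current rates):

```python
# Assumed us-east-1 x86 Lambda pricing; check current rates before relying on this.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.0000002

def invocation_cost(memory_mb, duration_ms):
    """USD cost of a single invocation at the given memory setting."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST

# Example: doubling memory from 1024 MB to 2048 MB is cost-neutral
# if it halves the measured duration, and latency improves.
```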

Cold Start Monitoring:

fields @timestamp, @initDuration
| filter @initDuration > 1000
| stats count() as cold_starts, avg(@initDuration) as avg_cold_start_ms

Provisioned Concurrency Analysis:

fields @initDuration
| filter ispresent(@initDuration)
| stats count() as cold_starts, sum(@initDuration) as total_init_time_ms

Bedrock Performance

Track Token Usage:

def track_bedrock_usage(input_tokens, output_tokens):
    cloudwatch.put_metric_data(
        Namespace='NorthBuilt/RAG/Bedrock',
        MetricData=[
            {'MetricName': 'InputTokens', 'Value': input_tokens, 'Unit': 'Count'},
            {'MetricName': 'OutputTokens', 'Value': output_tokens, 'Unit': 'Count'},
            {'MetricName': 'TotalCost', 'Value': (input_tokens * 0.000003 + output_tokens * 0.000015), 'Unit': 'None'}
        ]
    )

Latency Breakdown:

fields @timestamp, @message
| filter @message like /Bedrock/
| parse @message 'embedding_time=*ms retrieval_time=*ms inference_time=*ms'
  as embed_ms, retrieval_ms, inference_ms
| stats avg(embed_ms) as avg_embed, avg(retrieval_ms) as avg_retrieval, avg(inference_ms) as avg_inference

Health Checks

API Health Endpoint

Create Health Check Lambda:

import json
from datetime import datetime

import boto3

bedrock = boto3.client('bedrock')
bedrock_agent = boto3.client('bedrock-agent')
dynamodb = boto3.client('dynamodb')

def health_check_handler(event, context):
    checks = {}

    # Check Bedrock connectivity
    try:
        bedrock.list_foundation_models()
        checks['bedrock'] = 'ok'
    except Exception as e:
        checks['bedrock'] = f'error: {str(e)}'

    # Check Knowledge Base connectivity
    try:
        bedrock_agent.get_knowledge_base(knowledgeBaseId='[kb-id]')
        checks['knowledge_base'] = 'ok'
    except Exception as e:
        checks['knowledge_base'] = f'error: {str(e)}'

    # Check DynamoDB connectivity
    try:
        dynamodb.describe_table(TableName='nb-rag-sys-classify')
        checks['dynamodb'] = 'ok'
    except Exception as e:
        checks['dynamodb'] = f'error: {str(e)}'

    all_healthy = all(v == 'ok' for v in checks.values())

    return {
        'statusCode': 200 if all_healthy else 503,
        'body': json.dumps({
            'status': 'healthy' if all_healthy else 'degraded',
            'checks': checks,
            'timestamp': datetime.utcnow().isoformat()
        })
    }

Route53 Health Check (optional):

resource "aws_route53_health_check" "api" {
  fqdn              = "api.yourdomain.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = "3"
  request_interval  = "30"

  tags = {
    Name = "nb-rag-sys-api-health"
  }
}

Synthetic Monitoring

CloudWatch Synthetics Canary:

resource "aws_synthetics_canary" "api" {
  name                 = "nb-rag-sys-api-canary"
  artifact_s3_location = "s3://${aws_s3_bucket.canary_artifacts.bucket}"
  execution_role_arn   = aws_iam_role.canary.arn
  handler              = "index.handler"
  zip_file             = "canary.zip"
  runtime_version      = "syn-python-selenium-1.0"

  schedule {
    expression = "rate(5 minutes)"
  }
}

Canary Script (canary.py):

from aws_synthetics.selenium import synthetics_webdriver as webdriver
from aws_synthetics.common import synthetics_logger as logger

def main():
    driver = webdriver.Chrome()
    driver.get("https://yourdomain.com")

    # Wait for page load
    driver.implicitly_wait(10)

    # Check for key elements
    assert "NorthBuilt" in driver.title
    assert driver.find_element_by_id("chat-input")

    logger.info("Page loaded successfully")
    driver.quit()

def handler(event, context):
    return main()

Last updated: 2025-12-31