Monitoring & Observability

Comprehensive guide to monitoring the NorthBuilt RAG System.

Overview

The system uses AWS CloudWatch for centralized monitoring, logging, and alerting across all components.

┌─────────────────────────────────────────────────────────────────┐
│                     CloudWatch Dashboard                         │
│  - API Gateway Metrics                                          │
│  - Lambda Performance                                           │
│  - Bedrock Usage                                                │
│  - Cost Tracking                                                │
└─────────────────────────────────────────────────────────────────┘
         │                │                │               │
    ┌────▼────┐      ┌────▼────┐     ┌────▼────┐    ┌────▼────┐
    │  Logs   │      │ Metrics │     │ Alarms  │    │ Insights│
    │ (7-day) │      │(Custom) │     │  (SNS)  │    │ (Query) │
    └─────────┘      └─────────┘     └─────────┘    └─────────┘

Key Metrics

API Gateway Metrics

| Metric   | Description       | Good  | Warning | Critical  |
|----------|-------------------|-------|---------|-----------|
| 4XXError | Client errors     | <5%   | 5-10%   | >10%      |
| 5XXError | Server errors     | <0.1% | 0.1-1%  | >1%       |
| Latency  | p95 response time | <2s   | 2-5s    | >5s       |
| Count    | Total requests    | N/A   | N/A     | >1000/min |

CloudWatch Query:

# Get error rate for last hour
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApiGateway \
  --metric-name 4XXError \
  --dimensions Name=ApiName,Value=nb-rag-sys-api \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Sum

Lambda Metrics

| Metric               | Function | Good  | Warning | Critical |
|----------------------|----------|-------|---------|----------|
| Duration             | Chat     | <3s   | 3-5s    | >5s      |
| Duration             | Classify | <2s   | 2-5s    | >5s      |
| Duration             | Webhooks | <5s   | 5-10s   | >10s     |
| Errors               | All      | <0.1% | 0.1-1%  | >1%      |
| Throttles            | All      | 0     | 1-10    | >10      |
| ConcurrentExecutions | Chat     | <5    | 5-8     | >8       |

CloudWatch Query:

# Get average duration for Chat Lambda
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Duration \
  --dimensions Name=FunctionName,Value=nb-rag-sys-chat \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Average

Bedrock Metrics

| Metric                 | Description   | Good | Warning | Critical   |
|------------------------|---------------|------|---------|------------|
| Invocations            | Model calls   | N/A  | N/A     | >1000/hour |
| InvocationLatency      | Response time | <2s  | 2-5s    | >5s        |
| InvocationClientErrors | 4xx errors    | 0    | 1-10    | >10        |
| InvocationServerErrors | 5xx errors    | 0    | 1-5     | >5         |

CloudWatch Query:

# Get Bedrock invocations
aws cloudwatch get-metric-statistics \
  --namespace AWS/Bedrock \
  --metric-name Invocations \
  --dimensions Name=ModelId,Value=us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 \
  --statistics Sum

Custom Application Metrics

Publishing Custom Metrics (in Lambda):

import time

import boto3
from datetime import datetime

cloudwatch = boto3.client('cloudwatch')

def publish_metric(metric_name, value, unit='None', dimensions=None):
    cloudwatch.put_metric_data(
        Namespace='NorthBuilt/RAG',
        MetricData=[
            {
                'MetricName': metric_name,
                'Value': value,
                'Unit': unit,
                'Timestamp': datetime.utcnow(),
                'Dimensions': dimensions or []
            }
        ]
    )

# Example usage in Chat Lambda
def handler(event, context):
    start_time = time.time()

    # Process request...
    result = process_chat_request(event)

    # Publish metrics
    duration = time.time() - start_time
    publish_metric('ChatRequestDuration', duration, 'Seconds')
    publish_metric('ChatRequestSuccess', 1 if result else 0, 'Count')
    publish_metric('RetrievedDocuments', len(result.get('sources', [])), 'Count')

    return result

Custom Metrics to Track:

  • ChatRequestDuration - End-to-end chat request time
  • QueryRetrievalTime - Time to retrieve documents
  • BedrockInferenceTime - Time for Bedrock response
  • RetrievedDocuments - Number of documents retrieved
  • CacheHitRate - Query cache hit percentage (if implemented)
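
Publishing each metric with its own `put_metric_data` call adds latency and API cost; the call accepts a batch of metric entries in one request. A small helper can keep the batch shape in one place (a sketch; `build_metric_data` is a hypothetical name):

```python
from datetime import datetime, timezone

def build_metric_data(samples):
    """Convert {name: (value, unit)} samples into a PutMetricData payload."""
    now = datetime.now(timezone.utc)
    return [
        {
            "MetricName": name,
            "Value": value,
            "Unit": unit,
            "Timestamp": now,
        }
        for name, (value, unit) in samples.items()
    ]

# Usage inside a Lambda, reusing the module-level client:
# cloudwatch.put_metric_data(
#     Namespace="NorthBuilt/RAG",
#     MetricData=build_metric_data({
#         "ChatRequestDuration": (1.8, "Seconds"),
#         "RetrievedDocuments": (5, "Count"),
#     }),
# )
```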

CloudWatch Dashboards

Main Dashboard

Create Dashboard:

aws cloudwatch put-dashboard --dashboard-name nb-rag-sys-main --dashboard-body file://dashboard.json

Dashboard JSON (dashboard.json):

{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "title": "API Gateway Requests",
        "region": "us-east-1",
        "metrics": [
          ["AWS/ApiGateway", "Count", {"stat": "Sum", "label": "Total Requests"}],
          [".", "4XXError", {"stat": "Sum", "label": "4xx Errors"}],
          [".", "5XXError", {"stat": "Sum", "label": "5xx Errors"}]
        ],
        "period": 300,
        "yAxis": {"left": {"min": 0}}
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "Lambda Duration (Chat)",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Lambda", "Duration", {"stat": "Average", "label": "Average"}],
          ["...", {"stat": "p95", "label": "p95"}],
          ["...", {"stat": "Maximum", "label": "Maximum"}]
        ],
        "period": 300,
        "yAxis": {"left": {"min": 0, "max": 10000}}
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "Lambda Errors",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Lambda", "Errors", {"stat": "Sum", "label": "Chat"}, {"dimensions": {"FunctionName": "nb-rag-sys-chat"}}],
          ["...", {"dimensions": {"FunctionName": "nb-rag-sys-classify"}}],
          ["...", {"dimensions": {"FunctionName": "nb-rag-sys-fathom-webhook"}}]
        ],
        "period": 300,
        "yAxis": {"left": {"min": 0}}
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "Bedrock Invocations",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Bedrock", "Invocations", {"stat": "Sum", "label": "Claude Sonnet 4.5"}],
          [".", "ModelInvocationLatency", {"stat": "Average", "label": "Latency (ms)"}]
        ],
        "period": 3600,
        "yAxis": {"left": {"min": 0}}
      }
    },
    {
      "type": "log",
      "properties": {
        "title": "Recent Errors",
        "region": "us-east-1",
        "query": "SOURCE '/aws/lambda/nb-rag-sys-chat'\n| fields @timestamp, @message\n| filter @message like /ERROR/\n| sort @timestamp desc\n| limit 20"
      }
    }
  ]
}
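
Before pushing, it can be worth sanity-checking dashboard.json locally, since put-dashboard accepts some malformed bodies and reports problems only as validation messages in its response. A minimal checker (hypothetical helper, not part of any AWS SDK; the accepted widget types here are an assumption):

```python
import json

def validate_dashboard(body):
    """Return a list of problems found in a dashboard body (empty means OK)."""
    problems = []
    widgets = body.get("widgets")
    if not isinstance(widgets, list) or not widgets:
        return ["dashboard body must contain a non-empty 'widgets' list"]
    for i, widget in enumerate(widgets):
        wtype = widget.get("type")
        props = widget.get("properties", {})
        if wtype not in ("metric", "log", "text", "alarm"):
            problems.append(f"widget {i}: unknown type {wtype!r}")
        if wtype == "metric" and "metrics" not in props:
            problems.append(f"widget {i}: metric widget missing 'metrics'")
        if wtype in ("metric", "log") and "region" not in props:
            problems.append(f"widget {i}: missing 'region'")
    return problems

# Usage before `aws cloudwatch put-dashboard`:
# with open("dashboard.json") as f:
#     for problem in validate_dashboard(json.load(f)):
#         print(problem)
```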

Cost Dashboard

Track Costs by Service:

{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "title": "Estimated Monthly Cost",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Billing", "EstimatedCharges", {"stat": "Maximum"}]
        ],
        "period": 21600,
        "yAxis": {"left": {"min": 0}}
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "Cost by Service",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Billing", "EstimatedCharges", {"stat": "Maximum"}, {"dimensions": {"ServiceName": "AWS Lambda"}}],
          ["...", {"dimensions": {"ServiceName": "Amazon Bedrock"}}],
          ["...", {"dimensions": {"ServiceName": "Amazon API Gateway"}}],
          ["...", {"dimensions": {"ServiceName": "Amazon S3"}}],
          ["...", {"dimensions": {"ServiceName": "Amazon DynamoDB"}}]
        ],
        "period": 21600,
        "yAxis": {"left": {"min": 0}}
      }
    }
  ]
}
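
The billing metrics above lag actual usage by several hours, so a direct projection from token counts can be a faster sanity check on the Bedrock share. A sketch using the same assumed Claude Sonnet pricing used elsewhere in this guide ($3 per 1M input tokens, $15 per 1M output tokens; verify current rates):

```python
# Assumed Claude Sonnet pricing in USD per token; adjust to current rates.
INPUT_TOKEN_PRICE = 0.000003    # $3 per 1M input tokens
OUTPUT_TOKEN_PRICE = 0.000015   # $15 per 1M output tokens

def estimate_monthly_bedrock_cost(daily_input_tokens, daily_output_tokens, days=30):
    """Project a monthly Bedrock bill from average daily token usage."""
    daily_cost = (daily_input_tokens * INPUT_TOKEN_PRICE
                  + daily_output_tokens * OUTPUT_TOKEN_PRICE)
    return round(daily_cost * days, 2)
```

For example, 1M input and 200K output tokens per day projects to about $180/month at these rates.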

CloudWatch Alarms

Critical Alarms

High Error Rate

API Gateway 5xx Errors:

resource "aws_cloudwatch_metric_alarm" "api_5xx_errors" {
  alarm_name          = "nb-rag-sys-api-5xx-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "5XXError"
  namespace           = "AWS/ApiGateway"
  period              = "300"
  statistic           = "Sum"
  threshold           = "5"
  alarm_description   = "Alert when API Gateway returns more than 5 5xx errors in 5 minutes"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ApiName = "nb-rag-sys-api"
  }
}

Lambda Errors:

resource "aws_cloudwatch_metric_alarm" "lambda_errors" {
  alarm_name          = "nb-rag-sys-lambda-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "Errors"
  namespace           = "AWS/Lambda"
  period              = "300"
  statistic           = "Sum"
  threshold           = "10"
  alarm_description   = "Alert when Lambda function has more than 10 errors in 5 minutes"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    FunctionName = "nb-rag-sys-chat"
  }
}

High Latency

API Gateway Latency:

resource "aws_cloudwatch_metric_alarm" "api_latency" {
  alarm_name          = "nb-rag-sys-api-latency"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "Latency"
  namespace           = "AWS/ApiGateway"
  period              = "300"
  statistic           = "Average"
  threshold           = "5000"  # 5 seconds
  alarm_description   = "Alert when API Gateway latency exceeds 5 seconds"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ApiName = "nb-rag-sys-api"
  }
}

Lambda Duration:

resource "aws_cloudwatch_metric_alarm" "lambda_duration" {
  alarm_name          = "nb-rag-sys-lambda-duration"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "Duration"
  namespace           = "AWS/Lambda"
  period              = "300"
  statistic           = "Average"
  threshold           = "10000"  # 10 seconds
  alarm_description   = "Alert when Lambda duration exceeds 10 seconds"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    FunctionName = "nb-rag-sys-chat"
  }
}

Lambda Throttling

resource "aws_cloudwatch_metric_alarm" "lambda_throttles" {
  alarm_name          = "nb-rag-sys-lambda-throttles"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "1"
  metric_name         = "Throttles"
  namespace           = "AWS/Lambda"
  period              = "300"
  statistic           = "Sum"
  threshold           = "10"
  alarm_description   = "Alert when Lambda throttles occur"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    FunctionName = "nb-rag-sys-chat"
  }
}

Warning Alarms

High Cost

resource "aws_cloudwatch_metric_alarm" "high_cost" {
  alarm_name          = "nb-rag-sys-high-cost"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "1"
  metric_name         = "EstimatedCharges"
  namespace           = "AWS/Billing"
  period              = "21600"  # 6 hours
  statistic           = "Maximum"
  threshold           = "200"  # $200/month
  alarm_description   = "Alert when monthly cost exceeds $200"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    Currency = "USD"
  }
}

Low Traffic (Anomaly Detection)

resource "aws_cloudwatch_metric_alarm" "low_traffic" {
  alarm_name          = "nb-rag-sys-low-traffic"
  comparison_operator = "LessThanLowerThreshold"
  evaluation_periods  = "2"
  threshold_metric_id = "ad1"
  alarm_description   = "Alert when traffic drops below expected baseline"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  metric_query {
    id          = "m1"
    return_data = true

    metric {
      metric_name = "Count"
      namespace   = "AWS/ApiGateway"
      period      = "300"
      stat        = "Sum"

      dimensions = {
        ApiName = "nb-rag-sys-api"
      }
    }
  }

  metric_query {
    id          = "ad1"
    expression  = "ANOMALY_DETECTION_BAND(m1, 2)"
    label       = "Traffic (expected)"
    return_data = true
  }
}

SNS Topic for Alerts

resource "aws_sns_topic" "alerts" {
  name = "nb-rag-sys-alerts"
}

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = "alerts@yourcompany.com"
}

resource "aws_sns_topic_subscription" "slack" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "https"
  endpoint  = "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
}
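
Slack incoming webhooks expect a JSON body like {"text": ...}, while SNS https delivery posts its own envelope (and first requires the endpoint to confirm the subscription), so in practice a small Lambda usually sits between the topic and Slack. A sketch of the formatting half (function name and message layout are assumptions):

```python
import json

def alarm_to_slack_payload(sns_message):
    """Turn a CloudWatch alarm notification (the SNS Message body) into a Slack payload."""
    alarm = json.loads(sns_message)
    emoji = ":red_circle:" if alarm.get("NewStateValue") == "ALARM" else ":large_green_circle:"
    text = (
        f"{emoji} *{alarm.get('AlarmName', 'unknown alarm')}* "
        f"is {alarm.get('NewStateValue', '?')}\n"
        f"> {alarm.get('NewStateReason', '')}"
    )
    return {"text": text}

# In the Lambda handler, POST the payload as JSON to the Slack webhook URL:
# def handler(event, context):
#     for record in event["Records"]:
#         payload = alarm_to_slack_payload(record["Sns"]["Message"])
#         ...  # e.g. urllib.request.Request(webhook_url, data=json.dumps(payload).encode())
```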

CloudWatch Logs

Log Groups

| Lambda Function   | Log Group                                | Retention |
|-------------------|------------------------------------------|-----------|
| Chat              | /aws/lambda/nb-rag-sys-chat              | 7 days    |
| Classify          | /aws/lambda/nb-rag-sys-classify          | 7 days    |
| Ingest            | /aws/lambda/nb-rag-sys-ingest            | 7 days    |
| Fathom Webhook    | /aws/lambda/nb-rag-sys-fathom-webhook    | 7 days    |
| HelpScout Webhook | /aws/lambda/nb-rag-sys-helpscout-webhook | 7 days    |
| Linear Webhook    | /aws/lambda/nb-rag-sys-linear-webhook    | 7 days    |
| Linear Sync       | /aws/lambda/nb-rag-sys-linear-sync       | 7 days    |
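
New log groups default to never-expire retention, so the 7-day policy has to be applied per group. A sketch that sets it for the functions listed above (names assumed to follow the nb-rag-sys- prefix):

```python
FUNCTIONS = [
    "chat", "classify", "ingest",
    "fathom-webhook", "helpscout-webhook", "linear-webhook", "linear-sync",
]

def log_group_name(function_suffix):
    """Build the CloudWatch log group name for a Lambda function suffix."""
    return f"/aws/lambda/nb-rag-sys-{function_suffix}"

def apply_retention(days=7):
    import boto3
    logs = boto3.client("logs")
    for fn in FUNCTIONS:
        logs.put_retention_policy(
            logGroupName=log_group_name(fn),
            # retentionInDays must be one of the discrete values CloudWatch accepts (1, 3, 5, 7, ...)
            retentionInDays=days,
        )
```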

Log Queries

Find All Errors

fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 100

Find Slow Requests

fields @timestamp, @message, @duration
| filter @duration > 5000
| sort @duration desc
| limit 20

Count Errors by Function

fields @log as log_group
| filter @message like /ERROR/
| stats count() as error_count by log_group
| sort error_count desc

Track User Queries

fields @timestamp, @message
| filter @message like /Query:/
| parse @message 'Query: *' as query
| display @timestamp, query
| sort @timestamp desc
| limit 50

Bedrock Token Usage

fields @timestamp, @message
| filter @message like /Bedrock invocation/
| parse @message 'input_tokens=* output_tokens=*' as input, output
| stats sum(input) as total_input, sum(output) as total_output by bin(5m)
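
That query only works if each invocation logs a usage line in exactly that format; the emitting side can be as simple as the following (the line format and function name are assumptions that must stay in sync with the parse pattern):

```python
import logging

logger = logging.getLogger()

def log_bedrock_usage(input_tokens, output_tokens):
    """Emit the usage line the Logs Insights parse pattern expects, and return it."""
    line = f"Bedrock invocation input_tokens={input_tokens} output_tokens={output_tokens}"
    logger.info(line)
    return line
```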

Structured Logging

Lambda Logger Setup:

import json
import logging
import time
from datetime import datetime

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def log_structured(level, message, **kwargs):
    log_entry = {
        'timestamp': datetime.utcnow().isoformat(),
        'level': level,
        'message': message,
        **kwargs
    }
    logger.log(getattr(logging, level), json.dumps(log_entry))

def handler(event, context):
    start_time = time.time()
    request_id = context.aws_request_id
    user_id = event['requestContext']['authorizer']['claims']['sub']

    log_structured('INFO', 'Chat request received',
                   request_id=request_id,
                   user_id=user_id)

    try:
        result = process_request(event)
        log_structured('INFO', 'Chat request completed',
                       request_id=request_id,
                       user_id=user_id,
                       duration_ms=int((time.time() - start_time) * 1000))
        return result
    except Exception as e:
        log_structured('ERROR', 'Chat request failed',
                       request_id=request_id,
                       user_id=user_id,
                       error=str(e))
        raise

Query Structured Logs:

fields @timestamp, @message
| parse @message '"level": "*"' as level
| parse @message '"user_id": "*"' as user_id
| filter level = "ERROR"
| stats count() by user_id

Distributed Tracing (Optional)

AWS X-Ray Integration

Enable X-Ray in Lambda:

resource "aws_lambda_function" "chat" {
  tracing_config {
    mode = "Active"
  }
}

Instrument Lambda Code:

import json

import boto3
from aws_xray_sdk.core import xray_recorder, patch_all

# Patch AWS SDK calls so each downstream call appears as a trace subsegment
patch_all()

bedrock = boto3.client('bedrock-runtime')
bedrock_agent = boto3.client('bedrock-agent-runtime')

def handler(event, context):
    # Subsegment for Bedrock Knowledge Base retrieval
    with xray_recorder.capture('bedrock_kb_retrieve'):
        retrieval_result = bedrock_agent.retrieve(
            knowledgeBaseId=KNOWLEDGE_BASE_ID,
            retrievalQuery={'text': query}
        )

    # Subsegment for Bedrock LLM inference
    with xray_recorder.capture('bedrock_inference'):
        response = bedrock.invoke_model(
            modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
            body=json.dumps(prompt)
        )

    return response

View Traces:

# Get trace IDs for slow requests
aws xray get-trace-summaries \
  --start-time $(date -u -d '1 hour ago' +%s) \
  --end-time $(date -u +%s) \
  --filter-expression 'duration > 5'

Service Map:

  • Visualize request flow: API Gateway → Lambda → Bedrock
  • Identify bottlenecks
  • Track downstream dependencies

Performance Monitoring

Lambda Performance Tuning

Memory vs Duration Tradeoff:

# Test different memory settings
for mem in 512 1024 1536 2048; do
  aws lambda update-function-configuration \
    --function-name nb-rag-sys-chat \
    --memory-size $mem

  # Wait for the configuration update to complete
  aws lambda wait function-updated --function-name nb-rag-sys-chat

  # Invoke and measure
  time aws lambda invoke --function-name nb-rag-sys-chat /dev/null
done
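
Because Lambda bills by GB-second, a memory increase that proportionally shortens duration is roughly cost-neutral, so the loop above is really measuring where the latency curve flattens. A helper for comparing settings from the measured durations (pricing constants are assumptions for us-east-1 x86; verify current rates):

```python
# Assumed us-east-1 x86 Lambda pricing; check current rates before relying on this.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.0000002

def invocation_cost(memory_mb, duration_ms):
    """USD cost of a single invocation at the given memory setting."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST

# Example: doubling memory from 1024 MB to 2048 MB is cost-neutral
# if it halves the measured duration, and latency improves.
```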

Cold Start Monitoring:

fields @timestamp, @initDuration
| filter @initDuration > 1000
| stats count() as cold_starts, avg(@initDuration) as avg_cold_start_ms

Provisioned Concurrency Analysis:

fields @initDuration
| filter ispresent(@initDuration)
| stats count() as cold_starts, sum(@initDuration) as total_init_time_ms

Bedrock Performance

Track Token Usage:

def track_bedrock_usage(input_tokens, output_tokens):
    cloudwatch.put_metric_data(
        Namespace='NorthBuilt/RAG/Bedrock',
        MetricData=[
            {'MetricName': 'InputTokens', 'Value': input_tokens, 'Unit': 'Count'},
            {'MetricName': 'OutputTokens', 'Value': output_tokens, 'Unit': 'Count'},
            {'MetricName': 'TotalCost', 'Value': (input_tokens * 0.000003 + output_tokens * 0.000015), 'Unit': 'None'}
        ]
    )

Latency Breakdown:

fields @timestamp, @message
| filter @message like /Bedrock/
| parse @message 'embedding_time=*ms retrieval_time=*ms inference_time=*ms'
  as embed_ms, retrieval_ms, inference_ms
| stats avg(embed_ms) as avg_embed, avg(retrieval_ms) as avg_retrieval, avg(inference_ms) as avg_inference

Health Checks

API Health Endpoint

Create Health Check Lambda:

import json
from datetime import datetime

import boto3

bedrock = boto3.client('bedrock')
bedrock_agent = boto3.client('bedrock-agent')
dynamodb = boto3.client('dynamodb')

def health_check_handler(event, context):
    checks = {}

    # Check Bedrock connectivity
    try:
        bedrock.list_foundation_models()
        checks['bedrock'] = 'ok'
    except Exception as e:
        checks['bedrock'] = f'error: {str(e)}'

    # Check Knowledge Base connectivity
    try:
        bedrock_agent.get_knowledge_base(knowledgeBaseId='[kb-id]')
        checks['knowledge_base'] = 'ok'
    except Exception as e:
        checks['knowledge_base'] = f'error: {str(e)}'

    # Check DynamoDB connectivity
    try:
        dynamodb.describe_table(TableName='nb-rag-sys-classify')
        checks['dynamodb'] = 'ok'
    except Exception as e:
        checks['dynamodb'] = f'error: {str(e)}'

    all_healthy = all(v == 'ok' for v in checks.values())

    return {
        'statusCode': 200 if all_healthy else 503,
        'body': json.dumps({
            'status': 'healthy' if all_healthy else 'degraded',
            'checks': checks,
            'timestamp': datetime.utcnow().isoformat()
        })
    }

Route53 Health Check (optional):

resource "aws_route53_health_check" "api" {
  fqdn              = "api.yourdomain.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = "3"
  request_interval  = "30"

  tags = {
    Name = "nb-rag-sys-api-health"
  }
}

Synthetic Monitoring

CloudWatch Synthetics Canary:

resource "aws_synthetics_canary" "api" {
  name                 = "nb-rag-sys-api-canary"
  artifact_s3_location = "s3://${aws_s3_bucket.canary_artifacts.bucket}"
  execution_role_arn   = aws_iam_role.canary.arn
  handler              = "index.handler"
  zip_file             = "canary.zip"
  runtime_version      = "syn-python-selenium-1.0"

  schedule {
    expression = "rate(5 minutes)"
  }
}

Canary Script (canary.py):

from aws_synthetics.selenium import synthetics_webdriver as webdriver
from aws_synthetics.common import synthetics_logger as logger

def main():
    driver = webdriver.Chrome()
    driver.get("https://yourdomain.com")

    # Wait for page load
    driver.implicitly_wait(10)

    # Check for key elements
    assert "NorthBuilt" in driver.title
    assert driver.find_element_by_id("chat-input")

    logger.info("Page loaded successfully")
    driver.quit()

def handler(event, context):
    return main()

Last updated: 2025-12-31