Monitoring & Observability
Comprehensive guide to monitoring the NorthBuilt RAG System.
Overview
The system uses AWS CloudWatch for centralized monitoring, logging, and alerting across all components.
┌─────────────────────────────────────────────────────────────────┐
│ CloudWatch Dashboard │
│ - API Gateway Metrics │
│ - Lambda Performance │
│ - Bedrock Usage │
│ - Cost Tracking │
└─────────────────────────────────────────────────────────────────┘
│ │ │ │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│ Logs │ │ Metrics │ │ Alarms │ │ Insights│
│ (7-day) │ │(Custom) │ │ (SNS) │ │ (Query) │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
Key Metrics
API Gateway Metrics
| Metric | Description | Good | Warning | Critical |
|---|---|---|---|---|
| 4xxError | Client errors | <5% | 5-10% | >10% |
| 5xxError | Server errors | <0.1% | 0.1-1% | >1% |
| Latency | p95 response time | <2s | 2-5s | >5s |
| Count | Total requests | N/A | N/A | >1000/min |
CloudWatch Query:
# Get error rate for last hour
aws cloudwatch get-metric-statistics \
--namespace AWS/ApiGateway \
--metric-name 4XXError \
--dimensions Name=ApiName,Value=nb-rag-sys-api \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Sum
Lambda Metrics
| Metric | Function | Good | Warning | Critical |
|---|---|---|---|---|
| Duration | Chat | <3s | 3-5s | >5s |
| Duration | Classification | <2s | 2-5s | >5s |
| Duration | Webhooks | <5s | 5-10s | >10s |
| Errors | All | <0.1% | 0.1-1% | >1% |
| Throttles | All | 0 | 1-10 | >10 |
| ConcurrentExecutions | Chat | <5 | 5-8 | >8 |
CloudWatch Query:
# Get average duration for Chat Lambda
aws cloudwatch get-metric-statistics \
--namespace AWS/Lambda \
--metric-name Duration \
--dimensions Name=FunctionName,Value=nb-rag-sys-chat \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Average
Bedrock Metrics
| Metric | Description | Good | Warning | Critical |
|---|---|---|---|---|
| Invocations | Model calls | N/A | N/A | >1000/hour |
| ModelInvocationLatency | Response time | <2s | 2-5s | >5s |
| ModelInvocationClientErrors | 4xx errors | 0 | 1-10 | >10 |
| ModelInvocationServerErrors | 5xx errors | 0 | 1-5 | >5 |
CloudWatch Query:
# Get Bedrock invocations
aws cloudwatch get-metric-statistics \
--namespace AWS/Bedrock \
--metric-name Invocations \
--dimensions Name=ModelId,Value=us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 3600 \
--statistics Sum
Custom Application Metrics
The RAG system emits custom CloudWatch metrics via the RAGMetrics class in lambda/shared/utils/metrics.py.
Namespace: RAG/Retrieval
Metrics Emitted:
| Metric | Unit | Description |
|---|---|---|
| RetrievalLatencyMs | Milliseconds | Time for Bedrock KB vector search |
| CandidatesRetrieved | Count | Raw results from vector search |
| ResultsAfterFilter | Count | Results after post-filtering |
| FilterEffectiveness | None (0.0-1.0) | Ratio of filtered results |
| Errors | Count | Error counts by type |
| LLMGenerationLatencyMs | Milliseconds | Time for Bedrock LLM response |
| LLMInputTokens | Count | Input tokens used |
| LLMOutputTokens | Count | Output tokens generated |
Dimensions:
- HasFilter - Whether client filter was applied (true/false)
- RerankingEnabled - Whether reranking was enabled (true/false)
- ErrorType - Type of error (e.g., RetrievalError, LLMError)
Usage in Lambda:
from shared.utils.metrics import RAGMetrics

metrics = RAGMetrics(cloudwatch_client)

# Record retrieval metrics
metrics.record_retrieval(
    latency_ms=150.5,
    candidates_retrieved=15,
    results_after_filter=5,
    has_client_filter=True,
    reranking_enabled=False
)

# Record LLM generation metrics
metrics.record_llm_generation(
    latency_ms=2500,
    input_tokens=500,
    output_tokens=200
)

# Record errors
metrics.record_error('RetrievalError')
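Internally, a helper like this wraps the CloudWatch put_metric_data API. The sketch below is illustrative only: the real implementation lives in lambda/shared/utils/metrics.py, and the batching details and the FilterEffectiveness formula here are assumptions, not the actual code.

```python
class RAGMetricsSketch:
    """Illustrative wrapper around CloudWatch put_metric_data.

    Shows the emission pattern behind record_retrieval; not the
    real RAGMetrics implementation."""

    NAMESPACE = 'RAG/Retrieval'

    def __init__(self, cloudwatch_client):
        self.client = cloudwatch_client

    def record_retrieval(self, latency_ms, candidates_retrieved,
                         results_after_filter, has_client_filter,
                         reranking_enabled):
        dimensions = [
            {'Name': 'HasFilter', 'Value': str(has_client_filter).lower()},
            {'Name': 'RerankingEnabled', 'Value': str(reranking_enabled).lower()},
        ]
        # Assumed formula: fraction of candidates kept after filtering
        effectiveness = (results_after_filter / candidates_retrieved
                         if candidates_retrieved else 0.0)
        # One API call batches all datapoints for the request
        self.client.put_metric_data(
            Namespace=self.NAMESPACE,
            MetricData=[
                {'MetricName': 'RetrievalLatencyMs', 'Value': latency_ms,
                 'Unit': 'Milliseconds', 'Dimensions': dimensions},
                {'MetricName': 'CandidatesRetrieved', 'Value': candidates_retrieved,
                 'Unit': 'Count', 'Dimensions': dimensions},
                {'MetricName': 'ResultsAfterFilter', 'Value': results_after_filter,
                 'Unit': 'Count', 'Dimensions': dimensions},
                {'MetricName': 'FilterEffectiveness', 'Value': effectiveness,
                 'Unit': 'None', 'Dimensions': dimensions},
            ],
        )
```

Any object exposing put_metric_data works as the client (e.g. a boto3 cloudwatch client), which also makes the pattern easy to unit-test with a stub.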
Query Custom Metrics:
# Get retrieval latency (last hour)
aws cloudwatch get-metric-statistics \
--namespace RAG/Retrieval \
--metric-name RetrievalLatencyMs \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Average,Maximum
# Get filter effectiveness
aws cloudwatch get-metric-statistics \
--namespace RAG/Retrieval \
--metric-name FilterEffectiveness \
--dimensions Name=HasFilter,Value=true \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Average
Ingestion Metrics (RAG/Ingestion Namespace)
The ingestion pipeline emits custom CloudWatch metrics via multiple classes in lambda/shared/utils/metrics.py.
Namespace: RAG/Ingestion
IngestionMetrics
Tracks webhook events, document ingestion, and sync job completion.
| Metric | Unit | Description |
|---|---|---|
| WebhooksReceived | Count | Webhook events received by Source and Success |
| WebhookProcessingLatencyMs | Milliseconds | Time to process a webhook event |
| DocumentsIngested | Count | Documents saved to S3 by Source, Category, SourceType |
| SyncJobsCompleted | Count | Sync operations completed by Source and Completed |
| SyncDurationSeconds | Seconds | Total sync job duration |
| ItemsSynced | Count | New items added from sync jobs |
| ItemsSkipped | Count | Already existing items skipped |
| ItemsFailed | Count | Items that failed to sync |
| SyncAPICallsTotal | Count | External API calls made during sync |
| SyncS3SavesTotal | Count | S3 save operations during sync |
| SyncProcessingRate | Count/Second | Items processed per second |
| IngestionErrors | Count | Errors by Source, ErrorType, SourceType |
Dimensions:
- Source - Source system (fathom, helpscout)
- Success - Whether operation succeeded (true/false)
- Category - Document category (meeting-transcript, customer-conversation, issue)
- SourceType - Ingestion trigger (webhook or polling)
- ErrorType - Type of error (ValidationError, S3Error, APIError, etc.)
- Completed - Whether sync ran to completion (true/false)
Usage in Lambda:
from shared.utils.metrics import IngestionMetrics

metrics = IngestionMetrics(cloudwatch_client)

# Record webhook received (success)
metrics.record_webhook_received(
    source='fathom',
    processed_successfully=True,
    latency_ms=250.5
)

# Record webhook received (failure with error type)
metrics.record_webhook_received(
    source='fathom',
    processed_successfully=False,
    latency_ms=100.0,
    error_type='ValidationError'
)

# Record document ingested
metrics.record_document_ingested(
    source='fathom',
    category='meeting-transcript',
    source_type='webhook'
)

# Record sync completion with worker stats
metrics.record_sync_completed(
    source='fathom',
    completed=True,
    items_synced=25,
    items_skipped=100,
    items_failed=2,
    duration_seconds=45.5,
    api_calls=150,
    s3_saves=25
)
ClassificationMetrics
Tracks client/project classification operations.
| Metric | Unit | Description |
|---|---|---|
| ClassificationsTotal | Count | Classification attempts by Source and Success |
| ClassificationLatencyMs | Milliseconds | DynamoDB lookup time |
| ClassificationMatched | Count | Whether a match was found (1 or 0) |
| ClassificationErrors | Count | Errors by Source and ErrorType |
Dimensions:
- Source - Source system (fathom, helpscout)
- Success - Whether classification completed (true/false)
- ErrorType - Type of error (ConfigurationError, ValidationError, StrategyError)
Usage in Lambda:
from shared.utils.metrics import ClassificationMetrics

metrics = ClassificationMetrics(cloudwatch_client)

# Record successful classification
metrics.record_classification(
    source='fathom',
    completed=True,
    latency_ms=50.0,
    match_found=True
)

# Record failed classification with error type
metrics.record_classification(
    source='fathom',
    completed=False,
    latency_ms=25.0,
    error_type='StrategyError'
)
OrchestratorMetrics
Tracks sync handler invocations and worker Lambda invocation status.
| Metric | Unit | Description |
|---|---|---|
| SyncHandlerInvocations | Count | Handler invocations by Source and Success |
| SyncHandlerErrors | Count | Errors by Source and ErrorType |
Dimensions:
- Source - Source system (fathom, helpscout)
- Success - Whether worker was invoked successfully (true/false)
- ErrorType - Type of error (ConfigurationError, LambdaInvokeError, HandlerError)
Usage in Lambda:
from shared.utils.metrics import OrchestratorMetrics

metrics = OrchestratorMetrics(cloudwatch_client)

# Record successful handler invocation
metrics.record_handler_invocation(
    source='fathom',
    invoked_successfully=True
)

# Record failed handler invocation
metrics.record_handler_invocation(
    source='fathom',
    invoked_successfully=False,
    error_type='LambdaInvokeError'
)
KBIngestionMetrics
Tracks Bedrock Knowledge Base ingestion jobs (re-indexing operations).
| Metric | Unit | Description |
|---|---|---|
| IngestionJobStarted | Count | KB ingestion job started successfully |
| IngestionJobAlreadyRunning | Count | Job skipped because one is already running |
| IngestionJobErrors | Count | Errors by ErrorType |
Dimensions:
- ErrorType - Type of error (BedrockAPIError, ConfigurationError, HandlerError)
Usage in Lambda:
from shared.utils.metrics import KBIngestionMetrics

metrics = KBIngestionMetrics(cloudwatch_client)

# Record successful job start
metrics.record_job_started()

# Record job already running (expected, not an error)
metrics.record_job_already_running()

# Record error
metrics.record_error('BedrockAPIError')
Query Ingestion Metrics:
# Get webhook success rate by source
aws cloudwatch get-metric-statistics \
--namespace RAG/Ingestion \
--metric-name WebhooksReceived \
--dimensions Name=Source,Value=fathom Name=Success,Value=true \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Sum
# Get documents ingested by source
aws cloudwatch get-metric-statistics \
--namespace RAG/Ingestion \
--metric-name DocumentsIngested \
--dimensions Name=Source,Value=fathom Name=SourceType,Value=webhook \
--start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 3600 \
--statistics Sum
# Get classification match rate
aws cloudwatch get-metric-statistics \
--namespace RAG/Ingestion \
--metric-name ClassificationMatched \
--dimensions Name=Source,Value=fathom \
--start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 3600 \
--statistics Sum,Average
# Get sync worker performance
aws cloudwatch get-metric-statistics \
--namespace RAG/Ingestion \
--metric-name SyncProcessingRate \
--dimensions Name=Source,Value=fathom \
--start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 86400 \
--statistics Average,Maximum
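Note that success rate is not a single metric: the Success dimension splits WebhooksReceived into separate true/false series, which must be combined after querying. A small helper for that arithmetic (the function name is illustrative; the inputs are assumed to be the Datapoints lists from get-metric-statistics calls made with the Sum statistic):

```python
def webhook_success_rate(success_datapoints, failure_datapoints):
    """Combine the Success=true and Success=false series of
    WebhooksReceived into an overall success rate.

    Each argument is the 'Datapoints' list from a
    get_metric_statistics response queried with --statistics Sum."""
    succeeded = sum(dp['Sum'] for dp in success_datapoints)
    failed = sum(dp['Sum'] for dp in failure_datapoints)
    total = succeeded + failed
    # No traffic in the window: rate is undefined, not 100%
    return succeeded / total if total else None
```

For example, webhook_success_rate(resp_true['Datapoints'], resp_false['Datapoints']), where each response comes from a query like the first one above with Success=true and Success=false respectively.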
CloudWatch Dashboards
Main Dashboard
Create Dashboard:
aws cloudwatch put-dashboard --dashboard-name nb-rag-sys-main --dashboard-body file://dashboard.json
Dashboard JSON (dashboard.json):
{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "title": "API Gateway Requests",
        "region": "us-east-1",
        "metrics": [
          ["AWS/ApiGateway", "Count", "ApiName", "nb-rag-sys-api", {"stat": "Sum", "label": "Total Requests"}],
          [".", "4XXError", ".", ".", {"stat": "Sum", "label": "4xx Errors"}],
          [".", "5XXError", ".", ".", {"stat": "Sum", "label": "5xx Errors"}]
        ],
        "period": 300,
        "yAxis": {"left": {"min": 0}}
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "Lambda Duration (Chat)",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Lambda", "Duration", "FunctionName", "nb-rag-sys-chat", {"stat": "Average", "label": "Average"}],
          ["...", {"stat": "p95", "label": "p95"}],
          ["...", {"stat": "Maximum", "label": "Maximum"}]
        ],
        "period": 300,
        "yAxis": {"left": {"min": 0, "max": 10000}}
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "Lambda Errors",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Lambda", "Errors", "FunctionName", "nb-rag-sys-chat", {"stat": "Sum", "label": "Chat"}],
          [".", ".", ".", "nb-rag-sys", {"stat": "Sum", "label": "Classification"}],
          [".", ".", ".", "nb-rag-sys-fathom-webhook", {"stat": "Sum", "label": "Fathom Webhook"}]
        ],
        "period": 300,
        "yAxis": {"left": {"min": 0}}
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "Bedrock Invocations",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Bedrock", "Invocations", "ModelId", "us.anthropic.claude-sonnet-4-5-20250929-v1:0", {"stat": "Sum", "label": "Claude Sonnet 4.5"}],
          [".", "ModelInvocationLatency", ".", ".", {"stat": "Average", "label": "Latency (ms)", "yAxis": "right"}]
        ],
        "period": 3600,
        "yAxis": {"left": {"min": 0}}
      }
    },
    {
      "type": "log",
      "properties": {
        "title": "Recent Errors",
        "region": "us-east-1",
        "query": "SOURCE '/aws/lambda/nb-rag-sys-chat'\n| fields @timestamp, @message\n| filter @message like /ERROR/\n| sort @timestamp desc\n| limit 20"
      }
    }
  ]
}
Cost Dashboard
Track Costs by Service (note: AWS/Billing metrics are published only in us-east-1 and require billing alerts to be enabled for the account):
{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "title": "Estimated Monthly Cost",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Billing", "EstimatedCharges", "Currency", "USD", {"stat": "Maximum"}]
        ],
        "period": 21600,
        "yAxis": {"left": {"min": 0}}
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "Cost by Service",
        "region": "us-east-1",
        "metrics": [
          ["AWS/Billing", "EstimatedCharges", "ServiceName", "AWSLambda", "Currency", "USD", {"stat": "Maximum", "label": "Lambda"}],
          [".", ".", ".", "AmazonBedrock", ".", ".", {"stat": "Maximum", "label": "Bedrock"}],
          [".", ".", ".", "AmazonApiGateway", ".", ".", {"stat": "Maximum", "label": "API Gateway"}],
          [".", ".", ".", "AmazonS3", ".", ".", {"stat": "Maximum", "label": "S3"}],
          [".", ".", ".", "AmazonDynamoDB", ".", ".", {"stat": "Maximum", "label": "DynamoDB"}]
        ],
        "period": 21600,
        "yAxis": {"left": {"min": 0}}
      }
    }
  ]
}
CloudWatch Alarms
Critical Alarms
High Error Rate
API Gateway 5xx Errors:
resource "aws_cloudwatch_metric_alarm" "api_5xx_errors" {
  alarm_name          = "nb-rag-sys-api-5xx-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "5XXError"
  namespace           = "AWS/ApiGateway"
  period              = "300"
  statistic           = "Sum"
  threshold           = "5"
  alarm_description   = "Alert when API Gateway returns 5 or more 5xx errors in 5 minutes"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ApiName = "nb-rag-sys-api"
  }
}
Lambda Errors:
resource "aws_cloudwatch_metric_alarm" "lambda_errors" {
  alarm_name          = "nb-rag-sys-lambda-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "Errors"
  namespace           = "AWS/Lambda"
  period              = "300"
  statistic           = "Sum"
  threshold           = "10"
  alarm_description   = "Alert when Lambda function has 10+ errors in 5 minutes"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    FunctionName = "nb-rag-sys-chat"
  }
}
High Latency
API Gateway Latency:
resource "aws_cloudwatch_metric_alarm" "api_latency" {
  alarm_name          = "nb-rag-sys-api-latency"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "Latency"
  namespace           = "AWS/ApiGateway"
  period              = "300"
  statistic           = "Average"
  threshold           = "5000" # 5 seconds
  alarm_description   = "Alert when API Gateway latency exceeds 5 seconds"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ApiName = "nb-rag-sys-api"
  }
}
Lambda Duration:
resource "aws_cloudwatch_metric_alarm" "lambda_duration" {
  alarm_name          = "nb-rag-sys-lambda-duration"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "Duration"
  namespace           = "AWS/Lambda"
  period              = "300"
  statistic           = "Average"
  threshold           = "10000" # 10 seconds
  alarm_description   = "Alert when Lambda duration exceeds 10 seconds"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    FunctionName = "nb-rag-sys-chat"
  }
}
Lambda Throttling
resource "aws_cloudwatch_metric_alarm" "lambda_throttles" {
  alarm_name          = "nb-rag-sys-lambda-throttles"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "1"
  metric_name         = "Throttles"
  namespace           = "AWS/Lambda"
  period              = "300"
  statistic           = "Sum"
  threshold           = "10"
  alarm_description   = "Alert when Lambda throttles occur"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    FunctionName = "nb-rag-sys-chat"
  }
}
Warning Alarms
High Cost
resource "aws_cloudwatch_metric_alarm" "high_cost" {
  alarm_name          = "nb-rag-sys-high-cost"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "1"
  metric_name         = "EstimatedCharges"
  namespace           = "AWS/Billing"
  period              = "21600" # 6 hours
  statistic           = "Maximum"
  threshold           = "200" # $200/month
  alarm_description   = "Alert when monthly cost exceeds $200"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    Currency = "USD"
  }
}
Low Traffic (Anomaly Detection)
resource "aws_cloudwatch_metric_alarm" "low_traffic" {
  alarm_name          = "nb-rag-sys-low-traffic"
  comparison_operator = "LessThanLowerThreshold"
  evaluation_periods  = "2"
  threshold_metric_id = "ad1"
  alarm_description   = "Alert when traffic drops below expected baseline"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  metric_query {
    id          = "m1"
    return_data = true

    metric {
      metric_name = "Count"
      namespace   = "AWS/ApiGateway"
      period      = "300"
      stat        = "Sum"

      dimensions = {
        ApiName = "nb-rag-sys-api"
      }
    }
  }

  metric_query {
    id          = "ad1"
    expression  = "ANOMALY_DETECTION_BAND(m1, 2)"
    label       = "Traffic (expected)"
    return_data = true
  }
}
SNS Topic for Alerts
resource "aws_sns_topic" "alerts" {
  name = "nb-rag-sys-alerts"
}

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = "alerts@yourcompany.com"
}

resource "aws_sns_topic_subscription" "slack" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "https"
  endpoint  = "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
}
CloudWatch Logs
Log Groups
| Lambda Function | Log Group | Retention |
|---|---|---|
| Chat | /aws/lambda/nb-rag-sys-chat | 7 days |
| Classification | /aws/lambda/nb-rag-sys | 7 days |
| Ingest | /aws/lambda/nb-rag-sys-ingest | 7 days |
| Fathom Webhook | /aws/lambda/nb-rag-sys-fathom-webhook | 7 days |
| HelpScout Webhook | /aws/lambda/nb-rag-sys-helpscout-webhook | 7 days |
Log Queries
Find All Errors
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 100
Find Slow Requests
fields @timestamp, @message, @duration
| filter @duration > 5000
| sort @duration desc
| limit 20
Count Errors by Function
fields @log as log_group
| filter @message like /ERROR/
| stats count(*) as error_count by log_group
| sort error_count desc
Track User Queries
fields @timestamp, @message
| filter @message like /Query:/
| parse @message 'Query: *' as query
| display @timestamp, query
| sort @timestamp desc
| limit 50
Bedrock Token Usage
fields @timestamp, @message
| filter @message like /Bedrock invocation/
| parse @message 'input_tokens=* output_tokens=*' as input, output
| stats sum(input) as total_input, sum(output) as total_output by bin(5m)
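The token sums from the query above can be turned into a rough spend estimate. A sketch (the per-1K-token rates below are placeholders, not actual Bedrock pricing; substitute the published rates for your model and region):

```python
# Placeholder rates in USD per 1,000 tokens -- NOT real pricing.
INPUT_RATE_PER_1K = 0.003
OUTPUT_RATE_PER_1K = 0.015

def estimate_llm_cost(total_input_tokens: int, total_output_tokens: int) -> float:
    """Rough LLM cost estimate from aggregate token counts,
    e.g. the total_input/total_output sums from the query above."""
    return (total_input_tokens / 1000 * INPUT_RATE_PER_1K
            + total_output_tokens / 1000 * OUTPUT_RATE_PER_1K)
```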
Structured Logging
Lambda Logger Setup:
import json
import logging
import time
from datetime import datetime

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def log_structured(level, message, **kwargs):
    log_entry = {
        'timestamp': datetime.utcnow().isoformat(),
        'level': level,
        'message': message,
        **kwargs
    }
    logger.log(getattr(logging, level), json.dumps(log_entry))

def handler(event, context):
    start = time.time()
    request_id = context.aws_request_id
    user_id = event['requestContext']['authorizer']['claims']['sub']
    log_structured('INFO', 'Chat request received',
                   request_id=request_id,
                   user_id=user_id)
    try:
        result = process_request(event)
        log_structured('INFO', 'Chat request completed',
                       request_id=request_id,
                       user_id=user_id,
                       duration_ms=int((time.time() - start) * 1000))
        return result
    except Exception as e:
        log_structured('ERROR', 'Chat request failed',
                       request_id=request_id,
                       user_id=user_id,
                       error=str(e))
        raise
Query Structured Logs:
Because the entries are JSON, CloudWatch Logs Insights discovers the fields automatically; no parse step is needed:

fields @timestamp, level, message, request_id, user_id
| filter level = "ERROR"
| stats count(*) by user_id
Distributed Tracing (Optional)
AWS X-Ray Integration
Enable X-Ray in Lambda:
resource "aws_lambda_function" "chat" {
  # ... existing function configuration ...

  tracing_config {
    mode = "Active"
  }
}
Instrument Lambda Code:
import json

from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core import patch_all

# Patch AWS SDK calls so boto3 requests appear as subsegments
patch_all()

def handler(event, context):
    # Subsegment for Bedrock Knowledge Base retrieval
    with xray_recorder.capture('bedrock_kb_retrieve'):
        retrieval_result = bedrock_agent.retrieve(
            knowledgeBaseId=KNOWLEDGE_BASE_ID,
            retrievalQuery={'text': query}
        )

    # Subsegment for Bedrock LLM inference
    with xray_recorder.capture('bedrock_inference'):
        response = bedrock.invoke_model(
            modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
            body=json.dumps(prompt)
        )

    return response
View Traces:
# Get trace IDs for slow requests
aws xray get-trace-summaries \
--start-time $(date -u -d '1 hour ago' +%s) \
--end-time $(date -u +%s) \
--filter-expression 'duration > 5'
Service Map:
- Visualize request flow: API Gateway → Lambda → Bedrock
- Identify bottlenecks
- Track downstream dependencies
Performance Monitoring
Lambda Performance Tuning
Memory vs Duration Tradeoff:
# Test different memory settings
for mem in 512 1024 1536 2048; do
  aws lambda update-function-configuration \
    --function-name nb-rag-sys-chat \
    --memory-size "$mem"

  # Wait for the configuration update to finish
  aws lambda wait function-updated --function-name nb-rag-sys-chat

  # Invoke and measure (client-side timing includes network overhead)
  time aws lambda invoke --function-name nb-rag-sys-chat /dev/null
done
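Client-side time measures the whole round trip, network included. The authoritative numbers are in the REPORT line Lambda writes at the end of every invocation (retrievable with aws lambda invoke --log-type Tail). A small parser for those lines (the regex assumes the standard REPORT format):

```python
import re

_REPORT_RE = re.compile(
    r'Billed Duration: (?P<billed>[\d.]+) ms.*'
    r'Memory Size: (?P<mem>\d+) MB.*'
    r'Max Memory Used: (?P<used>\d+) MB'
)

def parse_report_line(report: str) -> dict:
    """Extract billed duration and memory figures from a Lambda
    REPORT log line."""
    m = _REPORT_RE.search(report)
    if not m:
        raise ValueError('not a REPORT line')
    return {
        'billed_ms': float(m.group('billed')),
        'memory_mb': int(m.group('mem')),
        'max_used_mb': int(m.group('used')),
    }
```

Comparing billed_ms across the memory settings from the loop above gives a fairer cost/latency tradeoff than wall-clock time.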
Cold Start Monitoring:
fields @timestamp, @initDuration
| filter @initDuration > 1000
| stats count(*) as cold_starts, avg(@initDuration) as avg_cold_start_ms by bin(1h)
Provisioned Concurrency Analysis:
fields @timestamp, @message
| filter @message like /INIT_START/
| stats count(*) as init_events by bin(1h)
Bedrock Performance
Track Token Usage:
from shared.utils.metrics import RAGMetrics

metrics = RAGMetrics(cloudwatch_client)

# Record LLM generation metrics (includes token counts)
metrics.record_llm_generation(
    latency_ms=2500,
    input_tokens=input_tokens,
    output_tokens=output_tokens
)
Latency Breakdown:
fields @timestamp, @message
| filter @message like /Bedrock/
| parse @message 'embedding_time=*ms retrieval_time=*ms inference_time=*ms'
as embed_ms, retrieval_ms, inference_ms
| stats avg(embed_ms) as avg_embed, avg(retrieval_ms) as avg_retrieval, avg(inference_ms) as avg_inference
Health Checks
API Health Endpoint
Create Health Check Lambda:
import json
from datetime import datetime

import boto3

bedrock = boto3.client('bedrock')
bedrock_agent = boto3.client('bedrock-agent')
dynamodb = boto3.client('dynamodb')

def health_check_handler(event, context):
    checks = {}

    # Check Bedrock connectivity
    try:
        bedrock.list_foundation_models()
        checks['bedrock'] = 'ok'
    except Exception as e:
        checks['bedrock'] = f'error: {str(e)}'

    # Check Knowledge Base connectivity
    try:
        bedrock_agent.get_knowledge_base(knowledgeBaseId='[kb-id]')
        checks['knowledge_base'] = 'ok'
    except Exception as e:
        checks['knowledge_base'] = f'error: {str(e)}'

    # Check DynamoDB connectivity
    try:
        dynamodb.describe_table(TableName='nb-rag-sys')
        checks['dynamodb'] = 'ok'
    except Exception as e:
        checks['dynamodb'] = f'error: {str(e)}'

    all_healthy = all(v == 'ok' for v in checks.values())
    return {
        'statusCode': 200 if all_healthy else 503,
        'body': json.dumps({
            'status': 'healthy' if all_healthy else 'degraded',
            'checks': checks,
            'timestamp': datetime.utcnow().isoformat()
        })
    }
Route53 Health Check (optional):
resource "aws_route53_health_check" "api" {
  fqdn              = "api.yourdomain.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = "3"
  request_interval  = "30"

  tags = {
    Name = "nb-rag-sys-api-health"
  }
}
Synthetic Monitoring
CloudWatch Synthetics Canary:
resource "aws_synthetics_canary" "api" {
  name                 = "nb-rag-sys-api-canary"
  artifact_s3_location = "s3://${aws_s3_bucket.canary_artifacts.bucket}"
  execution_role_arn   = aws_iam_role.canary.arn
  handler              = "index.handler"
  zip_file             = "canary.zip"
  runtime_version      = "syn-python-selenium-1.0"

  schedule {
    expression = "rate(5 minutes)"
  }
}
Canary Script (canary.py):
from aws_synthetics.selenium import synthetics_webdriver as webdriver
from aws_synthetics.common import synthetics_logger as logger

def main():
    driver = webdriver.Chrome()
    driver.get("https://yourdomain.com")

    # Wait for page load
    driver.implicitly_wait(10)

    # Check for key elements
    assert "NorthBuilt" in driver.title
    assert driver.find_element_by_id("chat-input")

    logger.info("Page loaded successfully")
    driver.quit()

def handler(event, context):
    return main()
Last updated: 2026-01-16