Monitoring & Observability
Comprehensive guide to monitoring the NorthBuilt RAG System.
Overview
The system uses AWS CloudWatch for centralized monitoring, logging, and alerting across all components.
┌─────────────────────────────────────────────────────────────────┐
│                       CloudWatch Dashboard                       │
│   - API Gateway Metrics                                          │
│   - Lambda Performance                                           │
│   - Bedrock Usage                                                │
│   - Cost Tracking                                                │
└──────┬────────────────┬────────────────┬────────────────┬───────┘
       │                │                │                │
  ┌────▼────┐      ┌────▼────┐      ┌────▼────┐      ┌────▼────┐
  │  Logs   │      │ Metrics │      │ Alarms  │      │ Insights│
  │ (7-day) │      │(Custom) │      │  (SNS)  │      │ (Query) │
  └─────────┘      └─────────┘      └─────────┘      └─────────┘
Key Metrics
API Gateway Metrics
| Metric | Description | Good | Warning | Critical |
|---|---|---|---|---|
| 4XXError | Client errors | <5% | 5-10% | >10% |
| 5XXError | Server errors | <0.1% | 0.1-1% | >1% |
| Latency | p95 response time | <2s | 2-5s | >5s |
| Count | Total requests | N/A | N/A | >1000/min |
CloudWatch Query:
# Get error rate for last hour
aws cloudwatch get-metric-statistics \
--namespace AWS/ApiGateway \
--metric-name 4XXError \
--dimensions Name=ApiName,Value=nb-rag-sys-api \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Sum
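The thresholds in the table above are error rates (percentages), while the CLI returns raw sums. A minimal boto3 sketch, assuming the same nb-rag-sys-api API name and default credentials, that divides 4XXError by Count to get a percentage:
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client('cloudwatch')

def api_metric_sum(metric_name, minutes=60):
    # Sum an AWS/ApiGateway metric for nb-rag-sys-api over the last `minutes` minutes
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/ApiGateway',
        MetricName=metric_name,
        Dimensions=[{'Name': 'ApiName', 'Value': 'nb-rag-sys-api'}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=300,
        Statistics=['Sum'],
    )
    return sum(dp['Sum'] for dp in response['Datapoints'])

errors = api_metric_sum('4XXError')
requests = api_metric_sum('Count')
rate = (errors / requests * 100) if requests else 0.0
print(f'4xx error rate over the last hour: {rate:.2f}%')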
Lambda Metrics
| Metric | Function | Good | Warning | Critical |
|---|---|---|---|---|
| Duration | Chat | <3s | 3-5s | >5s |
| Duration | Classify | <2s | 2-5s | >5s |
| Duration | Webhooks | <5s | 5-10s | >10s |
| Errors | All | <0.1% | 0.1-1% | >1% |
| Throttles | All | 0 | 1-10 | >10 |
| ConcurrentExecutions | Chat | <5 | 5-8 | >8 |
CloudWatch Query:
# Get average duration for Chat Lambda
aws cloudwatch get-metric-statistics \
--namespace AWS/Lambda \
--metric-name Duration \
--dimensions Name=FunctionName,Value=nb-rag-sys-chat \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Average
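To compare live numbers against the table, the same statistics can be pulled per function with boto3. A sketch, assuming the function names used elsewhere in this guide, that flags any function whose p95 Duration exceeds its warning threshold:
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client('cloudwatch')

# Warning thresholds (ms) from the table above; function names as used in this guide.
THRESHOLDS_MS = {
    'nb-rag-sys-chat': 3000,
    'nb-rag-sys-classify': 2000,
    'nb-rag-sys-fathom-webhook': 5000,
}

end = datetime.now(timezone.utc)
for function_name, warn_ms in THRESHOLDS_MS.items():
    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/Lambda',
        MetricName='Duration',
        Dimensions=[{'Name': 'FunctionName', 'Value': function_name}],
        StartTime=end - timedelta(hours=1),
        EndTime=end,
        Period=3600,
        ExtendedStatistics=['p95'],
    )
    for dp in response['Datapoints']:
        p95 = dp['ExtendedStatistics']['p95']
        status = 'WARN' if p95 > warn_ms else 'ok'
        print(f'{function_name}: p95={p95:.0f}ms ({status})')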
Bedrock Metrics
| Metric | Description | Good | Warning | Critical |
|---|---|---|---|---|
| Invocations | Model calls | N/A | N/A | >1000/hour |
| InvocationLatency | Response time | <2s | 2-5s | >5s |
| InvocationClientErrors | 4xx errors | 0 | 1-10 | >10 |
| InvocationServerErrors | 5xx errors | 0 | 1-5 | >5 |
CloudWatch Query:
# Get Bedrock invocations
aws cloudwatch get-metric-statistics \
--namespace AWS/Bedrock \
--metric-name Invocations \
--dimensions Name=ModelId,Value=us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 3600 \
--statistics Sum
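AWS/Bedrock also publishes per-model token-count metrics, which pair well with the invocation count when estimating spend. A hedged sketch, assuming the InputTokenCount and OutputTokenCount runtime metrics with the ModelId dimension:
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client('cloudwatch')
MODEL_ID = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0'

def token_sum(metric_name):
    # Sum a token-count metric for the model over the last hour
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/Bedrock',
        MetricName=metric_name,
        Dimensions=[{'Name': 'ModelId', 'Value': MODEL_ID}],
        StartTime=end - timedelta(hours=1),
        EndTime=end,
        Period=3600,
        Statistics=['Sum'],
    )
    return sum(dp['Sum'] for dp in response['Datapoints'])

print('input tokens: ', token_sum('InputTokenCount'))
print('output tokens:', token_sum('OutputTokenCount'))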
Custom Application Metrics
Publishing Custom Metrics (in Lambda):
import time

import boto3
from datetime import datetime
cloudwatch = boto3.client('cloudwatch')
def publish_metric(metric_name, value, unit='None', dimensions=None):
cloudwatch.put_metric_data(
Namespace='NorthBuilt/RAG',
MetricData=[
{
'MetricName': metric_name,
'Value': value,
'Unit': unit,
'Timestamp': datetime.utcnow(),
'Dimensions': dimensions or []
}
]
)
# Example usage in Chat Lambda
def handler(event, context):
start_time = time.time()
# Process request...
result = process_chat_request(event)
# Publish metrics
duration = time.time() - start_time
publish_metric('ChatRequestDuration', duration, 'Seconds')
publish_metric('ChatRequestSuccess', 1 if result else 0, 'Count')
publish_metric('RetrievedDocuments', len(result.get('sources', [])), 'Count')
return result
Custom Metrics to Track:
- ChatRequestDuration - End-to-end chat request time
- QueryRetrievalTime - Time to retrieve documents
- BedrockInferenceTime - Time for Bedrock response
- RetrievedDocuments - Number of documents retrieved
- CacheHitRate - Query cache hit percentage (if implemented)
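Calling put_metric_data adds a synchronous API call to every invocation. A common alternative is the CloudWatch Embedded Metric Format (EMF): print one structured JSON line and let CloudWatch Logs extract the metric asynchronously. A minimal sketch that mirrors the publish_metric helper above (the namespace matches it; the dimension names are illustrative):
import json
import time

def emit_emf(metric_name, value, unit='None', dimensions=None):
    # Build an EMF record: metadata under "_aws", metric and dimension values at the root
    dimensions = dimensions or {}
    record = {
        '_aws': {
            'Timestamp': int(time.time() * 1000),
            'CloudWatchMetrics': [{
                'Namespace': 'NorthBuilt/RAG',
                'Dimensions': [list(dimensions.keys())],
                'Metrics': [{'Name': metric_name, 'Unit': unit}],
            }],
        },
        metric_name: value,
        **dimensions,
    }
    # Lambda forwards stdout to CloudWatch Logs; the metric is extracted from the log line
    print(json.dumps(record))

# Example: same metric as publish_metric('ChatRequestDuration', ...), no extra API call
emit_emf('ChatRequestDuration', 1.42, 'Seconds', {'Function': 'chat'})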
CloudWatch Dashboards
Main Dashboard
Create Dashboard:
aws cloudwatch put-dashboard --dashboard-name nb-rag-sys-main --dashboard-body file://dashboard.json
Dashboard JSON (dashboard.json):
{
"widgets": [
{
"type": "metric",
"properties": {
"title": "API Gateway Requests",
"region": "us-east-1",
"metrics": [
["AWS/ApiGateway", "Count", {"stat": "Sum", "label": "Total Requests"}],
[".", "4XXError", {"stat": "Sum", "label": "4xx Errors"}],
[".", "5XXError", {"stat": "Sum", "label": "5xx Errors"}]
],
"period": 300,
"yAxis": {"left": {"min": 0}}
}
},
{
"type": "metric",
"properties": {
"title": "Lambda Duration (Chat)",
"region": "us-east-1",
"metrics": [
["AWS/Lambda", "Duration", {"stat": "Average", "label": "Average"}],
["...", {"stat": "p95", "label": "p95"}],
["...", {"stat": "Maximum", "label": "Maximum"}]
],
"period": 300,
"yAxis": {"left": {"min": 0, "max": 10000}}
}
},
{
"type": "metric",
"properties": {
"title": "Lambda Errors",
"region": "us-east-1",
"metrics": [
["AWS/Lambda", "Errors", {"stat": "Sum", "label": "Chat"}, {"dimensions": {"FunctionName": "nb-rag-sys-chat"}}],
["...", {"dimensions": {"FunctionName": "nb-rag-sys-classify"}}],
["...", {"dimensions": {"FunctionName": "nb-rag-sys-fathom-webhook"}}]
],
"period": 300,
"yAxis": {"left": {"min": 0}}
}
},
{
"type": "metric",
"properties": {
"title": "Bedrock Invocations",
"region": "us-east-1",
"metrics": [
["AWS/Bedrock", "Invocations", {"stat": "Sum", "label": "Claude Sonnet 4.5"}],
[".", "ModelInvocationLatency", {"stat": "Average", "label": "Latency (ms)"}]
],
"period": 3600,
"yAxis": {"left": {"min": 0}}
}
},
{
"type": "log",
"properties": {
"title": "Recent Errors",
"region": "us-east-1",
"query": "SOURCE '/aws/lambda/nb-rag-sys-chat'\n| fields @timestamp, @message\n| filter @message like /ERROR/\n| sort @timestamp desc\n| limit 20"
}
}
]
}
Cost Dashboard
Track Costs by Service (AWS/Billing metrics are published only in us-east-1 and require billing alerts to be enabled on the account):
{
"widgets": [
{
"type": "metric",
"properties": {
"title": "Estimated Monthly Cost",
"region": "us-east-1",
"metrics": [
["AWS/Billing", "EstimatedCharges", {"stat": "Maximum"}]
],
"period": 21600,
"yAxis": {"left": {"min": 0}}
}
},
{
"type": "metric",
"properties": {
"title": "Cost by Service",
"region": "us-east-1",
"metrics": [
["AWS/Billing", "EstimatedCharges", {"stat": "Maximum"}, {"dimensions": {"ServiceName": "AWS Lambda"}}],
["...", {"dimensions": {"ServiceName": "Amazon Bedrock"}}],
["...", {"dimensions": {"ServiceName": "Amazon API Gateway"}}],
["...", {"dimensions": {"ServiceName": "Amazon S3"}}],
["...", {"dimensions": {"ServiceName": "Amazon DynamoDB"}}]
],
"period": 21600,
"yAxis": {"left": {"min": 0}}
}
}
]
}
CloudWatch Alarms
Critical Alarms
High Error Rate
API Gateway 5xx Errors:
resource "aws_cloudwatch_metric_alarm" "api_5xx_errors" {
alarm_name = "nb-rag-sys-api-5xx-errors"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "5XXError"
namespace = "AWS/ApiGateway"
period = "300"
statistic = "Sum"
threshold = "5"
alarm_description = "Alert when API Gateway returns more than 5 5xx errors in a 5-minute period"
alarm_actions = [aws_sns_topic.alerts.arn]
dimensions = {
ApiName = "nb-rag-sys-api"
}
}
Lambda Errors:
resource "aws_cloudwatch_metric_alarm" "lambda_errors" {
alarm_name = "nb-rag-sys-lambda-errors"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "Errors"
namespace = "AWS/Lambda"
period = "300"
statistic = "Sum"
threshold = "10"
alarm_description = "Alert when a Lambda function has more than 10 errors in a 5-minute period"
alarm_actions = [aws_sns_topic.alerts.arn]
dimensions = {
FunctionName = "nb-rag-sys-chat"
}
}
High Latency
API Gateway Latency:
resource "aws_cloudwatch_metric_alarm" "api_latency" {
alarm_name = "nb-rag-sys-api-latency"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "Latency"
namespace = "AWS/ApiGateway"
period = "300"
statistic = "Average"
threshold = "5000" # 5 seconds
alarm_description = "Alert when API Gateway latency exceeds 5 seconds"
alarm_actions = [aws_sns_topic.alerts.arn]
dimensions = {
ApiName = "nb-rag-sys-api"
}
}
Lambda Duration:
resource "aws_cloudwatch_metric_alarm" "lambda_duration" {
alarm_name = "nb-rag-sys-lambda-duration"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "Duration"
namespace = "AWS/Lambda"
period = "300"
statistic = "Average"
threshold = "10000" # 10 seconds
alarm_description = "Alert when Lambda duration exceeds 10 seconds"
alarm_actions = [aws_sns_topic.alerts.arn]
dimensions = {
FunctionName = "nb-rag-sys-chat"
}
}
Lambda Throttling
resource "aws_cloudwatch_metric_alarm" "lambda_throttles" {
alarm_name = "nb-rag-sys-lambda-throttles"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "1"
metric_name = "Throttles"
namespace = "AWS/Lambda"
period = "300"
statistic = "Sum"
threshold = "10"
alarm_description = "Alert when Lambda throttles exceed 10 in a 5-minute period"
alarm_actions = [aws_sns_topic.alerts.arn]
dimensions = {
FunctionName = "nb-rag-sys-chat"
}
}
Warning Alarms
High Cost
resource "aws_cloudwatch_metric_alarm" "high_cost" {
alarm_name = "nb-rag-sys-high-cost"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "1"
metric_name = "EstimatedCharges"
namespace = "AWS/Billing"
period = "21600" # 6 hours
statistic = "Maximum"
threshold = "200" # $200/month
alarm_description = "Alert when monthly cost exceeds $200"
alarm_actions = [aws_sns_topic.alerts.arn]
dimensions = {
Currency = "USD"
}
}
Low Traffic (Anomaly Detection)
resource "aws_cloudwatch_metric_alarm" "low_traffic" {
alarm_name = "nb-rag-sys-low-traffic"
comparison_operator = "LessThanLowerThreshold"
evaluation_periods = "2"
threshold_metric_id = "ad1"
alarm_description = "Alert when traffic drops below expected baseline"
alarm_actions = [aws_sns_topic.alerts.arn]
metric_query {
id = "m1"
return_data = true
metric {
metric_name = "Count"
namespace = "AWS/ApiGateway"
period = "300"
stat = "Sum"
dimensions = {
ApiName = "nb-rag-sys-api"
}
}
}
metric_query {
id = "ad1"
expression = "ANOMALY_DETECTION_BAND(m1, 2)"
label = "Traffic (expected)"
return_data = true
}
}
SNS Topic for Alerts
resource "aws_sns_topic" "alerts" {
name = "nb-rag-sys-alerts"
}
resource "aws_sns_topic_subscription" "email" {
topic_arn = aws_sns_topic.alerts.arn
protocol = "email"
endpoint = "alerts@yourcompany.com"
}
resource "aws_sns_topic_subscription" "slack" {
topic_arn = aws_sns_topic.alerts.arn
protocol = "https"
endpoint = "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
}
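Slack incoming webhooks do not accept the raw SNS payload, so the https subscription above is typically replaced by a small relay Lambda subscribed to the topic. A sketch of such a relay (SLACK_WEBHOOK_URL is an assumed environment variable, not part of the stack above):
import json
import os
import urllib.request

SLACK_WEBHOOK_URL = os.environ['SLACK_WEBHOOK_URL']  # assumed to be configured on the function

def handler(event, context):
    # Each SNS record carries the CloudWatch alarm notification as a JSON string
    for record in event['Records']:
        alarm = json.loads(record['Sns']['Message'])
        text = (
            f":rotating_light: *{alarm.get('AlarmName')}* is {alarm.get('NewStateValue')}\n"
            f"{alarm.get('NewStateReason')}"
        )
        request = urllib.request.Request(
            SLACK_WEBHOOK_URL,
            data=json.dumps({'text': text}).encode('utf-8'),
            headers={'Content-Type': 'application/json'},
        )
        urllib.request.urlopen(request)
    return {'statusCode': 200}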
CloudWatch Logs
Log Groups
| Lambda Function | Log Group | Retention |
|---|---|---|
| Chat | /aws/lambda/nb-rag-sys-chat | 7 days |
| Classify | /aws/lambda/nb-rag-sys-classify | 7 days |
| Ingest | /aws/lambda/nb-rag-sys-ingest | 7 days |
| Fathom Webhook | /aws/lambda/nb-rag-sys-fathom-webhook | 7 days |
| HelpScout Webhook | /aws/lambda/nb-rag-sys-helpscout-webhook | 7 days |
| Linear Webhook | /aws/lambda/nb-rag-sys-linear-webhook | 7 days |
| Linear Sync | /aws/lambda/nb-rag-sys-linear-sync | 7 days |
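New Lambda log groups default to never-expire retention. A boto3 sketch that applies the 7-day retention from the table to every log group with the project prefix:
import boto3

logs = boto3.client('logs')

paginator = logs.get_paginator('describe_log_groups')
for page in paginator.paginate(logGroupNamePrefix='/aws/lambda/nb-rag-sys-'):
    for group in page['logGroups']:
        # Enforce the retention policy listed in the table above
        logs.put_retention_policy(
            logGroupName=group['logGroupName'],
            retentionInDays=7,
        )
        print(f"set 7-day retention on {group['logGroupName']}")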
Log Queries
Find All Errors
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 100
Find Slow Requests
fields @timestamp, @message, @duration
| filter @duration > 5000
| sort @duration desc
| limit 20
Count Errors by Function
fields @log as log_group
| filter @message like /ERROR/
| stats count(*) as error_count by log_group
| sort error_count desc
Track User Queries
fields @timestamp, @message
| filter @message like /Query:/
| parse @message 'Query: *' as query
| display @timestamp, query
| sort @timestamp desc
| limit 50
Bedrock Token Usage
fields @timestamp, @message
| filter @message like /Bedrock invocation/
| parse @message 'input_tokens=* output_tokens=*' as input, output
| stats sum(input) as total_input, sum(output) as total_output by bin(5m)
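These queries can also be run programmatically. A sketch that executes the "Find All Errors" query above against the Chat log group via the Logs Insights API (start_query / get_query_results):
import time
from datetime import datetime, timedelta, timezone

import boto3

logs = boto3.client('logs')

QUERY = """
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 100
"""

end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)
query_id = logs.start_query(
    logGroupName='/aws/lambda/nb-rag-sys-chat',
    startTime=int(start.timestamp()),
    endTime=int(end.timestamp()),
    queryString=QUERY,
)['queryId']

# Logs Insights queries run asynchronously; poll until completion
while True:
    result = logs.get_query_results(queryId=query_id)
    if result['status'] in ('Complete', 'Failed', 'Cancelled'):
        break
    time.sleep(1)

for row in result.get('results', []):
    print({field['field']: field['value'] for field in row})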
Structured Logging
Lambda Logger Setup:
import json
import logging
import time
from datetime import datetime
logger = logging.getLogger()
logger.setLevel(logging.INFO)
def log_structured(level, message, **kwargs):
log_entry = {
'timestamp': datetime.utcnow().isoformat(),
'level': level,
'message': message,
**kwargs
}
logger.log(getattr(logging, level), json.dumps(log_entry))
def handler(event, context):
    start = time.time()
    request_id = context.aws_request_id
    user_id = event['requestContext']['authorizer']['claims']['sub']
    log_structured('INFO', 'Chat request received',
                   request_id=request_id,
                   user_id=user_id)
    try:
        result = process_request(event)
        log_structured('INFO', 'Chat request completed',
                       request_id=request_id,
                       user_id=user_id,
                       duration_ms=int((time.time() - start) * 1000))
        return result
    except Exception as e:
        log_structured('ERROR', 'Chat request failed',
                       request_id=request_id,
                       user_id=user_id,
                       error=str(e))
        raise
Query Structured Logs:
fields @timestamp, @message
| parse @message '{"timestamp": "*", "level": "*", "message": "*", "request_id": "*", "user_id": "*"}'
as timestamp, level, message, request_id, user_id
| filter level = "ERROR"
| stats count() by user_id
Distributed Tracing (Optional)
AWS X-Ray Integration
Enable X-Ray in Lambda:
resource "aws_lambda_function" "chat" {
tracing_config {
mode = "Active"
}
}
Instrument Lambda Code:
import json
import os

import boto3
from aws_xray_sdk.core import xray_recorder, patch_all

# Patch boto3/botocore so downstream AWS SDK calls appear as X-Ray subsegments
patch_all()

bedrock = boto3.client('bedrock-runtime')
bedrock_agent = boto3.client('bedrock-agent-runtime')
KNOWLEDGE_BASE_ID = os.environ['KNOWLEDGE_BASE_ID']  # assumed to be set on the function
def handler(event, context):
# Subsegment for Bedrock Knowledge Base retrieval
    with xray_recorder.in_subsegment('bedrock_kb_retrieve'):
retrieval_result = bedrock_agent.retrieve(
knowledgeBaseId=KNOWLEDGE_BASE_ID,
retrievalQuery={'text': query}
)
# Subsegment for Bedrock LLM inference
    with xray_recorder.in_subsegment('bedrock_inference'):
response = bedrock.invoke_model(
modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
body=json.dumps(prompt)
)
return response
View Traces:
# Get trace IDs for slow requests
aws xray get-trace-summaries \
--start-time $(date -u -d '1 hour ago' +%s) \
--end-time $(date -u +%s) \
--filter-expression 'duration > 5'
Service Map:
- Visualize request flow: API Gateway → Lambda → Bedrock
- Identify bottlenecks
- Track downstream dependencies
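The same trace summaries are available from boto3 if you prefer scripting over the CLI. A sketch that lists the ten slowest traces from the last hour, using the same filter expression as above (pagination omitted for brevity):
from datetime import datetime, timedelta, timezone

import boto3

xray = boto3.client('xray')

end = datetime.now(timezone.utc)
response = xray.get_trace_summaries(
    StartTime=end - timedelta(hours=1),
    EndTime=end,
    FilterExpression='duration > 5',
)

# Sort by total trace duration (seconds) and print the worst offenders
for summary in sorted(response['TraceSummaries'],
                      key=lambda s: s.get('Duration', 0), reverse=True)[:10]:
    print(summary['Id'], f"{summary.get('Duration', 0):.1f}s")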
Performance Monitoring
Lambda Performance Tuning
Memory vs Duration Tradeoff:
# Test different memory settings
for mem in 512 1024 1536 2048; do
aws lambda update-function-configuration \
--function-name nb-rag-sys-chat \
--memory-size $mem
# Wait for update
sleep 10
# Invoke and measure
time aws lambda invoke --function-name nb-rag-sys-chat /dev/null
done
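The time command above measures client-side wall time, which includes network overhead. Billed duration can instead be read from the REPORT line returned in the invocation's log tail; a boto3 sketch:
import base64
import re

import boto3

lambda_client = boto3.client('lambda')

def billed_duration_ms(function_name, payload=b'{}'):
    # LogType='Tail' returns the last 4 KB of the execution log, base64-encoded
    response = lambda_client.invoke(
        FunctionName=function_name,
        Payload=payload,
        LogType='Tail',
    )
    log_tail = base64.b64decode(response['LogResult']).decode('utf-8')
    match = re.search(r'Billed Duration: (\d+) ms', log_tail)
    return int(match.group(1)) if match else None

print(billed_duration_ms('nb-rag-sys-chat'))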
Cold Start Monitoring:
fields @timestamp, @message, @initDuration
| filter @initDuration > 1000
| stats count() as cold_starts, avg(@initDuration) as avg_cold_start_ms
Provisioned Concurrency Analysis:
fields @timestamp, @message
| filter @message like /INIT_START/
| stats count() as cold_start_inits by bin(1h)
Bedrock Performance
Track Token Usage:
def track_bedrock_usage(input_tokens, output_tokens):
cloudwatch.put_metric_data(
Namespace='NorthBuilt/RAG/Bedrock',
MetricData=[
{'MetricName': 'InputTokens', 'Value': input_tokens, 'Unit': 'Count'},
{'MetricName': 'OutputTokens', 'Value': output_tokens, 'Unit': 'Count'},
{'MetricName': 'TotalCost', 'Value': (input_tokens * 0.000003 + output_tokens * 0.000015), 'Unit': 'None'}
]
)
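With the Anthropic Messages API on Bedrock, token counts come back in the response body's usage object, so the helper can be fed directly from an invoke_model call. A sketch (the request/response keys assume the Anthropic message format; adjust for other model families):
import json

import boto3

bedrock = boto3.client('bedrock-runtime')

response = bedrock.invoke_model(
    modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    body=json.dumps({
        'anthropic_version': 'bedrock-2023-05-31',
        'max_tokens': 1024,
        'messages': [{'role': 'user', 'content': 'Summarize our onboarding doc.'}],
    }),
)
body = json.loads(response['body'].read())
usage = body.get('usage', {})

# Feed the track_bedrock_usage helper defined above
track_bedrock_usage(usage.get('input_tokens', 0), usage.get('output_tokens', 0))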
Latency Breakdown:
fields @timestamp, @message
| filter @message like /Bedrock/
| parse @message 'embedding_time=*ms retrieval_time=*ms inference_time=*ms'
as embed_ms, retrieval_ms, inference_ms
| stats avg(embed_ms) as avg_embed, avg(retrieval_ms) as avg_retrieval, avg(inference_ms) as avg_inference
Health Checks
API Health Endpoint
Create Health Check Lambda:
import json
from datetime import datetime

import boto3

bedrock = boto3.client('bedrock')
bedrock_agent = boto3.client('bedrock-agent')
dynamodb = boto3.client('dynamodb')

def health_check_handler(event, context):
checks = {}
# Check Bedrock connectivity
try:
bedrock.list_foundation_models()
checks['bedrock'] = 'ok'
except Exception as e:
checks['bedrock'] = f'error: {str(e)}'
# Check Knowledge Base connectivity
try:
bedrock_agent.get_knowledge_base(knowledgeBaseId='[kb-id]')
checks['knowledge_base'] = 'ok'
except Exception as e:
checks['knowledge_base'] = f'error: {str(e)}'
# Check DynamoDB connectivity
try:
dynamodb.describe_table(TableName='nb-rag-sys-classify')
checks['dynamodb'] = 'ok'
except Exception as e:
checks['dynamodb'] = f'error: {str(e)}'
all_healthy = all(v == 'ok' for v in checks.values())
return {
'statusCode': 200 if all_healthy else 503,
'body': json.dumps({
'status': 'healthy' if all_healthy else 'degraded',
'checks': checks,
'timestamp': datetime.utcnow().isoformat()
})
}
Route53 Health Check (optional):
resource "aws_route53_health_check" "api" {
fqdn = "api.yourdomain.com"
port = 443
type = "HTTPS"
resource_path = "/health"
failure_threshold = "3"
request_interval = "30"
tags = {
Name = "nb-rag-sys-api-health"
}
}
Synthetic Monitoring
CloudWatch Synthetics Canary:
resource "aws_synthetics_canary" "api" {
name = "nb-rag-sys-api-canary"
artifact_s3_location = "s3://${aws_s3_bucket.canary_artifacts.bucket}"
execution_role_arn = aws_iam_role.canary.arn
handler = "index.handler"
zip_file = "canary.zip"
runtime_version = "syn-python-selenium-1.0"
schedule {
expression = "rate(5 minutes)"
}
}
Canary Script (canary.py):
from aws_synthetics.selenium import synthetics_webdriver as webdriver
from aws_synthetics.common import synthetics_logger as logger
def main():
driver = webdriver.Chrome()
driver.get("https://yourdomain.com")
# Wait for page load
driver.implicitly_wait(10)
# Check for key elements
assert "NorthBuilt" in driver.title
assert driver.find_element_by_id("chat-input")
logger.info("Page loaded successfully")
driver.quit()
def handler(event, context):
return main()
Last updated: 2025-12-31