Security Architecture
Comprehensive security architecture and best practices for the NorthBuilt RAG System.
Security Overview
Defense in Depth Strategy
┌─────────────────────────────────────────────────────────────┐
│ Layer 7: Monitoring & Incident Response │
│ - CloudWatch Logs, Alarms │
│ - AWS Config, CloudTrail │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 6: Application Security │
│ - Input validation, Output encoding │
│ - Secure coding practices │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 5: Data Security │
│ - Encryption at rest (S3, DynamoDB, Secrets Manager) │
│ - Encryption in transit (TLS 1.2+) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 4: Access Control │
│ - IAM policies (least privilege) │
│ - Cognito authentication │
│ - API key validation for webhooks │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: Network Security │
│ - HTTPS only (TLS 1.2+) │
│ - API Gateway throttling │
│ - CloudFront DDoS protection (AWS Shield) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Infrastructure Security │
│ - Serverless (no OS patching required) │
│ - Managed services (AWS responsibility) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Physical Security │
│ - AWS data centers (ISO 27001, SOC 2, etc.) │
└─────────────────────────────────────────────────────────────┘
Authentication & Authorization
User Authentication
Google OAuth 2.0 (via Cognito)
Flow:
1. User clicks "Sign in with Google"
2. Redirect to Cognito hosted UI:
https://nb-rag-sys-auth.auth.us-east-1.amazoncognito.com/oauth2/authorize
3. Cognito redirects to Google OAuth
4. User authorizes scopes: openid, email, profile
5. Google returns authorization code
6. Cognito exchanges code for Google tokens
7. Cognito issues JWT tokens:
- ID Token (user identity)
- Access Token (API authorization)
- Refresh Token (long-lived)
8. Client stores tokens in localStorage
9. Client includes ID token in Authorization header
Token Validation:
- JWT signature verified using Cognito public keys (JWKS)
- Token expiration checked (1 hour default)
- Token issuer verified (Cognito user pool)
- Token audience (client ID) verified
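The expiration, issuer, and audience checks listed above can be sketched as a small function. This is a minimal illustration that assumes the signature has already been verified against the Cognito JWKS by a JWT library; the function name `check_claims` and its return shape are hypothetical, not part of the system.

```python
import time


def check_claims(claims, issuer, audience, now=None):
    """Return (ok, reason) for an already signature-verified set of JWT claims."""
    now = time.time() if now is None else now

    # exp is seconds since the epoch; reject tokens at or past expiry
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp:
        return False, "expired"

    # iss must match the user pool's issuer URL exactly
    if claims.get("iss") != issuer:
        return False, "bad issuer"

    # Cognito ID tokens carry the app client ID in aud
    if claims.get("aud") != audience:
        return False, "bad audience"

    return True, None
```

Passing `now` explicitly keeps the check deterministic in tests; in a handler it would default to the current time.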
Security Configuration:
# Cognito User Pool
resource "aws_cognito_user_pool" "main" {
password_policy {
minimum_length = 12
require_lowercase = true
require_numbers = true
require_symbols = true
require_uppercase = true
temporary_password_validity_days = 7
}
account_recovery_setting {
recovery_mechanism {
name = "verified_email"
priority = 1
}
}
mfa_configuration = "OPTIONAL"
user_attribute_update_settings {
attributes_require_verification_before_update = ["email"]
}
}
API Authorization
JWT Authorizer (for /chat endpoint)
API Gateway Configuration:
resource "aws_apigatewayv2_authorizer" "cognito" {
api_id = aws_apigatewayv2_api.main.id
authorizer_type = "JWT"
identity_sources = ["$request.header.Authorization"]
name = "cognito-authorizer"
jwt_configuration {
audience = [aws_cognito_user_pool_client.web.id]
issuer = "https://cognito-idp.us-east-1.amazonaws.com/${aws_cognito_user_pool.main.id}"
}
}
Request Validation:
- Extract JWT from Authorization: Bearer <token> header
- Verify signature using Cognito JWKS
- Check expiration (exp claim)
- Verify issuer (iss claim)
- Verify audience (aud claim)
- Extract user identity (sub claim) for logging
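API Gateway performs these steps itself when the JWT authorizer is attached. If a Lambda function also needs the raw token (for example, to read the sub claim for logging), the first step might look like the sketch below. The helper name `extract_bearer_token` is hypothetical.

```python
def extract_bearer_token(headers):
    """Return the token from an Authorization: Bearer header, or None."""
    if not headers:
        return None
    # HTTP header names are case-insensitive
    for name, value in headers.items():
        if name.lower() == "authorization" and isinstance(value, str):
            scheme, _, token = value.strip().partition(" ")
            if scheme.lower() == "bearer" and token.strip():
                return token.strip()
            return None
    return None
```

Returning None for every malformed case lets the caller map all of them to a single 401 response.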
Error Responses:
- Missing token: 401 Unauthorized
- Invalid signature: 401 Unauthorized
- Expired token: 401 Unauthorized
- Invalid issuer/audience: 403 Forbidden
API Key Validation (for webhooks)
Lambda Validation Code (lambda/webhooks/*/handler.py):
import hmac
import json

import boto3

secrets_manager = boto3.client('secretsmanager')


def validate_api_key(event, secret_name):
    # Extract API key from header or query parameter.
    # Either mapping may be None or absent when the request carries none.
    headers = event.get('headers') or {}
    params = event.get('queryStringParameters') or {}
    api_key = headers.get('x-api-key') or params.get('api_key')
    if not api_key:
        return False, "Missing API key"

    # Fetch expected API key from Secrets Manager
    try:
        secret = secrets_manager.get_secret_value(SecretId=secret_name)
        expected_key = json.loads(secret['SecretString'])['api_key']
    except Exception as e:
        print(f"Error fetching secret: {e}")
        return False, "Internal error"

    # Constant-time comparison to prevent timing attacks
    if not hmac.compare_digest(api_key.encode(), expected_key.encode()):
        return False, "Invalid API key"
    return True, None
Webhook Security Best Practices:
- Use HTTPS only (enforce in webhook configuration)
- Validate API key before processing payload
- Verify webhook signature if provider supports (e.g., HMAC)
- Rate limit webhook endpoints
- Log all webhook attempts for audit
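Where a provider signs its payloads, the signature check mentioned above could be sketched as follows. This assumes a hex-encoded HMAC-SHA256 over the raw request body, which is a common scheme but not one confirmed for any provider named in this document; the header that carries the signature and the exact encoding vary by provider. The function name `verify_signature` is hypothetical.

```python
import hashlib
import hmac


def verify_signature(secret, body, signature_hex):
    """Check a hex-encoded HMAC-SHA256 signature over the raw request body.

    secret and body are bytes; signature_hex is the string sent by the provider.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest takes time independent of where the values differ
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

The check must run over the body bytes exactly as received, before any JSON parsing or re-serialization, because re-encoding can change whitespace and break the signature.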
IAM Policies
Principle of Least Privilege
Lambda Execution Role Example (Chat Lambda):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel"
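],
"Resource": [
"arn:aws:bedrock:us-east-1:ACCOUNT_ID:inference-profile/us.anthropic.claude-sonnet-4-5-20250929-v1:0",
"arn:aws:bedrock:us-east-1:ACCOUNT_ID:inference-profile/us.anthropic.claude-3-5-haiku-20241022-v1:0",
"arn:aws:bedrock:*::foundation-model/anthropic.claude-sonnet-4-5-20250929-v1:0",
"arn:aws:bedrock:*::foundation-model/anthropic.claude-3-5-haiku-20241022-v1:0"
]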
},
{
"Effect": "Allow",
"Action": [
"bedrock:Retrieve"
],
"Resource": [
"arn:aws:bedrock:us-east-1:ACCOUNT_ID:knowledge-base/*"
]
},
{
"Effect": "Allow",
"Action": [
"dynamodb:Query",
"dynamodb:Scan"
],
"Resource": [
"arn:aws:dynamodb:us-east-1:ACCOUNT_ID:table/nb-rag-sys-classify",
"arn:aws:dynamodb:us-east-1:ACCOUNT_ID:table/nb-rag-sys-classify/index/*"
]
},
{
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": [
"arn:aws:logs:us-east-1:ACCOUNT_ID:log-group:/aws/lambda/nb-rag-sys-chat:*"
]
}
]
}
Key Principles:
- Specific resource ARNs (avoid wildcards; in the example above, knowledge-base/* should be narrowed to the actual knowledge base ID)
- Minimum required actions
- Condition keys where applicable
- Separate roles per function
- No shared credentials
GitHub Actions OIDC Role
Trust Policy (allows GitHub to assume role):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/token.actions.githubusercontent.com"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
},
"StringLike": {
"token.actions.githubusercontent.com:sub": "repo:craftcodery/compass:*"
}
}
}
]
}
Permissions Policy:
- Full Terraform deployment permissions
- Read/write S3 (state, web assets)
- Lambda create/update/delete
- IAM role/policy management
- Secrets Manager read (for Terraform variables)
- CloudFront invalidation
- API Gateway management
Security Features:
- Short-lived credentials (~1 hour)
- No long-lived access keys
- Scoped to specific repository
- Can restrict to specific branches via condition
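The branch restriction mentioned in the last item can be sketched by generating the Condition block of the trust policy. This is an illustration only: the helper name `branch_condition` is hypothetical, and the sub format shown applies to workflows triggered from a branch (jobs that use GitHub environments or pull requests carry a different sub value).

```python
import json


def branch_condition(org, repo, branch):
    """Build a trust-policy Condition limited to one branch of one repo."""
    provider = "token.actions.githubusercontent.com"
    return {
        "StringEquals": {
            f"{provider}:aud": "sts.amazonaws.com",
            # GitHub's sub claim for a branch-triggered workflow has this shape
            f"{provider}:sub": f"repo:{org}/{repo}:ref:refs/heads/{branch}",
        }
    }


print(json.dumps(branch_condition("craftcodery", "compass", "main"), indent=2))
```

Using StringEquals on the full sub value, instead of StringLike with a trailing wildcard, means only workflows running on that branch can assume the role.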
Data Security
Encryption at Rest
| Service | Encryption Method | Key Management | Notes |
|---|---|---|---|
| S3 | AES-256 (SSE-S3) | AWS managed | Default for all buckets |
| DynamoDB | AES-256 | AWS managed | Default encryption |
| Secrets Manager | AES-256 | AWS KMS | AWS managed key by default; optional customer managed key |
| Lambda Environment Variables | AES-256 | AWS managed | Optional KMS for sensitive vars |
| CloudWatch Logs | AES-256 | AWS managed | Encrypted by default |
| S3 Vectors | AES-256 | AWS managed | Native Bedrock integration |
Customer Managed Keys (Optional):
resource "aws_kms_key" "secrets" {
description = "KMS key for Secrets Manager"
deletion_window_in_days = 30
enable_key_rotation = true
tags = {
Name = "nb-rag-sys-secrets"
}
}
resource "aws_kms_alias" "secrets" {
name = "alias/nb-rag-sys-secrets"
target_key_id = aws_kms_key.secrets.key_id
}
resource "aws_secretsmanager_secret" "fathom" {
name = "fathom-api-key"
kms_key_id = aws_kms_key.secrets.arn
}
Benefits of CMK:
- Full audit trail in CloudTrail
- Key rotation policy control
- Cross-account access control
- Key deletion protection (30-day window)
Encryption in Transit
TLS Configuration:
- Minimum version: TLS 1.2
- Cipher suites: Modern, secure ciphers only
- Certificate: AWS Certificate Manager (ACM)
- Perfect Forward Secrecy (PFS): Enabled
CloudFront TLS Policy:
resource "aws_cloudfront_distribution" "web" {
viewer_certificate {
cloudfront_default_certificate = false
acm_certificate_arn = aws_acm_certificate.web.arn
minimum_protocol_version = "TLSv1.2_2021"
ssl_support_method = "sni-only"
}
}
Internal Communication:
- Lambda ↔ Bedrock: HTTPS (TLS 1.2+)
- Lambda ↔ S3 Vectors: AWS internal (Bedrock managed)
- Lambda ↔ DynamoDB: AWS SigV4 over HTTPS
- Lambda ↔ Secrets Manager: AWS SigV4 over HTTPS
Secrets Management
AWS Secrets Manager
Secrets Stored:
- fathom-api-key: Fathom API key
- helpscout-api-key: HelpScout API key
- linear-api-key: Linear API key
- google-oauth-client-secret: Google OAuth client secret
Access Control:
resource "aws_secretsmanager_secret_policy" "fathom" {
secret_arn = aws_secretsmanager_secret.fathom.arn
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Principal = {
AWS = aws_iam_role.sync_lambda.arn
}
Action = "secretsmanager:GetSecretValue"
Resource = aws_secretsmanager_secret.fathom.arn
}
]
})
}
Rotation Strategy:
- Manual rotation quarterly
- Automated rotation (optional, requires Lambda rotation function)
- Version tracking (old keys kept for 7 days during rotation)
Best Practices:
- Never commit secrets to Git
- Never log secrets
- Use environment variables in Lambda
- Audit secret access via CloudTrail
- Enable automatic rotation for supported services
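One way to reconcile rotation with Lambda performance is to cache secret values in memory for a short time, so that a rotated key is picked up within a bounded delay without calling Secrets Manager on every invocation. The sketch below is an assumption about how that could be done, not the system's implementation; `SecretCache` and its `fetch` callable (which would wrap `get_secret_value`) are hypothetical names.

```python
import time


class SecretCache:
    """Cache secret values in memory for a limited time."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch          # callable: secret name -> secret value
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # name -> (value, expiry time)

    def get(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]
        # Missing or expired: fetch again and restart the TTL
        value = self._fetch(name)
        self._entries[name] = (value, now + self._ttl)
        return value
```

The TTL is the upper bound on how long a warm Lambda instance keeps using a key after it has been rotated, so it should be well below the period during which the old key stays valid.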
GitHub Secrets
Secrets Required:
- AWS_ROLE_ARN: GitHub Actions OIDC role ARN
- FATHOM_API_KEY: Fathom API key
- GOOGLE_CLIENT_ID: Google OAuth client ID
- GOOGLE_CLIENT_SECRET: Google OAuth client secret
Adding Secrets:
gh secret set AWS_ROLE_ARN --body "arn:aws:iam::ACCOUNT:role/GitHubActionsOIDCRole"
gh secret set FATHOM_API_KEY --body "..."
Security:
- Secrets encrypted at rest by GitHub
- Only accessible during workflow runs
- Masked in logs
- Scoped to repository
- Can restrict to specific environments
Data Privacy
Personal Data Handling
Data Collected:
- User email (from Google OAuth)
- User name (from Google OAuth)
- User profile picture URL (from Google OAuth)
- Query history (optional, not currently stored)
Data Storage:
- Cognito: Email, name, picture URL
- CloudWatch Logs: User sub (UUID), not email
- S3 Vectors: No personal data (only document content)
- DynamoDB: No personal data
Data Retention:
- Cognito: Indefinite (until user deletes account)
- CloudWatch Logs: 7 days
- S3 Vectors: Indefinite (until document deleted from S3)
- DynamoDB: Indefinite (can enable TTL)
GDPR Compliance:
- Right to access: User can request Cognito data export
- Right to deletion: Delete Cognito user + CloudWatch logs
- Right to portability: Export Cognito user attributes
- Right to be forgotten: Delete all references to user sub
Data Anonymization:
- Use Cognito sub (UUID) instead of email in logs
- Avoid logging user queries (unless required for debugging)
- Aggregate metrics only (no individual user tracking)
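The anonymization rules above can be illustrated with a log helper that records the Cognito sub and the size of the query, but neither the email nor the query text. The function name `build_log_entry` and the field names are hypothetical.

```python
import json


def build_log_entry(claims, query, status):
    """Build a structured log line that identifies the user only by sub."""
    entry = {
        "user_sub": claims.get("sub"),   # opaque UUID, not the email
        "query_length": len(query),      # size only; the text is not logged
        "status": status,
    }
    return json.dumps(entry, sort_keys=True)
```

Because the function builds the entry from an explicit set of fields instead of copying the claims, new personal attributes added to the token later do not leak into logs by default.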
Network Security
API Gateway Protection
Rate Limiting
Throttling Configuration:
resource "aws_apigatewayv2_stage" "production" {
api_id = aws_apigatewayv2_api.main.id
name = "production"
auto_deploy = true
default_route_settings {
throttling_rate_limit = 10 # requests per second
throttling_burst_limit = 20 # burst capacity
}
}
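API Gateway describes its throttling as a token bucket, which is why the configuration has both a steady rate and a burst size. The simulation below is only an illustration of how those two numbers interact, not API Gateway's implementation; the class name `TokenBucket` is hypothetical.

```python
class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, holds at most `burst`."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)   # start full
        self.last = now

    def allow(self, now):
        # Add tokens for the elapsed time, capped at the burst size
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With rate 10 and burst 20, a client that has been idle can send 20 requests at once, after which it is held to 10 requests per second until the bucket refills.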
Per-User Rate Limiting (optional):
# In Lambda
import time
from collections import defaultdict

# In-memory rate limiter: state lives in one Lambda execution environment,
# so use DynamoDB for a limit that holds across concurrent instances
request_counts = defaultdict(lambda: {"count": 0, "reset": 0})


def rate_limit(user_id, limit=100, window=3600):
    now = time.time()
    user_data = request_counts[user_id]
    # Start a new window once the previous one has expired
    if now > user_data["reset"]:
        user_data["count"] = 0
        user_data["reset"] = now + window
    user_data["count"] += 1
    if user_data["count"] > limit:
        return False, "Rate limit exceeded"
    return True, None
DDoS Protection
AWS Shield Standard:
- Enabled by default on CloudFront and API Gateway
- Protects against common DDoS attacks (SYN floods, UDP reflection)
- No additional cost
AWS Shield Advanced (optional, $3000/month):
- Advanced DDoS protection
- 24/7 access to the Shield Response Team (SRT), formerly the DDoS Response Team (DRT)
- Cost protection (credits for scaling charges caused by a DDoS attack)
CloudFront as DDoS Mitigation:
- Absorbs traffic at edge locations
- Geo-blocking (block traffic from specific countries)
- Custom error pages (hide origin)
WAF Rules (Optional)
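AWS WAF Configuration (AWS WAF attaches to REST APIs and CloudFront distributions, not directly to HTTP APIs; to protect an HTTP API, place CloudFront in front of it and use scope = "CLOUDFRONT"):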
resource "aws_wafv2_web_acl" "api" {
name = "nb-rag-sys-api-acl"
scope = "REGIONAL"
default_action {
allow {}
}
rule {
name = "RateLimitRule"
priority = 1
statement {
rate_based_statement {
limit = 2000 # per 5 minutes
aggregate_key_type = "IP"
}
}
action {
block {}
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "RateLimitRule"
sampled_requests_enabled = true
}
}
rule {
name = "AWSManagedRulesCommonRuleSet"
priority = 2
override_action {
none {}
}
statement {
managed_rule_group_statement {
name = "AWSManagedRulesCommonRuleSet"
vendor_name = "AWS"
}
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "AWSManagedRulesCommonRuleSet"
sampled_requests_enabled = true
}
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "nb-rag-sys-api-acl"
sampled_requests_enabled = true
}
}
WAF Rules to Consider:
- Rate limiting (per IP)
- Geo-blocking (block non-US traffic)
- SQL injection protection
- XSS protection
- Known bad inputs (AWS managed rules)
CORS Configuration
API Gateway CORS:
resource "aws_apigatewayv2_api" "main" {
cors_configuration {
allow_origins = ["https://yourdomain.com", "http://localhost:8080"]
allow_methods = ["POST", "OPTIONS"]
allow_headers = ["Authorization", "Content-Type"]
max_age = 300
allow_credentials = true
}
}
Security Considerations:
- Whitelist specific origins (no wildcards in production)
- Allow only required methods (POST for /chat, no GET)
- Allow only required headers
- Set max_age to reduce preflight requests
- Enable credentials for cookie-based auth (if used)
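If a Lambda function ever needs to set CORS headers itself (for example, on error responses), the allowlist rule above could be sketched like this. The origins are the placeholder values from the configuration; the function name `cors_headers` is hypothetical.

```python
ALLOWED_ORIGINS = {"https://yourdomain.com", "http://localhost:8080"}


def cors_headers(origin, allowed=ALLOWED_ORIGINS):
    """Return CORS response headers for an allowed origin, else an empty dict."""
    if origin not in allowed:
        # Unknown origin: send no CORS headers so the browser blocks the read
        return {}
    return {
        # Echo the exact origin; "*" is not valid together with credentials
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Credentials": "true",
        "Vary": "Origin",
    }
```

The Vary: Origin header tells caches that the response differs per origin, which matters once more than one origin is allowed.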
Application Security
Input Validation
Lambda Input Validation:
import json


def validate_chat_request(event):
    try:
        # body is None when the request has no payload
        body = json.loads(event.get('body') or '{}')
    except (json.JSONDecodeError, TypeError):
        return False, "Invalid JSON"
    if not isinstance(body, dict):
        return False, "Request body must be a JSON object"

    # Validate query
    query = body.get('query')
    if not query or not isinstance(query, str):
        return False, "Missing or invalid query"
    if len(query) > 1000:
        return False, "Query too long (max 1000 characters)"

    # Validate max_results (bool is a subclass of int, so exclude it explicitly)
    max_results = body.get('max_results', 5)
    if (isinstance(max_results, bool) or not isinstance(max_results, int)
            or not 1 <= max_results <= 10):
        return False, "Invalid max_results (must be 1-10)"
    return True, None


def handler(event, context):
    valid, error = validate_chat_request(event)
    if not valid:
        return {
            'statusCode': 400,
            'body': json.dumps({'error': error})
        }
    # Process request...
Validation Rules:
- Validate all user inputs
- Check data types (string, int, etc.)
- Enforce length limits
- Sanitize special characters
- Reject unexpected fields
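The last two rules are not covered by the validation code above. A minimal sketch of both follows, assuming the request body only ever carries `query` and `max_results`; the names `ALLOWED_FIELDS`, `check_fields`, and `strip_control_chars` are hypothetical.

```python
import unicodedata

ALLOWED_FIELDS = {"query", "max_results"}


def check_fields(body):
    """Return the sorted list of unexpected top-level keys in a request body."""
    return sorted(set(body) - ALLOWED_FIELDS)


def strip_control_chars(text):
    """Drop control characters (Unicode category Cc) except newline and tab."""
    return "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
```

A handler could return 400 when `check_fields` is non-empty, and pass the query through `strip_control_chars` before sending it to the model.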
Output Encoding
Prevent XSS in Responses:
import html
import json


def sanitize_response(text):
    # HTML encode special characters
    return html.escape(text)


def handler(event, context):
    # Generate response from Bedrock
    response = bedrock.invoke_model(...)
    # Sanitize before returning. Encode at one layer only: if the web UI
    # escapes this text again, users see literal entities such as &lt;
    sanitized = sanitize_response(response['answer'])
    return {
        'statusCode': 200,
        'body': json.dumps({
            'answer': sanitized,
            'sources': [...]
        })
    }
Web UI Sanitization:
// Escape HTML before rendering
function escapeHtml(text) {
  const div = document.createElement('div');
  div.textContent = text;
  return div.innerHTML;
}

// Render response
function renderResponse(answer) {
  const escaped = escapeHtml(answer);
  document.getElementById('response').innerHTML = escaped;
}
Dependency Management
Python Dependencies (requirements.txt):
boto3==1.34.0 # AWS SDK
requests==2.32.0 # HTTP client
# Pin versions to avoid supply chain attacks
Security Scanning:
# Check for known vulnerabilities
pip install safety
safety check
# Update dependencies quarterly
pip list --outdated
pip install --upgrade boto3 requests
GitHub Dependabot:
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/lambda/chat"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 10
Monitoring & Incident Response
Security Monitoring
CloudTrail Logging
Enable CloudTrail:
resource "aws_cloudtrail" "main" {
name = "nb-rag-sys-trail"
s3_bucket_name = aws_s3_bucket.cloudtrail.id
include_global_service_events = true
is_multi_region_trail = true
enable_log_file_validation = true
event_selector {
read_write_type = "All"
include_management_events = true
data_resource {
type = "AWS::S3::Object"
values = ["${aws_s3_bucket.web.arn}/"]
}
data_resource {
type = "AWS::Lambda::Function"
values = ["arn:aws:lambda"] # data events for all functions; list full function ARNs to narrow
}
}
}
Events to Monitor:
- IAM policy changes
- Security group changes
- Secrets Manager access
- Lambda function updates
- S3 bucket policy changes
CloudWatch Alarms
Security Alarms:
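resource "aws_cloudwatch_metric_alarm" "unauthorized_api_calls" {
alarm_name = "unauthorized-api-calls"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 1
# HTTP APIs publish 4xx, which includes 401/403 from the JWT authorizer;
# AWS/ApiGateway has no dedicated unauthorized-call metric
metric_name = "4xx"
namespace = "AWS/ApiGateway"
dimensions = {
ApiId = aws_apigatewayv2_api.main.id
}
period = 300
statistic = "Sum"
threshold = 10
alarm_description = "Alert on a spike in client errors, including unauthorized API calls"
alarm_actions = [aws_sns_topic.security_alerts.arn]
}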
resource "aws_cloudwatch_metric_alarm" "lambda_errors" {
alarm_name = "lambda-error-rate"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "Errors"
namespace = "AWS/Lambda"
period = "300"
statistic = "Average"
threshold = "0.01" # 1% error rate
alarm_description = "Alert on Lambda error rate spike"
alarm_actions = [aws_sns_topic.security_alerts.arn]
}
Incident Response Plan
Security Incident Playbook
1. Detection
- CloudWatch alarm triggers
- AWS GuardDuty finding (if enabled)
- User reports suspicious activity
- Abnormal CloudTrail events
2. Assessment (15 minutes)
- Review CloudTrail logs for unauthorized activity
- Check Lambda logs for anomalous invocations
- Verify API Gateway access logs
- Assess scope: single user, multiple users, or system-wide
3. Containment (30 minutes)
- Disable compromised user accounts in Cognito
- Rotate compromised API keys in Secrets Manager
- Update Lambda environment variables
- Enable API Gateway rate limiting (if not already)
- Block malicious IPs via WAF (if enabled)
4. Eradication (1 hour)
- Remove malicious code from Lambda functions
- Delete unauthorized IAM roles/policies
- Restore from known-good Terraform state
- Patch vulnerabilities
5. Recovery (2 hours)
- Redeploy infrastructure via Terraform
- Restore data from backups (if needed)
- Re-enable user access
- Monitor for recurrence
6. Post-Incident (1 week)
- Root cause analysis
- Update security policies
- Implement additional controls
- Document lessons learned
- Update incident response plan
Emergency Contacts
Security Lead: security@yourcompany.com
AWS Support: 1-877-736-5437 (Business/Enterprise support)
Incident Response Slack: #security-incidents
Compliance & Auditing
Compliance Frameworks
SOC 2 Type II
AWS Services Used (All SOC 2 Compliant):
- Lambda, API Gateway, S3, DynamoDB, CloudFront
- Bedrock, Cognito, Secrets Manager
- CloudWatch, CloudTrail, KMS
Customer Responsibilities:
- Access control (IAM policies)
- Data encryption configuration
- Logging and monitoring setup
- Incident response procedures
GDPR
Data Controller: Your company
Data Processor: AWS, Google
Requirements Met:
- [Supported] Right to access (Cognito export)
- [Supported] Right to deletion (Delete Cognito user)
- [Supported] Right to portability (JSON export)
- [Supported] Data minimization (only email, name collected)
- [Supported] Encryption at rest and in transit
- [Supported] Data Processing Agreement (AWS DPA)
Requirements NOT Met:
- [Not Implemented] Data residency (US-based by default, can deploy to EU)
- [Not Implemented] Explicit consent tracking (need consent management)
Audit Logging
CloudTrail Query Examples:
Find all Secrets Manager accesses:
aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=GetSecretValue
Find all Lambda function updates:
aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=UpdateFunctionCode
Find all IAM policy changes:
aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=PutUserPolicy
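The JSON that these commands print can be summarized with a few lines of Python, for example to see which principals read secrets most often. The sketch assumes the default JSON output with a top-level Events list whose items carry a Username field; the sample data and the principal names in it are made up for illustration, and `count_by_user` is a hypothetical helper.

```python
import json
from collections import Counter


def count_by_user(lookup_output):
    """Count events per Username in parsed `lookup-events` JSON output."""
    events = lookup_output.get("Events", [])
    # Some events carry no Username; group those under a placeholder
    return Counter(e.get("Username", "<unknown>") for e in events)


# Made-up sample shaped like the CLI's JSON output
sample = json.loads("""
{"Events": [
  {"EventName": "GetSecretValue", "Username": "nb-rag-sys-sync"},
  {"EventName": "GetSecretValue", "Username": "nb-rag-sys-sync"},
  {"EventName": "GetSecretValue", "Username": "someone-else"}
]}
""")
print(count_by_user(sample).most_common())
```

In practice the input would come from saving the command's output to a file and loading it with `json.load`.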
Log Retention:
- CloudTrail: 90 days in CloudWatch Logs, indefinite in S3
- Lambda logs: 7 days
- API Gateway logs: 7 days (optional)
- VPC Flow Logs: Not enabled (flow logs apply to VPC network interfaces, and this serverless architecture defines no VPC)
Last updated: 2025-12-30