Security Architecture

Comprehensive security architecture and best practices for the NorthBuilt RAG System.

Security Overview

Defense in Depth Strategy

┌─────────────────────────────────────────────────────────────┐
│ Layer 7: Monitoring & Incident Response                     │
│  - CloudWatch Logs, Alarms                                  │
│  - AWS Config, CloudTrail                                   │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 6: Application Security                               │
│  - Input validation, Output encoding                        │
│  - Secure coding practices                                  │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 5: Data Security                                      │
│  - Encryption at rest (S3, DynamoDB, Secrets Manager)      │
│  - Encryption in transit (TLS 1.2+)                        │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 4: Access Control                                     │
│  - IAM policies (least privilege)                           │
│  - Cognito authentication                                   │
│  - API key validation for webhooks                          │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: Network Security                                   │
│  - HTTPS only (TLS 1.2+)                                   │
│  - API Gateway throttling                                   │
│  - CloudFront DDoS protection (AWS Shield)                 │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Infrastructure Security                            │
│  - Serverless (no OS patching required)                    │
│  - Managed services (AWS responsibility)                   │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Physical Security                                  │
│  - AWS data centers (ISO 27001, SOC 2, etc.)              │
└─────────────────────────────────────────────────────────────┘

Authentication & Authorization

User Authentication

Google OAuth 2.0 (via Cognito)

Flow:

1. User clicks "Sign in with Google"
2. Redirect to Cognito hosted UI:
   https://nb-rag-sys-auth.auth.us-east-1.amazoncognito.com/oauth2/authorize
3. Cognito redirects to Google OAuth
4. User authorizes scopes: openid, email, profile
5. Google returns authorization code
6. Cognito exchanges code for Google tokens
7. Cognito issues JWT tokens:
   - ID Token (user identity)
   - Access Token (API authorization)
   - Refresh Token (long-lived)
8. Client stores tokens in localStorage
9. Client includes ID token in Authorization header

Token Validation:

JWT signature verified using Cognito public keys (JWKS)
Token expiration checked (1 hour default)
Token issuer verified (Cognito user pool)
Token audience (client ID) verified
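
The claim checks above can be sketched in a few lines of Python. This is a minimal illustration, assuming the token's signature has already been verified against the Cognito JWKS (signature verification needs a crypto library and is not shown). The helper names `decode_claims` and `check_claims` are hypothetical, not part of the project. Cognito ID tokens carry the client ID in `aud`; access tokens carry it in `client_id` instead, so this sketch applies to ID tokens.

```python
import base64
import json
import time


def decode_claims(token):
    """Return the payload claims of a compact JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("Malformed JWT")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def check_claims(claims, issuer, client_id, now=None):
    """Check expiration, issuer, and audience. Returns (ok, reason)."""
    now = time.time() if now is None else now
    if claims.get("exp", 0) <= now:
        return False, "Token expired"
    if claims.get("iss") != issuer:
        return False, "Wrong issuer"
    if claims.get("aud") != client_id:
        return False, "Wrong audience"
    return True, None
```

In this system API Gateway performs these checks before the Lambda runs, so a sketch like this is mainly useful for local debugging of rejected tokens.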

Security Configuration:

# Cognito User Pool
resource "aws_cognito_user_pool" "main" {
  password_policy {
    minimum_length                   = 12
    require_lowercase                = true
    require_numbers                  = true
    require_symbols                  = true
    require_uppercase                = true
    temporary_password_validity_days = 7
  }

  account_recovery_setting {
    recovery_mechanism {
      name     = "verified_email"
      priority = 1
    }
  }

  mfa_configuration = "OPTIONAL"

  user_attribute_update_settings {
    attributes_require_verification_before_update = ["email"]
  }
}

API Authorization

JWT Authorizer (for /chat endpoint)

API Gateway Configuration:

resource "aws_apigatewayv2_authorizer" "cognito" {
  api_id           = aws_apigatewayv2_api.main.id
  authorizer_type  = "JWT"
  identity_sources = ["$request.header.Authorization"]
  name             = "cognito-authorizer"

  jwt_configuration {
    audience = [aws_cognito_user_pool_client.web.id]
    issuer   = "https://cognito-idp.us-east-1.amazonaws.com/${aws_cognito_user_pool.main.id}"
  }
}

Request Validation:

Extract JWT from Authorization: Bearer <token> header
Verify signature using Cognito JWKS
Check expiration (exp claim)
Verify issuer (iss claim)
Verify audience (aud claim)
Extract user identity (sub claim) for logging
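
A minimal sketch of the first and last steps from the Lambda's point of view. It assumes the HTTP API uses payload format 2.0, where the JWT authorizer passes validated claims under `requestContext.authorizer.jwt.claims`. The function names are hypothetical.

```python
def extract_bearer_token(headers):
    """Return the token from an 'Authorization: Bearer <token>' header, or None."""
    for name, value in (headers or {}).items():
        if name.lower() == "authorization":
            scheme, _, token = value.strip().partition(" ")
            if scheme.lower() == "bearer" and token.strip():
                return token.strip()
    return None


def get_user_sub(event):
    """Return the 'sub' claim that API Gateway validated, or None if absent."""
    node = event
    for key in ("requestContext", "authorizer", "jwt", "claims"):
        node = (node or {}).get(key)
    return (node or {}).get("sub")
```

Reading the claim from the request context, rather than re-parsing the header, means the Lambda only ever logs an identity that API Gateway has already verified.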

Error Responses:

Missing token: 401 Unauthorized
Invalid signature: 401 Unauthorized
Expired token: 401 Unauthorized
Invalid issuer/audience: 401 Unauthorized (403 Forbidden is returned when the token lacks a required scope)

API Key Validation (for webhooks)

Lambda Validation Code (lambda/webhooks/*/handler.py):

import hmac
import json

import boto3

secrets_manager = boto3.client('secretsmanager')

def validate_api_key(event, secret_name):
    # Extract API key from header or query parameter.
    # These keys may be missing or present with a null value, so fall back to {}.
    headers = event.get('headers') or {}
    params = event.get('queryStringParameters') or {}
    api_key = headers.get('x-api-key') or params.get('api_key')

    if not api_key:
        return False, "Missing API key"

    # Fetch expected API key from Secrets Manager
    try:
        secret = secrets_manager.get_secret_value(SecretId=secret_name)
        expected_key = json.loads(secret['SecretString'])['api_key']
    except Exception as e:
        print(f"Error fetching secret: {e}")
        return False, "Internal error"

    # Constant-time comparison to prevent timing attacks.
    # hmac.compare_digest avoids the early exit on length mismatch
    # that a hand-written loop needs.
    if not hmac.compare_digest(api_key.encode(), expected_key.encode()):
        return False, "Invalid API key"

    return True, None

Webhook Security Best Practices:

Use HTTPS only (enforce in webhook configuration)
Validate API key before processing payload
Verify webhook signature if provider supports (e.g., HMAC)
Rate limit webhook endpoints
Log all webhook attempts for audit
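
For providers that sign their payloads, HMAC verification can be sketched as follows. This is an illustration under assumptions: the signature is taken to be a hex-encoded HMAC-SHA256 of the raw request body, and `verify_signature` is a hypothetical helper. Providers differ in header name, encoding (hex or base64), and prefixes such as `sha256=`, so check the provider's documentation before relying on it.

```python
import hashlib
import hmac


def verify_signature(raw_body, signature_header, secret):
    """Compare a hex HMAC-SHA256 signature against the raw body (both as bytes)."""
    if not signature_header:
        return False
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    received = signature_header.strip().lower()
    # Compare as bytes in constant time
    return hmac.compare_digest(expected.encode(), received.encode())
```

The signature must be computed over the exact bytes received, before any JSON parsing or re-serialization.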

IAM Policies

Principle of Least Privilege

Lambda Execution Role Example (Chat Lambda):

{
  "Version": "2012-10-17",
  "Statement": [
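    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1:ACCOUNT_ID:inference-profile/us.anthropic.claude-sonnet-4-5-20250929-v1:0",
        "arn:aws:bedrock:us-east-1:ACCOUNT_ID:inference-profile/us.anthropic.claude-3-5-haiku-20241022-v1:0",
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-sonnet-4-5-20250929-v1:0",
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-3-5-haiku-20241022-v1:0"
      ]
    },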
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:Retrieve"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1:ACCOUNT_ID:knowledge-base/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:Query",
        "dynamodb:Scan"
      ],
      "Resource": [
        "arn:aws:dynamodb:us-east-1:ACCOUNT_ID:table/nb-rag-sys",
        "arn:aws:dynamodb:us-east-1:ACCOUNT_ID:table/nb-rag-sys/index/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": [
        "arn:aws:logs:us-east-1:ACCOUNT_ID:log-group:/aws/lambda/nb-rag-sys-chat:*"
      ]
    }
  ]
}

Key Principles:

Specific resource ARNs (wildcards only where the resource name is not known in advance, such as log streams or table indexes)
Minimum required actions
Condition keys where applicable
Separate roles per function
No shared credentials
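
One way to keep these principles honest is to lint policy documents for wildcards before they are deployed. The following is a minimal sketch, not an existing project tool; `find_wildcards` is a hypothetical helper that reports every `Allow` statement whose action or resource contains `*`.

```python
def find_wildcards(policy):
    """Return (statement_index, field, value) for each wildcard in Allow statements."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for field in ("Action", "Resource"):
            values = stmt.get(field, [])
            if isinstance(values, str):  # IAM allows a single string too
                values = [values]
            for value in values:
                if "*" in value:
                    findings.append((index, field, value))
    return findings
```

Each finding is a prompt for review rather than an automatic failure: some wildcards, such as a log stream suffix, are deliberate.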

GitHub Actions OIDC Role

Trust Policy (allows GitHub to assume role):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:craftcodery/compass:*"
        }
      }
    }
  ]
}

Permissions Policy:

Full Terraform deployment permissions
Read/write S3 (state, web assets)
Lambda create/update/delete
IAM role/policy management
Secrets Manager read (for Terraform variables)
CloudFront invalidation
API Gateway management

Security Features:

Short-lived credentials (~1 hour)
No long-lived access keys
Scoped to specific repository
Can restrict to specific branches via condition
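
The branch restriction mentioned above works by matching the `sub` claim, which for branch-triggered workflows has the form `repo:OWNER/REPO:ref:refs/heads/BRANCH`. A small sketch that builds the trust policy `Condition` block either way; `trust_conditions` is a hypothetical helper and the `main` branch name is an assumption.

```python
import json

OIDC_HOST = "token.actions.githubusercontent.com"


def trust_conditions(owner, repo, branch=None):
    """Build the Condition block, pinned to one branch when `branch` is given."""
    conditions = {"StringEquals": {f"{OIDC_HOST}:aud": "sts.amazonaws.com"}}
    if branch:
        # Exact match: only workflows running on this branch can assume the role
        conditions["StringEquals"][f"{OIDC_HOST}:sub"] = (
            f"repo:{owner}/{repo}:ref:refs/heads/{branch}"
        )
    else:
        # Any ref, environment, or pull request in the repository
        conditions["StringLike"] = {f"{OIDC_HOST}:sub": f"repo:{owner}/{repo}:*"}
    return conditions


print(json.dumps(trust_conditions("craftcodery", "compass", branch="main"), indent=2))
```

Workflows triggered by pull requests or bound to a GitHub environment present a different `sub` format, so a branch-pinned role will reject them.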

Data Security

Encryption at Rest

Service	Encryption Method	Key Management	Notes
S3	AES-256 (SSE-S3)	AWS managed	Default for all buckets
DynamoDB	AES-256	AWS managed	Default encryption
Secrets Manager	AES-256	AWS KMS	AWS managed key (aws/secretsmanager) by default; customer managed key optional
Lambda Environment Variables	AES-256	AWS managed	Optional KMS for sensitive vars
CloudWatch Logs	AES-256	AWS managed	Encrypted by default
S3 Vectors	AES-256	AWS managed	Native Bedrock integration

Customer Managed Keys (Optional):

resource "aws_kms_key" "secrets" {
  description             = "KMS key for Secrets Manager"
  deletion_window_in_days = 30
  enable_key_rotation     = true

  tags = {
    Name = "nb-rag-sys-secrets"
  }
}

resource "aws_kms_alias" "secrets" {
  name          = "alias/nb-rag-sys-secrets"
  target_key_id = aws_kms_key.secrets.key_id
}

resource "aws_secretsmanager_secret" "fathom" {
  name       = "fathom-api-key"
  kms_key_id = aws_kms_key.secrets.arn
}

Benefits of CMK:

Full audit trail in CloudTrail
Key rotation policy control
Cross-account access control
Key deletion protection (30-day window)

Encryption in Transit

TLS Configuration:

Minimum version: TLS 1.2
Cipher suites: Modern, secure ciphers only
Certificate: AWS Certificate Manager (ACM)
Perfect Forward Secrecy (PFS): Enabled

CloudFront TLS Policy:

resource "aws_cloudfront_distribution" "web" {
  viewer_certificate {
    cloudfront_default_certificate = false
    acm_certificate_arn            = aws_acm_certificate.web.arn
    minimum_protocol_version       = "TLSv1.2_2021"
    ssl_support_method             = "sni-only"
  }
}

Internal Communication:

Lambda ↔ Bedrock: HTTPS (TLS 1.2+)
Lambda ↔ S3 Vectors: AWS internal (Bedrock managed)
Lambda ↔ DynamoDB: AWS SigV4 over HTTPS
Lambda ↔ Secrets Manager: AWS SigV4 over HTTPS

Secrets Management

AWS Secrets Manager

Secrets Stored:

fathom-api-key - Fathom API key
helpscout-api-key - HelpScout API key
google-oauth-client-secret - Google OAuth client secret

Access Control:

resource "aws_secretsmanager_secret_policy" "fathom" {
  secret_arn = aws_secretsmanager_secret.fathom.arn

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          AWS = aws_iam_role.sync_lambda.arn
        }
        Action   = "secretsmanager:GetSecretValue"
        Resource = aws_secretsmanager_secret.fathom.arn
      }
    ]
  })
}

Rotation Strategy:

Manual rotation quarterly
Automated rotation (optional, requires Lambda rotation function)
Version tracking (old keys kept for 7 days during rotation)

Best Practices:

Never commit secrets to Git
Never log secrets
Pass secret names (not secret values) to Lambda via environment variables
Audit secret access via CloudTrail
Enable automatic rotation for supported services
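
A common pattern that fits these practices is to fetch the secret at runtime and cache it briefly, so the value never sits in an environment variable and Secrets Manager is not called on every invocation. This is a minimal sketch: `get_secret_json` is a hypothetical helper, the 5-minute TTL is an arbitrary choice, and `client` is expected to be a `boto3.client("secretsmanager")` passed in by the caller.

```python
import json
import time

_cache = {}  # secret_id -> (expires_at, parsed_value); lives for the container's lifetime


def get_secret_json(client, secret_id, ttl_seconds=300, now=None):
    """Return the parsed JSON secret, reusing a cached copy until the TTL expires."""
    now = time.time() if now is None else now
    cached = _cache.get(secret_id)
    if cached and cached[0] > now:
        return cached[1]
    response = client.get_secret_value(SecretId=secret_id)
    value = json.loads(response["SecretString"])
    _cache[secret_id] = (now + ttl_seconds, value)
    return value
```

A short TTL also bounds how long a function keeps using an old key after rotation.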

GitHub Secrets

Secrets Required:

AWS_ROLE_ARN - GitHub Actions OIDC role ARN
FATHOM_API_KEY - Fathom API key
GOOGLE_CLIENT_ID - Google OAuth client ID
GOOGLE_CLIENT_SECRET - Google OAuth client secret

Adding Secrets:

gh secret set AWS_ROLE_ARN --body "arn:aws:iam::ACCOUNT:role/GitHubActionsOIDCRole"
gh secret set FATHOM_API_KEY --body "..."

Security:

Secrets encrypted at rest by GitHub
Only accessible during workflow runs
Masked in logs
Scoped to repository
Can restrict to specific environments

Data Privacy

Personal Data Handling

Data Collected:

User email (from Google OAuth)
User name (from Google OAuth)
User profile picture URL (from Google OAuth)
Query history (optional, not currently stored)

Data Storage:

Cognito: Email, name, picture URL
CloudWatch Logs: User sub (UUID), not email
S3 Vectors: No personal data (only document content)
DynamoDB: No personal data

Data Retention:

Cognito: Indefinite (until user deletes account)
CloudWatch Logs: 7 days
S3 Vectors: Indefinite (until document deleted from S3)
DynamoDB: Indefinite (can enable TTL)

GDPR Compliance:

Right to access: User can request Cognito data export
Right to deletion: Delete Cognito user + CloudWatch logs
Right to portability: Export Cognito user attributes
Right to be forgotten: Delete all references to user sub

Data Anonymization:

Use Cognito sub (UUID) instead of email in logs
Avoid logging user queries (unless required for debugging)
Aggregate metrics only (no individual user tracking)
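
A sketch of what this can look like in a Lambda logging helper. The names `redact_emails` and `log_event` are hypothetical, and the email pattern is deliberately simple: it catches common addresses but is not a full RFC 5322 matcher.

```python
import json
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text):
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("[redacted-email]", text)


def log_event(user_sub, action, detail=""):
    """Emit a structured log line keyed by Cognito sub, never by email."""
    record = {"user": user_sub, "action": action, "detail": redact_emails(detail)}
    line = json.dumps(record)
    print(line)  # Lambda forwards stdout to CloudWatch Logs
    return line
```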

Network Security

API Gateway Protection

Rate Limiting

Throttling Configuration:

resource "aws_apigatewayv2_stage" "production" {
  api_id      = aws_apigatewayv2_api.main.id
  name        = "production"
  auto_deploy = true

  default_route_settings {
    throttling_rate_limit  = 10    # requests per second
    throttling_burst_limit = 20    # burst capacity
  }
}

Per-User Rate Limiting (optional):

# In Lambda
import time
from collections import defaultdict

# In-memory rate limiter (use DynamoDB for distributed)
request_counts = defaultdict(lambda: {"count": 0, "reset": 0})

def rate_limit(user_id, limit=100, window=3600):
    now = time.time()
    user_data = request_counts[user_id]

    if now > user_data["reset"]:
        user_data["count"] = 0
        user_data["reset"] = now + window

    user_data["count"] += 1

    if user_data["count"] > limit:
        return False, "Rate limit exceeded"

    return True, None
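
Because each Lambda container keeps its own memory, the in-memory counter above only limits requests that reach the same container. A DynamoDB-backed variant can be sketched with a conditional atomic counter. Everything specific here is an assumption: the partition key name `pk`, the attributes `request_count` and `expires_at`, a TTL configured on `expires_at`, and a Lambda role that is allowed `dynamodb:UpdateItem`. `table` is expected to be a boto3 `Table` resource passed in by the caller.

```python
import time


def rate_limit_distributed(table, user_id, limit=100, window=3600, now=None):
    """Fixed-window limiter: one counter item per user per window."""
    now = int(time.time() if now is None else now)
    bucket = now - (now % window)  # start of the current window
    try:
        table.update_item(
            Key={"pk": f"rate#{user_id}#{bucket}"},
            UpdateExpression="SET expires_at = :ttl ADD request_count :one",
            # Increment only while the counter is still below the limit
            ConditionExpression="attribute_not_exists(request_count) OR request_count < :limit",
            ExpressionAttributeValues={":one": 1, ":limit": limit, ":ttl": bucket + 2 * window},
        )
    except Exception as exc:
        # botocore's ClientError exposes the error code on exc.response
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False, "Rate limit exceeded"
        raise
    return True, None
```

The condition and the increment happen in a single request, so concurrent invocations cannot both slip past the limit.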

DDoS Protection

AWS Shield Standard:

Enabled by default on CloudFront and API Gateway
Protects against common DDoS attacks (SYN floods, UDP reflection)
No additional cost

AWS Shield Advanced (optional, $3000/month):

Advanced DDoS protection
24/7 access to the Shield Response Team (SRT, formerly the DDoS Response Team)
Cost protection (credits for scaling charges caused by a DDoS attack)

CloudFront as DDoS Mitigation:

Absorbs traffic at edge locations
Geo-blocking (block traffic from specific countries)
Custom error pages (hide origin)

WAF Rules (Optional)

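Note: AWS WAF web ACLs cannot be associated directly with an API Gateway HTTP API (aws_apigatewayv2_api). To use the ACL below, attach it to a supported resource such as a REST API stage, or use a CLOUDFRONT-scoped ACL on a CloudFront distribution in front of the API.

AWS WAF Configuration: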

resource "aws_wafv2_web_acl" "api" {
  name  = "nb-rag-sys-api-acl"
  scope = "REGIONAL"

  default_action {
    allow {}
  }

  rule {
    name     = "RateLimitRule"
    priority = 1

    statement {
      rate_based_statement {
        limit              = 2000  # per 5 minutes
        aggregate_key_type = "IP"
      }
    }

    action {
      block {}
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "RateLimitRule"
      sampled_requests_enabled   = true
    }
  }

  rule {
    name     = "AWSManagedRulesCommonRuleSet"
    priority = 2

    override_action {
      none {}
    }

    statement {
      managed_rule_group_statement {
        name        = "AWSManagedRulesCommonRuleSet"
        vendor_name = "AWS"
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "AWSManagedRulesCommonRuleSet"
      sampled_requests_enabled   = true
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "nb-rag-sys-api-acl"
    sampled_requests_enabled   = true
  }
}

WAF Rules to Consider:

Rate limiting (per IP)
Geo-blocking (block non-US traffic)
SQL injection protection
XSS protection
Known bad inputs (AWS managed rules)

CORS Configuration

API Gateway CORS:

resource "aws_apigatewayv2_api" "main" {
  cors_configuration {
    allow_origins     = ["https://yourdomain.com", "http://localhost:8080"]
    allow_methods     = ["POST", "OPTIONS"]
    allow_headers     = ["Authorization", "Content-Type"]
    max_age           = 300
    allow_credentials = true
  }
}

Security Considerations:

Whitelist specific origins (no wildcards in production)
Allow only required methods (POST for /chat, no GET)
Allow only required headers
Set max_age to reduce preflight requests
Enable credentials for cookie-based auth (if used)

Application Security

Input Validation

Lambda Input Validation:

import json

def validate_chat_request(event):
    try:
        # 'body' may be absent or None; treat both as an empty object
        body = json.loads(event.get('body') or '{}')
    except (json.JSONDecodeError, TypeError):
        return False, "Invalid JSON"

    # Valid JSON can still be a list, string, or number
    if not isinstance(body, dict):
        return False, "Request body must be a JSON object"

    # Validate query
    query = body.get('query')
    if not query or not isinstance(query, str):
        return False, "Missing or invalid query"

    if len(query) > 1000:
        return False, "Query too long (max 1000 characters)"

    # Validate max_results (bool is a subclass of int, so exclude it explicitly)
    max_results = body.get('max_results', 5)
    if isinstance(max_results, bool) or not isinstance(max_results, int) \
            or not 1 <= max_results <= 10:
        return False, "Invalid max_results (must be 1-10)"

    return True, None

def handler(event, context):
    valid, error = validate_chat_request(event)
    if not valid:
        return {
            'statusCode': 400,
            'body': json.dumps({'error': error})
        }

    # Process request...

Validation Rules:

Validate all user inputs
Check data types (string, int, etc.)
Enforce length limits
Sanitize special characters
Reject unexpected fields
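
The last rule is easy to overlook. A minimal sketch of an allowlist check, assuming the chat endpoint accepts only `query` and `max_results`; `reject_unexpected_fields` and `ALLOWED_FIELDS` are hypothetical names.

```python
ALLOWED_FIELDS = frozenset({"query", "max_results"})


def reject_unexpected_fields(body, allowed=ALLOWED_FIELDS):
    """Return (ok, error). Fails if the parsed body has keys outside the allowlist."""
    if not isinstance(body, dict):
        return False, "Request body must be a JSON object"
    unexpected = sorted(set(body) - set(allowed))
    if unexpected:
        return False, "Unexpected fields: " + ", ".join(unexpected)
    return True, None
```

Running this check right after JSON parsing keeps unknown keys from reaching downstream code that might interpret them.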

Output Encoding

Prevent XSS in Responses:

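import html
import json

import boto3

bedrock = boto3.client('bedrock-runtime')

def sanitize_response(text):
    # HTML encode special characters (&, <, >, and quotes)
    return html.escape(text)

def handler(event, context):
    # Generate response from Bedrock (request arguments and response parsing elided)
    response = bedrock.invoke_model(...)

    # Sanitize before returning. If the client also escapes on render,
    # encode in only one place to avoid double-escaped output.
    sanitized = sanitize_response(response['answer'])

    return {
        'statusCode': 200,
        'body': json.dumps({
            'answer': sanitized,
            'sources': [...]
        })
    }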

Web UI Sanitization:

// Escape HTML before rendering
function escapeHtml(text) {
  const div = document.createElement('div');
  div.textContent = text;
  return div.innerHTML;
}

// Render response
function renderResponse(answer) {
  const escaped = escapeHtml(answer);
  document.getElementById('response').innerHTML = escaped;
}

Dependency Management

Python Dependencies (requirements.txt):

boto3==1.34.0          # AWS SDK
requests==2.32.0       # HTTP client
# Pin versions to avoid supply chain attacks

Security Scanning:

# Check for known vulnerabilities
pip install safety
safety check

# Update dependencies quarterly
pip list --outdated
pip install --upgrade boto3 requests

GitHub Dependabot:

# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/lambda/chat"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 10

Monitoring & Incident Response

Security Monitoring

CloudTrail Logging

Enable CloudTrail:

resource "aws_cloudtrail" "main" {
  name                          = "nb-rag-sys-trail"
  s3_bucket_name                = aws_s3_bucket.cloudtrail.id
  include_global_service_events = true
  is_multi_region_trail         = true
  enable_log_file_validation    = true

  event_selector {
    read_write_type           = "All"
    include_management_events = true

    data_resource {
      type   = "AWS::S3::Object"
      values = ["${aws_s3_bucket.web.arn}/"]
    }

    data_resource {
      type   = "AWS::Lambda::Function"
      values = ["arn:aws:lambda:us-east-1:ACCOUNT_ID:function/*"]
    }
  }
}

Events to Monitor:

IAM policy changes
Security group changes
Secrets Manager access
Lambda function updates
S3 bucket policy changes

CloudWatch Alarms

Security Alarms:

resource "aws_cloudwatch_metric_alarm" "unauthorized_api_calls" {
  alarm_name          = "unauthorized-api-calls"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "1"
  # HTTP APIs report client errors (including 401 and 403) as the "4xx" metric;
  # AWS/ApiGateway has no built-in "UnauthorizedAPICalls" metric.
  metric_name         = "4xx"
  namespace           = "AWS/ApiGateway"
  period              = "300"
  statistic           = "Sum"
  threshold           = "10"
  alarm_description   = "Alert on a spike in rejected (4xx) API calls"
  alarm_actions       = [aws_sns_topic.security_alerts.arn]

  dimensions = {
    ApiId = aws_apigatewayv2_api.main.id
  }
}

resource "aws_cloudwatch_metric_alarm" "lambda_errors" {
  alarm_name          = "lambda-error-rate"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "Errors"
  namespace           = "AWS/Lambda"
  period              = "300"
  statistic           = "Average"
  threshold           = "0.01"  # 1% error rate
  alarm_description   = "Alert on Lambda error rate spike"
  alarm_actions       = [aws_sns_topic.security_alerts.arn]
}

Incident Response Plan

Security Incident Playbook

1. Detection

CloudWatch alarm triggers
AWS GuardDuty finding (if enabled)
User reports suspicious activity
Abnormal CloudTrail events

2. Assessment (15 minutes)

Review CloudTrail logs for unauthorized activity
Check Lambda logs for anomalous invocations
Verify API Gateway access logs
Assess scope: single user, multiple users, or system-wide

3. Containment (30 minutes)

Disable compromised user accounts in Cognito
Rotate compromised API keys in Secrets Manager
Update Lambda environment variables
Enable API Gateway rate limiting (if not already)
Block malicious IPs via WAF (if enabled)
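
The first containment step can be scripted so it is not typed by hand during an incident. This is a minimal sketch: `contain_user` is a hypothetical helper, and `cognito` is expected to be a `boto3.client("cognito-idp")` passed in by the caller. It uses the `admin_disable_user` and `admin_user_global_sign_out` operations.

```python
def contain_user(cognito, user_pool_id, username):
    """Disable a Cognito user and revoke their refresh tokens."""
    # Block new sign-ins
    cognito.admin_disable_user(UserPoolId=user_pool_id, Username=username)
    # Invalidate refresh tokens so existing sessions cannot be renewed
    cognito.admin_user_global_sign_out(UserPoolId=user_pool_id, Username=username)
    return {"user": username, "disabled": True, "signed_out": True}
```

ID and access tokens that were already issued stay valid at the API Gateway JWT authorizer until they expire, because the authorizer checks signature and claims rather than revocation. Plan for that window (up to the 1 hour token lifetime) when assessing exposure.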

4. Eradication (1 hour)

Remove malicious code from Lambda functions
Delete unauthorized IAM roles/policies
Restore from known-good Terraform state
Patch vulnerabilities

5. Recovery (2 hours)

Redeploy infrastructure via Terraform
Restore data from backups (if needed)
Re-enable user access
Monitor for recurrence

6. Post-Incident (1 week)

Root cause analysis
Update security policies
Implement additional controls
Document lessons learned
Update incident response plan

Emergency Contacts

Security Lead: security@yourcompany.com
AWS Support: 1-877-736-5437 (Business/Enterprise support)
Incident Response Slack: #security-incidents

Compliance & Auditing

Compliance Frameworks

SOC 2 Type II

AWS Services Used (All SOC 2 Compliant):

Lambda, API Gateway, S3, DynamoDB, CloudFront
Bedrock, Cognito, Secrets Manager
CloudWatch, CloudTrail, KMS

Customer Responsibilities:

Access control (IAM policies)
Data encryption configuration
Logging and monitoring setup
Incident response procedures

GDPR

Data Controller: Your company
Data Processor: AWS, Google

Requirements Met:

[Supported] Right to access (Cognito export)
[Supported] Right to deletion (Delete Cognito user)
[Supported] Right to portability (JSON export)
[Supported] Data minimization (only email, name collected)
[Supported] Encryption at rest and in transit
[Supported] Data Processing Agreement (AWS DPA)

Requirements NOT Met:

[Not Implemented] Data residency (US-based by default, can deploy to EU)
[Not Implemented] Explicit consent tracking (need consent management)

Audit Logging

CloudTrail Query Examples:

Find all Secrets Manager accesses:

aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=GetSecretValue

Find all Lambda function updates (CloudTrail records Lambda event names with an API version suffix):

aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=UpdateFunctionCode20150331v2

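Find IAM policy changes (PutUserPolicy covers inline user policies; repeat with PutRolePolicy, AttachRolePolicy, or CreatePolicyVersion for other changes):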

aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=PutUserPolicy
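
The same lookups can be run from Python when several event names need checking in one pass. This is a minimal sketch: `lookup_event_names` is a hypothetical helper, and `cloudtrail` is expected to be a `boto3.client("cloudtrail")` passed in by the caller. It reads only the first page of results per event name (at most 50 events), so paginate for a full audit.

```python
def lookup_event_names(cloudtrail, event_names, max_results=50):
    """Return (time, event_name, username) tuples for each requested event name."""
    findings = []
    for name in event_names:
        response = cloudtrail.lookup_events(
            LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": name}],
            MaxResults=max_results,
        )
        for event in response.get("Events", []):
            findings.append(
                (event.get("EventTime"), event.get("EventName"), event.get("Username"))
            )
    return findings
```

For example, `lookup_event_names(client, ["GetSecretValue", "PutUserPolicy"])` gathers secret reads and inline policy changes into one list for review.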

Log Retention:

CloudTrail: 90 days in CloudWatch Logs, indefinite in S3
Lambda logs: 7 days
API Gateway logs: 7 days (optional)
VPC Flow Logs: Not enabled (the Lambda functions are not attached to a VPC)

Last updated: 2025-12-30