Security Architecture

Comprehensive security architecture and best practices for the NorthBuilt RAG System.

Security Overview

Defense in Depth Strategy

┌─────────────────────────────────────────────────────────────┐
│ Layer 7: Monitoring & Incident Response                     │
│  - CloudWatch Logs, Alarms                                  │
│  - AWS Config, CloudTrail                                   │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 6: Application Security                               │
│  - Input validation, Output encoding                        │
│  - Secure coding practices                                  │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 5: Data Security                                      │
│  - Encryption at rest (S3, DynamoDB, Secrets Manager)       │
│  - Encryption in transit (TLS 1.2+)                         │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 4: Access Control                                     │
│  - IAM policies (least privilege)                           │
│  - Cognito authentication                                   │
│  - API key validation for webhooks                          │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: Network Security                                   │
│  - HTTPS only (TLS 1.2+)                                    │
│  - API Gateway throttling                                   │
│  - CloudFront DDoS protection (AWS Shield)                  │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Infrastructure Security                            │
│  - Serverless (no OS patching required)                     │
│  - Managed services (AWS responsibility)                    │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Physical Security                                  │
│  - AWS data centers (ISO 27001, SOC 2, etc.)                │
└─────────────────────────────────────────────────────────────┘

Authentication & Authorization

User Authentication

Google OAuth 2.0 (via Cognito)

Flow:

1. User clicks "Sign in with Google"
2. Redirect to Cognito hosted UI:
   https://nb-rag-sys-auth.auth.us-east-1.amazoncognito.com/oauth2/authorize
3. Cognito redirects to Google OAuth
4. User authorizes scopes: openid, email, profile
5. Google returns authorization code
6. Cognito exchanges code for Google tokens
7. Cognito issues JWT tokens:
   - ID Token (user identity)
   - Access Token (API authorization)
   - Refresh Token (long-lived)
8. Client stores tokens in localStorage
9. Client includes ID token in Authorization header

Token Validation:

  • JWT signature verified using Cognito public keys (JWKS)
  • Token expiration checked (1 hour default)
  • Token issuer verified (Cognito user pool)
  • Token audience (client ID) verified

Security Configuration:

# Cognito User Pool
resource "aws_cognito_user_pool" "main" {
  password_policy {
    minimum_length                   = 12
    require_lowercase                = true
    require_numbers                  = true
    require_symbols                  = true
    require_uppercase                = true
    temporary_password_validity_days = 7
  }

  account_recovery_setting {
    recovery_mechanism {
      name     = "verified_email"
      priority = 1
    }
  }

  mfa_configuration = "OPTIONAL"

  user_attribute_update_settings {
    attributes_require_verification_before_update = ["email"]
  }
}

API Authorization

JWT Authorizer (for /chat endpoint)

API Gateway Configuration:

resource "aws_apigatewayv2_authorizer" "cognito" {
  api_id           = aws_apigatewayv2_api.main.id
  authorizer_type  = "JWT"
  identity_sources = ["$request.header.Authorization"]
  name             = "cognito-authorizer"

  jwt_configuration {
    audience = [aws_cognito_user_pool_client.web.id]
    issuer   = "https://cognito-idp.us-east-1.amazonaws.com/${aws_cognito_user_pool.main.id}"
  }
}

Request Validation:

  1. Extract JWT from Authorization: Bearer <token> header
  2. Verify signature using Cognito JWKS
  3. Check expiration (exp claim)
  4. Verify issuer (iss claim)
  5. Verify audience (aud claim)
  6. Extract user identity (sub claim) for logging
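
The header parsing and claim checks in steps 1 and 3 to 5 can be sketched in Python. This is only an illustration: API Gateway performs these checks itself, signature verification (step 2) is deliberately left out because it needs a JWT library and the Cognito JWKS, and the function names `extract_bearer_token` and `check_claims` are hypothetical.

```python
import time

def extract_bearer_token(headers):
    """Return the token from an 'Authorization: Bearer <token>' header, or None."""
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {k.lower(): v for k, v in (headers or {}).items()}
    value = normalized.get("authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()

def check_claims(claims, issuer, client_id, now=None):
    """Check exp, iss, and aud on claims whose signature is ALREADY verified."""
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return False, "Token expired"
    if claims.get("iss") != issuer:
        return False, "Invalid issuer"
    # Cognito ID tokens carry the client ID in 'aud'; access tokens use 'client_id'.
    audience = claims.get("aud") or claims.get("client_id")
    if audience != client_id:
        return False, "Invalid audience"
    return True, None
```

Passing `now` explicitly keeps the expiry check deterministic in tests; in a Lambda function it would normally be left at its default.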

Error Responses:

  • Missing token: 401 Unauthorized
  • Invalid signature: 401 Unauthorized
  • Expired token: 401 Unauthorized
  • Invalid issuer/audience: 401 Unauthorized
  • Valid token that lacks a required scope: 403 Forbidden

API Key Validation (for webhooks)

Lambda Validation Code (lambda/webhooks/*/handler.py):

import hmac
import json

import boto3

secrets_manager = boto3.client('secretsmanager')

def validate_api_key(event, secret_name):
    # API Gateway sends null (None) when a request has no headers or query
    # parameters, so fall back to an empty dict before calling .get().
    headers = event.get('headers') or {}
    params = event.get('queryStringParameters') or {}

    # Extract API key from header or query parameter
    api_key = headers.get('x-api-key') or params.get('api_key')

    if not api_key:
        return False, "Missing API key"

    # Fetch expected API key from Secrets Manager
    try:
        secret = secrets_manager.get_secret_value(SecretId=secret_name)
        expected_key = json.loads(secret['SecretString'])['api_key']
    except Exception as e:
        print(f"Error fetching secret: {e}")
        return False, "Internal error"

    # Constant-time comparison to prevent timing attacks. Comparing bytes
    # avoids the TypeError that compare_digest raises for non-ASCII strings.
    if not hmac.compare_digest(api_key.encode('utf-8'), expected_key.encode('utf-8')):
        return False, "Invalid API key"

    return True, None

Webhook Security Best Practices:

  • Use HTTPS only (enforce in webhook configuration)
  • Validate API key before processing payload
  • Verify webhook signature if provider supports (e.g., HMAC)
  • Rate limit webhook endpoints
  • Log all webhook attempts for audit
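
The signature check mentioned above can be sketched as follows. This is a minimal example under stated assumptions: it assumes the provider sends a hex-encoded HMAC-SHA256 of the raw request body, which is a common scheme but not confirmed for Fathom, HelpScout, or Linear. The header format, hash algorithm, and the function name `verify_webhook_signature` are hypothetical, so check each provider's documentation before relying on it.

```python
import hashlib
import hmac

def verify_webhook_signature(raw_body: bytes, signature_header: str, signing_secret: str) -> bool:
    """Return True if signature_header is the hex HMAC-SHA256 of raw_body."""
    if not signature_header:
        return False
    expected = hmac.new(
        signing_secret.encode("utf-8"), raw_body, hashlib.sha256
    ).hexdigest()
    received = signature_header.strip().lower()
    # compare_digest takes time independent of where the two values differ.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The signature must be computed over the exact bytes received; re-serializing parsed JSON usually changes whitespace or key order and breaks the comparison.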

IAM Policies

Principle of Least Privilege

Lambda Execution Role Example (Chat Lambda):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
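      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1:ACCOUNT_ID:inference-profile/us.anthropic.claude-sonnet-4-5-20250929-v1:0",
        "arn:aws:bedrock:us-east-1:ACCOUNT_ID:inference-profile/us.anthropic.claude-3-5-haiku-20241022-v1:0",
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-sonnet-4-5-20250929-v1:0",
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-3-5-haiku-20241022-v1:0"
      ]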
    },
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:Retrieve"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1:ACCOUNT_ID:knowledge-base/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:Query",
        "dynamodb:Scan"
      ],
      "Resource": [
        "arn:aws:dynamodb:us-east-1:ACCOUNT_ID:table/nb-rag-sys-classify",
        "arn:aws:dynamodb:us-east-1:ACCOUNT_ID:table/nb-rag-sys-classify/index/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": [
        "arn:aws:logs:us-east-1:ACCOUNT_ID:log-group:/aws/lambda/nb-rag-sys-chat:*"
      ]
    }
  ]
}

Key Principles:

  • Specific resource ARNs (wildcards only where names cannot be enumerated, such as table indexes and log streams)
  • Minimum required actions
  • Condition keys where applicable
  • Separate roles per function
  • No shared credentials

GitHub Actions OIDC Role

Trust Policy (allows GitHub to assume role):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:craftcodery/compass:*"
        }
      }
    }
  ]
}

Permissions Policy:

  • Full Terraform deployment permissions
  • Read/write S3 (state, web assets)
  • Lambda create/update/delete
  • IAM role/policy management
  • Secrets Manager read (for Terraform variables)
  • CloudFront invalidation
  • API Gateway management

Security Features:

  • Short-lived credentials (~1 hour)
  • No long-lived access keys
  • Scoped to specific repository
  • Can restrict to specific branches via condition
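
The branch restriction in the last bullet amounts to replacing the `StringLike` wildcard on the `sub` claim with an exact match. A minimal sketch that builds either variant of the trust policy is shown below; the function name `github_oidc_trust_policy` is hypothetical. It assumes the workflow runs on a branch push without a GitHub environment, since jobs that use an environment or run on pull requests receive a differently formatted `sub` claim.

```python
import json

def github_oidc_trust_policy(account_id, repo, branch=None):
    """Build the trust policy; pass branch to allow only that branch."""
    provider = "token.actions.githubusercontent.com"
    condition = {"StringEquals": {f"{provider}:aud": "sts.amazonaws.com"}}
    if branch is None:
        # Any ref in the repository may assume the role.
        condition["StringLike"] = {f"{provider}:sub": f"repo:{repo}:*"}
    else:
        # Only workflows running on this branch may assume the role.
        condition["StringEquals"][f"{provider}:sub"] = (
            f"repo:{repo}:ref:refs/heads/{branch}"
        )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": condition,
            }
        ],
    }

if __name__ == "__main__":
    policy = github_oidc_trust_policy("ACCOUNT_ID", "craftcodery/compass", branch="main")
    print(json.dumps(policy, indent=2))
```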

Data Security

Encryption at Rest

Service                       Encryption Method   Key Management   Notes
S3                            AES-256 (SSE-S3)    AWS managed      Default for all buckets
DynamoDB                      AES-256             AWS managed      Default encryption
Secrets Manager               AES-256             AWS KMS          Separate CMK per secret
Lambda Environment Variables  AES-256             AWS managed      Optional KMS for sensitive vars
CloudWatch Logs               AES-256             AWS managed      Encrypted by default
S3 Vectors                    AES-256             AWS managed      Native Bedrock integration

Customer Managed Keys (Optional):

resource "aws_kms_key" "secrets" {
  description             = "KMS key for Secrets Manager"
  deletion_window_in_days = 30
  enable_key_rotation     = true

  tags = {
    Name = "nb-rag-sys-secrets"
  }
}

resource "aws_kms_alias" "secrets" {
  name          = "alias/nb-rag-sys-secrets"
  target_key_id = aws_kms_key.secrets.key_id
}

resource "aws_secretsmanager_secret" "fathom" {
  name       = "fathom-api-key"
  kms_key_id = aws_kms_key.secrets.arn
}

Benefits of CMK:

  • Full audit trail in CloudTrail
  • Key rotation policy control
  • Cross-account access control
  • Key deletion protection (30-day window)

Encryption in Transit

TLS Configuration:

  • Minimum version: TLS 1.2
  • Cipher suites: Modern, secure ciphers only
  • Certificate: AWS Certificate Manager (ACM)
  • Perfect Forward Secrecy (PFS): Enabled

CloudFront TLS Policy:

resource "aws_cloudfront_distribution" "web" {
  viewer_certificate {
    cloudfront_default_certificate = false
    acm_certificate_arn            = aws_acm_certificate.web.arn
    minimum_protocol_version       = "TLSv1.2_2021"
    ssl_support_method             = "sni-only"
  }
}

Internal Communication:

  • Lambda ↔ Bedrock: HTTPS (TLS 1.2+)
  • Lambda ↔ S3 Vectors: AWS internal (Bedrock managed)
  • Lambda ↔ DynamoDB: AWS SigV4 over HTTPS
  • Lambda ↔ Secrets Manager: AWS SigV4 over HTTPS

Secrets Management

AWS Secrets Manager

Secrets Stored:

  1. fathom-api-key - Fathom API key
  2. helpscout-api-key - HelpScout API key
  3. linear-api-key - Linear API key
  4. google-oauth-client-secret - Google OAuth client secret

Access Control:

resource "aws_secretsmanager_secret_policy" "fathom" {
  secret_arn = aws_secretsmanager_secret.fathom.arn

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          AWS = aws_iam_role.sync_lambda.arn
        }
        Action   = "secretsmanager:GetSecretValue"
        Resource = aws_secretsmanager_secret.fathom.arn
      }
    ]
  })
}

Rotation Strategy:

  • Manual rotation quarterly
  • Automated rotation (optional, requires Lambda rotation function)
  • Version tracking (old keys kept for 7 days during rotation)

Best Practices:

  • Never commit secrets to Git
  • Never log secrets
  • Use environment variables in Lambda
  • Audit secret access via CloudTrail
  • Enable automatic rotation for supported services
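
One way to follow these practices in a Lambda function is to pass only the secret's name through an environment variable and fetch the value at runtime, caching it briefly so warm invocations skip the API call. The sketch below is illustrative: the class name `SecretCache` and the 5-minute TTL are assumptions, and it assumes secrets are stored as JSON, as in the webhook validation code. The client is injected so the cache can be exercised without AWS access.

```python
import json
import time

class SecretCache:
    """Cache parsed secrets in memory so warm invocations skip the API call."""

    def __init__(self, client, ttl_seconds=300):
        self._client = client      # e.g. boto3.client("secretsmanager")
        self._ttl = ttl_seconds    # short TTL so rotated keys are picked up
        self._entries = {}         # secret_name -> (expiry_time, parsed_value)

    def get(self, secret_name, now=None):
        now = time.time() if now is None else now
        cached = self._entries.get(secret_name)
        if cached and cached[0] > now:
            return cached[1]
        response = self._client.get_secret_value(SecretId=secret_name)
        value = json.loads(response["SecretString"])
        self._entries[secret_name] = (now + self._ttl, value)
        # The value is returned to the caller only; it is never logged.
        return value
```

A short TTL is a trade-off: it bounds how long a rotated key stays stale in a warm execution environment while still avoiding a Secrets Manager call on every request.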

GitHub Secrets

Secrets Required:

  • AWS_ROLE_ARN - GitHub Actions OIDC role ARN
  • FATHOM_API_KEY - Fathom API key
  • GOOGLE_CLIENT_ID - Google OAuth client ID
  • GOOGLE_CLIENT_SECRET - Google OAuth client secret

Adding Secrets:

gh secret set AWS_ROLE_ARN --body "arn:aws:iam::ACCOUNT:role/GitHubActionsOIDCRole"
gh secret set FATHOM_API_KEY --body "..."

Security:

  • Secrets encrypted at rest by GitHub
  • Only accessible during workflow runs
  • Masked in logs
  • Scoped to repository
  • Can restrict to specific environments

Data Privacy

Personal Data Handling

Data Collected:

  • User email (from Google OAuth)
  • User name (from Google OAuth)
  • User profile picture URL (from Google OAuth)
  • Query history (optional, not currently stored)

Data Storage:

  • Cognito: Email, name, picture URL
  • CloudWatch Logs: User sub (UUID), not email
  • S3 Vectors: No personal data (only document content)
  • DynamoDB: No personal data

Data Retention:

  • Cognito: Indefinite (until user deletes account)
  • CloudWatch Logs: 7 days
  • S3 Vectors: Indefinite (until document deleted from S3)
  • DynamoDB: Indefinite (can enable TTL)

GDPR Compliance:

  • Right to access: User can request Cognito data export
  • Right to erasure ("right to be forgotten"): Delete the Cognito user and any CloudWatch logs that reference the user sub
  • Right to portability: Export Cognito user attributes

Data Anonymization:

  • Use Cognito sub (UUID) instead of email in logs
  • Avoid logging user queries (unless required for debugging)
  • Aggregate metrics only (no individual user tracking)
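
A minimal sketch of a logging helper that follows these rules is shown below. The function name `log_request`, the logger name, and the record fields are hypothetical; the point is that only the Cognito sub and non-identifying metadata reach the log, and the query text is included only when explicitly enabled for debugging.

```python
import json
import logging

logger = logging.getLogger("chat")

def log_request(claims, query, log_query=False):
    """Emit a structured log line keyed by Cognito sub, never by email."""
    record = {
        "user_sub": claims.get("sub"),   # opaque UUID, not personal data on its own
        "query_length": len(query),      # useful for metrics without storing content
    }
    if log_query:
        # Opt-in only, for short-lived debugging sessions.
        record["query"] = query
    line = json.dumps(record)
    logger.info(line)
    return line
```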

Network Security

API Gateway Protection

Rate Limiting

Throttling Configuration:

resource "aws_apigatewayv2_stage" "production" {
  api_id      = aws_apigatewayv2_api.main.id
  name        = "production"
  auto_deploy = true

  default_route_settings {
    throttling_rate_limit  = 10    # requests per second
    throttling_burst_limit = 20    # burst capacity
  }
}

Per-User Rate Limiting (optional):

# In Lambda
import time
from collections import defaultdict

# In-memory rate limiter: state is per Lambda execution environment and is
# lost on cold start (use DynamoDB for a limit shared across instances)
request_counts = defaultdict(lambda: {"count": 0, "reset": 0})

def rate_limit(user_id, limit=100, window=3600):
    now = time.time()
    user_data = request_counts[user_id]

    if now > user_data["reset"]:
        user_data["count"] = 0
        user_data["reset"] = now + window

    user_data["count"] += 1

    if user_data["count"] > limit:
        return False, "Rate limit exceeded"

    return True, None
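
A DynamoDB-backed variant could look like the sketch below. It is an illustration under assumptions, not the deployed implementation: the table layout (partition key `pk`, counter attribute `request_count`, TTL attribute `expires_at`) and the function name `rate_limit_distributed` are hypothetical. The `table` argument is expected to be a boto3 DynamoDB `Table` resource, and the example uses a fixed window per user.

```python
import time

def rate_limit_distributed(table, user_id, limit=100, window=3600, now=None):
    """Fixed-window counter stored in DynamoDB; returns (allowed, error)."""
    now = int(time.time() if now is None else now)
    window_start = now - (now % window)
    response = table.update_item(
        # One item per user per window, so old windows never need resetting.
        Key={"pk": f"rate#{user_id}#{window_start}"},
        # ADD is atomic, so concurrent Lambda instances cannot lose increments.
        UpdateExpression=(
            "SET expires_at = if_not_exists(expires_at, :exp) "
            "ADD request_count :one"
        ),
        # With TTL enabled on expires_at, DynamoDB removes stale windows.
        ExpressionAttributeValues={":one": 1, ":exp": window_start + 2 * window},
        ReturnValues="UPDATED_NEW",
    )
    count = int(response["Attributes"]["request_count"])
    if count > limit:
        return False, "Rate limit exceeded"
    return True, None
```

Each check costs one write, so this trades a small per-request cost for a limit that holds across all concurrent execution environments.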

DDoS Protection

AWS Shield Standard:

  • Enabled by default on CloudFront and API Gateway
  • Protects against common DDoS attacks (SYN floods, UDP reflection)
  • No additional cost

AWS Shield Advanced (optional, $3000/month):

  • Advanced DDoS protection
  • 24/7 access to the AWS Shield Response Team (SRT), formerly the DDoS Response Team (DRT)
  • Cost protection (credits for scaling charges caused by a DDoS attack)

CloudFront as DDoS Mitigation:

  • Absorbs traffic at edge locations
  • Geo-blocking (block traffic from specific countries)
  • Custom error pages (hide origin)

WAF Rules (Optional)

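AWS WAF Configuration (a REGIONAL web ACL can be associated with an API Gateway REST API stage but not directly with an HTTP API; for an HTTP API, attach a CLOUDFRONT-scoped ACL to a distribution in front of it):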

resource "aws_wafv2_web_acl" "api" {
  name  = "nb-rag-sys-api-acl"
  scope = "REGIONAL"

  default_action {
    allow {}
  }

  rule {
    name     = "RateLimitRule"
    priority = 1

    statement {
      rate_based_statement {
        limit              = 2000  # per 5 minutes
        aggregate_key_type = "IP"
      }
    }

    action {
      block {}
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "RateLimitRule"
      sampled_requests_enabled   = true
    }
  }

  rule {
    name     = "AWSManagedRulesCommonRuleSet"
    priority = 2

    override_action {
      none {}
    }

    statement {
      managed_rule_group_statement {
        name        = "AWSManagedRulesCommonRuleSet"
        vendor_name = "AWS"
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "AWSManagedRulesCommonRuleSet"
      sampled_requests_enabled   = true
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "nb-rag-sys-api-acl"
    sampled_requests_enabled   = true
  }
}

WAF Rules to Consider:

  • Rate limiting (per IP)
  • Geo-blocking (block non-US traffic)
  • SQL injection protection
  • XSS protection
  • Known bad inputs (AWS managed rules)

CORS Configuration

API Gateway CORS:

resource "aws_apigatewayv2_api" "main" {
  cors_configuration {
    allow_origins     = ["https://yourdomain.com", "http://localhost:8080"]
    allow_methods     = ["POST", "OPTIONS"]
    allow_headers     = ["Authorization", "Content-Type"]
    max_age           = 300
    allow_credentials = true
  }
}

Security Considerations:

  • Allowlist specific origins (no wildcards in production; remove the localhost origin outside development)
  • Allow only required methods (POST for /chat, no GET)
  • Allow only required headers
  • Set max_age to reduce preflight requests
  • Enable credentials for cookie-based auth (if used)

Application Security

Input Validation

Lambda Input Validation:

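import json

def validate_chat_request(event):
    try:
        # 'body' can be present but None, so fall back to an empty object
        body = json.loads(event.get('body') or '{}')
    except (json.JSONDecodeError, TypeError):
        return False, "Invalid JSON"

    # Valid JSON is not necessarily an object (e.g. a list or a number)
    if not isinstance(body, dict):
        return False, "Request body must be a JSON object"

    # Validate query
    query = body.get('query')
    if not isinstance(query, str) or not query.strip():
        return False, "Missing or invalid query"

    if len(query) > 1000:
        return False, "Query too long (max 1000 characters)"

    # Validate max_results (bool is a subclass of int, so exclude it explicitly)
    max_results = body.get('max_results', 5)
    if isinstance(max_results, bool) or not isinstance(max_results, int) \
            or not 1 <= max_results <= 10:
        return False, "Invalid max_results (must be 1-10)"

    return True, None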

def handler(event, context):
    valid, error = validate_chat_request(event)
    if not valid:
        return {
            'statusCode': 400,
            'body': json.dumps({'error': error})
        }

    # Process request...

Validation Rules:

  • Validate all user inputs
  • Check data types (string, int, etc.)
  • Enforce length limits
  • Sanitize special characters
  • Reject unexpected fields
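
The last rule, rejecting unexpected fields, is not covered by the validation function above. A minimal sketch is shown below; the allowlist and the function name `reject_unexpected_fields` are assumptions based on the two fields the chat request is shown to accept.

```python
ALLOWED_FIELDS = {"query", "max_results"}

def reject_unexpected_fields(body):
    """Return (ok, error); fail when body has keys outside ALLOWED_FIELDS."""
    if not isinstance(body, dict):
        return False, "Request body must be a JSON object"
    # Sort so the error message is stable regardless of key order.
    unexpected = sorted(set(body) - ALLOWED_FIELDS)
    if unexpected:
        return False, "Unexpected fields: " + ", ".join(unexpected)
    return True, None
```

An allowlist fails closed: a field added by a client or an attacker is rejected until it is explicitly permitted.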

Output Encoding

Prevent XSS in Responses:

import html
import json

def sanitize_response(text):
    # HTML encode special characters
    return html.escape(text)

def handler(event, context):
    # Generate response from Bedrock
    response = bedrock.invoke_model(...)

    # Sanitize before returning
    sanitized = sanitize_response(response['answer'])

    return {
        'statusCode': 200,
        'body': json.dumps({
            'answer': sanitized,
            'sources': [...]
        })
    }

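Web UI Sanitization (apply either this client-side escaping or the server-side encoding above, not both; escaping twice shows literal entities such as &lt; to the user):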

// Escape HTML before rendering
function escapeHtml(text) {
  const div = document.createElement('div');
  div.textContent = text;
  return div.innerHTML;
}

// Render response
function renderResponse(answer) {
  const escaped = escapeHtml(answer);
  document.getElementById('response').innerHTML = escaped;
}

Dependency Management

Python Dependencies (requirements.txt):

boto3==1.34.0          # AWS SDK
requests==2.32.0       # HTTP client
# Pin versions to avoid supply chain attacks

Security Scanning:

# Check for known vulnerabilities
pip install safety
safety check

# Update dependencies quarterly
pip list --outdated
pip install --upgrade boto3 requests

GitHub Dependabot:

# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/lambda/chat"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 10

Monitoring & Incident Response

Security Monitoring

CloudTrail Logging

Enable CloudTrail:

resource "aws_cloudtrail" "main" {
  name                          = "nb-rag-sys-trail"
  s3_bucket_name                = aws_s3_bucket.cloudtrail.id
  include_global_service_events = true
  is_multi_region_trail         = true
  enable_log_file_validation    = true

  event_selector {
    read_write_type           = "All"
    include_management_events = true

    data_resource {
      type   = "AWS::S3::Object"
      values = ["${aws_s3_bucket.web.arn}/"]
    }

    data_resource {
      type   = "AWS::Lambda::Function"
      # "arn:aws:lambda" logs invocations of all current and future functions;
      # wildcards inside a function ARN are not supported here.
      values = ["arn:aws:lambda"]
    }
  }
}

Events to Monitor:

  • IAM policy changes
  • Security group changes
  • Secrets Manager access
  • Lambda function updates
  • S3 bucket policy changes

CloudWatch Alarms

Security Alarms:

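resource "aws_cloudwatch_metric_alarm" "unauthorized_api_calls" {
  alarm_name          = "unauthorized-api-calls"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  # API Gateway publishes no "UnauthorizedAPICalls" metric. For HTTP APIs,
  # rejected requests (401/403) are counted in 4xx with other client errors.
  metric_name         = "4xx"
  namespace           = "AWS/ApiGateway"
  period              = 300
  statistic           = "Sum"
  threshold           = 10
  alarm_description   = "Alert on a spike in 4xx responses, including unauthorized API calls"
  alarm_actions       = [aws_sns_topic.security_alerts.arn]

  dimensions = {
    ApiId = aws_apigatewayv2_api.main.id
  }
}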

resource "aws_cloudwatch_metric_alarm" "lambda_errors" {
  alarm_name          = "lambda-error-rate"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "Errors"
  namespace           = "AWS/Lambda"
  period              = "300"
  statistic           = "Average"
  threshold           = "0.01"  # 1% error rate
  alarm_description   = "Alert on Lambda error rate spike"
  alarm_actions       = [aws_sns_topic.security_alerts.arn]
}

Incident Response Plan

Security Incident Playbook

1. Detection

  • CloudWatch alarm triggers
  • AWS GuardDuty finding (if enabled)
  • User reports suspicious activity
  • Abnormal CloudTrail events

2. Assessment (15 minutes)

  • Review CloudTrail logs for unauthorized activity
  • Check Lambda logs for anomalous invocations
  • Verify API Gateway access logs
  • Assess scope: single user, multiple users, or system-wide

3. Containment (30 minutes)

  • Disable compromised user accounts in Cognito
  • Rotate compromised API keys in Secrets Manager
  • Update Lambda environment variables
  • Enable API Gateway rate limiting (if not already)
  • Block malicious IPs via WAF (if enabled)
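
The first containment action can be scripted ahead of time so it is not improvised during an incident. The sketch below assumes a boto3 `cognito-idp` client is passed in; the function name `contain_compromised_user` is hypothetical.

```python
def contain_compromised_user(cognito, user_pool_id, username):
    """Disable a Cognito user and sign them out of all sessions."""
    # Blocks new sign-ins for this user.
    cognito.admin_disable_user(UserPoolId=user_pool_id, Username=username)
    # Revokes the user's tokens in Cognito. A JWT authorizer validates tokens
    # locally, so tokens already issued may still be accepted by API Gateway
    # until they expire.
    cognito.admin_user_global_sign_out(UserPoolId=user_pool_id, Username=username)
```

Because of that remaining window of up to one token lifetime, blocking the source IPs or rotating affected API keys may still be needed alongside this step.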

4. Eradication (1 hour)

  • Remove malicious code from Lambda functions
  • Delete unauthorized IAM roles/policies
  • Restore from known-good Terraform state
  • Patch vulnerabilities

5. Recovery (2 hours)

  • Redeploy infrastructure via Terraform
  • Restore data from backups (if needed)
  • Re-enable user access
  • Monitor for recurrence

6. Post-Incident (1 week)

  • Root cause analysis
  • Update security policies
  • Implement additional controls
  • Document lessons learned
  • Update incident response plan

Emergency Contacts

Security Lead: security@yourcompany.com
AWS Support: 1-877-736-5437 (Business/Enterprise support)
Incident Response Slack: #security-incidents

Compliance & Auditing

Compliance Frameworks

SOC 2 Type II

AWS Services Used (All SOC 2 Compliant):

  • Lambda, API Gateway, S3, DynamoDB, CloudFront
  • Bedrock, Cognito, Secrets Manager
  • CloudWatch, CloudTrail, KMS

Customer Responsibilities:

  • Access control (IAM policies)
  • Data encryption configuration
  • Logging and monitoring setup
  • Incident response procedures

GDPR

Data Controller: Your company
Data Processor: AWS, Google

Requirements Met:

  • [Supported] Right to access (Cognito export)
  • [Supported] Right to deletion (Delete Cognito user)
  • [Supported] Right to portability (JSON export)
  • [Supported] Data minimization (only email, name, and profile picture URL collected)
  • [Supported] Encryption at rest and in transit
  • [Supported] Data Processing Agreement (AWS DPA)

Requirements NOT Met:

  • [Not Implemented] Data residency (US-based by default, can deploy to EU)
  • [Not Implemented] Explicit consent tracking (need consent management)

Audit Logging

CloudTrail Query Examples:

Find all Secrets Manager accesses:

aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=GetSecretValue

Find all Lambda function updates:

aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=UpdateFunctionCode

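Find IAM inline user policy changes (repeat with PutRolePolicy, AttachRolePolicy, or CreatePolicyVersion to cover other policy changes):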

aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=PutUserPolicy

Log Retention:

  • CloudTrail: 90 days in CloudWatch Logs, indefinite in S3
  • Lambda logs: 7 days
  • API Gateway logs: 7 days (optional)
  • VPC Flow Logs: Not enabled (Lambda functions are not attached to a VPC)

Last updated: 2025-12-30