System Architecture

Detailed technical architecture of the NorthBuilt RAG System.

Key AWS Services: This system is built on Amazon Bedrock Knowledge Bases with S3 Vectors for vector storage. For a complete list of AWS documentation references, see AWS Documentation References.

Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                           CloudFront CDN                             │
│                    (Global Edge Distribution)                        │
└──────────────────────┬──────────────────────────────────────────────┘
                       │
         ┌─────────────┴─────────────┐
         │                           │
    ┌────▼────┐              ┌───────▼────────┐
    │   S3    │              │  API Gateway   │
    │  (Web)  │              │   (HTTP API)   │
    └─────────┘              └───────┬────────┘
                                     │
                     ┌───────────────┼───────────────┐
                     │               │               │
              ┌──────▼─────┐  ┌─────▼──────┐  ┌────▼─────┐
              │  Cognito   │  │   Lambda   │  │  Lambda  │
              │   (Auth)   │  │   (Chat)   │  │(Webhooks)│
              └────────────┘  └─────┬──────┘  └────┬─────┘
                                    │              │
                     ┌──────────────┼──────────────┘
                     │              │
              ┌──────▼─────┐  ┌─────▼──────┐
              │  Bedrock   │  │ S3 Vectors │
              │  (Claude)  │  │  (Bedrock) │
              └────────────┘  └────────────┘
                     │
              ┌──────▼─────┐
              │ DynamoDB   │
              │ (Classify) │
              └────────────┘

Component Details

1. Frontend Layer

CloudFront Distribution

  • Purpose: Global CDN for low-latency access
  • Origin: S3 bucket with static web assets
  • Features:
    • HTTPS-only with TLS 1.2+
    • HTTP/2 and HTTP/3 support
    • Gzip/Brotli compression
    • Custom domain support
    • Origin Access Identity (OAI) for S3 security

S3 Web Bucket

  • Contents: HTML, CSS, JavaScript, images
  • Access: Private (CloudFront only via OAI)
  • Versioning: Enabled for rollback capability
  • Lifecycle: Old versions expire after 90 days

Web Application

  • Framework: React 19 + TypeScript + Vite
  • Styling: Tailwind CSS 4
  • Authentication: Google OAuth via Cognito
  • Features:
    • Real-time chat interface with clarification prompts
    • Source citation display with relevance scores
    • Query understanding integration
    • Dark/light mode
    • Mobile responsive PWA

2. API Layer

API Gateway (HTTP API)

  • Type: HTTP API (not REST API - lower cost, better performance)
  • Routes:
    • POST /chat - Main query endpoint
    • POST /webhooks/fathom - Fathom video webhook
    • POST /webhooks/helpscout - HelpScout ticket webhook
    • POST /webhooks/linear - Linear issue webhook
  • Authentication:
    • /chat: JWT authorizer (Cognito)
    • Webhooks: API key validation in Lambda
  • Throttling:
    • Rate: 10 requests/second
    • Burst: 20 requests
  • CORS: Enabled for web UI origin
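
The rate and burst settings above behave like a token bucket: the bucket holds up to 20 tokens and refills at 10 tokens per second. The sketch below only illustrates that behavior. It is not API Gateway's implementation, and the TokenBucket class is a hypothetical name introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Token bucket: `rate` tokens refill per second, capped at `burst`."""

    rate: float = 10.0    # steady-state requests per second
    burst: float = 20.0   # bucket capacity (largest allowed burst)
    tokens: float = 20.0  # the bucket starts full
    last: float = 0.0     # time of the previous request, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is admitted."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket()
# 25 requests arrive at the same instant: only the burst capacity is admitted.
burst_admitted = sum(bucket.allow(now=0.0) for _ in range(25))
# One second later the bucket has refilled by `rate` tokens.
later_admitted = sum(bucket.allow(now=1.0) for _ in range(25))
```

Requests rejected by the real throttle receive HTTP 429, so clients should retry with backoff.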

3. Authentication Layer

Amazon Cognito

  • User Pool: Manages user accounts and sessions
  • Identity Provider: Google OAuth 2.0 federation
  • Token Validation: JWT tokens with 1-hour expiration
  • Features:
    • Email/password fallback (optional)
    • MFA support (optional)
    • Account recovery
    • User attributes (email, name, picture)

Authorization Flow

1. User clicks "Sign in with Google"
2. Redirected to Cognito hosted UI
3. Cognito redirects to Google OAuth
4. User authorizes → Google returns code
5. Cognito exchanges code for tokens
6. User redirected to app with JWT
7. App stores JWT in localStorage
8. JWT sent in Authorization header for API calls
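
Steps 6 to 8 can be sketched from the point of view of a test client or script. This is a minimal sketch under stated assumptions: the Bearer prefix is an assumption (the flow above only says the JWT goes in the Authorization header), the helper names are hypothetical, and the payload is decoded without signature verification, which is acceptable only for reading the exp claim on the client side. API Gateway performs the real validation.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are unpadded base64url; restore the padding before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def is_expired(claims: dict, now: Optional[float] = None) -> bool:
    """True once the `exp` claim (seconds since the epoch) has passed."""
    now = time.time() if now is None else now
    return now >= claims["exp"]


def auth_headers(token: str) -> dict:
    """Headers for an API call; the Bearer prefix is an assumption."""
    return {"Authorization": f"Bearer {token}"}


def _b64url(obj: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Unsigned demo token with a fixed expiry, for illustration only.
demo_token = ".".join(
    [_b64url({"alg": "none"}), _b64url({"sub": "user-1", "exp": 1_700_003_600}), "sig"]
)
demo_claims = jwt_claims(demo_token)
```

A client can call is_expired before each request and send the user back through sign-in once the 1-hour token has lapsed.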

4. Compute Layer

Lambda Functions

Chat Lambda (lambda/chat/)

  • Runtime: Python 3.13
  • Memory: 1024 MB
  • Timeout: 60 seconds
  • Purpose: Main orchestrator for chat queries
  • Flow:
    1. Validate JWT token
    2. Extract query from request
    3. Retrieve relevant documents via the Bedrock Knowledge Base Retrieve API
    4. Call Bedrock with context + query
    5. Stream response back to client
    6. Return sources with citations
  • Environment Variables:
    • QUERY_LAMBDA_ARN - Query Lambda function ARN
    • BEDROCK_MODEL_ID - Claude model identifier
    • AWS_REGION - Deployment region

Note: The Chat Lambda handles document retrieval itself through the Bedrock Knowledge Base API. It calls retrieve on the bedrock-agent-runtime client, which:

  1. Generates embeddings via Titan Embeddings v2
  2. Searches S3 Vectors for similar documents
  3. Optionally reranks results using Bedrock Reranking
  4. Returns top K results (configurable, default 5)
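
A minimal sketch of the request such a retrieve call might take, assuming the default of 5 results and an optional equality filter on the client metadata attribute. The function name build_retrieve_request and the placeholder knowledge base ID are hypothetical; the request keys follow the Bedrock Agent Runtime Retrieve API. Reranking configuration is omitted.

```python
from typing import Any, Optional


def build_retrieve_request(
    knowledge_base_id: str,
    query: str,
    client: Optional[str] = None,
    top_k: int = 5,
) -> dict[str, Any]:
    """Build keyword arguments for bedrock-agent-runtime `retrieve`."""
    search_config: dict[str, Any] = {"numberOfResults": top_k}
    if client is not None:
        # Restrict results to documents whose sidecar metadata names this client.
        search_config["filter"] = {"equals": {"key": "client", "value": client}}
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {"vectorSearchConfiguration": search_config},
    }


request = build_retrieve_request(
    "KB_ID_PLACEHOLDER", "What was decided about LIMS?", client="acme"
)
# With boto3 installed and credentials configured, the call would be:
#   boto3.client("bedrock-agent-runtime").retrieve(**request)
```

Keeping the request builder separate from the network call makes the filter logic easy to unit test.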

Classify Lambda (lambda/classify/)

  • Runtime: Python 3.13
  • Memory: 512 MB
  • Timeout: 30 seconds
  • Purpose: Determine client/project for incoming documents using email domain matching
  • Flow:
    1. Receive source type and document data
    2. Select classification strategy (Fathom or HelpScout)
    3. Extract email domain(s) from document
    4. Query DynamoDB for DOMAIN# record matching the domain
    5. Fetch CLIENT# and PROJECT# records for metadata
    6. Return client name and project name for S3 metadata tagging
  • Strategies:
    • FathomStrategy: Extracts domains from meeting participant emails, selects highest-count match
    • HelpScoutStrategy: Extracts domain from primary customer email
  • Input Format:
    {"source": "fathom", "data": {"document": {"participants": ["user@client.com"]}}}
    
  • Output Format:
    {"clientId": "uuid", "clientName": "Client Name", "projectId": null, "projectName": null}
    
  • Environment Variables:
    • DYNAMODB_TABLE_NAME - Classify table name (nb-rag-sys-classify)
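
The FathomStrategy described above can be sketched as follows. This is an illustration, not the actual implementation: the function names are hypothetical, a plain dict stands in for the DOMAIN# lookups in DynamoDB, and ties between equally frequent domains are resolved in favor of the domain seen first.

```python
from collections import Counter
from typing import Optional


def email_domain(email: str) -> Optional[str]:
    """Return the lowercased domain of an email address, or None if malformed."""
    local, sep, domain = email.strip().lower().rpartition("@")
    return domain if (sep and local and domain) else None


def classify_by_participants(
    participants: list[str], domain_to_client: dict[str, str]
) -> Optional[str]:
    """Pick the client whose known domain appears most often among participants."""
    counts = Counter(
        d for d in map(email_domain, participants) if d in domain_to_client
    )
    if not counts:
        return None  # no participant matched a registered domain
    best_domain, _ = counts.most_common(1)[0]
    return domain_to_client[best_domain]


# Stand-in for DOMAIN# records; the values are hypothetical client IDs.
known_domains = {"acme.com": "client-acme", "globex.com": "client-globex"}
attendees = ["pm@internal.example", "a@acme.com", "b@ACME.com", "c@globex.com"]
matched_client = classify_by_participants(attendees, known_domains)
```

Unknown domains (here internal.example) are ignored, so internal attendees do not affect the result.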

Fathom Webhook Lambda (lambda/webhooks/fathom/)

  • Runtime: Python 3.13
  • Memory: 512 MB
  • Timeout: 300 seconds (5 minutes)
  • Purpose: Process Fathom video recordings
  • Flow:
    1. Receive webhook from Fathom
    2. Validate API key
    3. Fetch video metadata and transcript
    4. Classify content (client, project) via Classify Lambda
    5. Write document and .metadata.json sidecar to S3
    6. Return 200 OK; the next scheduled Bedrock KB sync ingests the document
  • Environment Variables:
    • FATHOM_API_KEY - Stored in Secrets Manager
    • DOCUMENTS_BUCKET - S3 bucket for documents

HelpScout Webhook Lambda (lambda/webhooks/helpscout/)

  • Runtime: Python 3.13
  • Memory: 512 MB
  • Timeout: 60 seconds
  • Purpose: Process HelpScout support tickets
  • Flow:
    1. Receive webhook from HelpScout
    2. Validate API key
    3. Extract ticket content
    4. Call Classify Lambda
    5. Write document to S3
    6. Return 200 OK
  • Environment Variables:
    • HELPSCOUT_API_KEY - Stored in Secrets Manager
    • CLASSIFY_LAMBDA_ARN - Classify Lambda function ARN

Linear Webhook Lambda (lambda/webhooks/linear/)

  • Runtime: Python 3.13
  • Memory: 512 MB
  • Timeout: 60 seconds
  • Purpose: Sync Linear Teams (clients) and Projects to DynamoDB for entity registry
  • Supported Events: Team.create, Team.update, Team.remove, Project.create, Project.update, Project.remove
  • Flow (Team events):
    1. Receive webhook from Linear with HMAC signature
    2. Verify signature using webhook secret
    3. Create/update/delete CLIENT record in DynamoDB
    4. Uses update_item to preserve manual fields (Domains, Notes, Aliases)
  • Flow (Project events):
    1. Verify webhook signature
    2. Fetch full project details from Linear API (includes team info)
    3. Ensure parent CLIENT exists via ensure_client_exists()
    4. Create/update/delete PROJECT record in DynamoDB
    5. Uses if_not_exists(CreatedAt, :val) to preserve original timestamps
  • Environment Variables:
    • LINEAR_WEBHOOK_SECRET_ARN - Webhook secret in Secrets Manager
    • LINEAR_API_KEY_SECRET_ARN - API key in Secrets Manager
    • DYNAMODB_TABLE_NAME - Classify table name
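
Signature verification (step 2 of the Team flow, step 1 of the Project flow) might look like the sketch below. It assumes the signature is a hex-encoded HMAC-SHA256 digest of the raw request body, which should be confirmed against Linear's webhook documentation; the function name and sample secret are hypothetical.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Check a hex HMAC-SHA256 signature over the raw (unparsed) request body."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids timing side channels.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


# Demo values; a real secret would come from Secrets Manager.
demo_secret = "test-secret"
demo_body = b'{"type":"Team","action":"create"}'
demo_signature = hmac.new(demo_secret.encode(), demo_body, hashlib.sha256).hexdigest()
```

The check must run on the exact bytes received, before any JSON parsing or re-serialization, because even a whitespace change alters the digest.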

Linear Sync Lambda (lambda/sync/linear/)

  • Runtime: Python 3.13
  • Memory: 512 MB
  • Timeout: 300 seconds (5 minutes)
  • Purpose: Full sync of all Linear Teams and Projects to DynamoDB
  • Architecture: Handler + Worker pattern (fire-and-forget for long-running sync)
  • Flow:
    1. Handler receives request, invokes worker asynchronously, returns 202 Accepted
    2. Worker fetches all teams from Linear GraphQL API
    3. For each team: creates/updates CLIENT record
    4. For each project: creates/updates PROJECT record with CLIENT parent
    5. Uses update_item with if_not_exists(CreatedAt) for idempotent updates
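  • Invocation (AWS CLI v2 needs --cli-binary-format to accept a raw JSON payload):
    aws lambda invoke --function-name nb-rag-sys-linear-sync --cli-binary-format raw-in-base64-out --payload '{}' response.json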
    
  • Environment Variables:
    • LINEAR_API_KEY_SECRET_ARN - API key in Secrets Manager
    • DYNAMODB_TABLE_NAME - Classify table name
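
The handler half of the Handler + Worker pattern can be sketched as below. The worker function name is hypothetical, and the Lambda client is passed in as a parameter so the example runs without AWS access; in a deployed function it would typically be a module-level boto3.client("lambda"). InvocationType="Event" is the asynchronous invocation mode of the Lambda Invoke API.

```python
import json

WORKER_FUNCTION_NAME = "nb-rag-sys-linear-sync-worker"  # hypothetical name


def handler(event, context, lambda_client):
    """Queue the long-running sync on the worker and return immediately."""
    lambda_client.invoke(
        FunctionName=WORKER_FUNCTION_NAME,
        InvocationType="Event",  # asynchronous: do not wait for the worker
        Payload=json.dumps(event or {}).encode(),
    )
    return {"statusCode": 202, "body": json.dumps({"status": "accepted"})}


class FakeLambdaClient:
    """Records invoke() calls so the handler can be exercised locally."""

    def __init__(self):
        self.calls = []

    def invoke(self, **kwargs):
        self.calls.append(kwargs)
        return {"StatusCode": 202}


fake_client = FakeLambdaClient()
response = handler({}, None, fake_client)
```

Because the worker runs asynchronously, its failures never reach the caller and have to be tracked through logs or alarms.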

5. AI Layer

Amazon Bedrock

For complete Bedrock documentation, see Amazon Bedrock User Guide.

Claude Sonnet 4.5 (us.anthropic.claude-sonnet-4-5-20250929-v1:0)

  • Purpose: Response generation
  • Input: System prompt + context + user query
  • Output: Natural language response with citations
  • Pricing: $3.00 per million input tokens, $15.00 per million output tokens
  • Performance: ~2-3 seconds for typical query
  • Documentation: Claude on Amazon Bedrock

Titan Embeddings V2 (amazon.titan-embed-text-v2:0)

  • Purpose: Generate vector embeddings for semantic search
  • Input: Text chunks (up to 8192 tokens)
  • Output: 1024-dimensional vector (also supports 256, 512)
  • Pricing: $0.00002 per 1000 tokens
  • Performance: ~200ms per embedding
  • Documentation: Amazon Titan Text Embeddings

System Prompt Strategy

You are a helpful assistant with access to company documentation.

Context:
{retrieved_documents}

User Query: {query}

Instructions:
1. Answer based on the provided context
2. Cite sources using [doc_N] notation
3. If information is not in context, say so
4. Be concise but thorough
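
One way to fill this template is sketched below. The helper names are hypothetical, and numbering documents from 1 to match the [doc_N] notation is an assumption.

```python
PROMPT_TEMPLATE = """You are a helpful assistant with access to company documentation.

Context:
{retrieved_documents}

User Query: {query}

Instructions:
1. Answer based on the provided context
2. Cite sources using [doc_N] notation
3. If information is not in context, say so
4. Be concise but thorough"""


def format_context(chunks: list[str]) -> str:
    """Label each retrieved chunk so the model can cite it as [doc_N]."""
    return "\n\n".join(
        f"[doc_{i}] {text.strip()}" for i, text in enumerate(chunks, start=1)
    )


def build_prompt(chunks: list[str], query: str) -> str:
    """Substitute the labeled context and the user query into the template."""
    return PROMPT_TEMPLATE.format(
        retrieved_documents=format_context(chunks), query=query
    )


prompt = build_prompt(
    ["LIMS kickoff notes.", "Ticket about sample tracking."], "What is LIMS?"
)
```

The same numbering should be reused when the sources are returned to the web UI, so that citations in the answer line up with the displayed source list.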

6. Data Layer

S3 Vectors

S3 Vectors is purpose-built vector storage that provides cost-optimized, durable storage for AI workloads. See Working with S3 Vectors for complete documentation.

  • Index Configuration:
    • Dimension: 1024 (matches Titan Embeddings v2)
    • Metric: Cosine similarity
    • Data Type: FLOAT32
    • Storage: Fully managed by AWS
    • Non-filterable keys: AMAZON_BEDROCK_TEXT, AMAZON_BEDROCK_METADATA (required for 100% ingestion success)
    • See Vector indexes for configuration details
  • Integration with Bedrock Knowledge Base:
    • Native integration via S3_VECTORS storage type - see Using S3 Vectors with Bedrock KB
    • Automatic chunking (512 tokens, 20% overlap) - see Chunking strategies
    • LLM parsing: Disabled (sidecar .metadata.json files used instead)
    • Data deletion policy: DELETE (vectors auto-removed when S3 docs deleted)
  • Document Metadata Architecture: Documents carry two separate kinds of metadata:

    1. Bedrock KB Metadata (.metadata.json sidecar files for filtering): Each document has a companion metadata sidecar file stored alongside it in S3. For example: fathom/acme/alpha/meeting_12345.md has fathom/acme/alpha/meeting_12345.md.metadata.json

    {
      "metadataAttributes": {
        "source": {"value": {"type": "STRING", "stringValue": "fathom"}},
        "client": {"value": {"type": "STRING", "stringValue": "acme"}},
        "project": {"value": {"type": "STRING", "stringValue": "alpha"}},
        "category": {"value": {"type": "STRING", "stringValue": "meeting-transcript"}}
      }
    }
    

    These enable multi-tenant filtering in RAG queries (filter by client). Project metadata is preserved for display purposes but not used for filtering. See S3 Vectors metadata filtering and RetrievalFilter API for filter syntax. For multi-tenancy patterns, see Multi-tenancy with metadata filtering.

    2. S3 Object Metadata (HTTP headers, for S3 operations only): Technical metadata stored as S3 object headers. NOT used by Bedrock KB for filtering.

  • Query Performance: Fast retrieval via Bedrock KB API
  • Capacity: Scales automatically with usage
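
The sidecar convention described under Bedrock KB Metadata can be generated with a small helper. A minimal sketch: the function names are hypothetical, while the key suffix and JSON shape follow the example shown above.

```python
import json


def sidecar_key(document_key: str) -> str:
    """S3 key of the metadata sidecar that sits next to a document."""
    return f"{document_key}.metadata.json"


def build_sidecar(source: str, client: str, project: str, category: str) -> dict:
    """Build the sidecar body with every attribute typed as STRING."""
    attributes = {
        "source": source,
        "client": client,
        "project": project,
        "category": category,
    }
    return {
        "metadataAttributes": {
            name: {"value": {"type": "STRING", "stringValue": value}}
            for name, value in attributes.items()
        }
    }


doc_key = "fathom/acme/alpha/meeting_12345.md"
sidecar = build_sidecar("fathom", "acme", "alpha", "meeting-transcript")
sidecar_json = json.dumps(sidecar, indent=2)  # body to upload at sidecar_key(doc_key)
```

Writing the document and its sidecar in the same Lambda invocation keeps the pair consistent for the next Knowledge Base sync.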

DynamoDB Tables

Classify Table (nb-rag-sys-classify)

The classify table is the central entity registry for multi-tenant document classification. It stores clients (from Linear teams), projects, and domain mappings used by the classification service to tag documents with the correct metadata.

Primary Key Structure:

  • PK (Partition Key): Entity type prefix + ID (e.g., CLIENT#uuid, PROJECT#uuid, DOMAIN#example.com)
  • SK (Sort Key): Relationship context (e.g., METADATA, CLIENT#parent-uuid)

Record Types:

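Record Type | PK                   | SK               | Purpose
------------+----------------------+------------------+-------------------------------
CLIENT      | CLIENT#<team-id>     | METADATA         | Client/Team from Linear
PROJECT     | PROJECT#<project-id> | CLIENT#<team-id> | Project with parent client
DOMAIN      | DOMAIN#<domain>      | METADATA         | Email domain to client mapping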
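
The key scheme in this table can be captured in three small helpers. The function names are hypothetical, and lowercasing the domain before building the key is an assumption made so that lookups are case-insensitive.

```python
def client_key(team_id: str) -> dict:
    """Primary key of a CLIENT record."""
    return {"PK": f"CLIENT#{team_id}", "SK": "METADATA"}


def project_key(project_id: str, team_id: str) -> dict:
    """Primary key of a PROJECT record; the sort key names the parent client."""
    return {"PK": f"PROJECT#{project_id}", "SK": f"CLIENT#{team_id}"}


def domain_key(domain: str) -> dict:
    """Primary key of a DOMAIN record (lowercased by assumption)."""
    return {"PK": f"DOMAIN#{domain.lower()}", "SK": "METADATA"}


example_keys = [
    client_key("b2362d41-0364-4325-9b1e-a32b7e2d9255"),
    project_key(
        "97ae8b49-c653-4538-a526-ba4d5e91f79a",
        "b2362d41-0364-4325-9b1e-a32b7e2d9255",
    ),
    domain_key("AmericanCellTechnology.com"),
]
```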

CLIENT Record (from Linear Teams):

{
  "PK": "CLIENT#b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "SK": "METADATA",
  "EntityType": "CLIENT",
  "Name": "American Cell Technology",
  "LinearTeamId": "b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "LinearTeamKey": "ACT",
  "Description": "Biotech client",
  "CreatedAt": "2025-12-30T23:09:26.580013",
  "UpdatedAt": "2025-12-30T23:09:26.580022"
}

PROJECT Record (from Linear Projects):

{
  "PK": "PROJECT#97ae8b49-c653-4538-a526-ba4d5e91f79a",
  "SK": "CLIENT#b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "EntityType": "PROJECT",
  "Name": "LIMS",
  "LinearProjectId": "97ae8b49-c653-4538-a526-ba4d5e91f79a",
  "ClientId": "b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "Description": "Laboratory Information Management System",
  "State": "started",
  "CreatedAt": "2025-12-30T23:09:26.716056",
  "UpdatedAt": "2025-12-30T23:09:26.716068"
}

DOMAIN Record (for classification):

{
  "PK": "DOMAIN#americancelltechnology.com",
  "SK": "METADATA",
  "ClientId": "b2362d41-0364-4325-9b1e-a32b7e2d9255",
  "ProjectId": null,
  "CreatedAt": "2025-12-30T23:15:00.000000",
  "UpdatedAt": "2025-12-30T23:15:00.000000"
}

Global Secondary Index (GSI):

  • EntityTypeIndex: Query all entities of a specific type
    • Hash Key: EntityType (CLIENT or PROJECT)
    • Used by query understanding to load all known entities for LLM context

Features:

  • Point-in-Time Recovery (PITR) enabled
  • On-demand billing
  • update_item pattern preserves manual fields across syncs
  • if_not_exists(CreatedAt, :val) preserves original creation timestamps
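
The update_item pattern named in the last two bullets might be expressed as below. This sketch only builds the request in the low-level DynamoDB client format; the function name is hypothetical. Fields that the expression does not mention (Domains, Notes, Aliases) are left untouched, and if_not_exists keeps the first CreatedAt value on every later sync.

```python
def build_client_upsert(table_name: str, team_id: str, name: str, now_iso: str) -> dict:
    """Build keyword arguments for a DynamoDB `update_item` call on a CLIENT record."""
    return {
        "TableName": table_name,
        "Key": {"PK": {"S": f"CLIENT#{team_id}"}, "SK": {"S": "METADATA"}},
        # SET only the synced fields; CreatedAt is written on first insert only.
        "UpdateExpression": (
            "SET #name = :name, EntityType = :type, UpdatedAt = :now, "
            "CreatedAt = if_not_exists(CreatedAt, :now)"
        ),
        # "Name" is a DynamoDB reserved word, so it needs a placeholder.
        "ExpressionAttributeNames": {"#name": "Name"},
        "ExpressionAttributeValues": {
            ":name": {"S": name},
            ":type": {"S": "CLIENT"},
            ":now": {"S": now_iso},
        },
    }


upsert = build_client_upsert(
    "nb-rag-sys-classify", "team-123", "Example Client", "2025-12-30T23:09:26"
)
# With boto3 available: boto3.client("dynamodb").update_item(**upsert)
```

Running the same upsert twice yields the same record apart from UpdatedAt, which is what makes the full sync safe to repeat.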

AWS Secrets Manager

  • Purpose: Secure storage of API keys and credentials
  • Secrets:
    • fathom-api-key - Fathom API key
    • helpscout-api-key - HelpScout API key
    • linear-api-key - Linear API key
    • google-oauth-client-secret - Google OAuth secret
  • Rotation: Manual (can be automated)
  • Access: IAM role-based (Lambda execution roles)

7. Infrastructure Layer

Terraform State

  • Backend: S3 + DynamoDB
  • State File: s3://nb-rag-sys-terraform-state/terraform.tfstate
  • Locking: DynamoDB table nb-rag-sys-terraform-locks
  • Encryption: AES-256 at rest
  • Versioning: Enabled for state recovery

IAM Roles

Lambda Execution Roles

  • Chat Lambda: Invoke Query Lambda, Bedrock access
  • Query Lambda: Secrets Manager read, Bedrock access
  • Classify Lambda: DynamoDB write, Bedrock access
  • Webhook Lambdas: Secrets Manager read, invoke other Lambdas

GitHub Actions OIDC Role

  • Trust policy: GitHub OIDC provider
  • Permissions: Full Terraform deployment access
  • Session duration: 1 hour
  • MFA: Not required (OIDC provides strong authentication)

Data Flow

Query Flow

The query flow uses Bedrock Knowledge Base Retrieve API with optional reranking.

1. User enters query in web UI
2. Web UI sends POST /chat with JWT token
3. API Gateway validates JWT with Cognito
4. Chat Lambda invoked:
   a. Calls Bedrock Knowledge Base retrieve API
   b. Knowledge Base:
      - Generates embedding via Bedrock Titan
      - Searches S3 Vectors for similar documents
      - Optionally reranks results using Bedrock Reranking (Cohere Rerank 3.5)
      - Returns top K documents (adaptive retrieval)
   c. Formats context from retrieved documents
   d. Calls Bedrock Claude with system prompt + context + query
   e. Returns response
5. Chat Lambda returns JSON response with answer + sources
6. Web UI displays answer with source citations
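
Step 5 could assemble its JSON body as sketched below. The input follows the retrievalResults shape of the Bedrock Retrieve API; the output field names other than answer and sources, the function name, and the 200-character snippet length are assumptions for illustration.

```python
def build_chat_response(answer: str, retrieval_results: list[dict]) -> dict:
    """Pair the generated answer with a numbered list of its sources."""
    sources = []
    for i, result in enumerate(retrieval_results, start=1):
        sources.append(
            {
                "id": f"doc_{i}",  # matches the [doc_N] citation notation
                "uri": result.get("location", {}).get("s3Location", {}).get("uri"),
                "score": result.get("score"),
                "snippet": result.get("content", {}).get("text", "")[:200],
            }
        )
    return {"answer": answer, "sources": sources}


sample_results = [
    {
        "content": {"text": "Kickoff meeting notes for the LIMS project."},
        "location": {
            "type": "S3",
            "s3Location": {"uri": "s3://example-bucket/fathom/acme/alpha/meeting_12345.md"},
        },
        "score": 0.82,
    }
]
chat_response = build_chat_response("LIMS is the lab system [doc_1].", sample_results)
```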

Ingestion Flow (Fathom Example)

1. Fathom video completes processing
2. Fathom sends webhook to API Gateway
3. API Gateway routes to Fathom Webhook Lambda
4. Lambda validates API key from Secrets Manager
5. Lambda fetches video metadata and transcript
6. Lambda classifies content (client, project) via classify Lambda
7. Lambda writes document and .metadata.json sidecar to S3
8. Bedrock Knowledge Base sync triggers (scheduled or manual)
9. Knowledge Base:
   a. Reads .metadata.json for filtering attributes
   b. Chunks document (~512 tokens, 20% overlap)
   c. Generates embeddings via Bedrock Titan
   d. Stores vectors in S3 Vectors with metadata
10. Lambda returns 200 OK to Fathom
11. Video content searchable after next sync cycle
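
The chunking arithmetic in step 9b can be illustrated with a sliding window. Bedrock performs the real chunking and tokenization itself; this sketch assumes a pre-tokenized list and only shows how a 512-token window with 20% overlap advances. The function name is hypothetical.

```python
def chunk_tokens(tokens: list, chunk_size: int = 512, overlap_ratio: float = 0.20) -> list:
    """Split tokens into fixed-size windows that share `overlap_ratio` of their length."""
    overlap = int(chunk_size * overlap_ratio)  # 102 tokens for 512 * 0.20
    step = chunk_size - overlap                # each window starts 410 tokens later
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


demo_tokens = list(range(1000))  # stand-in for a 1000-token document
demo_chunks = chunk_tokens(demo_tokens)
chunk_lengths = [len(chunk) for chunk in demo_chunks]
```

A 1000-token document therefore produces three chunks, and the last 102 tokens of each chunk reappear at the start of the next one.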

Security Architecture

Network Security

  • All traffic over HTTPS/TLS 1.2+
  • API Gateway with WAF (optional)
  • CloudFront with AWS Shield Standard
  • VPC endpoints for AWS service communication (optional)

Authentication & Authorization

  • Google OAuth 2.0 for user authentication
  • Cognito JWT tokens for API authorization
  • API keys for webhook validation
  • IAM roles for Lambda execution

Data Security

  • Secrets Manager for sensitive credentials
  • S3 encryption at rest (AES-256)
  • DynamoDB encryption at rest
  • Lambda environment variable encryption (optional)

Compliance

  • GDPR: EU user data can be kept in-region by deploying to an EU region such as eu-west-1
  • SOC 2: AWS services are SOC 2 compliant
  • HIPAA: Not HIPAA compliant (would require BAA)

Scalability

Current Capacity

  • Concurrent users: ~100
  • Queries per second: 10 (API Gateway limit)
  • Vector storage: 100K documents
  • Lambda concurrency: 10 reserved

Scaling Strategies

Horizontal Scaling

  • Increase API Gateway rate limit
  • Increase Lambda reserved concurrency
  • S3 Vectors scales automatically with usage
  • Enable CloudFront caching

Vertical Scaling

  • Increase Lambda memory allocation
  • Adjust Bedrock Knowledge Base retrieval limits
  • Switch to provisioned DynamoDB capacity

Performance Optimization

  • Cache frequent queries in Lambda
  • Optimize chunk size/overlap in Knowledge Base
  • Tune adaptive retrieval multiplier
  • Enable Bedrock reranking for better relevance

Disaster Recovery

Backup Strategy

  • Terraform State: S3 versioning enabled
  • S3 Documents: S3 versioning + cross-region replication (optional)
  • S3 Vectors: Managed by AWS; the index can be rebuilt from the S3 documents with a Bedrock Knowledge Base sync
  • DynamoDB: Point-in-Time Recovery (PITR) enabled
  • Secrets: Replicate to backup region
  • Web Assets: S3 versioning + lifecycle

Recovery Procedures

Complete Infrastructure Loss

  1. Deploy from Terraform (terraform apply)
  2. Restore S3 documents from backup/versioning
  3. Trigger Bedrock Knowledge Base sync to rebuild vectors
  4. Restore DynamoDB from PITR
  5. Update secrets in Secrets Manager
  6. Deploy web assets to S3
  7. Invalidate CloudFront cache

  RTO: ~30 minutes | RPO: Near-zero (S3 durability)

Region Failure

  1. Update Terraform region variable
  2. Deploy to new region
  3. Update DNS to point to new CloudFront
  4. Restore data from backups

  RTO: ~1 hour | RPO: ~24 hours

Monitoring & Observability

CloudWatch Metrics

  • Lambda: Invocations, Duration, Errors, Throttles
  • API Gateway: 4xx, 5xx, Latency, Request Count
  • Bedrock: Model invocations, token usage
  • DynamoDB: Read/write capacity, throttles

CloudWatch Logs

  • Lambda function logs (7-day retention)
  • API Gateway access logs (optional)
  • Structured JSON logging in Lambda

Alarms

  • Lambda error rate > 1%
  • API Gateway 5xx rate > 0.5%
  • Lambda duration > 30 seconds
  • DynamoDB throttling events

Distributed Tracing

  • X-Ray integration for Lambda (optional)
  • Request ID propagation across services

Last updated: 2026-01-01