Architecture Decision Records (ADRs)

Documentation of significant architectural decisions made during the development of the NorthBuilt RAG System.

ADR Format

Each ADR follows this structure:

  • Status: Accepted Superseded Deprecated
  • Date: Decision date
  • Context: Problem and constraints
  • Decision: What was decided
  • Consequences: Impact of the decision
  • Alternatives Considered: Other options evaluated

ADR-001: Serverless Architecture on AWS Lambda

Status: Accepted Date: 2025-10-01

Context

Need to build a RAG system with:

  • Variable workload (unpredictable query patterns)
  • Minimal operational overhead
  • Cost-effective at low scale
  • Fast time to market

Constraints:

  • Small team (no dedicated DevOps)
  • Budget-conscious
  • Need production-ready quickly

Decision

Use 100% serverless architecture on AWS:

  • Lambda for compute
  • API Gateway for HTTP endpoints
  • S3 for storage
  • DynamoDB for data
  • Managed AI services (Bedrock)

Consequences

Positive:

  • No server management or patching
  • Auto-scaling built-in
  • Pay only for usage
  • High availability by default
  • Fast deployment cycle

Negative:

  • Cold start latency (~1s)
  • Lambda timeout limits (15 min max)
  • Vendor lock-in to AWS
  • Debugging more complex than traditional servers

Cost Impact: ~$140/month at 1K queries/month

Alternatives Considered

  1. Container-based (ECS/EKS)
    • Pros: More control, no cold starts, can run any code
    • Cons: Higher baseline cost ($50/month min), requires container expertise, more operational overhead
    • Rejected: Overkill for current scale
  2. EC2 Instances
    • Pros: Full control, no timeouts, familiar
    • Cons: Fixed cost ($30-100/month), manual scaling, patching required
    • Rejected: Too much operational burden
  3. Managed Platform (Heroku, Render)
    • Pros: Simple deployment, less AWS-specific
    • Cons: Higher cost ($25-50/month), less control, still need to manage containers
    • Rejected: More expensive, less flexibility

ADR-002: Pinecone for Vector Storage

Status: Superseded by ADR-010 Date: 2025-11-01

Context

Initial implementation used OpenSearch Serverless for vector storage, but:

  • High cost: ~$700/month minimum
  • Slower queries: 200ms+ latency
  • Complex networking (VPC, security groups)
  • Frequent Terraform drift issues

Need vector database that is:

  • Cost-effective at small scale
  • Fast (<50ms query latency)
  • Simple to operate
  • Reliable

Decision

Migrate to Pinecone managed vector database.

Consequences

Positive:

  • Cost: $70/month (90% savings vs OpenSearch)
  • Performance: <25ms query latency (8× faster)
  • Simplicity: No VPC networking, no drift
  • Reliability: 99.9% SLA, managed service

Negative:

  • External dependency (not AWS)
  • Additional API key to manage
  • Data egress from AWS (minimal cost)
  • Migration effort required

Cost Impact: $70/month (fixed) + $0 per query

Alternatives Considered

  1. OpenSearch Serverless (original)
    • Pros: Fully AWS-native, powerful query language
    • Cons: $700/month minimum, slow, complex
    • Rejected: Too expensive
  2. Self-hosted Qdrant/Weaviate on EC2
    • Pros: Full control, cheaper than OpenSearch (~$40/month)
    • Cons: Operational burden, manual scaling, backups, patching
    • Rejected: Too much maintenance
  3. DynamoDB + FAISS
    • Pros: Fully AWS-native, very cheap
    • Cons: Complex to implement, slower than specialized vector DB
    • Rejected: Development time not worth savings
  4. Elasticsearch (self-managed)
    • Pros: Mature, powerful, familiar
    • Cons: Expensive to run (need 3 nodes), complex operations
    • Rejected: Operational overhead

ADR-003: Claude Sonnet 4.5 for Response Generation

Status: Accepted Date: 2025-10-15

Context

Need LLM for generating responses from retrieved context. Requirements:

  • High quality responses
  • Fast inference (<3s)
  • Accurate citation handling
  • Cost-effective

Decision

Use Claude Sonnet 4.5 via AWS Bedrock.

Pricing: $3/M input tokens, $15/M output tokens

Consequences

Positive:

  • Excellent response quality
  • Strong instruction following (citations, formatting)
  • Fast inference (~2s average)
  • AWS Bedrock integration (no separate API)
  • No model hosting required

Negative:

  • More expensive than smaller models
  • Token limits (200K context window, but we use ~2K)
  • Vendor lock-in (Anthropic model)

Cost Impact: ~$11/month for 1000 queries

Alternatives Considered

  1. Claude Haiku (cheaper)
    • Pros: 12× cheaper ($0.25 input, $1.25 output)
    • Cons: Lower quality responses, less nuanced
    • Future: May use for simple queries
  2. GPT-4 via OpenAI
    • Pros: Comparable quality, more familiar to some
    • Cons: More expensive ($30 input, $60 output), separate API to manage
    • Rejected: More expensive, one more service
  3. Self-hosted Llama 3
    • Pros: Free inference (after setup)
    • Cons: GPU required ($730/month for g5.2xlarge), complex deployment, lower quality
    • Rejected: Not cost-effective until >70K queries/month
  4. Claude Opus (higher quality)
    • Pros: Highest quality responses
    • Cons: 5× more expensive
    • Rejected: Quality difference not worth cost for most queries

ADR-004: Cognito + Google OAuth for Authentication

Status: Accepted Date: 2025-10-10

Context

Need user authentication for web UI. Requirements:

  • Secure (industry standard)
  • Low maintenance
  • Familiar UX (social login)
  • Cost-effective

Decision

Use AWS Cognito with Google OAuth federation.

Consequences

Positive:

  • Free for first 50K monthly active users
  • Fully managed (no password storage, MFA, etc.)
  • Standard OAuth 2.0 flow
  • JWT tokens for API authorization
  • No additional auth service needed

Negative:

  • AWS lock-in
  • Limited customization of UI
  • Redirect-based flow (not SPA-native)
  • Google API setup required

Cost Impact: $0/month (under 50K MAU)

Alternatives Considered

  1. Auth0
    • Pros: Better UX, more identity providers, more features
    • Cons: $35/month minimum, external service
    • Rejected: Unnecessary cost
  2. Firebase Authentication
    • Pros: Good Google integration, free tier
    • Cons: Ties to Google Cloud, harder to integrate with AWS
    • Rejected: Prefer AWS-native
  3. Custom JWT implementation
    • Pros: Full control, no cost
    • Cons: Security risk, maintenance burden, password management
    • Rejected: Not worth security risk
  4. No authentication
    • Pros: Simplest
    • Cons: No user tracking, no access control
    • Rejected: Need to track usage per user

ADR-005: HTTP API (not REST API) for API Gateway

Status: Accepted Date: 2025-10-12

Context

API Gateway offers two options:

  • REST API: Full-featured, more expensive
  • HTTP API: Simpler, cheaper, faster

Decision

Use HTTP API for lower cost and better performance.

Consequences

Positive:

  • Cost: 70% cheaper ($1/M vs $3.50/M requests)
  • Latency: ~10ms lower latency
  • Simpler: Fewer features to configure

Negative:

  • No resource policies or usage plans
  • No API key authentication (use JWT instead)
  • Limited request validation
  • No caching (need external cache)

Cost Impact: $1/month vs $3.50/month at 1M requests

Alternatives Considered

  1. REST API
    • Pros: More features (caching, usage plans, resource policies)
    • Cons: More expensive, slower
    • Rejected: Don’t need extra features
  2. Application Load Balancer
    • Pros: Lower cost at high scale (>10M requests/month)
    • Cons: Fixed cost (~$20/month), need to run targets (Lambda or containers)
    • Rejected: Current scale doesn’t justify

ADR-006: Migrate from OpenSearch to Pinecone

Status: Superseded by ADR-010 Date: 2025-11-01

Context

OpenSearch Serverless had multiple issues:

  • High cost: $700/month minimum (OCU pricing)
  • Performance: 200ms+ query latency (p95)
  • Complexity: VPC, security groups, collection policies
  • Terraform drift: Constant drift with policies
  • Overkill: Full-text search features unused

Decision

Migrate to Pinecone as the primary vector store.

Migration approach:

  1. Create Pinecone index
  2. Re-ingest all documents
  3. Update Query Lambda to use Pinecone
  4. Destroy OpenSearch collection
  5. Remove VPC resources

Consequences

Positive:

  • Cost savings: $630/month (90% reduction)
  • Performance: <25ms latency (8× faster)
  • Simpler architecture: Removed VPC, 50+ Terraform resources
  • No drift: Pinecone doesn’t use complex IAM policies
  • Better docs: Pinecone docs > AWS OpenSearch docs

Negative:

  • External dependency: Data stored outside AWS
  • Migration downtime: 2 hours to re-ingest
  • Lost features: No full-text search (only vector similarity)

Cost Impact: $70/month fixed (was $700/month)

Migration Steps

  1. Created Pinecone index (1024-dim, cosine similarity)
  2. Wrote migration script to fetch from OpenSearch, upsert to Pinecone
  3. Updated Lambda environment variables (API key, index name)
  4. Tested query performance (validated <25ms latency)
  5. Destroyed OpenSearch resources via Terraform
  6. Removed VPC, subnets, security groups, collection policies

Rollback plan: Keep OpenSearch for 7 days before destroying, can rollback if issues.


ADR-007: Terraform for Infrastructure as Code

Status: Accepted Date: 2025-10-01

Context

Need to manage AWS infrastructure. Requirements:

  • Repeatable deployments
  • Version control for infrastructure
  • Multiple environments (dev, prod)
  • Team collaboration

Decision

Use Terraform for infrastructure as code.

Consequences

Positive:

  • Declarative: Describe desired state, Terraform handles changes
  • State management: S3 + DynamoDB for locking
  • Modules: Reusable components
  • Plan before apply: Review changes before executing
  • Industry standard: Well-documented, large community

Negative:

  • State management: Need to protect state file
  • Learning curve: HCL syntax, Terraform concepts
  • Drift: Manual changes cause drift (must avoid)

Cost Impact: $0.10/month (S3 state storage)

Alternatives Considered

  1. AWS CloudFormation
    • Pros: AWS-native, no state file, free
    • Cons: YAML/JSON verbose, slower, AWS-only
    • Rejected: Prefer Terraform’s cleaner syntax
  2. AWS CDK
    • Pros: Real programming language (Python, TypeScript)
    • Cons: Less mature, generates CloudFormation (slower), more complex
    • Rejected: Overkill for our needs
  3. Pulumi
    • Pros: Real programming language, good UX
    • Cons: Less mature, smaller community, managed state
    • Rejected: Prefer Terraform’s larger ecosystem
  4. Manual (ClickOps)
    • Pros: Fastest initially, familiar
    • Cons: Not repeatable, no version control, error-prone
    • Rejected: Not sustainable

ADR-008: GitHub Actions for CI/CD

Status: Accepted Date: 2025-10-05

Context

Need CI/CD pipeline for deploying infrastructure and code. Requirements:

  • Automated deployments on push
  • Secure (no long-lived credentials)
  • Easy to configure
  • Free or cheap

Decision

Use GitHub Actions with OIDC authentication to AWS.

Consequences

Positive:

  • Free: Unlimited minutes for public repos, 2000 min/month for private
  • Integrated: Lives with code in .github/workflows
  • Secure: OIDC eliminates long-lived AWS keys
  • Flexible: Can run any bash command, install any tool

Negative:

  • GitHub lock-in: Workflow syntax specific to GitHub Actions
  • Limited debugging: Can’t SSH into runners
  • Cold start: Runners start fresh each time (must install tools)

Cost Impact: $0/month (free tier)

Alternatives Considered

  1. AWS CodePipeline
    • Pros: AWS-native, integrates with CodeBuild
    • Cons: $1/pipeline/month, more complex setup
    • Rejected: GitHub Actions simpler and free
  2. GitLab CI
    • Pros: Similar to GitHub Actions, good UX
    • Cons: Need to migrate repo, learning curve
    • Rejected: Already using GitHub
  3. Jenkins
    • Pros: Full control, highly customizable
    • Cons: Need to host ($30/month EC2), complex setup, maintenance
    • Rejected: Too much overhead
  4. CircleCI / Travis CI
    • Pros: Good UX, mature
    • Cons: Cost ($30+/month for private), not as integrated
    • Rejected: GitHub Actions more convenient

ADR-009: Python 3.13 for Lambda Runtime

Status: Accepted Date: 2025-10-15

Context

Need to choose Lambda runtime. Python is preferred for:

  • Team expertise
  • AWS SDK (boto3) built-in
  • AI/ML libraries (LangChain, etc.)

Decision

Use Python 3.13 runtime (latest available).

Consequences

Positive:

  • Performance: Faster than Python 3.11/3.12
  • Features: Latest Python features
  • Support: Will be supported for ~5 years
  • Libraries: All major libraries compatible

Negative:

  • Newer runtime: Less battle-tested (3.12 more stable)
  • Dependencies: Some libraries may lag

Cost Impact: None

Alternatives Considered

  1. Python 3.12
    • Pros: More stable, better tested
    • Cons: Slightly slower, older features
    • May switch if stability issues arise
  2. Node.js
    • Pros: Faster cold starts, async-native
    • Cons: Different language, less AI/ML ecosystem
    • Rejected: Team less familiar
  3. Go
    • Pros: Fastest cold starts, compiled
    • Cons: Verbose, harder to write, less AWS library support
    • Rejected: Development speed more important
  4. Java
    • Pros: Enterprise-grade, good AWS support
    • Cons: Slow cold starts (5s+), verbose
    • Rejected: Cold starts unacceptable

ADR-010: Migrate from Pinecone to S3 Vectors

Status: Accepted Date: 2025-12-15

Context

Pinecone worked well but had limitations for our use case:

  • External dependency: Data stored outside AWS ecosystem
  • API key management: Additional secret to manage and rotate
  • Cost structure: Fixed monthly cost regardless of usage
  • Integration complexity: Separate service from Bedrock Knowledge Base

AWS announced S3 Vectors, a purpose-built vector storage service that integrates natively with Bedrock Knowledge Bases.

Decision

Migrate from Pinecone to AWS S3 Vectors for vector storage.

Key benefits of S3 Vectors:

  • Native Bedrock Knowledge Base integration
  • Fully managed within AWS ecosystem
  • Pay-per-use pricing model
  • No external API keys required
  • Automatic scaling and high availability

Consequences

Positive:

  • Simplified architecture: Single AWS ecosystem, no external dependencies
  • Native integration: Direct integration with Bedrock Knowledge Base
  • Security: No external API keys, uses IAM for access control
  • Cost model: Pay only for storage and queries used
  • Compliance: Data stays within AWS, simplifies compliance

Negative:

  • Migration effort: Required re-ingestion of all documents
  • Feature differences: S3 Vectors has different metadata limits (1KB with Bedrock KB)
  • Newer service: Less battle-tested than Pinecone

Cost Impact: Variable based on usage (vs $70/month fixed with Pinecone)

Migration Steps

  1. Created S3 Vectors bucket and index (1024-dim, cosine similarity)
  2. Updated Terraform to use aws_bedrockagent_knowledge_base with S3 Vectors storage
  3. Configured IAM policies for Bedrock to access S3 Vectors
  4. Re-ingested all documents via Bedrock Knowledge Base sync
  5. Updated Lambda handlers to use Bedrock Knowledge Base Retrieve API
  6. Removed Pinecone provider and related Terraform resources
  7. Deleted Pinecone API key from Secrets Manager

Configuration Details

# S3 Vectors storage configuration
storage_configuration {
  type = "S3_VECTORS"
  s3_vectors_configuration {
    index_arn = var.s3_vectors_index_arn
  }
}

# Fixed-size chunking (512 tokens, 20% overlap)
chunking_configuration {
  chunking_strategy = "FIXED_SIZE"
  fixed_size_chunking_configuration {
    max_tokens         = 512
    overlap_percentage = 20
  }
}

Summary

ADR Decision Status Impact
001 Serverless on Lambda Accepted ~$140/month, no ops
002 Pinecone for vectors Superseded Replaced by S3 Vectors
003 Claude Sonnet 4.5 Accepted $11/month per 1K queries
004 Cognito + Google OAuth Accepted Free, managed auth
005 HTTP API (not REST) Accepted 70% cost savings
006 Migrate OpenSearch → Pinecone Superseded Replaced by S3 Vectors
007 Terraform for IaC Accepted Repeatable deployments
008 GitHub Actions for CI/CD Accepted Free, secure OIDC
009 Python 3.13 runtime Accepted Latest features, good perf
010 Migrate Pinecone → S3 Vectors Accepted Native AWS, pay-per-use

Last updated: 2025-12-29