Architecture Decision Records (ADRs)
Documentation of significant architectural decisions made during the development of the NorthBuilt RAG System.
ADR Format
Each ADR follows this structure:
- Status: Accepted / Superseded / Deprecated
- Date: Decision date
- Context: Problem and constraints
- Decision: What was decided
- Consequences: Impact of the decision
- Alternatives Considered: Other options evaluated
ADR-001: Serverless Architecture on AWS Lambda
Status: Accepted | Date: 2025-10-01
Context
Need to build a RAG system with:
- Variable workload (unpredictable query patterns)
- Minimal operational overhead
- Cost-effective at low scale
- Fast time to market
Constraints:
- Small team (no dedicated DevOps)
- Budget-conscious
- Need to be production-ready quickly
Decision
Use 100% serverless architecture on AWS:
- Lambda for compute
- API Gateway for HTTP endpoints
- S3 for storage
- DynamoDB for data
- Managed AI services (Bedrock)
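For illustration, a minimal query handler on this stack might look like the sketch below. The handler and request/response field names are assumptions, not the project's actual code; API Gateway HTTP API invokes the Lambda with a JSON body and expects a JSON response.

```python
"""Minimal sketch of a query Lambda behind API Gateway HTTP API.

All names (handler, request/response fields) are illustrative assumptions,
not the project's actual code.
"""
import json


def handler(event, context):
    # HTTP API (payload format 2.0) delivers the request body as a JSON string.
    body = json.loads(event.get("body") or "{}")
    question = body.get("question", "")

    # Retrieval and generation would happen here (see ADR-002/003/010).
    answer = f"Received question: {question}"  # placeholder response

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"answer": answer}),
    }
```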
Consequences
Positive:
- No server management or patching
- Auto-scaling built-in
- Pay only for usage
- High availability by default
- Fast deployment cycle
Negative:
- Cold start latency (~1s)
- Lambda timeout limits (15 min max)
- Vendor lock-in to AWS
- Debugging more complex than traditional servers
Cost Impact: ~$140/month at 1K queries/month
Alternatives Considered
- Container-based (ECS/EKS)
- Pros: More control, no cold starts, can run any code
- Cons: Higher baseline cost ($50/month min), requires container expertise, more operational overhead
- Rejected: Overkill for current scale
- EC2 Instances
- Pros: Full control, no timeouts, familiar
- Cons: Fixed cost ($30-100/month), manual scaling, patching required
- Rejected: Too much operational burden
- Managed Platform (Heroku, Render)
- Pros: Simple deployment, less AWS-specific
- Cons: Higher cost ($25-50/month), less control, still need to manage containers
- Rejected: More expensive, less flexibility
ADR-002: Pinecone for Vector Storage
Status: Superseded by ADR-010 | Date: 2025-11-01
Context
Initial implementation used OpenSearch Serverless for vector storage, but:
- High cost: ~$700/month minimum
- Slower queries: 200ms+ latency
- Complex networking (VPC, security groups)
- Frequent Terraform drift issues
Need a vector database that is:
- Cost-effective at small scale
- Fast (<50ms query latency)
- Simple to operate
- Reliable
Decision
Migrate to Pinecone managed vector database.
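For reference, the query path in the Query Lambda looked roughly like the sketch below. The index name and metadata fields are illustrative, and `query_embedding` is assumed to be a 1024-dimension vector produced upstream.

```python
# Hypothetical sketch of the Pinecone query path; index name, metadata fields,
# and the source of `query_embedding` are assumptions.
import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("northbuilt-docs")  # illustrative index name


def search(query_embedding: list[float], top_k: int = 5) -> list[dict]:
    """Return the top_k most similar chunks for an embedded query."""
    result = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)
    return [
        {"id": m.id, "score": m.score, "source": (m.metadata or {}).get("source")}
        for m in result.matches
    ]
```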
Consequences
Positive:
- Cost: $70/month (90% savings vs OpenSearch)
- Performance: <25ms query latency (8× faster)
- Simplicity: No VPC networking, no drift
- Reliability: 99.9% SLA, managed service
Negative:
- External dependency (not AWS)
- Additional API key to manage
- Data egress from AWS (minimal cost)
- Migration effort required
Cost Impact: $70/month (fixed) + $0 per query
Alternatives Considered
- OpenSearch Serverless (original)
- Pros: Fully AWS-native, powerful query language
- Cons: $700/month minimum, slow, complex
- Rejected: Too expensive
- Self-hosted Qdrant/Weaviate on EC2
- Pros: Full control, cheaper than OpenSearch (~$40/month)
- Cons: Operational burden, manual scaling, backups, patching
- Rejected: Too much maintenance
- DynamoDB + FAISS
- Pros: Fully AWS-native, very cheap
- Cons: Complex to implement, slower than specialized vector DB
- Rejected: Development time not worth savings
- Elasticsearch (self-managed)
- Pros: Mature, powerful, familiar
- Cons: Expensive to run (need 3 nodes), complex operations
- Rejected: Operational overhead
ADR-003: Claude Sonnet 4.5 for Response Generation
Status: Accepted | Date: 2025-10-15
Context
Need LLM for generating responses from retrieved context. Requirements:
- High quality responses
- Fast inference (<3s)
- Accurate citation handling
- Cost-effective
Decision
Use Claude Sonnet 4.5 via AWS Bedrock.
Pricing: $3/M input tokens, $15/M output tokens
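Invoking the model from the Query Lambda can go through Bedrock's Converse API, roughly as sketched below. The model ID and prompt wiring are assumptions; check the Bedrock console for the exact identifier.

```python
# Hedged sketch of calling Claude Sonnet 4.5 via Bedrock's Converse API.
# MODEL_ID is an assumed identifier; use the exact one from the Bedrock console.
import boto3

MODEL_ID = "anthropic.claude-sonnet-4-5-20250929-v1:0"  # assumption
bedrock = boto3.client("bedrock-runtime")


def generate_answer(question: str, context_chunks: list[str]) -> str:
    """Build a context-grounded prompt and return the model's text response."""
    prompt = "Context:\n" + "\n\n".join(context_chunks) + f"\n\nQuestion: {question}"
    response = bedrock.converse(
        modelId=MODEL_ID,
        system=[{"text": "Answer only from the provided context and cite your sources."}],
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]
```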
Consequences
Positive:
- Excellent response quality
- Strong instruction following (citations, formatting)
- Fast inference (~2s average)
- AWS Bedrock integration (no separate API)
- No model hosting required
Negative:
- More expensive than smaller models
- Token limits (200K context window, but we use ~2K)
- Vendor lock-in (Anthropic model)
Cost Impact: ~$11/month for 1000 queries
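A back-of-envelope check of that figure, assuming roughly 2K input and ~300 output tokens per query (the per-query token counts are assumptions; the prices are from above):

```python
# Rough monthly cost estimate for 1,000 queries; token counts per query are assumptions.
queries = 1_000
input_cost = queries * 2_000 / 1_000_000 * 3.00   # ~2K input tokens/query  -> $6.00
output_cost = queries * 300 / 1_000_000 * 15.00   # ~300 output tokens/query -> $4.50
print(f"${input_cost + output_cost:.2f}/month")   # ~$10.50, in line with ~$11/month
```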
Alternatives Considered
- Claude Haiku (cheaper)
- Pros: 12× cheaper ($0.25/M input, $1.25/M output tokens)
- Cons: Lower quality responses, less nuanced
- Future: May use for simple queries
- GPT-4 via OpenAI
- Pros: Comparable quality, more familiar to some
- Cons: More expensive ($30/M input, $60/M output tokens), separate API to manage
- Rejected: More expensive, one more service
- Self-hosted Llama 3
- Pros: Free inference (after setup)
- Cons: GPU required ($730/month for g5.2xlarge), complex deployment, lower quality
- Rejected: Not cost-effective until >70K queries/month
- Claude Opus (higher quality)
- Pros: Highest quality responses
- Cons: 5× more expensive
- Rejected: Quality difference not worth cost for most queries
ADR-004: Cognito + Google OAuth for Authentication
Status: Accepted | Date: 2025-10-10
Context
Need user authentication for web UI. Requirements:
- Secure (industry standard)
- Low maintenance
- Familiar UX (social login)
- Cost-effective
Decision
Use AWS Cognito with Google OAuth federation.
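Downstream code can verify Cognito-issued JWTs against the user pool's JWKS endpoint; a minimal sketch with PyJWT is shown below. The region, pool ID, and app client ID are placeholders, and this assumes an ID token (which carries an `aud` claim).

```python
# Hedged sketch: verify a Cognito ID token against the user pool's JWKS.
# Region, pool ID, and client ID are placeholders.
import jwt  # PyJWT
from jwt import PyJWKClient

REGION = "us-east-1"
USER_POOL_ID = "us-east-1_EXAMPLE"
APP_CLIENT_ID = "example-client-id"

JWKS_URL = f"https://cognito-idp.{REGION}.amazonaws.com/{USER_POOL_ID}/.well-known/jwks.json"
jwks_client = PyJWKClient(JWKS_URL)


def verify_id_token(token: str) -> dict:
    """Return the verified claims, or raise jwt.InvalidTokenError."""
    signing_key = jwks_client.get_signing_key_from_jwt(token)
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience=APP_CLIENT_ID,  # ID tokens carry the app client ID in `aud`
    )
```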
Consequences
Positive:
- Free for first 50K monthly active users
- Fully managed (no password storage, MFA, etc.)
- Standard OAuth 2.0 flow
- JWT tokens for API authorization
- No additional auth service needed
Negative:
- AWS lock-in
- Limited customization of UI
- Redirect-based flow (not SPA-native)
- Google API setup required
Cost Impact: $0/month (under 50K MAU)
Alternatives Considered
- Auth0
- Pros: Better UX, more identity providers, more features
- Cons: $35/month minimum, external service
- Rejected: Unnecessary cost
- Firebase Authentication
- Pros: Good Google integration, free tier
- Cons: Ties to Google Cloud, harder to integrate with AWS
- Rejected: Prefer AWS-native
- Custom JWT implementation
- Pros: Full control, no cost
- Cons: Security risk, maintenance burden, password management
- Rejected: Not worth security risk
- No authentication
- Pros: Simplest
- Cons: No user tracking, no access control
- Rejected: Need to track usage per user
ADR-005: HTTP API (not REST API) for API Gateway
Status: Accepted | Date: 2025-10-12
Context
API Gateway offers two options:
- REST API: Full-featured, more expensive
- HTTP API: Simpler, cheaper, faster
Decision
Use HTTP API for lower cost and better performance.
Consequences
Positive:
- Cost: 70% cheaper ($1/M vs $3.50/M requests)
- Latency: ~10ms lower latency
- Simpler: Fewer features to configure
Negative:
- No resource policies or usage plans
- No API key authentication (use JWT instead)
- Limited request validation
- No caching (need external cache)
Cost Impact: $1/month vs $3.50/month at 1M requests
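Because the HTTP API drops API keys in favor of a JWT authorizer (noted under Consequences above), the Lambda reads the caller's identity from the authorizer claims in the event. A minimal sketch, assuming the default 2.0 payload format and Cognito claim names:

```python
# Sketch of reading verified JWT claims inside a Lambda behind an HTTP API
# JWT authorizer (payload format 2.0). Claim names follow Cognito defaults.
def handler(event, context):
    claims = event["requestContext"]["authorizer"]["jwt"]["claims"]
    user_id = claims.get("sub")      # stable Cognito user identifier
    email = claims.get("email")      # present on ID tokens
    # use user_id for per-user usage tracking and access control
    return {"statusCode": 200, "body": f"hello {email or user_id}"}
```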
Alternatives Considered
- REST API
- Pros: More features (caching, usage plans, resource policies)
- Cons: More expensive, slower
- Rejected: Don’t need extra features
- Application Load Balancer
- Pros: Lower cost at high scale (>10M requests/month)
- Cons: Fixed cost (~$20/month), need to run targets (Lambda or containers)
- Rejected: Current scale doesn’t justify
ADR-006: Migrate from OpenSearch to Pinecone
Status: Superseded by ADR-010 | Date: 2025-11-01
Context
OpenSearch Serverless had multiple issues:
- High cost: $700/month minimum (OCU pricing)
- Performance: 200ms+ query latency (p95)
- Complexity: VPC, security groups, collection policies
- Terraform drift: Constant drift with policies
- Overkill: Full-text search features unused
Decision
Migrate to Pinecone as the primary vector store.
Migration approach:
- Create Pinecone index
- Re-ingest all documents
- Update Query Lambda to use Pinecone
- Destroy OpenSearch collection
- Remove VPC resources
Consequences
Positive:
- Cost savings: $630/month (90% reduction)
- Performance: <25ms latency (8× faster)
- Simpler architecture: Removed the VPC and 50+ Terraform resources
- No drift: Pinecone doesn’t use complex IAM policies
- Better docs: Pinecone docs > AWS OpenSearch docs
Negative:
- External dependency: Data stored outside AWS
- Migration downtime: 2 hours to re-ingest
- Lost features: No full-text search (only vector similarity)
Cost Impact: $70/month fixed (was $700/month)
Migration Steps
- Created Pinecone index (1024-dim, cosine similarity)
- Wrote a migration script to fetch from OpenSearch and upsert to Pinecone (sketched below)
- Updated Lambda environment variables (API key, index name)
- Tested query performance (validated <25ms latency)
- Destroyed OpenSearch resources via Terraform
- Removed VPC, subnets, security groups, collection policies
Rollback plan: Keep the OpenSearch collection for 7 days before destroying it so we can roll back if issues arise.
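The migration script from step 2 above amounted to paging vectors out of OpenSearch and upserting them into Pinecone in batches; a hedged sketch is below. The collection endpoint, index names, field names, and secret name are assumptions, and a real run would paginate (e.g. with `search_after`) rather than fetch a single page.

```python
# Hypothetical one-off migration sketch: OpenSearch Serverless -> Pinecone.
# Collection endpoint, index names, field names, and secret name are assumptions.
import boto3
from opensearchpy import AWSV4SignerAuth, OpenSearch, RequestsHttpConnection
from pinecone import Pinecone

REGION = "us-east-1"
OS_HOST = "example.us-east-1.aoss.amazonaws.com"  # placeholder collection endpoint

auth = AWSV4SignerAuth(boto3.Session().get_credentials(), REGION, "aoss")
os_client = OpenSearch(
    hosts=[{"host": OS_HOST, "port": 443}],
    http_auth=auth,
    use_ssl=True,
    connection_class=RequestsHttpConnection,
)

secret = boto3.client("secretsmanager").get_secret_value(SecretId="pinecone-api-key")
pc = Pinecone(api_key=secret["SecretString"])
pinecone_index = pc.Index("northbuilt-docs")  # illustrative index name

# Fetch one page of documents and upsert it; a real run would loop with
# search_after until the whole index is copied.
page = os_client.search(index="documents",
                        body={"size": 100, "query": {"match_all": {}}})
vectors = [
    {
        "id": hit["_id"],
        "values": hit["_source"]["embedding"],             # assumed vector field
        "metadata": {"source": hit["_source"].get("source", "")},
    }
    for hit in page["hits"]["hits"]
]
if vectors:
    pinecone_index.upsert(vectors=vectors)
```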
ADR-007: Terraform for Infrastructure as Code
Status: Accepted | Date: 2025-10-01
Context
Need to manage AWS infrastructure. Requirements:
- Repeatable deployments
- Version control for infrastructure
- Multiple environments (dev, prod)
- Team collaboration
Decision
Use Terraform for infrastructure as code.
Consequences
Positive:
- Declarative: Describe desired state, Terraform handles changes
- State management: S3 + DynamoDB for locking
- Modules: Reusable components
- Plan before apply: Review changes before executing
- Industry standard: Well-documented, large community
Negative:
- State management: Need to protect state file
- Learning curve: HCL syntax, Terraform concepts
- Drift: Manual changes cause drift (must avoid)
Cost Impact: $0.10/month (S3 state storage)
Alternatives Considered
- AWS CloudFormation
- Pros: AWS-native, no state file, free
- Cons: YAML/JSON verbose, slower, AWS-only
- Rejected: Prefer Terraform’s cleaner syntax
- AWS CDK
- Pros: Real programming language (Python, TypeScript)
- Cons: Less mature, generates CloudFormation (slower), more complex
- Rejected: Overkill for our needs
- Pulumi
- Pros: Real programming language, good UX
- Cons: Less mature, smaller community, managed state
- Rejected: Prefer Terraform’s larger ecosystem
- Manual (ClickOps)
- Pros: Fastest initially, familiar
- Cons: Not repeatable, no version control, error-prone
- Rejected: Not sustainable
ADR-008: GitHub Actions for CI/CD
Status: Accepted | Date: 2025-10-05
Context
Need CI/CD pipeline for deploying infrastructure and code. Requirements:
- Automated deployments on push
- Secure (no long-lived credentials)
- Easy to configure
- Free or cheap
Decision
Use GitHub Actions with OIDC authentication to AWS.
Consequences
Positive:
- Free: Unlimited minutes for public repos, 2000 min/month for private
- Integrated: Lives with code in `.github/workflows`
- Secure: OIDC eliminates long-lived AWS keys
- Flexible: Can run any bash command, install any tool
Negative:
- GitHub lock-in: Workflow syntax specific to GitHub Actions
- Limited debugging: Can’t SSH into runners
- Cold start: Runners start fresh each time (must install tools)
Cost Impact: $0/month (free tier)
Alternatives Considered
- AWS CodePipeline
- Pros: AWS-native, integrates with CodeBuild
- Cons: $1/pipeline/month, more complex setup
- Rejected: GitHub Actions simpler and free
- GitLab CI
- Pros: Similar to GitHub Actions, good UX
- Cons: Need to migrate repo, learning curve
- Rejected: Already using GitHub
- Jenkins
- Pros: Full control, highly customizable
- Cons: Need to host ($30/month EC2), complex setup, maintenance
- Rejected: Too much overhead
- CircleCI / Travis CI
- Pros: Good UX, mature
- Cons: Cost ($30+/month for private), not as integrated
- Rejected: GitHub Actions more convenient
ADR-009: Python 3.13 for Lambda Runtime
Status: Accepted | Date: 2025-10-15
Context
Need to choose a Lambda runtime. Python is preferred for:
- Team expertise
- AWS SDK (boto3) built-in
- AI/ML libraries (LangChain, etc.)
Decision
Use Python 3.13 runtime (latest available).
Consequences
Positive:
- Performance: Faster than Python 3.11/3.12
- Features: Latest Python features
- Support: Will be supported for ~5 years
- Libraries: All major libraries compatible
Negative:
- Newer runtime: Less battle-tested (3.12 more stable)
- Dependencies: Some libraries may lag
Cost Impact: None
Alternatives Considered
- Python 3.12
- Pros: More stable, better tested
- Cons: Slightly slower, older features
- May switch if stability issues arise
- Node.js
- Pros: Faster cold starts, async-native
- Cons: Different language, less AI/ML ecosystem
- Rejected: Team less familiar
- Go
- Pros: Fastest cold starts, compiled
- Cons: Verbose, harder to write, less AWS library support
- Rejected: Development speed more important
- Java
- Pros: Enterprise-grade, good AWS support
- Cons: Slow cold starts (5s+), verbose
- Rejected: Cold starts unacceptable
ADR-010: Migrate from Pinecone to S3 Vectors
Status: Accepted | Date: 2025-12-15
Context
Pinecone worked well but had limitations for our use case:
- External dependency: Data stored outside AWS ecosystem
- API key management: Additional secret to manage and rotate
- Cost structure: Fixed monthly cost regardless of usage
- Integration complexity: Separate service from Bedrock Knowledge Base
AWS announced S3 Vectors, a purpose-built vector storage service that integrates natively with Bedrock Knowledge Bases.
Decision
Migrate from Pinecone to AWS S3 Vectors for vector storage.
Key benefits of S3 Vectors:
- Native Bedrock Knowledge Base integration
- Fully managed within AWS ecosystem
- Pay-per-use pricing model
- No external API keys required
- Automatic scaling and high availability
Consequences
Positive:
- Simplified architecture: Single AWS ecosystem, no external dependencies
- Native integration: Direct integration with Bedrock Knowledge Base
- Security: No external API keys, uses IAM for access control
- Cost model: Pay only for storage and queries used
- Compliance: Data stays within AWS, simplifies compliance
Negative:
- Migration effort: Required re-ingestion of all documents
- Feature differences: S3 Vectors has different metadata limits (1KB with Bedrock KB)
- Newer service: Less battle-tested than Pinecone
Cost Impact: Variable based on usage (vs $70/month fixed with Pinecone)
Migration Steps
- Created S3 Vectors bucket and index (1024-dim, cosine similarity)
- Updated Terraform to use `aws_bedrockagent_knowledge_base` with S3 Vectors storage
- Configured IAM policies for Bedrock to access S3 Vectors
- Re-ingested all documents via Bedrock Knowledge Base sync
- Updated Lambda handlers to use the Bedrock Knowledge Base Retrieve API (see the retrieval sketch below)
- Removed Pinecone provider and related Terraform resources
- Deleted Pinecone API key from Secrets Manager
Configuration Details
```hcl
# S3 Vectors storage configuration
storage_configuration {
  type = "S3_VECTORS"

  s3_vectors_configuration {
    index_arn = var.s3_vectors_index_arn
  }
}

# Fixed-size chunking (512 tokens, 20% overlap)
chunking_configuration {
  chunking_strategy = "FIXED_SIZE"

  fixed_size_chunking_configuration {
    max_tokens         = 512
    overlap_percentage = 20
  }
}
```
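With this configuration in place, the Query Lambda retrieves chunks through the `bedrock-agent-runtime` Retrieve API instead of calling a vector store directly. A minimal sketch follows; the Knowledge Base ID is a placeholder and would normally come from Terraform outputs via a Lambda environment variable.

```python
# Sketch of retrieving chunks from the Bedrock Knowledge Base backed by S3 Vectors.
# KB_ID is a placeholder; in practice it is injected via a Lambda environment variable.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")
KB_ID = "EXAMPLEKBID"


def retrieve_chunks(question: str, top_k: int = 5) -> list[dict]:
    """Return the top_k chunks (text, score, source location) for a question."""
    resp = agent_runtime.retrieve(
        knowledgeBaseId=KB_ID,
        retrievalQuery={"text": question},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": top_k}},
    )
    return [
        {
            "text": r["content"]["text"],
            "score": r.get("score"),
            "location": r.get("location", {}),
        }
        for r in resp["retrievalResults"]
    ]
```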
Summary
| ADR | Decision | Status | Impact |
|---|---|---|---|
| 001 | Serverless on Lambda | Accepted | ~$140/month, no ops |
| 002 | Pinecone for vectors | Superseded | Replaced by S3 Vectors |
| 003 | Claude Sonnet 4.5 | Accepted | $11/month per 1K queries |
| 004 | Cognito + Google OAuth | Accepted | Free, managed auth |
| 005 | HTTP API (not REST) | Accepted | 70% cost savings |
| 006 | Migrate OpenSearch → Pinecone | Superseded | Replaced by S3 Vectors |
| 007 | Terraform for IaC | Accepted | Repeatable deployments |
| 008 | GitHub Actions for CI/CD | Accepted | Free, secure OIDC |
| 009 | Python 3.13 runtime | Accepted | Latest features, good perf |
| 010 | Migrate Pinecone → S3 Vectors | Accepted | Native AWS, pay-per-use |
Last updated: 2025-12-29