RAG System Improvement Backlog
This document tracks pending improvements, optimizations, and technical debt for the NorthBuilt RAG system. Items are categorized by priority and component.
For completed improvements, see the RAG Changelog.
Key AWS Documentation:
Table of Contents
- Pending Improvements
- Bedrock Knowledge Base Configuration Constraints
- Cannot Change After Knowledge Base Creation
- Cannot Change After Data Source Creation
- Can Be Updated
- 8. Enable CloudTrail Data Event Logging for S3 Vectors
- 9. Configure VPC Endpoint for S3 Vectors (PrivateLink)
- 10. Consider Per-Client Vector Indexes for Strong Isolation
- 11. Export to OpenSearch for Hybrid Search (Alternative to Migration)
- 12. Include Metadata in Embeddings for Improved Semantic Search
- 13. Binary Embeddings for Cost Optimization
- 14. Implicit Filter Configuration
- 15. Guardrails Integration for Content Filtering
- 16. Custom Transformation Lambda for Meeting Transcripts
- S3 Vectors Specific Limitations
- Architecture Decision Record References
- Testing and CI/CD Improvements
- Review Schedule
Pending Improvements
MEDIUM Priority
These improvements should be addressed in the next development cycle.
1. Response Caching
Component: Chat Lambda
Files: lambda/node/chat/index.js, new cache module
Current State: Every query hits the Knowledge Base and LLM, even for frequently asked questions. This increases latency and cost.
Recommended Changes:
- Implement semantic caching using vector similarity
- Cache recent query-response pairs in DynamoDB or ElastiCache
- Return cached response if similarity > threshold (e.g., 0.95)
# Pseudocode for semantic caching
def get_cached_response(query_embedding, threshold=0.95):
    """Check if a similar query was recently answered."""
    # Query the cache index with the embedding
    # If similarity > threshold, return the cached response
    # Otherwise, proceed with the full RAG pipeline
    pass
Trade-offs:
- Cache hit = faster response, lower cost
- Cache miss = slight overhead from cache lookup
- Risk of stale responses for rapidly changing knowledge
Effort: Medium (3-5 days) Impact: Reduced latency and cost for common queries
2. Enhanced Error Handling and Observability (Partially Implemented)
Component: All Lambdas Files: Multiple
Current State: CloudWatch metrics are implemented, but structured logging with correlation IDs and dashboards are not.
Already Implemented:
- RAGMetrics class in lambda/shared/utils/metrics.py
- Retrieval latency metrics (RetrievalLatencyMs)
- Document count metrics (CandidatesRetrieved, ResultsAfterFilter)
- Filter effectiveness metrics (FilterEffectiveness)
- Error counts by type
Still Pending:
- Structured JSON logging with correlation IDs
- LLM token usage metrics
- CloudWatch dashboard for RAG metrics
# Example structured logging (NOT YET IMPLEMENTED)
logger.info({
    'event': 'rag_query',
    'correlation_id': correlation_id,
    'query_length': len(query),
    'documents_retrieved': len(documents),
    'filters': {'client': client},
    'latency_ms': retrieval_latency
})
Effort: Low-Medium (1-2 days for remaining items) Impact: Improved debugging and monitoring
LOW Priority
3. Multi-Region Disaster Recovery
Component: All infrastructure Files: Multiple Terraform files
Current State: Single-region deployment. If us-east-1 has an outage, the system is unavailable.
Recommended Architecture:
- S3 cross-region replication (already configured for backup)
- DynamoDB global tables for chat sessions
- Lambda deployment in secondary region
- Route 53 health checks and failover routing
- S3 Vectors replication (when supported)
Note: S3 Vectors is a relatively new service. Check AWS documentation for multi-region capabilities and cross-region replication support.
Effort: Very High (weeks of work) Impact: High availability for production workloads
4. Cost Optimization: Right-Size LLM Usage
Component: Chat Lambda
Files: lambda/node/chat/index.js, Terraform
Current State: Claude Sonnet 4.5 is used for all queries regardless of complexity.
Recommended Changes:
- Implement query complexity classification
- Use lighter models (Haiku) for simple queries
- Use Sonnet for complex queries requiring reasoning
function selectModel(query, documents) {
  // Simple heuristics:
  // - Short query + few documents = Haiku
  // - Long query or many documents = Sonnet
  if (query.length < 100 && documents.length <= 2) {
    return process.env.BEDROCK_LIGHT_MODEL || 'anthropic.claude-3-haiku-...';
  }
  return process.env.BEDROCK_LLM_MODEL;
}
Trade-offs:
- Cost savings: Haiku is ~10x cheaper than Sonnet
- Quality: Simple queries may get adequate answers from lighter models
- Complexity: Need to define and tune classification criteria
Effort: Medium (3-4 days) Impact: 30-50% cost reduction on LLM usage
5. Hybrid Search Support
Component: Knowledge Base configuration Files: Terraform, Chat Lambda
Current State: S3 Vectors supports semantic (vector) search only. Hybrid search (combining keyword and semantic) is not available with S3 Vectors.
Consideration: If hybrid search is required in the future:
- Migrate to OpenSearch Serverless (supports hybrid search)
- Or implement application-level keyword filtering after vector retrieval
- Or use Bedrock’s built-in hybrid search with compatible vector stores
Note: This is a limitation of S3 Vectors, not a bug. Evaluate if hybrid search is necessary for your use case before considering migration.
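The second option — application-level keyword filtering — can be prototyped without touching the vector store: after vector retrieval, boost chunks that contain the query's exact terms. A minimal sketch (pure Python; the chunk dicts below are a simplified stand-in for the Bedrock Retrieve response shape, not the real API objects):

```python
def keyword_rerank(query, chunks, boost=0.1):
    """Re-rank vector-retrieved chunks by exact keyword overlap.

    chunks: list of dicts with 'text' and 'score' keys (simplified
    version of a Retrieve result). Each query term (longer than 3
    characters) found verbatim in a chunk adds `boost` to its score.
    """
    terms = [t.lower() for t in query.split() if len(t) > 3]
    reranked = []
    for chunk in chunks:
        text = chunk["text"].lower()
        overlap = sum(1 for t in terms if t in text)
        reranked.append({**chunk, "score": chunk["score"] + boost * overlap})
    return sorted(reranked, key=lambda c: c["score"], reverse=True)
```

This keeps S3 Vectors unchanged and costs only one extra pass over the small result set per query, at the price of crude keyword matching compared to true hybrid (BM25 + vector) search.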
Effort: Very High (vector store migration) Impact: Improved retrieval for keyword-heavy queries
6. Implement Direct Ingestion for Real-Time Updates
Component: Sync Lambdas (Fathom, HelpScout) Documentation:
Current State: Documents are uploaded to S3, then a scheduled ingestion job syncs them to the Knowledge Base. This creates a delay between document creation and availability for queries.
Issue:
- Latency: Documents may take minutes to become searchable
- Scheduling: Relies on EventBridge scheduled sync every 5 minutes
Recommended Change:
Use IngestKnowledgeBaseDocuments API for immediate ingestion:
response = bedrock_agent.ingest_knowledge_base_documents(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    documents=[
        {
            'content': {
                's3': {
                    's3Location': {
                        'uri': f's3://{bucket}/{key}'
                    }
                },
                'dataSourceType': 'S3'
            },
            'metadata': {
                's3Location': {
                    'uri': f's3://{bucket}/{key}.metadata.json'
                },
                'type': 'S3_LOCATION'
            }
        }
    ]
)
Benefits:
- Immediate document availability (seconds vs minutes)
- Up to 25 documents per API call
- Can be called directly from webhook handlers
Caveats:
- Documents ingested directly are NOT added to S3 (add to S3 separately to prevent removal on next full sync)
- Do NOT call simultaneously with StartIngestionJob
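Because each call accepts at most 25 documents, webhook handlers that flush several documents at once need a batching wrapper. A sketch (assumes `bedrock_agent` is a boto3 bedrock-agent client, and that document dicts are built as in the example above):

```python
def ingest_in_batches(bedrock_agent, knowledge_base_id, data_source_id,
                      documents, batch_size=25):
    """Call IngestKnowledgeBaseDocuments once per batch of <= 25 documents.

    `bedrock_agent` can be a boto3 bedrock-agent client or any object
    exposing the same method; `documents` entries are dicts shaped as in
    the example above.
    """
    responses = []
    for i in range(0, len(documents), batch_size):
        responses.append(bedrock_agent.ingest_knowledge_base_documents(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            documents=documents[i:i + batch_size],
        ))
    return responses
```

Per the caveat above, each document should also be uploaded to S3 so the next full sync does not remove it.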
Effort: Medium (update webhook handlers) Impact: Near real-time document availability
7. Consider Semantic Chunking for Conversational Content
Component: Bedrock Data Source Documentation:
Current State: Using FIXED_SIZE chunking with 512 tokens and 20% overlap.
Consideration: For conversational content (meeting transcripts, support conversations), SEMANTIC chunking may provide better results:
How Semantic Chunking Works:
- Uses NLP to identify meaning boundaries
- Chunks based on semantic content rather than token count
- Parameters:
  - maxTokens: Maximum tokens per chunk
  - bufferSize: Surrounding sentences for context (e.g., 1 = previous + current + next)
  - breakpointPercentileThreshold: Dissimilarity threshold for splits
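For reference, these parameters live under the data source's vector ingestion configuration. A sketch of the API shape as a plain config dict (the values are illustrative starting points, not tuned recommendations):

```python
def semantic_chunking_config(max_tokens=300, buffer_size=1,
                             breakpoint_percentile=95):
    """Build the chunkingConfiguration block for a SEMANTIC data source.

    Shape follows the Bedrock CreateDataSource API
    (vectorIngestionConfiguration.chunkingConfiguration); the default
    values here are illustrative, not recommendations.
    """
    return {
        "chunkingStrategy": "SEMANTIC",
        "semanticChunkingConfiguration": {
            "maxTokens": max_tokens,
            "bufferSize": buffer_size,
            "breakpointPercentileThreshold": breakpoint_percentile,
        },
    }
```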
Trade-offs:
| Factor | FIXED_SIZE | SEMANTIC |
|---|---|---|
| Predictability | High | Variable |
| Cost | No extra cost | Foundation model costs |
| Conversation preservation | May split mid-conversation | Better at finding natural breaks |
| Metadata overhead | Predictable | Variable (may affect S3 Vectors limits) |
Recommendation: Evaluate with a subset of documents before changing. The current FIXED_SIZE is safe for S3 Vectors, while SEMANTIC adds cost and unpredictable chunk sizes.
IMPORTANT: Chunking strategy cannot be changed after data source creation. Would require recreating the data source.
Effort: High (requires data source recreation and re-ingestion) Impact: Potentially improved retrieval for conversational content
Bedrock Knowledge Base Configuration Constraints
Based on AWS documentation, these configurations are IMMUTABLE after creation:
Cannot Change After Knowledge Base Creation
| Configuration | Location | Impact |
|---|---|---|
| Vector store type | storage_configuration.type | Must recreate entire KB |
| Embedding model | embedding_model_arn | Must recreate entire KB |
| Embedding dimensions | embedding_model_configuration.dimensions | Must recreate entire KB |
| Supplemental data storage | supplementalDataStorageConfiguration | Cannot add multimodal support later |
Cannot Change After Data Source Creation
| Configuration | Location | Impact |
|---|---|---|
| Chunking strategy | chunking_configuration.chunking_strategy | Must recreate data source |
| Chunking parameters | max_tokens, overlap_percentage | Must recreate data source |
| Parsing strategy | parsing_configuration.parsing_strategy | Must recreate data source |
| Parsing model | bedrock_foundation_model_configuration.model_arn | Must recreate data source |
Can Be Updated
| Configuration | How |
|---|---|
| Knowledge base name/description | UpdateKnowledgeBase API |
| Data source files | Add/modify files in S3, then sync |
| Data deletion policy | UpdateDataSource API |
| KMS encryption key | UpdateDataSource API |
| IAM role | UpdateDataSource API (with new role having proper permissions) |
Implication: Plan chunking and parsing strategies carefully before creating data sources. Changing them requires:
- Creating new data source with desired configuration
- Re-syncing all documents
- Deleting old data source
- Testing thoroughly before production use
8. Enable CloudTrail Data Event Logging for S3 Vectors
Component: CloudTrail Configuration Documentation:
Current State: CloudTrail management events are logged by default, but data events (QueryVectors, PutVectors, GetVectors, DeleteVectors, ListVectors) are NOT logged.
Issue:
- Cannot audit who queried what vectors
- No visibility into vector operation patterns
- Limited security and compliance posture
Recommended Change: Enable CloudTrail data event logging for S3 Vectors:
resource "aws_cloudtrail" "s3_vectors_data_events" {
  name           = "${var.resource_prefix}-s3vectors-trail"
  s3_bucket_name = aws_s3_bucket.cloudtrail_logs.id

  event_selector {
    read_write_type           = "All"
    include_management_events = false

    data_resource {
      type   = "AWS::S3Vectors::Index"
      values = [aws_s3vectors_index.main.arn]
    }
  }
}
Logged Operations:
- PutVectors - Vector insertions
- GetVectors - Vector retrievals
- DeleteVectors - Vector deletions
- ListVectors - Vector listings
- QueryVectors - Similarity queries
Effort: Low (Terraform configuration) Impact: Security and compliance visibility
9. Configure VPC Endpoint for S3 Vectors (PrivateLink)
Component: Network Security Documentation:
Current State: S3 Vectors accessed via public endpoints. Traffic traverses the internet.
Issue:
- For compliance-sensitive workloads, traffic should stay within AWS network
- Potential exposure to internet-based threats
- Some compliance frameworks require private connectivity
Recommended Change: Create VPC interface endpoint for S3 Vectors:
resource "aws_vpc_endpoint" "s3_vectors" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.s3vectors"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.s3_vectors_endpoint.id]
  private_dns_enabled = true

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = "*"
      Action    = ["s3vectors:*"]
      Resource = [
        aws_s3vectors_vector_bucket.main.vector_bucket_arn,
        "${aws_s3vectors_vector_bucket.main.vector_bucket_arn}/*"
      ]
    }]
  })
}
Benefits:
- Traffic stays within AWS network
- Enhanced security posture
- Meets compliance requirements for private connectivity
Effort: Medium (requires VPC configuration) Impact: Security and compliance
10. Consider Per-Client Vector Indexes for Strong Isolation
Component: S3 Vectors Architecture Documentation:
Current State: Single vector index with client-level metadata filtering for multi-tenant isolation:
filter={"client": "acme-corp"}
Note: Project metadata is stored for display but not used for filtering, allowing all documents from a client’s projects to contribute to RAG context.
Alternative Approach: AWS recommends using one vector index per tenant for:
- Stronger data isolation
- Per-tenant IAM policies
- Independent scaling per tenant
- Clearer cost attribution
Trade-offs:
| Approach | Current (Shared Index) | Per-Tenant Index |
|---|---|---|
| Isolation | Metadata filtering | Physical separation |
| IAM Control | Limited | Per-index policies |
| Scaling | Shared limits | Independent limits |
| Cost Tracking | Difficult | Per-index attribution |
| Complexity | Lower | Higher (index management) |
| Cross-tenant queries | Possible (with filter) | Requires multi-index query |
When to Consider:
- Regulatory requirements for physical data separation
- Need for per-client IAM policies
- Clients with vastly different usage patterns
- Strict cost attribution requirements
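If per-tenant indexes are adopted, any cross-tenant query has to fan out to each index and merge the results client-side. A merge sketch (pure Python; assumes each index returns (vector_key, distance) pairs with lower distance meaning more similar, as QueryVectors does — the fan-out calls themselves are omitted):

```python
import heapq

def merge_index_results(per_index_results, top_k=10):
    """Merge QueryVectors-style results from several per-tenant indexes.

    per_index_results: one result list per index, each a list of
    (vector_key, distance) tuples. Returns the global top_k matches,
    closest first.
    """
    streams = [sorted(r, key=lambda x: x[1]) for r in per_index_results]
    merged = heapq.merge(*streams, key=lambda x: x[1])
    return list(merged)[:top_k]
```

Note that distances are only directly comparable across indexes when all indexes share the same embedding model and distance metric, which holds if every tenant index is created with the current Titan v2 configuration.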
Effort: High (architecture change) Impact: Stronger tenant isolation
11. Export to OpenSearch for Hybrid Search (Alternative to Migration)
Component: Search Architecture Documentation:
Current State: S3 Vectors supports semantic search only. No hybrid (keyword + semantic) search.
Alternative to Full Migration: Instead of migrating away from S3 Vectors entirely, can export to OpenSearch Serverless for hybrid search while keeping S3 Vectors for cost-effective storage:
Two Integration Options:
- Export to OpenSearch Serverless:
- Point-in-time copy of vectors
- Full hybrid search, aggregations, faceted search
- Dual storage costs
- Best for: High-throughput, low-latency requirements
- OpenSearch with S3 Vectors Engine:
- S3 Vectors as storage backend for OpenSearch
- OpenSearch query API with S3 Vectors storage costs
- Best for: Lower throughput, cost-sensitive workloads
When to Consider:
- Users need keyword + semantic search
- Complex filtering or aggregations required
- Faceted search UI needed
Note: This is an alternative to full vector store migration, not a replacement.
Effort: Medium (OpenSearch configuration) Impact: Hybrid search capability without full migration
12. Include Metadata in Embeddings for Improved Semantic Search
Component: Document Ingestion (Sync Workers) Priority: LOW (marginal benefit) Documentation:
Current State:
The build_bedrock_metadata_json() function creates sidecar metadata files but doesn’t set the includeForEmbedding option:
# Current implementation (validation.py)
metadata_attributes[key] = {
    "value": {
        "type": "STRING",
        "stringValue": str_value
    }
}
Issue:
- Metadata is only used for filtering, not for semantic search
- In theory, queries mentioning client or project names might have slightly better semantic matches
Why Marginal Benefit:
The current query understanding system already extracts client names from queries and applies them as filters. For example, “What did Valley Equipment discuss?” is already processed to extract “Valley Equipment” and filter documents to that client. Adding includeForEmbedding would only provide a small additional signal in vector similarity scoring - the filtering already ensures only relevant client documents are searched.
Recommended Change:
Add includeForEmbedding: true for key metadata fields:
def build_bedrock_metadata_json(attributes: Dict[str, Any], embedding_fields: Optional[Set[str]] = None) -> str:
    """
    Build metadata with optional embedding inclusion.

    Args:
        attributes: Metadata key-value pairs
        embedding_fields: Set of field names to include in embeddings
            (default: {'client', 'project'})
    """
    embedding_fields = embedding_fields or {'client', 'project'}
    metadata_attributes = {}
    # filtered_attributes: attributes remaining after the existing
    # validation/filtering step in validation.py
    for key, value in filtered_attributes.items():
        include_in_embedding = key in embedding_fields
        if isinstance(value, str):
            metadata_attributes[key] = {
                "value": {"type": "STRING", "stringValue": value},
                "includeForEmbedding": include_in_embedding
            }
    return json.dumps({"metadataAttributes": metadata_attributes})
Expected Metadata Output:
{
  "metadataAttributes": {
    "client": {
      "value": {"type": "STRING", "stringValue": "Valley Equipment"},
      "includeForEmbedding": true
    },
    "project": {
      "value": {"type": "STRING", "stringValue": "Equipment Sales"},
      "includeForEmbedding": true
    },
    "source": {
      "value": {"type": "STRING", "stringValue": "fathom"},
      "includeForEmbedding": false
    }
  }
}
Benefits:
- Slight improvement in semantic similarity scoring for queries mentioning client/project names
- No additional API calls or infrastructure changes
- Backward compatible (existing documents continue to work)
Trade-offs:
- Slightly larger embedding vectors
- Requires re-ingestion of existing documents
- Benefit is marginal since query understanding already handles client extraction
Effort: Low (code change) + Medium (re-ingestion) Impact: Marginal improvement - existing query understanding and filtering handles the main use case
13. Binary Embeddings for Cost Optimization
Component: Knowledge Base Configuration Priority: MEDIUM Documentation:
Current State: Using FLOAT32 embeddings (default, highest precision):
# terraform/modules/bedrock/main.tf
embedding_model_configuration {
  bedrock_embedding_model_configuration {
    dimensions          = 1024
    embedding_data_type = "FLOAT32"  # Current setting
  }
}
Consideration: Titan Text Embeddings V2 supports BINARY embedding type for cost optimization:
embedding_model_configuration {
  bedrock_embedding_model_configuration {
    dimensions          = 1024
    embedding_data_type = "BINARY"  # 32x smaller storage
  }
}
Trade-offs:
| Factor | FLOAT32 | BINARY |
|---|---|---|
| Precision | Highest | Lower (~10% accuracy reduction) |
| Storage | 4 bytes/dimension | 1 bit/dimension |
| Cost | Higher | ~32x lower storage cost |
| Query speed | Baseline | Potentially faster |
When to Consider:
- Large document corpus (10,000+ documents)
- Cost-sensitive deployments
- Acceptable accuracy trade-off
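The ~32x figure follows directly from the data types: FLOAT32 stores 32 bits (4 bytes) per dimension, BINARY stores 1 bit. A quick sanity check for the current 1,024-dimension configuration:

```python
def embedding_storage_bytes(dimensions, data_type="FLOAT32"):
    """Raw embedding payload per vector (keys and metadata excluded)."""
    if data_type == "FLOAT32":
        return dimensions * 4   # 32 bits = 4 bytes per dimension
    if data_type == "BINARY":
        return dimensions // 8  # 1 bit per dimension
    raise ValueError(f"unknown embedding data type: {data_type}")

# 1,024 dimensions (Titan v2, current config):
# FLOAT32 -> 4,096 bytes/vector; BINARY -> 128 bytes/vector (32x smaller)
```

Note this covers only the embedding payload; per-vector keys and metadata are unchanged, so the total bill shrinks by somewhat less than 32x.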
IMPORTANT: Embedding data type cannot be changed after Knowledge Base creation. Would require recreating the entire KB.
Effort: Very High (requires KB recreation and re-ingestion) Impact: Significant cost reduction for large deployments
14. Implicit Filter Configuration
Component: Knowledge Base Configuration Priority: LOW Documentation:
Current State: Client filtering is implemented in application code:
// lambda/node/chat/index.js
if (clientFilter) {
  retrievalConfig.vectorSearchConfiguration.filter = {
    equals: { key: 'client', value: clientFilter }
  };
}
Alternative Approach: Bedrock supports implicit filter configuration at the KB level:
response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId=knowledge_base_id,
    retrievalQuery={'text': query},
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'implicitFilterConfiguration': {
                'metadataAttributes': [
                    {
                        'key': 'client',
                        'type': 'STRING',
                        'description': 'The client organization name'
                    }
                ],
                'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-...'
            }
        }
    }
)
How It Works:
- Bedrock uses a foundation model to automatically extract filter values from the query
- Example: “What equipment does Valley have?” →
filter: {client: "Valley Equipment"} - Similar to current query understanding but built into Bedrock
Trade-offs:
| Factor | Current (Custom) | Implicit Filter |
|---|---|---|
| Control | Full | Limited to KB capabilities |
| Clarification UI | Supported | Not supported |
| Known entities list | Custom DynamoDB | No custom list |
| Cost | Claude Haiku call | Bedrock implicit call |
Recommendation: Current custom implementation is more flexible (supports clarification prompts, custom entity validation). Consider implicit filters only for simpler use cases.
Effort: Medium (refactor retrieval code) Impact: Simplified code but less flexibility
15. Guardrails Integration for Content Filtering
Component: Knowledge Base Retrieval Priority: LOW Documentation:
Current State: No content filtering or guardrails applied to retrieved documents or generated responses.
Consideration: Bedrock Guardrails can filter:
- PII (personally identifiable information)
- Harmful content
- Custom denied topics
- Sensitive information patterns
Implementation (if using RetrieveAndGenerate):
response = bedrock_agent_runtime.retrieve_and_generate(
    input={'text': query},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': knowledge_base_id,
            'modelArn': model_arn,
            'guardrailConfiguration': {
                'guardrailId': 'your-guardrail-id',
                'guardrailVersion': '1'
            }
        }
    }
)
When to Consider:
- Compliance requirements (HIPAA, PCI-DSS)
- User-facing applications requiring content moderation
- Documents containing sensitive information
Prerequisites:
- Create Guardrail in Bedrock console
- Define content policies
- Migrate to RetrieveAndGenerate API (or apply guardrails at generation step)
Effort: Medium (requires Guardrail setup + API changes) Impact: Enhanced compliance and content safety
16. Custom Transformation Lambda for Meeting Transcripts
Component: Data Source Configuration Priority: LOW Documentation:
Current State: Meeting transcripts from Fathom are chunked using standard FIXED_SIZE strategy (512 tokens, 20% overlap).
Issue:
- Transcripts may be split mid-conversation
- Speaker context may be lost across chunks
- Action items and decisions may span multiple chunks
Recommended Change: Add a Lambda function for post-chunking transformation:
# terraform/modules/bedrock/main.tf
vector_ingestion_configuration {
  chunking_configuration {
    chunking_strategy = "FIXED_SIZE"
    fixed_size_chunking_configuration {
      max_tokens         = 512
      overlap_percentage = 20
    }
  }

  custom_transformation_configuration {
    transformations {
      step_to_apply = "POST_CHUNKING"
      transformation_function {
        transformation_lambda_configuration {
          lambda_arn = aws_lambda_function.chunk_transformer.arn
        }
      }
    }

    intermediate_storage {
      s3_location {
        uri = "s3://${var.bucket_name}/transform-output/"
      }
    }
  }
}
Lambda Function Purpose:
def transform_chunks(event, context):
    """
    Post-process chunks to improve meeting transcript quality.
    - Add speaker context to each chunk
    - Ensure action items are not split
    - Add meeting metadata summary to each chunk
    - Identify and tag key decisions
    """
    chunks = event['chunks']
    for chunk in chunks:
        # Add speaker context
        chunk['content'] = add_speaker_context(chunk['content'])
        # Add chunk-level metadata
        chunk['metadata']['has_action_items'] = detect_action_items(chunk['content'])
    return {'chunks': chunks}
Trade-offs:
- Additional Lambda execution cost
- Increased ingestion time
- More complex debugging
When to Consider:
- Meeting transcripts are primary content source
- Users frequently search for action items or decisions
- Current retrieval quality for conversations is poor
IMPORTANT: Custom transformation cannot be added after data source creation. Would require recreating the data source.
Effort: High (Lambda development + data source recreation) Impact: Improved retrieval for conversational content
S3 Vectors Specific Limitations
Based on comprehensive S3 Vectors documentation review, these are the key limits. See official documentation:
- S3 Vectors Limitations and Restrictions
- S3 Vectors Regions, Endpoints, and Quotas
- S3 Vectors Metadata Filtering
Storage and Structural Limits
| Limit | Value | Current Usage |
|---|---|---|
| Vector buckets per region | 10,000 | 1 |
| Vector indexes per bucket | 10,000 | 1 |
| Vectors per index | 2 billion | TBD |
| Dimension range | 1-4,096 | 1,024 (Titan v2) |
Metadata Limits
| Limit | Value | Notes |
|---|---|---|
| Total metadata per vector | 40 KB | Sufficient |
| Filterable metadata per vector | 2 KB | Monitor size |
| Total metadata keys per vector | 50 | Track count |
| Non-filterable keys per index | 10 | Set at creation |
| Bedrock KB metadata limit | 1 KB custom, 35 keys | More restrictive |
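Since the 2 KB filterable-metadata limit is the binding constraint in practice (it is the reason LLM parsing is disabled), the sync workers could run a pre-flight size check before writing sidecar files. A hedged sketch (the helper name is hypothetical; limits are taken from the table above, and JSON serialization is used as an approximation of how the service measures size):

```python
import json

FILTERABLE_METADATA_LIMIT_BYTES = 2 * 1024  # S3 Vectors per-vector limit
MAX_METADATA_KEYS = 50                      # S3 Vectors per-vector key limit

def check_filterable_metadata(attributes):
    """Return (ok, size_bytes) for a dict of filterable metadata.

    Serialized size is an approximation of the service-side measurement;
    treat values near the limit as failures.
    """
    size = len(json.dumps(attributes, separators=(",", ":")).encode("utf-8"))
    ok = (size <= FILTERABLE_METADATA_LIMIT_BYTES
          and len(attributes) <= MAX_METADATA_KEYS)
    return ok, size
```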
Rate Limits
| Operation | Limit | Error Code |
|---|---|---|
| PutVectors + DeleteVectors requests | 1,000/second/index | 429 TooManyRequestsException |
| Vectors inserted/deleted | 2,500/second/index | 429 TooManyRequestsException |
| QueryVectors/GetVectors/ListVectors | Hundreds/second/index | 429 TooManyRequestsException |
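All three operations surface throttling as 429 TooManyRequestsException, so bulk callers should retry with jittered exponential backoff. A generic sketch (the throttle predicate is a stand-in; with boto3 you would catch botocore's ClientError and inspect the error code rather than matching on any exception):

```python
import random
import time

def with_backoff(call, max_attempts=5, base_delay=0.1,
                 is_throttle=lambda e: True):
    """Retry `call` on throttling errors with jittered exponential backoff.

    `is_throttle` should return True only for 429-style errors (e.g.
    check e.response["Error"]["Code"] == "TooManyRequestsException"
    when using boto3); other exceptions are re-raised immediately.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as e:
            if not is_throttle(e) or attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter: base * 2^attempt * [0.5, 1.0)
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
```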
API Limits
| Operation | Limit |
|---|---|
| PutVectors batch size | 500 vectors/call |
| DeleteVectors batch size | 500 vectors/call |
| GetVectors batch size | 100 vectors/call |
| QueryVectors TopK | 100 results/request |
| Request payload | 20 MiB |
Performance Characteristics
| Metric | Value |
|---|---|
| Cold query latency | Sub-second |
| Warm query latency | ~100ms |
| Average recall | 90%+ for most datasets |
| Write consistency | Strongly consistent (immediate access) |
Known Constraints
| Constraint | Impact |
|---|---|
| Hierarchical chunking | NOT recommended (metadata size limits) |
| Hybrid search | NOT supported (semantic only) |
| Non-filterable keys | Immutable after index creation |
| Encryption type | Immutable after bucket creation |
| Vector dimensions | Immutable after index creation |
| Distance metric | Immutable after index creation |
| LLM parsing | DISABLED - exceeds 2KB filterable metadata limit for large docs |
Current Configuration (as of 2025-12-31)
| Setting | Value | Notes |
|---|---|---|
| Chunking strategy | FIXED_SIZE | 512 tokens, 20% overlap |
| LLM parsing | Disabled | Sidecar metadata files used instead |
| Non-filterable keys | AMAZON_BEDROCK_TEXT, AMAZON_BEDROCK_METADATA | Required for 100% ingestion success |
| Data deletion policy | DELETE | Vectors auto-removed when S3 docs deleted |
| Filterable metadata | source, client, category | For multi-tenant isolation (project stored but not filtered) |
Architecture Decision Record References
Related ADRs for context:
- ADR-010: S3 Vectors Migration - Decision to migrate from Pinecone to S3 Vectors
- ADR-002: Chunking Strategy - FIXED_SIZE vs HIERARCHICAL decision
Testing and CI/CD Improvements
The following items track missing test coverage and CI/CD gaps.
24. Add Unit Tests for Dashboard Lambda (Node.js)
Component: Dashboard Lambda
Files: lambda/node/dashboard/index.js
Priority: MEDIUM
Current State: The Dashboard Lambda has no unit tests. It queries CloudWatch metrics and logs but has no test coverage.
Recommended Changes:
- Create lambda/node/dashboard/index.test.js
- Test metric aggregation logic
- Test error handling for missing/invalid data
- Add to CI test matrix
Effort: Medium (2-3 days) Impact: Improved reliability and maintainability
25. Add Unit Tests for Chat Lambda (Node.js)
Component: Chat Lambda
Files: lambda/node/chat/index.js
Priority: MEDIUM
Current State: The Chat Lambda has Jest configured in package.json but no actual test files. It handles streaming responses, Bedrock KB queries, and conversation management without test coverage.
Recommended Changes:
- Create lambda/node/chat/index.test.js
- Test streaming response handling
- Test conversation history management
- Test error handling and timeout scenarios
Effort: Medium-High (3-5 days) Impact: Critical - this is the main user-facing Lambda
26. Add Ingest Lambda to Snyk Security Scan Matrix
Component: CI/CD Workflow
Files: .github/workflows/test.yml
Priority: HIGH
Current State:
The ingest Lambda is in the unit test matrix but missing from the Snyk Python security scan matrix.
Recommended Changes:
Add ingest to the Snyk Python matrix in .github/workflows/test.yml:
snyk-python:
  strategy:
    matrix:
      function:
        - classification
        - ingest  # ADD THIS
        - webhooks/fathom
        # ... rest of functions
Effort: Low (5 minutes) Impact: Security coverage for ingest function dependencies
Review Schedule
This document should be reviewed:
- After each major feature implementation
- Quarterly for prioritization updates
- When AWS announces new S3 Vectors or Bedrock features
Last updated: 2026-01-17 (Added Testing and CI/CD Improvements section)