RAG System Improvement Backlog

This document tracks pending improvements, optimizations, and technical debt for the NorthBuilt RAG system. Items are categorized by priority and component.

For completed improvements, see the RAG Changelog.

Key AWS Documentation:

Table of Contents

  1. Pending Improvements
    1. MEDIUM Priority
      1. 1. Response Caching
      2. 2. Enhanced Error Handling and Observability (Partially Implemented)
    2. LOW Priority
      1. 3. Multi-Region Disaster Recovery
      2. 4. Cost Optimization: Right-Size LLM Usage
      3. 5. Hybrid Search Support
      4. 6. Implement Direct Ingestion for Real-Time Updates
      5. 7. Consider Semantic Chunking for Conversational Content
  2. Bedrock Knowledge Base Configuration Constraints
    1. Cannot Change After Knowledge Base Creation
    2. Cannot Change After Data Source Creation
    3. Can Be Updated
    4. 8. Enable CloudTrail Data Event Logging for S3 Vectors
    5. 9. Configure VPC Endpoint for S3 Vectors (PrivateLink)
    6. 10. Consider Per-Client Vector Indexes for Strong Isolation
    7. 11. Export to OpenSearch for Hybrid Search (Alternative to Migration)
    8. 12. Include Metadata in Embeddings for Improved Semantic Search
    9. 13. Binary Embeddings for Cost Optimization
    10. 14. Implicit Filter Configuration
    11. 15. Guardrails Integration for Content Filtering
    12. 16. Custom Transformation Lambda for Meeting Transcripts
  3. S3 Vectors Specific Limitations
    1. Storage and Structural Limits
    2. Metadata Limits
    3. Rate Limits
    4. API Limits
    5. Performance Characteristics
    6. Known Constraints
    7. Current Configuration (as of 2025-12-31)
  4. Architecture Decision Record References
  5. Testing and CI/CD Improvements
    1. 24. Add Unit Tests for Dashboard Lambda (Node.js)
    2. 25. Add Unit Tests for Chat Lambda (Node.js)
    3. 26. Add Ingest Lambda to Snyk Security Scan Matrix
  6. Review Schedule

Pending Improvements

MEDIUM Priority

These improvements should be addressed in the next development cycle.

1. Response Caching

Component: Chat Lambda Files: lambda/node/chat/index.js, new cache module

Current State: Every query hits the Knowledge Base and LLM, even for frequently asked questions. This increases latency and cost.

Recommended Changes:

  1. Implement semantic caching using vector similarity
  2. Cache recent query-response pairs in DynamoDB or ElastiCache
  3. Return cached response if similarity > threshold (e.g., 0.95)
# Semantic cache lookup (illustrative sketch; cache entries would be
# loaded from DynamoDB or ElastiCache)
def get_cached_response(query_embedding, cache, threshold=0.95):
    """Return a cached response if a sufficiently similar query was answered.

    cache: iterable of (embedding, response) pairs.
    """
    for cached_embedding, cached_response in cache:
        # Cosine similarity between the two embedding vectors
        dot = sum(a * b for a, b in zip(query_embedding, cached_embedding))
        norm = (sum(a * a for a in query_embedding) ** 0.5
                * sum(b * b for b in cached_embedding) ** 0.5)
        if norm and dot / norm > threshold:
            return cached_response
    return None  # Cache miss: fall through to the full RAG pipeline

Trade-offs:

  • Cache hit = faster response, lower cost
  • Cache miss = slight overhead from cache lookup
  • Risk of stale responses for rapidly changing knowledge

Effort: Medium (3-5 days) Impact: Reduced latency and cost for common queries


2. Enhanced Error Handling and Observability (Partially Implemented)

Component: All Lambdas Files: Multiple

Current State: CloudWatch metrics are implemented, but structured logging with correlation IDs and dashboards are not.

Already Implemented:

  • RAGMetrics class in lambda/shared/utils/metrics.py
  • Retrieval latency metrics (RetrievalLatencyMs)
  • Document count metrics (CandidatesRetrieved, ResultsAfterFilter)
  • Filter effectiveness metrics (FilterEffectiveness)
  • Error counts by type

Still Pending:

  1. Structured JSON logging with correlation IDs
  2. LLM token usage metrics
  3. CloudWatch dashboard for RAG metrics
# Example structured logging (NOT YET IMPLEMENTED)
logger.info({
    'event': 'rag_query',
    'correlation_id': correlation_id,
    'query_length': len(query),
    'documents_retrieved': len(documents),
    'filters': {'client': client},
    'latency_ms': retrieval_latency
})
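A runnable counterpart to the example above could look like the following. The `log_event` helper name is hypothetical (not part of the codebase); it assumes the Lambda's stdout is shipped to CloudWatch Logs, which is the default behavior.

```python
import json
import uuid


def log_event(event_name, correlation_id=None, **fields):
    """Emit one structured JSON log line carrying a correlation ID."""
    record = {
        'event': event_name,
        'correlation_id': correlation_id or str(uuid.uuid4()),
    }
    record.update(fields)
    print(json.dumps(record))  # Lambda stdout lands in CloudWatch Logs
    return record
```

Generating the correlation ID once at the entry point and passing it to every `log_event` call lets CloudWatch Logs Insights group all lines for a single request.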

Effort: Low-Medium (1-2 days for remaining items) Impact: Improved debugging and monitoring


LOW Priority

3. Multi-Region Disaster Recovery

Component: All infrastructure Files: Multiple Terraform files

Current State: Single-region deployment. If us-east-1 has an outage, the system is unavailable.

Recommended Architecture:

  1. S3 cross-region replication (already configured for backup)
  2. DynamoDB global tables for chat sessions
  3. Lambda deployment in secondary region
  4. Route 53 health checks and failover routing
  5. S3 Vectors replication (when supported)

Note: S3 Vectors is a relatively new service. Check AWS documentation for multi-region capabilities and cross-region replication support.

Effort: Very High (weeks of work) Impact: High availability for production workloads


4. Cost Optimization: Right-Size LLM Usage

Component: Chat Lambda Files: lambda/node/chat/index.js, Terraform

Current State: Claude Sonnet 4.5 is used for all queries regardless of complexity.

Recommended Changes:

  1. Implement query complexity classification
  2. Use lighter models (Haiku) for simple queries
  3. Use Sonnet for complex queries requiring reasoning
function selectModel(query, documents) {
  // Simple heuristics:
  // - Short query + few documents = Haiku
  // - Long query or many documents = Sonnet
  if (query.length < 100 && documents.length <= 2) {
    return process.env.BEDROCK_LIGHT_MODEL || 'anthropic.claude-3-haiku-...';
  }
  return process.env.BEDROCK_LLM_MODEL;
}

Trade-offs:

  • Cost savings: Haiku is ~10x cheaper than Sonnet
  • Quality: Simple queries may get adequate answers from lighter models
  • Complexity: Need to define and tune classification criteria

Effort: Medium (3-4 days) Impact: 30-50% cost reduction on LLM usage


5. Hybrid Search Support

Component: Knowledge Base configuration Files: Terraform, Chat Lambda

Current State: S3 Vectors supports semantic (vector) search only. Hybrid search (combining keyword and semantic) is not available with S3 Vectors.

Consideration: If hybrid search is required in the future:

  1. Migrate to OpenSearch Serverless (supports hybrid search)
  2. Or implement application-level keyword filtering after vector retrieval
  3. Or use Bedrock’s built-in hybrid search with compatible vector stores
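Option 2 above (application-level keyword filtering) could be sketched as a post-retrieval pass. This is an illustrative helper, assuming the Bedrock Retrieve response shape where each result exposes its chunk text under `content.text`:

```python
def keyword_filter(retrieval_results, query_terms):
    """Keep only retrieved chunks that contain at least one query keyword."""
    terms = {t.lower() for t in query_terms}
    return [
        r for r in retrieval_results
        if terms & set(r['content']['text'].lower().split())
    ]
```

This approximates the keyword half of hybrid search without a vector store migration, at the cost of operating only on the candidates vector search already returned.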

Note: This is a limitation of S3 Vectors, not a bug. Evaluate if hybrid search is necessary for your use case before considering migration.

Effort: Very High (vector store migration) Impact: Improved retrieval for keyword-heavy queries


6. Implement Direct Ingestion for Real-Time Updates

Component: Sync Lambdas (Fathom, HelpScout) Documentation:

Current State: Documents are uploaded to S3, then a scheduled ingestion job syncs them to the Knowledge Base. This creates a delay between document creation and availability for queries.

Issue:

  • Latency: Documents may take minutes to become searchable
  • Scheduling: Relies on EventBridge scheduled sync every 5 minutes

Recommended Change: Use the IngestKnowledgeBaseDocuments API for immediate ingestion:

import boto3

bedrock_agent = boto3.client('bedrock-agent')

response = bedrock_agent.ingest_knowledge_base_documents(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    documents=[
        {
            'content': {
                's3': {
                    's3Location': {
                        'uri': f's3://{bucket}/{key}'
                    }
                },
                'dataSourceType': 'S3'
            },
            'metadata': {
                's3Location': {
                    'uri': f's3://{bucket}/{key}.metadata.json'
                },
                'type': 'S3_LOCATION'
            }
        }
    ]
)

Benefits:

  • Immediate document availability (seconds vs minutes)
  • Up to 25 documents per API call
  • Can be called directly from webhook handlers

Caveats:

  • Documents ingested directly are NOT added to S3 (add to S3 separately to prevent removal on next full sync)
  • Do NOT call simultaneously with StartIngestionJob

Effort: Medium (update webhook handlers) Impact: Near real-time document availability


7. Consider Semantic Chunking for Conversational Content

Component: Bedrock Data Source Documentation:

Current State: Using FIXED_SIZE chunking with 512 tokens and 20% overlap.

Consideration: For conversational content (meeting transcripts, support conversations), SEMANTIC chunking may provide better results:

How Semantic Chunking Works:

  • Uses NLP to identify meaning boundaries
  • Chunks based on semantic content rather than token count
  • Parameters:
    • maxTokens: Maximum tokens per chunk
    • bufferSize: Surrounding sentences for context (e.g., 1 = previous + current + next)
    • breakpointPercentileThreshold: Dissimilarity threshold for splits
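The parameters above map onto the chunking configuration passed at data-source creation. A sketch of the payload shape, with illustrative (not tuned) values:

```python
# SEMANTIC chunking configuration as it would appear in a Bedrock
# CreateDataSource request (values are illustrative, not recommendations).
semantic_chunking = {
    'chunkingConfiguration': {
        'chunkingStrategy': 'SEMANTIC',
        'semanticChunkingConfiguration': {
            'maxTokens': 512,                     # hard cap per chunk
            'bufferSize': 1,                      # previous + current + next sentence
            'breakpointPercentileThreshold': 95,  # split at high-dissimilarity points
        },
    },
}
```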

Trade-offs:

| Factor | FIXED_SIZE | SEMANTIC |
|---|---|---|
| Predictability | High | Variable |
| Cost | No extra cost | Foundation model costs |
| Conversation preservation | May split mid-conversation | Better at finding natural breaks |
| Metadata overhead | Predictable | Variable (may affect S3 Vectors limits) |

Recommendation: Evaluate with a subset of documents before changing. The current FIXED_SIZE is safe for S3 Vectors, while SEMANTIC adds cost and unpredictable chunk sizes.

IMPORTANT: Chunking strategy cannot be changed after data source creation. Would require recreating the data source.

Effort: High (requires data source recreation and re-ingestion) Impact: Potentially improved retrieval for conversational content


Bedrock Knowledge Base Configuration Constraints

Based on AWS documentation, these configurations are IMMUTABLE after creation:

Cannot Change After Knowledge Base Creation

| Configuration | Location | Impact |
|---|---|---|
| Vector store type | storage_configuration.type | Must recreate entire KB |
| Embedding model | embedding_model_arn | Must recreate entire KB |
| Embedding dimensions | embedding_model_configuration.dimensions | Must recreate entire KB |
| Supplemental data storage | supplementalDataStorageConfiguration | Cannot add multimodal support later |

Cannot Change After Data Source Creation

| Configuration | Location | Impact |
|---|---|---|
| Chunking strategy | chunking_configuration.chunking_strategy | Must recreate data source |
| Chunking parameters | max_tokens, overlap_percentage | Must recreate data source |
| Parsing strategy | parsing_configuration.parsing_strategy | Must recreate data source |
| Parsing model | bedrock_foundation_model_configuration.model_arn | Must recreate data source |

Can Be Updated

| Configuration | How |
|---|---|
| Knowledge base name/description | UpdateKnowledgeBase API |
| Data source files | Add/modify files in S3, then sync |
| Data deletion policy | UpdateDataSource API |
| KMS encryption key | UpdateDataSource API |
| IAM role | UpdateDataSource API (with new role having proper permissions) |

Implication: Plan chunking and parsing strategies carefully before creating data sources. Changing them requires:

  1. Creating new data source with desired configuration
  2. Re-syncing all documents
  3. Deleting old data source
  4. Testing thoroughly before production use

8. Enable CloudTrail Data Event Logging for S3 Vectors

Component: CloudTrail Configuration Documentation:

Current State: CloudTrail management events are logged by default, but data events (QueryVectors, PutVectors, GetVectors, DeleteVectors, ListVectors) are NOT logged.

Issue:

  • Cannot audit who queried what vectors
  • No visibility into vector operation patterns
  • Limited security and compliance posture

Recommended Change: Enable CloudTrail data event logging for S3 Vectors:

resource "aws_cloudtrail" "s3_vectors_data_events" {
  name           = "${var.resource_prefix}-s3vectors-trail"
  s3_bucket_name = aws_s3_bucket.cloudtrail_logs.id

  event_selector {
    read_write_type           = "All"
    include_management_events = false

    data_resource {
      type   = "AWS::S3Vectors::Index"
      values = [aws_s3vectors_index.main.arn]
    }
  }
}

Logged Operations:

  • PutVectors - Vector insertions
  • GetVectors - Vector retrievals
  • DeleteVectors - Vector deletions
  • ListVectors - Vector listings
  • QueryVectors - Similarity queries

Effort: Low (Terraform configuration) Impact: Security and compliance visibility


9. Configure VPC Endpoint for S3 Vectors (PrivateLink)

Component: Network Security Documentation:

Current State: S3 Vectors accessed via public endpoints. Traffic traverses the internet.

Issue:

  • For compliance-sensitive workloads, traffic should stay within AWS network
  • Potential exposure to internet-based threats
  • Some compliance frameworks require private connectivity

Recommended Change: Create VPC interface endpoint for S3 Vectors:

resource "aws_vpc_endpoint" "s3_vectors" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.s3vectors"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.s3_vectors_endpoint.id]
  private_dns_enabled = true

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = "*"
      Action    = ["s3vectors:*"]
      Resource  = [
        aws_s3vectors_vector_bucket.main.vector_bucket_arn,
        "${aws_s3vectors_vector_bucket.main.vector_bucket_arn}/*"
      ]
    }]
  })
}

Benefits:

  • Traffic stays within AWS network
  • Enhanced security posture
  • Meets compliance requirements for private connectivity

Effort: Medium (requires VPC configuration) Impact: Security and compliance


10. Consider Per-Client Vector Indexes for Strong Isolation

Component: S3 Vectors Architecture Documentation:

Current State: Single vector index with client-level metadata filtering for multi-tenant isolation:

filter={"client": "acme-corp"}

Note: Project metadata is stored for display but not used for filtering, allowing all documents from a client’s projects to contribute to RAG context.

Alternative Approach: AWS recommends using one vector index per tenant for:

  • Stronger data isolation
  • Per-tenant IAM policies
  • Independent scaling per tenant
  • Clearer cost attribution

Trade-offs:

| Approach | Current (Shared Index) | Per-Tenant Index |
|---|---|---|
| Isolation | Metadata filtering | Physical separation |
| IAM control | Limited | Per-index policies |
| Scaling | Shared limits | Independent limits |
| Cost tracking | Difficult | Per-index attribution |
| Complexity | Lower | Higher (index management) |
| Cross-tenant queries | Possible (with filter) | Requires multi-index query |

When to Consider:

  • Regulatory requirements for physical data separation
  • Need for per-client IAM policies
  • Clients with vastly different usage patterns
  • Strict cost attribution requirements

Effort: High (architecture change) Impact: Stronger tenant isolation


11. Export to OpenSearch for Hybrid Search (Alternative to Migration)

Component: Search Architecture Documentation:

Current State: S3 Vectors supports semantic search only. No hybrid (keyword + semantic) search.

Alternative to Full Migration: Instead of migrating away from S3 Vectors entirely, you can export to OpenSearch Serverless for hybrid search while keeping S3 Vectors for cost-effective storage:

Two Integration Options:

  1. Export to OpenSearch Serverless:
    • Point-in-time copy of vectors
    • Full hybrid search, aggregations, faceted search
    • Dual storage costs
    • Best for: High-throughput, low-latency requirements
  2. OpenSearch with S3 Vectors Engine:
    • S3 Vectors as storage backend for OpenSearch
    • OpenSearch query API with S3 Vectors storage costs
    • Best for: Lower throughput, cost-sensitive workloads

When to Consider:

  • Users need keyword + semantic search
  • Complex filtering or aggregations required
  • Faceted search UI needed

Note: This is an alternative to full vector store migration, not a replacement.

Effort: Medium (OpenSearch configuration) Impact: Hybrid search capability without full migration


12. Include Metadata in Embeddings for Improved Semantic Search

Component: Document Ingestion (Sync Workers) Priority: LOW (marginal benefit) Documentation:

Current State: The build_bedrock_metadata_json() function creates sidecar metadata files but doesn’t set the includeForEmbedding option:

# Current implementation (validation.py)
metadata_attributes[key] = {
    "value": {
        "type": "STRING",
        "stringValue": str_value
    }
}

Issue:

  • Metadata is only used for filtering, not for semantic search
  • In theory, queries mentioning client or project names might have slightly better semantic matches

Why Marginal Benefit: The current query understanding system already extracts client names from queries and applies them as filters. For example, “What did Valley Equipment discuss?” is already processed to extract “Valley Equipment” and filter documents to that client. Adding includeForEmbedding would only provide a small additional signal in vector similarity scoring, since the filtering already ensures only relevant client documents are searched.

Recommended Change: Add includeForEmbedding: true for key metadata fields:

import json
from typing import Any, Dict, Optional, Set

def build_bedrock_metadata_json(attributes: Dict[str, Any], embedding_fields: Optional[Set[str]] = None) -> str:
    """
    Build metadata with optional embedding inclusion.

    Args:
        attributes: Metadata key-value pairs
        embedding_fields: Set of field names to include in embeddings (default: {'client', 'project'})
    """
    embedding_fields = embedding_fields or {'client', 'project'}
    metadata_attributes = {}

    for key, value in attributes.items():
        include_in_embedding = key in embedding_fields

        if isinstance(value, str):
            metadata_attributes[key] = {
                "value": {"type": "STRING", "stringValue": value},
                "includeForEmbedding": include_in_embedding
            }

    return json.dumps({"metadataAttributes": metadata_attributes})

Expected Metadata Output:

{
  "metadataAttributes": {
    "client": {
      "value": {"type": "STRING", "stringValue": "Valley Equipment"},
      "includeForEmbedding": true
    },
    "project": {
      "value": {"type": "STRING", "stringValue": "Equipment Sales"},
      "includeForEmbedding": true
    },
    "source": {
      "value": {"type": "STRING", "stringValue": "fathom"},
      "includeForEmbedding": false
    }
  }
}

Benefits:

  • Slight improvement in semantic similarity scoring for queries mentioning client/project names
  • No additional API calls or infrastructure changes
  • Backward compatible (existing documents continue to work)

Trade-offs:

  • Slightly larger embedding vectors
  • Requires re-ingestion of existing documents
  • Benefit is marginal since query understanding already handles client extraction

Effort: Low (code change) + Medium (re-ingestion) Impact: Marginal improvement - existing query understanding and filtering handles the main use case


13. Binary Embeddings for Cost Optimization

Component: Knowledge Base Configuration Priority: MEDIUM Documentation:

Current State: Using FLOAT32 embeddings (default, highest precision):

# terraform/modules/bedrock/main.tf
embedding_model_configuration {
  bedrock_embedding_model_configuration {
    dimensions          = 1024
    embedding_data_type = "FLOAT32"  # Current setting
  }
}

Consideration: Titan Text Embeddings V2 supports BINARY embedding type for cost optimization:

embedding_model_configuration {
  bedrock_embedding_model_configuration {
    dimensions          = 1024
    embedding_data_type = "BINARY"  # 32x smaller storage
  }
}

Trade-offs:

| Factor | FLOAT32 | BINARY |
|---|---|---|
| Precision | Highest | Lower (~10% accuracy reduction) |
| Storage | 4 bytes/dimension | 1 bit/dimension |
| Cost | Higher | ~32x lower storage cost |
| Query speed | Baseline | Potentially faster |
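The "32x smaller storage" figure follows directly from the per-dimension sizes; at the current 1,024 dimensions:

```python
# Per-vector storage at 1,024 dimensions (current Titan V2 setting)
dims = 1024
float32_bytes = dims * 4   # FLOAT32: 4 bytes per dimension
binary_bytes = dims // 8   # BINARY: 1 bit per dimension
ratio = float32_bytes // binary_bytes
print(float32_bytes, binary_bytes, ratio)  # 4096 128 32
```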

When to Consider:

  • Large document corpus (10,000+ documents)
  • Cost-sensitive deployments
  • Acceptable accuracy trade-off

IMPORTANT: Embedding data type cannot be changed after Knowledge Base creation. Would require recreating the entire KB.

Effort: Very High (requires KB recreation and re-ingestion) Impact: Significant cost reduction for large deployments


14. Implicit Filter Configuration

Component: Knowledge Base Configuration Priority: LOW Documentation:

Current State: Client filtering is implemented in application code:

// lambda/node/chat/index.js
if (clientFilter) {
  retrievalConfig.vectorSearchConfiguration.filter = {
    equals: { key: 'client', value: clientFilter }
  };
}

Alternative Approach: Bedrock supports implicit filter configuration at the KB level:

import boto3

bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')

response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId=knowledge_base_id,
    retrievalQuery={'text': query},
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'implicitFilterConfiguration': {
                'metadataAttributes': [
                    {
                        'key': 'client',
                        'type': 'STRING',
                        'description': 'The client organization name'
                    }
                ],
                'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-...'
            }
        }
    }
)

How It Works:

  • Bedrock uses a foundation model to automatically extract filter values from the query
  • Example: “What equipment does Valley have?” → filter: {client: "Valley Equipment"}
  • Similar to current query understanding but built into Bedrock

Trade-offs:

| Factor | Current (Custom) | Implicit Filter |
|---|---|---|
| Control | Full | Limited to KB capabilities |
| Clarification UI | Supported | Not supported |
| Known entities list | Custom DynamoDB | No custom list |
| Cost | Claude Haiku call | Bedrock implicit call |

Recommendation: Current custom implementation is more flexible (supports clarification prompts, custom entity validation). Consider implicit filters only for simpler use cases.

Effort: Medium (refactor retrieval code) Impact: Simplified code but less flexibility


15. Guardrails Integration for Content Filtering

Component: Knowledge Base Retrieval Priority: LOW Documentation:

Current State: No content filtering or guardrails applied to retrieved documents or generated responses.

Consideration: Bedrock Guardrails can filter:

  • PII (personally identifiable information)
  • Harmful content
  • Custom denied topics
  • Sensitive information patterns

Implementation (if using RetrieveAndGenerate):

import boto3

bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')

response = bedrock_agent_runtime.retrieve_and_generate(
    input={'text': query},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': knowledge_base_id,
            'modelArn': model_arn,
            'guardrailConfiguration': {
                'guardrailId': 'your-guardrail-id',
                'guardrailVersion': '1'
            }
        }
    }
)

When to Consider:

  • Compliance requirements (HIPAA, PCI-DSS)
  • User-facing applications requiring content moderation
  • Documents containing sensitive information

Prerequisites:

  • Create Guardrail in Bedrock console
  • Define content policies
  • Migrate to RetrieveAndGenerate API (or apply guardrails at generation step)

Effort: Medium (requires Guardrail setup + API changes) Impact: Enhanced compliance and content safety


16. Custom Transformation Lambda for Meeting Transcripts

Component: Data Source Configuration Priority: LOW Documentation:

Current State: Meeting transcripts from Fathom are chunked using standard FIXED_SIZE strategy (512 tokens, 20% overlap).

Issue:

  • Transcripts may be split mid-conversation
  • Speaker context may be lost across chunks
  • Action items and decisions may span multiple chunks

Recommended Change: Add a Lambda function for post-chunking transformation:

# terraform/modules/bedrock/main.tf
vector_ingestion_configuration {
  chunking_configuration {
    chunking_strategy = "FIXED_SIZE"
    fixed_size_chunking_configuration {
      max_tokens         = 512
      overlap_percentage = 20
    }
  }

  custom_transformation_configuration {
    transformations {
      step_to_apply = "POST_CHUNKING"
      transformation_function {
        transformation_lambda_configuration {
          lambda_arn = aws_lambda_function.chunk_transformer.arn
        }
      }
    }
    intermediate_storage {
      s3_location {
        uri = "s3://${var.bucket_name}/transform-output/"
      }
    }
  }
}

Lambda Function Purpose:

def transform_chunks(event, context):
    """
    Post-process chunks to improve meeting transcript quality.

    - Add speaker context to each chunk
    - Ensure action items are not split
    - Add meeting metadata summary to each chunk
    - Identify and tag key decisions
    """
    chunks = event['chunks']
    for chunk in chunks:
        # Add speaker context (helper defined elsewhere in the Lambda)
        chunk['content'] = add_speaker_context(chunk['content'])

        # Add chunk-level metadata
        chunk['metadata']['has_action_items'] = detect_action_items(chunk['content'])

    return {'chunks': chunks}

Trade-offs:

  • Additional Lambda execution cost
  • Increased ingestion time
  • More complex debugging

When to Consider:

  • Meeting transcripts are primary content source
  • Users frequently search for action items or decisions
  • Current retrieval quality for conversations is poor

IMPORTANT: Custom transformation cannot be added after data source creation. Would require recreating the data source.

Effort: High (Lambda development + data source recreation) Impact: Improved retrieval for conversational content


S3 Vectors Specific Limitations

Based on a review of the S3 Vectors documentation, these are the key limits. See the official documentation for details.

Storage and Structural Limits

| Limit | Value | Current Usage |
|---|---|---|
| Vector buckets per region | 10,000 | 1 |
| Vector indexes per bucket | 10,000 | 1 |
| Vectors per index | 2 billion | TBD |
| Dimension range | 1-4,096 | 1,024 (Titan v2) |

Metadata Limits

| Limit | Value | Notes |
|---|---|---|
| Total metadata per vector | 40 KB | Sufficient |
| Filterable metadata per vector | 2 KB | Monitor size |
| Total metadata keys per vector | 50 | Track count |
| Non-filterable keys per index | 10 | Set at creation |
| Bedrock KB metadata limit | 1 KB custom, 35 keys | More restrictive |
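Because the 2 KB filterable-metadata limit is the tightest constraint here, sync workers may want a pre-flight size check before writing sidecar files. A minimal sketch (helper names hypothetical, not from the codebase):

```python
import json

FILTERABLE_LIMIT_BYTES = 2 * 1024  # S3 Vectors filterable-metadata limit


def filterable_metadata_size(attributes):
    """Approximate serialized size of filterable metadata, in bytes."""
    return len(json.dumps(attributes).encode('utf-8'))


def within_filterable_limit(attributes):
    """True if the attributes fit under the 2 KB filterable limit."""
    return filterable_metadata_size(attributes) <= FILTERABLE_LIMIT_BYTES
```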

Rate Limits

| Operation | Limit | Error Code |
|---|---|---|
| PutVectors + DeleteVectors requests | 1,000/second/index | 429 TooManyRequestsException |
| Vectors inserted/deleted | 2,500/second/index | 429 TooManyRequestsException |
| QueryVectors/GetVectors/ListVectors | Hundreds/second/index | 429 TooManyRequestsException |
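Callers that approach these rates will see 429s, and jittered exponential backoff is the usual mitigation. A generic sketch (in practice you would catch botocore's ClientError for TooManyRequestsException rather than bare Exception):

```python
import random
import time


def with_backoff(call, max_attempts=5, base_delay=0.5):
    """Retry a throttled callable with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            # On the last attempt, surface the error to the caller
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```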

API Limits

| Operation | Limit |
|---|---|
| PutVectors batch size | 500 vectors/call |
| DeleteVectors batch size | 500 vectors/call |
| GetVectors batch size | 100 vectors/call |
| QueryVectors TopK | 100 results/request |
| Request payload | 20 MiB |
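The 500-vector batch caps mean bulk writes must be chunked client-side before each PutVectors or DeleteVectors call. A small helper (hypothetical, not in the codebase):

```python
def batched(items, batch_size=500):
    """Yield successive slices no larger than the PutVectors batch limit."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```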

Performance Characteristics

| Metric | Value |
|---|---|
| Cold query latency | Sub-second |
| Warm query latency | ~100ms |
| Average recall | 90%+ for most datasets |
| Write consistency | Strongly consistent (immediate access) |

Known Constraints

| Constraint | Impact |
|---|---|
| Hierarchical chunking | NOT recommended (metadata size limits) |
| Hybrid search | NOT supported (semantic only) |
| Non-filterable keys | Immutable after index creation |
| Encryption type | Immutable after bucket creation |
| Vector dimensions | Immutable after index creation |
| Distance metric | Immutable after index creation |
| LLM parsing | DISABLED (exceeds 2 KB filterable metadata limit for large docs) |

Current Configuration (as of 2025-12-31)

| Setting | Value | Notes |
|---|---|---|
| Chunking strategy | FIXED_SIZE | 512 tokens, 20% overlap |
| LLM parsing | Disabled | Sidecar metadata files used instead |
| Non-filterable keys | AMAZON_BEDROCK_TEXT, AMAZON_BEDROCK_METADATA | Required for 100% ingestion success |
| Data deletion policy | DELETE | Vectors auto-removed when S3 docs deleted |
| Filterable metadata | source, client, category | For multi-tenant isolation (project stored but not filtered) |

Architecture Decision Record References

Related ADRs for context:


Testing and CI/CD Improvements

The following items track missing test coverage and CI/CD gaps.


24. Add Unit Tests for Dashboard Lambda (Node.js)

Component: Dashboard Lambda Files: lambda/node/dashboard/index.js Priority: MEDIUM

Current State: The Dashboard Lambda has no unit tests. It queries CloudWatch metrics and logs but has no test coverage.

Recommended Changes:

  1. Create lambda/node/dashboard/index.test.js
  2. Mock CloudWatch and CloudWatch Logs clients
  3. Test metric aggregation logic
  4. Test error handling for missing/invalid data
  5. Add to CI test matrix

Effort: Medium (2-3 days) Impact: Improved reliability and maintainability


25. Add Unit Tests for Chat Lambda (Node.js)

Component: Chat Lambda Files: lambda/node/chat/index.js Priority: MEDIUM

Current State: The Chat Lambda has Jest configured in package.json but no actual test files. It handles streaming responses, Bedrock KB queries, and conversation management without test coverage.

Recommended Changes:

  1. Create lambda/node/chat/index.test.js
  2. Mock Bedrock Agent Runtime, DynamoDB, S3 clients
  3. Test streaming response handling
  4. Test conversation history management
  5. Test error handling and timeout scenarios

Effort: Medium-High (3-5 days) Impact: Critical - this is the main user-facing Lambda


26. Add Ingest Lambda to Snyk Security Scan Matrix

Component: CI/CD Workflow Files: .github/workflows/test.yml Priority: HIGH

Current State: The ingest Lambda is in the unit test matrix but missing from the Snyk Python security scan matrix.

Recommended Changes: Add ingest to the Snyk Python matrix in .github/workflows/test.yml:

snyk-python:
  strategy:
    matrix:
      function:
        - classification
        - ingest  # ADD THIS
        - webhooks/fathom
        # ... rest of functions

Effort: Low (5 minutes) Impact: Security coverage for ingest function dependencies


Review Schedule

This document should be reviewed:

  • After each major feature implementation
  • Quarterly for prioritization updates
  • When AWS announces new S3 Vectors or Bedrock features

Last updated: 2026-01-17 (Added Testing and CI/CD Improvements section)