RAG System Improvement Backlog

This document tracks pending improvements, optimizations, and technical debt for the NorthBuilt RAG system. Items are categorized by priority and component.

For completed improvements, see the RAG Changelog.

Key AWS Documentation:

Table of Contents

  1. Pending Improvements
    1. MEDIUM Priority
      1. 1. Response Caching
      2. 2. Enhanced Error Handling and Observability (Partially Implemented)
    2. LOW Priority
      1. 3. Multi-Region Disaster Recovery
      2. 4. Cost Optimization: Right-Size LLM Usage
      3. 5. Hybrid Search Support
      4. 6. Implement Direct Ingestion for Real-Time Updates
      5. 7. Consider Semantic Chunking for Conversational Content
  2. Bedrock Knowledge Base Configuration Constraints
    1. Cannot Change After Knowledge Base Creation
    2. Cannot Change After Data Source Creation
    3. Can Be Updated
    4. 8. Enable CloudTrail Data Event Logging for S3 Vectors
    5. 9. Configure VPC Endpoint for S3 Vectors (PrivateLink)
    6. 10. Consider Per-Client Vector Indexes for Strong Isolation
    7. 11. Export to OpenSearch for Hybrid Search (Alternative to Migration)
    8. 12. Include Metadata in Embeddings for Improved Semantic Search
    9. 13. Binary Embeddings for Cost Optimization
    10. 14. Implicit Filter Configuration
    11. 15. Guardrails Integration for Content Filtering
    12. 16. Custom Transformation Lambda for Meeting Transcripts
  3. S3 Vectors Specific Limitations
    1. Storage and Structural Limits
    2. Metadata Limits
    3. Rate Limits
    4. API Limits
    5. Performance Characteristics
    6. Known Constraints
    7. Current Configuration (as of 2025-12-31)
  4. Architecture Decision Record References
  5. Testing and CI/CD Improvements
    1. 24. Add Unit Tests for Dashboard Lambda (Node.js)
    2. 25. Add Unit Tests for Chat Lambda (Node.js)
    3. 26. Add Ingest Lambda to Snyk Security Scan Matrix
  6. Review Schedule

Pending Improvements

MEDIUM Priority

These improvements should be addressed in the next development cycle.

1. Response Caching

Component: Chat Lambda Files: lambda/node/chat/index.js, new cache module

Current State: Every query hits the Knowledge Base and LLM, even for frequently asked questions. This increases latency and cost.

Recommended Changes:

  1. Implement semantic caching using vector similarity
  2. Cache recent query-response pairs in DynamoDB or ElastiCache
  3. Return cached response if similarity > threshold (e.g., 0.95)
# Semantic cache lookup (illustrative sketch; cache entries would be
# loaded from DynamoDB or ElastiCache)
def get_cached_response(query_embedding, cache, threshold=0.95):
    """Return a cached response if a sufficiently similar query was answered.

    cache: iterable of (embedding, response) pairs.
    """
    for cached_embedding, cached_response in cache:
        # Cosine similarity between the two embedding vectors
        dot = sum(a * b for a, b in zip(query_embedding, cached_embedding))
        norm = (sum(a * a for a in query_embedding) ** 0.5
                * sum(b * b for b in cached_embedding) ** 0.5)
        if norm and dot / norm > threshold:
            return cached_response
    return None  # Cache miss: fall through to the full RAG pipeline

Trade-offs:

  • Cache hit = faster response, lower cost
  • Cache miss = slight overhead from cache lookup
  • Risk of stale responses for rapidly changing knowledge

Effort: Medium (3-5 days) Impact: Reduced latency and cost for common queries


2. Enhanced Error Handling and Observability (Partially Implemented)

Component: All Lambdas Files: Multiple

Current State: CloudWatch metrics are implemented, but structured logging with correlation IDs and dashboards are not.

Already Implemented:

  • RAGMetrics class in lambda/shared/utils/metrics.py
  • Retrieval latency metrics (RetrievalLatencyMs)
  • Document count metrics (CandidatesRetrieved, ResultsAfterFilter)
  • Filter effectiveness metrics (FilterEffectiveness)
  • Error counts by type

Still Pending:

  1. Structured JSON logging with correlation IDs
  2. LLM token usage metrics
  3. CloudWatch dashboard for RAG metrics
# Example structured logging (NOT YET IMPLEMENTED)
logger.info({
    'event': 'rag_query',
    'correlation_id': correlation_id,
    'query_length': len(query),
    'documents_retrieved': len(documents),
    'filters': {'client': client},
    'latency_ms': retrieval_latency
})
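A runnable counterpart to the example above could look like the following. The `log_event` helper name is hypothetical (not part of the codebase); it assumes the Lambda's stdout is shipped to CloudWatch Logs, which is the default behavior.

```python
import json
import uuid


def log_event(event_name, correlation_id=None, **fields):
    """Emit one structured JSON log line carrying a correlation ID."""
    record = {
        'event': event_name,
        'correlation_id': correlation_id or str(uuid.uuid4()),
    }
    record.update(fields)
    print(json.dumps(record))  # Lambda stdout lands in CloudWatch Logs
    return record
```

Generating the correlation ID once at the entry point and passing it to every `log_event` call lets CloudWatch Logs Insights group all lines for a single request.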

Effort: Low-Medium (1-2 days for remaining items) Impact: Improved debugging and monitoring


LOW Priority

3. Multi-Region Disaster Recovery

Component: All infrastructure Files: Multiple Terraform files

Current State: Single-region deployment. If us-east-1 has an outage, the system is unavailable.

Recommended Architecture:

  1. S3 cross-region replication (already configured for backup)
  2. DynamoDB global tables for chat sessions
  3. Lambda deployment in secondary region
  4. Route 53 health checks and failover routing
  5. S3 Vectors replication (when supported)

Note: S3 Vectors is a relatively new service. Check AWS documentation for multi-region capabilities and cross-region replication support.

Effort: Very High (weeks of work) Impact: High availability for production workloads


4. Cost Optimization: Right-Size LLM Usage

Component: Chat Lambda Files: lambda/node/chat/index.js, Terraform

Current State: Claude Sonnet 4.5 is used for all queries regardless of complexity.

Recommended Changes:

  1. Implement query complexity classification
  2. Use lighter models (Haiku) for simple queries
  3. Use Sonnet for complex queries requiring reasoning
function selectModel(query, documents) {
  // Simple heuristics:
  // - Short query + few documents = Haiku
  // - Long query or many documents = Sonnet
  if (query.length < 100 && documents.length <= 2) {
    return process.env.BEDROCK_LIGHT_MODEL || 'anthropic.claude-3-haiku-...';
  }
  return process.env.BEDROCK_LLM_MODEL;
}

Trade-offs:

  • Cost savings: Haiku is ~10x cheaper than Sonnet
  • Quality: Simple queries may get adequate answers from lighter models
  • Complexity: Need to define and tune classification criteria

Effort: Medium (3-4 days) Impact: 30-50% cost reduction on LLM usage


5. Hybrid Search Support

Component: Knowledge Base configuration Files: Terraform, Chat Lambda

Current State: S3 Vectors supports semantic (vector) search only. Hybrid search (combining keyword and semantic) is not available with S3 Vectors.

Consideration: If hybrid search is required in the future:

  1. Migrate to OpenSearch Serverless (supports hybrid search)
  2. Or implement application-level keyword filtering after vector retrieval
  3. Or use Bedrock’s built-in hybrid search with compatible vector stores
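Option 2 above (application-level keyword filtering) could be sketched as a post-retrieval pass. This is an illustrative helper, assuming the Bedrock Retrieve response shape where each result exposes its chunk text under `content.text`:

```python
def keyword_filter(retrieval_results, query_terms):
    """Keep only retrieved chunks that contain at least one query keyword."""
    terms = {t.lower() for t in query_terms}
    return [
        r for r in retrieval_results
        if terms & set(r['content']['text'].lower().split())
    ]
```

This approximates the keyword half of hybrid search without a vector store migration, at the cost of operating only on the candidates vector search already returned.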

Note: This is a limitation of S3 Vectors, not a bug. Evaluate if hybrid search is necessary for your use case before considering migration.

Effort: Very High (vector store migration) Impact: Improved retrieval for keyword-heavy queries


6. Implement Direct Ingestion for Real-Time Updates

Component: Sync Lambdas (Fathom, HelpScout) Documentation:

Current State: Documents are uploaded to S3, then a scheduled ingestion job syncs them to the Knowledge Base. This creates a delay between document creation and availability for queries.

Issue:

  • Latency: Documents may take minutes to become searchable
  • Scheduling: Relies on EventBridge scheduled sync every 5 minutes

Recommended Change: Use the IngestKnowledgeBaseDocuments API for immediate ingestion:

import boto3

bedrock_agent = boto3.client('bedrock-agent')

response = bedrock_agent.ingest_knowledge_base_documents(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    documents=[
        {
            'content': {
                's3': {
                    's3Location': {
                        'uri': f's3://{bucket}/{key}'
                    }
                },
                'dataSourceType': 'S3'
            },
            'metadata': {
                's3Location': {
                    'uri': f's3://{bucket}/{key}.metadata.json'
                },
                'type': 'S3_LOCATION'
            }
        }
    ]
)

Benefits:

  • Immediate document availability (seconds vs minutes)
  • Up to 25 documents per API call
  • Can be called directly from webhook handlers

Caveats:

  • Documents ingested directly are NOT added to S3 (add to S3 separately to prevent removal on next full sync)
  • Do NOT call simultaneously with StartIngestionJob

Effort: Medium (update webhook handlers) Impact: Near real-time document availability


7. Consider Semantic Chunking for Conversational Content

Component: Bedrock Data Source Documentation:

Current State: Using FIXED_SIZE chunking with 512 tokens and 20% overlap.

Consideration: For conversational content (meeting transcripts, support conversations), SEMANTIC chunking may provide better results:

How Semantic Chunking Works:

  • Uses NLP to identify meaning boundaries
  • Chunks based on semantic content rather than token count
  • Parameters:
    • maxTokens: Maximum tokens per chunk
    • bufferSize: Surrounding sentences for context (e.g., 1 = previous + current + next)
    • breakpointPercentileThreshold: Dissimilarity threshold for splits
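The parameters above map onto the chunking configuration passed at data-source creation. A sketch of the payload shape, with illustrative (not tuned) values:

```python
# SEMANTIC chunking configuration as it would appear in a Bedrock
# CreateDataSource request (values are illustrative, not recommendations).
semantic_chunking = {
    'chunkingConfiguration': {
        'chunkingStrategy': 'SEMANTIC',
        'semanticChunkingConfiguration': {
            'maxTokens': 512,                     # hard cap per chunk
            'bufferSize': 1,                      # previous + current + next sentence
            'breakpointPercentileThreshold': 95,  # split at high-dissimilarity points
        },
    },
}
```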

Trade-offs:

| Factor | FIXED_SIZE | SEMANTIC |
|---|---|---|
| Predictability | High | Variable |
| Cost | No extra cost | Foundation model costs |
| Conversation preservation | May split mid-conversation | Better at finding natural breaks |
| Metadata overhead | Predictable | Variable (may affect S3 Vectors limits) |

Recommendation: Evaluate with a subset of documents before changing. The current FIXED_SIZE is safe for S3 Vectors, while SEMANTIC adds cost and unpredictable chunk sizes.

IMPORTANT: Chunking strategy cannot be changed after data source creation. Would require recreating the data source.

Effort: High (requires data source recreation and re-ingestion) Impact: Potentially improved retrieval for conversational content


Bedrock Knowledge Base Configuration Constraints

Based on AWS documentation, these configurations are IMMUTABLE after creation:

Cannot Change After Knowledge Base Creation

| Configuration | Location | Impact |
|---|---|---|
| Vector store type | storage_configuration.type | Must recreate entire KB |
| Embedding model | embedding_model_arn | Must recreate entire KB |
| Embedding dimensions | embedding_model_configuration.dimensions | Must recreate entire KB |
| Supplemental data storage | supplementalDataStorageConfiguration | Cannot add multimodal support later |

Cannot Change After Data Source Creation

| Configuration | Location | Impact |
|---|---|---|
| Chunking strategy | chunking_configuration.chunking_strategy | Must recreate data source |
| Chunking parameters | max_tokens, overlap_percentage | Must recreate data source |
| Parsing strategy | parsing_configuration.parsing_strategy | Must recreate data source |
| Parsing model | bedrock_foundation_model_configuration.model_arn | Must recreate data source |

Can Be Updated

| Configuration | How |
|---|---|
| Knowledge base name/description | UpdateKnowledgeBase API |
| Data source files | Add/modify files in S3, then sync |
| Data deletion policy | UpdateDataSource API |
| KMS encryption key | UpdateDataSource API |
| IAM role | UpdateDataSource API (with new role having proper permissions) |

Implication: Plan chunking and parsing strategies carefully before creating data sources. Changing them requires:

  1. Creating new data source with desired configuration
  2. Re-syncing all documents
  3. Deleting old data source
  4. Testing thoroughly before production use

8. Enable CloudTrail Data Event Logging for S3 Vectors

Component: CloudTrail Configuration Documentation:

Current State: CloudTrail management events are logged by default, but data events (QueryVectors, PutVectors, GetVectors, DeleteVectors, ListVectors) are NOT logged.

Issue:

  • Cannot audit who queried what vectors
  • No visibility into vector operation patterns
  • Limited security and compliance posture

Recommended Change: Enable CloudTrail data event logging for S3 Vectors:

resource "aws_cloudtrail" "s3_vectors_data_events" {
  name           = "${var.resource_prefix}-s3vectors-trail"
  s3_bucket_name = aws_s3_bucket.cloudtrail_logs.id

  event_selector {
    read_write_type           = "All"
    include_management_events = false

    data_resource {
      type   = "AWS::S3Vectors::Index"
      values = [aws_s3vectors_index.main.arn]
    }
  }
}

Logged Operations:

  • PutVectors - Vector insertions
  • GetVectors - Vector retrievals
  • DeleteVectors - Vector deletions
  • ListVectors - Vector listings
  • QueryVectors - Similarity queries

Effort: Low (Terraform configuration) Impact: Security and compliance visibility


9. Configure VPC Endpoint for S3 Vectors (PrivateLink)

Component: Network Security Documentation:

Current State: S3 Vectors accessed via public endpoints. Traffic traverses the internet.

Issue:

  • For compliance-sensitive workloads, traffic should stay within AWS network
  • Potential exposure to internet-based threats
  • Some compliance frameworks require private connectivity

Recommended Change: Create VPC interface endpoint for S3 Vectors:

resource "aws_vpc_endpoint" "s3_vectors" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.s3vectors"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.s3_vectors_endpoint.id]
  private_dns_enabled = true

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = "*"
      Action    = ["s3vectors:*"]
      Resource  = [
        aws_s3vectors_vector_bucket.main.vector_bucket_arn,
        "${aws_s3vectors_vector_bucket.main.vector_bucket_arn}/*"
      ]
    }]
  })
}

Benefits:

  • Traffic stays within AWS network
  • Enhanced security posture
  • Meets compliance requirements for private connectivity

Effort: Medium (requires VPC configuration) Impact: Security and compliance


10. Consider Per-Client Vector Indexes for Strong Isolation

Component: S3 Vectors Architecture Documentation:

Current State: Single vector index with client-level metadata filtering for multi-tenant isolation:

filter={"client": "acme-corp"}

Note: Project metadata is stored for display but not used for filtering, allowing all documents from a client’s projects to contribute to RAG context.

Alternative Approach: AWS recommends using one vector index per tenant for:

  • Stronger data isolation
  • Per-tenant IAM policies
  • Independent scaling per tenant
  • Clearer cost attribution

Trade-offs:

| Approach | Current (Shared Index) | Per-Tenant Index |
|---|---|---|
| Isolation | Metadata filtering | Physical separation |
| IAM control | Limited | Per-index policies |
| Scaling | Shared limits | Independent limits |
| Cost tracking | Difficult | Per-index attribution |
| Complexity | Lower | Higher (index management) |
| Cross-tenant queries | Possible (with filter) | Requires multi-index query |

When to Consider:

  • Regulatory requirements for physical data separation
  • Need for per-client IAM policies
  • Clients with vastly different usage patterns
  • Strict cost attribution requirements

Effort: High (architecture change) Impact: Stronger tenant isolation


11. Export to OpenSearch for Hybrid Search (Alternative to Migration)

Component: Search Architecture Documentation:

Current State: S3 Vectors supports semantic search only. No hybrid (keyword + semantic) search.

Alternative to Full Migration: Instead of migrating away from S3 Vectors entirely, you can export to OpenSearch Serverless for hybrid search while keeping S3 Vectors for cost-effective storage:

Two Integration Options:

  1. Export to OpenSearch Serverless:
    • Point-in-time copy of vectors
    • Full hybrid search, aggregations, faceted search
    • Dual storage costs
    • Best for: High-throughput, low-latency requirements
  2. OpenSearch with S3 Vectors Engine:
    • S3 Vectors as storage backend for OpenSearch
    • OpenSearch query API with S3 Vectors storage costs
    • Best for: Lower throughput, cost-sensitive workloads

When to Consider:

  • Users need keyword + semantic search
  • Complex filtering or aggregations required
  • Faceted search UI needed

Note: This is an alternative to full vector store migration, not a replacement.

Effort: Medium (OpenSearch configuration) Impact: Hybrid search capability without full migration


12. Include Metadata in Embeddings for Improved Semantic Search

Component: Document Ingestion (Sync Workers) Priority: LOW (marginal benefit) Documentation:

Current State: The build_bedrock_metadata_json() function creates sidecar metadata files but doesn’t set the includeForEmbedding option:

# Current implementation (validation.py)
metadata_attributes[key] = {
    "value": {
        "type": "STRING",
        "stringValue": str_value
    }
}

Issue:

  • Metadata is only used for filtering, not for semantic search
  • In theory, queries mentioning client or project names might have slightly better semantic matches

Why Marginal Benefit: The current query understanding system already extracts client names from queries and applies them as filters. For example, “What did Valley Equipment discuss?” is already processed to extract “Valley Equipment” and filter documents to that client. Adding includeForEmbedding would only provide a small additional signal in vector similarity scoring, since the filtering already ensures only relevant client documents are searched.

Recommended Change: Add includeForEmbedding: true for key metadata fields:

import json
from typing import Any, Dict, Optional, Set

def build_bedrock_metadata_json(attributes: Dict[str, Any], embedding_fields: Optional[Set[str]] = None) -> str:
    """
    Build metadata with optional embedding inclusion.

    Args:
        attributes: Metadata key-value pairs
        embedding_fields: Set of field names to include in embeddings (default: {'client', 'project'})
    """
    embedding_fields = embedding_fields or {'client', 'project'}
    metadata_attributes = {}

    for key, value in attributes.items():
        include_in_embedding = key in embedding_fields

        if isinstance(value, str):
            metadata_attributes[key] = {
                "value": {"type": "STRING", "stringValue": value},
                "includeForEmbedding": include_in_embedding
            }

    return json.dumps({"metadataAttributes": metadata_attributes})

Expected Metadata Output:

{
  "metadataAttributes": {
    "client": {
      "value": {"type": "STRING", "stringValue": "Valley Equipment"},
      "includeForEmbedding": true
    },
    "project": {
      "value": {"type": "STRING", "stringValue": "Equipment Sales"},
      "includeForEmbedding": true
    },
    "source": {
      "value": {"type": "STRING", "stringValue": "fathom"},
      "includeForEmbedding": false
    }
  }
}

Benefits:

  • Slight improvement in semantic similarity scoring for queries mentioning client/project names
  • No additional API calls or infrastructure changes
  • Backward compatible (existing documents continue to work)

Trade-offs:

  • Slightly larger embedding vectors
  • Requires re-ingestion of existing documents
  • Benefit is marginal since query understanding already handles client extraction

Effort: Low (code change) + Medium (re-ingestion) Impact: Marginal improvement - existing query understanding and filtering handles the main use case


13. Binary Embeddings for Cost Optimization

Component: Knowledge Base Configuration Priority: MEDIUM Documentation:

Current State: Using FLOAT32 embeddings (default, highest precision):

# terraform/modules/bedrock/main.tf
embedding_model_configuration {
  bedrock_embedding_model_configuration {
    dimensions          = 1024
    embedding_data_type = "FLOAT32"  # Current setting
  }
}

Consideration: Titan Text Embeddings V2 supports BINARY embedding type for cost optimization:

embedding_model_configuration {
  bedrock_embedding_model_configuration {
    dimensions          = 1024
    embedding_data_type = "BINARY"  # 32x smaller storage
  }
}

Trade-offs:

| Factor | FLOAT32 | BINARY |
|---|---|---|
| Precision | Highest | Lower (~10% accuracy reduction) |
| Storage | 4 bytes/dimension | 1 bit/dimension |
| Cost | Higher | ~32x lower storage cost |
| Query speed | Baseline | Potentially faster |
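The "32x smaller storage" figure follows directly from the per-dimension sizes; at the current 1,024 dimensions:

```python
# Per-vector storage at 1,024 dimensions (current Titan V2 setting)
dims = 1024
float32_bytes = dims * 4   # FLOAT32: 4 bytes per dimension
binary_bytes = dims // 8   # BINARY: 1 bit per dimension
ratio = float32_bytes // binary_bytes
print(float32_bytes, binary_bytes, ratio)  # 4096 128 32
```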

When to Consider:

  • Large document corpus (10,000+ documents)
  • Cost-sensitive deployments
  • Acceptable accuracy trade-off

IMPORTANT: Embedding data type cannot be changed after Knowledge Base creation. Would require recreating the entire KB.

Effort: Very High (requires KB recreation and re-ingestion) Impact: Significant cost reduction for large deployments


14. Implicit Filter Configuration

Component: Knowledge Base Configuration Priority: LOW Documentation:

Current State: Client filtering is implemented in application code:

// lambda/node/chat/index.js
if (clientFilter) {
  retrievalConfig.vectorSearchConfiguration.filter = {
    equals: { key: 'client', value: clientFilter }
  };
}

Alternative Approach: Bedrock supports implicit filter configuration at the KB level:

import boto3

bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')

response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId=knowledge_base_id,
    retrievalQuery={'text': query},
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'implicitFilterConfiguration': {
                'metadataAttributes': [
                    {
                        'key': 'client',
                        'type': 'STRING',
                        'description': 'The client organization name'
                    }
                ],
                'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-...'
            }
        }
    }
)

How It Works:

  • Bedrock uses a foundation model to automatically extract filter values from the query
  • Example: “What equipment does Valley have?” → filter: {client: "Valley Equipment"}
  • Similar to current query understanding but built into Bedrock

Trade-offs:

| Factor | Current (Custom) | Implicit Filter |
|---|---|---|
| Control | Full | Limited to KB capabilities |
| Clarification UI | Supported | Not supported |
| Known entities list | Custom DynamoDB | No custom list |
| Cost | Claude Haiku call | Bedrock implicit call |

Recommendation: Current custom implementation is more flexible (supports clarification prompts, custom entity validation). Consider implicit filters only for simpler use cases.

Effort: Medium (refactor retrieval code) Impact: Simplified code but less flexibility


15. Guardrails Integration for Content Filtering

Component: Knowledge Base Retrieval Priority: LOW Documentation:

Current State: No content filtering or guardrails applied to retrieved documents or generated responses.

Consideration: Bedrock Guardrails can filter:

  • PII (personally identifiable information)
  • Harmful content
  • Custom denied topics
  • Sensitive information patterns

Implementation (if using RetrieveAndGenerate):

import boto3

bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')

response = bedrock_agent_runtime.retrieve_and_generate(
    input={'text': query},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': knowledge_base_id,
            'modelArn': model_arn,
            'guardrailConfiguration': {
                'guardrailId': 'your-guardrail-id',
                'guardrailVersion': '1'
            }
        }
    }
)

When to Consider:

  • Compliance requirements (HIPAA, PCI-DSS)
  • User-facing applications requiring content moderation
  • Documents containing sensitive information

Prerequisites:

  • Create Guardrail in Bedrock console
  • Define content policies
  • Migrate to RetrieveAndGenerate API (or apply guardrails at generation step)

Effort: Medium (requires Guardrail setup + API changes) Impact: Enhanced compliance and content safety


16. Custom Transformation Lambda for Meeting Transcripts

Component: Data Source Configuration Priority: LOW Documentation:

Current State: Meeting transcripts from Fathom are chunked using standard FIXED_SIZE strategy (512 tokens, 20% overlap).

Issue:

  • Transcripts may be split mid-conversation
  • Speaker context may be lost across chunks
  • Action items and decisions may span multiple chunks

Recommended Change: Add a Lambda function for post-chunking transformation:

# terraform/modules/bedrock/main.tf
vector_ingestion_configuration {
  chunking_configuration {
    chunking_strategy = "FIXED_SIZE"
    fixed_size_chunking_configuration {
      max_tokens         = 512
      overlap_percentage = 20
    }
  }

  custom_transformation_configuration {
    transformations {
      step_to_apply = "POST_CHUNKING"
      transformation_function {
        transformation_lambda_configuration {
          lambda_arn = aws_lambda_function.chunk_transformer.arn
        }
      }
    }
    intermediate_storage {
      s3_location {
        uri = "s3://${var.bucket_name}/transform-output/"
      }
    }
  }
}

Lambda Function Purpose:

def transform_chunks(event, context):
    """
    Post-process chunks to improve meeting transcript quality.

    - Add speaker context to each chunk
    - Ensure action items are not split
    - Add meeting metadata summary to each chunk
    - Identify and tag key decisions
    """
    chunks = event['chunks']
    for chunk in chunks:
        # Add speaker context (helper defined elsewhere in the Lambda)
        chunk['content'] = add_speaker_context(chunk['content'])

        # Add chunk-level metadata
        chunk['metadata']['has_action_items'] = detect_action_items(chunk['content'])

    return {'chunks': chunks}

Trade-offs:

  • Additional Lambda execution cost
  • Increased ingestion time
  • More complex debugging

When to Consider:

  • Meeting transcripts are primary content source
  • Users frequently search for action items or decisions
  • Current retrieval quality for conversations is poor

IMPORTANT: Custom transformation cannot be added after data source creation. Would require recreating the data source.

Effort: High (Lambda development + data source recreation) Impact: Improved retrieval for conversational content


S3 Vectors Specific Limitations

Based on a review of the S3 Vectors documentation, these are the key limits. See the official documentation for details.

Storage and Structural Limits

| Limit | Value | Current Usage |
|---|---|---|
| Vector buckets per region | 10,000 | 1 |
| Vector indexes per bucket | 10,000 | 1 |
| Vectors per index | 2 billion | TBD |
| Dimension range | 1-4,096 | 1,024 (Titan v2) |

Metadata Limits

| Limit | Value | Notes |
|---|---|---|
| Total metadata per vector | 40 KB | Sufficient |
| Filterable metadata per vector | 2 KB | Monitor size |
| Total metadata keys per vector | 50 | Track count |
| Non-filterable keys per index | 10 | Set at creation |
| Bedrock KB metadata limit | 1 KB custom, 35 keys | More restrictive |
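Because the 2 KB filterable-metadata limit is the tightest constraint here, sync workers may want a pre-flight size check before writing sidecar files. A minimal sketch (helper names hypothetical, not from the codebase):

```python
import json

FILTERABLE_LIMIT_BYTES = 2 * 1024  # S3 Vectors filterable-metadata limit


def filterable_metadata_size(attributes):
    """Approximate serialized size of filterable metadata, in bytes."""
    return len(json.dumps(attributes).encode('utf-8'))


def within_filterable_limit(attributes):
    """True if the attributes fit under the 2 KB filterable limit."""
    return filterable_metadata_size(attributes) <= FILTERABLE_LIMIT_BYTES
```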

Rate Limits

| Operation | Limit | Error Code |
|---|---|---|
| PutVectors + DeleteVectors requests | 1,000/second/index | 429 TooManyRequestsException |
| Vectors inserted/deleted | 2,500/second/index | 429 TooManyRequestsException |
| QueryVectors/GetVectors/ListVectors | Hundreds/second/index | 429 TooManyRequestsException |
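Callers that approach these rates will see 429s, and jittered exponential backoff is the usual mitigation. A generic sketch (in practice you would catch botocore's ClientError for TooManyRequestsException rather than bare Exception):

```python
import random
import time


def with_backoff(call, max_attempts=5, base_delay=0.5):
    """Retry a throttled callable with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            # On the last attempt, surface the error to the caller
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```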

API Limits

| Operation | Limit |
|---|---|
| PutVectors batch size | 500 vectors/call |
| DeleteVectors batch size | 500 vectors/call |
| GetVectors batch size | 100 vectors/call |
| QueryVectors TopK | 100 results/request |
| Request payload | 20 MiB |
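The 500-vector batch caps mean bulk writes must be chunked client-side before each PutVectors or DeleteVectors call. A small helper (hypothetical, not in the codebase):

```python
def batched(items, batch_size=500):
    """Yield successive slices no larger than the PutVectors batch limit."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```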

Performance Characteristics

| Metric | Value |
|---|---|
| Cold query latency | Sub-second |
| Warm query latency | ~100ms |
| Average recall | 90%+ for most datasets |
| Write consistency | Strongly consistent (immediate access) |

Known Constraints

| Constraint | Impact |
|---|---|
| Hierarchical chunking | NOT recommended (metadata size limits) |
| Hybrid search | NOT supported (semantic only) |
| Non-filterable keys | Immutable after index creation |
| Encryption type | Immutable after bucket creation |
| Vector dimensions | Immutable after index creation |
| Distance metric | Immutable after index creation |
| LLM parsing | DISABLED (exceeds 2 KB filterable metadata limit for large docs) |

Current Configuration (as of 2025-12-31)

| Setting | Value | Notes |
|---|---|---|
| Chunking strategy | FIXED_SIZE | 512 tokens, 20% overlap |
| LLM parsing | Disabled | Sidecar metadata files used instead |
| Non-filterable keys | AMAZON_BEDROCK_TEXT, AMAZON_BEDROCK_METADATA | Required for 100% ingestion success |
| Data deletion policy | DELETE | Vectors auto-removed when S3 docs deleted |
| Filterable metadata | source, client, category | For multi-tenant isolation (project stored but not filtered) |

Architecture Decision Record References

Related ADRs for context:


Testing and CI/CD Improvements

The following items track missing test coverage and CI/CD gaps.


24. Add Unit Tests for Dashboard Lambda (Node.js)

Component: Dashboard Lambda Files: lambda/node/dashboard/index.js Priority: MEDIUM

Current State: The Dashboard Lambda has no unit tests. It queries CloudWatch metrics and logs but has no test coverage.

Recommended Changes:

  1. Create lambda/node/dashboard/index.test.js
  2. Mock CloudWatch and CloudWatch Logs clients
  3. Test metric aggregation logic
  4. Test error handling for missing/invalid data
  5. Add to CI test matrix

Effort: Medium (2-3 days) Impact: Improved reliability and maintainability


25. Add Unit Tests for Chat Lambda (Node.js)

Component: Chat Lambda Files: lambda/node/chat/index.js Priority: MEDIUM

Current State: The Chat Lambda has Jest configured in package.json but no actual test files. It handles streaming responses, Bedrock KB queries, and conversation management without test coverage.

Recommended Changes:

  1. Create lambda/node/chat/index.test.js
  2. Mock Bedrock Agent Runtime, DynamoDB, S3 clients
  3. Test streaming response handling
  4. Test conversation history management
  5. Test error handling and timeout scenarios

Effort: Medium-High (3-5 days) Impact: Critical - this is the main user-facing Lambda


26. Add Ingest Lambda to Snyk Security Scan Matrix

Component: CI/CD Workflow Files: .github/workflows/test.yml Priority: HIGH

Current State: The ingest Lambda is in the unit test matrix but missing from the Snyk Python security scan matrix.

Recommended Changes: Add ingest to the Snyk Python matrix in .github/workflows/test.yml:

snyk-python:
  strategy:
    matrix:
      function:
        - classification
        - ingest  # ADD THIS
        - webhooks/fathom
        # ... rest of functions

Effort: Low (5 minutes) Impact: Security coverage for ingest function dependencies


Review Schedule

This document should be reviewed:

  • After each major feature implementation
  • Quarterly for prioritization updates
  • When AWS announces new S3 Vectors or Bedrock features

Last updated: 2026-01-17 (Added Testing and CI/CD Improvements section)