RAG System Changelog

Chronological record of implemented improvements, optimizations, and fixes for the NorthBuilt RAG system. For pending improvements and backlog items, see RAG Improvements.

Table of Contents

  1. 2026
    1. 2026-01-11: GET /chat/{id} Empty Response Fix
    2. 2026-01-10: Source Citation Improvements
    3. 2026-01-08: Streaming Response Support
    4. 2026-01-08: Bedrock Knowledge Base Documentation Audit
    5. 2026-01-07: S3 Vectors Documentation Audit
  2. 2025
    1. 2025-12-31: S3 Vectors Metadata Fix (Critical)
    2. 2025-12-29: RAG Retrieval Quality Improvements
    3. 2025-12-29: S3 Vectors Optimization

2026

2026-01-11: GET /chat/{id} Empty Response Fix

Fixed a critical bug where the GET /chat/{id} endpoint returned empty responses even though the Lambda executed successfully.

Root Cause:

The Terraform deployment triggers for aws_api_gateway_deployment.streaming only included resource IDs, not the full integration objects. When response_transfer_mode = "STREAM" was added to the GET integration on 2026-01-10, Terraform didn’t detect this as a change requiring redeployment because the integration ID remained the same.

This caused the GET endpoint to use responseTransferMode: "BUFFERED" (the default) instead of "STREAM", resulting in empty responses. API Gateway access logs confirmed this discrepancy.

Fix:

Updated terraform/modules/api-gateway-rest/main.tf to include full integration objects in the deployment triggers instead of just their IDs:

resource "aws_api_gateway_deployment" "streaming" {
  triggers = {
    redeployment = sha1(jsonencode([
      aws_api_gateway_integration.chat_stream_lambda,      # Full object, not .id
      aws_api_gateway_integration.chat_stream_get_lambda,  # Full object, not .id
      # ... other resources
    ]))
  }
}

Debugging Approach:

  1. Invoked Lambda directly via AWS CLI - returned correct data
  2. Checked API Gateway access logs: aws logs tail /aws/apigateway/nb-rag-sys-rest
  3. Found GET requests showed "responseTransferMode":"BUFFERED" while POST showed "STREAM"
  4. Checked Terraform state - response_transfer_mode = "STREAM" was configured but deployment wasn’t triggered

Files Modified:

| File | Change |
|------|--------|
| terraform/modules/api-gateway-rest/main.tf | Changed deployment triggers to include full integration objects |

Lesson Learned:

When API Gateway integration settings are changed, ensure the deployment triggers reference the full integration resource (not just ID) to detect setting changes like response_transfer_mode.


2026-01-10: Source Citation Improvements

Enhanced source citations with secure document links and improved snippet formatting.

New Features:

  1. Pre-signed S3 URLs for Source Documents
    • Each source now includes a “View Document” link
    • Generates secure, time-limited URLs (1-hour expiry) for direct document access
    • Uses AWS S3 pre-signed URL mechanism via @aws-sdk/s3-request-presigner
    • Added S3 GetObject permission to chat Lambda IAM role
  2. Markdown Snippet Rendering
    • Source snippets are now rendered as Markdown using ReactMarkdown with remark-gfm
    • Supports headers, bold/italic, lists, code blocks, and tables
    • Improves readability of structured content from meeting notes and documentation
  3. Intelligent Markdown Preprocessing
    • Added preprocessMarkdownSnippet() utility to restore logical line breaks
    • Bedrock KB strips newlines during indexing for vector similarity optimization
    • Preprocessing detects headers, bold labels, and list items to insert line breaks
    • Ensures proper rendering of structured Markdown content
  4. Intelligent Snippet Truncation
    • Increased snippet size from 300 to 500 characters for better context
    • Truncates at sentence boundaries when possible (looking for ., ?, or !)
    • Falls back to character limit with ellipsis if no sentence boundary found
  5. URL Regeneration on History Retrieval
    • Pre-signed URLs expire after 1 hour, so viewing old conversations showed broken links
    • GET /chat/{id} now regenerates fresh pre-signed URLs for all sources
    • Uses regenerateSourceUrls() helper with Promise.all for parallel URL generation
    • Fixed API Gateway integration to use streaming invocation endpoint with response_transfer_mode = "STREAM" (required for InvokeWithResponseStream)
  6. URL Validation for Security
    • Added isValidUrl() utility function
    • Only allows http: and https: protocols
    • Blocks dangerous protocols like javascript:, file:, data:
    • Validates URLs before rendering links in the UI
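
The truncation and validation helpers above can be sketched as follows. The function names (createSnippet, isValidUrl) come from this changelog, but the bodies are illustrative, not the shipped code in lambda/node/chat/index.js and web/src/lib/utils.ts:

```javascript
// Sketch of the snippet helpers described above; bodies are illustrative.
const SNIPPET_LIMIT = 500;

function createSnippet(text, limit = SNIPPET_LIMIT) {
  if (text.length <= limit) return text;
  const window = text.slice(0, limit);
  // Prefer cutting at the last sentence boundary inside the window.
  const boundary = Math.max(
    window.lastIndexOf('. '),
    window.lastIndexOf('? '),
    window.lastIndexOf('! ')
  );
  if (boundary > 0) return window.slice(0, boundary + 1);
  // No sentence boundary found: hard-truncate and add an ellipsis.
  return window.trimEnd() + '...';
}

function isValidUrl(url) {
  // Allow only http(s); rejects javascript:, file:, data:, and unparseable input.
  try {
    const protocol = new URL(url).protocol;
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    return false;
  }
}
```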

Files Modified:

| File | Change |
|------|--------|
| web/src/types/index.ts | Added document_url field to Source interface |
| web/src/components/Chat/ChatMessage.tsx | Markdown rendering for snippets, View Document link |
| web/src/lib/utils.ts | Added preprocessMarkdownSnippet() and isValidUrl() |
| web/src/lib/utils.test.ts | 22 new tests for preprocessing and URL validation |
| web/src/components/Chat/ChatMessage.test.tsx | 4 new tests for document_url feature |
| lambda/node/chat/index.js | Pre-signed URL generation, intelligent snippet truncation, URL regeneration |
| lambda/node/chat/__tests__/helpers.test.js | 41 tests for createSnippet, parseS3Uri, regenerateSourceUrls |
| lambda/node/chat/package.json | Added S3 SDK dependencies |
| terraform/modules/lambda/main.tf | Added S3 GetObject IAM permission |
| terraform/modules/api-gateway-rest/main.tf | Fixed GET /chat/{id} to use streaming invocation endpoint |
| docs/reference/api.md | Source Object Schema, history response documentation |
| docs/reference/user-guide.md | Updated source citation documentation |

Removed Dead Code:

  • Deleted orphaned SourceCard.tsx component (unused, ChatMessage handles sources inline)
  • Deleted corresponding SourceCard.test.tsx test file

2026-01-08: Streaming Response Support

Implemented real-time streaming responses using Lambda Response Streaming and Server-Sent Events (SSE). This was a significant architectural change from the batch-response Python Lambda to a streaming Node.js Lambda.

Architecture:

| Component | Technology | Purpose |
|-----------|------------|---------|
| Lambda runtime | Node.js 22 | Native streaming support via awslambda.streamifyResponse() |
| API Gateway | REST API (not HTTP API) | Response streaming URIs for SSE |
| Protocol | Server-Sent Events (SSE) | Real-time token delivery |
| Frontend | React EventSource API | Progressive response rendering |

Implementation Details:

  1. Node.js Chat Lambda (lambda/node/chat/):
    • Uses @aws-sdk/client-bedrock-agent-runtime for Knowledge Base retrieval
    • Uses @aws-sdk/client-bedrock-runtime with invokeModelWithResponseStream for streaming
    • Implements conversation history via DynamoDB
    • Supports query understanding and clarification prompts
    • GET endpoint for history retrieval (GET /chat/{id})
  2. SSE Event Types:

    event: sources
    data: {"sources": [...]}
    
    event: token
    data: {"text": "partial response..."}
    
    event: done
    data: {"done": true, "message_id": "..."}
    
    event: error
    data: {"error": "message"}
    
  3. Clarification Handling:
    • Pre-flight check before streaming begins
    • Returns JSON (not SSE) when clarification needed
    • Frontend detects Content-Type: application/json vs text/event-stream
  4. Terraform Configuration:
    • REST API Gateway with Lambda proxy integration
    • Response streaming URI: arn:aws:apigateway:{region}:lambda:path/.../invocations/response-streaming
    • Cognito authorizer for authentication
    • CORS configuration for web access
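
A minimal parser for the SSE frames listed in step 2 could look like this. This is a sketch only; the shipped client logic in web/src/lib/api.ts may differ:

```javascript
// Parse one SSE frame (the text between blank-line delimiters) into
// { event, data }. Illustrative sketch, not the shipped client code.
function parseSseEvent(frame) {
  let event = 'message'; // SSE default when no "event:" line is present
  const dataLines = [];
  for (const line of frame.split('\n')) {
    if (line.startsWith('event:')) event = line.slice(6).trim();
    else if (line.startsWith('data:')) dataLines.push(line.slice(5).trim());
  }
  // Multi-line data fields are joined with newlines per the SSE spec.
  const raw = dataLines.join('\n');
  return { event, data: raw ? JSON.parse(raw) : null };
}
```

A client would then dispatch on event ('sources', 'token', 'done', 'error'), appending data.text to the rendered message for token events.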

Files Created/Modified:

| File | Change |
|------|--------|
| lambda/node/chat/index.js | Main streaming handler |
| lambda/node/chat/lib/dynamodb.js | Conversation history operations |
| lambda/node/chat/lib/entities.js | Entity loading from DynamoDB |
| lambda/node/chat/lib/query-understanding.js | Client extraction with LLM |
| lambda/node/chat/lib/clarification.js | Clarification prompt building |
| lambda/node/chat/lib/logger.js | Structured logging with redaction |
| terraform/modules/api-gateway-rest/ | New REST API module |
| terraform/modules/lambda/main.tf | Chat Lambda configuration |
| web/src/lib/api.ts | SSE streaming client |
| web/src/hooks/useChat.ts | Streaming state management |

Testing:

  • 150+ Jest unit tests for Node.js Lambda (npm test in lambda/node/chat/)
  • Integration tests in tests/test_chat_integration.py
  • Snyk security scanning in CI/CD pipeline

Performance Improvement:

  • Time to first token: < 2 seconds (previously 3-5 seconds for full response)
  • Users see response progressively instead of waiting for completion
  • Streaming enabled by default in production

Related: ADR-011: Lambda Response Streaming


2026-01-08: Bedrock Knowledge Base Documentation Audit

Comprehensive audit of all 36 Amazon Bedrock Knowledge Base documentation pages against current implementation. This audit complements the S3 Vectors audit from 2026-01-07.

Documentation Reviewed:

  • Knowledge Base overview, architecture, and how it works
  • Data ingestion process and retrieval mechanisms
  • Chunking strategies (NONE, FIXED_SIZE, HIERARCHICAL, SEMANTIC)
  • Advanced parsing options (default, Bedrock Data Automation, Foundation Model)
  • Custom transformation Lambda functions
  • Metadata configuration and filtering
  • S3 data source connector configuration
  • Direct ingestion API (IngestKnowledgeBaseDocuments)
  • Security configuration and deployment best practices
  • Supported models, regions, and vector stores

Confirmed Correct Implementation:

| Configuration | Current Value | Documentation Alignment |
|---------------|---------------|-------------------------|
| Storage type | S3 Vectors | Supported, cost-effective option |
| Embedding model | Titan Text Embeddings V2 (1024 dim) | Correct for text-only content |
| Embedding data type | FLOAT32 | Standard precision (BINARY available for cost savings) |
| Chunking strategy | FIXED_SIZE (512 tokens, 20% overlap) | Recommended for S3 Vectors |
| Parsing strategy | Disabled (sidecar metadata) | Correct - avoids 2KB metadata limit |
| Data deletion policy | DELETE | Enables automatic vector cleanup |
| Reranking | Cohere Rerank v3.5 (optional) | Correct model for us-east-1 |
| Multi-tenant isolation | Client metadata filtering | Proper implementation with defense-in-depth |
| IAM role configuration | Bedrock service role with S3/S3Vectors permissions | Correct trust policy with conditions |
| Retry configuration | Adaptive retry with 3 attempts | Best practice for Bedrock API reliability |

Key Documentation Insights:

  1. Immutable Configurations: Confirmed that chunking strategy, parsing strategy, embedding model, and vector store type cannot be changed after creation.

  2. Sidecar Metadata Files: Documentation confirms our approach of using .metadata.json sidecar files instead of LLM parsing. Current implementation in build_bedrock_metadata_json() follows the correct format.

  3. Incremental Sync: Documentation confirms that StartIngestionJob only processes added, modified, or deleted documents since last sync.

  4. Direct Ingestion API: Documentation reveals IngestKnowledgeBaseDocuments API for immediate ingestion (up to 25 documents per call).

  5. includeForEmbedding Option: Documentation shows metadata can optionally be included in vector embeddings for improved semantic search.
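
To illustrate the sidecar approach in point 2: a document my-doc.pdf gets a companion file my-doc.pdf.metadata.json. For an S3 data source the file takes roughly this shape (field names mirror the custom fields mentioned elsewhere in this changelog; values are illustrative):

```json
{
  "metadataAttributes": {
    "client": "acme-corp",
    "category": "meeting-notes",
    "source": "s3://docs-bucket/my-doc.pdf"
  }
}
```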


2026-01-07: S3 Vectors Documentation Audit

Comprehensive audit of all 39 S3 Vectors documentation pages against current implementation.

Confirmed Correct Implementation:

| Configuration | Current Value | Status |
|---------------|---------------|--------|
| Non-filterable metadata keys | AMAZON_BEDROCK_TEXT, AMAZON_BEDROCK_METADATA | Correct - avoids 2KB limit |
| Fixed-size chunking | 512 tokens, 20% overlap | Optimal for S3 Vectors |
| LLM parsing | Disabled | Correct - sidecar metadata used |
| Distance metric | Cosine | Appropriate for text embeddings |
| Dimensions | 1024 (Titan v2) | Matches embedding model |
| Data deletion policy | DELETE | Enables automatic cleanup |
| Encryption | SSE-S3 (AES256) | Sufficient for current needs |

Defense-in-Depth Validation:

The chat handler implements post-filtering validation (handler.py:366) that rejects documents where metadata client doesn’t match the filter. This provides a second layer of security beyond vector search filtering.
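
The post-filtering check can be sketched as follows. This is illustrative JavaScript; the actual implementation is Python (lambda/chat/handler.py):

```javascript
// Defense-in-depth: drop any retrieved document whose client metadata does not
// match the expected tenant, even though the vector query already filtered.
// Illustrative sketch of the Python post-filter in lambda/chat/handler.py.
function postFilterByClient(results, expectedClient) {
  return results.filter(
    (r) => r.metadata && r.metadata.client === expectedClient
  );
}
```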

Sidecar Metadata Files:

The build_bedrock_metadata_json() utility (validation.py:360-448) correctly generates Bedrock KB metadata sidecar files with proper type annotations (STRING/NUMBER). Includes validation for Bedrock KB limits (35 keys max, 1KB per value).
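
The limit checks can be sketched like so (illustrative JavaScript; the shipped validator is the Python build_bedrock_metadata_json() in validation.py):

```javascript
// Bedrock KB metadata limits noted above: at most 35 keys, 1KB per value.
// Illustrative sketch; function name and shape are assumptions.
const MAX_KEYS = 35;
const MAX_VALUE_BYTES = 1024;

function findMetadataViolations(attrs) {
  const violations = [];
  const keys = Object.keys(attrs);
  if (keys.length > MAX_KEYS) violations.push(`too many keys (${keys.length})`);
  for (const key of keys) {
    if (Buffer.byteLength(String(attrs[key]), 'utf8') > MAX_VALUE_BYTES) {
      violations.push(`value for "${key}" exceeds 1KB`);
    }
  }
  return violations;
}
```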

Filter Validation:

The validate_filter_for_s3_vectors() function (filters.py:205-238) correctly rejects unsupported operators (startsWith, stringContains).
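
The operator check can be sketched as follows (illustrative JavaScript; the real function is the Python validate_filter_for_s3_vectors() in filters.py):

```javascript
// S3 Vectors does not support these Bedrock KB filter operators; reject early
// rather than failing at query time. For brevity this sketch only inspects
// top-level operators, unlike the recursive Python original.
const UNSUPPORTED_OPS = new Set(['startsWith', 'stringContains']);

function assertFilterSupported(filter) {
  for (const op of Object.keys(filter)) {
    if (UNSUPPORTED_OPS.has(op)) {
      throw new Error(`Operator not supported by S3 Vectors: ${op}`);
    }
  }
}
```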

Recommendations Implemented:

  • Metadata Value Size Validation - Added to build_bedrock_metadata_json() in validation.py
  • Query Latency Metrics - Added RAGMetrics class in metrics.py, integrated into chat handler
  • S3 Vectors Debugging Runbook - Added to docs/operations/runbook.md

2025

2025-12-31: S3 Vectors Metadata Fix (Critical)

Fixed critical ingestion failures caused by S3 Vectors 2KB filterable metadata limit. Previously 97% of documents failed to index.

| Improvement | File | Change |
|-------------|------|--------|
| Non-filterable metadata keys | terraform/main.tf | Added AMAZON_BEDROCK_TEXT and AMAZON_BEDROCK_METADATA as non-filterable |
| LLM parsing disabled | terraform/modules/bedrock/main.tf | Removed parsing configuration (sidecar metadata is sufficient) |
| Data deletion policy | terraform/modules/bedrock/main.tf | Changed from RETAIN to DELETE for automatic vector cleanup |

Root Cause Analysis:

  • S3 Vectors has a 2KB limit on filterable metadata per vector
  • Bedrock KB stores AMAZON_BEDROCK_TEXT (chunk content) and AMAZON_BEDROCK_METADATA (extracted metadata) as filterable by default
  • For larger document chunks, this exceeded the 2KB limit

Solution:

Configure these keys as non-filterable in the S3 Vectors index. Non-filterable metadata counts against the separate 40KB limit, while our custom fields (source, client, category) remain filterable for multi-tenant isolation. Project metadata is stored for display purposes but not used for filtering.

Result: 100% ingestion success rate (594/594 documents indexed)


2025-12-29: RAG Retrieval Quality Improvements

Based on Bedrock Knowledge Base documentation audit, the following HIGH priority improvements were implemented:

| Improvement | File | Change |
|-------------|------|--------|
| Reranking support | lambda/chat/handler.py | Optional Bedrock reranking model (cohere.rerank-v3-5:0) to re-score retrieved documents |
| Adaptive retrieval multiplier | lambda/chat/handler.py | Dynamic multiplier: 2x base, 3x with client filter |
| Query sanitization | lambda/shared/utils/validation.py | Added sanitize_query() function for input safety |
| Query sanitization applied | lambda/chat/handler.py | Sanitizes user queries before RAG retrieval |

Reranking Details:

  • Uses Cohere Rerank 3.5 (cohere.rerank-v3-5:0) to re-score documents against the actual query
  • Disabled by default - requires accepting Cohere AWS Marketplace agreement
  • Enable via ENABLE_RERANKING=true environment variable after accepting agreement
  • Note: Amazon Rerank 1.0 is not available in us-east-1; Cohere is the supported option
  • Significantly improves relevance for ambiguous queries when enabled
  • Configured to rerank to top max_results after initial retrieval
  • See Bedrock Reranking Guide for supported models and regions
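
The final step, keeping the top max_results documents after the reranking model has scored the over-retrieved candidates, can be sketched as follows (illustrative JavaScript; the actual code is Python in lambda/chat/handler.py):

```javascript
// Given relevance scores from the reranking model (e.g. cohere.rerank-v3-5:0),
// keep the highest-scoring maxResults documents. Illustrative sketch only.
function takeTopReranked(docs, scores, maxResults) {
  return docs
    .map((doc, i) => ({ doc, score: scores[i] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, maxResults)
    .map((entry) => entry.doc);
}
```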

Adaptive Retrieval Multiplier:

# Sketch of the logic in lambda/chat/handler.py (variable names illustrative):
# no filters: 2x (standard over-retrieval); with client filter: 3x
multiplier = 3 if client_filter else 2
num_candidates = max_results * multiplier

This ensures sufficient candidates are retrieved even when client metadata filtering reduces the result set. Note: The system uses client-only filtering (not project-level) to allow all documents from a client’s projects to be accessible for richer context.


2025-12-29: S3 Vectors Optimization

The following improvements were implemented based on a comprehensive audit of S3 Vectors documentation and codebase analysis:

| Improvement | File | Change |
|-------------|------|--------|
| Chunk size optimization | terraform/modules/bedrock/main.tf | Increased from 300 to 512 tokens |
| Chunk overlap | terraform/modules/bedrock/main.tf | Increased from 10% to 20% |
| Parsing prompt robustness | terraform/modules/bedrock/main.tf | Added multi-step extraction with fallbacks |
| Bedrock retry logic | lambda/chat/handler.py | Added adaptive retry with 3 attempts |
| Configurable temperature | lambda/chat/handler.py | Added LLM_TEMPERATURE env var |
| Configurable max tokens | lambda/chat/handler.py | Added LLM_MAX_TOKENS env var |

Last updated: 2026-01-11