RAG System Changelog

Chronological record of implemented improvements, optimizations, and fixes for the NorthBuilt RAG system. For pending improvements and backlog items, see RAG Improvements.

Table of Contents

  1. 2026
    1. 2026-01-11: GET /chat/{id} Empty Response Fix
    2. 2026-01-10: Source Citation Improvements
    3. 2026-01-08: Streaming Response Support
    4. 2026-01-08: Bedrock Knowledge Base Documentation Audit
    5. 2026-01-07: S3 Vectors Documentation Audit
  2. 2025
    1. 2025-12-31: S3 Vectors Metadata Fix (Critical)
    2. 2025-12-29: RAG Retrieval Quality Improvements
    3. 2025-12-29: S3 Vectors Optimization

2026

2026-01-11: GET /chat/{id} Empty Response Fix

Fixed a critical bug where the GET /chat/{id} endpoint returned empty responses even though the Lambda executed successfully.

Root Cause:

The Terraform deployment triggers for aws_api_gateway_deployment.streaming only included resource IDs, not the full integration objects. When response_transfer_mode = "STREAM" was added to the GET integration on 2026-01-10, Terraform didn’t detect this as a change requiring redeployment because the integration ID remained the same.

This caused the GET endpoint to use responseTransferMode: "BUFFERED" (the default) instead of "STREAM", resulting in empty responses. API Gateway access logs confirmed this discrepancy.

Fix:

Updated terraform/modules/api-gateway-rest/main.tf to include full integration objects in the deployment triggers instead of just their IDs:

resource "aws_api_gateway_deployment" "streaming" {
  triggers = {
    redeployment = sha1(jsonencode([
      aws_api_gateway_integration.chat_stream_lambda,      # Full object, not .id
      aws_api_gateway_integration.chat_stream_get_lambda,  # Full object, not .id
      # ... other resources
    ]))
  }
}

Debugging Approach:

  1. Invoked Lambda directly via AWS CLI - returned correct data
  2. Checked API Gateway access logs: aws logs tail /aws/apigateway/nb-rag-sys-rest
  3. Found GET requests showed "responseTransferMode":"BUFFERED" while POST showed "STREAM"
  4. Checked Terraform state - response_transfer_mode = "STREAM" was configured but deployment wasn’t triggered

Files Modified:

| File | Change |
|------|--------|
| terraform/modules/api-gateway-rest/main.tf | Changed deployment triggers to include full integration objects |

Lesson Learned:

When API Gateway integration settings are changed, ensure the deployment triggers reference the full integration resource (not just ID) to detect setting changes like response_transfer_mode.


2026-01-10: Source Citation Improvements

Enhanced source citations with secure document links and improved snippet formatting.

New Features:

  1. Pre-signed S3 URLs for Source Documents
    • Each source now includes a “View Document” link
    • Generates secure, time-limited URLs (1-hour expiry) for direct document access
    • Uses AWS S3 pre-signed URL mechanism via @aws-sdk/s3-request-presigner
    • Added S3 GetObject permission to chat Lambda IAM role
  2. Markdown Snippet Rendering
    • Source snippets are now rendered as Markdown using ReactMarkdown with remark-gfm
    • Supports headers, bold/italic, lists, code blocks, and tables
    • Improves readability of structured content from meeting notes and documentation
  3. Intelligent Markdown Preprocessing
    • Added preprocessMarkdownSnippet() utility to restore logical line breaks
    • Bedrock KB strips newlines during indexing for vector similarity optimization
    • Preprocessing detects headers, bold labels, and list items to insert line breaks
    • Ensures proper rendering of structured Markdown content
  4. Intelligent Snippet Truncation
    • Increased snippet size from 300 to 500 characters for better context
    • Truncates at sentence boundaries when possible (looking for ., ?, or !)
    • Falls back to character limit with ellipsis if no sentence boundary found
  5. URL Regeneration on History Retrieval
    • Pre-signed URLs expire after 1 hour, so viewing old conversations showed broken links
    • GET /chat/{id} now regenerates fresh pre-signed URLs for all sources
    • Uses regenerateSourceUrls() helper with Promise.all for parallel URL generation
    • Fixed API Gateway integration to use streaming invocation endpoint with response_transfer_mode = "STREAM" (required for InvokeWithResponseStream)
  6. URL Validation for Security
    • Added isValidUrl() utility function
    • Only allows http: and https: protocols
    • Blocks dangerous protocols like javascript:, file:, data:
    • Validates URLs before rendering links in the UI
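
The truncation and validation helpers above can be sketched as follows. The function names (createSnippet, isValidUrl) come from this changelog, but the bodies are illustrative, not the shipped code in lambda/node/chat/index.js and web/src/lib/utils.ts:

```javascript
// Sketch of the snippet helpers described above; bodies are illustrative.
const SNIPPET_LIMIT = 500;

function createSnippet(text, limit = SNIPPET_LIMIT) {
  if (text.length <= limit) return text;
  const window = text.slice(0, limit);
  // Prefer cutting at the last sentence boundary inside the window.
  const boundary = Math.max(
    window.lastIndexOf('. '),
    window.lastIndexOf('? '),
    window.lastIndexOf('! ')
  );
  if (boundary > 0) return window.slice(0, boundary + 1);
  // No sentence boundary found: hard-truncate and add an ellipsis.
  return window.trimEnd() + '...';
}

function isValidUrl(url) {
  // Allow only http(s); rejects javascript:, file:, data:, and unparseable input.
  try {
    const protocol = new URL(url).protocol;
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    return false;
  }
}
```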

Files Modified:

| File | Change |
|------|--------|
| web/src/types/index.ts | Added document_url field to Source interface |
| web/src/components/Chat/ChatMessage.tsx | Markdown rendering for snippets, View Document link |
| web/src/lib/utils.ts | Added preprocessMarkdownSnippet() and isValidUrl() |
| web/src/lib/utils.test.ts | 22 new tests for preprocessing and URL validation |
| web/src/components/Chat/ChatMessage.test.tsx | 4 new tests for document_url feature |
| lambda/node/chat/index.js | Pre-signed URL generation, intelligent snippet truncation, URL regeneration |
| lambda/node/chat/__tests__/helpers.test.js | 41 tests for createSnippet, parseS3Uri, regenerateSourceUrls |
| lambda/node/chat/package.json | Added S3 SDK dependencies |
| terraform/modules/lambda/main.tf | Added S3 GetObject IAM permission |
| terraform/modules/api-gateway-rest/main.tf | Fixed GET /chat/{id} to use streaming invocation endpoint |
| docs/reference/api.md | Source Object Schema, history response documentation |
| docs/reference/user-guide.md | Updated source citation documentation |

Removed Dead Code:

  • Deleted orphaned SourceCard.tsx component (unused, ChatMessage handles sources inline)
  • Deleted corresponding SourceCard.test.tsx test file

2026-01-08: Streaming Response Support

Implemented real-time streaming responses using Lambda Response Streaming and Server-Sent Events (SSE). This was a significant architectural change from the batch-response Python Lambda to a streaming Node.js Lambda.

Architecture:

| Component | Technology | Purpose |
|-----------|------------|---------|
| Lambda runtime | Node.js 22 | Native streaming support via awslambda.streamifyResponse() |
| API Gateway | REST API (not HTTP API) | Response streaming URIs for SSE |
| Protocol | Server-Sent Events (SSE) | Real-time token delivery |
| Frontend | React EventSource API | Progressive response rendering |

Implementation Details:

  1. Node.js Chat Lambda (lambda/node/chat/):
    • Uses @aws-sdk/client-bedrock-agent-runtime for Knowledge Base retrieval
    • Uses @aws-sdk/client-bedrock-runtime with invokeModelWithResponseStream for streaming
    • Implements conversation history via DynamoDB
    • Supports query understanding and clarification prompts
    • GET endpoint for history retrieval (GET /chat/{id})
  2. SSE Event Types:

    event: sources
    data: {"sources": [...]}
    
    event: token
    data: {"text": "partial response..."}
    
    event: done
    data: {"done": true, "message_id": "..."}
    
    event: error
    data: {"error": "message"}
    
  3. Clarification Handling:
    • Pre-flight check before streaming begins
    • Returns JSON (not SSE) when clarification needed
    • Frontend detects Content-Type: application/json vs text/event-stream
  4. Terraform Configuration:
    • REST API Gateway with Lambda proxy integration
    • Response streaming URI: arn:aws:apigateway:{region}:lambda:path/.../invocations/response-streaming
    • Cognito authorizer for authentication
    • CORS configuration for web access
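
A minimal parser for the SSE frames listed in step 2 could look like this. This is a sketch only; the shipped client logic in web/src/lib/api.ts may differ:

```javascript
// Parse one SSE frame (the text between blank-line delimiters) into
// { event, data }. Illustrative sketch, not the shipped client code.
function parseSseEvent(frame) {
  let event = 'message'; // SSE default when no "event:" line is present
  const dataLines = [];
  for (const line of frame.split('\n')) {
    if (line.startsWith('event:')) event = line.slice(6).trim();
    else if (line.startsWith('data:')) dataLines.push(line.slice(5).trim());
  }
  // Multi-line data fields are joined with newlines per the SSE spec.
  const raw = dataLines.join('\n');
  return { event, data: raw ? JSON.parse(raw) : null };
}
```

A client would then dispatch on event ('sources', 'token', 'done', 'error'), appending data.text to the rendered message for token events.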

Files Created/Modified:

| File | Change |
|------|--------|
| lambda/node/chat/index.js | Main streaming handler |
| lambda/node/chat/lib/dynamodb.js | Conversation history operations |
| lambda/node/chat/lib/entities.js | Entity loading from DynamoDB |
| lambda/node/chat/lib/query-understanding.js | Client extraction with LLM |
| lambda/node/chat/lib/clarification.js | Clarification prompt building |
| lambda/node/chat/lib/logger.js | Structured logging with redaction |
| terraform/modules/api-gateway-rest/ | New REST API module |
| terraform/modules/lambda/main.tf | Chat Lambda configuration |
| web/src/lib/api.ts | SSE streaming client |
| web/src/hooks/useChat.ts | Streaming state management |

Testing:

  • 150+ Jest unit tests for Node.js Lambda (npm test in lambda/node/chat/)
  • Integration tests in tests/test_chat_integration.py
  • Snyk security scanning in CI/CD pipeline

Performance Improvement:

  • Time to first token: < 2 seconds (previously 3-5 seconds for full response)
  • Users see response progressively instead of waiting for completion
  • Streaming enabled by default in production

Related: ADR-011: Lambda Response Streaming


2026-01-08: Bedrock Knowledge Base Documentation Audit

Comprehensive audit of all 36 Amazon Bedrock Knowledge Base documentation pages against current implementation. This audit complements the S3 Vectors audit from 2026-01-07.

Documentation Reviewed:

  • Knowledge Base overview, architecture, and how it works
  • Data ingestion process and retrieval mechanisms
  • Chunking strategies (NONE, FIXED_SIZE, HIERARCHICAL, SEMANTIC)
  • Advanced parsing options (default, Bedrock Data Automation, Foundation Model)
  • Custom transformation Lambda functions
  • Metadata configuration and filtering
  • S3 data source connector configuration
  • Direct ingestion API (IngestKnowledgeBaseDocuments)
  • Security configuration and deployment best practices
  • Supported models, regions, and vector stores

Confirmed Correct Implementation:

| Configuration | Current Value | Documentation Alignment |
|---------------|---------------|-------------------------|
| Storage type | S3 Vectors | Supported, cost-effective option |
| Embedding model | Titan Text Embeddings V2 (1024 dim) | Correct for text-only content |
| Embedding data type | FLOAT32 | Standard precision (BINARY available for cost savings) |
| Chunking strategy | FIXED_SIZE (512 tokens, 20% overlap) | Recommended for S3 Vectors |
| Parsing strategy | Disabled (sidecar metadata) | Correct - avoids 2KB metadata limit |
| Data deletion policy | DELETE | Enables automatic vector cleanup |
| Reranking | Cohere Rerank v3.5 (optional) | Correct model for us-east-1 |
| Multi-tenant isolation | Client metadata filtering | Proper implementation with defense-in-depth |
| IAM role configuration | Bedrock service role with S3/S3Vectors permissions | Correct trust policy with conditions |
| Retry configuration | Adaptive retry with 3 attempts | Best practice for Bedrock API reliability |

Key Documentation Insights:

  1. Immutable Configurations: Confirmed that chunking strategy, parsing strategy, embedding model, and vector store type cannot be changed after creation.

  2. Sidecar Metadata Files: Documentation confirms our approach of using .metadata.json sidecar files instead of LLM parsing. Current implementation in build_bedrock_metadata_json() follows the correct format.

  3. Incremental Sync: Documentation confirms that StartIngestionJob only processes added, modified, or deleted documents since last sync.

  4. Direct Ingestion API: Documentation reveals IngestKnowledgeBaseDocuments API for immediate ingestion (up to 25 documents per call).

  5. includeForEmbedding Option: Documentation shows metadata can optionally be included in vector embeddings for improved semantic search.
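
To illustrate the sidecar approach in point 2: a document my-doc.pdf gets a companion file my-doc.pdf.metadata.json. For an S3 data source the file takes roughly this shape (field names mirror the custom fields mentioned elsewhere in this changelog; values are illustrative):

```json
{
  "metadataAttributes": {
    "client": "acme-corp",
    "category": "meeting-notes",
    "source": "s3://docs-bucket/my-doc.pdf"
  }
}
```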


2026-01-07: S3 Vectors Documentation Audit

Comprehensive audit of all 39 S3 Vectors documentation pages against current implementation.

Confirmed Correct Implementation:

| Configuration | Current Value | Status |
|---------------|---------------|--------|
| Non-filterable metadata keys | AMAZON_BEDROCK_TEXT, AMAZON_BEDROCK_METADATA | Correct - avoids 2KB limit |
| Fixed-size chunking | 512 tokens, 20% overlap | Optimal for S3 Vectors |
| LLM parsing | Disabled | Correct - sidecar metadata used |
| Distance metric | Cosine | Appropriate for text embeddings |
| Dimensions | 1024 (Titan v2) | Matches embedding model |
| Data deletion policy | DELETE | Enables automatic cleanup |
| Encryption | SSE-S3 (AES256) | Sufficient for current needs |

Defense-in-Depth Validation:

The chat handler implements post-filtering validation (handler.py:366) that rejects documents where metadata client doesn’t match the filter. This provides a second layer of security beyond vector search filtering.
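
The post-filtering check can be sketched as follows. This is illustrative JavaScript; the actual implementation is Python (lambda/chat/handler.py):

```javascript
// Defense-in-depth: drop any retrieved document whose client metadata does not
// match the expected tenant, even though the vector query already filtered.
// Illustrative sketch of the Python post-filter in lambda/chat/handler.py.
function postFilterByClient(results, expectedClient) {
  return results.filter(
    (r) => r.metadata && r.metadata.client === expectedClient
  );
}
```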

Sidecar Metadata Files:

The build_bedrock_metadata_json() utility (validation.py:360-448) correctly generates Bedrock KB metadata sidecar files with proper type annotations (STRING/NUMBER). Includes validation for Bedrock KB limits (35 keys max, 1KB per value).
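
The limit checks can be sketched like so (illustrative JavaScript; the shipped validator is the Python build_bedrock_metadata_json() in validation.py):

```javascript
// Bedrock KB metadata limits noted above: at most 35 keys, 1KB per value.
// Illustrative sketch; function name and shape are assumptions.
const MAX_KEYS = 35;
const MAX_VALUE_BYTES = 1024;

function findMetadataViolations(attrs) {
  const violations = [];
  const keys = Object.keys(attrs);
  if (keys.length > MAX_KEYS) violations.push(`too many keys (${keys.length})`);
  for (const key of keys) {
    if (Buffer.byteLength(String(attrs[key]), 'utf8') > MAX_VALUE_BYTES) {
      violations.push(`value for "${key}" exceeds 1KB`);
    }
  }
  return violations;
}
```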

Filter Validation:

The validate_filter_for_s3_vectors() function (filters.py:205-238) correctly rejects unsupported operators (startsWith, stringContains).
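
The operator check can be sketched as follows (illustrative JavaScript; the real function is the Python validate_filter_for_s3_vectors() in filters.py):

```javascript
// S3 Vectors does not support these Bedrock KB filter operators; reject early
// rather than failing at query time. For brevity this sketch only inspects
// top-level operators, unlike the recursive Python original.
const UNSUPPORTED_OPS = new Set(['startsWith', 'stringContains']);

function assertFilterSupported(filter) {
  for (const op of Object.keys(filter)) {
    if (UNSUPPORTED_OPS.has(op)) {
      throw new Error(`Operator not supported by S3 Vectors: ${op}`);
    }
  }
}
```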

Recommendations Implemented:

  • Metadata Value Size Validation - Added to build_bedrock_metadata_json() in validation.py
  • Query Latency Metrics - Added RAGMetrics class in metrics.py, integrated into chat handler
  • S3 Vectors Debugging Runbook - Added to docs/operations/runbook.md

2025

2025-12-31: S3 Vectors Metadata Fix (Critical)

Fixed critical ingestion failures caused by S3 Vectors 2KB filterable metadata limit. Previously 97% of documents failed to index.

| Improvement | File | Change |
|-------------|------|--------|
| Non-filterable metadata keys | terraform/main.tf | Added AMAZON_BEDROCK_TEXT and AMAZON_BEDROCK_METADATA as non-filterable |
| LLM parsing disabled | terraform/modules/bedrock/main.tf | Removed parsing configuration (sidecar metadata is sufficient) |
| Data deletion policy | terraform/modules/bedrock/main.tf | Changed from RETAIN to DELETE for automatic vector cleanup |

Root Cause Analysis:

  • S3 Vectors has a 2KB limit on filterable metadata per vector
  • Bedrock KB stores AMAZON_BEDROCK_TEXT (chunk content) and AMAZON_BEDROCK_METADATA (extracted metadata) as filterable by default
  • For larger document chunks, this exceeded the 2KB limit

Solution:

Configure these keys as non-filterable in the S3 Vectors index. Non-filterable metadata counts against the separate 40KB limit, while our custom fields (source, client, category) remain filterable for multi-tenant isolation. Project metadata is stored for display purposes but not used for filtering.

Result: 100% ingestion success rate (594/594 documents indexed)


2025-12-29: RAG Retrieval Quality Improvements

Based on Bedrock Knowledge Base documentation audit, the following HIGH priority improvements were implemented:

| Improvement | File | Change |
|-------------|------|--------|
| Reranking support | lambda/chat/handler.py | Optional Bedrock reranking model (cohere.rerank-v3-5:0) to re-score retrieved documents |
| Adaptive retrieval multiplier | lambda/chat/handler.py | Dynamic multiplier: 2x base, 3x with client filter |
| Query sanitization | lambda/shared/utils/validation.py | Added sanitize_query() function for input safety |
| Query sanitization applied | lambda/chat/handler.py | Sanitizes user queries before RAG retrieval |

Reranking Details:

  • Uses Cohere Rerank 3.5 (cohere.rerank-v3-5:0) to re-score documents against the actual query
  • Disabled by default - requires accepting Cohere AWS Marketplace agreement
  • Enable via ENABLE_RERANKING=true environment variable after accepting agreement
  • Note: Amazon Rerank 1.0 is not available in us-east-1; Cohere is the supported option
  • Significantly improves relevance for ambiguous queries when enabled
  • Configured to rerank to top max_results after initial retrieval
  • See Bedrock Reranking Guide for supported models and regions
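
The final step, keeping the top max_results documents after the reranking model has scored the over-retrieved candidates, can be sketched as follows (illustrative JavaScript; the actual code is Python in lambda/chat/handler.py):

```javascript
// Given relevance scores from the reranking model (e.g. cohere.rerank-v3-5:0),
// keep the highest-scoring maxResults documents. Illustrative sketch only.
function takeTopReranked(docs, scores, maxResults) {
  return docs
    .map((doc, i) => ({ doc, score: scores[i] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, maxResults)
    .map((entry) => entry.doc);
}
```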

Adaptive Retrieval Multiplier:

# Sketch of the logic in lambda/chat/handler.py (variable names illustrative):
# no filters: 2x (standard over-retrieval); with client filter: 3x
multiplier = 3 if client_filter else 2
num_candidates = max_results * multiplier

This ensures sufficient candidates are retrieved even when client metadata filtering reduces the result set. Note: The system uses client-only filtering (not project-level) to allow all documents from a client’s projects to be accessible for richer context.


2025-12-29: S3 Vectors Optimization

The following improvements were implemented based on a comprehensive audit of S3 Vectors documentation and codebase analysis:

| Improvement | File | Change |
|-------------|------|--------|
| Chunk size optimization | terraform/modules/bedrock/main.tf | Increased from 300 to 512 tokens |
| Chunk overlap | terraform/modules/bedrock/main.tf | Increased from 10% to 20% |
| Parsing prompt robustness | terraform/modules/bedrock/main.tf | Added multi-step extraction with fallbacks |
| Bedrock retry logic | lambda/chat/handler.py | Added adaptive retry with 3 attempts |
| Configurable temperature | lambda/chat/handler.py | Added LLM_TEMPERATURE env var |
| Configurable max tokens | lambda/chat/handler.py | Added LLM_MAX_TOKENS env var |

Last updated: 2026-01-11