Query-Driven Metadata Filtering
This document describes the query understanding system that extracts client filters from natural language queries to improve RAG retrieval accuracy.
Overview
Instead of requiring users to select clients from dropdown menus, the system automatically extracts client mentions from natural language queries and applies them as metadata filters to the Bedrock Knowledge Base retrieval. All projects under a client are accessible to build richer context.
Example:
User: "What did we discuss with Acme?"
|
System extracts: client=Acme Corporation
|
Filtered retrieval: Searches all documents for that client (across all projects)
Architecture
+---------------------------------------------------------------------+
| User Query |
| "What did we discuss with Acme?" |
+---------------------------------------------------------------------+
|
v
+---------------------------------------------------------------------+
| STEP 1: Load Known Clients |
| (DynamoDB Entity Registry) |
| |
| Fetch list of known clients with aliases |
+---------------------------------------------------------------------+
|
v
+---------------------------------------------------------------------+
| STEP 2: Query Understanding |
| (Claude via Bedrock Invoke) |
| |
| Input: User query + Known clients |
| Output: { |
| "cleaned_query": "What did we discuss?", |
| "filters": {"client": "Acme Corporation"}, |
| "confidence": 0.95, |
| "ambiguous": false |
| } |
+---------------------------------------------------------------------+
|
+-----------+-----------+
| |
confidence >= 0.8 confidence < 0.8
no ambiguity OR ambiguous
| |
v v
+---------------------------+ +---------------------------+
| STEP 3a: Build Filter | | STEP 3b: Prompt User |
| Apply to KB Retrieve | | Ask for clarification |
+---------------------------+ +---------------------------+
|
v
+---------------------------------------------------------------------+
| STEP 4: Filtered Retrieval |
| (Bedrock KB Retrieve API) |
| |
| retrieve(filter={"equals": {"key": "client", "value": "Acme Corporation"}}) |
+---------------------------------------------------------------------+
|
v
+---------------------------------------------------------------------+
| STEP 5: Response Generation |
| (Claude via Bedrock) |
+---------------------------------------------------------------------+
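The branch between Step 3a and Step 3b can be sketched as a small routing function. This is a minimal illustration and not the module's actual code: the function name `next_step` and its return strings are hypothetical, and the dataclass mirrors the output fields listed in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List

CONFIDENCE_THRESHOLD = 0.8  # threshold stated in this document


@dataclass
class QueryUnderstanding:
    cleaned_query: str
    filters: Dict[str, str]
    confidence: float
    ambiguous: bool
    ambiguous_matches: List[Dict] = field(default_factory=list)
    original_query: str = ""


def next_step(result: QueryUnderstanding) -> str:
    """Return "build_filter" (Step 3a) or "prompt_user" (Step 3b).

    Hypothetical helper: the names are illustrative only.
    """
    has_client = bool(result.filters.get("client"))
    confident = result.confidence >= CONFIDENCE_THRESHOLD
    if has_client and confident and not result.ambiguous:
        return "build_filter"
    # No client found, low confidence, or ambiguity all lead to a prompt.
    return "prompt_user"
```

Treating a missing client as a prompt case follows the "No client found" scenario in the User Prompting section.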
Components
1. Entity Registry (DynamoDB)
Stores known clients with aliases for fuzzy matching. Clients are synced from Linear Teams via webhooks and the sync Lambda.
Table: nb-rag-sys-classify
| Attribute | Type | Description |
|---|---|---|
| PK | String (PK) | CLIENT#<id> |
| SK | String (SK) | METADATA |
| EntityType | String (GSI) | CLIENT (for EntityTypeIndex GSI) |
| Name | String | Display name (human-readable) |
| Aliases | List | Alternative names for matching |
| CreatedAt | String | ISO8601 timestamp |
| UpdatedAt | String | ISO8601 timestamp |
The EntityTypeIndex GSI enables efficient queries by entity type for the query understanding module.
Example Client Item:
{
  "PK": "CLIENT#team-123",
  "SK": "METADATA",
  "EntityType": "CLIENT",
  "Name": "Acme Corporation",
  "LinearTeamId": "team-123",
  "LinearTeamKey": "ACME"
}
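Loading the known clients through the EntityTypeIndex GSI might look like the sketch below. It assumes `table` is a boto3 DynamoDB Table resource and that `EntityType` is the partition key of the index; the function name `load_known_clients` and the shape of the returned dicts are hypothetical.

```python
from typing import Any, Dict, List


def load_known_clients(table: Any, index_name: str = "EntityTypeIndex") -> List[Dict[str, Any]]:
    """Fetch every CLIENT item via the GSI, following pagination.

    `table` is assumed to expose a boto3-style `query(**kwargs)` method.
    """
    kwargs: Dict[str, Any] = {
        "IndexName": index_name,
        # Assumes EntityType is the partition key of the index.
        "KeyConditionExpression": "EntityType = :entity_type",
        "ExpressionAttributeValues": {":entity_type": "CLIENT"},
    }
    clients: List[Dict[str, Any]] = []
    while True:
        page = table.query(**kwargs)
        for item in page.get("Items", []):
            clients.append({
                "name": item["Name"],
                "aliases": list(item.get("Aliases", [])),  # Aliases may be absent
            })
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return clients
        # More pages remain: resume from where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

In a Lambda this would typically be called with `boto3.resource("dynamodb").Table("nb-rag-sys-classify")`. Taking the table as a parameter keeps the function easy to test with a stub.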
2. Query Understanding Module
Location: lambda/shared/utils/query_understanding.py
Uses Claude to analyze queries and extract client references.
Input:
- User query string
- List of known clients
Output:
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class QueryUnderstanding:
    cleaned_query: str             # Query with client mentions removed
    filters: Dict[str, str]        # Extracted filters {client: "..."}
    confidence: float              # 0.0-1.0 confidence score
    ambiguous: bool                # True if multiple possible matches
    ambiguous_matches: List[Dict]  # List of ambiguous matches for user prompt
    original_query: str            # Original query for reference
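Confidence Threshold: 0.8 (a commonly used cutoff for entity extraction). Extractions at or above this value with no ambiguity are applied automatically; anything else triggers a user prompt.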
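Turning the model's JSON reply into this structure could be done along the following lines. This is a sketch under assumptions: the function name `parse_understanding` is hypothetical, and falling back to a zero-confidence result on malformed output is one reasonable choice, not necessarily what the module does.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QueryUnderstanding:
    cleaned_query: str
    filters: Dict[str, str]
    confidence: float
    ambiguous: bool
    ambiguous_matches: List[Dict] = field(default_factory=list)
    original_query: str = ""


def parse_understanding(raw_json: str, original_query: str) -> QueryUnderstanding:
    """Parse the model reply; malformed replies become a zero-confidence result."""
    fallback = QueryUnderstanding(original_query, {}, 0.0, False, [], original_query)
    try:
        data = json.loads(raw_json)
        confidence = float(data["confidence"])
    except (ValueError, TypeError, KeyError):
        # Invalid JSON, a non-object reply, or a missing/non-numeric confidence.
        return fallback

    raw_filters = data.get("filters")
    filters: Dict[str, str] = {}
    if isinstance(raw_filters, dict):
        # Keep only non-empty string values so later steps never see nulls.
        filters = {k: v for k, v in raw_filters.items() if isinstance(v, str) and v}

    matches = data.get("ambiguous_matches")
    return QueryUnderstanding(
        cleaned_query=str(data.get("cleaned_query") or original_query),
        filters=filters,
        confidence=min(1.0, max(0.0, confidence)),  # clamp to the documented range
        ambiguous=bool(data.get("ambiguous", False)),
        ambiguous_matches=matches if isinstance(matches, list) else [],
        original_query=original_query,
    )
```

A zero-confidence fallback sends unparseable replies down the clarification path instead of raising inside the chat Lambda.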
3. Filter Builder
Location: lambda/shared/utils/filters.py
Converts extracted metadata into Bedrock KB RetrievalFilter format.
Supported Operators (S3 Vectors compatible):
- equals - Exact match
- notEquals - Exclusion
- in - Multiple values
- notIn - Multiple exclusions
- greaterThan, lessThan - Numeric/date comparisons
- andAll, orAll - Logical combinations
Note: startsWith and stringContains are NOT supported with S3 Vectors.
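The operators above map onto plain dictionaries in the RetrievalFilter shape. The helpers below are a sketch, not the contents of filters.py: the function names are hypothetical. It covers `equals`, `in`, and `andAll`, and shows where the filter sits in the retrieval configuration.

```python
from typing import Any, Dict, List

Filter = Dict[str, Any]


def equals(key: str, value: Any) -> Filter:
    """Exact match on one metadata key."""
    return {"equals": {"key": key, "value": value}}


def any_of(key: str, values: List[Any]) -> Filter:
    """Match when the metadata value is one of `values` (the `in` operator)."""
    return {"in": {"key": key, "value": list(values)}}


def and_all(*filters: Filter) -> Filter:
    """Combine filters with andAll, which needs at least two operands."""
    if not filters:
        raise ValueError("and_all requires at least one filter")
    if len(filters) == 1:
        return filters[0]  # a single filter is returned unwrapped
    return {"andAll": list(filters)}


def retrieval_configuration(filter_: Filter, number_of_results: int) -> Dict[str, Any]:
    """Wrap a filter in the structure passed as `retrievalConfiguration`."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": number_of_results,
            "filter": filter_,
        }
    }
```

For the client-only case described in this document, `retrieval_configuration(equals("client", "Acme Corporation"), 15)` produces the value for the `retrievalConfiguration` argument of the Bedrock KB Retrieve call.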
4. User Prompting
When the system cannot confidently extract filters, it prompts the user.
Scenarios requiring prompts:
- No client found - Show available clients and ask user to specify
- Low confidence (< 0.8) - Ask user to confirm or clarify
- Ambiguous match - Multiple clients match, ask user to choose
Prompt Response Format:
{
  "type": "clarification_needed",
  "message": "I found multiple clients that match 'Acme'. Which one did you mean?",
  "options": [
    {"id": "Acme Corporation", "display_name": "Acme Corporation", "entity_type": "client"},
    {"id": "Acme Labs", "display_name": "Acme Labs", "entity_type": "client"}
  ],
  "original_query": "What did Acme say about pricing?"
}
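A response in this format could be assembled as sketched below. The function name `build_clarification` and its parameters are hypothetical. The generic fallback message is taken from the API example in this document.

```python
from typing import Any, Dict, List, Optional


def build_clarification(
    original_query: str,
    candidates: List[str],
    mention: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a clarification_needed payload listing candidate clients."""
    if mention and len(candidates) > 1:
        message = f"I found multiple clients that match '{mention}'. Which one did you mean?"
    else:
        # No usable mention (or no real ambiguity): ask generically.
        message = "Which client did you mean?"
    return {
        "type": "clarification_needed",
        "message": message,
        "options": [
            {"id": name, "display_name": name, "entity_type": "client"}
            for name in candidates
        ],
        "original_query": original_query,
    }
```

Carrying `original_query` in the payload means the question can be re-run with the chosen client once the user answers.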
API Changes
Chat Lambda Request
{
  "query": "What did Acme say about pricing?",
  "session_id": "session-id",
  "client": "optional-explicit-client"
}
Chat Lambda Response
Normal response:
{
  "answer": "Based on the meeting notes...",
  "sources": [...],
  "filters_applied": {"client": "Acme Corporation"}
}
Clarification needed:
{
  "type": "clarification_needed",
  "message": "Which client did you mean?",
  "options": [...],
  "original_query": "..."
}
Why Client-Only Filtering
The system filters by client only (not project) for these reasons:
- Richer context: All projects under a client are accessible in RAG queries, providing more comprehensive answers
- Simpler queries: Users only need to specify the client, not both client and project
- Fewer clarification prompts: No project ambiguity to resolve
- Faster extraction: A simpler LLM prompt means a faster response from Claude Haiku
- Security boundary: Client-level isolation is the real security boundary
Project metadata is still stored on documents for display purposes and auditing, but not used for filtering.
Cost Considerations
Query understanding adds one Claude API call per chat message for client extraction.
Estimated additional cost per query:
- Input tokens: ~300 (query + client list + prompt)
- Output tokens: ~50 (structured JSON response)
- Cost: ~$0.001 per query (Claude Haiku)
Recommendation: Use Claude Haiku for query understanding (fast, cost-effective for structured extraction).
Monthly cost estimate (1000 queries/month):
- Haiku: ~$1/month additional
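The arithmetic behind these estimates can be reproduced with a small calculator. The per-million-token prices below are placeholder assumptions for illustration, not quoted Bedrock rates; substitute the current prices for the model in use.

```python
def cost_per_query(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Return the USD cost of one call, given prices per million tokens."""
    return (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    ) / 1_000_000


# Token counts come from this document; both prices are ASSUMED placeholders.
ASSUMED_INPUT_PRICE = 1.00   # USD per million input tokens (placeholder)
ASSUMED_OUTPUT_PRICE = 5.00  # USD per million output tokens (placeholder)

per_query = cost_per_query(300, 50, ASSUMED_INPUT_PRICE, ASSUMED_OUTPUT_PRICE)
monthly = per_query * 1000  # 1000 queries/month, as in the estimate above
```

With these placeholder prices the per-query cost is $0.00055 and the monthly cost is $0.55, the same order of magnitude as the rounded figures above.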
Security Considerations
- Tenant isolation: Filters are applied server-side; users cannot bypass them
- Client validation: Only known clients from DynamoDB can be used as filters
- No client-side filter construction: Filters are built from LLM extraction, not user input
- Defense-in-depth: Retrieved documents are validated post-retrieval to ensure client match
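Two of these checks, client validation and post-retrieval validation, can be sketched as follows. The function names are hypothetical. The sketch assumes each retrieval result carries a `metadata` dict with a `client` key (the filter key used in the architecture diagram). Case-insensitive matching against the registry is also an assumption.

```python
from typing import Any, Dict, Iterable, List, Optional


def validate_client(extracted: Optional[str], known_clients: Iterable[str]) -> Optional[str]:
    """Return the registered client name, or None if the value is unknown.

    Case-insensitive matching is an assumption of this sketch.
    """
    if not extracted:
        return None
    by_lower = {name.lower(): name for name in known_clients}
    return by_lower.get(extracted.strip().lower())


def enforce_client_match(results: List[Dict[str, Any]], client: str) -> List[Dict[str, Any]]:
    """Drop any retrieved result whose metadata does not name `client`."""
    return [
        result
        for result in results
        if (result.get("metadata") or {}).get("client") == client
    ]
```

Running the second check after retrieval means a document with wrong or missing metadata is dropped even if the filter itself let it through.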
Adaptive Retrieval Multiplier
To compensate for potential filter mismatches in metadata, the system retrieves more candidates when filtering:
- No filter: 2x multiplier (standard over-retrieval for quality)
- With client filter: 3x multiplier (some results may be filtered out)
This ensures sufficient high-quality results even after filtering.
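The multiplier logic amounts to a few lines. In this sketch the function names are hypothetical, and trimming back to the requested count by `score` assumes each result exposes a numeric `score` field.

```python
from typing import Any, Dict, List

NO_FILTER_MULTIPLIER = 2      # standard over-retrieval
CLIENT_FILTER_MULTIPLIER = 3  # extra headroom when a client filter is applied


def candidates_to_retrieve(top_k: int, has_client_filter: bool) -> int:
    """Number of candidates to request from the knowledge base."""
    multiplier = CLIENT_FILTER_MULTIPLIER if has_client_filter else NO_FILTER_MULTIPLIER
    return top_k * multiplier


def top_results(results: List[Dict[str, Any]], top_k: int) -> List[Dict[str, Any]]:
    """Keep the `top_k` highest-scoring results after validation."""
    ranked = sorted(results, key=lambda r: r.get("score", 0.0), reverse=True)
    return ranked[:top_k]
```

For example, a request for 5 final results asks the knowledge base for 15 candidates when a client filter is present and 10 otherwise.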
References
- Multi-tenancy in RAG with metadata filtering - AWS Blog
- RetrievalFilter API Reference - AWS Docs
Last updated: 2025-12-31