API Reference
Complete API documentation for the NorthBuilt RAG System.
Table of Contents
- Overview
- Authentication
- Endpoints
- Rate Limits
- Error Codes
- Query Understanding
- Best Practices
- SDK Examples
- Changelog
Overview
The NorthBuilt RAG System exposes REST APIs via AWS API Gateway. All endpoints require JWT authentication via AWS Cognito, except the webhook endpoints, which are secured by API key validation.
Chat API: https://{api-id}.execute-api.us-east-1.amazonaws.com/v1
Authentication
All API requests require a valid JWT token from AWS Cognito in the Authorization header.
Obtaining a Token
Users authenticate via Google OAuth through the Cognito Hosted UI:
- Navigate to the web application
- Click “Sign in with Google”
- Complete Google OAuth flow
- Token is stored in browser and included in API requests automatically
For programmatic access, use the Cognito SDK:
import boto3

cognito = boto3.client('cognito-idp', region_name='us-east-1')

response = cognito.initiate_auth(
    ClientId='your-client-id',
    AuthFlow='USER_PASSWORD_AUTH',
    AuthParameters={
        'USERNAME': 'user@example.com',
        'PASSWORD': 'password'
    }
)

token = response['AuthenticationResult']['IdToken']
Using the Token
Include the JWT in the Authorization header:
Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...
Token Expiration
- ID Token: 1 hour
- Refresh Token: 30 days
When tokens expire, the web application automatically refreshes them using the refresh token.
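For programmatic clients that manage their own tokens, the expiry can be read from the token itself. The following is a minimal sketch, assuming the ID token is a standard JWT carrying an `exp` claim in Unix seconds; the helper names `jwt_expiry` and `needs_refresh` and the 60-second safety margin are illustrative choices, not part of the API. The sketch does not verify the signature, so it is only suitable for deciding when to refresh.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> int:
    """Return the `exp` claim (Unix seconds) from a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(payload["exp"])


def needs_refresh(token: str, now: Optional[float] = None, skew_seconds: int = 60) -> bool:
    """True if the token expires within `skew_seconds` of `now`."""
    current = time.time() if now is None else now
    return jwt_expiry(token) - current <= skew_seconds
```

A client could call `needs_refresh(token)` before each request and obtain a new ID token when it returns `True`.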
Endpoints
POST /chat (Recommended)
Query the RAG system with real-time streaming responses via Server-Sent Events (SSE). Provides immediate feedback as the response generates.
URL: POST /chat
Headers:
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | Bearer token from Cognito |
| `Content-Type` | Yes | `application/json` |
| `Accept` | No | `text/event-stream` (optional) |
Request Body:
{
  "session_id": "unique-session-id",
  "message": "What did we discuss with Acme about pricing?",
  "client": "Optional explicit client filter",
  "max_results": 5
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `session_id` | string | Yes | - | Session ID for conversation history |
| `message` | string | Yes | - | Natural language question |
| `client` | string | No | extracted from query | Explicit client filter |
| `max_results` | integer | No | 5 | Number of documents to retrieve (1-20) |
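The field constraints above can be checked on the client before sending a request. This is a minimal sketch; `build_chat_request` is a hypothetical helper name, and the server remains the authority on validation.

```python
from typing import Any, Dict, Optional


def build_chat_request(
    session_id: str,
    message: str,
    client: Optional[str] = None,
    max_results: int = 5,
) -> Dict[str, Any]:
    """Build a POST /chat body, enforcing the documented field constraints."""
    if not session_id:
        raise ValueError("session_id is required")
    if not message:
        raise ValueError("message is required")
    if not 1 <= max_results <= 20:
        raise ValueError("max_results must be between 1 and 20")

    body: Dict[str, Any] = {
        "session_id": session_id,
        "message": message,
        "max_results": max_results,
    }
    # Omit `client` so the system extracts it from the query by default.
    if client is not None:
        body["client"] = client
    return body
```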
Response (SSE Stream):
The response is a stream of Server-Sent Events:
event: sources
data: {"sources": [{"title": "Meeting Notes", "client": "Acme Corp", ...}]}

event: token
data: {"text": "Based on"}

event: token
data: {"text": " the meeting"}

event: token
data: {"text": " notes..."}

event: done
data: {"done": true, "message_id": "msg_abc123"}
Event Types:
| Event | Description |
|---|---|
| `sources` | Retrieved documents used for the response |
| `token` | Partial response text (streamed incrementally) |
| `done` | Stream completion with message ID and filters applied |
| `error` | Error information if something fails |
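A client can dispatch on the event name rather than inspecting the payload keys. The sketch below pairs each `event:` line with its `data:` payload, assuming events are separated by blank lines as the SSE format specifies and that every `data:` payload is JSON. The function names `parse_sse` and `collect_answer` are illustrative, and the shape of the `error` payload is not documented here, so it is surfaced as-is.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (event_name, data) pairs from already-decoded SSE lines."""
    event_name = "message"  # SSE default when no `event:` field is sent
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield event_name, json.loads("\n".join(data_parts))
            event_name, data_parts = "message", []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].strip())
    # Lenient: flush a final event even if the trailing blank line is missing.
    if data_parts:
        yield event_name, json.loads("\n".join(data_parts))


def collect_answer(events: Iterable[Tuple[str, Dict[str, Any]]]) -> Tuple[str, list]:
    """Join `token` events into the full answer and return it with the sources."""
    text: List[str] = []
    sources: list = []
    for name, data in events:
        if name == "token":
            text.append(data["text"])
        elif name == "sources":
            sources = data["sources"]
        elif name == "error":
            raise RuntimeError(f"stream error: {data}")
    return "".join(text), sources
```

With `requests`, the lines could come from `response.iter_lines(decode_unicode=True)`.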
Source Object Schema:
Each source in the sources event contains:
{
  "document_number": 1,
  "relevance_score": 0.8542,
  "snippet": "Meeting notes excerpt with context...",
  "metadata": {
    "client": "Acme Corp",
    "project": "Q1 Planning",
    "source": "fathom_meeting_12345.md"
  },
  "location": {
    "type": "S3",
    "s3Location": {
      "uri": "s3://bucket/path/to/document.md"
    }
  },
  "document_url": "https://bucket.s3.amazonaws.com/path/to/document.md?X-Amz-..."
}
| Field | Type | Description |
|---|---|---|
| `document_number` | integer | Sequential number for display (1-indexed) |
| `relevance_score` | float | Relevance score from 0.0 to 1.0 |
| `snippet` | string | First 500 characters of document content |
| `metadata` | object | Document metadata (client, project, source, category) |
| `location` | object | S3 location information from Bedrock KB |
| `document_url` | string | Pre-signed S3 URL for secure document access (1-hour expiry) |
The document_url field provides a time-limited secure link to the original source document, allowing users to view the full document context.
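One way to consume source objects is to map them onto a small typed structure before display. This is a sketch under the schema shown above; the `Source` class, the `format_citations` helper, and the `min_score` cutoff are illustrative and not part of the API.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Source:
    document_number: int
    relevance_score: float
    snippet: str
    client: str
    s3_uri: str
    document_url: str

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "Source":
        """Flatten the nested source object; optional fields default to ''."""
        return cls(
            document_number=raw["document_number"],
            relevance_score=raw["relevance_score"],
            snippet=raw.get("snippet", ""),
            client=raw.get("metadata", {}).get("client", ""),
            s3_uri=raw.get("location", {}).get("s3Location", {}).get("uri", ""),
            document_url=raw.get("document_url", ""),
        )


def format_citations(raw_sources: List[Dict[str, Any]], min_score: float = 0.0) -> List[str]:
    """Return one display line per source at or above `min_score`."""
    sources = [Source.from_dict(s) for s in raw_sources]
    kept = sorted(
        (s for s in sources if s.relevance_score >= min_score),
        key=lambda s: s.document_number,
    )
    return [
        f"[{s.document_number}] {s.client} ({s.relevance_score:.2f}) {s.s3_uri}"
        for s in kept
    ]
```

Because `document_url` expires after one hour, a client should avoid persisting it and keep the S3 URI or document number instead.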
Response (Clarification Needed):
When the system needs clarification, the endpoint returns a JSON body instead of an SSE stream. Check the Content-Type response header to tell the two apart:
{
  "type": "clarification_needed",
  "message": "I found multiple clients matching 'Acme'. Which one?",
  "options": [
    {"id": "acme-corp", "display_name": "Acme Corporation"},
    {"id": "acme-labs", "display_name": "Acme Labs"}
  ],
  "original_query": "What about Acme?",
  "clarification_type": "ambiguous",
  "session_id": "sess_abc123"
}
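A client might distinguish the two response shapes and build a follow-up request as sketched below. Two things here are assumptions: the function names are hypothetical, and the follow-up passes the chosen option's `id` in the `client` field, whereas this reference does not state whether `client` expects the option `id` or its `display_name`.

```python
import json
from typing import Any, Dict, Optional


def parse_clarification(content_type: str, body: bytes) -> Optional[Dict[str, Any]]:
    """Return the clarification payload, or None if the response is not one."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "application/json":
        return None  # e.g. text/event-stream: handle as an SSE stream instead
    payload = json.loads(body)
    if payload.get("type") == "clarification_needed":
        return payload
    return None


def build_follow_up(clarification: Dict[str, Any], chosen_id: str) -> Dict[str, Any]:
    """Re-ask the original query with an explicit client filter.

    ASSUMPTION: the `client` request field accepts the option's `id`.
    """
    valid_ids = [option["id"] for option in clarification["options"]]
    if chosen_id not in valid_ids:
        raise ValueError(f"unknown option: {chosen_id}")
    return {
        "session_id": clarification["session_id"],
        "message": clarification["original_query"],
        "client": chosen_id,
    }
```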
GET /chat/{id}
Retrieve conversation history for a session.
URL: GET /chat/{id}
Headers:
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | Bearer token from Cognito |
Response:
{
  "session_id": "sess_abc123",
  "messages": [
    {
      "role": "user",
      "content": "What did we discuss with Acme?",
      "timestamp": 1736334600
    },
    {
      "role": "assistant",
      "content": "Based on the meeting notes...",
      "timestamp": 1736334605,
      "sources": [
        {
          "document_number": 1,
          "relevance_score": 0.8542,
          "snippet": "Meeting notes excerpt...",
          "metadata": {"client": "Acme Corp"},
          "location": {"s3Location": {"uri": "s3://bucket/doc.md"}},
          "document_url": "https://bucket.s3.amazonaws.com/doc.md?X-Amz-..."
        }
      ]
    }
  ]
}
Note: The document_url field contains a freshly-generated pre-signed URL (1-hour expiry). URLs are regenerated on each history retrieval to ensure they are always valid, even for older conversations.
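The history response can be turned into a readable transcript as in the sketch below. It assumes `timestamp` values are Unix epoch seconds, which matches the example values but is not stated explicitly; `render_history` is an illustrative name.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def render_history(history: Dict[str, Any]) -> List[str]:
    """Format each message as '[UTC time] role: content', with sources indented."""
    lines: List[str] = []
    for msg in history["messages"]:
        # ASSUMPTION: timestamps are Unix epoch seconds.
        when = datetime.fromtimestamp(msg["timestamp"], tz=timezone.utc)
        stamp = when.strftime("%Y-%m-%d %H:%M:%S")
        lines.append(f"[{stamp}] {msg['role']}: {msg['content']}")
        # Only assistant messages carry sources in the example response.
        for src in msg.get("sources", []):
            uri = src["location"]["s3Location"]["uri"]
            lines.append(f"    source {src['document_number']}: {uri}")
    return lines
```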
Webhook Endpoints
Webhook endpoints receive data from external services. These are internal endpoints secured by API key validation.
POST /webhooks/fathom
Receives video transcript webhooks from Fathom.
Authentication: API key in request header or body (configured in Fathom)
Payload: Fathom webhook format (see Fathom documentation)
POST /webhooks/helpscout
Receives conversation webhooks from HelpScout.
Authentication: API key validation (configured in HelpScout)
Payload: HelpScout webhook format (see HelpScout documentation)
Sync Endpoints
Manual sync endpoints for triggering data ingestion. All sync endpoints require JWT authentication.
POST /sync/fathom
Trigger a manual sync of Fathom recordings.
Authentication: JWT Bearer token (Cognito)
Response: 202 Accepted with worker invocation confirmation
POST /sync/helpscout
Trigger a manual sync of HelpScout conversations.
Authentication: JWT Bearer token (Cognito)
Response: 202 Accepted with worker invocation confirmation
Rate Limits
The API enforces rate limits to ensure fair usage:
| Limit | Value |
|---|---|
| Rate | 10 requests/second |
| Burst | 20 requests |
When rate limited, the API returns HTTP 429:
{
  "error": "TOO_MANY_REQUESTS",
  "message": "Rate limit exceeded. Please wait and try again.",
  "retry_after": 1
}
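A client can stay under these limits by throttling itself before sending. The sketch below is a client-side token bucket sized to the documented rate and burst; it is an illustration rather than a description of how the server enforces its limits, and the `TokenBucket` class is a hypothetical helper. Whether the limits apply per user or per API is not stated here.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `burst` immediate requests, refilled at `rate` tokens per second."""

    def __init__(
        self,
        rate: float = 10.0,
        burst: int = 20,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False when the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A caller would check `bucket.try_acquire()` before each request and sleep briefly when it returns `False`.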
Error Codes
| HTTP Status | Error Code | Description | Resolution |
|---|---|---|---|
| 200 | - | Success | N/A |
| 400 | `BAD_REQUEST` | Invalid request | Check request body format |
| 401 | `UNAUTHORIZED` | Authentication failed | Refresh JWT token |
| 403 | `FORBIDDEN` | Access denied | Check user permissions |
| 429 | `TOO_MANY_REQUESTS` | Rate limited | Wait and retry |
| 500 | `INTERNAL_ERROR` | Server error | Check CloudWatch logs |
| 503 | `SERVICE_UNAVAILABLE` | Bedrock/KB unavailable | Retry after delay |
Query Understanding
The system automatically extracts client context from natural language queries. This enables users to ask questions naturally without needing to select filters from dropdowns.
How It Works
- User submits query: “What did we discuss with Acme?”
- System extracts client mention: “Acme”
- System matches to known client: “Acme Corporation”
- System applies filter and retrieves relevant documents
Supported Patterns
The query understanding recognizes various ways users might mention clients:
- Direct name: “What about Acme Corporation?”
- Abbreviation: “What did Acme say?”
- Aliases: “Tell me about AC” (if “AC” is registered as an alias)
Confidence and Clarification
When the system has low confidence or finds ambiguous matches, it returns a clarification request rather than potentially incorrect results.
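The flow described in this section can be illustrated with a toy matcher. This is a sketch only: the actual matching and confidence logic is not documented here, the `KNOWN_CLIENTS` registry and both function names are hypothetical, and whole-word alias matching stands in for whatever the system really does. Low-confidence handling is not modeled.

```python
import re
from typing import Any, Dict, List

# Hypothetical registry: canonical client name -> registered aliases.
KNOWN_CLIENTS: Dict[str, List[str]] = {
    "Acme Corporation": ["acme", "ac"],
    "Acme Labs": ["acme"],
    "Globex": ["globex"],
}


def match_clients(query: str, registry: Dict[str, List[str]]) -> List[str]:
    """Return clients whose name or alias appears as whole words in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    padded = " " + " ".join(words) + " "
    matches = []
    for name, aliases in registry.items():
        candidates = [name.lower()] + [alias.lower() for alias in aliases]
        if any(f" {candidate} " in padded for candidate in candidates):
            matches.append(name)
    return matches


def resolve_client(query: str, registry: Dict[str, List[str]]) -> Dict[str, Any]:
    """Apply a filter for one match, ask for clarification for several."""
    matches = match_clients(query, registry)
    if len(matches) == 1:
        return {"type": "resolved", "client": matches[0]}
    if len(matches) > 1:
        return {
            "type": "clarification_needed",
            "clarification_type": "ambiguous",
            "options": matches,
            "original_query": query,
        }
    return {"type": "unfiltered"}
```

In this toy registry, "Acme" alone is ambiguous between two clients, which mirrors the clarification example shown for POST /chat.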
Best Practices
Query Formulation
- Be specific: “What was discussed in the December meeting with Acme?” is better than “meetings”
- Include context: Mention client names, project names, or dates when relevant
- Ask one question at a time: Complex multi-part questions may yield less focused answers
Session Management
- Use a consistent `session_id` for follow-up questions in a conversation
- Start a new session for unrelated queries
- Sessions maintain context for 24 hours
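These session guidelines could be wrapped in a small helper like the sketch below. It assumes the 24-hour window is measured from the last request in the session, which this reference does not specify, and `SessionTracker` is a hypothetical name. Session IDs are generated as UUIDs purely for uniqueness.

```python
import time
import uuid
from typing import Callable, Optional

SESSION_TTL_SECONDS = 24 * 60 * 60  # sessions maintain context for 24 hours


class SessionTracker:
    """Reuse one session_id per conversation and rotate it once it goes stale."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._session_id: Optional[str] = None
        self._last_used = 0.0

    def current(self) -> str:
        """Return the active session_id, starting a new one if it has expired."""
        now = self._clock()
        # ASSUMPTION: the 24-hour window counts from the last use.
        expired = now - self._last_used >= SESSION_TTL_SECONDS
        if self._session_id is None or expired:
            self._session_id = str(uuid.uuid4())
        self._last_used = now
        return self._session_id

    def reset(self) -> str:
        """Start a new session, for example when the user changes topic."""
        self._session_id = None
        return self.current()
```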
Error Handling
- Implement exponential backoff for 429 and 503 errors
- Cache responses when appropriate to reduce API calls
- Log correlation IDs from responses for debugging
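The backoff advice above can be sketched as follows. The `send` callable is hypothetical: it stands for any function that performs one request and returns `(status_code, json_body)`. The sketch also assumes `retry_after` in the 429 body is expressed in seconds; the delay values and the 10% jitter are illustrative defaults.

```python
import random
import time
from typing import Any, Callable, Dict, Tuple

RETRYABLE_STATUSES = {429, 503}


def call_with_backoff(
    send: Callable[[], Tuple[int, Dict[str, Any]]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Dict[str, Any]]:
    """Retry `send` on 429/503 with exponentially growing, jittered delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or last_attempt:
            return status, body
        # Double the delay on each attempt, capped at max_delay.
        delay = min(max_delay, base_delay * (2 ** attempt))
        # ASSUMPTION: `retry_after` is in seconds; never wait less than it asks.
        delay = max(delay, float(body.get("retry_after", 0)))
        # Add up to 10% jitter so many clients do not retry in lockstep.
        sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```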
SDK Examples
Python (Streaming)
import requests
import json

API_URL = "https://your-api-id.execute-api.us-east-1.amazonaws.com/v1"
TOKEN = "your-jwt-token"


def query_rag_streaming(message: str, session_id: str):
    """Query the RAG system with streaming response."""
    response = requests.post(
        f"{API_URL}/chat",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
            "Accept": "text/event-stream"
        },
        json={
            "message": message,
            "session_id": session_id
        },
        stream=True
    )
    response.raise_for_status()

    # Parse SSE events
    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data:'):
                data = json.loads(line[5:].strip())
                yield data


# Example usage
for event in query_rag_streaming("What did we discuss with Acme?", "session-123"):
    if 'text' in event:
        print(event['text'], end='', flush=True)
    elif 'sources' in event:
        print(f"\n\nSources: {len(event['sources'])} documents")
JavaScript/TypeScript (Streaming)
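// Placeholders, mirroring the Python example
const API_URL = 'https://your-api-id.execute-api.us-east-1.amazonaws.com/v1';
const token = 'your-jwt-token';

// Union of the payload fields shown in the SSE examples above
interface StreamEvent {
  text?: string;
  sources?: unknown[];
  done?: boolean;
  message_id?: string;
}

async function* queryRAGStreaming(
  message: string,
  sessionId: string
): AsyncGenerator<StreamEvent> {
  const response = await fetch(`${API_URL}/chat`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json',
      'Accept': 'text/event-stream',
    },
    body: JSON.stringify({
      message,
      session_id: sessionId,
    }),
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  const reader = response.body?.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (reader) {
    const { done, value } = await reader.read();
    if (done) break;

    // A chunk can end mid-line, so decode incrementally and keep the
    // unfinished trailing line in the buffer for the next read.
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';

    // Parse complete SSE data lines
    for (const line of lines) {
      if (line.startsWith('data:')) {
        yield JSON.parse(line.slice(5).trim());
      }
    }
  }
}

// Example usage
for await (const event of queryRAGStreaming('What about Acme?', 'session-123')) {
  if (event.text) process.stdout.write(event.text);
}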
cURL (Streaming)
curl -N -X POST "https://your-api-id.execute-api.us-east-1.amazonaws.com/v1/chat" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "session_id": "test-session",
    "message": "What did we discuss with Acme about pricing?",
    "max_results": 5
  }'
Changelog
| Date | Change |
|---|---|
| 2026-01-12 | Added Sync Endpoints section for manual data ingestion |
| 2026-01-09 | Removed batch /chat endpoint - use /chat for all chat operations |
| 2026-01-08 | Added streaming endpoint documentation (/chat) |
| 2026-01-01 | Updated for client-only filtering (removed project filter) |
| 2025-12-31 | Added clarification response format |
| 2025-12-29 | Added reranking for improved relevance |
Last updated: 2026-01-17