API Reference
Complete API documentation for the NorthBuilt RAG System.
Table of Contents
- Overview
- Authentication
- Endpoints
- Rate Limits
- Error Codes
- Query Understanding
- Best Practices
- SDK Examples
- Changelog
Overview
The NorthBuilt RAG System exposes a REST API via AWS API Gateway. User-facing endpoints require JWT authentication via AWS Cognito; webhook endpoints use API key or signature validation instead.
Base URL: https://{api-id}.execute-api.us-east-1.amazonaws.com
Authentication
Requests to user-facing endpoints such as POST /chat require a valid JWT from AWS Cognito in the Authorization header.
Obtaining a Token
Users authenticate via Google OAuth through the Cognito Hosted UI:
- Navigate to the web application
- Click “Sign in with Google”
- Complete the Google OAuth flow
- The token is stored in the browser and included in API requests automatically
For programmatic access, use the Cognito SDK:
import boto3

cognito = boto3.client('cognito-idp', region_name='us-east-1')

# USER_PASSWORD_AUTH must be enabled on the Cognito app client for this call to succeed
response = cognito.initiate_auth(
    ClientId='your-client-id',
    AuthFlow='USER_PASSWORD_AUTH',
    AuthParameters={
        'USERNAME': 'user@example.com',
        'PASSWORD': 'password'
    }
)

# The ID token is the JWT sent in the Authorization header
token = response['AuthenticationResult']['IdToken']
Using the Token
Include the JWT in the Authorization header:
Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...
Token Expiration
- ID Token: 1 hour
- Refresh Token: 30 days
When tokens expire, the web application automatically refreshes them using the refresh token.
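Programmatic clients do not get the web application's automatic refresh, so they need their own way to notice that the one-hour ID token is about to lapse. The sketch below reads the standard `exp` claim from the token. The helper name `token_expires_within` and the 60-second margin are assumptions made for illustration, and the payload is decoded without verifying the signature, so this is only suitable for scheduling a refresh, never for trusting a token.

```python
import base64
import json
import time
from typing import Optional


def token_expires_within(
    token: str, margin_seconds: int = 60, now: Optional[float] = None
) -> bool:
    """Return True if the JWT's `exp` claim is within `margin_seconds` of `now`.

    The payload is decoded WITHOUT signature verification; use this only to
    decide when to refresh, not to authenticate anything.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded with padding stripped; restore it
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    current = time.time() if now is None else now
    return claims["exp"] - current <= margin_seconds
```

A caller might check this before each request and obtain a fresh token when it returns True.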
Endpoints
POST /chat
Query the RAG system with natural language. The system automatically extracts client context from the query and applies metadata filtering.
URL: POST /chat
Headers:
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer token from Cognito |
| Content-Type | Yes | application/json |
Request Body:
{
  "query": "What did we discuss with Acme about pricing?",
  "session_id": "optional-session-id",
  "client": "Optional explicit client filter",
  "max_results": 5
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | Natural language question |
| session_id | string | No | auto-generated | Session ID for conversation history |
| client | string | No | extracted from query | Explicit client filter |
| max_results | integer | No | 5 | Number of documents to retrieve (1-20) |
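The field rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: `build_chat_payload` is a hypothetical helper, and omitting unset optional fields (rather than sending null) is an assumption, since the reference does not say how the server treats null values.

```python
from typing import Optional


def build_chat_payload(
    query: str,
    session_id: Optional[str] = None,
    client: Optional[str] = None,
    max_results: int = 5,
) -> dict:
    """Build a POST /chat request body, validating the documented field rules."""
    if not query or not query.strip():
        raise ValueError("query is required")
    if not 1 <= max_results <= 20:
        raise ValueError("max_results must be between 1 and 20")
    payload = {"query": query, "max_results": max_results}
    # Leave optional fields out when unset so the server applies its defaults
    if session_id is not None:
        payload["session_id"] = session_id
    if client is not None:
        payload["client"] = client
    return payload
```

Validating locally avoids spending a request on a body that would come back as 400 BAD_REQUEST.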
Response (Success):
{
  "query": "What did we discuss with Acme about pricing?",
  "answer": "Based on the meeting notes from December 15th, the discussion with Acme Corporation covered several pricing topics...",
  "sources": [
    {
      "document_number": 1,
      "relevance_score": 0.92,
      "snippet": "Meeting with Acme - we discussed the enterprise pricing tier...",
      "metadata": {
        "source": "fathom",
        "client": "Acme Corporation",
        "project": "Enterprise Deal",
        "title": "Acme Pricing Discussion",
        "date": "2024-12-15"
      }
    },
    {
      "document_number": 2,
      "relevance_score": 0.85,
      "snippet": "Follow-up email regarding pricing structure...",
      "metadata": {
        "source": "helpscout",
        "client": "Acme Corporation",
        "project": "Support",
        "title": "RE: Pricing Questions"
      }
    }
  ],
  "filters_applied": {
    "client": "Acme Corporation"
  },
  "session_id": "sess_abc123"
}
| Field | Type | Description |
|---|---|---|
| query | string | Original query (echoed back) |
| answer | string | Generated response grounded in retrieved documents |
| sources | array | List of source documents used for the answer |
| sources[].document_number | integer | Document number for citation |
| sources[].relevance_score | float | Relevance score (0.0-1.0) |
| sources[].snippet | string | Relevant excerpt from the document |
| sources[].metadata | object | Document metadata |
| filters_applied | object | Metadata filters that were applied |
| session_id | string | Session ID for follow-up queries |
Response (Clarification Needed):
When the system cannot confidently determine the client from the query, it returns a clarification request:
{
  "type": "clarification_needed",
  "message": "I found multiple clients that match 'Acme'. Which one did you mean?",
  "options": [
    {
      "id": "Acme Corporation",
      "display_name": "Acme Corporation",
      "entity_type": "client"
    },
    {
      "id": "Acme Labs",
      "display_name": "Acme Labs",
      "entity_type": "client"
    }
  ],
  "original_query": "What did Acme say about pricing?"
}
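A client has to tell this response apart from a normal answer and then ask again once the user picks an option. The sketch below shows one possible way to do that. It assumes, without confirmation from this reference, that resubmitting `original_query` with the chosen option's `id` in the `client` field resolves the ambiguity; both function names are hypothetical.

```python
def is_clarification(response: dict) -> bool:
    """Return True when /chat asked for clarification instead of answering."""
    return response.get("type") == "clarification_needed"


def build_follow_up(response: dict, chosen_id: str) -> dict:
    """Build a new /chat body that repeats the original query with an explicit client.

    Assumption: the option `id` is a valid value for the `client` request field.
    """
    valid_ids = {option["id"] for option in response["options"]}
    if chosen_id not in valid_ids:
        raise ValueError(f"unknown option: {chosen_id}")
    return {"query": response["original_query"], "client": chosen_id}
```

In a UI, `options[].display_name` would be shown to the user and the matching `id` passed to `build_follow_up`.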
Error Responses:
| Status | Code | Description |
|---|---|---|
| 400 | BAD_REQUEST | Invalid request body or missing required fields |
| 401 | UNAUTHORIZED | Missing or invalid JWT token |
| 429 | TOO_MANY_REQUESTS | Rate limit exceeded |
| 500 | INTERNAL_ERROR | Server error (check CloudWatch logs) |
Example Error:
{
  "error": "BAD_REQUEST",
  "message": "Query is required"
}
Webhook Endpoints
Webhook endpoints receive data from external services. These are internal endpoints secured by API key validation.
POST /webhooks/fathom
Receives video transcript webhooks from Fathom.
Authentication: API key in request header or body (configured in Fathom)
Payload: Fathom webhook format (see Fathom documentation)
POST /webhooks/helpscout
Receives conversation webhooks from HelpScout.
Authentication: API key validation (configured in HelpScout)
Payload: HelpScout webhook format (see HelpScout documentation)
POST /webhooks/linear
Receives team and project webhooks from Linear. Used to sync the entity registry for query understanding.
Authentication: Webhook signature validation
Payload: Linear webhook format (see Linear documentation)
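Webhook signature validation generally means recomputing a keyed hash over the raw request body and comparing it with the value the sender supplied. The sketch below assumes a hex-encoded HMAC-SHA256 of the raw body, which is a common scheme; this reference does not state the algorithm or header name used here, so confirm both against the Linear documentation. The function name is hypothetical.

```python
import hashlib
import hmac


def signature_is_valid(secret: str, raw_body: bytes, received_signature: str) -> bool:
    """Check a hex HMAC-SHA256 signature computed over the raw request body.

    Assumption: the sender signs the exact bytes of the body with a shared secret.
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking match position
    return hmac.compare_digest(expected.encode("utf-8"), received_signature.encode("utf-8"))
```

The check must run on the bytes as received, before any JSON parsing or re-serialization, because re-encoding can change the body and invalidate the signature.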
Rate Limits
The API enforces rate limits to ensure fair usage:
| Limit | Value |
|---|---|
| Rate | 10 requests/second |
| Burst | 20 requests |
When rate limited, the API returns HTTP 429:
{
  "error": "TOO_MANY_REQUESTS",
  "message": "Rate limit exceeded. Please wait and try again.",
  "retry_after": 1
}
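A steady rate plus a burst allowance maps naturally onto a token bucket. The sketch below is a client-side limiter a caller could use to stay under the documented 10 requests/second and burst of 20. It is an illustration of the technique, not a description of how the server enforces the limit, and the class name and injectable `clock` parameter are assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side token bucket: `rate` tokens per second, at most `burst` stored."""

    def __init__(
        self,
        rate: float = 10.0,
        burst: int = 20,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False if the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Calling `try_acquire()` before each request and sleeping briefly when it returns False keeps a busy client from hitting HTTP 429 in the first place.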
Error Codes
| HTTP Status | Error Code | Description | Resolution |
|---|---|---|---|
| 200 | - | Success | N/A |
| 400 | BAD_REQUEST | Invalid request | Check request body format |
| 401 | UNAUTHORIZED | Authentication failed | Refresh JWT token |
| 403 | FORBIDDEN | Access denied | Check user permissions |
| 429 | TOO_MANY_REQUESTS | Rate limited | Wait and retry |
| 500 | INTERNAL_ERROR | Server error | Check CloudWatch logs |
| 503 | SERVICE_UNAVAILABLE | Bedrock/KB unavailable | Retry after delay |
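Client code often turns these responses into a typed exception so callers can branch on the code instead of parsing messages. The sketch below is one way to do that. `ApiError` and `error_from_response` are hypothetical names, the `UNKNOWN` fallback is an assumption for bodies that lack an `error` field, and only 429 and 503 are treated as retryable because those are the rows whose resolution is to wait and retry.

```python
# Statuses whose documented resolution is to wait and retry
RETRYABLE_STATUSES = {429, 503}


class ApiError(Exception):
    """An error response from the API, carrying the status, code, and message."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUSES


def error_from_response(status: int, body: dict) -> ApiError:
    """Build an ApiError from an HTTP status and the parsed JSON error body."""
    return ApiError(status, body.get("error", "UNKNOWN"), body.get("message", ""))
```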
Query Understanding
The system automatically extracts client context from natural language queries. This enables users to ask questions naturally without needing to select filters from dropdowns.
How It Works
- User submits query: “What did we discuss with Acme?”
- System extracts client mention: “Acme”
- System matches to known client: “Acme Corporation”
- System applies filter and retrieves relevant documents
Supported Patterns
Query understanding recognizes several ways users might mention clients:
- Direct name: “What about Acme Corporation?”
- Abbreviation: “What did Acme say?”
- Aliases: “Tell me about AC” (if “AC” is registered as an alias)
Confidence and Clarification
When the system has low confidence or finds ambiguous matches, it returns a clarification request rather than potentially incorrect results.
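To make the matching and clarification behavior concrete, here is a toy resolver. It is not the system's actual implementation, which this reference does not describe: the registry shape (client name mapped to a list of aliases), the prefix rule for abbreviations, and the function name `resolve_client` are all assumptions, and "low confidence" is reduced to "more than one candidate".

```python
from typing import Dict, List


def resolve_client(mention: str, registry: Dict[str, List[str]]) -> dict:
    """Resolve a client mention against a registry of {client name: [aliases]}.

    Returns {"client": name}, {"client": None}, or a clarification request
    shaped like the documented response.
    """
    needle = mention.strip().lower()
    # Direct names and registered aliases take priority over abbreviations
    exact = [
        name
        for name, aliases in registry.items()
        if needle == name.lower() or needle in {alias.lower() for alias in aliases}
    ]
    # Treat a leading word or words of a client name as an abbreviation
    candidates = exact or [
        name for name in registry if name.lower().startswith(needle + " ")
    ]
    if not candidates:
        return {"client": None}
    if len(candidates) == 1:
        return {"client": candidates[0]}
    return {
        "type": "clarification_needed",
        "options": [
            {"id": name, "display_name": name, "entity_type": "client"}
            for name in candidates
        ],
    }
```

With a registry containing "Acme Corporation" (alias "AC") and "Acme Labs", the mention "AC" resolves directly, while "Acme" matches both clients and yields a clarification request.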
Best Practices
Query Formulation
- Be specific: “What was discussed in the December meeting with Acme?” is better than “meetings”
- Include context: Mention client names, project names, or dates when relevant
- Ask one question at a time: Complex multi-part questions may yield less focused answers
Session Management
- Use a consistent session_id for follow-up questions in a conversation
- Start a new session for unrelated queries
- Sessions maintain context for 24 hours
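These rules can be wrapped in a small helper that remembers the `session_id` returned by the first response and reuses it for follow-ups. The sketch below is illustrative only: `ChatSession` and its `send` callable (anything that posts a body to /chat and returns the parsed JSON) are hypothetical, and it assumes the 24-hour window is measured from the last request, which this reference does not specify.

```python
import time
from typing import Callable, Optional

SESSION_TTL_SECONDS = 24 * 60 * 60  # documented context lifetime


class ChatSession:
    """Reuse one session_id for follow-ups, starting fresh after the TTL."""

    def __init__(
        self,
        send: Callable[[dict], dict],
        clock: Callable[[], float] = time.time,
    ):
        self._send = send
        self._clock = clock
        self.session_id: Optional[str] = None
        self._last_used = 0.0

    def ask(self, query: str) -> dict:
        now = self._clock()
        expired = now - self._last_used >= SESSION_TTL_SECONDS
        if self.session_id is not None and expired:
            self.session_id = None  # assume the server-side context is gone
        body = {"query": query}
        if self.session_id is not None:
            body["session_id"] = self.session_id
        response = self._send(body)
        # Keep whatever session_id the server returned for the next question
        self.session_id = response.get("session_id", self.session_id)
        self._last_used = now
        return response
```

Creating a new `ChatSession` for each unrelated topic keeps earlier conversation history from influencing later answers.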
Error Handling
- Implement exponential backoff for 429 and 503 errors
- Cache responses when appropriate to reduce API calls
- Log correlation IDs from responses for debugging
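The backoff advice can be sketched as a small retry wrapper. This is a minimal example under stated assumptions: `request` is a hypothetical callable returning a `(status_code, body)` tuple, the delay parameters are arbitrary defaults, and the wrapper prefers the `retry_after` value from a 429 body when one is present, falling back to exponential delay with full jitter.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    request: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call `request`, retrying 429 and 503 responses with growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = request()
        is_last = attempt == max_attempts - 1
        if status not in (429, 503) or is_last:
            return status, body
        # Use the server's hint when present, else exponential delay with jitter
        delay = body.get("retry_after") if isinstance(body, dict) else None
        if delay is None:
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        sleep(delay)
    raise AssertionError("unreachable")
```

Jitter spreads retries from many clients over time so they do not all return at the same instant and trigger another round of throttling.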
SDK Examples
Python
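import requests
from typing import Optional

API_URL = "https://your-api-id.execute-api.us-east-1.amazonaws.com"
TOKEN = "your-jwt-token"


def query_rag(query: str, session_id: Optional[str] = None) -> dict:
    """Query the RAG system."""
    payload = {"query": query}
    # Only send session_id when continuing an existing conversation
    if session_id is not None:
        payload["session_id"] = session_id
    response = requests.post(
        f"{API_URL}/chat",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        json=payload,
        timeout=30,  # avoid hanging indefinitely on a stalled connection
    )
    response.raise_for_status()
    return response.json()


# Example usage
result = query_rag("What did we discuss with Acme about pricing?")
if result.get("type") == "clarification_needed":
    # The system could not determine the client; show its question instead
    print(result["message"])
else:
    print(result["answer"])
    for source in result["sources"]:
        print(f"[{source['document_number']}] {source['snippet'][:100]}...")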
JavaScript/TypeScript
const API_URL = 'https://your-api-id.execute-api.us-east-1.amazonaws.com';
const token = 'your-jwt-token';

interface ChatResponse {
  query: string;
  answer: string;
  sources: Array<{
    document_number: number;
    relevance_score: number;
    snippet: string;
    metadata: Record<string, string>;
  }>;
  filters_applied: Record<string, string>;
  session_id: string;
}

async function queryRAG(
  query: string,
  sessionId?: string
): Promise<ChatResponse> {
  const response = await fetch(`${API_URL}/chat`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    // JSON.stringify drops session_id when sessionId is undefined
    body: JSON.stringify({
      query,
      session_id: sessionId,
    }),
  });
  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }
  return response.json();
}

// Example usage (top-level await requires an ES module context)
const result = await queryRAG('What did we discuss with Acme?');
console.log(result.answer);
cURL
curl -X POST "https://your-api-id.execute-api.us-east-1.amazonaws.com/chat" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What did we discuss with Acme about pricing?",
    "max_results": 5
  }'
Changelog
| Date | Change |
|---|---|
| 2026-01-01 | Updated for client-only filtering (removed project filter) |
| 2025-12-31 | Added clarification response format |
| 2025-12-29 | Added reranking for improved relevance |
| 2025-12-28 | Initial API documentation |
Last updated: 2026-01-01