API Reference

Complete API documentation for the NorthBuilt RAG System.

Table of Contents

  1. Overview
  2. Authentication
    1. Obtaining a Token
    2. Using the Token
    3. Token Expiration
  3. Endpoints
    1. POST /chat
    2. Webhook Endpoints
      1. POST /webhooks/fathom
      2. POST /webhooks/helpscout
      3. POST /webhooks/linear
  4. Rate Limits
  5. Error Codes
  6. Query Understanding
    1. How It Works
    2. Supported Patterns
    3. Confidence and Clarification
  7. Best Practices
    1. Query Formulation
    2. Session Management
    3. Error Handling
  8. SDK Examples
    1. Python
    2. JavaScript/TypeScript
    3. cURL
  9. Changelog

Overview

The NorthBuilt RAG System exposes a REST API via AWS API Gateway. The /chat endpoint requires JWT authentication via AWS Cognito; the webhook endpoints are secured by API key or signature validation instead.

Base URL: https://{api-id}.execute-api.us-east-1.amazonaws.com


Authentication

All requests to /chat require a valid JWT from AWS Cognito in the Authorization header.

Obtaining a Token

Users authenticate via Google OAuth through the Cognito Hosted UI:

  1. Navigate to the web application
  2. Click “Sign in with Google”
  3. Complete the Google OAuth flow
  4. The token is stored in the browser and included in API requests automatically

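For programmatic access, call Cognito directly with boto3 (the AWS SDK for Python). The USER_PASSWORD_AUTH flow works only for users who have a password in the Cognito user pool, and only if the app client allows that flow; users who sign in through Google cannot authenticate this way.

import boto3

cognito = boto3.client('cognito-idp', region_name='us-east-1')
response = cognito.initiate_auth(
    ClientId='your-client-id',  # Cognito app client ID
    AuthFlow='USER_PASSWORD_AUTH',
    AuthParameters={
        'USERNAME': 'user@example.com',
        'PASSWORD': 'password'
        # If the app client has a client secret, SECRET_HASH is also required here
    }
)
# The ID token is the JWT sent in the Authorization header;
# keep the refresh token to obtain new ID tokens later
token = response['AuthenticationResult']['IdToken']
refresh_token = response['AuthenticationResult']['RefreshToken']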

Using the Token

Include the JWT in the Authorization header:

Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...

Token Expiration

  • ID Token: 1 hour
  • Refresh Token: 30 days

When tokens expire, the web application automatically refreshes them using the refresh token.
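Programmatic clients have to handle expiry themselves. A minimal sketch, assuming you kept the refresh token returned by `initiate_auth` and pass in a boto3 `cognito-idp` client: read the ID token's `exp` claim (decoded without signature verification, which is only acceptable for deciding *when* to refresh) and call `initiate_auth` with the `REFRESH_TOKEN_AUTH` flow shortly before it expires. The helper names and the 60-second margin are illustrative choices, not part of this API.

```python
import base64
import json
import time
from typing import Any, Optional


def jwt_expiry(token: str) -> int:
    """Return the `exp` claim (Unix seconds) of a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    # JWT segments are unpadded base64url; restore padding before decoding
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    return int(payload["exp"])


def needs_refresh(token: str, skew_seconds: int = 60,
                  now: Optional[float] = None) -> bool:
    """True if the token expires within `skew_seconds` (hypothetical helper)."""
    current = time.time() if now is None else now
    return jwt_expiry(token) - current <= skew_seconds


def refresh_id_token(cognito: Any, client_id: str, refresh_token: str) -> str:
    """Exchange a Cognito refresh token for a new ID token.

    `cognito` is a client from boto3.client('cognito-idp', ...).
    If the app client has a secret, AuthParameters also needs SECRET_HASH.
    """
    response = cognito.initiate_auth(
        ClientId=client_id,
        AuthFlow="REFRESH_TOKEN_AUTH",
        AuthParameters={"REFRESH_TOKEN": refresh_token},
    )
    return response["AuthenticationResult"]["IdToken"]
```

Call `needs_refresh` before each request and swap in the new ID token when it returns True; the refresh token itself stays valid for the 30 days listed above.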


Endpoints

POST /chat

Query the RAG system with natural language. The system automatically extracts client context from the query and applies metadata filtering.

URL: POST /chat

Headers:

| Header | Required | Description |
|--------|----------|-------------|
| Authorization | Yes | Bearer token from Cognito |
| Content-Type | Yes | application/json |

Request Body:

{
  "query": "What did we discuss with Acme about pricing?",
  "session_id": "optional-session-id",
  "client": "Optional explicit client filter",
  "max_results": 5
}

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| query | string | Yes | - | Natural language question |
| session_id | string | No | auto-generated | Session ID for conversation history |
| client | string | No | extracted from query | Explicit client filter |
| max_results | integer | No | 5 | Number of documents to retrieve (1-20) |
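The field rules in the request table can be checked on the client before sending. A minimal sketch; `build_chat_request` is a hypothetical helper and the server remains the authority on validation. It rejects an empty `query`, keeps `max_results` within 1-20, and leaves out optional fields that were not supplied so the documented defaults apply.

```python
from typing import Any, Dict, Optional


def build_chat_request(
    query: str,
    session_id: Optional[str] = None,
    client: Optional[str] = None,
    max_results: int = 5,
) -> Dict[str, Any]:
    """Build a POST /chat body following the documented field rules."""
    if not query or not query.strip():
        raise ValueError("query is required")
    if not 1 <= max_results <= 20:
        raise ValueError("max_results must be between 1 and 20")
    body: Dict[str, Any] = {"query": query, "max_results": max_results}
    # Omit optional fields entirely so the server applies its defaults
    # (auto-generated session, client extracted from the query)
    if session_id is not None:
        body["session_id"] = session_id
    if client is not None:
        body["client"] = client
    return body
```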

Response (Success):

{
  "query": "What did we discuss with Acme about pricing?",
  "answer": "Based on the meeting notes from December 15th, the discussion with Acme Corporation covered several pricing topics...",
  "sources": [
    {
      "document_number": 1,
      "relevance_score": 0.92,
      "snippet": "Meeting with Acme - we discussed the enterprise pricing tier...",
      "metadata": {
        "source": "fathom",
        "client": "Acme Corporation",
        "project": "Enterprise Deal",
        "title": "Acme Pricing Discussion",
        "date": "2024-12-15"
      }
    },
    {
      "document_number": 2,
      "relevance_score": 0.85,
      "snippet": "Follow-up email regarding pricing structure...",
      "metadata": {
        "source": "helpscout",
        "client": "Acme Corporation",
        "project": "Support",
        "title": "RE: Pricing Questions"
      }
    }
  ],
  "filters_applied": {
    "client": "Acme Corporation"
  },
  "session_id": "sess_abc123"
}

| Field | Type | Description |
|-------|------|-------------|
| query | string | Original query (echoed back) |
| answer | string | Generated response grounded in retrieved documents |
| sources | array | List of source documents used for the answer |
| sources[].document_number | integer | Document number for citation |
| sources[].relevance_score | float | Relevance score (0.0-1.0) |
| sources[].snippet | string | Relevant excerpt from the document |
| sources[].metadata | object | Document metadata |
| filters_applied | object | Metadata filters that were applied |
| session_id | string | Session ID for follow-up queries |
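Because each source carries a `document_number` and `metadata`, a client can render a citation list next to the answer. A sketch assuming the success response shape shown above; `format_sources` is a hypothetical helper. Metadata keys are read defensively because fields such as `date` can be missing (the HelpScout source in the example has none).

```python
from typing import Any, Dict, List


def format_sources(response: Dict[str, Any]) -> List[str]:
    """Turn the `sources` array of a /chat response into citation lines."""
    lines = []
    for source in response.get("sources", []):
        meta = source.get("metadata", {})
        # Skip optional metadata values that are missing or empty
        details = ", ".join(
            value for value in (meta.get("source"), meta.get("date")) if value
        )
        title = meta.get("title", "Untitled")
        suffix = f" ({details})" if details else ""
        lines.append(
            f"[{source['document_number']}] {title}{suffix} "
            f"score={source['relevance_score']:.2f}"
        )
    return lines
```

Applied to the example response, the first line is `[1] Acme Pricing Discussion (fathom, 2024-12-15) score=0.92`.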

Response (Clarification Needed):

When the system cannot confidently determine the client from the query, it returns a clarification request:

{
  "type": "clarification_needed",
  "message": "I found multiple clients that match 'Acme'. Which one did you mean?",
  "options": [
    {
      "id": "Acme Corporation",
      "display_name": "Acme Corporation",
      "entity_type": "client"
    },
    {
      "id": "Acme Labs",
      "display_name": "Acme Labs",
      "entity_type": "client"
    }
  ],
  "original_query": "What did Acme say about pricing?"
}
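Both response shapes come from the same endpoint, so clients should branch on the `type` field, which the success response does not include. One way to resolve a clarification, assuming that sending the chosen option's `id` back as the explicit `client` filter is accepted: re-send `original_query` with `client` set. `is_clarification`, `resolve_clarification`, and the `choose` callback are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Optional


def is_clarification(response: Dict[str, Any]) -> bool:
    """A normal answer has no `type`; a clarification request does."""
    return response.get("type") == "clarification_needed"


def resolve_clarification(
    response: Dict[str, Any],
    choose: Callable[[str, List[Dict[str, Any]]], Optional[str]],
    session_id: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
    """Build a follow-up /chat body from the user's chosen option.

    `choose` receives the message and options (for example, to show a
    picker) and returns the selected option id, or None to cancel.
    """
    choice = choose(response["message"], response["options"])
    if choice is None:
        return None
    valid_ids = {option["id"] for option in response["options"]}
    if choice not in valid_ids:
        raise ValueError(f"unknown option: {choice}")
    body: Dict[str, Any] = {"query": response["original_query"], "client": choice}
    if session_id is not None:
        body["session_id"] = session_id
    return body
```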

Error Responses:

| Status | Code | Description |
|--------|------|-------------|
| 400 | BAD_REQUEST | Invalid request body or missing required fields |
| 401 | UNAUTHORIZED | Missing or invalid JWT token |
| 429 | TOO_MANY_REQUESTS | Rate limit exceeded |
| 500 | INTERNAL_ERROR | Server error (check CloudWatch logs) |

Example Error:

{
  "error": "BAD_REQUEST",
  "message": "Query is required"
}

Webhook Endpoints

Webhook endpoints receive data from external services. They are internal endpoints that do not use Cognito JWTs; each is secured by API key or signature validation, as described below.

POST /webhooks/fathom

Receives video transcript webhooks from Fathom.

Authentication: API key in request header or body (configured in Fathom)

Payload: Fathom webhook format (see Fathom documentation)

POST /webhooks/helpscout

Receives conversation webhooks from HelpScout.

Authentication: API key validation (configured in HelpScout)

Payload: HelpScout webhook format (see HelpScout documentation)

POST /webhooks/linear

Receives team and project webhooks from Linear. Used to sync the entity registry for query understanding.

Authentication: Webhook signature validation

Payload: Linear webhook format (see Linear documentation)
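Signature validation generally means recomputing an HMAC over the raw request body and comparing it in constant time. A sketch, assuming Linear's documented scheme of a hex-encoded HMAC-SHA256 of the raw body, keyed with the webhook signing secret and sent in the `Linear-Signature` header; confirm the header name and algorithm against Linear's documentation. `verify_linear_signature` is a hypothetical name, not this system's handler.

```python
import hashlib
import hmac


def verify_linear_signature(raw_body: bytes, signature_header: str,
                            secret: str) -> bool:
    """Check a hex HMAC-SHA256 signature over the exact bytes received."""
    if not signature_header:
        return False
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = signature_header.strip().lower()
    # compare_digest avoids leaking how many characters matched via timing
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))
```

Verify against the raw bytes before parsing JSON: re-serializing the payload can change whitespace or key order and invalidate the signature.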


Rate Limits

The API enforces rate limits to ensure fair usage:

| Limit | Value |
|-------|-------|
| Rate | 10 requests/second |
| Burst | 20 requests |

When rate limited, the API returns HTTP 429:

{
  "error": "TOO_MANY_REQUESTS",
  "message": "Rate limit exceeded. Please wait and try again.",
  "retry_after": 1
}
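A steady rate plus a burst allowance is the shape of a token bucket (the algorithm API Gateway throttling uses), so a client can pace itself to stay under the limits. A minimal client-side sketch, assuming the 10 requests/second and burst of 20 above; `TokenBucket` is hypothetical and only smooths your own traffic, so 429 handling is still required.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side token bucket: `rate` tokens/second, at most `burst` stored."""

    def __init__(self, rate: float = 10.0, burst: int = 20,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Consume one token if available; never blocks."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until one token will be available."""
        self._refill()
        return max(0.0, (1 - self.tokens) / self.rate)
```

Before each request, call `try_acquire()`; if it returns False, sleep for `wait_time()` and try again.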

Error Codes

| HTTP Status | Error Code | Description | Resolution |
|-------------|------------|-------------|------------|
| 200 | - | Success | N/A |
| 400 | BAD_REQUEST | Invalid request | Check request body format |
| 401 | UNAUTHORIZED | Authentication failed | Refresh JWT token |
| 403 | FORBIDDEN | Access denied | Check user permissions |
| 429 | TOO_MANY_REQUESTS | Rate limited | Wait and retry |
| 500 | INTERNAL_ERROR | Server error | Check CloudWatch logs |
| 503 | SERVICE_UNAVAILABLE | Bedrock/KB unavailable | Retry after delay |
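Error bodies share the `{"error": ..., "message": ...}` shape, so a client can map them onto one exception type and decide from the status whether a retry makes sense. A sketch; `RAGAPIError`, `parse_error`, and the retryable set (429 and 503, following the Resolution column) are illustrative choices. Non-JSON bodies, such as a gateway error page, fall back to an `UNKNOWN` code.

```python
import json
from typing import Optional

# Statuses whose documented resolution is to wait and retry
RETRYABLE_STATUSES = {429, 503}


class RAGAPIError(Exception):
    """A non-2xx response from the RAG API."""

    def __init__(self, status: int, code: str, message: str,
                 retry_after: Optional[float] = None) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.retry_after = retry_after

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUSES


def parse_error(status: int, body_text: str) -> RAGAPIError:
    """Build an error from a non-2xx response, tolerating non-JSON bodies."""
    try:
        body = json.loads(body_text)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        body = {}
    if not isinstance(body, dict):
        body = {}
    return RAGAPIError(
        status=status,
        code=body.get("error", "UNKNOWN"),
        message=body.get("message", body_text),
        retry_after=body.get("retry_after"),
    )
```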

Query Understanding

The system automatically extracts client context from natural language queries. This enables users to ask questions naturally without needing to select filters from dropdowns.

How It Works

  1. User submits query: “What did we discuss with Acme?”
  2. System extracts client mention: “Acme”
  3. System matches to known client: “Acme Corporation”
  4. System applies filter and retrieves relevant documents

Supported Patterns

The query understanding recognizes various ways users might mention clients:

  • Direct name: “What about Acme Corporation?”
  • Shortened name: “What did Acme say?”
  • Aliases: “Tell me about AC” (if “AC” is registered as an alias)
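To make the steps and patterns above concrete, here is an illustrative matcher, not the system's actual implementation. It looks up a mention against a registry of canonical client names and aliases (the kind of registry the Linear webhook keeps in sync) and also accepts a mention that equals the leading word(s) of a canonical name. One candidate means the filter can be applied; several mean a clarification request. `match_client` and the registry layout are assumptions.

```python
from typing import Dict, List


def match_client(mention: str, registry: Dict[str, List[str]]) -> List[str]:
    """Return canonical client names matching a mention.

    `registry` maps canonical name -> registered aliases, e.g.
    {"Acme Corporation": ["AC"], "Acme Labs": []}.
    """
    needle = mention.strip().casefold()
    if not needle:
        return []
    needle_words = needle.split()
    exact, partial = [], []
    for canonical, aliases in registry.items():
        if any(needle == name.casefold() for name in [canonical, *aliases]):
            exact.append(canonical)
        elif canonical.casefold().split()[:len(needle_words)] == needle_words:
            # Shortened name: the mention matches the leading word(s)
            partial.append(canonical)
    # An exact name or alias hit wins over shortened-name matches
    return sorted(exact) if exact else sorted(partial)
```

With both Acme entries registered, `match_client("Acme", ...)` returns two candidates, which corresponds to the clarification example under POST /chat.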

Confidence and Clarification

When the system has low confidence or finds ambiguous matches, it returns a clarification request rather than potentially incorrect results.


Best Practices

Query Formulation

  • Be specific: “What was discussed in the December meeting with Acme?” is better than “meetings”
  • Include context: Mention client names, project names, or dates when relevant
  • Ask one question at a time: Complex multi-part questions may yield less focused answers

Session Management

  • Reuse the same session_id for follow-up questions in a conversation
  • Start a new session for unrelated queries
  • Sessions maintain context for 24 hours
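A small client-side helper can apply these rules: reuse the `session_id` the server returns, and drop it after 24 hours or when the user changes topic. A sketch; `SessionTracker` and the `new_topic` flag are hypothetical. The document does not say whether the 24 hours count from creation or last use, so this conservatively measures from when the session was first seen; the server enforces the real lifetime regardless.

```python
import time
from typing import Any, Callable, Dict, Optional

SESSION_TTL_SECONDS = 24 * 60 * 60


class SessionTracker:
    """Remembers the server-issued session_id for follow-up questions."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock
        self.session_id: Optional[str] = None
        self.started_at = 0.0

    def current(self, new_topic: bool = False) -> Optional[str]:
        """Session id to send, or None to let the server start a new one."""
        expired = self.clock() - self.started_at >= SESSION_TTL_SECONDS
        if new_topic or expired:
            self.session_id = None
        return self.session_id

    def record(self, response: Dict[str, Any]) -> None:
        """Store the session_id from a /chat response."""
        sid = response.get("session_id")
        if sid and sid != self.session_id:
            # A newly issued session starts the 24-hour window
            self.session_id = sid
            self.started_at = self.clock()
```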

Error Handling

  • Implement exponential backoff for 429 and 503 errors
  • Cache responses when appropriate to reduce API calls
  • Log correlation IDs from responses for debugging
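Exponential backoff for 429 and 503 might look like the following, using "full jitter" (a random delay up to an exponentially growing ceiling) and treating a `retry_after` value in the body as a minimum wait. A sketch: `send` is a hypothetical callable returning `(status, body_dict)`, and the base delay, cap, and attempt count are illustrative values.

```python
import random
import time
from typing import Any, Callable, Dict, Optional, Tuple

# Statuses the guidance above says to retry
RETRYABLE = {429, 503}


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 0.5, cap: float = 20.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter exponential backoff; a server `retry_after` sets a floor."""
    source = rng if rng is not None else random
    delay = source.uniform(0, min(cap, base * 2 ** attempt))
    if retry_after is not None:
        delay = max(delay, float(retry_after))
    return delay


def call_with_retries(
    send: Callable[[], Tuple[int, Dict[str, Any]]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Dict[str, Any]]:
    """Retry `send` on 429/503 until success, another error, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        if status not in RETRYABLE or attempt >= max_attempts:
            return status, body
        sleep(backoff_delay(attempt - 1, body.get("retry_after")))
```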

SDK Examples

Python

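from typing import Optional

import requests

API_URL = "https://your-api-id.execute-api.us-east-1.amazonaws.com"
TOKEN = "your-jwt-token"  # Cognito ID token

def query_rag(query: str, session_id: Optional[str] = None) -> dict:
    """Query the RAG system."""
    payload = {"query": query}
    # Omit session_id instead of sending null so the server generates one
    if session_id is not None:
        payload["session_id"] = session_id
    response = requests.post(
        f"{API_URL}/chat",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json"
        },
        json=payload,
        timeout=30,  # seconds; avoid hanging forever on a stalled connection
    )
    response.raise_for_status()
    return response.json()

# Example usage
result = query_rag("What did we discuss with Acme about pricing?")
# A clarification response has no "answer" or "sources"; check for it first
if result.get("type") == "clarification_needed":
    print(result["message"])
else:
    print(result["answer"])
    for source in result["sources"]:
        print(f"[{source['document_number']}] {source['snippet'][:100]}...")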

JavaScript/TypeScript

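// Replace with your API Gateway URL and a Cognito ID token
const API_URL = 'https://your-api-id.execute-api.us-east-1.amazonaws.com';
const token = 'your-jwt-token';

interface ChatResponse {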
  query: string;
  answer: string;
  sources: Array<{
    document_number: number;
    relevance_score: number;
    snippet: string;
    metadata: Record<string, string>;
  }>;
  filters_applied: Record<string, string>;
  session_id: string;
}

async function queryRAG(
  query: string,
  sessionId?: string
): Promise<ChatResponse> {
  const response = await fetch(`${API_URL}/chat`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      query,
      session_id: sessionId,
    }),
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  return response.json();
}

// Example usage (top-level await requires an ES module or an async function)
const result = await queryRAG('What did we discuss with Acme?');
console.log(result.answer);

cURL

curl -X POST "https://your-api-id.execute-api.us-east-1.amazonaws.com/chat" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What did we discuss with Acme about pricing?",
    "max_results": 5
  }'

Changelog

| Date | Change |
|------|--------|
| 2026-01-01 | Updated for client-only filtering (removed project filter) |
| 2025-12-31 | Added clarification response format |
| 2025-12-29 | Added reranking for improved relevance |
| 2025-12-28 | Initial API documentation |

Last updated: 2026-01-01