Files

3.4 KiB

V1 — Chat & Agent API Endpoints

This is the primary HTTP API surface for the chatbot agent system. It exposes both a custom streaming chat endpoint and an OpenAI-compatible /chat/completions endpoint so it works as a drop-in backend for OpenWebUI, LibreChat, or any OpenAI-compatible client.


Endpoints

Method Path Description
GET /v1/ Health check — returns {"status": "ok"}
GET /v1/agents List all registered agents (id + description)
GET /v1/models OpenAI-compatible model list (one entry per agent)
POST /v1/chat Chat with an agent — streaming (SSE)
POST /v1/chat/sync Chat with an agent — non-streaming
POST /v1/chat/completions OpenAI-compatible chat completions (supports stream: true)

All /v1/* endpoints are mounted by main.py via:

app.include_router(v1_router, prefix="/v1")

Agent Resolution

Each request can target a specific agent. The resolution order is:

  1. Explicit agent_id field in the request body
  2. OpenAI model field (OpenWebUI sends this — mapped to agent_id if a matching agent is registered)
  3. Fallback to the "naked" agent (a plain LLM with no tools)

This means an OpenWebUI client can simply set model: "media-agent" and get the full Media Agent with Seerr tools.


Request Flow

Client (OpenWebUI / HTTP)
  │  POST /v1/chat/completions
  │  { model: "media-agent", messages: [...], stream: true/false }
  ▼
chat_completions()
  │  1. _resolve_agent(req.model) → Agent(id="media-agent", skills=[...])
  │  2. get_agent_graph("media-agent", request)
  │     → lazy-compiled LangGraph StateGraph, cached on app.state
  │  3. stream=True  → _stream_graph(graph, messages)  → SSE token stream
  │     stream=False → _invoke_graph(graph, messages)   → plain response
  ▼
LangGraph StateGraph  (src/graph.py)
  │
  ├── agent_node: calls LLM with system prompt + tool definitions
  │   └── LLM returns text OR tool_calls
  │
  ├── _should_continue: if tool_calls → tool_node, else → END
  │
  └── tool_node: executes tool via agents/skills system → ToolMessage
      └── loops back to agent_node with the result

For a detailed walkthrough, see api.md.


Streaming

Two streaming modes exist:

SSE (Server-Sent Events) — /v1/chat

data: {"token": "Here"}
data: {"token": " are"}
data: {"token": " the"}
...
data: [DONE]

The graph runs to completion (tools execute silently), then the final text is yielded token-by-token as SSE events.

OpenAI-compatible — /v1/chat/completions with stream: true

data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"!"}}]}
data: [DONE]

Future improvement: true token-level streaming (tokens appear as the LLM generates them) would require using langchain-openai's ChatOpenAI in place of the raw openai client. The current approach avoids adding that dependency.


Dependencies

Endpoints receive shared singletons via FastAPI Depends:

  • get_llm_client(request) → returns request.app.state.llm_client (OpenAI client singleton, created once in main.py)
  • get_agent_graph(agent_id, request) → returns a lazy-compiled LangGraph from request.app.state.agent_graphs