V1 — Chat & Agent API Endpoints

This is the primary HTTP API surface for the chatbot agent system. It exposes both a custom streaming chat endpoint and an OpenAI-compatible /chat/completions endpoint so it works as a drop-in backend for OpenWebUI, LibreChat, or any OpenAI-compatible client.

Endpoints

Method	Path	Description
`GET`	`/v1/`	Health check — returns `{"status": "ok"}`
`GET`	`/v1/agents`	List all registered agents (id + description)
`GET`	`/v1/models`	OpenAI-compatible model list (one entry per agent)
`POST`	`/v1/chat`	Chat with an agent — streaming (SSE)
`POST`	`/v1/chat/sync`	Chat with an agent — non-streaming
`POST`	`/v1/chat/completions`	OpenAI-compatible chat completions (supports `stream: true`)

All /v1/* endpoints are mounted by main.py via:

app.include_router(v1_router, prefix="/v1")

Agent Resolution

Each request can target a specific agent. The resolution order is:

Explicit agent_id field in the request body
OpenAI model field (OpenWebUI sends this — mapped to agent_id if a matching agent is registered)
Fallback to the "naked" agent (a plain LLM with no tools)

This means an OpenWebUI client can simply set model: "media-agent" and get the full Media Agent with Seerr tools.

Request Flow

Client (OpenWebUI / HTTP)
  │  POST /v1/chat/completions
  │  { model: "media-agent", messages: [...], stream: true/false }
  ▼
chat_completions()
  │  1. _resolve_agent(req.model) → Agent(id="media-agent", skills=[...])
  │  2. get_agent_graph("media-agent", request)
  │     → lazy-compiled LangGraph StateGraph, cached on app.state
  │  3. stream=True  → _stream_graph(graph, messages)  → SSE token stream
  │     stream=False → _invoke_graph(graph, messages)   → plain response
  ▼
LangGraph StateGraph  (src/graph.py)
  │
  ├── agent_node: calls LLM with system prompt + tool definitions
  │   └── LLM returns text OR tool_calls
  │
  ├── _should_continue: if tool_calls → tool_node, else → END
  │
  └── tool_node: executes tool via agents/skills system → ToolMessage
      └── loops back to agent_node with the result

For a detailed walkthrough, see api.md.

Streaming

Two streaming modes exist:

SSE (Server-Sent Events) — `/v1/chat`

data: {"token": "Here"}
data: {"token": " are"}
data: {"token": " the"}
...
data: [DONE]

The graph runs to completion (tools execute silently), then the final text is yielded token-by-token as SSE events.

OpenAI-compatible — `/v1/chat/completions` with `stream: true`

data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"!"}}]}
data: [DONE]

Future improvement: true token-level streaming (tokens appear as the LLM generates them) would require using langchain-openai's ChatOpenAI in place of the raw openai client. The current approach avoids adding that dependency.

Dependencies

Endpoints receive shared singletons via FastAPI Depends:

get_llm_client(request) → returns request.app.state.llm_client (OpenAI client singleton, created once in main.py)
get_agent_graph(agent_id, request) → returns a lazy-compiled LangGraph from request.app.state.agent_graphs

3.4 KiB Raw Permalink Blame History