3.4 KiB
V1 — Chat & Agent API Endpoints
This is the primary HTTP API surface for the chatbot agent system. It exposes
both a custom streaming chat endpoint and an OpenAI-compatible
/chat/completions endpoint so it works as a drop-in backend for OpenWebUI,
LibreChat, or any OpenAI-compatible client.
Endpoints
| Method | Path | Description |
|---|---|---|
GET |
/v1/ |
Health check — returns {"status": "ok"} |
GET |
/v1/agents |
List all registered agents (id + description) |
GET |
/v1/models |
OpenAI-compatible model list (one entry per agent) |
POST |
/v1/chat |
Chat with an agent — streaming (SSE) |
POST |
/v1/chat/sync |
Chat with an agent — non-streaming |
POST |
/v1/chat/completions |
OpenAI-compatible chat completions (supports stream: true) |
All /v1/* endpoints are mounted by main.py via:
app.include_router(v1_router, prefix="/v1")
Agent Resolution
Each request can target a specific agent. The resolution order is:
- Explicit
agent_idfield in the request body - OpenAI
modelfield (OpenWebUI sends this — mapped toagent_idif a matching agent is registered) - Fallback to the
"naked"agent (a plain LLM with no tools)
This means an OpenWebUI client can simply set model: "media-agent" and get
the full Media Agent with Seerr tools.
Request Flow
Client (OpenWebUI / HTTP)
│ POST /v1/chat/completions
│ { model: "media-agent", messages: [...], stream: true/false }
▼
chat_completions()
│ 1. _resolve_agent(req.model) → Agent(id="media-agent", skills=[...])
│ 2. get_agent_graph("media-agent", request)
│ → lazy-compiled LangGraph StateGraph, cached on app.state
│ 3. stream=True → _stream_graph(graph, messages) → SSE token stream
│ stream=False → _invoke_graph(graph, messages) → plain response
▼
LangGraph StateGraph (src/graph.py)
│
├── agent_node: calls LLM with system prompt + tool definitions
│ └── LLM returns text OR tool_calls
│
├── _should_continue: if tool_calls → tool_node, else → END
│
└── tool_node: executes tool via agents/skills system → ToolMessage
└── loops back to agent_node with the result
For a detailed walkthrough, see api.md.
Streaming
Two streaming modes exist:
SSE (Server-Sent Events) — /v1/chat
data: {"token": "Here"}
data: {"token": " are"}
data: {"token": " the"}
...
data: [DONE]
The graph runs to completion (tools execute silently), then the final text is yielded token-by-token as SSE events.
OpenAI-compatible — /v1/chat/completions with stream: true
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"!"}}]}
data: [DONE]
Future improvement: true token-level streaming (tokens appear as the LLM generates them) would require using
langchain-openai'sChatOpenAIin place of the rawopenaiclient. The current approach avoids adding that dependency.
Dependencies
Endpoints receive shared singletons via FastAPI Depends:
get_llm_client(request)→ returnsrequest.app.state.llm_client(OpenAI client singleton, created once inmain.py)get_agent_graph(agent_id, request)→ returns a lazy-compiled LangGraph fromrequest.app.state.agent_graphs