# V1: Chat & Agent API Endpoints

This is the primary HTTP API surface for the chatbot agent system. It exposes both a custom streaming chat endpoint and an OpenAI-compatible `/chat/completions` endpoint so it works as a drop-in backend for OpenWebUI, LibreChat, or any OpenAI-compatible client.

---

## Endpoints

| Method | Path | Description |
|---|---|---|
| `GET` | `/v1/` | Health check; returns `{"status": "ok"}` |
| `GET` | `/v1/agents` | List all registered agents (id + description) |
| `GET` | `/v1/models` | OpenAI-compatible model list (one entry per agent) |
| `POST` | `/v1/chat` | Chat with an agent, streaming (SSE) |
| `POST` | `/v1/chat/sync` | Chat with an agent, non-streaming |
| `POST` | `/v1/chat/completions` | OpenAI-compatible chat completions (supports `stream: true`) |

All `/v1/*` endpoints are mounted by `main.py` via:

```python
app.include_router(v1_router, prefix="/v1")
```

---

## Agent Resolution

Each request can target a specific agent. The resolution order is:

1. **Explicit `agent_id`** field in the request body
2. **OpenAI `model` field** (OpenWebUI sends this; it is mapped to `agent_id` if a matching agent is registered)
3. **Fallback** to the `"naked"` agent (a plain LLM with no tools)

This means an OpenWebUI client can simply set `model: "media-agent"` and get the full Media Agent with Seerr tools.

---

## Request Flow

```
Client (OpenWebUI / HTTP)
  │  POST /v1/chat/completions
  │  { model: "media-agent", messages: [...], stream: true/false }
  ▼
chat_completions()
  │  1. _resolve_agent(req.model) → Agent(id="media-agent", skills=[...])
  │  2. get_agent_graph("media-agent", request)
  │       → lazy-compiled LangGraph StateGraph, cached on app.state
  │  3. stream=True  → _stream_graph(graph, messages) → SSE token stream
  │     stream=False → _invoke_graph(graph, messages) → plain response
  ▼
LangGraph StateGraph (src/graph.py)
  │
  ├── agent_node: calls LLM with system prompt + tool definitions
  │     └── LLM returns text OR tool_calls
  │
  ├── _should_continue: if tool_calls → tool_node, else → END
  │
  └── tool_node: executes tool via agents/skills system → ToolMessage
        └── loops back to agent_node with the result
```

For a detailed walkthrough, see [api.md](../api.md).

---

## Streaming

Two streaming modes exist:

### SSE (Server-Sent Events): `/v1/chat`

```
data: {"token": "Here"}
data: {"token": " are"}
data: {"token": " the"}
...
data: [DONE]
```

The graph runs to completion (tools execute silently), then the final text is yielded token-by-token as SSE events.

### OpenAI-compatible: `/v1/chat/completions` with `stream: true`

```
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"!"}}]}
data: [DONE]
```

> **Future improvement:** true token-level streaming (tokens appear as the LLM
> generates them) would require using `langchain-openai`'s `ChatOpenAI` in
> place of the raw `openai` client. The current approach avoids adding that
> dependency.

---

## Dependencies

Endpoints receive shared singletons via FastAPI `Depends`:

- **`get_llm_client(request)`** → returns `request.app.state.llm_client` (OpenAI client singleton, created once in `main.py`)
- **`get_agent_graph(agent_id, request)`** → returns a lazy-compiled LangGraph from `request.app.state.agent_graphs`
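
The lazy-compile-and-cache behavior of these two dependencies can be illustrated with a minimal sketch. It is not the project's actual implementation: the `compile_graph` parameter and the `fake_compile` helper are hypothetical stand-ins for the real graph builder (the documented `get_agent_graph` takes only `agent_id` and `request`), and a `SimpleNamespace` substitutes for the FastAPI `Request` so the example runs without any third-party packages.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List


def get_llm_client(request: Any) -> Any:
    """Return the shared LLM client stored on application state."""
    return request.app.state.llm_client


def get_agent_graph(
    agent_id: str,
    request: Any,
    compile_graph: Callable[[str], Any],  # hypothetical: injected for illustration
) -> Any:
    """Return the cached graph for agent_id, compiling it on first use."""
    cache: Dict[str, Any] = request.app.state.agent_graphs
    if agent_id not in cache:
        # First request for this agent: build the graph once and keep it.
        cache[agent_id] = compile_graph(agent_id)
    return cache[agent_id]


# Stand-in for a FastAPI Request; only the app.state attribute chain is used.
request = SimpleNamespace(
    app=SimpleNamespace(
        state=SimpleNamespace(llm_client=object(), agent_graphs={})
    )
)

compile_calls: List[str] = []


def fake_compile(agent_id: str) -> Dict[str, str]:
    """Hypothetical compiler that records how often it is invoked."""
    compile_calls.append(agent_id)
    return {"agent": agent_id}


first = get_agent_graph("media-agent", request, fake_compile)
second = get_agent_graph("media-agent", request, fake_compile)
```

After both calls, `first` and `second` refer to the same object and `fake_compile` has run only once, which is the property that makes caching on `app.state` worthwhile when graph compilation is expensive.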