# API Architecture: Agent + Skill + Graph Pipeline

This document explains how the API routes user messages through the agent / skill / LangGraph pipeline to produce responses.

---

## Overview

```
┌──────────────────────────────────────────────────────────────────┐
│  OpenWebUI / Client                                              │
│  POST /v1/chat/completions  { model, messages, stream }          │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│  api/v1/chat.py - chat_completions()                             │
│                                                                  │
│  1. _resolve_agent(req.model)  → Agent                           │
│  2. get_agent_graph(agent_id)  → compiled StateGraph             │
│  3. graph.ainvoke(state)  or  _stream_graph(graph, messages)     │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│  LangGraph StateGraph (core/graph.py)                            │
│                                                                  │
│  ┌──────────────┐   tool_calls?    ┌──────────────┐              │
│  │  agent_node  │ ───────────────▶ │  tool_node   │              │
│  │  (LLM call)  │ ◀─────────────── │ (skill exec) │              │
│  └──────┬───────┘                  └──────────────┘              │
│         │ no tool_calls                                          │
│         ▼                                                        │
│       [END]                                                      │
└──────────────────────────────────────────────────────────────────┘
```

## Key Concepts

### 1. Agent

An **Agent** is a persona + skill bundle. Defined in `agents/`.

```python
# agents/media_agent.py
Agent(
    agent_id="media-agent",
    description="Media assistant with Seerr integration",
    skills=["media_info", "seerr", "triage"],
    base_prompt="You are a media assistant...",
)
```

- `agent_id`: unique name, exposed as a model in OpenWebUI
- `skills`: list of skill names to load
- `base_prompt`: starting system prompt, combined with skill fragments
- `build_system_prompt()`: merges base_prompt + all skill prompt fragments

Agents self-register at import time via `agents/__init__.py`'s `register()`. `main.py` calls `load_all_agents()` at startup to import every agent and skill module.

### 2. Skill

A **Skill** is a capability bundle. Defined in `skills/`.

```python
# skills/seerr.py
Skill(
    name="seerr",
    description="Seerr integration: trending, discover, request media, submit issues",
    prompt_fragment="## Seerr Media Tools\n...",
    tools=[...],       # OpenAI function-calling schema
    execute=_execute,  # async handler: tool_name + args → ToolResult
)
```

- `prompt_fragment`: injected into the agent's system prompt.
- `tools`: list of OpenAI function definitions (name, description, parameters).
- `execute`: async callable that routes tool calls to API handlers.

### 3. Graph

Each agent gets a **compiled LangGraph StateGraph** built by `core/graph.py:create_agent_graph()`. The graph is compiled lazily on the first request and cached on `app.state.agent_graphs` for the lifetime of the process.

| Graph node / edge | What it does |
|---|---|
| `agent_node` | Converts state messages to OpenAI dicts, calls the LLM with the agent's system prompt + tool definitions, returns an `AIMessage` |
| `tool_node` | Reads `tool_calls` from the last AI message, calls `execute_tool()` from the skill system, returns `ToolMessage` results |
| `_should_continue` | Conditional edge: returns `"tool_node"` if the AI message has `tool_calls`, else `END` |

### 4. State

Defined in `core/state.py`:

```python
from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
```

LangGraph's `add_messages` reducer appends new messages to the list and replaces any existing message whose ID matches an incoming one, so a node only returns the messages it produced (the new `AIMessage` or `ToolMessage` results) rather than the whole history.

### 5. Message Conversion

Because we use the raw `openai` client (not `langchain-openai`), messages must be converted between LangChain and OpenAI formats at every LLM call:

- **LangChain → OpenAI** (`_lc_role_to_openai`, `_langchain_tc_to_openai`): Maps `type` → `role` and converts top-level `name`/`args` tool-calls into the nested `function` sub-object that the OpenAI API expects.
- **OpenAI → LangChain** (inside `agent_node`): Converts the `ChatCompletionMessage` response into an `AIMessage` with LangChain-format `tool_calls` (top-level `name`/`args`/`id`).

---

## Full Request Flow

### Step-by-step: "What are trending movies?"

```
1. OpenWebUI sends:
   POST /v1/chat/completions
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"}
     ],
     "stream": false
   }

2. chat_completions():
   → _resolve_agent(model="media-agent")
     → get_agent("media-agent")
     → Agent(skills=["media_info", "seerr", "triage"])
   → get_agent_graph("media-agent", request)
     → looks up app.state.agent_graphs["media-agent"]
     → first call → create_agent_graph() compiles the graph with 7 Seerr tools
   → run_agent_with_tools(request, messages, agent_id)
     → _invoke_graph(graph, messages)

3. Graph - Pass 1 (agent_node):
   → LLM receives: [system prompt] + [user: "What are trending movies?"]
   → LLM responds with tool_calls: seerr_trending(kind="movie")
   → agent_node returns AIMessage with tool_calls in LangChain format

4. Graph - _should_continue:
   → AIMessage has tool_calls → route to "tool_node"

5. Graph - tool_node:
   → Reads tool_call: name="seerr_trending", args={"kind": "movie"}
   → execute_tool(["media_info", "seerr", "triage"], "seerr_trending", ...)
   → Seerr API → GET /api/v1/discover/trending?mediaType=movie
   → Returns ToolMessage with formatted results including [tmdb:IDs]

6. Graph - Pass 2 (agent_node):
   → LLM receives previous exchange + tool result
   → LLM responds with text only (no tool_calls)
   → agent_node returns AIMessage(content="Here are the top trending movies!...")

7. Graph - _should_continue:
   → No tool_calls → route to END

8. chat_completions() returns:
   { "choices": [{"message": {"role": "assistant", "content": "Here are the top..."}}] }
```

### Step-by-step: "Request the 2026 one" (multi-turn context)

```
1.
   OpenWebUI sends the FULL history:
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"},
       {"role": "assistant", "content": "Here are the top 10 trending movies! 1. **Mortal Kombat II** (2026) [tmdb:931285] - ..."},
       {"role": "user", "content": "could request the mortal kombat one?"},
       {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
       {"role": "user", "content": "the 2026 one"}
     ]
   }

2. chat_completions():
   → req.messages contains the ENTIRE conversation history
   → graph.ainvoke({"messages": all_messages})
   → agent_node prepends system prompt and sends everything to the LLM

3. LLM reasons from full context:
   - Previously listed Mortal Kombat II (2026) with [tmdb:931285]
   - The user said "request the mortal kombat one" → I searched and showed 4 options
   - Now they say "the 2026 one" → that matches Mortal Kombat II (2026) [tmdb:931285]
   - I should call seerr_request_media(kind="movie", title="Mortal Kombat II", tmdb_id=931285)

4. tool_node executes the request → ✅ Success
```

---

## Streaming

Streaming works slightly differently from the non-streaming path:

```
chat_completions(stream=True)
  → _stream_graph(graph, messages)
    → graph.ainvoke(state)    # runs graph to completion (tools execute silently)
    → yields content character-by-character via SSE
```

This is simulated streaming: the client receives incremental output, but only after the whole graph (including any tool calls) has finished. For true token-level streaming, where tokens appear as the LLM generates them, `agent_node` would have to stream the LLM response itself. One route is to replace the raw `openai` client with `langchain-openai`'s `ChatOpenAI`, whose token events LangGraph can surface. The current approach is a pragmatic middle ground that avoids adding another dependency while still giving the SSE client incremental output.

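The simulated streaming described above can be sketched as follows. This is a minimal illustration, not the project's `_stream_graph`: the helper name `sse_chunks`, its `chunk_size` parameter, and the exact payload fields are assumptions modeled on the OpenAI `chat.completion.chunk` shape, and the real handler would be an async generator fed by the result of `graph.ainvoke(state)`.

```python
import json
from typing import Iterator


def sse_chunks(content: str, model: str, chunk_size: int = 1) -> Iterator[str]:
    """Yield SSE events for text that has already been fully generated.

    Hypothetical helper: slices the finished answer into small deltas so the
    client sees incremental output even though the LLM call is complete.
    """
    for start in range(0, len(content), chunk_size):
        payload = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [
                {
                    "index": 0,
                    "delta": {"content": content[start:start + chunk_size]},
                    "finish_reason": None,
                }
            ],
        }
        yield f"data: {json.dumps(payload)}\n\n"

    # Closing chunk: empty delta plus the stop reason, then the SSE terminator.
    final = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    }
    yield f"data: {json.dumps(final)}\n\n"
    yield "data: [DONE]\n\n"


events = list(sse_chunks("Hi!", model="media-agent"))
```

With `chunk_size=1` this matches the character-by-character behavior; a larger value trades smoothness for fewer events.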
---

## File Map

| File | Responsibility |
|---|---|
| `main.py` | FastAPI app, singleton creation, router mounting |
| `api/v1/chat.py` | Endpoints: resolves agent, invokes graph, formats responses |
| `api/dependencies.py` | `get_llm_client()`, `get_agent_graph()` (FastAPI `Depends`) |
| `core/graph.py` | `create_agent_graph()`: builds the StateGraph |
| `core/state.py` | `AgentState` TypedDict |
| `core/llm.py` | `create_client()`: OpenAI client factory |
| `core/config.py` | Environment variable loader |
| `agents/` | Agent definitions (dataclass + self-registration) |
| `skills/` | Skill definitions (prompt fragments + tools + executors) |
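
As a closing illustration of the tool-call conversion described under Message Conversion, here is a rough sketch of the two directions. The function names `to_openai_tool_call` and `to_langchain_tool_call` and the `call_1` ID are hypothetical stand-ins, not the helpers in `core/graph.py`; the sketch only assumes the two well-known shapes (LangChain's top-level `name`/`args`/`id`, and OpenAI's nested `function` object with JSON-encoded `arguments`).

```python
import json
from typing import Any, Dict


def to_openai_tool_call(tool_call: Dict[str, Any]) -> Dict[str, Any]:
    """LangChain-style tool call -> OpenAI wire format (illustrative only)."""
    return {
        "id": tool_call["id"],
        "type": "function",
        "function": {
            "name": tool_call["name"],
            # OpenAI carries arguments as a JSON-encoded string, not a dict.
            "arguments": json.dumps(tool_call["args"]),
        },
    }


def to_langchain_tool_call(tool_call: Dict[str, Any]) -> Dict[str, Any]:
    """OpenAI wire format -> LangChain-style tool call (illustrative only)."""
    function = tool_call["function"]
    return {
        "name": function["name"],
        "args": json.loads(function["arguments"]),
        "id": tool_call["id"],
    }


lc_call = {"name": "seerr_trending", "args": {"kind": "movie"}, "id": "call_1"}
wire = to_openai_tool_call(lc_call)
round_trip = to_langchain_tool_call(wire)
```

The round trip returns the original dict, which is a quick way to check that neither direction drops the tool-call `id` that ties a `ToolMessage` back to its request.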