# API Architecture: Agent + Skill + Graph Pipeline
This document explains how the API routes user messages through the agent / skill / LangGraph pipeline to produce responses.
## Overview
```
┌──────────────────────────────────────────────────────────────────┐
│  OpenWebUI / Client                                              │
│  POST /v1/chat/completions  { model, messages, stream }          │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│  api/v1/chat.py: chat_completions()                              │
│                                                                  │
│  1. _resolve_agent(req.model)      → Agent                       │
│  2. get_agent_graph(agent_id)      → compiled StateGraph         │
│  3. graph.ainvoke(state)  or  _stream_graph(graph, messages)     │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│  LangGraph StateGraph (core/graph.py)                            │
│                                                                  │
│  ┌──────────────┐   tool_calls?    ┌──────────────┐              │
│  │  agent_node  │ ───────────────▶ │  tool_node   │              │
│  │  (LLM call)  │ ◀─────────────── │ (skill exec) │              │
│  └──────┬───────┘                  └──────────────┘              │
│         │ no tool_calls                                          │
│         ▼                                                        │
│       [END]                                                      │
└──────────────────────────────────────────────────────────────────┘
```
## Key Concepts
### 1. Agent
An **Agent** is a persona + skill bundle. Defined in `agents/`.
```python
# agents/media_agent.py
Agent(
    agent_id="media-agent",
    description="Media assistant with Seerr integration",
    skills=["media_info", "seerr", "triage"],
    base_prompt="You are a media assistant...",
)
```

- `agent_id`: unique name, exposed as a model in OpenWebUI
- `skills`: list of skill names to load
- `base_prompt`: starting system prompt, combined with skill fragments
- `build_system_prompt()`: merges `base_prompt` with all skill prompt fragments

Agents self-register at import time via `register()` in `agents/__init__.py`.
At startup, `main.py` calls `load_all_agents()` to import every agent and skill
module.
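
The self-registration pattern can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `_REGISTRY` dict, the exact `Agent` fields, and the bodies of `register()` and `get_agent()` are assumptions; only the names themselves appear in this document.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Agent:
    agent_id: str
    description: str
    skills: list[str] = field(default_factory=list)
    base_prompt: str = ""


# Hypothetical module-level registry keyed by agent_id.
_REGISTRY: dict[str, Agent] = {}


def register(agent: Agent) -> Agent:
    """Add an agent to the registry; reject duplicate IDs."""
    if agent.agent_id in _REGISTRY:
        raise ValueError(f"duplicate agent_id: {agent.agent_id}")
    _REGISTRY[agent.agent_id] = agent
    return agent


def get_agent(agent_id: str) -> Agent:
    """Look up a registered agent; raises KeyError if unknown."""
    return _REGISTRY[agent_id]


# In an agent module, this call runs as a side effect of `import`.
register(Agent(
    agent_id="media-agent",
    description="Media assistant with Seerr integration",
    skills=["media_info", "seerr", "triage"],
    base_prompt="You are a media assistant...",
))
```

Because registration happens as an import side effect, an agent module that is never imported is never visible, which is presumably why startup imports every module up front.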
### 2. Skill
A **Skill** is a capability bundle. Defined in `skills/`.
```python
# skills/seerr.py
Skill(
    name="seerr",
    description="Seerr integration — trending, discover, request media, submit issues",
    prompt_fragment="## Seerr Media Tools\n...",
    tools=[...],        # OpenAI function-calling schema
    execute=_execute,   # async handler: tool_name + args → ToolResult
)
```

- `prompt_fragment`: injected into the agent's system prompt.
- `tools`: list of OpenAI function definitions (name, description, parameters).
- `execute`: async callable that routes tool calls to API handlers.

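
How a tool call might find its handler can be sketched as below. The `ToolResult` fields, the `SKILLS` dict, the nested `{"type": "function", ...}` tool layout, and the body of `execute_tool()` are all assumptions made for illustration; the real skill system may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class ToolResult:
    # Assumed shape; the real ToolResult may carry more fields.
    content: str
    ok: bool = True


@dataclass
class Skill:
    name: str
    description: str
    prompt_fragment: str
    tools: list[dict[str, Any]]
    execute: Callable[[str, dict[str, Any]], Awaitable[ToolResult]]


async def _execute(tool_name: str, args: dict[str, Any]) -> ToolResult:
    # Stand-in handler: a real skill would call the Seerr API here.
    if tool_name == "seerr_trending":
        return ToolResult(content=f"trending {args.get('kind', 'all')}")
    return ToolResult(content=f"unknown tool: {tool_name}", ok=False)


# Hypothetical skill registry with a single declared tool.
SKILLS = {
    "seerr": Skill(
        name="seerr",
        description="Seerr integration",
        prompt_fragment="## Seerr Media Tools\n...",
        tools=[{
            "type": "function",
            "function": {
                "name": "seerr_trending",
                "description": "List trending media",
                "parameters": {
                    "type": "object",
                    "properties": {"kind": {"type": "string"}},
                },
            },
        }],
        execute=_execute,
    ),
}


async def execute_tool(skill_names: list[str], tool_name: str,
                       args: dict[str, Any]) -> ToolResult:
    """Route a tool call to the first listed skill that declares it."""
    for skill_name in skill_names:
        skill = SKILLS.get(skill_name)
        if skill is None:
            continue  # skill not loaded in this sketch
        declared = {t["function"]["name"] for t in skill.tools}
        if tool_name in declared:
            return await skill.execute(tool_name, args)
    return ToolResult(content=f"no skill provides {tool_name}", ok=False)


result = asyncio.run(
    execute_tool(["media_info", "seerr"], "seerr_trending", {"kind": "movie"})
)
```

Returning a failed `ToolResult` instead of raising lets the error text flow back to the LLM as a tool message, which is one reasonable design choice among several.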
### 3. Graph
Each agent gets a compiled LangGraph `StateGraph` built by
`core/graph.py:create_agent_graph()`. The graph is compiled lazily on the
first request and cached on `app.state.agent_graphs` for the lifetime of the
process.
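
The compile-once, cache-forever behaviour amounts to a dictionary lookup with a fallback build. The sketch below uses a hypothetical helper `get_or_build_graph()` and a `SimpleNamespace` standing in for FastAPI's `app.state`; it is not the signature of the project's `get_agent_graph()`.

```python
from types import SimpleNamespace
from typing import Any, Callable


def get_or_build_graph(app_state: Any, agent_id: str,
                       build: Callable[[str], Any]) -> Any:
    """Return the cached graph for agent_id, building it on first use."""
    cache: dict[str, Any] = app_state.agent_graphs
    if agent_id not in cache:
        cache[agent_id] = build(agent_id)
    return cache[agent_id]


# Stand-ins for app.state and for create_agent_graph().
state = SimpleNamespace(agent_graphs={})
build_calls: list[str] = []


def fake_build(agent_id: str) -> object:
    build_calls.append(agent_id)
    return object()


first = get_or_build_graph(state, "media-agent", fake_build)
second = get_or_build_graph(state, "media-agent", fake_build)
```

Both calls return the same object and the builder runs only once, so a change to an agent's skills would need a process restart to take effect under this scheme.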
| Graph node / edge | What it does |
|---|---|
| `agent_node` | Converts state messages to OpenAI dicts, calls the LLM with the agent's system prompt + tool definitions, returns an `AIMessage` |
| `tool_node` | Reads `tool_calls` from the last AI message, calls `execute_tool()` from the skill system, returns `ToolMessage` results |
| `_should_continue` | Conditional edge: returns `"tool_node"` if the AI message has `tool_calls`, else `END` |
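
The control flow in the table can be simulated in plain Python without LangGraph. This is a sketch under stated assumptions: messages are simple dicts, `fake_llm()` is a stand-in that requests one tool call and then answers, and `END` is a local sentinel rather than the LangGraph constant.

```python
END = "__end__"  # local sentinel for this sketch


def fake_llm(messages):
    """Stand-in LLM: asks for one tool call, then answers in plain text."""
    if any(m["type"] == "tool" for m in messages):
        return {"type": "ai", "content": "Here are the trending movies.",
                "tool_calls": []}
    return {
        "type": "ai",
        "content": "",
        "tool_calls": [{"id": "call_1", "name": "seerr_trending",
                        "args": {"kind": "movie"}}],
    }


def agent_node(state):
    return {"messages": [fake_llm(state["messages"])]}


def tool_node(state):
    last = state["messages"][-1]
    results = [
        {"type": "tool", "tool_call_id": tc["id"],
         "content": f"ran {tc['name']}({tc['args']})"}
        for tc in last["tool_calls"]
    ]
    return {"messages": results}


def _should_continue(state):
    last = state["messages"][-1]
    return "tool_node" if last.get("tool_calls") else END


def run(messages):
    """Drive agent_node -> tool_node -> agent_node until no tool calls remain."""
    state = {"messages": list(messages)}
    visited = []
    while True:
        visited.append("agent_node")
        state["messages"] += agent_node(state)["messages"]
        if _should_continue(state) == END:
            return state, visited
        visited.append("tool_node")
        state["messages"] += tool_node(state)["messages"]


final_state, visited = run(
    [{"type": "human", "content": "What are trending movies?"}]
)
```

In this run the nodes fire in the order `agent_node`, `tool_node`, `agent_node`, and the final state holds four messages: the user turn, the tool-calling AI turn, the tool result, and the final answer.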
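### 4. State
Defined in `core/state.py`:
```python
from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
```
LangGraph's `add_messages` reducer appends new messages and replaces any
existing message whose ID matches an incoming one, so a node can update an
earlier message by re-emitting it with the same ID.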
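
The append-or-replace semantics can be illustrated with a simplified stand-in. `merge_messages()` is a hypothetical helper operating on plain dicts; it is not LangGraph's implementation, which works on message objects and handles cases this sketch ignores.

```python
def merge_messages(existing, incoming):
    """Append incoming messages; replace any existing message with the same id."""
    merged = list(existing)
    index_by_id = {
        m["id"]: i for i, m in enumerate(merged) if m.get("id") is not None
    }
    for msg in incoming:
        msg_id = msg.get("id")
        if msg_id is not None and msg_id in index_by_id:
            # Same id: overwrite in place, keeping the original position.
            merged[index_by_id[msg_id]] = msg
        else:
            if msg_id is not None:
                index_by_id[msg_id] = len(merged)
            merged.append(msg)
    return merged


history = [{"id": "m1", "type": "human", "content": "hi"}]
history = merge_messages(history, [{"id": "m2", "type": "ai", "content": "draft"}])
history = merge_messages(history, [{"id": "m2", "type": "ai", "content": "final"}])
```

After the three steps the history still has two entries, and the second one carries the later content.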
### 5. Message Conversion
Because we use the raw `openai` client (not `langchain-openai`), messages must
be converted between LangChain and OpenAI formats at every LLM call:

- **LangChain → OpenAI** (`_lc_role_to_openai`, `_langchain_tc_to_openai`): maps `type` → `role` and converts top-level `name`/`args` tool calls into the nested `function` sub-object that the OpenAI API expects.
- **OpenAI → LangChain** (inside `agent_node`): converts the `ChatCompletionMessage` response into an `AIMessage` with LangChain-format `tool_calls` (top-level `name`/`args`/`id`).

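
The two tool-call shapes can be compared side by side with a small round-trip sketch. The helper names below are hypothetical and operate on plain dicts; they are not the project's `_lc_role_to_openai` or `_langchain_tc_to_openai`. The one detail taken from the OpenAI format is that `function.arguments` is a JSON-encoded string.

```python
import json

# LangChain message `type` -> OpenAI `role`.
_ROLE_BY_TYPE = {"human": "user", "ai": "assistant",
                 "system": "system", "tool": "tool"}


def lc_tool_call_to_openai(tc):
    """LangChain-style {name, args, id} -> OpenAI nested function object."""
    return {
        "id": tc["id"],
        "type": "function",
        "function": {"name": tc["name"], "arguments": json.dumps(tc["args"])},
    }


def openai_tool_call_to_lc(tc):
    """OpenAI nested function object -> LangChain-style {name, args, id}."""
    return {
        "id": tc["id"],
        "name": tc["function"]["name"],
        "args": json.loads(tc["function"]["arguments"] or "{}"),
    }


def lc_message_to_openai(msg):
    """Convert one dict-shaped LangChain message to an OpenAI message dict."""
    out = {"role": _ROLE_BY_TYPE[msg["type"]], "content": msg.get("content", "")}
    if msg.get("tool_calls"):
        out["tool_calls"] = [lc_tool_call_to_openai(tc) for tc in msg["tool_calls"]]
    if msg["type"] == "tool":
        out["tool_call_id"] = msg["tool_call_id"]
    return out


lc_call = {"id": "call_1", "name": "seerr_trending", "args": {"kind": "movie"}}
openai_call = lc_tool_call_to_openai(lc_call)
round_trip = openai_tool_call_to_lc(openai_call)
```

The round trip returns the original dict, which is a cheap property to assert in unit tests for converters of this kind.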
## Full Request Flow
### Step-by-step: "What are trending movies?"
```
1. OpenWebUI sends:
   POST /v1/chat/completions
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"}
     ],
     "stream": false
   }

2. chat_completions():
   → _resolve_agent(model="media-agent")
     → get_agent("media-agent") → Agent(skills=["media_info", "seerr", "triage"])
   → get_agent_graph("media-agent", request)
     → looks up app.state.agent_graphs["media-agent"]
     → first call → create_agent_graph() compiles the graph with 7 Seerr tools
   → run_agent_with_tools(request, messages, agent_id)
     → _invoke_graph(graph, messages)

3. Graph, Pass 1 (agent_node):
   → LLM receives: [system prompt] + [user: "What are trending movies?"]
   → LLM responds with tool_calls: seerr_trending(kind="movie")
   → agent_node returns AIMessage with tool_calls in LangChain format

4. Graph, _should_continue:
   → AIMessage has tool_calls → route to "tool_node"

5. Graph, tool_node:
   → Reads tool_call: name="seerr_trending", args={"kind": "movie"}
   → execute_tool(["media_info", "seerr", "triage"], "seerr_trending", ...)
   → Seerr API → GET /api/v1/discover/trending?mediaType=movie
   → Returns ToolMessage with formatted results including [tmdb:IDs]

6. Graph, Pass 2 (agent_node):
   → LLM receives previous exchange + tool result
   → LLM responds with text only (no tool_calls)
   → agent_node returns AIMessage(content="Here are the top trending movies!...")

7. Graph, _should_continue:
   → No tool_calls → route to END

8. chat_completions() returns:
   { "choices": [{"message": {"role": "assistant", "content": "Here are the top..."}}] }
```
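
The final step, wrapping the last assistant message in a response body, might look like the sketch below. `build_chat_completion()` is a hypothetical helper; only `choices[].message` is shown in this document, and the remaining fields are assumptions modelled on the OpenAI chat completion response format.

```python
import time
import uuid


def build_chat_completion(model: str, content: str) -> dict:
    """Wrap the final assistant text in a chat.completion-style response."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": content},
            "finish_reason": "stop",
        }],
    }


response = build_chat_completion("media-agent", "Here are the top trending movies!")
```

Intermediate tool-call and tool-result messages stay inside the graph state; only the final assistant text is surfaced to the client in this sketch.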
### Step-by-step: "Request the 2026 one" (multi-turn context)
```
1. OpenWebUI sends the FULL history:
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"},
       {"role": "assistant", "content": "Here are the top 10 trending movies!\n1. **Mortal Kombat II** (2026) [tmdb:931285] — ..."},
       {"role": "user", "content": "could request the mortal kombat one?"},
       {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
       {"role": "user", "content": "the 2026 one"}
     ]
   }

2. chat_completions():
   → req.messages contains the ENTIRE conversation history
   → graph.ainvoke({"messages": all_messages})
   → agent_node prepends system prompt and sends everything to the LLM

3. LLM reasons from full context:
   - Previously listed Mortal Kombat II (2026) with [tmdb:931285]
   - The user said "request the mortal kombat one" → I searched and showed 4 options
   - Now they say "the 2026 one" → that matches Mortal Kombat II (2026) [tmdb:931285]
   - I should call seerr_request_media(kind="movie", title="Mortal Kombat II", tmdb_id=931285)

4. tool_node executes the request → ✅ Success
```
## Streaming
Streaming works slightly differently from the non-streaming path:
```
chat_completions(stream=True)
  → _stream_graph(graph, messages)
    → graph.ainvoke(state)   # runs graph to completion (tools execute silently)
    → yields content character-by-character via SSE
```
Because the graph runs to completion before the first chunk is sent, the
client sees no output until every tool call has finished.

For true token-level streaming (tokens appear as the LLM generates them),
the `agent_node` would need to use `langchain-openai`'s `ChatOpenAI` instead of
the raw `openai` client. The current approach is a pragmatic middle ground
that avoids adding another dependency while still giving the SSE client
incremental output.
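
Replaying finished text as SSE events can be sketched as follows. `sse_chunks()` is a hypothetical generator, not the project's `_stream_graph()`; the payload fields are assumptions based on the OpenAI `chat.completion.chunk` format, and `chunk_size=1` mirrors the character-by-character behaviour described above.

```python
import json
from typing import Iterator


def sse_chunks(content: str, model: str, chunk_size: int = 1) -> Iterator[str]:
    """Yield SSE events that replay already-generated text in small pieces."""
    for start in range(0, len(content), chunk_size):
        payload = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{
                "index": 0,
                "delta": {"content": content[start:start + chunk_size]},
                "finish_reason": None,
            }],
        }
        yield f"data: {json.dumps(payload)}\n\n"
    # Closing chunk with an empty delta, then the stream terminator.
    final = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    }
    yield f"data: {json.dumps(final)}\n\n"
    yield "data: [DONE]\n\n"


events = list(sse_chunks("Hi!", "media-agent"))
```

A larger `chunk_size` would cut the number of events for long answers at the cost of a less smooth typing effect.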
## File Map
| File | Responsibility |
|---|---|
| `main.py` | FastAPI app, singleton creation, router mounting |
| `api/v1/chat.py` | Endpoints: resolves agent, invokes graph, formats responses |
| `api/dependencies.py` | `get_llm_client()`, `get_agent_graph()`: FastAPI `Depends` providers |
| `core/graph.py` | `create_agent_graph()`: builds the `StateGraph` |
| `core/state.py` | `AgentState` TypedDict |
| `core/llm.py` | `create_client()`: OpenAI client factory |
| `core/config.py` | Environment variable loader |
| `agents/` | Agent definitions (dataclass + self-registration) |
| `skills/` | Skill definitions (prompt fragments + tools + executors) |