# API Architecture — Agent + Skill + Tool Pipeline
This document explains how the API routes user messages through the agent/skill/tool pipeline to produce responses.
## Overview
```text
┌──────────────────────────────────────────────────────────────────┐
│  OpenWebUI / Client                                              │
│  POST /v1/chat/completions { model, messages, stream }           │
└─────────────────────────────────┬────────────────────────────────┘
                                  │
                                  ▼
┌──────────────────────────────────────────────────────────────────┐
│  api/v1/chat.py — chat_completions()                             │
│                                                                  │
│  1. _resolve_agent(req.model) → Agent                            │
│  2. agent.build_system_prompt() → system prompt                  │
│  3. Build full_messages = [system] + req.messages                │
│  4. run_agent_with_tools(client, messages, agent_id)             │
└─────────────────────────────────┬────────────────────────────────┘
                                  │
                                  ▼
┌──────────────────────────────────────────────────────────────────┐
│  Tool-Calling Loop (run_agent_with_tools / run_agent_stream)     │
│                                                                  │
│  while turns < max_turns:                                        │
│      response = LLM.chat(messages, tools=agent_tools)            │
│      if response has tool_calls:                                 │
│          for each tool_call:                                     │
│              result = execute_tool(skills, name, args)           │
│              append result to messages                           │
│      else:                                                       │
│          return response.text (stream tokens if streaming)       │
└──────────────────────────────────────────────────────────────────┘
```
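The loop in the diagram can be sketched in plain Python. This is a minimal, synchronous sketch under stated assumptions, not the actual `run_agent_with_tools()`: the LLM is any callable that takes the message list and returns an OpenAI-style assistant message dict, `execute_tool` is passed in as a plain function, and `run_loop`, `fake_llm`, `fake_execute_tool`, the default turn limit, and the fallback string returned when the turn budget runs out are all hypothetical.

```python
import json
from typing import Any, Callable

Message = dict[str, Any]


def run_loop(
    llm: Callable[[list[Message]], Message],
    execute_tool: Callable[[str, dict[str, Any]], str],
    messages: list[Message],
    max_turns: int = 5,  # hypothetical default; the real max_turns is not documented
) -> str:
    """Call the LLM until it replies with plain text or max_turns is reached."""
    history = list(messages)  # copy so the caller's list is not mutated
    for _ in range(max_turns):
        reply = llm(history)  # an OpenAI-style assistant message dict
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content") or ""
        # The assistant message carrying the tool calls must precede the tool results.
        history.append(reply)
        for call in tool_calls:
            name = call["function"]["name"]
            # function.arguments arrives as a JSON-encoded string
            args = json.loads(call["function"]["arguments"] or "{}")
            history.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": execute_tool(name, args),
            })
    # Hypothetical fallback when the turn budget runs out
    return "Stopped: reached the maximum number of tool-calling turns."


def fake_llm(history: list[Message]) -> Message:
    """Scripted stand-in for the LLM: one tool call, then a text answer."""
    if history[-1]["role"] == "user":
        return {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {"name": "seerr_trending", "arguments": '{"kind": "movie"}'},
            }],
        }
    return {"role": "assistant", "content": f"Summary of: {history[-1]['content']}"}


def fake_execute_tool(name: str, args: dict[str, Any]) -> str:
    return f"{name} called with kind={args['kind']}"


answer = run_loop(
    fake_llm,
    fake_execute_tool,
    [
        {"role": "system", "content": "You are a media assistant."},
        {"role": "user", "content": "What are trending movies?"},
    ],
)
print(answer)  # Summary of: seerr_trending called with kind=movie
```

Copying `messages` before appending keeps the caller's list untouched, so the same request history can be reused or logged after the loop finishes.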
## Key Concepts

### 1. Agent

An Agent is a persona + skill bundle, defined in `agents/`.

```python
# agents/media_agent.py
from agents import Agent  # Agent dataclass lives in agents/__init__.py

Agent(
    agent_id="media-agent",
    description="Media assistant with Seerr integration",
    skills=["media_info", "seerr", "triage"],
    base_prompt="You are a media assistant...",
)
```
- `agent_id`: unique name, exposed as a model in OpenWebUI
- `skills`: list of skill names to load
- `base_prompt`: starting system prompt, combined with skill fragments
- `build_system_prompt()`: merges `base_prompt` with all skill prompt fragments

Agents self-register at import time via `register()` in `agents/__init__.py`.
`main.py` calls `load_all_agents()` at startup to import all agent and skill modules.
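A minimal sketch of what the `Agent` dataclass and its registry could look like, assuming `build_system_prompt()` joins `base_prompt` and the skill fragments with blank lines. `SKILL_PROMPTS` is a hypothetical stand-in for the skill registry, the fragment texts are made up, and raising on a duplicate `agent_id` is an assumed design choice.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-in for the skill registry: skill name -> prompt fragment.
SKILL_PROMPTS: dict[str, str] = {
    "media_info": "## Media Assistant\nYou help users with movies and TV shows.",
    "seerr": "## Seerr Media Tools\nUse the seerr_* tools for trending lists and requests.",
    "triage": "## Triage\nExplain politely when an action is not supported.",
}

_AGENTS: dict[str, "Agent"] = {}


@dataclass
class Agent:
    agent_id: str
    description: str
    skills: list[str] = field(default_factory=list)
    base_prompt: str = ""

    def build_system_prompt(self) -> str:
        """Merge base_prompt with each skill's fragment (unknown skill names raise KeyError)."""
        parts = [self.base_prompt] + [SKILL_PROMPTS[name] for name in self.skills]
        return "\n\n".join(part for part in parts if part)


def register(agent: Agent) -> Agent:
    """Add an agent to the registry; agent modules call this at import time."""
    if agent.agent_id in _AGENTS:
        raise ValueError(f"duplicate agent_id: {agent.agent_id}")
    _AGENTS[agent.agent_id] = agent
    return agent


def get_agent(agent_id: str) -> Optional[Agent]:
    return _AGENTS.get(agent_id)


register(Agent(
    agent_id="media-agent",
    description="Media assistant with Seerr integration",
    skills=["media_info", "seerr", "triage"],
    base_prompt="You are a media assistant...",
))
prompt = get_agent("media-agent").build_system_prompt()
print(prompt.startswith("You are a media assistant..."))  # True
```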
### 2. Skill

A Skill is a capability bundle, defined in `skills/`.

```python
# skills/seerr.py
from skills import Skill  # Skill dataclass lives in skills/__init__.py

# _execute is the module's async dispatcher, defined earlier in skills/seerr.py
Skill(
    name="seerr",
    description="Seerr integration — trending, discover, request media, submit issues",
    prompt_fragment="## Seerr Media Tools\n...",
    tools=[...],  # OpenAI function-calling schema
    execute=_execute,  # async handler: tool_name + args → ToolResult
)
```

- `prompt_fragment`: injected into the agent's system prompt. It teaches the LLM which tools are available and when to use them.
- `tools`: list of OpenAI function definitions (`name`, `description`, `parameters`).
- `execute`: async callable that routes tool calls to the API handlers.
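The skill contract can be sketched as a dataclass that holds an async `execute` callable. This is an illustration under assumptions: the `ToolResult` fields (`text`, `ok`), the `tool_names()` helper, and the stub `_trending` handler (which returns canned text instead of calling the Seerr API) are hypothetical.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Optional


@dataclass
class ToolResult:
    # Assumed shape; the real ToolResult fields are not documented here.
    text: str
    ok: bool = True


Handler = Callable[[str, dict[str, Any]], Awaitable[ToolResult]]


@dataclass
class Skill:
    name: str
    description: str
    prompt_fragment: str = ""
    tools: list[dict[str, Any]] = field(default_factory=list)
    execute: Optional[Handler] = None  # None for prompt-only skills such as triage

    def tool_names(self) -> set[str]:
        return {tool["function"]["name"] for tool in self.tools}


async def _trending(args: dict[str, Any]) -> ToolResult:
    # Hypothetical stub; a real handler would call GET /api/v1/discover/trending.
    return ToolResult(text=f"Found trending {args['kind']} titles")


async def _execute(tool_name: str, args: dict[str, Any]) -> ToolResult:
    """Route a tool name to its handler, like the seerr skill's execute."""
    handlers = {"seerr_trending": _trending}
    handler = handlers.get(tool_name)
    if handler is None:
        return ToolResult(text=f"Unknown tool: {tool_name}", ok=False)
    return await handler(args)


seerr = Skill(
    name="seerr",
    description="Seerr integration",
    prompt_fragment="## Seerr Media Tools\n...",
    tools=[{
        "type": "function",
        "function": {
            "name": "seerr_trending",
            "description": "Get trending movies and TV shows",
            "parameters": {"type": "object", "properties": {}},
        },
    }],
    execute=_execute,
)

result = asyncio.run(seerr.execute("seerr_trending", {"kind": "movie"}))
print(result.text)  # Found trending movie titles
```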
### 3. Tool

A Tool is a single function the LLM can call. It is defined as one entry in a skill's `tools` list.

```python
{
    "type": "function",
    "function": {
        "name": "seerr_trending",
        "description": "Get trending movies and TV shows from Seerr...",
        "parameters": {
            "type": "object",
            "properties": {
                "kind": {"type": "string", "enum": ["movie", "tv", "all"]},
                "language": {"type": "string"},
            },
            "required": ["kind"],
        },
    },
}
```
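When the LLM responds with a tool call, the loop:

- Extracts `function.name` (e.g. `"seerr_trending"`) and `function.arguments`, a JSON-encoded string such as `'{"kind": "movie"}'` that is decoded into a dict.
- Calls `execute_tool(agent.skills, name, args)`, which finds the skill that owns the tool and runs its handler.
- Appends the result text to the message history as a `tool` message, after the assistant message that carried the tool call.
- Sends the updated history back to the LLM for a follow-up response.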
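A sketch of the lookup inside `execute_tool()`, assuming a registry that maps each skill name to the tool names it owns plus an async handler. Unlike the call described above, this version takes the raw `function.arguments` JSON string and decodes it itself; the `SKILLS` registry, the `_seerr_execute` stub, and the error strings are hypothetical. Only the agent's own skills are searched, so an agent cannot invoke tools from skills it was not given.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Optional

Handler = Callable[[str, dict[str, Any]], Awaitable[str]]


async def _seerr_execute(name: str, args: dict[str, Any]) -> str:
    # Hypothetical stub standing in for the seerr skill's real handler.
    return f"{name}({args})"


# Hypothetical registry: skill name -> (tool names it owns, async handler).
# Prompt-only skills own no tools and have no handler.
SKILLS: dict[str, tuple[set[str], Optional[Handler]]] = {
    "media_info": (set(), None),
    "seerr": ({"seerr_trending", "seerr_request_media"}, _seerr_execute),
    "triage": (set(), None),
}


async def execute_tool(skill_names: list[str], name: str, raw_args: str) -> str:
    """Find the agent skill that owns `name` and run its handler."""
    try:
        args = json.loads(raw_args) if raw_args else {}
    except json.JSONDecodeError as exc:
        # Returned as text so the LLM can see the problem and retry.
        return f"Error: invalid JSON arguments for {name}: {exc}"
    for skill_name in skill_names:
        tool_names, handler = SKILLS.get(skill_name, (set(), None))
        if name in tool_names and handler is not None:
            return await handler(name, args)
    return f"Error: no skill in {skill_names} provides tool {name!r}"


print(asyncio.run(execute_tool(["media_info", "seerr", "triage"], "seerr_trending", '{"kind": "movie"}')))
# seerr_trending({'kind': 'movie'})
```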
## Full Request Flow

### Step-by-step: "What are trending movies?"
```text
1. OpenWebUI sends:
   POST /v1/chat/completions
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"}
     ],
     "stream": false
   }

2. chat_completions():
   → _resolve_agent(model="media-agent")
     → get_agent("media-agent") → Agent(skills=["media_info", "seerr", "triage"])
   → tools = get_all_tools(["media_info", "seerr", "triage"])
     → Returns 7 tool definitions from seerr.py
   → system_prompt = agent.build_system_prompt()
     → base_prompt + media_info fragment + seerr fragment + triage fragment

3. run_agent_with_tools() — Turn 1:
   → LLM receives: [system prompt with tools] + [user: "What are trending movies?"]
   → LLM responds: tool_calls = [{"function": {"name": "seerr_trending", "arguments": '{"kind": "movie"}'}}]

4. Execute tool:
   → execute_tool(["media_info", "seerr", "triage"], "seerr_trending", {"kind": "movie"})
   → Finds seerr skill → calls _execute("seerr_trending", ...) → _trending(args)
   → GET /api/v1/discover/trending?mediaType=movie
   → Returns formatted list with [tmdb:IDs]

5. run_agent_with_tools() — Turn 2:
   → LLM receives: previous messages + [tool: "Found 20 trending movies..."]
   → LLM responds: text = "Here are the top trending movies! 🎬 ..."
   → finish_reason="stop" → return the text

6. chat_completions() returns:
   { "choices": [{"message": {"content": "Here are the top trending movies!..."}}] }
```
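Steps 2 and 6 can be sketched without FastAPI as a plain function that resolves the agent, prepends the system prompt, and wraps the answer in an OpenAI-style `chat.completion` object. `AGENT_PROMPTS`, `resolve_system_prompt`, and the `run_agent` callback are hypothetical simplifications; the real handler resolves a full `Agent` and runs the tool-calling loop.

```python
import time
import uuid
from typing import Any, Callable

# Hypothetical registry: agent_id -> system prompt (stands in for Agent objects).
AGENT_PROMPTS = {"media-agent": "You are a media assistant..."}


def resolve_system_prompt(model: str) -> str:
    try:
        return AGENT_PROMPTS[model]
    except KeyError:
        raise ValueError(f"Unknown model/agent: {model!r}") from None


def chat_completions(
    body: dict[str, Any],
    run_agent: Callable[[list[dict[str, Any]]], str],
) -> dict[str, Any]:
    system_prompt = resolve_system_prompt(body["model"])
    # Prepend the system prompt; the client's history passes through unchanged.
    full_messages = [{"role": "system", "content": system_prompt}] + list(body["messages"])
    text = run_agent(full_messages)
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": body["model"],
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop",
        }],
    }


request = {
    "model": "media-agent",
    "messages": [{"role": "user", "content": "What are trending movies?"}],
    "stream": False,
}
response = chat_completions(request, run_agent=lambda msgs: f"(answer built from {len(msgs)} messages)")
print(response["choices"][0]["message"]["content"])  # (answer built from 2 messages)
```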
### Step-by-step: "Request the 2026 one" (multi-turn context)

```text
1. OpenWebUI sends the FULL history:
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"},
       {"role": "assistant", "content": "Here are the top 10 trending movies!\n1. **Mortal Kombat II** (2026) [tmdb:931285] — ..."},
       {"role": "user", "content": "could request the mortal kombat one?"},
       {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
       {"role": "user", "content": "the 2026 one"}
     ]
   }

2. chat_completions():
   → req.messages contains the ENTIRE conversation history
   → System prompt prepended → full_messages = [system] + 5 history messages
   → LLM sees everything: the trending list with [tmdb:931285], the disambiguation, "the 2026 one"

3. LLM reasons:
   - I previously listed Mortal Kombat II (2026) with [tmdb:931285]
   - The user said "request the mortal kombat one" → I searched and showed 4 options
   - Now they say "the 2026 one" → that matches Mortal Kombat II (2026) [tmdb:931285]
   - I should call seerr_request_media(kind="movie", title="Mortal Kombat II", tmdb_id=931285)

4. Tool executes the request → ✅ Success
```
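The same mechanics are visible in the message list itself. This sketch builds `full_messages` for the third user turn from an abridged history, then appends the assistant tool call and the tool reply that would follow step 4; the `call_42` id and the tool result text are made up. No resolver touches the history: the `[tmdb:931285]` tag reaches the LLM only because the earlier assistant reply is passed through verbatim.

```python
import json

SYSTEM = {"role": "system", "content": "You are a media assistant..."}

# History sent by OpenWebUI on the third user turn (abridged).
history = [
    {"role": "user", "content": "What are trending movies?"},
    {"role": "assistant", "content": "Here are the top 10 trending movies!\n"
                                     "1. **Mortal Kombat II** (2026) [tmdb:931285] ..."},
    {"role": "user", "content": "could request the mortal kombat one?"},
    {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
    {"role": "user", "content": "the 2026 one"},
]

# Prepend the system prompt; the history itself is not rewritten.
full_messages = [SYSTEM, *history]

# Hypothetical tool call the LLM would emit (OpenAI format; arguments is a JSON string).
tool_call = {
    "id": "call_42",
    "type": "function",
    "function": {
        "name": "seerr_request_media",
        "arguments": json.dumps({"kind": "movie", "title": "Mortal Kombat II", "tmdb_id": 931285}),
    },
}

# After execution, the assistant tool-call message and the tool reply join the history.
full_messages += [
    {"role": "assistant", "content": None, "tool_calls": [tool_call]},
    {"role": "tool", "tool_call_id": tool_call["id"], "content": "Request submitted for Mortal Kombat II"},
]
print(len(full_messages))  # 8
```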
## File Map

```text
main.py                  # FastAPI app entry point, creates singletons
├── core/
│   ├── config.py        # .env loader, config constants
│   └── llm.py           # create_client() factory for OpenAI client
├── api/
│   ├── dependencies.py  # FastAPI Depends: get_llm_client()
│   └── v1/
│       └── chat.py      # APIRouter, endpoints, tool-calling loop
├── agents/
│   ├── __init__.py      # Agent dataclass, registry, load_all_agents()
│   ├── naked.py         # Agent: barebone LLM, no skills
│   └── media_agent.py   # Agent: media assistant with Seerr skills
└── skills/
    ├── __init__.py      # Skill dataclass, ToolResult, registry, execution
    ├── media_info.py    # Skill: base media assistant persona (prompt-only)
    ├── seerr.py         # Skill: Seerr API tools (7 tools, real API calls)
    └── triage.py        # Skill: fallback for unsupported actions (prompt-only)
```
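`load_all_agents()` only needs to import every module under `skills/` and `agents/` so that their import-time `register()` calls run. One plausible way to do that, assuming both are regular packages, is `pkgutil.iter_modules()`. The helper name `import_submodules`, the skills-before-agents order, and skipping underscore-prefixed modules are assumptions, not the project's confirmed code. The demo runs against the stdlib `json` package so it works outside the project.

```python
import importlib
import pkgutil


def import_submodules(package_name: str) -> list[str]:
    """Import each top-level module in a package so import-time register() calls run."""
    package = importlib.import_module(package_name)
    imported = []
    for info in pkgutil.iter_modules(package.__path__):
        if info.name.startswith("_"):
            continue  # assumption: skip private/dunder modules such as __main__
        full_name = f"{package_name}.{info.name}"
        importlib.import_module(full_name)
        imported.append(full_name)
    return sorted(imported)


def load_all_agents() -> list[str]:
    # Assumption: import skills first so agents can refer to them by name.
    return import_submodules("skills") + import_submodules("agents")


# Demo against a stdlib package, since `skills`/`agents` only exist inside the project.
print(import_submodules("json"))
```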
## Key Design Decisions

- **Full multi-turn history:** `req.messages` passes through unchanged. The LLM has access to its own previous responses (including the `[tmdb:ID]` tags), so no external state management is needed.
- **No deterministic pre-processing:** no affirmation detectors, reference resolvers, or hardcoded rules. The LLM interprets user intent from the full conversation context.
- **Agent selection via the `model` field:** OpenWebUI sends `model` in the request, and `_resolve_agent()` maps it to a registered agent. The `/v1/models` endpoint lists all agents as selectable models.
- **Skills = prompts + tools:** skills inject prompt fragments and can optionally expose OpenAI function-calling tools. Prompt-only skills (like `triage`) just shape behavior; tool-enabled skills (like `seerr`) let the LLM take real actions.
- **Singleton LLM client:** created once in `main.py`, stored on `app.state.llm_client`, and accessed via FastAPI `Depends(get_llm_client)`.
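For agent selection to work, `/v1/models` has to return the agents in the OpenAI-style model list shape that OpenAI-compatible clients such as OpenWebUI read. A minimal sketch of that payload follows; the `AGENTS` contents (including the `naked` id), `list_models`, and the `owned_by` value are hypothetical.

```python
import time
from typing import Any

# Hypothetical registry contents: agent_id -> description.
AGENTS = {
    "naked": "Barebone LLM, no skills",
    "media-agent": "Media assistant with Seerr integration",
}


def list_models(agents: dict[str, str], owned_by: str = "agent-api") -> dict[str, Any]:
    """Build an OpenAI-style model list so each agent appears as a selectable model."""
    created = int(time.time())
    return {
        "object": "list",
        "data": [
            {"id": agent_id, "object": "model", "created": created, "owned_by": owned_by}
            for agent_id in sorted(agents)
        ],
    }


print([m["id"] for m in list_models(AGENTS)["data"]])  # ['media-agent', 'naked']
```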