API Architecture — Agent + Skill + Tool Pipeline

This document explains how the API routes user messages through the agent/skill/tool pipeline to produce responses.

Overview

┌─────────────────────────────────────────────────────────────────┐
│                      OpenWebUI / Client                         │
│  POST /v1/chat/completions  { model, messages, stream }         │
└──────────────────────────────┬──────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────┐
│  api/v1/chat.py  —  chat_completions()                          │
│                                                                  │
│  1. _resolve_agent(req.model)  →  Agent                          │
│  2. agent.build_system_prompt()  →  system prompt                │
│  3. Build full_messages = [system] + req.messages                │
│  4. run_agent_with_tools(client, messages, agent_id)             │
└──────────────────────────────┬───────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────┐
│  Tool-Calling Loop  (run_agent_with_tools / run_agent_stream)    │
│                                                                  │
│  while turns < max_turns:                                        │
│    response = LLM.chat(messages, tools=agent_tools)              │
│    if response has tool_calls:                                   │
│      for each tool_call:                                         │
│        result = execute_tool(skills, name, args)                 │
│        append result to messages                                 │
│    else:                                                         │
│      return response.text  (stream tokens if streaming)          │
└──────────────────────────────────────────────────────────────────┘

Key Concepts

1. Agent

An Agent is a persona plus a bundle of skills. Agents are defined in agents/.

# agents/media_agent.py
Agent(
    agent_id="media-agent",
    description="Media assistant with Seerr integration",
    skills=["media_info", "seerr", "triage"],
    base_prompt="You are a media assistant...",
)

agent_id — unique name, exposed as a model in OpenWebUI
skills — list of skill names to load
base_prompt — starting system prompt, combined with skill fragments
build_system_prompt() — merges base_prompt + all skill prompt fragments

Agents self-register at import time via agents/__init__.py's register(). main.py calls load_all_agents() at startup to import all agent/skill modules.
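
The registration and prompt-merging behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual agents/__init__.py: the _AGENTS dict, the SKILL_FRAGMENTS stand-in for the skill registry, the duplicate check, and the blank-line join are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical stand-in for the skill registry: skill name -> prompt fragment.
SKILL_FRAGMENTS: dict[str, str] = {
    "media_info": "## Media Info\nAnswer questions about movies and TV.",
    "seerr": "## Seerr Media Tools\nUse the seerr_* tools for requests.",
    "triage": "## Triage\nPolitely decline unsupported actions.",
}

# Assumed module-level registry: agent_id -> Agent.
_AGENTS: dict[str, Agent] = {}


@dataclass
class Agent:
    agent_id: str
    description: str
    base_prompt: str
    skills: list[str] = field(default_factory=list)

    def build_system_prompt(self) -> str:
        """Merge base_prompt with each skill's fragment, in skill order."""
        fragments = [SKILL_FRAGMENTS[s] for s in self.skills if s in SKILL_FRAGMENTS]
        return "\n\n".join([self.base_prompt, *fragments])


def register(agent: Agent) -> Agent:
    """Add an agent to the registry; intended to run at module import time."""
    if agent.agent_id in _AGENTS:
        raise ValueError(f"duplicate agent_id: {agent.agent_id}")
    _AGENTS[agent.agent_id] = agent
    return agent


def get_agent(agent_id: str) -> Agent | None:
    """Look up a registered agent, or None if the id is unknown."""
    return _AGENTS.get(agent_id)


# Importing a module containing this call is what registers the agent.
media_agent = register(
    Agent(
        agent_id="media-agent",
        description="Media assistant with Seerr integration",
        skills=["media_info", "seerr", "triage"],
        base_prompt="You are a media assistant...",
    )
)

system_prompt = media_agent.build_system_prompt()
```

Because registration happens as a side effect of import, load_all_agents() only needs to import each module once for every agent to become resolvable.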

2. Skill

A Skill is a capability bundle: a prompt fragment plus, optionally, tools and a handler that executes them. Skills are defined in skills/.

# skills/seerr.py
Skill(
    name="seerr",
    description="Seerr integration — trending, discover, request media, submit issues",
    prompt_fragment="## Seerr Media Tools\n...",
    tools=[...],          # OpenAI function-calling schema
    execute=_execute,     # async handler: tool_name + args → ToolResult
)

prompt_fragment — injected into the agent's system prompt. Teaches the LLM what tools are available and when to use them.
tools — list of OpenAI function definitions (name, description, parameters).
execute — async callable that routes tool calls to API handlers.
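
One plausible shape for the execute handler is a dispatch table from tool name to async function. The sketch below is illustrative only: the ToolResult fields (text, ok), the _HANDLERS table, and the canned _trending handler are assumptions, and no real Seerr call is made.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class ToolResult:
    text: str        # assumed field: text handed back to the LLM
    ok: bool = True  # assumed field: whether the call succeeded


async def _trending(args: dict[str, Any]) -> ToolResult:
    # A real handler would call the Seerr API; this one returns canned text.
    return ToolResult(text=f"Found 2 trending items of kind {args['kind']}")


# Dispatch table: tool name -> async handler (hypothetical).
_HANDLERS: dict[str, Callable[[dict[str, Any]], Awaitable[ToolResult]]] = {
    "seerr_trending": _trending,
}


async def _execute(tool_name: str, args: dict[str, Any]) -> ToolResult:
    """Route a tool call to its handler, reporting unknown tools as failures."""
    handler = _HANDLERS.get(tool_name)
    if handler is None:
        return ToolResult(text=f"Unknown tool: {tool_name}", ok=False)
    return await handler(args)


@dataclass
class Skill:
    name: str
    description: str
    prompt_fragment: str
    tools: list[dict[str, Any]] = field(default_factory=list)
    execute: Callable[[str, dict[str, Any]], Awaitable[ToolResult]] | None = None


seerr = Skill(
    name="seerr",
    description="Seerr integration",
    prompt_fragment="## Seerr Media Tools\n...",
    tools=[],  # tool schemas omitted in this sketch
    execute=_execute,
)

result = asyncio.run(seerr.execute("seerr_trending", {"kind": "movie"}))
unknown = asyncio.run(seerr.execute("no_such_tool", {}))
```

Returning a failed ToolResult for an unknown tool, rather than raising, lets the error text flow back to the LLM as an ordinary tool message.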

3. Tool

A Tool is a single function the LLM can call, defined as one entry in a skill's tools list.

{
    "type": "function",
    "function": {
        "name": "seerr_trending",
        "description": "Get trending movies and TV shows from Seerr...",
        "parameters": {
            "type": "object",
            "properties": {
                "kind": {"type": "string", "enum": ["movie", "tv", "all"]},
                "language": {"type": "string"},
            },
            "required": ["kind"],
        },
    },
}
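
Tool definitions like the one above have to be gathered across an agent's skills before they are passed to the LLM. A minimal sketch of that collection step, assuming a hypothetical SKILL_TOOLS registry and a duplicate-name check that the real get_all_tools() may not perform:

```python
from typing import Any


def _tool(name: str, description: str) -> dict[str, Any]:
    """Build a bare OpenAI-style function definition (parameters left empty)."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {"type": "object", "properties": {}},
        },
    }


# Hypothetical registry: skill name -> tool definitions it exposes.
SKILL_TOOLS: dict[str, list[dict[str, Any]]] = {
    "media_info": [],  # prompt-only skill
    "seerr": [
        _tool("seerr_trending", "Get trending movies and TV shows"),
        _tool("seerr_request_media", "Request a movie or TV show"),
    ],
    "triage": [],      # prompt-only skill
}


def get_all_tools(skill_names: list[str]) -> list[dict[str, Any]]:
    """Flatten the tool lists of the named skills, preserving skill order."""
    tools: list[dict[str, Any]] = []
    seen: set[str] = set()
    for skill_name in skill_names:
        for tool in SKILL_TOOLS.get(skill_name, []):
            name = tool["function"]["name"]
            if name in seen:
                raise ValueError(f"tool name defined twice: {name}")
            seen.add(name)
            tools.append(tool)
    return tools


tools = get_all_tools(["media_info", "seerr", "triage"])
tool_names = [t["function"]["name"] for t in tools]
```

Tool names must be unique across skills because the name is the only key the loop has for finding the owning skill.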

When the LLM responds with a tool call, the loop:

1. Extracts function.name (e.g. "seerr_trending") and function.arguments (e.g. {"kind": "movie"})
2. Calls execute_tool(agent.skills, name, args), which finds the owning skill and runs its handler
3. Appends the result text to the message history as a tool message
4. Sends the updated history back to the LLM for a follow-up response
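
The loop steps above can be sketched with a stubbed LLM so the control flow is runnable in isolation. This is a simplified illustration, not the code in api/v1/chat.py: fake_llm_chat, the MAX_TURNS value, the stub execute_tool, and the reduced run_agent_with_tools signature are assumptions. The OpenAI API delivers function.arguments as a JSON-encoded string, so the sketch decodes it with json.loads.

```python
import asyncio
import json
from typing import Any

MAX_TURNS = 5  # assumed limit; the real value is not stated in this document


async def fake_llm_chat(messages: list[dict[str, Any]], tools: list[dict]) -> dict[str, Any]:
    """Stub LLM: requests one tool call, then answers in plain text."""
    if not any(m["role"] == "tool" for m in messages):
        return {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "seerr_trending",
                    "arguments": json.dumps({"kind": "movie"}),
                },
            }],
        }
    return {"role": "assistant", "content": "Here are the top trending movies!"}


async def execute_tool(skills: list[str], name: str, args: dict[str, Any]) -> str:
    """Stub: the real function finds the owning skill and runs its handler."""
    return f"Found 20 trending movies (tool={name}, kind={args['kind']})"


async def run_agent_with_tools(
    messages: list[dict[str, Any]], skills: list[str], tools: list[dict]
) -> str:
    messages = list(messages)  # copy so the caller's history is not mutated
    for _ in range(MAX_TURNS):
        reply = await fake_llm_chat(messages, tools)
        messages.append(reply)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            return reply["content"]  # plain text means the turn is finished
        for call in tool_calls:
            fn = call["function"]
            args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
            result = await execute_tool(skills, fn["name"], args)
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": result}
            )
    return "Stopped: reached the turn limit without a final answer."


history = [{"role": "user", "content": "What are trending movies?"}]
answer = asyncio.run(run_agent_with_tools(history, ["seerr"], []))
```

The turn limit guards against a model that keeps requesting tools and never produces a final answer.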

Full Request Flow

1. OpenWebUI sends:
   POST /v1/chat/completions
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"}
     ],
     "stream": false
   }

2. chat_completions():
   → _resolve_agent(model="media-agent")
     → get_agent("media-agent") → Agent(skills=["media_info", "seerr", "triage"])
   → tools = get_all_tools(["media_info", "seerr", "triage"])
     → Returns 7 tool definitions, all from seerr.py (media_info and triage are prompt-only)
   → system_prompt = agent.build_system_prompt()
     → base_prompt + media_info fragment + seerr fragment + triage fragment

3. run_agent_with_tools() — Turn 1:
   → LLM receives: [system prompt with tools] + [user: "What are trending movies?"]
   → LLM responds: tool_calls = [{"function": {"name": "seerr_trending", "arguments": {"kind": "movie"}}}]

4. Execute tool:
   → execute_tool(["media_info", "seerr", "triage"], "seerr_trending", {"kind": "movie"})
   → Finds seerr skill → calls _execute("seerr_trending", ...) → _trending(args)
   → GET /api/v1/discover/trending?mediaType=movie
   → Returns formatted list with [tmdb:IDs]

5. run_agent_with_tools() — Turn 2:
   → LLM receives: previous messages + [tool: "Found 20 trending movies..."]
   → LLM responds: text = "Here are the top trending movies! 🎬 ..."
   → finish_reason="stop" → return the text

6. chat_completions() returns:
   { "choices": [{"message": {"content": "Here are the top trending movies!..."}}] }
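
The response in step 6 is abbreviated. A fuller envelope in the OpenAI chat-completion shape might be assembled as below; build_completion_response is a hypothetical helper, and whether chat.py fills in every one of these fields is an assumption.

```python
import time
import uuid
from typing import Any


def build_completion_response(model: str, text: str) -> dict[str, Any]:
    """Wrap the agent's final text in an OpenAI-style chat.completion object."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": "stop",
            }
        ],
    }


response = build_completion_response(
    "media-agent", "Here are the top trending movies!"
)
```

Echoing the agent id back in the model field keeps the response consistent with the model the client selected.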

Step-by-step: "Request the 2026 one" (multi-turn context)

1. OpenWebUI sends the FULL history:
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"},
       {"role": "assistant", "content": "Here are the top 10 trending movies!
        1. **Mortal Kombat II** (2026) [tmdb:931285] — ..."},
       {"role": "user", "content": "could request the mortal kombat one?"},
       {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
       {"role": "user", "content": "the 2026 one"}
     ]
   }

2. chat_completions():
   → req.messages contains the ENTIRE conversation history
   → System prompt prepended → full_messages = [system] + 5 history messages
   → LLM sees everything: the trending list with [tmdb:931285], the disambiguation, "the 2026 one"

3. LLM reasons:
   - I previously listed Mortal Kombat II (2026) with [tmdb:931285]
   - The user said "request the mortal kombat one" → I searched and showed 4 options
   - Now they say "the 2026 one" → that matches Mortal Kombat II (2026) [tmdb:931285]
   - I should call seerr_request_media(kind="movie", title="Mortal Kombat II", tmdb_id=931285)

4. Tool executes the request → ✅ Success
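
The key point of this walkthrough is that the server keeps no conversation state: every request rebuilds the full message list from what the client sends. A minimal sketch, where build_full_messages is a hypothetical helper name and the message contents are shortened:

```python
from typing import Any


def build_full_messages(
    system_prompt: str, history: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Prepend the system prompt; the client-supplied history passes through unchanged."""
    return [{"role": "system", "content": system_prompt}, *history]


history = [
    {"role": "user", "content": "What are trending movies?"},
    {"role": "assistant", "content": "1. **Mortal Kombat II** (2026) [tmdb:931285]"},
    {"role": "user", "content": "could request the mortal kombat one?"},
    {"role": "assistant", "content": "There are several Mortal Kombat entries!"},
    {"role": "user", "content": "the 2026 one"},
]

full_messages = build_full_messages("You are a media assistant...", history)

# The ID needed for the request is still visible in the context window.
id_in_context = any("[tmdb:931285]" in m["content"] for m in full_messages)
```

Since the assistant's earlier reply carries the [tmdb:931285] marker, the LLM can resolve "the 2026 one" without any server-side lookup table.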

File Map

main.py                          # FastAPI app entry point, creates singletons
├── core/
│   ├── config.py                # .env loader, config constants
│   └── llm.py                   # create_client() factory for OpenAI client
├── api/
│   ├── dependencies.py          # FastAPI Depends: get_llm_client()
│   └── v1/
│       └── chat.py              # APIRouter, endpoints, tool-calling loop
├── agents/
│   ├── __init__.py              # Agent dataclass, registry, load_all_agents()
│   ├── naked.py                 # Agent: barebone LLM, no skills
│   └── media_agent.py           # Agent: media assistant with Seerr skills
└── skills/
    ├── __init__.py              # Skill dataclass, ToolResult, registry, execution
    ├── media_info.py            # Skill: base media assistant persona (prompt-only)
    ├── seerr.py                 # Skill: Seerr API tools (7 tools, real API calls)
    └── triage.py                # Skill: fallback for unsupported actions (prompt-only)

Key Design Decisions

Full multi-turn history: req.messages passes through unchanged. The LLM has access to its own previous responses (including [tmdb:IDs]). No external state management needed.
No deterministic pre-processing: No affirmation detectors, reference resolvers, or hardcoded rules. The LLM interprets user intent naturally from full conversation context.
Agent selection via model field: OpenWebUI sends model in the request. _resolve_agent() maps it to a registered agent. The /v1/models endpoint lists all agents as selectable models.
Skills = prompts + tools: Skills inject prompt fragments AND optionally expose OpenAI function-calling tools. Prompt-only skills (like triage) just shape behavior. Tool-enabled skills (like seerr) let the LLM take real actions.
Singleton LLM client: Created once in main.py, stored on app.state.llm_client, accessed via FastAPI Depends(get_llm_client).
