API Architecture — Agent + Skill + Tool Pipeline

This document explains how the API routes user messages through the agent/skill/tool pipeline to produce responses.

Overview

┌─────────────────────────────────────────────────────────────────┐
│                      OpenWebUI / Client                         │
│  POST /v1/chat/completions  { model, messages, stream }         │
└──────────────────────────────┬──────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────┐
│  api/v1/chat.py  —  chat_completions()                          │
│                                                                  │
│  1. _resolve_agent(req.model)  →  Agent                          │
│  2. agent.build_system_prompt()  →  system prompt                │
│  3. Build full_messages = [system] + req.messages                │
│  4. run_agent_with_tools(client, messages, agent_id)             │
└──────────────────────────────┬───────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────┐
│  Tool-Calling Loop  (run_agent_with_tools / run_agent_stream)    │
│                                                                  │
│  while turns < max_turns:                                        │
│    response = LLM.chat(messages, tools=agent_tools)              │
│    if response has tool_calls:                                   │
│      for each tool_call:                                         │
│        result = execute_tool(skills, name, args)                 │
│        append result to messages                                 │
│    else:                                                         │
│      return response.text  (stream tokens if streaming)          │
└──────────────────────────────────────────────────────────────────┘
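
The loop in the diagram can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's actual run_agent_with_tools: the `run_loop` name, the `llm` and `run_tool` callables, the scripted `fake_llm`, and the fallback text returned when `max_turns` is reached are all hypothetical, and tool arguments are assumed to arrive already decoded as a dict.

```python
import asyncio
from typing import Any, Awaitable, Callable

Message = dict[str, Any]
# Hypothetical stand-ins for the real LLM client and execute_tool().
LLMFn = Callable[[list[Message]], Awaitable[Message]]
ToolFn = Callable[[str, dict[str, Any]], Awaitable[str]]


async def run_loop(llm: LLMFn, run_tool: ToolFn,
                   messages: list[Message], max_turns: int = 5) -> str:
    """Call the model until it answers in plain text or max_turns is reached."""
    history = list(messages)  # work on a copy so the caller's list is untouched
    for _ in range(max_turns):
        reply = await llm(history)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content") or ""
        history.append(reply)  # keep the assistant's tool request in context
        for call in tool_calls:
            fn = call["function"]
            result = await run_tool(fn["name"], fn["arguments"])
            history.append(
                {"role": "tool", "tool_call_id": call["id"], "content": result}
            )
    return "Stopped after max_turns without a final answer."  # assumed fallback


async def fake_llm(history: list[Message]) -> Message:
    # Scripted model: request a tool first, answer once a tool result is present.
    if any(m["role"] == "tool" for m in history):
        return {"role": "assistant", "content": "Here are the top trending movies!"}
    return {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "seerr_trending", "arguments": {"kind": "movie"}}}]}


async def fake_tool(name: str, args: dict[str, Any]) -> str:
    # Placeholder for a real tool handler.
    return f"{name}({args['kind']}): Found 20 trending movies..."


if __name__ == "__main__":
    user = [{"role": "user", "content": "What are trending movies?"}]
    print(asyncio.run(run_loop(fake_llm, fake_tool, user)))
```

Bounding the loop with `max_turns` keeps a model that requests tools indefinitely from blocking the request forever.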

Key Concepts

1. Agent

An Agent is a persona plus a bundle of skills. Agents are defined in agents/.

# agents/media_agent.py
Agent(
    agent_id="media-agent",
    description="Media assistant with Seerr integration",
    skills=["media_info", "seerr", "triage"],
    base_prompt="You are a media assistant...",
)

agent_id — unique name, exposed as a model in OpenWebUI
skills — list of skill names to load
base_prompt — starting system prompt, combined with skill fragments
build_system_prompt() — merges base_prompt + all skill prompt fragments

Agents self-register at import time via agents/__init__.py's register(). main.py calls load_all_agents() at startup to import all agent/skill modules.
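
As a rough sketch of how self-registration and prompt assembly could fit together: the `Agent` fields, `register()`, `get_agent()`, and `build_system_prompt()` names come from this document, while the `_AGENTS` dict, the `SKILL_FRAGMENTS` table, the duplicate-id check, and the choice to skip unknown skills are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical fragment table; in the real project each Skill carries its own
# prompt_fragment.
SKILL_FRAGMENTS = {
    "media_info": "## Media Info\n...",
    "seerr": "## Seerr Media Tools\n...",
}


@dataclass
class Agent:
    agent_id: str
    description: str
    skills: list[str] = field(default_factory=list)
    base_prompt: str = ""

    def build_system_prompt(self) -> str:
        # Base prompt first, then each known skill fragment in declaration order.
        parts = [self.base_prompt]
        parts += [SKILL_FRAGMENTS[s] for s in self.skills if s in SKILL_FRAGMENTS]
        return "\n\n".join(parts)


_AGENTS: dict[str, Agent] = {}  # hypothetical module-level registry


def register(agent: Agent) -> Agent:
    """Called at import time by each agent module."""
    if agent.agent_id in _AGENTS:
        # Assumed behavior: reject duplicates rather than silently overwrite.
        raise ValueError(f"agent already registered: {agent.agent_id}")
    _AGENTS[agent.agent_id] = agent
    return agent


def get_agent(agent_id: str) -> Optional[Agent]:
    return _AGENTS.get(agent_id)
```

With this shape, importing an agent module is enough to make it resolvable by its `agent_id`, which is why a single startup call that imports every module can populate the registry.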

2. Skill

A Skill is a capability bundle: a prompt fragment, a set of tool definitions, and a handler that executes them. Skills are defined in skills/.

# skills/seerr.py
Skill(
    name="seerr",
    description="Seerr integration — trending, discover, request media, submit issues",
    prompt_fragment="## Seerr Media Tools\n...",
    tools=[...],          # OpenAI function-calling schema
    execute=_execute,     # async handler: tool_name + args → ToolResult
)

prompt_fragment — injected into the agent's system prompt. Teaches the LLM what tools are available and when to use them.
tools — list of OpenAI function definitions (name, description, parameters).
execute — async callable that routes tool calls to API handlers.
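
The routing described above might look like the following sketch. `get_all_tools`, `execute_tool`, `_execute`, and `_trending` are names used in this document, but their bodies here are guesses; the `SKILLS` registry and `find_owner` helper are hypothetical, and results are plain strings rather than the project's ToolResult type.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional


@dataclass
class Skill:
    name: str
    tools: list[dict[str, Any]]  # OpenAI-style function definitions
    execute: Callable[[str, dict[str, Any]], Awaitable[str]]


async def _trending(args: dict[str, Any]) -> str:
    # Placeholder handler; the real one would call the Seerr API.
    return f"Found trending items for kind={args.get('kind', 'all')}"


async def _execute(tool_name: str, args: dict[str, Any]) -> str:
    # Per-skill dispatch: map each tool name to its handler.
    handlers = {"seerr_trending": _trending}
    handler = handlers.get(tool_name)
    if handler is None:
        return f"Unknown tool: {tool_name}"
    return await handler(args)


# Hypothetical skill registry keyed by skill name.
SKILLS: dict[str, Skill] = {
    "seerr": Skill(
        name="seerr",
        tools=[{"type": "function", "function": {"name": "seerr_trending"}}],
        execute=_execute,
    ),
}


def get_all_tools(skill_names: list[str]) -> list[dict[str, Any]]:
    """Collect the tool definitions of every known skill, in order."""
    return [t for n in skill_names if n in SKILLS for t in SKILLS[n].tools]


def find_owner(skill_names: list[str], tool_name: str) -> Optional[Skill]:
    """Return the first listed skill that declares tool_name."""
    for n in skill_names:
        skill = SKILLS.get(n)
        if skill and any(t["function"]["name"] == tool_name for t in skill.tools):
            return skill
    return None


async def execute_tool(skill_names: list[str], tool_name: str,
                       args: dict[str, Any]) -> str:
    owner = find_owner(skill_names, tool_name)
    if owner is None:
        return f"Unknown tool: {tool_name}"
    return await owner.execute(tool_name, args)
```

Returning an error string for an unknown tool, instead of raising, lets the LLM see the failure in its context and recover on the next turn; whether the project does this is not stated here.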

3. Tool

A Tool is a single function the LLM can call. Each tool is defined as an entry in a skill's tools list.

{
    "type": "function",
    "function": {
        "name": "seerr_trending",
        "description": "Get trending movies and TV shows from Seerr...",
        "parameters": {
            "type": "object",
            "properties": {
                "kind": {"type": "string", "enum": ["movie", "tv", "all"]},
                "language": {"type": "string"},
            },
            "required": ["kind"],
        },
    },
}
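
This document does not say whether arguments are checked against the schema before a tool runs. As an illustration of what the `parameters` block makes possible, the sketch below checks `required`, string types, and `enum` values for the schema above. The `check_args` helper and its error wording are hypothetical, and it covers only the few JSON Schema keywords that appear in this example.

```python
from typing import Any

# The "parameters" object of the seerr_trending definition.
TRENDING_PARAMS: dict[str, Any] = {
    "type": "object",
    "properties": {
        "kind": {"type": "string", "enum": ["movie", "tv", "all"]},
        "language": {"type": "string"},
    },
    "required": ["kind"],
}


def check_args(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems: list[str] = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
        elif spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems
```

For example, `check_args(TRENDING_PARAMS, {"kind": "movie"})` returns an empty list, while omitting `kind` reports it as missing.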

When the LLM responds with a tool call, the loop:

1. Extracts function.name (e.g. "seerr_trending") and function.arguments (e.g. {"kind": "movie"})
2. Calls execute_tool(agent.skills, name, args), which finds the owning skill and runs it
3. Appends the result text to the message history
4. Sends the updated history back to the LLM for a follow-up response
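
OpenAI-compatible APIs deliver function.arguments as a JSON-encoded string, whereas the examples in this document show it already decoded. A small normalizing step, sketched below, handles both forms. The `parse_tool_call` and `tool_message` helpers are hypothetical names, not functions from this codebase.

```python
import json
from typing import Any


def parse_tool_call(call: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Extract (name, args) from one tool call, decoding string arguments."""
    fn = call["function"]
    raw = fn.get("arguments") or {}
    if isinstance(raw, str):
        # Decode the JSON string; treat whitespace-only input as no arguments.
        raw = json.loads(raw) if raw.strip() else {}
    if not isinstance(raw, dict):
        raise ValueError(f"arguments for {fn['name']} must be a JSON object")
    return fn["name"], raw


def tool_message(call: dict[str, Any], result_text: str) -> dict[str, Any]:
    """Build the history entry that carries a tool result back to the LLM."""
    return {
        "role": "tool",
        "tool_call_id": call.get("id", ""),
        "content": result_text,
    }
```

Linking each result to its call through `tool_call_id` is what lets the model match results to requests when it issues several tool calls in one turn.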

Full Request Flow

1. OpenWebUI sends:
   POST /v1/chat/completions
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"}
     ],
     "stream": false
   }

2. chat_completions():
   → _resolve_agent(model="media-agent")
     → get_agent("media-agent") → Agent(skills=["media_info", "seerr", "triage"])
   → tools = get_all_tools(["media_info", "seerr", "triage"])
     → Returns 7 tool definitions from seerr.py
   → system_prompt = agent.build_system_prompt()
     → base_prompt + media_info fragment + seerr fragment + triage fragment

3. run_agent_with_tools() — Turn 1:
   → LLM receives: [system prompt with tools] + [user: "What are trending movies?"]
   → LLM responds: tool_calls = [{"function": {"name": "seerr_trending", "arguments": {"kind": "movie"}}}]

4. Execute tool:
   → execute_tool(["media_info", "seerr", "triage"], "seerr_trending", {"kind": "movie"})
   → Finds seerr skill → calls _execute("seerr_trending", ...) → _trending(args)
   → GET /api/v1/discover/trending?mediaType=movie
   → Returns formatted list with [tmdb:IDs]

5. run_agent_with_tools() — Turn 2:
   → LLM receives: previous messages + [tool: "Found 20 trending movies..."]
   → LLM responds: text = "Here are the top trending movies! 🎬 ..."
   → finish_reason="stop" → return the text

6. chat_completions() returns:
   { "choices": [{"message": {"content": "Here are the top trending movies!..."}}] }
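
The response in step 6 is abbreviated. A fuller non-streaming body in the OpenAI Chat Completions shape could be assembled as sketched below. The `build_completion` helper is hypothetical, and which of these fields the project actually populates is an assumption; only `choices[0].message.content` is confirmed by the example above.

```python
import time
import uuid
from typing import Any


def build_completion(model: str, text: str) -> dict[str, Any]:
    """Wrap the final assistant text in a chat.completion response body."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",  # any unique id works
        "object": "chat.completion",
        "created": int(time.time()),           # Unix timestamp in seconds
        "model": model,                        # echoes the requested agent id
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": "stop",
            }
        ],
    }
```

Because the intermediate tool turns happen server-side, the client only ever sees this single assistant message.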

Step-by-step: "Request the 2026 one" (multi-turn context)

1. OpenWebUI sends the FULL history:
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"},
       {"role": "assistant", "content": "Here are the top 10 trending movies!
        1. **Mortal Kombat II** (2026) [tmdb:931285] — ..."},
       {"role": "user", "content": "could request the mortal kombat one?"},
       {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
       {"role": "user", "content": "the 2026 one"}
     ]
   }

2. chat_completions():
   → req.messages contains the ENTIRE conversation history
   → System prompt prepended → full_messages = [system] + 5 history messages
   → LLM sees everything: the trending list with [tmdb:931285], the disambiguation, "the 2026 one"

3. LLM reasons:
   - I previously listed Mortal Kombat II (2026) with [tmdb:931285]
   - The user said "request the mortal kombat one" → I searched and showed 4 options
   - Now they say "the 2026 one" → that matches Mortal Kombat II (2026) [tmdb:931285]
   - I should call seerr_request_media(kind="movie", title="Mortal Kombat II", tmdb_id=931285)

4. Tool executes the request → ✅ Success
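
The multi-turn behavior relies on the server being stateless: the client resends the whole history and the system prompt is prepended on every request. A minimal sketch of that assembly step follows; `build_full_messages` is a hypothetical helper name, and the shortened message contents are placeholders.

```python
from typing import Any

Message = dict[str, Any]


def build_full_messages(system_prompt: str, history: list[Message]) -> list[Message]:
    """Prepend a fresh system message to the client-supplied history."""
    return [{"role": "system", "content": system_prompt}] + list(history)


history = [
    {"role": "user", "content": "What are trending movies?"},
    {"role": "assistant", "content": "1. **Mortal Kombat II** (2026) [tmdb:931285] ..."},
    {"role": "user", "content": "could request the mortal kombat one?"},
    {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
    {"role": "user", "content": "the 2026 one"},
]

full_messages = build_full_messages("You are a media assistant...", history)
print(len(full_messages))  # 6: one system message plus five history messages
```

Since the earlier assistant reply containing [tmdb:931285] is part of the resent history, the LLM can resolve "the 2026 one" without any server-side session storage.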
