Add API architecture documentation for Agent, Skill, and Tool pipeline

2026-05-14 14:28:17 +02:00
parent 2adf17493a
commit 0634e7400a
1 changed file with 185 additions and 0 deletions
@@ -0,0 +1,185 @@
+# API Architecture — Agent + Skill + Tool Pipeline
+
+This document explains how the API routes user messages through the agent/skill/tool pipeline to produce responses.
+
+---
+
+## Overview
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                      OpenWebUI / Client                         │
+│  POST /v1/chat/completions  { model, messages, stream }         │
+└──────────────────────────────┬──────────────────────────────────┘
+                               │
+                               ▼
+┌──────────────────────────────────────────────────────────────────┐
+│  api/v1/chat.py  —  chat_completions()                          │
+│                                                                  │
+│  1. _resolve_agent(req.model)  →  Agent                          │
+│  2. agent.build_system_prompt()  →  system prompt                │
+│  3. Build full_messages = [system] + req.messages                │
+│  4. run_agent_with_tools(client, messages, agent_id)             │
+└──────────────────────────────┬───────────────────────────────────┘
+                               │
+                               ▼
+┌──────────────────────────────────────────────────────────────────┐
+│  Tool-Calling Loop  (run_agent_with_tools / run_agent_stream)    │
+│                                                                  │
+│  while turns < max_turns:                                        │
+│    response = LLM.chat(messages, tools=agent_tools)              │
+│    if response has tool_calls:                                   │
+│      for each tool_call:                                         │
+│        result = execute_tool(skills, name, args)                 │
+│        append result to messages                                 │
+│    else:                                                         │
+│      return response.text  (stream tokens if streaming)          │
+└──────────────────────────────────────────────────────────────────┘
+```
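
The loop in the diagram can be sketched in a few lines of Python. This is a minimal synchronous sketch under stated assumptions, not the project's implementation: `fake_llm`, `run_loop`, the `tools` dict, and the `MAX_TURNS` value are hypothetical stand-ins for the LLM client, `run_agent_with_tools`, `execute_tool`, and `max_turns`. The response shape is simplified, and the assistant's own tool-call message is not recorded in the history.

```python
MAX_TURNS = 5  # assumed cap; the real max_turns value is not stated in this document


def fake_llm(messages):
    """Stand-in for LLM.chat(): request one tool call, then answer in text."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_calls": [{"function": {"name": "seerr_trending",
                                             "arguments": {"kind": "movie"}}}]}
    return {"text": "Here are the top trending movies!"}


def run_loop(messages, tools, llm=fake_llm, max_turns=MAX_TURNS):
    """Call the LLM until it answers in plain text or the turn budget runs out."""
    for _ in range(max_turns):
        response = llm(messages)
        tool_calls = response.get("tool_calls")
        if not tool_calls:
            return response["text"]
        for call in tool_calls:
            fn = call["function"]
            # Run the named tool and feed its result back into the history.
            result = tools[fn["name"]](fn["arguments"])
            messages.append({"role": "tool", "content": result})
    return None  # turn budget exhausted without a final answer


# Hypothetical tool table standing in for the skill handlers.
tools = {"seerr_trending": lambda args: f"Found 20 trending {args['kind']}s"}
history = [{"role": "user", "content": "What are trending movies?"}]
answer = run_loop(history, tools)
```

Capping the number of turns keeps a model that keeps requesting tools from looping forever. What the real loop returns when the cap is hit is not described here, so the `None` above is only a placeholder.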
+
+---
+
+## Key Concepts
+
+### 1. Agent
+
+An **Agent** bundles a persona with a set of skills. Agents are defined in `agents/`.
+
+```python
+# agents/media_agent.py
+Agent(
+    agent_id="media-agent",
+    description="Media assistant with Seerr integration",
+    skills=["media_info", "seerr", "triage"],
+    base_prompt="You are a media assistant...",
+)
+```
+
+- `agent_id` — unique name, exposed as a model in OpenWebUI
+- `skills` — list of skill names to load
+- `base_prompt` — starting system prompt, combined with skill fragments
+- `build_system_prompt()` — merges base_prompt + all skill prompt fragments
+
+Agents self-register at import time by calling `register()`, defined in `agents/__init__.py`.
+At startup, `main.py` calls `load_all_agents()` to import every agent and skill module, which triggers that registration.
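
Import-time registration and prompt merging might look roughly like the sketch below. The field names follow the example above, but the registry dict `_AGENTS`, the `_SKILL_FRAGMENTS` table, the duplicate-id check, and the `"\n\n"` separator are assumptions made for illustration. The actual `register()` and `build_system_prompt()` are not shown in this document.

```python
from dataclasses import dataclass, field

_AGENTS = {}                      # hypothetical registry: agent_id -> Agent
_SKILL_FRAGMENTS = {              # hypothetical stand-in for the skill registry
    "media_info": "## Media Info\n...",
    "seerr": "## Seerr Media Tools\n...",
}


@dataclass
class Agent:
    agent_id: str
    description: str
    skills: list = field(default_factory=list)
    base_prompt: str = ""

    def build_system_prompt(self):
        """Join base_prompt with the fragment of each listed skill, in order."""
        fragments = [_SKILL_FRAGMENTS[name] for name in self.skills]
        return "\n\n".join([self.base_prompt, *fragments])


def register(agent):
    """Add the agent to the registry; rejecting duplicate ids is an assumption."""
    if agent.agent_id in _AGENTS:
        raise ValueError(f"duplicate agent_id: {agent.agent_id}")
    _AGENTS[agent.agent_id] = agent
    return agent


def get_agent(agent_id):
    """Look up an agent by id; returns None when it was never registered."""
    return _AGENTS.get(agent_id)


# Runs at import time, so importing the module is enough to register the agent.
media = register(Agent("media-agent", "Media assistant", ["media_info", "seerr"],
                       "You are a media assistant..."))
```

With this shape, `get_agent("media-agent")` returns the registered instance, which is the lookup `_resolve_agent()` relies on in the request flow.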
+
+### 2. Skill
+
+A **Skill** bundles a prompt fragment, tool definitions, and a handler into one capability. Skills are defined in `skills/`.
+
+```python
+# skills/seerr.py
+Skill(
+    name="seerr",
+    description="Seerr integration — trending, discover, request media, submit issues",
+    prompt_fragment="## Seerr Media Tools\n...",
+    tools=[...],          # OpenAI function-calling schema
+    execute=_execute,     # async handler: tool_name + args → ToolResult
+)
+```
+
+- `prompt_fragment` — injected into the agent's system prompt. Teaches the LLM what tools are available and when to use them.
+- `tools` — list of OpenAI function definitions (name, description, parameters).
+- `execute` — async callable that routes tool calls to API handlers.
+
+### 3. Tool
+
+A **Tool** is a single function the LLM can call. It is defined as an entry in a skill's `tools` list.
+
+```python
+{
+    "type": "function",
+    "function": {
+        "name": "seerr_trending",
+        "description": "Get trending movies and TV shows from Seerr...",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "kind": {"type": "string", "enum": ["movie", "tv", "all"]},
+                "language": {"type": "string"},
+            },
+            "required": ["kind"],
+        },
+    },
+}
+```
+
+When the LLM responds with a tool call, the loop:
+1. Extracts `function.name` (e.g. `"seerr_trending"`) and `function.arguments` (e.g. `{"kind": "movie"}`)
+2. Calls `execute_tool(agent.skills, name, args)`, which finds the owning skill and runs its handler
+3. Appends the result text to the message history
+4. Sends the updated history back to the LLM for a follow-up response
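
The dispatch in steps 1 to 3 can be sketched as follows. Everything here is hypothetical except the names `execute_tool` and `seerr_trending`: the `SKILL_TABLE` layout, the `media_lookup` tool, the error string, and the synchronous handlers are illustrative only. `parse_arguments` covers the case where a provider delivers `function.arguments` as a JSON-encoded string rather than an object, which is an assumption about the client, not something stated above.

```python
import json

# Hypothetical skill table: skill name -> (tool names it owns, handler).
SKILL_TABLE = {
    "media_info": ({"media_lookup"}, lambda name, args: "media info"),
    "seerr": ({"seerr_trending"}, lambda name, args: f"trending {args['kind']}"),
}


def parse_arguments(raw):
    """Accept arguments as a dict or as a JSON-encoded string."""
    return json.loads(raw) if isinstance(raw, str) else raw


def execute_tool(skill_names, tool_name, args):
    """Run the tool via the first listed skill that owns it."""
    for skill_name in skill_names:
        owned, handler = SKILL_TABLE[skill_name]
        if tool_name in owned:
            return handler(tool_name, args)
    # Returning an error string lets the LLM see the failure and recover.
    return f"Error: unknown tool '{tool_name}'"


# Step 1: extract the name and arguments from the tool call.
tool_call = {"function": {"name": "seerr_trending", "arguments": '{"kind": "movie"}'}}
fn = tool_call["function"]
# Step 2: dispatch to the owning skill.
result = execute_tool(["media_info", "seerr"], fn["name"], parse_arguments(fn["arguments"]))
# Step 3: append the result text to the message history.
messages = [{"role": "tool", "content": result}]
```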
+
+---
+
+## Full Request Flow
+
+### Step-by-step: "What are trending movies?"
+
+```
+1. OpenWebUI sends:
+   POST /v1/chat/completions
+   {
+     "model": "media-agent",
+     "messages": [
+       {"role": "user", "content": "What are trending movies?"}
+     ],
+     "stream": false
+   }
+
+2. chat_completions():
+   → _resolve_agent(model="media-agent")
+     → get_agent("media-agent") → Agent(skills=["media_info", "seerr", "triage"])
+   → tools = get_all_tools(["media_info", "seerr", "triage"])
+     → Returns 7 tool definitions from seerr.py
+   → system_prompt = agent.build_system_prompt()
+     → base_prompt + media_info fragment + seerr fragment + triage fragment
+
+3. run_agent_with_tools() — Turn 1:
+   → LLM receives: [system prompt with tools] + [user: "What are trending movies?"]
+   → LLM responds: tool_calls = [{"function": {"name": "seerr_trending", "arguments": {"kind": "movie"}}}]
+
+4. Execute tool:
+   → execute_tool(["media_info", "seerr", "triage"], "seerr_trending", {"kind": "movie"})
+   → Finds seerr skill → calls _execute("seerr_trending", ...) → _trending(args)
+   → GET /api/v1/discover/trending?mediaType=movie
+   → Returns formatted list with [tmdb:IDs]
+
+5. run_agent_with_tools() — Turn 2:
+   → LLM receives: previous messages + [tool: "Found 20 trending movies..."]
+   → LLM responds: text = "Here are the top trending movies! 🎬 ..."
+   → finish_reason="stop" → return the text
+
+6. chat_completions() returns:
+   { "choices": [{"message": {"content": "Here are the top trending movies!..."}}] }
+```
+
+### Step-by-step: "Request the 2026 one" (multi-turn context)
+
+```
+1. OpenWebUI sends the FULL history:
+   {
+     "model": "media-agent",
+     "messages": [
+       {"role": "user", "content": "What are trending movies?"},
+       {"role": "assistant", "content": "Here are the top 10 trending movies!
+        1. **Mortal Kombat II** (2026) [tmdb:931285] — ..."},
+       {"role": "user", "content": "could you request the mortal kombat one?"},
+       {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
+       {"role": "user", "content": "the 2026 one"}
+     ]
+   }
+
+2. chat_completions():
+   → req.messages contains the ENTIRE conversation history
+   → System prompt prepended → full_messages = [system] + 5 history messages
+   → LLM sees everything: the trending list with [tmdb:931285], the disambiguation, "the 2026 one"
+
+3. LLM reasons:
+   - I previously listed Mortal Kombat II (2026) with [tmdb:931285]
+   - The user said "request the mortal kombat one" → I searched and showed 4 options
+   - Now they say "the 2026 one" → that matches Mortal Kombat II (2026) [tmdb:931285]
+   - I should call seerr_request_media(kind="movie", title="Mortal Kombat II", tmdb_id=931285)
+
+4. Tool executes the request → ✅ Success
+```
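
The multi-turn case works because the server keeps no conversation state: each request rebuilds the message list from what the client sends. A minimal sketch of that assembly step follows. `build_full_messages` is a hypothetical helper name and the history is shortened. The regex is not part of the pipeline; it only shows that the `[tmdb:ID]` marker is still present in the text the LLM receives.

```python
import re


def build_full_messages(system_prompt, history):
    """Prepend the system prompt to the client-supplied history without mutating it."""
    return [{"role": "system", "content": system_prompt}, *history]


history = [
    {"role": "user", "content": "What are trending movies?"},
    {"role": "assistant", "content": "1. **Mortal Kombat II** (2026) [tmdb:931285]"},
    {"role": "user", "content": "the 2026 one"},
]
full_messages = build_full_messages("You are a media assistant...", history)

# The ID from the earlier assistant turn travels with the history.
ids = re.findall(r"\[tmdb:(\d+)\]", full_messages[2]["content"])
```

Because the ID is embedded in the earlier assistant message, the LLM can resolve "the 2026 one" from the resent history alone.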