fixed API calls with Seerr, added full context for models, began standardizing on a single ID as the source of truth for future tools

2026-05-14 14:25:48 +02:00
parent d943d4bd31
commit 2adf17493a
5 changed files with 692 additions and 161 deletions
@@ -0,0 +1,221 @@
+# API Architecture — Agent + Skill + Tool Pipeline
+
+This document explains how the API routes user messages through the agent/skill/tool pipeline to produce responses.
+
+---
+
+## Overview
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                      OpenWebUI / Client                         │
+│  POST /v1/chat/completions  { model, messages, stream }         │
+└──────────────────────────────┬──────────────────────────────────┘
+                               │
+                               ▼
+┌──────────────────────────────────────────────────────────────────┐
+│  api/v1/chat.py  —  chat_completions()                           │
+│                                                                  │
+│  1. _resolve_agent(req.model)  →  Agent                          │
+│  2. agent.build_system_prompt()  →  system prompt                │
+│  3. Build full_messages = [system] + req.messages                │
+│  4. run_agent_with_tools(client, full_messages, agent_id)        │
+└──────────────────────────────┬───────────────────────────────────┘
+                               │
+                               ▼
+┌──────────────────────────────────────────────────────────────────┐
+│  Tool-Calling Loop  (run_agent_with_tools / run_agent_stream)    │
+│                                                                  │
+│  while turns < max_turns:                                        │
+│    response = LLM.chat(messages, tools=agent_tools)              │
+│    if response has tool_calls:                                   │
+│      for each tool_call:                                         │
+│        result = execute_tool(skills, name, args)                 │
+│        append result to messages                                 │
+│    else:                                                         │
+│      return response.text  (stream tokens if streaming)          │
+└──────────────────────────────────────────────────────────────────┘
+```
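+
+The loop in the diagram can be sketched in a few dozen lines. This is a minimal offline sketch, not the code in `api/v1/chat.py`. `ToolCall`, `LLMReply`, the `llm`/`execute` callables, and the fallback message are hypothetical stand-ins for the OpenAI client. The sketch does follow the chat-completions message rules: the assistant message carrying `tool_calls` is appended before the `tool` results, and each result points back to its call through `tool_call_id`.
+
+```python
+import asyncio
+import json
+from dataclasses import dataclass, field
+from typing import Awaitable, Callable, List, Optional
+
+# Hypothetical stand-ins for the OpenAI client objects, so the sketch runs offline.
+@dataclass
+class ToolCall:
+    id: str
+    name: str
+    arguments: str  # JSON-encoded string, as in the chat-completions API
+
+@dataclass
+class LLMReply:
+    text: Optional[str] = None
+    tool_calls: List[ToolCall] = field(default_factory=list)
+
+LLM = Callable[[List[dict], List[dict]], Awaitable[LLMReply]]
+Executor = Callable[[str, dict], Awaitable[str]]
+
+async def run_agent_with_tools(llm: LLM, messages: List[dict], tools: List[dict],
+                               execute: Executor, max_turns: int = 5) -> str:
+    """Call the LLM until it answers in text instead of requesting tools."""
+    history = list(messages)  # copy, so the caller's list is not mutated
+    for _ in range(max_turns):
+        reply = await llm(history, tools)
+        if not reply.tool_calls:
+            return reply.text or ""
+        # The assistant turn that requested the tools must precede their results.
+        history.append({
+            "role": "assistant",
+            "content": reply.text,
+            "tool_calls": [
+                {"id": c.id, "type": "function",
+                 "function": {"name": c.name, "arguments": c.arguments}}
+                for c in reply.tool_calls
+            ],
+        })
+        for call in reply.tool_calls:
+            result = await execute(call.name, json.loads(call.arguments or "{}"))
+            history.append({"role": "tool", "tool_call_id": call.id, "content": result})
+    return "Stopped after max_turns without a final answer."  # hypothetical fallback
+
+# Usage: a scripted LLM that asks for one tool, then answers from its result.
+async def scripted_llm(history, tools):
+    if history[-1]["role"] == "user":
+        return LLMReply(tool_calls=[ToolCall("call_1", "seerr_trending", '{"kind": "movie"}')])
+    return LLMReply(text="Answer based on: " + history[-1]["content"])
+
+async def fake_execute(name, args):
+    return f"{name} ran with kind={args['kind']}"
+
+answer = asyncio.run(run_agent_with_tools(
+    scripted_llm, [{"role": "user", "content": "What are trending movies?"}], [], fake_execute))
+print(answer)  # Answer based on: seerr_trending ran with kind=movie
+```
+
+Copying `messages` into `history` leaves the request's list untouched. The `max_turns` cap stops a model that keeps requesting tools from looping indefinitely.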
+
+---
+
+## Key Concepts
+
+### 1. Agent
+
+An **Agent** is a persona + skill bundle. Defined in `agents/`.
+
+```python
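+# agents/media_agent.py
+from agents import Agent, register  # both live in agents/__init__.py
+
+register(Agent(  # registers the agent when the module is imported
+    agent_id="media-agent",
+    description="Media assistant with Seerr integration",
+    skills=["media_info", "seerr", "triage"],
+    base_prompt="You are a media assistant...",
+))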
+```
+
+- `agent_id` — unique name, exposed as a model in OpenWebUI
+- `skills` — list of skill names to load
+- `base_prompt` — starting system prompt, combined with skill fragments
+- `build_system_prompt()` — merges base_prompt + all skill prompt fragments
+
+Agents self-register at import time via `agents/__init__.py`'s `register()`.
+`main.py` calls `load_all_agents()` at startup to import all agent/skill modules.
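+
+Here is a minimal sketch of the agent side. It assumes `Agent` is a plain dataclass and the registry is a module-level dict. `SKILL_FRAGMENTS`, `_REGISTRY`, the duplicate-id check, and the fragment texts are hypothetical. Only the names `Agent`, `register()`, `get_agent()` and `build_system_prompt()` come from this document.
+
+```python
+from dataclasses import dataclass, field
+from typing import Dict, List
+
+# Hypothetical fragments keyed by skill name; in the project these come from
+# each Skill's prompt_fragment.
+SKILL_FRAGMENTS = {
+    "media_info": "## Media Info\nAnswer questions about films and shows.",
+    "seerr": "## Seerr Media Tools\nUse the seerr_* tools for real actions.",
+    "triage": "## Triage\nSay plainly when an action is not supported.",
+}
+
+@dataclass
+class Agent:
+    agent_id: str
+    description: str
+    skills: List[str] = field(default_factory=list)
+    base_prompt: str = ""
+
+    def build_system_prompt(self) -> str:
+        # base_prompt first, then one fragment per skill, in the listed order.
+        parts = [self.base_prompt] + [SKILL_FRAGMENTS[name] for name in self.skills]
+        return "\n\n".join(p for p in parts if p)
+
+_REGISTRY: Dict[str, Agent] = {}
+
+def register(agent: Agent) -> Agent:
+    # Assumption: a duplicate id is an error rather than a silent overwrite.
+    if agent.agent_id in _REGISTRY:
+        raise ValueError(f"duplicate agent_id: {agent.agent_id}")
+    _REGISTRY[agent.agent_id] = agent
+    return agent
+
+def get_agent(agent_id: str) -> Agent:
+    return _REGISTRY[agent_id]
+
+# Roughly what importing agents/media_agent.py does:
+register(Agent(
+    agent_id="media-agent",
+    description="Media assistant with Seerr integration",
+    skills=["media_info", "seerr", "triage"],
+    base_prompt="You are a media assistant...",
+))
+print(get_agent("media-agent").build_system_prompt().count("## "))  # 3
+```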
+
+### 2. Skill
+
+A **Skill** is a capability bundle. Defined in `skills/`.
+
+```python
+# skills/seerr.py
+Skill(
+    name="seerr",
+    description="Seerr integration — trending, discover, request media, submit issues",
+    prompt_fragment="## Seerr Media Tools\n...",
+    tools=[...],          # OpenAI function-calling schema
+    execute=_execute,     # async handler: tool_name + args → ToolResult
+)
+```
+
+- `prompt_fragment` — injected into the agent's system prompt. Teaches the LLM what tools are available and when to use them.
+- `tools` — list of OpenAI function definitions (name, description, parameters).
+- `execute` — async callable that routes tool calls to API handlers.
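+
+The routing that `execute` plugs into can be sketched as follows. `ToolResult`'s fields, the `SKILLS` dict, `tool_names()` and the unknown-tool fallback are assumptions made for illustration. The sketch shows one idea: a tool call goes to whichever loaded skill declares a tool with that name, and prompt-only skills (those with no `execute`) are skipped.
+
+```python
+import asyncio
+from dataclasses import dataclass, field
+from typing import Awaitable, Callable, Dict, List, Optional, Set
+
+@dataclass
+class ToolResult:
+    text: str          # appended to the history as the tool message
+    ok: bool = True    # hypothetical success flag
+
+Handler = Callable[[str, dict], Awaitable[ToolResult]]
+
+@dataclass
+class Skill:
+    name: str
+    description: str
+    prompt_fragment: str = ""
+    tools: List[dict] = field(default_factory=list)
+    execute: Optional[Handler] = None  # None for prompt-only skills such as triage
+
+SKILLS: Dict[str, Skill] = {}
+
+def tool_names(skill: Skill) -> Set[str]:
+    return {t["function"]["name"] for t in skill.tools}
+
+async def execute_tool(skill_names: List[str], name: str, args: dict) -> ToolResult:
+    """Route a tool call to the first listed skill that declares a tool called `name`."""
+    for skill_name in skill_names:
+        skill = SKILLS[skill_name]
+        if skill.execute is not None and name in tool_names(skill):
+            return await skill.execute(name, args)
+    return ToolResult(text=f"Unknown tool: {name}", ok=False)
+
+async def _seerr_execute(name: str, args: dict) -> ToolResult:
+    # Stand-in: the real handler dispatches on `name` and calls the Seerr API.
+    return ToolResult(text=f"{name}({args})")
+
+SKILLS["triage"] = Skill("triage", "Fallback for unsupported actions", "## Triage\n...")
+SKILLS["seerr"] = Skill(
+    "seerr", "Seerr integration",
+    tools=[{"type": "function", "function": {"name": "seerr_trending", "parameters": {}}}],
+    execute=_seerr_execute,
+)
+
+result = asyncio.run(execute_tool(["triage", "seerr"], "seerr_trending", {"kind": "movie"}))
+print(result.text)  # seerr_trending({'kind': 'movie'})
+```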
+
+### 3. Tool
+
+A **Tool** is a single function the LLM can call. Defined as part of a skill's `tools` list.
+
+```python
+{
+    "type": "function",
+    "function": {
+        "name": "seerr_trending",
+        "description": "Get trending movies and TV shows from Seerr...",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "kind": {"type": "string", "enum": ["movie", "tv", "all"]},
+                "language": {"type": "string"},
+            },
+            "required": ["kind"],
+        },
+    },
+}
+```
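+
+The schema's `required` and `enum` fields tell the LLM what to send, but models sometimes send something else. One option is to check the decoded arguments before calling the API and return any problems to the LLM as the tool result so it can retry. The sketch below does this with a hypothetical `check_arguments()` helper; the project may not validate arguments at all. It covers only `required`, unknown keys and `enum`. A library such as `jsonschema` would cover the full specification.
+
+```python
+from typing import List
+
+SEERR_TRENDING = {
+    "type": "function",
+    "function": {
+        "name": "seerr_trending",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "kind": {"type": "string", "enum": ["movie", "tv", "all"]},
+                "language": {"type": "string"},
+            },
+            "required": ["kind"],
+        },
+    },
+}
+
+def check_arguments(tool: dict, args: dict) -> List[str]:
+    """Return human-readable problems; an empty list means the call looks valid."""
+    params = tool["function"].get("parameters", {})
+    props = params.get("properties", {})
+    problems = [f"missing required argument: {key}"
+                for key in params.get("required", []) if key not in args]
+    for key, value in args.items():
+        spec = props.get(key)
+        if spec is None:
+            problems.append(f"unexpected argument: {key}")
+        elif "enum" in spec and value not in spec["enum"]:
+            problems.append(f"{key} must be one of {spec['enum']}, got {value!r}")
+    return problems
+
+print(check_arguments(SEERR_TRENDING, {"kind": "movie"}))  # []
+print(check_arguments(SEERR_TRENDING, {}))  # ['missing required argument: kind']
+```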
+
+When the LLM responds with a tool call, the loop:
+1. Extracts `function.name` (e.g. `"seerr_trending"`) and `function.arguments`, a JSON-encoded string (e.g. `'{"kind": "movie"}'`) that is decoded into a dict
+2. Calls `execute_tool(agent.skills, name, args)`, which finds the skill that owns the tool and runs its `execute` handler
+3. Appends the result text to the message history as a `tool` message
+4. Sends the updated history back to the LLM for a follow-up response
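+
+Steps 1 and 3 can be sketched against the raw JSON shape of a tool call. In the OpenAI chat-completions format, `function.arguments` is a JSON-encoded string, so it must be decoded and may be malformed. `parse_tool_call()` and `tool_message()` are hypothetical helper names. Raising a `ValueError` on a decode failure, so the loop can report the error back to the LLM, is one design choice among several.
+
+```python
+import json
+from typing import Tuple
+
+def parse_tool_call(tool_call: dict) -> Tuple[str, dict]:
+    """Step 1: extract the tool name and decoded arguments from one tool call."""
+    fn = tool_call["function"]
+    raw = fn.get("arguments") or "{}"  # treat an empty string as "no arguments"
+    try:
+        args = json.loads(raw)
+    except json.JSONDecodeError as exc:
+        raise ValueError(f"bad arguments for {fn['name']}: {exc.msg}") from exc
+    if not isinstance(args, dict):
+        raise ValueError(f"arguments for {fn['name']} must be a JSON object")
+    return fn["name"], args
+
+def tool_message(tool_call: dict, result_text: str) -> dict:
+    """Step 3: wrap a result so the LLM can match it to the call that produced it."""
+    return {"role": "tool", "tool_call_id": tool_call["id"], "content": result_text}
+
+# One entry of choices[0].message.tool_calls, as plain JSON:
+call = {"id": "call_1", "type": "function",
+        "function": {"name": "seerr_trending", "arguments": "{\"kind\": \"movie\"}"}}
+name, args = parse_tool_call(call)
+print(name, args)  # seerr_trending {'kind': 'movie'}
+print(tool_message(call, "Found 20 trending movies...")["tool_call_id"])  # call_1
+```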
+
+---
+
+## Full Request Flow
+
+### Step-by-step: "What are trending movies?"
+
+```
+1. OpenWebUI sends:
+   POST /v1/chat/completions
+   {
+     "model": "media-agent",
+     "messages": [
+       {"role": "user", "content": "What are trending movies?"}
+     ],
+     "stream": false
+   }
+
+2. chat_completions():
+   → _resolve_agent(model="media-agent")
+     → get_agent("media-agent") → Agent(skills=["media_info", "seerr", "triage"])
+   → tools = get_all_tools(["media_info", "seerr", "triage"])
+     → Returns the 7 tool definitions from seerr.py (media_info and triage are prompt-only)
+   → system_prompt = agent.build_system_prompt()
+     → base_prompt + media_info fragment + seerr fragment + triage fragment
+
+3. run_agent_with_tools() — Turn 1:
+   → LLM receives: [system prompt] + [user: "What are trending movies?"], with the 7 tool schemas passed separately via tools=
+   → LLM responds: tool_calls = [{"function": {"name": "seerr_trending", "arguments": {"kind": "movie"}}}]
+
+4. Execute tool:
+   → execute_tool(["media_info", "seerr", "triage"], "seerr_trending", {"kind": "movie"})
+   → Finds seerr skill → calls _execute("seerr_trending", ...) → _trending(args)
+   → GET /api/v1/discover/trending?mediaType=movie
+   → Returns formatted list with [tmdb:IDs]
+
+5. run_agent_with_tools() — Turn 2:
+   → LLM receives: previous messages + [tool: "Found 20 trending movies..."]
+   → LLM responds: text = "Here are the top trending movies! 🎬 ..."
+   → finish_reason="stop" → return the text
+
+6. chat_completions() returns:
+   { "choices": [{"message": {"content": "Here are the top trending movies!..."}}] }
+```
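+
+The "formatted list with [tmdb:IDs]" in step 4 might look like the sketch below. `format_results()` is a hypothetical name. The payload shape is an assumption about Seerr's discover responses, not taken from this project: `results[].id` as the TMDB ID, `title`/`releaseDate` for movies, and `name`/`firstAirDate` for TV. Putting the TMDB ID on every line lets a later turn act on an exact item instead of searching again by title.
+
+```python
+def format_results(payload: dict, limit: int = 10) -> str:
+    """Render discover results as numbered lines, each tagged with [tmdb:ID]."""
+    lines = []
+    for n, item in enumerate(payload.get("results", [])[:limit], start=1):
+        title = item.get("title") or item.get("name") or "Untitled"
+        date = item.get("releaseDate") or item.get("firstAirDate") or ""
+        year = f" ({date[:4]})" if date else ""
+        lines.append(f"{n}. {title}{year} [tmdb:{item['id']}]")
+    return "\n".join(lines) if lines else "No results."
+
+# Made-up payload for illustration only (not real Seerr output):
+sample = {"results": [
+    {"id": 1, "mediaType": "movie", "title": "Example Movie", "releaseDate": "2026-01-01"},
+    {"id": 2, "mediaType": "tv", "name": "Example Show", "firstAirDate": "2025-09-30"},
+]}
+print(format_results(sample))
+# 1. Example Movie (2026) [tmdb:1]
+# 2. Example Show (2025) [tmdb:2]
+```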
+
+### Step-by-step: "Request the 2026 one" (multi-turn context)
+
+```
+1. OpenWebUI sends the FULL history:
+   {
+     "model": "media-agent",
+     "messages": [
+       {"role": "user", "content": "What are trending movies?"},
+       {"role": "assistant", "content": "Here are the top 10 trending movies!\n1. **Mortal Kombat II** (2026) [tmdb:931285] — ..."},
+       {"role": "user", "content": "could request the mortal kombat one?"},
+       {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
+       {"role": "user", "content": "the 2026 one"}
+     ]
+   }
+
+2. chat_completions():
+   → req.messages contains the ENTIRE conversation history
+   → System prompt prepended → full_messages = [system] + 5 history messages
+   → LLM sees everything: the trending list with [tmdb:931285], the disambiguation, "the 2026 one"
+
+3. LLM reasons:
+   - I previously listed Mortal Kombat II (2026) with [tmdb:931285]
+   - The user said "request the mortal kombat one" → I searched and showed 4 options
+   - Now they say "the 2026 one" → that matches Mortal Kombat II (2026) [tmdb:931285]
+   - I should call seerr_request_media(kind="movie", title="Mortal Kombat II", tmdb_id=931285)
+
+4. Tool executes the request → ✅ Success
+```
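+
+The server keeps no state, so the multi-turn case works only if `full_messages` carries every earlier turn. Step 2 can be sketched as a pure function. This assumes messages are plain dicts; the real `req.messages` may be Pydantic models. `build_full_messages()` is a hypothetical name. The assertions also serve as a cheap regression check that the `[tmdb:931285]` tag from turn 1 still reaches the LLM.
+
+```python
+import copy
+from typing import Dict, List
+
+def build_full_messages(system_prompt: str, history: List[Dict]) -> List[Dict]:
+    """Prepend the agent's system prompt; keep the client's history as-is."""
+    return [{"role": "system", "content": system_prompt}, *history]
+
+history = [
+    {"role": "user", "content": "What are trending movies?"},
+    {"role": "assistant", "content": "1. **Mortal Kombat II** (2026) [tmdb:931285]"},
+    {"role": "user", "content": "the 2026 one"},
+]
+snapshot = copy.deepcopy(history)
+full_messages = build_full_messages("You are a media assistant...", history)
+
+assert history == snapshot                    # the caller's list is not modified
+assert full_messages[1:] == history           # every earlier turn reaches the LLM
+assert any("[tmdb:931285]" in m["content"] for m in full_messages)  # ID from turn 1 survives
+print(len(full_messages))  # 4
+```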
+
+---
+
+## File Map
+
+```
+./
+├── main.py                      # FastAPI app entry point, creates singletons
+├── core/
+│   ├── config.py                # .env loader, config constants
+│   └── llm.py                   # create_client() factory for OpenAI client
+├── api/
+│   ├── dependencies.py          # FastAPI Depends: get_llm_client()
+│   └── v1/
+│       └── chat.py              # APIRouter, endpoints, tool-calling loop
+├── agents/
+│   ├── __init__.py              # Agent dataclass, registry, load_all_agents()
+│   ├── naked.py                 # Agent: barebone LLM, no skills
+│   └── media_agent.py           # Agent: media assistant with Seerr skills
+└── skills/
+    ├── __init__.py              # Skill dataclass, ToolResult, registry, execution
+    ├── media_info.py            # Skill: base media assistant persona (prompt-only)
+    ├── seerr.py                 # Skill: Seerr API tools (7 tools, real API calls)
+    └── triage.py                # Skill: fallback for unsupported actions (prompt-only)
+```
+
+---
+
+## Key Design Decisions
+
+1. **Full multi-turn history**: `req.messages` is passed through unchanged, so the LLM can read its own previous responses (including the `[tmdb:ID]` tags). No external state management is needed.
+
+2. **No deterministic pre-processing**: No affirmation detectors, reference resolvers, or hardcoded rules. The LLM interprets user intent naturally from full conversation context.
+
+3. **Agent selection via `model` field**: OpenWebUI sends `model` in the request. `_resolve_agent()` maps it to a registered agent. The `/v1/models` endpoint lists all agents as selectable models.
+
+4. **Skills = prompts + tools**: Skills inject prompt fragments AND optionally expose OpenAI function-calling tools. Prompt-only skills (like `triage`) just shape behavior. Tool-enabled skills (like `seerr`) let the LLM take real actions.
+
+5. **Singleton LLM client**: Created once in `main.py`, stored on `app.state.llm_client`, accessed via FastAPI `Depends(get_llm_client)`.
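+
+Decision 3 relies on the OpenAI model-list format. `GET /v1/models` returns `{"object": "list", "data": [...]}` with one `"object": "model"` entry per model, and OpenWebUI offers each entry as a selectable model. The sketch below assumes a plain dict registry. The following are all hypothetical: `resolve_agent()`, `list_models()`, the `"owned_by": "local"` value, the error on unknown models, and the `naked` agent id (guessed from `agents/naked.py`).
+
+```python
+import time
+from typing import Dict
+
+# Hypothetical registry contents: agent_id -> description.
+AGENTS: Dict[str, str] = {
+    "naked": "Barebone LLM, no skills",
+    "media-agent": "Media assistant with Seerr integration",
+}
+
+def resolve_agent(model: str) -> str:
+    """Map the request's `model` field to a registered agent id."""
+    if model not in AGENTS:
+        # Assumption: unknown models are rejected; the real _resolve_agent may fall back.
+        raise ValueError(f"unknown model {model!r}; available: {sorted(AGENTS)}")
+    return model
+
+def list_models() -> dict:
+    """Build an OpenAI-style /v1/models body with one entry per agent."""
+    created = int(time.time())
+    return {
+        "object": "list",
+        "data": [
+            {"id": agent_id, "object": "model", "created": created, "owned_by": "local"}
+            for agent_id in AGENTS
+        ],
+    }
+
+print([m["id"] for m in list_models()["data"]])  # ['naked', 'media-agent']
+print(resolve_agent("media-agent"))  # media-agent
+```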