Agents/api/ARCHITECTURE.md

# API Architecture — Agent + Skill + Tool Pipeline

This document explains how the API routes user messages through the agent/skill/tool pipeline to produce responses.

---

## Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                      OpenWebUI / Client                         │
│  POST /v1/chat/completions  { model, messages, stream }         │
└──────────────────────────────┬──────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────┐
│  api/v1/chat.py  —  chat_completions()                           │
│                                                                  │
│  1. _resolve_agent(req.model)  →  Agent                          │
│  2. agent.build_system_prompt()  →  system prompt                │
│  3. Build full_messages = [system] + req.messages                │
│  4. run_agent_with_tools(client, messages, agent_id)             │
└──────────────────────────────┬───────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────┐
│  Tool-Calling Loop  (run_agent_with_tools / run_agent_stream)    │
│                                                                  │
│  while turns < max_turns:                                        │
│    response = LLM.chat(messages, tools=agent_tools)              │
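│    if response has tool_calls:                                   │
│      append assistant message (with tool_calls) to messages      │
│      for each tool_call:                                         │
│        result = execute_tool(skills, name, args)                 │
│        append {role: "tool", tool_call_id, content: result}      │
│    else:                                                         │
│      return response.text  (stream tokens if streaming)          │
│    turns += 1                                                    │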
└──────────────────────────────────────────────────────────────────┘
```
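The loop in the diagram can be sketched in runnable Python. This is a minimal illustration, not the code in `api/v1/chat.py`: the `chat` and `execute` callables, the plain-dict message format, and the fallback string returned when `max_turns` runs out are all assumptions. A scripted fake LLM stands in for the real client so the two-turn flow runs offline.

```python
import asyncio
import json
from typing import Awaitable, Callable

# Hypothetical signatures: the real loop receives an LLM client and an agent id instead.
ChatFn = Callable[[list, list], Awaitable[dict]]
ExecFn = Callable[[str, dict], Awaitable[str]]


async def run_agent_with_tools(
    chat: ChatFn,
    messages: list,
    tools: list,
    execute: ExecFn,
    max_turns: int = 5,
) -> str:
    """Call the LLM until it answers in plain text or max_turns is used up."""
    messages = list(messages)  # work on a copy; the caller's history is untouched
    for _ in range(max_turns):
        reply = await chat(messages, tools)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content") or ""
        # The assistant message that carries tool_calls must precede the tool results.
        messages.append({"role": "assistant", "content": reply.get("content"), "tool_calls": tool_calls})
        for call in tool_calls:
            fn = call["function"]
            args = json.loads(fn["arguments"] or "{}")  # arguments arrive as a JSON string
            result = await execute(fn["name"], args)
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    return "Stopped: max_turns reached without a final answer."  # assumed fallback


async def demo() -> str:
    # Scripted replies standing in for the LLM: one tool call, then a text answer.
    script = [
        {"content": None, "tool_calls": [{
            "id": "call_1", "type": "function",
            "function": {"name": "seerr_trending", "arguments": '{"kind": "movie"}'},
        }]},
        {"content": "Here are the top trending movies!", "tool_calls": None},
    ]

    async def fake_chat(messages: list, tools: list) -> dict:
        return script.pop(0)

    async def fake_execute(name: str, args: dict) -> str:
        return f"{name}(kind={args['kind']}): Found 20 trending movies..."

    history = [{"role": "user", "content": "What are trending movies?"}]
    return await run_agent_with_tools(fake_chat, history, [], fake_execute)


print(asyncio.run(demo()))  # Here are the top trending movies!
```

Copying `messages` up front means the request's original history is never mutated, so a retry can start from the same input.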

---

## Key Concepts

### 1. Agent

An **Agent** is a persona + skill bundle. Defined in `agents/`.

```python
# agents/media_agent.py
from agents import Agent

Agent(
    agent_id="media-agent",
    description="Media assistant with Seerr integration",
    skills=["media_info", "seerr", "triage"],
    base_prompt="You are a media assistant...",
)
```

- `agent_id` — unique name, exposed as a model in OpenWebUI
- `skills` — list of skill names to load
- `base_prompt` — starting system prompt, combined with skill fragments
- `build_system_prompt()` — merges base_prompt + all skill prompt fragments

Agents self-register at import time via `agents/__init__.py`'s `register()`.
`main.py` calls `load_all_agents()` at startup to import all agent/skill modules.
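A minimal sketch of the `Agent` dataclass, a registry, and `build_system_prompt()`, assuming a module-level dict keyed by `agent_id`. `SKILL_PROMPTS`, `_AGENTS`, the blank-line separator between fragments, and rejecting duplicate ids are hypothetical; in the real project the fragments come from the skill registry in `skills/__init__.py`, and `load_all_agents()` only has to import the agent modules so that their `register()` calls run.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical stand-in for the skill registry: skill name -> prompt fragment.
SKILL_PROMPTS: Dict[str, str] = {
    "media_info": "## Media Info\n...",
    "seerr": "## Seerr Media Tools\n...",
    "triage": "## Triage\n...",
}


@dataclass
class Agent:
    agent_id: str
    description: str
    skills: List[str] = field(default_factory=list)
    base_prompt: str = ""

    def build_system_prompt(self) -> str:
        """Join the base prompt and each skill's fragment, in the order skills are listed."""
        parts = [self.base_prompt] + [SKILL_PROMPTS[s] for s in self.skills if s in SKILL_PROMPTS]
        return "\n\n".join(p for p in parts if p)


_AGENTS: Dict[str, Agent] = {}  # hypothetical registry keyed by agent_id


def register(agent: Agent) -> Agent:
    """Add an agent to the registry; an agent module calls this when it is imported."""
    if agent.agent_id in _AGENTS:
        raise ValueError(f"duplicate agent_id: {agent.agent_id}")
    _AGENTS[agent.agent_id] = agent
    return agent


def get_agent(agent_id: str) -> Optional[Agent]:
    return _AGENTS.get(agent_id)


register(Agent(agent_id="naked", description="Barebone LLM, no skills"))
media = register(Agent(
    agent_id="media-agent",
    description="Media assistant with Seerr integration",
    skills=["media_info", "seerr", "triage"],
    base_prompt="You are a media assistant...",
))
print(media.build_system_prompt().count("## "))  # 3
```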

### 2. Skill

A **Skill** is a capability bundle. Defined in `skills/`.

```python
# skills/seerr.py
from skills import Skill

Skill(
    name="seerr",
    description="Seerr integration — trending, discover, request media, submit issues",
    prompt_fragment="## Seerr Media Tools\n...",
    tools=[...],          # OpenAI function-calling schema
    execute=_execute,     # async handler: tool_name + args → ToolResult
)
```

- `prompt_fragment` — injected into the agent's system prompt. Teaches the LLM what tools are available and when to use them.
- `tools` — list of OpenAI function definitions (name, description, parameters).
- `execute` — async callable that routes tool calls to API handlers.
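One way a tool-enabled skill and a prompt-only skill could be declared, with `_execute` routing tool names to handlers, is sketched here. The `ToolResult` fields (`ok`, `text`), the `_HANDLERS` table, and the placeholder `_trending` body (which returns canned text instead of calling Seerr) are assumptions for illustration.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, Optional


@dataclass
class ToolResult:
    ok: bool    # hypothetical field: did the call succeed?
    text: str   # hypothetical field: text handed back to the LLM


Handler = Callable[[dict], Awaitable[ToolResult]]
Executor = Callable[[str, dict], Awaitable[ToolResult]]


@dataclass
class Skill:
    name: str
    description: str
    prompt_fragment: str
    tools: list = field(default_factory=list)  # OpenAI function-calling schemas
    execute: Optional[Executor] = None          # None for prompt-only skills


async def _trending(args: dict) -> ToolResult:
    # Placeholder body: the real handler calls Seerr's trending endpoint.
    return ToolResult(ok=True, text=f"Found trending {args['kind']} titles...")


_HANDLERS: Dict[str, Handler] = {"seerr_trending": _trending}


async def _execute(tool_name: str, args: dict) -> ToolResult:
    """Route a tool call to its handler; unknown names become an error the LLM can read."""
    handler = _HANDLERS.get(tool_name)
    if handler is None:
        return ToolResult(ok=False, text=f"Unknown tool: {tool_name}")
    return await handler(args)


seerr = Skill(
    name="seerr",
    description="Seerr integration",
    prompt_fragment="## Seerr Media Tools\n...",
    tools=[{"type": "function", "function": {"name": "seerr_trending"}}],  # schema abbreviated
    execute=_execute,
)
triage = Skill(name="triage", description="Fallback for unsupported actions",
               prompt_fragment="## Triage\n...")  # prompt-only: no tools, no execute

print(asyncio.run(seerr.execute("seerr_trending", {"kind": "movie"})).text)  # Found trending movie titles...
```

Returning an error result for an unknown tool name, rather than raising, lets the LLM see the failure and recover on its next turn.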

### 3. Tool

A **Tool** is a single function the LLM can call. Defined as part of a skill's `tools` list.

```python
{
    "type": "function",
    "function": {
        "name": "seerr_trending",
        "description": "Get trending movies and TV shows from Seerr...",
        "parameters": {
            "type": "object",
            "properties": {
                "kind": {"type": "string", "enum": ["movie", "tv", "all"]},
                "language": {"type": "string"},
            },
            "required": ["kind"],
        },
    },
}
```

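When the LLM responds with tool calls, the loop:
1. Appends the assistant message, including its `tool_calls`, to the message history
2. For each call, extracts `function.name` (e.g. `"seerr_trending"`) and parses `function.arguments`, which the API delivers as a JSON-encoded string (e.g. `'{"kind": "movie"}'`), into a dict
3. Calls `execute_tool(agent.skills, name, args)`, which finds the skill that owns the tool and runs its `execute` handler
4. Appends the result text as a `tool` message whose `tool_call_id` matches the call
5. Sends the updated history back to the LLM for a follow-up response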
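The steps above can be sketched as a small dispatcher. `SKILLS`, the trimmed `Skill` dataclass, `handle_tool_call`, and the error text for a tool that no loaded skill claims are hypothetical, and handlers return plain strings instead of `ToolResult` to keep the example short. `get_all_tools` is included because it is the same lookup in the other direction: it collects schemas, so prompt-only skills contribute none.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, List, Optional


@dataclass
class Skill:  # trimmed to the fields this sketch needs
    name: str
    tools: list = field(default_factory=list)
    execute: Optional[Callable[[str, dict], Awaitable[str]]] = None


SKILLS: Dict[str, Skill] = {}  # hypothetical registry keyed by skill name


def get_all_tools(skill_names: List[str]) -> list:
    """Flatten the tool schemas of the named skills; unknown or prompt-only skills add none."""
    return [t for n in skill_names if n in SKILLS for t in SKILLS[n].tools]


async def execute_tool(skill_names: List[str], name: str, args: dict) -> str:
    """Find the skill that declares `name` and delegate to its execute handler."""
    for n in skill_names:
        skill = SKILLS.get(n)
        if skill and skill.execute and any(t["function"]["name"] == name for t in skill.tools):
            return await skill.execute(name, args)
    return f"Error: no loaded skill provides tool '{name}'"


async def handle_tool_call(skill_names: List[str], call: dict) -> dict:
    """Turn one OpenAI tool_call into the `tool` message appended to the history."""
    args = json.loads(call["function"]["arguments"] or "{}")  # arguments is a JSON string
    result = await execute_tool(skill_names, call["function"]["name"], args)
    return {"role": "tool", "tool_call_id": call["id"], "content": result}


async def _seerr_execute(name: str, args: dict) -> str:
    return f"{name} called with kind={args['kind']}"


SKILLS["seerr"] = Skill("seerr", [{"type": "function", "function": {"name": "seerr_trending"}}], _seerr_execute)
SKILLS["triage"] = Skill("triage")  # prompt-only

call = {"id": "call_1", "type": "function",
        "function": {"name": "seerr_trending", "arguments": '{"kind": "movie"}'}}
msg = asyncio.run(handle_tool_call(["media_info", "seerr", "triage"], call))
print(msg["content"])  # seerr_trending called with kind=movie
```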

---

## Full Request Flow

### Step-by-step: "What are trending movies?"

```
1. OpenWebUI sends:
   POST /v1/chat/completions
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"}
     ],
     "stream": false
   }

2. chat_completions():
   → _resolve_agent(model="media-agent")
     → get_agent("media-agent") → Agent(skills=["media_info", "seerr", "triage"])
   → tools = get_all_tools(["media_info", "seerr", "triage"])
     → Returns 7 tool definitions, all from seerr.py (media_info and triage are prompt-only)
   → system_prompt = agent.build_system_prompt()
     → base_prompt + media_info fragment + seerr fragment + triage fragment

3. run_agent_with_tools() — Turn 1:
   → LLM receives: [system prompt] + [user: "What are trending movies?"], plus the 7 tool schemas via tools=
   → LLM responds: tool_calls = [{"function": {"name": "seerr_trending", "arguments": "{\"kind\": \"movie\"}"}}]

4. Execute tool:
   → execute_tool(["media_info", "seerr", "triage"], "seerr_trending", {"kind": "movie"})
   → Finds seerr skill → calls _execute("seerr_trending", ...) → _trending(args)
   → GET /api/v1/discover/trending?mediaType=movie
   → Returns formatted list with [tmdb:IDs]

5. run_agent_with_tools() — Turn 2:
   → LLM receives: previous messages + [tool: "Found 20 trending movies..."]
   → LLM responds: text = "Here are the top trending movies! 🎬 ..."
   → finish_reason="stop" → return the text

6. chat_completions() returns:
   { "choices": [{"message": {"content": "Here are the top trending movies!..."}}] }
```
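Written out in OpenAI chat format, the messages exchanged during this request look roughly like the list below. The `call_1` id and the elided contents are placeholders. The helper checks that every `tool` message answers a `tool_call` the assistant issued earlier, an ordering the Chat Completions API requires.

```python
import json

history = [
    {"role": "system", "content": "You are a media assistant...\n\n## Seerr Media Tools\n..."},
    {"role": "user", "content": "What are trending movies?"},
    # Turn 1: the model asks for a tool instead of answering.
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "seerr_trending", "arguments": json.dumps({"kind": "movie"})}},
    ]},
    # Tool result, linked to the call by tool_call_id.
    {"role": "tool", "tool_call_id": "call_1", "content": "Found 20 trending movies..."},
    # Turn 2: plain-text answer (finish_reason="stop").
    {"role": "assistant", "content": "Here are the top trending movies! ..."},
]


def tool_messages_are_paired(messages: list) -> bool:
    """Check that every tool message answers a tool_call issued earlier by the assistant."""
    issued = set()
    for m in messages:
        if m["role"] == "assistant":
            issued.update(c["id"] for c in m.get("tool_calls") or [])
        elif m["role"] == "tool" and m["tool_call_id"] not in issued:
            return False
    return True


print(tool_messages_are_paired(history))  # True
```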

### Step-by-step: "Request the 2026 one" (multi-turn context)

```
1. OpenWebUI sends the FULL history:
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"},
       {"role": "assistant", "content": "Here are the top 10 trending movies!\n1. **Mortal Kombat II** (2026) [tmdb:931285] — ..."},
       {"role": "user", "content": "could request the mortal kombat one?"},
       {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
       {"role": "user", "content": "the 2026 one"}
     ]
   }

2. chat_completions():
   → req.messages contains the ENTIRE conversation history
   → System prompt prepended → full_messages = [system] + 5 history messages
   → LLM sees everything: the trending list with [tmdb:931285], the disambiguation, "the 2026 one"

3. LLM reasons:
   - I previously listed Mortal Kombat II (2026) with [tmdb:931285]
   - The user said "request the mortal kombat one" → I searched and showed 4 options
   - Now they say "the 2026 one" → that matches Mortal Kombat II (2026) [tmdb:931285]
   - I should call seerr_request_media(kind="movie", title="Mortal Kombat II", tmdb_id=931285)

4. Tool executes the request → ✅ Success
```
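Resolving "the 2026 one" needs no extra machinery: step 2 only prepends the system prompt to the history the client sent. A sketch of that assembly, using a hypothetical helper name `build_full_messages`, shows that the TMDB ID the model needs is already present in its own earlier reply.

```python
def build_full_messages(system_prompt: str, history: list) -> list:
    """Prepend the agent's system prompt; the client's history passes through unchanged."""
    return [{"role": "system", "content": system_prompt}, *history]


history = [
    {"role": "user", "content": "What are trending movies?"},
    {"role": "assistant", "content": "Here are the top 10 trending movies!\n"
                                     "1. **Mortal Kombat II** (2026) [tmdb:931285] ..."},
    {"role": "user", "content": "could request the mortal kombat one?"},
    {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
    {"role": "user", "content": "the 2026 one"},
]

full = build_full_messages("You are a media assistant...", history)
# The ID for seerr_request_media comes from the model's own earlier turn, not from server state.
print(len(full), any("[tmdb:931285]" in m["content"] for m in full))  # 6 True
```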

---

## File Map

```
main.py                          # FastAPI app entry point, creates singletons
├── core/
│   ├── config.py                # .env loader, config constants
│   └── llm.py                   # create_client() factory for OpenAI client
├── api/
│   ├── dependencies.py          # FastAPI Depends: get_llm_client()
│   └── v1/
│       └── chat.py              # APIRouter, endpoints, tool-calling loop
├── agents/
│   ├── __init__.py              # Agent dataclass, registry, load_all_agents()
│   ├── naked.py                 # Agent: barebone LLM, no skills
│   └── media_agent.py           # Agent: media assistant with Seerr skills
└── skills/
    ├── __init__.py              # Skill dataclass, ToolResult, registry, execution
    ├── media_info.py            # Skill: base media assistant persona (prompt-only)
    ├── seerr.py                 # Skill: Seerr API tools (7 tools, real API calls)
    └── triage.py                # Skill: fallback for unsupported actions (prompt-only)
```

---

## Key Design Decisions

1. **Full multi-turn history**: `req.messages` passes through unchanged. The LLM has access to its own previous responses (including `[tmdb:IDs]`). No external state management needed.

2. **No deterministic pre-processing**: No affirmation detectors, reference resolvers, or hardcoded rules. The LLM interprets user intent naturally from full conversation context.

3. **Agent selection via `model` field**: OpenWebUI sends `model` in the request. `_resolve_agent()` maps it to a registered agent. The `/v1/models` endpoint lists all agents as selectable models.

4. **Skills = prompts + tools**: Skills inject prompt fragments AND optionally expose OpenAI function-calling tools. Prompt-only skills (like `triage`) just shape behavior. Tool-enabled skills (like `seerr`) let the LLM take real actions.

5. **Singleton LLM client**: Created once in `main.py`, stored on `app.state.llm_client`, accessed via FastAPI `Depends(get_llm_client)`.
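Decision 3 can be sketched as two small functions. The response follows the OpenAI list-models shape (`"object": "list"`, with entries carrying `id`, `object: "model"`, `created`, `owned_by`). The `owned_by` value, the `AGENT_IDS` list, and raising `LookupError` for an unknown model are assumptions, since this document does not say what `_resolve_agent()` does with an unregistered name.

```python
import time
from typing import List, Optional

# Hypothetical stand-in for the agent registry in agents/__init__.py.
AGENT_IDS = ["naked", "media-agent"]


def list_models(agent_ids: List[str], created: Optional[int] = None) -> dict:
    """Return the registry in the OpenAI list-models shape that OpenWebUI reads."""
    ts = int(time.time()) if created is None else created
    return {
        "object": "list",
        "data": [
            {"id": agent_id, "object": "model", "created": ts, "owned_by": "agents-api"}
            for agent_id in agent_ids
        ],
    }


def resolve_agent(model: str, agent_ids: List[str]) -> str:
    """Map the request's `model` field to a registered agent id (rejecting unknowns is assumed)."""
    if model not in agent_ids:
        raise LookupError(f"unknown model/agent: {model!r}")
    return model


print([m["id"] for m in list_models(AGENT_IDS)["data"]])  # ['naked', 'media-agent']
```

Because each agent id doubles as a model id, OpenWebUI's model picker becomes the agent picker with no client-side changes.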