# API Architecture — Agent + Skill + Tool Pipeline

This document explains how the API routes user messages through the agent/skill/tool pipeline to produce responses.

---

## Overview

```
┌──────────────────────────────────────────────────────────────────┐
│                        OpenWebUI / Client                        │
│  POST /v1/chat/completions  { model, messages, stream }          │
└──────────────────────────────┬───────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────┐
│  api/v1/chat.py — chat_completions()                             │
│                                                                  │
│  1. _resolve_agent(req.model) → Agent                            │
│  2. agent.build_system_prompt() → system prompt                  │
│  3. Build full_messages = [system] + req.messages                │
│  4. run_agent_with_tools(client, messages, agent_id)             │
└──────────────────────────────┬───────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────┐
│  Tool-Calling Loop (run_agent_with_tools / run_agent_stream)     │
│                                                                  │
│  while turns < max_turns:                                        │
│      response = LLM.chat(messages, tools=agent_tools)            │
│      if response has tool_calls:                                 │
│          for each tool_call:                                     │
│              result = execute_tool(skills, name, args)           │
│              append result to messages                           │
│      else:                                                       │
│          return response.text (stream tokens if streaming)       │
└──────────────────────────────────────────────────────────────────┘
```
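
The loop in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the project's `run_agent_with_tools`: `ScriptedLLM`, `FakeResponse`, and the single-function `execute_tool` are hypothetical stand-ins so the control flow runs without a model, and a real loop would also append the assistant's tool-call message before each tool result.

```python
import asyncio
import json
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Stand-in for one LLM reply: either final text or a list of tool calls."""
    text: str = ""
    tool_calls: list = field(default_factory=list)


class ScriptedLLM:
    """Hypothetical stub that replays canned replies instead of calling a model."""

    def __init__(self, replies):
        self._replies = list(replies)

    async def chat(self, messages, tools):
        return self._replies.pop(0)


async def execute_tool(name, args):
    # Hypothetical handler; the real one dispatches to the owning skill.
    return f"{name} called with {json.dumps(args, sort_keys=True)}"


async def run_loop(llm, messages, tools, max_turns=5):
    for _ in range(max_turns):
        response = await llm.chat(messages, tools)
        if not response.tool_calls:
            return response.text  # no tool calls: this is the final answer
        for call in response.tool_calls:
            result = await execute_tool(call["name"], call["arguments"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    return "Stopped: max_turns reached without a final answer."


llm = ScriptedLLM([
    FakeResponse(tool_calls=[{"name": "seerr_trending", "arguments": {"kind": "movie"}}]),
    FakeResponse(text="Here are the top trending movies!"),
])
history = [{"role": "user", "content": "What are trending movies?"}]
answer = asyncio.run(run_loop(llm, history, tools=[]))
```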

---

## Key Concepts

### 1. Agent

An **Agent** is a persona + skill bundle. Defined in `agents/`.

```python
# agents/media_agent.py
Agent(
    agent_id="media-agent",
    description="Media assistant with Seerr integration",
    skills=["media_info", "seerr", "triage"],
    base_prompt="You are a media assistant...",
)
```

- `agent_id` — unique name, exposed as a model in OpenWebUI
- `skills` — list of skill names to load
- `base_prompt` — starting system prompt, combined with skill fragments
- `build_system_prompt()` — merges base_prompt + all skill prompt fragments

Agents self-register at import time via `agents/__init__.py`'s `register()`.
`main.py` calls `load_all_agents()` at startup to import all agent/skill modules.
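
As a rough sketch of how registration and prompt assembly could fit together, assuming a plain dict registry and a dataclass: the `SKILL_FRAGMENTS` table, the fragment texts, and the `_AGENTS` dict below are illustrative assumptions, and the real `agents/__init__.py` may be organised differently.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the skill registry: skill name -> prompt fragment.
SKILL_FRAGMENTS = {
    "media_info": "## Media Persona\nAnswer questions about movies and TV.",
    "seerr": "## Seerr Media Tools\nUse seerr_* tools for trending and requests.",
    "triage": "## Triage\nPolitely decline unsupported actions.",
}

_AGENTS = {}  # agent_id -> Agent


@dataclass
class Agent:
    agent_id: str
    description: str
    base_prompt: str
    skills: list = field(default_factory=list)

    def build_system_prompt(self) -> str:
        # Base prompt first, then each skill's fragment in declaration order.
        fragments = [SKILL_FRAGMENTS[name] for name in self.skills]
        return "\n\n".join([self.base_prompt, *fragments])


def register(agent: Agent) -> Agent:
    """Add an agent to the registry; intended to be called at import time."""
    if agent.agent_id in _AGENTS:
        raise ValueError(f"duplicate agent_id: {agent.agent_id}")
    _AGENTS[agent.agent_id] = agent
    return agent


def get_agent(agent_id: str) -> Agent:
    return _AGENTS[agent_id]


media_agent = register(Agent(
    agent_id="media-agent",
    description="Media assistant with Seerr integration",
    skills=["media_info", "seerr", "triage"],
    base_prompt="You are a media assistant...",
))
prompt = get_agent("media-agent").build_system_prompt()
```

Because `register()` runs when the module is imported, importing every agent module once at startup is enough to populate the registry.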

### 2. Skill

A **Skill** is a capability bundle. Defined in `skills/`.

```python
# skills/seerr.py
Skill(
    name="seerr",
    description="Seerr integration — trending, discover, request media, submit issues",
    prompt_fragment="## Seerr Media Tools\n...",
    tools=[...],        # OpenAI function-calling schema
    execute=_execute,   # async handler: tool_name + args → ToolResult
)
```

- `prompt_fragment` — injected into the agent's system prompt. Teaches the LLM what tools are available and when to use them.
- `tools` — list of OpenAI function definitions (name, description, parameters).
- `execute` — async callable that routes tool calls to API handlers.
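
The sketch below shows one way the three fields could work together: tool schemas are collected per agent, and a call is dispatched to whichever skill declares the tool. The field layout of `Skill` and `ToolResult`, the `_SKILLS` dict, and `register_skill` are assumptions made for illustration; only the names `execute_tool` and `get_all_tools` appear in this document.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional


@dataclass
class ToolResult:
    text: str
    ok: bool = True


@dataclass
class Skill:
    name: str
    description: str
    prompt_fragment: str
    tools: list = field(default_factory=list)
    execute: Optional[Callable[[str, dict], Awaitable[ToolResult]]] = None


_SKILLS = {}  # skill name -> Skill


def register_skill(skill: Skill) -> Skill:
    _SKILLS[skill.name] = skill
    return skill


def get_all_tools(skill_names):
    """Concatenate the tool schemas of every named skill."""
    return [tool for name in skill_names for tool in _SKILLS[name].tools]


async def execute_tool(skill_names, tool_name, args):
    """Find the skill that declares tool_name and run its handler."""
    for name in skill_names:
        skill = _SKILLS[name]
        declared = {t["function"]["name"] for t in skill.tools}
        if tool_name in declared and skill.execute is not None:
            return await skill.execute(tool_name, args)
    return ToolResult(text=f"Unknown tool: {tool_name}", ok=False)


async def _execute(tool_name, args):
    # Placeholder handler; a real one would call the Seerr API.
    return ToolResult(text=f"{tool_name} ran with kind={args['kind']}")


# A prompt-only skill has no tools and no handler.
register_skill(Skill(name="triage", description="Fallback", prompt_fragment="## Triage"))
register_skill(Skill(
    name="seerr",
    description="Seerr integration",
    prompt_fragment="## Seerr Media Tools",
    tools=[{"type": "function", "function": {"name": "seerr_trending"}}],
    execute=_execute,
))

result = asyncio.run(execute_tool(["triage", "seerr"], "seerr_trending", {"kind": "movie"}))
missing = asyncio.run(execute_tool(["triage", "seerr"], "no_such_tool", {}))
```

Returning a failed `ToolResult` for an unknown tool, rather than raising, is a design choice of this sketch: it lets the error text flow back to the LLM as an ordinary tool result.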

### 3. Tool

A **Tool** is a single function the LLM can call. Defined as part of a skill's `tools` list.

```python
{
    "type": "function",
    "function": {
        "name": "seerr_trending",
        "description": "Get trending movies and TV shows from Seerr...",
        "parameters": {
            "type": "object",
            "properties": {
                "kind": {"type": "string", "enum": ["movie", "tv", "all"]},
                "language": {"type": "string"},
            },
            "required": ["kind"],
        },
    },
}
```

When the LLM responds with a tool call, the loop:
1. Extracts `function.name` (e.g. `"seerr_trending"`) and `function.arguments` (e.g. `{"kind": "movie"}`)
2. Calls `execute_tool(agent.skills, name, args)`, which finds the owning skill and runs it
3. Appends the result text to the message history
4. Sends the updated history back to the LLM for a follow-up response
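
Steps 1 and 3 might look like the following. In the OpenAI chat completions format, `function.arguments` arrives as a JSON-encoded string and each tool result goes back as a `tool` message keyed by `tool_call_id`. The helper names `parse_tool_call` and `tool_message` and the id `call_1` are illustrative assumptions, not names from this codebase.

```python
import json


def parse_tool_call(tool_call: dict):
    """Return (call_id, name, args) from an OpenAI-style tool call dict."""
    function = tool_call["function"]
    raw_args = function.get("arguments") or "{}"
    # The API delivers arguments as a JSON string; tolerate a dict as well.
    args = json.loads(raw_args) if isinstance(raw_args, str) else raw_args
    return tool_call["id"], function["name"], args


def tool_message(call_id: str, result_text: str) -> dict:
    """Build the history entry that carries a tool result back to the LLM."""
    return {"role": "tool", "tool_call_id": call_id, "content": result_text}


call = {
    "id": "call_1",  # hypothetical id
    "type": "function",
    "function": {"name": "seerr_trending", "arguments": '{"kind": "movie"}'},
}
call_id, name, args = parse_tool_call(call)
message = tool_message(call_id, "Found 20 trending movies...")
```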

---

## Full Request Flow

### Step-by-step: "What are trending movies?"

```
1. OpenWebUI sends:
   POST /v1/chat/completions
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"}
     ],
     "stream": false
   }

2. chat_completions():
   → _resolve_agent(model="media-agent")
     → get_agent("media-agent") → Agent(skills=["media_info", "seerr", "triage"])
   → tools = get_all_tools(["media_info", "seerr", "triage"])
     → Returns 7 tool definitions from seerr.py
   → system_prompt = agent.build_system_prompt()
     → base_prompt + media_info fragment + seerr fragment + triage fragment

3. run_agent_with_tools() — Turn 1:
   → LLM receives: [system prompt with tools] + [user: "What are trending movies?"]
   → LLM responds: tool_calls = [{"function": {"name": "seerr_trending", "arguments": {"kind": "movie"}}}]

4. Execute tool:
   → execute_tool(["media_info", "seerr", "triage"], "seerr_trending", {"kind": "movie"})
   → Finds seerr skill → calls _execute("seerr_trending", ...) → _trending(args)
     → GET /api/v1/discover/trending?mediaType=movie
     → Returns formatted list with [tmdb:IDs]

5. run_agent_with_tools() — Turn 2:
   → LLM receives: previous messages + [tool: "Found 20 trending movies..."]
   → LLM responds: text = "Here are the top trending movies! 🎬 ..."
   → finish_reason="stop" → return the text

6. chat_completions() returns:
   { "choices": [{"message": {"content": "Here are the top trending movies!..."}}] }
```
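
Step 4 returns a "formatted list with [tmdb:IDs]". A formatter along these lines would produce that kind of text. It is only a sketch: `format_trending`, the `title`/`year`/`tmdb_id` keys, and the second sample entry are made up for illustration and do not reflect the actual Seerr response shape or the project's `_trending` handler.

```python
def format_trending(items: list, limit: int = 10) -> str:
    """Render results as a numbered list, tagging each entry with its TMDB id."""
    shown = items[:limit]
    lines = [f"Found {len(items)} trending movies. Showing top {len(shown)}:"]
    for rank, item in enumerate(shown, start=1):
        lines.append(
            f"{rank}. **{item['title']}** ({item['year']}) [tmdb:{item['tmdb_id']}]"
        )
    return "\n".join(lines)


sample = [
    {"title": "Mortal Kombat II", "year": 2026, "tmdb_id": 931285},
    {"title": "Example Movie", "year": 2025, "tmdb_id": 1},  # made-up entry
]
text = format_trending(sample)
```

Keeping the id inline in the tool result means it ends up in the assistant's reply and therefore in the history the client sends back on later turns.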

### Step-by-step: "Request the 2026 one" (multi-turn context)

```
1. OpenWebUI sends the FULL history:
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"},
       {"role": "assistant", "content": "Here are the top 10 trending movies!\n1. **Mortal Kombat II** (2026) [tmdb:931285] — ..."},
       {"role": "user", "content": "could you request the mortal kombat one?"},
       {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
       {"role": "user", "content": "the 2026 one"}
     ]
   }

2. chat_completions():
   → req.messages contains the ENTIRE conversation history
   → System prompt prepended → full_messages = [system] + 5 history messages
   → LLM sees everything: the trending list with [tmdb:931285], the disambiguation, "the 2026 one"

3. LLM reasons:
   - I previously listed Mortal Kombat II (2026) with [tmdb:931285]
   - The user said "request the mortal kombat one" → I searched and showed 4 options
   - Now they say "the 2026 one" → that matches Mortal Kombat II (2026) [tmdb:931285]
   - I should call seerr_request_media(kind="movie", title="Mortal Kombat II", tmdb_id=931285)

4. Tool executes the request → ✅ Success
```
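
The message assembly in step 2 is small enough to show in full. This is a sketch under the assumption that messages are plain dicts; `build_full_messages` is a hypothetical name, and the point is only that the server adds one system message and otherwise leaves the client's history untouched.

```python
def build_full_messages(system_prompt: str, history: list) -> list:
    """Prepend the agent's system prompt; the client history passes through as-is."""
    return [{"role": "system", "content": system_prompt}, *history]


history = [
    {"role": "user", "content": "What are trending movies?"},
    {"role": "assistant", "content": "1. **Mortal Kombat II** (2026) [tmdb:931285]"},
    {"role": "user", "content": "could you request the mortal kombat one?"},
    {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
    {"role": "user", "content": "the 2026 one"},
]
full_messages = build_full_messages("You are a media assistant...", history)
```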

---

## File Map

```
main.py                     # FastAPI app entry point, creates singletons
├── core/
│   ├── config.py           # .env loader, config constants
│   └── llm.py              # create_client() factory for OpenAI client
├── api/
│   ├── dependencies.py     # FastAPI Depends: get_llm_client()
│   └── v1/
│       └── chat.py         # APIRouter, endpoints, tool-calling loop
├── agents/
│   ├── __init__.py         # Agent dataclass, registry, load_all_agents()
│   ├── naked.py            # Agent: barebone LLM, no skills
│   └── media_agent.py      # Agent: media assistant with Seerr skills
└── skills/
    ├── __init__.py         # Skill dataclass, ToolResult, registry, execution
    ├── media_info.py       # Skill: base media assistant persona (prompt-only)
    ├── seerr.py            # Skill: Seerr API tools (7 tools, real API calls)
    └── triage.py           # Skill: fallback for unsupported actions (prompt-only)
```

## Key Design Decisions

1. **Full multi-turn history**: `req.messages` passes through unchanged. The LLM has access to its own previous responses (including `[tmdb:IDs]`). No external state management needed.

2. **No deterministic pre-processing**: No affirmation detectors, reference resolvers, or hardcoded rules. The LLM interprets user intent naturally from full conversation context.

3. **Agent selection via `model` field**: OpenWebUI sends `model` in the request. `_resolve_agent()` maps it to a registered agent. The `/v1/models` endpoint lists all agents as selectable models.

4. **Skills = prompts + tools**: Skills inject prompt fragments AND optionally expose OpenAI function-calling tools. Prompt-only skills (like `triage`) just shape behavior. Tool-enabled skills (like `seerr`) let the LLM take real actions.

5. **Singleton LLM client**: Created once in `main.py`, stored on `app.state.llm_client`, accessed via FastAPI `Depends(get_llm_client)`.
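
Decision 3 can be illustrated with a registry lookup plus a listing shaped like an OpenAI `/v1/models` response. Everything here is a sketch: the `AGENTS` dict, the agent id `naked`, the `owned_by` value, and in particular the fallback to a default agent for unknown model names are assumptions, since this document does not say how `_resolve_agent()` treats an unregistered model.

```python
# Hypothetical registry: agent_id -> description.
AGENTS = {
    "naked": "Barebone LLM, no skills",
    "media-agent": "Media assistant with Seerr integration",
}
DEFAULT_AGENT = "naked"  # assumption: fallback for unknown model names


def resolve_agent(model: str) -> str:
    """Map the request's model field to a registered agent id."""
    return model if model in AGENTS else DEFAULT_AGENT


def list_models() -> dict:
    """Expose every registered agent as a selectable model."""
    return {
        "object": "list",
        "data": [
            {"id": agent_id, "object": "model", "owned_by": "local"}
            for agent_id in sorted(AGENTS)
        ],
    }


chosen = resolve_agent("media-agent")
fallback = resolve_agent("unknown-model")
model_ids = [m["id"] for m in list_models()["data"]]
```

An alternative to the fallback would be to reject unknown models with an HTTP 404, which makes client misconfiguration easier to spot.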