# API Architecture — Agent + Skill + Tool Pipeline

This document explains how the API routes user messages through the agent/skill/tool pipeline to produce responses.

---

## Overview

```
┌──────────────────────────────────────────────────────────────────┐
│  OpenWebUI / Client                                              │
│  POST /v1/chat/completions  { model, messages, stream }          │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│  api/v1/chat.py — chat_completions()                             │
│                                                                  │
│  1. _resolve_agent(req.model)    → Agent                         │
│  2. agent.build_system_prompt()  → system prompt                 │
│  3. Build full_messages = [system] + req.messages                │
│  4. run_agent_with_tools(client, messages, agent_id)             │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│  Tool-Calling Loop (run_agent_with_tools / run_agent_stream)     │
│                                                                  │
│  while turns < max_turns:                                        │
│      response = LLM.chat(messages, tools=agent_tools)            │
│      if response has tool_calls:                                 │
│          for each tool_call:                                     │
│              result = execute_tool(skills, name, args)           │
│              append result to messages                           │
│      else:                                                       │
│          return response.text  (stream tokens if streaming)      │
└──────────────────────────────────────────────────────────────────┘
```
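
The loop in the diagram can be sketched in plain Python. This is a minimal sketch, not the code in `api/v1/chat.py`: it assumes OpenAI-style message dicts, and the `llm` and `execute_tool` callables are hypothetical stand-ins (a stub replaces the real OpenAI client) so the control flow runs offline. It follows the OpenAI tool-calling protocol by appending the assistant message that carries `tool_calls` before the matching `tool` results, each tagged with its `tool_call_id`.

```python
import json
from typing import Any, Callable

Message = dict[str, Any]
# Hypothetical LLM interface: (messages, tools) -> one OpenAI-style assistant message dict.
LLM = Callable[[list[Message], list[dict[str, Any]]], Message]


def run_agent_with_tools(
    llm: LLM,
    messages: list[Message],
    tools: list[dict[str, Any]],
    execute_tool: Callable[[str, dict[str, Any]], str],
    max_turns: int = 5,
) -> str:
    """Call the LLM until it answers without tool calls or max_turns runs out."""
    messages = list(messages)  # work on a copy; the caller's history is untouched
    for _ in range(max_turns):
        reply = llm(messages, tools)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content") or ""
        # The assistant turn that requested tools must precede the tool results.
        messages.append(reply)
        for call in tool_calls:
            name = call["function"]["name"]
            args = json.loads(call["function"]["arguments"] or "{}")  # arrives as a JSON string
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": execute_tool(name, args),
            })
    # Placeholder fallback: the real behavior at the turn limit is not documented here.
    return "Stopped: max_turns reached without a final answer."


def fake_llm(messages: list[Message], tools: list[dict[str, Any]]) -> Message:
    """Stub LLM: request seerr_trending once, then answer in plain text."""
    if messages[-1]["role"] == "tool":
        return {"role": "assistant", "content": "Here are the top trending movies!"}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "seerr_trending", "arguments": '{"kind": "movie"}'},
        }],
    }


history = [
    {"role": "system", "content": "You are a media assistant..."},
    {"role": "user", "content": "What are trending movies?"},
]
answer = run_agent_with_tools(
    fake_llm, history, tools=[], execute_tool=lambda name, args: f"{name} called with {args}"
)
print(answer)  # Here are the top trending movies!
```

Bounding the loop with `max_turns` stops a model that keeps requesting tools from looping forever; what the real endpoint returns when the limit is hit is not specified here, so the fallback string is only a placeholder.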

---

## Key Concepts

### 1. Agent

An **Agent** is a persona plus a skill bundle, defined in `agents/`.

```python
# agents/media_agent.py
from agents import Agent

Agent(
    agent_id="media-agent",
    description="Media assistant with Seerr integration",
    skills=["media_info", "seerr", "triage"],
    base_prompt="You are a media assistant...",
)
```

- `agent_id` — unique name, exposed as a model in OpenWebUI
- `skills` — list of skill names to load
- `base_prompt` — starting system prompt, combined with skill fragments
- `build_system_prompt()` — merges base_prompt + all skill prompt fragments

Agents self-register at import time via `agents/__init__.py`'s `register()`.
`main.py` calls `load_all_agents()` at startup to import all agent/skill modules.
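
A minimal sketch of how the `Agent` dataclass, `register()`, and `get_agent()` could fit together, assuming a module-level dict keyed by `agent_id` and a plain mapping from skill name to prompt fragment. `_REGISTRY`, `_SKILL_FRAGMENTS`, the duplicate-id check, and the `"\n\n"` separator are illustrative assumptions, not the actual contents of `agents/__init__.py`.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-in for the skill registry: skill name -> prompt fragment.
_SKILL_FRAGMENTS: dict[str, str] = {
    "media_info": "## Media Info\n...",
    "seerr": "## Seerr Media Tools\n...",
    "triage": "## Triage\n...",
}


@dataclass
class Agent:
    agent_id: str  # exposed as a model id in OpenWebUI
    description: str
    skills: list[str] = field(default_factory=list)
    base_prompt: str = ""

    def build_system_prompt(self) -> str:
        """Merge base_prompt with each listed skill's prompt fragment, in order."""
        parts = [self.base_prompt] + [_SKILL_FRAGMENTS[name] for name in self.skills]
        return "\n\n".join(part for part in parts if part)


# Hypothetical registry keyed by agent_id.
_REGISTRY: dict[str, Agent] = {}


def register(agent: Agent) -> Agent:
    # Assumption: a duplicate agent_id is treated as a programming error.
    if agent.agent_id in _REGISTRY:
        raise ValueError(f"duplicate agent_id: {agent.agent_id}")
    _REGISTRY[agent.agent_id] = agent
    return agent


def get_agent(agent_id: str) -> Optional[Agent]:
    return _REGISTRY.get(agent_id)


# What an agent module would run at import time.
register(Agent(
    agent_id="media-agent",
    description="Media assistant with Seerr integration",
    skills=["media_info", "seerr", "triage"],
    base_prompt="You are a media assistant...",
))
print(get_agent("media-agent").build_system_prompt())
```

`load_all_agents()` is left out; one plausible approach is to import every module under `agents/` and `skills/` with `importlib.import_module`, so that their module-level `register()` calls run at startup.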

### 2. Skill

A **Skill** is a capability bundle, defined in `skills/`.

```python
# skills/seerr.py
from skills import Skill

Skill(
    name="seerr",
    description="Seerr integration — trending, discover, request media, submit issues",
    prompt_fragment="## Seerr Media Tools\n...",
    tools=[...],       # OpenAI function-calling schemas (7 tools)
    execute=_execute,  # async handler: (tool_name, args) → ToolResult
)
```

- `prompt_fragment` — injected into the agent's system prompt. Teaches the LLM what tools are available and when to use them.
- `tools` — list of OpenAI function definitions (name, description, parameters).
- `execute` — async callable that routes tool calls to API handlers.

### 3. Tool

A **Tool** is a single function the LLM can call, defined as an entry in a skill's `tools` list.

```python
{
    "type": "function",
    "function": {
        "name": "seerr_trending",
        "description": "Get trending movies and TV shows from Seerr...",
        "parameters": {
            "type": "object",
            "properties": {
                "kind": {"type": "string", "enum": ["movie", "tv", "all"]},
                "language": {"type": "string"},
            },
            "required": ["kind"],
        },
    },
}
```

When the LLM responds with a tool call, the loop:

1. Extracts `function.name` (e.g. `"seerr_trending"`) and `function.arguments` (e.g. `{"kind": "movie"}`; the OpenAI API delivers this as a JSON-encoded string, so it must be decoded first)
2. Calls `execute_tool(agent.skills, name, args)`, which finds the skill that owns the tool and runs its handler
3. Appends the result text to the message history as a `tool` message
4. Sends the updated history back to the LLM for a follow-up response
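
The dispatch in step 2 can be sketched as below. It mirrors the `execute_tool(skill_names, name, args)` call shape shown in the request flow, but the lookup strategy (scan the agent's skills for one whose schemas declare `name`), the trimmed `Skill`/`ToolResult` shapes, the `_SKILLS` registry, and returning an error string for unknown tools are all assumptions.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Optional


@dataclass
class ToolResult:
    text: str  # assumed field: becomes the content of the `tool` message


@dataclass
class Skill:
    # Trimmed to the fields dispatch needs (hypothetical shape).
    name: str
    tools: list[dict[str, Any]] = field(default_factory=list)
    execute: Optional[Callable[[str, dict[str, Any]], Awaitable[ToolResult]]] = None


_SKILLS: dict[str, Skill] = {}  # hypothetical registry: skill name -> Skill


async def execute_tool(skill_names: list[str], name: str, args: dict[str, Any]) -> ToolResult:
    """Route a tool call to the first of the agent's skills that declares `name`."""
    for skill_name in skill_names:
        skill = _SKILLS[skill_name]
        declared = {tool["function"]["name"] for tool in skill.tools}
        if skill.execute is not None and name in declared:
            return await skill.execute(name, args)
    # Assumption: report unknown tools back to the LLM instead of raising.
    return ToolResult(text=f"Error: unknown tool {name!r}")


async def _seerr_execute(name: str, args: dict[str, Any]) -> ToolResult:
    # Stub handler; the real one would call the Seerr API.
    return ToolResult(text=f"{name}: kind={args.get('kind')}")


_SKILLS["triage"] = Skill(name="triage")  # prompt-only: no tools, no handler
_SKILLS["seerr"] = Skill(
    name="seerr",
    tools=[{"type": "function", "function": {"name": "seerr_trending",
                                             "parameters": {"type": "object", "properties": {}}}}],
    execute=_seerr_execute,
)

raw_arguments = '{"kind": "movie"}'  # function.arguments as the OpenAI API delivers it
result = asyncio.run(execute_tool(["triage", "seerr"], "seerr_trending", json.loads(raw_arguments)))
print(result.text)  # seerr_trending: kind=movie
```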

---

## Full Request Flow

### Step-by-step: "What are trending movies?"

```
1. OpenWebUI sends:
   POST /v1/chat/completions
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"}
     ],
     "stream": false
   }

2. chat_completions():
   → _resolve_agent(model="media-agent")
     → get_agent("media-agent") → Agent(skills=["media_info", "seerr", "triage"])
   → tools = get_all_tools(["media_info", "seerr", "triage"])
     → Returns 7 tool definitions, all from seerr.py (media_info and triage are prompt-only)
   → system_prompt = agent.build_system_prompt()
     → base_prompt + media_info fragment + seerr fragment + triage fragment

3. run_agent_with_tools() — Turn 1:
   → LLM receives: [system prompt] + [user: "What are trending movies?"], with the 7 tool schemas passed as `tools`
   → LLM responds: tool_calls = [{"function": {"name": "seerr_trending", "arguments": {"kind": "movie"}}}]

4. Execute tool:
   → execute_tool(["media_info", "seerr", "triage"], "seerr_trending", {"kind": "movie"})
     → Finds seerr skill → calls _execute("seerr_trending", ...) → _trending(args)
     → GET /api/v1/discover/trending?mediaType=movie
     → Returns formatted list with [tmdb:IDs]

5. run_agent_with_tools() — Turn 2:
   → LLM receives: previous messages + [tool: "Found 20 trending movies..."]
   → LLM responds: text = "Here are the top trending movies! 🎬 ..."
   → finish_reason="stop" → return the text

6. chat_completions() returns:
   { "choices": [{"message": {"content": "Here are the top trending movies!..."}}] }
```
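
The `[tmdb:ID]` tags in tool output are what let the model act on a specific title in later turns. A hypothetical formatter is sketched below; the `format_results` name, the header text, and the Seerr result field names (`title`/`releaseDate` for movies, `name`/`firstAirDate` for TV) are assumptions rather than the actual code in `skills/seerr.py`, and the sample items are placeholders.

```python
from typing import Any


def format_results(items: list[dict[str, Any]], kind: str, limit: int = 10) -> str:
    """Render results as numbered lines, each ending in a [tmdb:ID] tag."""
    lines = [f"Found {len(items)} trending results ({kind}):"]
    for i, item in enumerate(items[:limit], start=1):
        # Assumed field names: movies use title/releaseDate, TV uses name/firstAirDate.
        title = item.get("title") or item.get("name") or "Unknown"
        date = item.get("releaseDate") or item.get("firstAirDate") or ""
        year = f" ({date[:4]})" if date else ""
        lines.append(f"{i}. {title}{year} [tmdb:{item['id']}]")
    return "\n".join(lines)


# Placeholder items, not real Seerr data.
sample = [
    {"id": 1, "title": "Example Movie", "releaseDate": "2024-05-01"},
    {"id": 2, "name": "Example Show", "firstAirDate": "2023-09-15"},
]
print(format_results(sample, "all"))
# Found 2 trending results (all):
# 1. Example Movie (2024) [tmdb:1]
# 2. Example Show (2023) [tmdb:2]
```

Because every line ends with the TMDB id, a later `seerr_request_media` call can take `tmdb_id` straight from the conversation instead of matching on the title alone.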

### Step-by-step: "Request the 2026 one" (multi-turn context)

```
1. OpenWebUI sends the FULL history:
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"},
       {"role": "assistant", "content": "Here are the top 10 trending movies!\n1. **Mortal Kombat II** (2026) [tmdb:931285] — ..."},
       {"role": "user", "content": "could request the mortal kombat one?"},
       {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
       {"role": "user", "content": "the 2026 one"}
     ]
   }

2. chat_completions():
   → req.messages contains the ENTIRE conversation history
   → System prompt prepended → full_messages = [system] + 5 history messages
   → LLM sees everything: the trending list with [tmdb:931285], the disambiguation, "the 2026 one"

3. LLM reasons:
   - I previously listed Mortal Kombat II (2026) with [tmdb:931285]
   - The user said "request the mortal kombat one" → I searched and showed 4 options
   - Now they say "the 2026 one" → that matches Mortal Kombat II (2026) [tmdb:931285]
   - I should call seerr_request_media(kind="movie", title="Mortal Kombat II", tmdb_id=931285)

4. Tool executes the request → ✅ Success
```
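
Assembling the messages for a multi-turn request amounts to prepending one system message; nothing in the client's history is rewritten, which is why `[tmdb:931285]` from the first answer is still visible three turns later. `build_full_messages` is a hypothetical helper name for that step, and the sketch does not cover a client that sends its own system message.

```python
from typing import Any

Message = dict[str, Any]


def build_full_messages(system_prompt: str, history: list[Message]) -> list[Message]:
    """Prepend the agent's system prompt; pass the client's history through as-is."""
    return [{"role": "system", "content": system_prompt}, *history]


history = [
    {"role": "user", "content": "What are trending movies?"},
    {"role": "assistant", "content": "Here are the top 10 trending movies!\n"
                                     "1. **Mortal Kombat II** (2026) [tmdb:931285] ..."},
    {"role": "user", "content": "could request the mortal kombat one?"},
    {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
    {"role": "user", "content": "the 2026 one"},
]
full_messages = build_full_messages("You are a media assistant...", history)
print(len(full_messages))  # 6
```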

---

## File Map

```
main.py                 # FastAPI app entry point, creates singletons
├── core/
│   ├── config.py       # .env loader, config constants
│   └── llm.py          # create_client() factory for OpenAI client
├── api/
│   ├── dependencies.py # FastAPI Depends: get_llm_client()
│   └── v1/
│       └── chat.py     # APIRouter, endpoints, tool-calling loop
├── agents/
│   ├── __init__.py     # Agent dataclass, registry, load_all_agents()
│   ├── naked.py        # Agent: bare-bones LLM, no skills
│   └── media_agent.py  # Agent: media assistant with Seerr skills
└── skills/
    ├── __init__.py     # Skill dataclass, ToolResult, registry, execution
    ├── media_info.py   # Skill: base media assistant persona (prompt-only)
    ├── seerr.py        # Skill: Seerr API tools (7 tools, real API calls)
    └── triage.py       # Skill: fallback for unsupported actions (prompt-only)
```

---

## Key Design Decisions

1. **Full multi-turn history**: `req.messages` is passed through unchanged, so the LLM can read its own previous responses, including the `[tmdb:ID]` tags. No external state management is needed.

2. **No deterministic pre-processing**: There are no affirmation detectors, reference resolvers, or hardcoded rules. The LLM interprets user intent from the full conversation context.

3. **Agent selection via `model` field**: OpenWebUI sends `model` in each request, and `_resolve_agent()` maps it to a registered agent. The `/v1/models` endpoint lists all agents as selectable models.

4. **Skills = prompts + tools**: Skills inject prompt fragments and optionally expose OpenAI function-calling tools. Prompt-only skills (like `triage`) only shape behavior; tool-enabled skills (like `seerr`) let the LLM take real actions.

5. **Singleton LLM client**: Created once in `main.py`, stored on `app.state.llm_client`, and accessed via FastAPI `Depends(get_llm_client)`.
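
For decision 3, OpenAI-compatible clients such as OpenWebUI discover models through `GET /v1/models`, which returns a list object whose `data` entries carry `id`, `object`, `created`, and `owned_by`. A minimal sketch that exposes each registered `agent_id` in that shape follows; the `list_models` name and the default `owned_by` value are assumptions.

```python
import time
from typing import Any


def list_models(agent_ids: list[str], owned_by: str = "agent-api") -> dict[str, Any]:
    """Build an OpenAI-style model list with one entry per registered agent_id."""
    created = int(time.time())
    return {
        "object": "list",
        "data": [
            # owned_by is a hypothetical label; any string is accepted by clients.
            {"id": agent_id, "object": "model", "created": created, "owned_by": owned_by}
            for agent_id in sorted(agent_ids)
        ],
    }


payload = list_models(["media-agent"])
print([model["id"] for model in payload["data"]])  # ['media-agent']
```

The `id` that OpenWebUI shows in its model picker is the same string it later sends back as `model`, which is what `_resolve_agent()` looks up.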