# API Architecture — Agent + Skill + Graph Pipeline

This document explains how the API routes user messages through the
agent / skill / LangGraph pipeline to produce responses.

---

## Overview

```
┌──────────────────────────────────────────────────────────────────┐
│  OpenWebUI / Client                                              │
│  POST /v1/chat/completions  { model, messages, stream }          │
└──────────────────────────────┬───────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────┐
│  api/v1/chat.py — chat_completions()                             │
│                                                                  │
│  1. _resolve_agent(req.model)  → Agent                           │
│  2. get_agent_graph(agent_id)  → compiled StateGraph             │
│  3. graph.ainvoke(state) or _stream_graph(graph, messages)       │
└──────────────────────────────┬───────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────┐
│  LangGraph StateGraph (core/graph.py)                            │
│                                                                  │
│  ┌──────────────┐   tool_calls?    ┌──────────────┐              │
│  │  agent_node  │ ───────────────▶ │  tool_node   │              │
│  │  (LLM call)  │ ◀─────────────── │ (skill exec) │              │
│  └──────┬───────┘                  └──────────────┘              │
│         │ no tool_calls                                          │
│         ▼                                                        │
│       [END]                                                      │
└──────────────────────────────────────────────────────────────────┘
```

## Key Concepts

### 1. Agent

An **Agent** is a persona plus a bundle of skills. Agents are defined in `agents/`.

```python
# agents/media_agent.py
Agent(
    agent_id="media-agent",
    description="Media assistant with Seerr integration",
    skills=["media_info", "seerr", "triage"],
    base_prompt="You are a media assistant...",
)
```

- `agent_id` — unique name, exposed as a model in OpenWebUI
- `skills` — list of skill names to load
- `base_prompt` — starting system prompt, combined with skill fragments
- `build_system_prompt()` — merges `base_prompt` + all skill prompt fragments

Agents self-register at import time by calling `register()` from
`agents/__init__.py`. At startup, `main.py` calls `load_all_agents()` to import
every agent and skill module.
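
The registration pattern described above could look roughly like the following. This is a minimal sketch under stated assumptions: the `_REGISTRY` dict, the `SKILL_FRAGMENTS` lookup, and the duplicate-ID check are hypothetical and only illustrate the idea; the names `Agent`, `register()`, `get_agent()`, and `build_system_prompt()` come from this document, but their real signatures may differ.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the skill registry: skill name -> prompt fragment.
SKILL_FRAGMENTS: dict[str, str] = {
    "seerr": "## Seerr Media Tools\n(fragment text)",
    "triage": "## Triage\n(fragment text)",
}

# Hypothetical module-level registry: agent_id -> Agent.
_REGISTRY: dict[str, "Agent"] = {}


@dataclass
class Agent:
    agent_id: str
    description: str
    base_prompt: str
    skills: list[str] = field(default_factory=list)

    def build_system_prompt(self) -> str:
        """Merge base_prompt with the fragment of every known skill."""
        fragments = [SKILL_FRAGMENTS[s] for s in self.skills if s in SKILL_FRAGMENTS]
        return "\n\n".join([self.base_prompt, *fragments])


def register(agent: Agent) -> Agent:
    """Add an agent to the registry; agent modules call this at import time."""
    if agent.agent_id in _REGISTRY:
        raise ValueError(f"duplicate agent_id: {agent.agent_id}")
    _REGISTRY[agent.agent_id] = agent
    return agent


def get_agent(agent_id: str) -> Agent:
    """Look up a registered agent by its ID."""
    return _REGISTRY[agent_id]


# Importing the module that contains this call is what registers the agent.
media_agent = register(
    Agent(
        agent_id="media-agent",
        description="Media assistant with Seerr integration",
        base_prompt="You are a media assistant...",
        skills=["seerr", "triage"],
    )
)
```

With this shape, `get_agent("media-agent").build_system_prompt()` returns the base prompt followed by one fragment per skill, separated by blank lines.
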

### 2. Skill

A **Skill** is a capability bundle: a prompt fragment, tool definitions, and an
executor. Skills are defined in `skills/`.

```python
# skills/seerr.py
Skill(
    name="seerr",
    description="Seerr integration — trending, discover, request media, submit issues",
    prompt_fragment="## Seerr Media Tools\n...",
    tools=[...],       # OpenAI function-calling schema
    execute=_execute,  # async handler: tool_name + args → ToolResult
)
```

- `prompt_fragment` — injected into the agent's system prompt.
- `tools` — list of OpenAI function definitions (name, description, parameters).
- `execute` — async callable that routes tool calls to API handlers.
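
To make the relationship between `tools` and `execute` concrete, here is a hedged sketch of how a dispatcher such as `execute_tool()` might route a call to the right skill. The `_SKILLS` registry, the `ToolResult` fields, the `{"type": "function", ...}` wrapper around each tool, and the echoing `_execute` body are all assumptions made for illustration; the project's real shapes may differ.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Optional


@dataclass
class ToolResult:
    # Hypothetical shape; the real ToolResult may carry more fields.
    content: str
    ok: bool = True


@dataclass
class Skill:
    name: str
    description: str
    prompt_fragment: str
    tools: list[dict[str, Any]] = field(default_factory=list)
    execute: Optional[Callable[[str, dict[str, Any]], Awaitable[ToolResult]]] = None


# Hypothetical skill registry: skill name -> Skill.
_SKILLS: dict[str, Skill] = {}


async def execute_tool(skill_names: list[str], tool_name: str, args: dict[str, Any]) -> ToolResult:
    """Run tool_name via the first listed skill that declares it."""
    for skill_name in skill_names:
        skill = _SKILLS.get(skill_name)
        if skill is None or skill.execute is None:
            continue  # unknown skill, or a prompt-only skill without tools
        declared = {t["function"]["name"] for t in skill.tools}
        if tool_name in declared:
            return await skill.execute(tool_name, args)
    return ToolResult(content=f"Unknown tool: {tool_name}", ok=False)


async def _execute(tool_name: str, args: dict[str, Any]) -> ToolResult:
    # A real executor would call the Seerr API here; this one only echoes.
    return ToolResult(content=f"{tool_name} called with {args}")


_SKILLS["seerr"] = Skill(
    name="seerr",
    description="Seerr integration",
    prompt_fragment="## Seerr Media Tools\n...",
    tools=[{
        "type": "function",
        "function": {
            "name": "seerr_trending",
            "description": "List trending media",
            "parameters": {
                "type": "object",
                "properties": {"kind": {"type": "string"}},
                "required": ["kind"],
            },
        },
    }],
    execute=_execute,
)

result = asyncio.run(
    execute_tool(["media_info", "seerr"], "seerr_trending", {"kind": "movie"})
)
```

Returning a failed `ToolResult` for an unknown tool, instead of raising, lets the LLM see the error as a tool message and recover on its next pass.
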

### 3. Graph

Each agent gets a **compiled LangGraph StateGraph** built by
`core/graph.py:create_agent_graph()`. The graph is compiled lazily on the
first request and cached on `app.state.agent_graphs` for the lifetime of the
process.
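
The lazy compile-and-cache behaviour can be sketched in a few lines. This is an illustration only: the `SimpleNamespace` objects stand in for the FastAPI app and request, `create_agent_graph()` returns a dummy dict in place of a compiled graph, and `compile_count` exists purely to show that the build step runs once.

```python
from types import SimpleNamespace

compile_count = 0  # counts how often the (expensive) build step runs


def create_agent_graph(agent_id: str) -> dict:
    """Stand-in for core/graph.py:create_agent_graph(); returns a dummy graph."""
    global compile_count
    compile_count += 1
    return {"agent_id": agent_id}


def get_agent_graph(agent_id: str, request) -> dict:
    """Return the cached graph for agent_id, building it on first use."""
    graphs = request.app.state.agent_graphs
    if agent_id not in graphs:
        graphs[agent_id] = create_agent_graph(agent_id)
    return graphs[agent_id]


# Minimal stand-ins for the FastAPI app and request objects.
app = SimpleNamespace(state=SimpleNamespace(agent_graphs={}))
request = SimpleNamespace(app=app)

first = get_agent_graph("media-agent", request)   # builds and caches
second = get_agent_graph("media-agent", request)  # served from the cache
```
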

| Graph node / edge | What it does |
|---|---|
| `agent_node` | Converts state messages to OpenAI dicts, calls the LLM with the agent's system prompt + tool definitions, returns an `AIMessage` |
| `tool_node` | Reads `tool_calls` from the last AI message, calls `execute_tool()` from the skill system, returns `ToolMessage` results |
| `_should_continue` | Conditional edge — returns `"tool_node"` if the AI message has `tool_calls`, else `END` |
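
The control flow in the table amounts to a loop: call the LLM, run any requested tools, and repeat until the LLM answers without tool calls. The sketch below emulates that loop in plain Python so that it runs without LangGraph installed. `fake_llm`, `fake_tool`, `run`, and the dict-based message shapes are hypothetical stand-ins; the real graph is built with LangGraph's `StateGraph`, not a `while` loop.

```python
import asyncio

END = "__end__"  # sentinel for "stop"; a stand-in for LangGraph's END constant


async def fake_llm(messages: list[dict]) -> dict:
    """Stand-in for the LLM: request one tool call, then answer in text."""
    if not any(m["type"] == "tool" for m in messages):
        return {
            "type": "ai",
            "content": "",
            "tool_calls": [
                {"id": "call_1", "name": "seerr_trending", "args": {"kind": "movie"}}
            ],
        }
    return {"type": "ai", "content": "Here are the trending movies.", "tool_calls": []}


async def fake_tool(name: str, args: dict) -> str:
    """Stand-in for execute_tool(): returns a canned result string."""
    return f"{name} result for {args}"


async def agent_node(state: dict) -> dict:
    """Call the LLM with the current history and return its reply."""
    return {"messages": [await fake_llm(state["messages"])]}


async def tool_node(state: dict) -> dict:
    """Execute every tool call requested by the last AI message."""
    last = state["messages"][-1]
    results = [
        {
            "type": "tool",
            "tool_call_id": tc["id"],
            "content": await fake_tool(tc["name"], tc["args"]),
        }
        for tc in last["tool_calls"]
    ]
    return {"messages": results}


def _should_continue(state: dict) -> str:
    """Route to tool_node if the last message requested tools, else stop."""
    return "tool_node" if state["messages"][-1].get("tool_calls") else END


async def run(messages: list[dict]) -> list[dict]:
    state = {"messages": list(messages)}
    while True:
        state["messages"] += (await agent_node(state))["messages"]
        if _should_continue(state) == END:
            return state["messages"]
        state["messages"] += (await tool_node(state))["messages"]


history = asyncio.run(run([{"type": "human", "content": "What are trending movies?"}]))
```

For the sample question, the resulting history contains four messages: the user message, an AI message with a tool call, the tool result, and the final AI answer.
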

### 4. State

Defined in `core/state.py`:

```python
from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
```

LangGraph's `add_messages` reducer appends each new message to the list. If a
new message has the same ID as an existing one, it replaces that message in
place instead, so a node can update an earlier message by returning one with
the same ID.
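
The reducer semantics can be illustrated with a simplified pure-Python stand-in. This is not LangGraph's implementation (the real `add_messages` also coerces message formats and assigns missing IDs, among other things); `merge_messages` and the dict-based messages are hypothetical and model only the append-or-replace-by-ID rule.

```python
def merge_messages(existing: list[dict], new: list[dict]) -> list[dict]:
    """Append new messages, replacing any existing message with the same id."""
    merged = list(existing)
    index_by_id = {
        m["id"]: i for i, m in enumerate(merged) if m.get("id") is not None
    }
    for msg in new:
        msg_id = msg.get("id")
        if msg_id is not None and msg_id in index_by_id:
            # Same ID as an existing message: replace it in place.
            merged[index_by_id[msg_id]] = msg
        else:
            # Unseen ID (or no ID at all): append to the end.
            if msg_id is not None:
                index_by_id[msg_id] = len(merged)
            merged.append(msg)
    return merged


state = [{"id": "m1", "type": "human", "content": "hi"}]
state = merge_messages(state, [{"id": "m2", "type": "ai", "content": "draft"}])
state = merge_messages(state, [{"id": "m2", "type": "ai", "content": "final"}])
```

After the three steps, `state` holds two messages, and the second one carries the updated content.
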

### 5. Message Conversion

Because we use the raw `openai` client (not `langchain-openai`), messages must
be converted between LangChain and OpenAI formats at every LLM call:

- **LangChain → OpenAI** (`_lc_role_to_openai`, `_langchain_tc_to_openai`):
  Maps `type` → `role` and converts top-level `name`/`args` tool calls into
  the nested `function` sub-object that the OpenAI API expects, with the
  arguments serialized as a JSON string.

- **OpenAI → LangChain** (inside `agent_node`):
  Converts the `ChatCompletionMessage` response into an `AIMessage` with
  LangChain-format `tool_calls` (top-level `name`/`args`/`id`).
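
The two tool-call shapes can be compared side by side with a small dict-based sketch. The helper names below are hypothetical and are not the project's `_lc_role_to_openai` / `_langchain_tc_to_openai`; the real code works with LangChain message objects and the OpenAI SDK's response types, whereas this sketch uses plain dicts to show the field mapping.

```python
import json

# LangChain message "type" -> OpenAI "role".
_ROLE_MAP = {"human": "user", "ai": "assistant", "system": "system", "tool": "tool"}


def lc_role_to_openai(lc_type: str) -> str:
    """Map a LangChain message type to the OpenAI role name."""
    return _ROLE_MAP[lc_type]


def lc_tool_call_to_openai(tc: dict) -> dict:
    """{"name", "args", "id"} -> OpenAI tool call with a nested "function" object."""
    return {
        "id": tc["id"],
        "type": "function",
        "function": {
            "name": tc["name"],
            # OpenAI expects the arguments as a JSON-encoded string, not a dict.
            "arguments": json.dumps(tc["args"]),
        },
    }


def openai_tool_call_to_lc(tc: dict) -> dict:
    """Inverse conversion: decode the JSON arguments back into a dict."""
    fn = tc["function"]
    return {
        "id": tc["id"],
        "name": fn["name"],
        "args": json.loads(fn["arguments"] or "{}"),
    }


lc_call = {"id": "call_1", "name": "seerr_trending", "args": {"kind": "movie"}}
openai_call = lc_tool_call_to_openai(lc_call)
round_trip = openai_tool_call_to_lc(openai_call)
```
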

---

## Full Request Flow

### Step-by-step: "What are trending movies?"

```
1. OpenWebUI sends:
   POST /v1/chat/completions
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"}
     ],
     "stream": false
   }

2. chat_completions():
   → _resolve_agent(model="media-agent")
     → get_agent("media-agent") → Agent(skills=["media_info", "seerr", "triage"])
   → get_agent_graph("media-agent", request)
     → looks up app.state.agent_graphs["media-agent"]
     → first call → create_agent_graph() compiles the graph with 7 Seerr tools
   → run_agent_with_tools(request, messages, agent_id)
     → _invoke_graph(graph, messages)

3. Graph — Pass 1 (agent_node):
   → LLM receives: [system prompt] + [user: "What are trending movies?"]
   → LLM responds with tool_calls: seerr_trending(kind="movie")
   → agent_node returns AIMessage with tool_calls in LangChain format

4. Graph — _should_continue:
   → AIMessage has tool_calls → route to "tool_node"

5. Graph — tool_node:
   → Reads tool_call: name="seerr_trending", args={"kind": "movie"}
   → execute_tool(["media_info", "seerr", "triage"], "seerr_trending", ...)
     → Seerr API → GET /api/v1/discover/trending?mediaType=movie
   → Returns ToolMessage with formatted results including [tmdb:IDs]

6. Graph — Pass 2 (agent_node):
   → LLM receives previous exchange + tool result
   → LLM responds with text only (no tool_calls)
   → agent_node returns AIMessage(content="Here are the top trending movies!...")

7. Graph — _should_continue:
   → No tool_calls → route to END

8. chat_completions() returns:
   { "choices": [{"message": {"role": "assistant", "content": "Here are the top..."}}] }
```
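
Step 8 wraps the final graph message in an OpenAI-style chat completion. A minimal sketch of that wrapping step, assuming dict-based messages: `format_completion` is a hypothetical helper name, and the extra fields (`id`, `object`, `created`, `index`, `finish_reason`) follow the public OpenAI response shape and are not confirmed details of this project.

```python
import time
import uuid


def format_completion(model: str, messages: list[dict]) -> dict:
    """Wrap the last message of a finished graph run as a chat completion."""
    final = messages[-1]
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": final["content"]},
                "finish_reason": "stop",
            }
        ],
    }


response = format_completion(
    "media-agent",
    [
        {"type": "human", "content": "What are trending movies?"},
        {"type": "ai", "content": "Here are the top trending movies!"},
    ],
)
```
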

### Step-by-step: "Request the 2026 one" (multi-turn context)

```
1. OpenWebUI sends the FULL history:
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"},
       {"role": "assistant", "content": "Here are the top 10 trending movies!\n1. **Mortal Kombat II** (2026) [tmdb:931285] — ..."},
       {"role": "user", "content": "could request the mortal kombat one?"},
       {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
       {"role": "user", "content": "the 2026 one"}
     ]
   }

2. chat_completions():
   → req.messages contains the ENTIRE conversation history
   → graph.ainvoke({"messages": all_messages})
   → agent_node prepends system prompt and sends everything to the LLM

3. LLM reasons from full context:
   - Previously listed Mortal Kombat II (2026) with [tmdb:931285]
   - The user said "request the mortal kombat one" → I searched and showed 4 options
   - Now they say "the 2026 one" → that matches Mortal Kombat II (2026) [tmdb:931285]
   - I should call seerr_request_media(kind="movie", title="Mortal Kombat II", tmdb_id=931285)

4. tool_node executes the request → ✅ Success
```

---

## Streaming

Streaming differs from the non-streaming path only in how the result is
delivered. The graph still runs to completion first, and the final content is
then emitted incrementally:

```
chat_completions(stream=True)
  → _stream_graph(graph, messages)
    → graph.ainvoke(state)   # runs graph to completion (tools execute silently)
    → yields content character-by-character via SSE
```

For true token-level streaming (tokens appear as the LLM generates them),
the `agent_node` would need to use `langchain-openai`'s `ChatOpenAI` instead of
the raw `openai` client. The current approach is a pragmatic middle ground
that avoids adding another dependency while still giving the SSE client
incremental output. The trade-off is latency: the client receives nothing
until the whole graph run, including all tool calls, has finished.
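
The character-by-character SSE output can be sketched as an async generator. This is an assumption-laden illustration, not the project's `_stream_graph()`: `stream_text_as_sse` and `collect` are hypothetical names, and the chunk fields follow the public OpenAI `chat.completion.chunk` shape in reduced form.

```python
import asyncio
import json
from typing import AsyncIterator


async def stream_text_as_sse(content: str, model: str) -> AsyncIterator[str]:
    """Yield OpenAI-style chunk events for finished text, one character each."""
    for ch in content:
        chunk = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"index": 0, "delta": {"content": ch}, "finish_reason": None}],
        }
        yield f"data: {json.dumps(chunk)}\n\n"
    # Final chunk: empty delta plus a finish_reason, then the [DONE] sentinel.
    done = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    }
    yield f"data: {json.dumps(done)}\n\n"
    yield "data: [DONE]\n\n"


async def collect(content: str) -> list[str]:
    """Drain the generator into a list (a real endpoint would stream it)."""
    return [event async for event in stream_text_as_sse(content, "media-agent")]


events = asyncio.run(collect("Hi!"))
```

In a FastAPI endpoint, a generator like this would typically be passed to `StreamingResponse` with `media_type="text/event-stream"`.
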

---

## File Map

| File | Responsibility |
|---|---|
| `main.py` | FastAPI app, singleton creation, router mounting |
| `api/v1/chat.py` | Endpoints — resolves agent, invokes graph, formats responses |
| `api/dependencies.py` | `get_llm_client()`, `get_agent_graph()` — FastAPI `Depends` |
| `core/graph.py` | `create_agent_graph()` — builds the StateGraph |
| `core/state.py` | `AgentState` TypedDict |
| `core/llm.py` | `create_client()` — OpenAI client factory |
| `core/config.py` | Environment variable loader |
| `agents/` | Agent definitions (dataclass + self-registration) |
| `skills/` | Skill definitions (prompt fragments + tools + executors) |