small refactor of the structure

2026-05-25 12:16:24 +02:00
parent 51e099acdd
commit b0f10b6bb1
26 changed files with 37 additions and 37 deletions
@@ -0,0 +1,235 @@
+# API Architecture — Agent + Skill + Graph Pipeline
+
+This document explains how the API routes user messages through the
+agent / skill / LangGraph pipeline to produce responses.
+
+---
+
+## Overview
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                      OpenWebUI / Client                         │
+│  POST /v1/chat/completions  { model, messages, stream }         │
+└──────────────────────────────┬──────────────────────────────────┘
+                               │
+                               ▼
+┌──────────────────────────────────────────────────────────────────┐
+│  api/v1/chat.py  —  chat_completions()                           │
+│                                                                  │
+│  1. _resolve_agent(req.model)  →  Agent                          │
+│  2. get_agent_graph(agent_id)  →  compiled StateGraph            │
+│  3. graph.ainvoke(state)  or  _stream_graph(graph, messages)     │
+└──────────────────────────────┬───────────────────────────────────┘
+                               │
+                               ▼
+┌──────────────────────────────────────────────────────────────────┐
+│  LangGraph StateGraph  (core/graph.py)                           │
+│                                                                  │
+│   ┌──────────────┐   tool_calls?    ┌──────────────┐             │
+│   │  agent_node  │ ───────────────▶ │  tool_node   │             │
+│   │  (LLM call)  │ ◀─────────────── │ (skill exec) │             │
+│   └──────┬───────┘                  └──────────────┘             │
+│          │ no tool_calls                                         │
+│          ▼                                                       │
+│        [END]                                                     │
+└──────────────────────────────────────────────────────────────────┘
+```
+
+## Key Concepts
+
+### 1. Agent
+
+An **Agent** is a persona + skill bundle. Defined in `agents/`.
+
+```python
+# agents/media_agent.py
+Agent(
+    agent_id="media-agent",
+    description="Media assistant with Seerr integration",
+    skills=["media_info", "seerr", "triage"],
+    base_prompt="You are a media assistant...",
+)
+```
+
+- `agent_id` — unique name, exposed as a model in OpenWebUI
+- `skills` — list of skill names to load
+- `base_prompt` — starting system prompt, combined with skill fragments
+- `build_system_prompt()` — a method (not a constructor argument) that merges `base_prompt` with every skill's prompt fragment
+
+Agents self-register at import time via `agents/__init__.py`'s `register()`.
+`main.py` calls `load_all_agents()` at startup to import every agent and skill
+module.
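
The self-registration pattern can be illustrated with a minimal, dependency-free sketch. `Agent`, `register()`, `get_agent()` and `build_system_prompt()` are names taken from this document; the module-level `_REGISTRY` dict, the `SKILL_FRAGMENTS` lookup, the duplicate-ID check and the blank-line separator between prompt parts are assumptions for illustration, not the project's actual code.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the prompt fragments that the skills/ modules provide.
SKILL_FRAGMENTS: dict[str, str] = {
    "media_info": "## Media Info Tools\n...",
    "seerr": "## Seerr Media Tools\n...",
    "triage": "## Triage Tools\n...",
}


@dataclass
class Agent:
    agent_id: str
    description: str
    skills: list[str] = field(default_factory=list)
    base_prompt: str = ""

    def build_system_prompt(self) -> str:
        # base_prompt first, then each skill's fragment in the order listed.
        fragments = [SKILL_FRAGMENTS[s] for s in self.skills if s in SKILL_FRAGMENTS]
        return "\n\n".join([self.base_prompt, *fragments])


_REGISTRY: dict[str, Agent] = {}  # hypothetical module-level registry


def register(agent: Agent) -> Agent:
    # Called by each agent module at import time.
    if agent.agent_id in _REGISTRY:
        raise ValueError(f"duplicate agent_id: {agent.agent_id!r}")
    _REGISTRY[agent.agent_id] = agent
    return agent


def get_agent(agent_id: str) -> Agent:
    return _REGISTRY[agent_id]  # KeyError for unknown model names


# What agents/media_agent.py would execute when load_all_agents() imports it:
register(Agent(
    agent_id="media-agent",
    description="Media assistant with Seerr integration",
    skills=["media_info", "seerr", "triage"],
    base_prompt="You are a media assistant...",
))
```

Raising on a duplicate `agent_id` is one possible design choice: because each `agent_id` is exposed as an OpenWebUI model name, a silent overwrite would make one agent quietly replace another.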
+
+### 2. Skill
+
+A **Skill** is a capability bundle (prompt fragment, tool definitions, and an executor for those tools). Defined in `skills/`.
+
+```python
+# skills/seerr.py
+Skill(
+    name="seerr",
+    description="Seerr integration — trending, discover, request media, submit issues",
+    prompt_fragment="## Seerr Media Tools\n...",
+    tools=[...],          # OpenAI function-calling schema
+    execute=_execute,     # async handler: tool_name + args → ToolResult
+)
+```
+
+- `prompt_fragment` — injected into the agent's system prompt.
+- `tools` — list of OpenAI function definitions (name, description, parameters).
+- `execute` — async callable that routes tool calls to API handlers.
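
How the three fields fit together can be sketched as follows. The `ToolResult` fields, the `SKILLS` registry, the `seerr_trending` parameter schema (including its enum values) and the `args` dict passed to `execute_tool()` are assumptions; only the names `Skill`, `execute`, `ToolResult`, `execute_tool` and `seerr_trending` come from this document. The tool entry uses the `{"type": "function", "function": {...}}` wrapper that the Chat Completions `tools` parameter accepts.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class ToolResult:  # hypothetical shape; the project's ToolResult may differ
    content: str
    is_error: bool = False


@dataclass
class Skill:
    name: str
    description: str
    prompt_fragment: str
    tools: list[dict[str, Any]]  # Chat Completions `tools` entries
    execute: Callable[[str, dict[str, Any]], Awaitable[ToolResult]]


async def _execute(tool_name: str, args: dict[str, Any]) -> ToolResult:
    # Route by tool name; a real handler would call the Seerr HTTP API here.
    if tool_name == "seerr_trending":
        return ToolResult(content=f"trending {args.get('kind', 'all')}: ...")
    return ToolResult(content=f"unknown tool: {tool_name}", is_error=True)


seerr = Skill(
    name="seerr",
    description="Seerr integration",
    prompt_fragment="## Seerr Media Tools\n...",
    tools=[{
        "type": "function",
        "function": {
            "name": "seerr_trending",
            "description": "List trending media on Seerr.",
            "parameters": {
                "type": "object",
                "properties": {"kind": {"type": "string", "enum": ["movie", "tv"]}},
                "required": ["kind"],
            },
        },
    }],
    execute=_execute,
)

SKILLS: dict[str, Skill] = {seerr.name: seerr}  # hypothetical skill registry


async def execute_tool(skill_names: list[str], tool_name: str,
                       args: dict[str, Any]) -> ToolResult:
    # Delegate to the first of the agent's skills that declares this tool.
    for name in skill_names:
        skill = SKILLS.get(name)
        if skill and any(t["function"]["name"] == tool_name for t in skill.tools):
            return await skill.execute(tool_name, args)
    return ToolResult(content=f"no loaded skill provides {tool_name}", is_error=True)


if __name__ == "__main__":
    result = asyncio.run(execute_tool(["media_info", "seerr", "triage"],
                                      "seerr_trending", {"kind": "movie"}))
    print(result)
```

Returning an error `ToolResult` instead of raising lets the LLM see the failure as a tool message and recover on its next pass.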
+
+### 3. Graph
+
+Each agent gets a **compiled LangGraph StateGraph** built by
+`core/graph.py:create_agent_graph()`.  The graph is compiled lazily on the
+first request and cached on `app.state.agent_graphs` for the lifetime of the
+process.
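
The compile-once cache can be sketched without FastAPI or LangGraph. Here `SimpleNamespace` stands in for `app.state`, `create_agent_graph()` is a stub that only records that it ran, and the `app_state`/`factory` parameters are assumptions (the real `get_agent_graph()` receives the request and would read `request.app.state`).

```python
from types import SimpleNamespace
from typing import Any, Callable

compile_log: list[str] = []  # records compilations, for illustration only


def create_agent_graph(agent_id: str) -> Any:
    # Stand-in for core/graph.py:create_agent_graph(), which builds and
    # compiles a LangGraph StateGraph for the agent.
    compile_log.append(agent_id)
    return object()


def get_agent_graph(agent_id: str, app_state: Any,
                    factory: Callable[[str], Any] = create_agent_graph) -> Any:
    # Lazily create the cache dict, then compile each agent's graph at most once.
    graphs = getattr(app_state, "agent_graphs", None)
    if graphs is None:
        graphs = app_state.agent_graphs = {}
    if agent_id not in graphs:
        graphs[agent_id] = factory(agent_id)
    return graphs[agent_id]


state = SimpleNamespace()  # stands in for FastAPI's app.state
first = get_agent_graph("media-agent", state)
again = get_agent_graph("media-agent", state)
```

In this sketch the check and the store happen with no `await` in between, so two concurrent requests on the same event loop cannot both compile the same graph; each worker process still keeps its own cache.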
+
+| Graph node / edge | What it does |
+|---|---|
+| `agent_node` | Converts state messages to OpenAI dicts, calls the LLM with the agent's system prompt + tool definitions, returns an `AIMessage` |
+| `tool_node` | Reads `tool_calls` from the last AI message, calls `execute_tool()` from the skill system, returns `ToolMessage` results |
+| `_should_continue` | Conditional edge — returns `"tool_node"` if the AI message has `tool_calls`, else `END` |
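
The routing in the table can be simulated without LangGraph to show the control flow. This sketch uses plain dicts in place of LangChain message objects and a scripted fake model in place of the LLM; `run_graph`, `fake_agent_node`, `fake_tool_node` and the `max_steps` cap are hypothetical (the cap plays the role of LangGraph's `recursion_limit`). It models the loop only, not how the compiled graph is built.

```python
from typing import Any, Callable

Message = dict[str, Any]
State = dict[str, list[Message]]
Node = Callable[[State], list[Message]]

END = "END"  # stand-in for langgraph.graph.END


def _should_continue(state: State) -> str:
    # Conditional edge: keep going while the last AI message requests tools.
    return "tool_node" if state["messages"][-1].get("tool_calls") else END


def run_graph(state: State, agent_node: Node, tool_node: Node, max_steps: int = 10) -> State:
    # Entry point is agent_node; tool_node always routes back to agent_node.
    for _ in range(max_steps):
        state["messages"] += agent_node(state)
        if _should_continue(state) == END:
            return state
        state["messages"] += tool_node(state)
    raise RuntimeError("agent/tool loop did not finish")


def fake_agent_node(state: State) -> list[Message]:
    # Scripted "LLM": request a tool first, answer once a tool result is present.
    if any(m["type"] == "tool" for m in state["messages"]):
        return [{"type": "ai", "content": "Here are the top trending movies!", "tool_calls": []}]
    call = {"name": "seerr_trending", "args": {"kind": "movie"}, "id": "call_1"}
    return [{"type": "ai", "content": "", "tool_calls": [call]}]


def fake_tool_node(state: State) -> list[Message]:
    # One ToolMessage-like dict per requested call, linked by tool_call_id.
    calls = state["messages"][-1]["tool_calls"]
    return [{"type": "tool", "tool_call_id": c["id"], "content": f"results for {c['args']}"}
            for c in calls]
```

Running `run_graph` on a single human message produces the same `human → ai(tool_calls) → tool → ai` sequence as the walkthrough in "Full Request Flow".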
+
+### 4. State
+
+Defined in `core/state.py`:
+
+```python
+from typing import Annotated, TypedDict
+from langgraph.graph.message import add_messages
+
+class AgentState(TypedDict):
+    messages: Annotated[list, add_messages]
+```
+
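+LangGraph's `add_messages` reducer appends new messages and replaces any
+existing message that has the same ID. Because `agent_node` and `tool_node`
+always return newly created messages, each pass through the loop in practice
+appends to the history.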
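
The merge rule can be made concrete with a simplified, dependency-free model of how `add_messages` treats message IDs, using plain dicts instead of LangChain messages. It deliberately ignores features of the real reducer such as `RemoveMessage` and coercion of non-message inputs; `add_messages_sketch` and `_with_id` are hypothetical names.

```python
import uuid
from typing import Any

Message = dict[str, Any]


def _with_id(msg: Message) -> Message:
    # Copy the message and give it a fresh id if it has none.
    return {**msg, "id": msg.get("id") or str(uuid.uuid4())}


def add_messages_sketch(left: list[Message], right: list[Message]) -> list[Message]:
    # Simplified model of the add_messages merge rule (not LangGraph's code):
    # a message whose id already exists replaces it in place; others are appended.
    merged = [_with_id(m) for m in left]
    position = {m["id"]: i for i, m in enumerate(merged)}
    for msg in map(_with_id, right):
        if msg["id"] in position:
            merged[position[msg["id"]]] = msg
        else:
            position[msg["id"]] = len(merged)
            merged.append(msg)
    return merged
```

A node's returned message without an ID always lands at the end, which is why the agent/tool loop builds up a linear history.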
+
+### 5. Message Conversion
+
+Because we use the raw `openai` client (not `langchain-openai`), messages must
+be converted between LangChain and OpenAI formats at every LLM call:
+
+- **LangChain → OpenAI** (`_lc_role_to_openai`, `_langchain_tc_to_openai`):
+  Maps each message's `type` to an OpenAI `role` and converts LangChain tool
+  calls (top-level `name`/`args`/`id`) into the nested `function` sub-object
+  the OpenAI API expects, serializing the `args` dict into the JSON-string
+  `arguments` field.
+
+- **OpenAI → LangChain** (inside `agent_node`):
+  Converts the `ChatCompletionMessage` response into an `AIMessage` with
+  LangChain-format `tool_calls` (top-level `name`/`args`/`id`), parsing each
+  `function.arguments` JSON string back into an `args` dict.
+
+---
+
+## Full Request Flow
+
+### Step-by-step: "What are trending movies?"
+
+```
+1. OpenWebUI sends:
+   POST /v1/chat/completions
+   {
+     "model": "media-agent",
+     "messages": [
+       {"role": "user", "content": "What are trending movies?"}
+     ],
+     "stream": false
+   }
+
+2. chat_completions():
+   → _resolve_agent(model="media-agent")
+     → get_agent("media-agent") → Agent(skills=["media_info", "seerr", "triage"])
+   → get_agent_graph("media-agent", request)
+     → looks up app.state.agent_graphs["media-agent"]
+     → first call → create_agent_graph() compiles the graph with 7 Seerr tools
+   → run_agent_with_tools(request, messages, agent_id)
+     → _invoke_graph(graph, messages)
+
+3. Graph — Pass 1 (agent_node):
+   → LLM receives: [system prompt] + [user: "What are trending movies?"]
+   → LLM responds with tool_calls: seerr_trending(kind="movie")
+   → agent_node returns AIMessage with tool_calls in LangChain format
+
+4. Graph — _should_continue:
+   → AIMessage has tool_calls → route to "tool_node"
+
+5. Graph — tool_node:
+   → Reads tool_call: name="seerr_trending", args={"kind": "movie"}
+   → execute_tool(["media_info", "seerr", "triage"], "seerr_trending", ...)
+   → Seerr API → GET /api/v1/discover/trending?mediaType=movie
+   → Returns ToolMessage with formatted results including [tmdb:IDs]
+
+6. Graph — Pass 2 (agent_node):
+   → LLM receives previous exchange + tool result
+   → LLM responds with text only (no tool_calls)
+   → agent_node returns AIMessage(content="Here are the top trending movies!...")
+
+7. Graph — _should_continue:
+   → No tool_calls → route to END
+
+8. chat_completions() returns:
+   { "choices": [{"message": {"role": "assistant", "content": "Here are the top..."}}] }
+```
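
Step 8 amounts to wrapping the last message's text in the OpenAI `chat.completion` response shape. The sketch below is hypothetical: `format_completion` is an illustrative name, the `chatcmpl-` ID scheme is an assumption, and it omits the `usage` token counts that OpenAI's own responses include.

```python
import time
import uuid
from typing import Any


def format_completion(model: str, content: str) -> dict[str, Any]:
    # Wrap the graph's final AIMessage text in an OpenAI chat.completion body.
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": content},
            "finish_reason": "stop",
        }],
    }


# After result = await graph.ainvoke(...), the answer is the last message:
# body = format_completion(req.model, result["messages"][-1].content)
```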
+
+### Step-by-step: "Request the 2026 one" (multi-turn context)
+
+```
+1. OpenWebUI sends the FULL history:
+   {
+     "model": "media-agent",
+     "messages": [
+       {"role": "user", "content": "What are trending movies?"},
+       {"role": "assistant", "content": "Here are the top 10 trending movies!\n1. **Mortal Kombat II** (2026) [tmdb:931285] — ..."},
+       {"role": "user", "content": "could request the mortal kombat one?"},
+       {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
+       {"role": "user", "content": "the 2026 one"}
+     ]
+   }
+
+2. chat_completions():
+   → req.messages contains the ENTIRE conversation history
+   → graph.ainvoke({"messages": all_messages})
+   → agent_node prepends system prompt and sends everything to the LLM
+
+3. LLM reasons from full context:
+   - Previously listed Mortal Kombat II (2026) with [tmdb:931285]
+   - The user said "request the mortal kombat one" → I searched and showed 4 options
+   - Now they say "the 2026 one" → that matches Mortal Kombat II (2026) [tmdb:931285]
+   - I should call seerr_request_media(kind="movie", title="Mortal Kombat II", tmdb_id=931285)
+
+4. tool_node executes the request → ✅ Success
+```
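
The graph receives the full history on every request, so the `[tmdb:931285]` tag from the earlier answer reaches the LLM only because the client resends it. A hypothetical sketch of the two conversions involved, using plain dicts in place of LangChain messages (`to_graph_input` and `llm_view` are illustrative names, not functions from `api/v1/chat.py` or `core/graph.py`):

```python
from typing import Any

_TYPE_BY_ROLE = {"system": "system", "user": "human", "assistant": "ai", "tool": "tool"}
_ROLE_BY_TYPE = {v: k for k, v in _TYPE_BY_ROLE.items()}


def to_graph_input(req_messages: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    # Map each OpenAI-style request message to a LangChain-style dict, in order.
    return {"messages": [{"type": _TYPE_BY_ROLE[m["role"]], "content": m["content"]}
                         for m in req_messages]}


def llm_view(system_prompt: str, state: dict[str, list[dict[str, Any]]]) -> list[dict[str, str]]:
    # What agent_node sends: the system prompt first, then the replayed history.
    return [{"role": "system", "content": system_prompt}] + [
        {"role": _ROLE_BY_TYPE[m["type"]], "content": m["content"]} for m in state["messages"]
    ]
```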
+
+---
+
+## Streaming
+
+Streaming reuses the same graph, but output is only emitted once the graph has finished:
+
+```
+chat_completions(stream=True)
+  → _stream_graph(graph, messages)
+    → graph.ainvoke(state)        # runs graph to completion (tools execute silently)
+    → yields content character-by-character via SSE
+```
+
+For true token-level streaming (tokens appear as the LLM generates them),
+one option is to switch `agent_node` to `langchain-openai`'s `ChatOpenAI`, so
+that LangGraph's `stream_mode="messages"` can forward tokens as they arrive.
+The current approach is a pragmatic middle ground: it avoids another
+dependency and still speaks the SSE protocol the client expects, but the first
+chunk is only sent after every tool call and the final LLM pass have completed.
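
The replay step can be sketched as an async generator that emits OpenAI-style `chat.completion.chunk` events for an answer that is already complete. `sse_replay` and its `chunk_size` parameter are hypothetical (a `chunk_size` of 1 gives the character-by-character behaviour described above); the event layout (a role-only first delta, content deltas, an empty delta with `finish_reason: "stop"`, then `data: [DONE]`) follows the OpenAI streaming format.

```python
import asyncio
import json
import time
import uuid
from collections.abc import AsyncIterator
from typing import Any, Optional


async def sse_replay(text: str, model: str, chunk_size: int = 1) -> AsyncIterator[str]:
    # Replays an already-finished answer as OpenAI-style chat.completion.chunk events.
    completion_id = f"chatcmpl-{uuid.uuid4().hex}"
    created = int(time.time())

    def event(delta: dict[str, Any], finish_reason: Optional[str]) -> str:
        chunk = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "created": created,
            "model": model,
            "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
        }
        return f"data: {json.dumps(chunk)}\n\n"

    yield event({"role": "assistant"}, None)  # first chunk announces the role
    for i in range(0, len(text), chunk_size):
        yield event({"content": text[i:i + chunk_size]}, None)
        await asyncio.sleep(0)  # yield control to the event loop between chunks
    yield event({}, "stop")
    yield "data: [DONE]\n\n"
```

In FastAPI the generator could be returned as `StreamingResponse(sse_replay(answer, req.model), media_type="text/event-stream")`. Raising `chunk_size` cuts the number of events without changing what the client reassembles.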
+
+---
+
+## File Map
+
+| File | Responsibility |
+|---|---|
+| `main.py` | FastAPI app, singleton creation, router mounting |
+| `api/v1/chat.py` | Endpoints — resolves agent, invokes graph, formats responses |
+| `api/dependencies.py` | `get_llm_client()`, `get_agent_graph()` — FastAPI `Depends` |
+| `core/graph.py` | `create_agent_graph()` — builds the StateGraph |
+| `core/state.py` | `AgentState` TypedDict |
+| `core/llm.py` | `create_client()` — OpenAI client factory |
+| `core/config.py` | Environment variable loader |
+| `agents/` | Agent definitions (dataclass + self-registration) |
+| `skills/` | Skill definitions (prompt fragments + tools + executors) |