# API Architecture — Agent + Skill + Graph Pipeline

This document explains how the API routes user messages through the
agent / skill / LangGraph pipeline to produce responses.

---

## Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                      OpenWebUI / Client                         │
│  POST /v1/chat/completions  { model, messages, stream }         │
└──────────────────────────────┬──────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────┐
│  api/v1/chat.py  —  chat_completions()                          │
│                                                                  │
│  1. _resolve_agent(req.model)  →  Agent                          │
│  2. get_agent_graph(agent_id)  →  compiled StateGraph            │
│  3. graph.ainvoke(state)  or  _stream_graph(graph, messages)     │
└──────────────────────────────┬───────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────┐
│  LangGraph StateGraph  (core/graph.py)                           │
│                                                                  │
│   ┌──────────────┐   tool_calls?    ┌──────────────┐            │
│   │  agent_node  │ ───────────────▶ │  tool_node   │            │
│   │  (LLM call)  │ ◀─────────────── │ (skill exec) │            │
│   └──────┬───────┘                  └──────────────┘            │
│          │ no tool_calls                                         │
│          ▼                                                       │
│        [END]                                                     │
└──────────────────────────────────────────────────────────────────┘
```

---

## Key Concepts

### 1. Agent

An **Agent** bundles a persona with a set of skills. Agents are defined in `agents/`.

```python
# agents/media_agent.py
Agent(
    agent_id="media-agent",
    description="Media assistant with Seerr integration",
    skills=["media_info", "seerr", "triage"],
    base_prompt="You are a media assistant...",
)
```

- `agent_id` — unique name, exposed as a model in OpenWebUI
- `skills` — list of skill names to load
- `base_prompt` — starting system prompt, combined with skill fragments
- `build_system_prompt()` — merges base_prompt + all skill prompt fragments

Agents self-register at import time by calling `register()` from
`agents/__init__.py`. At startup, `main.py` calls `load_all_agents()`, which
imports every agent and skill module and so triggers those registrations.
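
The registration and prompt-merging behaviour described above can be sketched as follows. This is a minimal illustration, not the code in `agents/`: the `_AGENTS` dict, the `SKILL_FRAGMENTS` lookup, the duplicate-ID check, and the body of `build_system_prompt()` are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the skill registry: skill name -> prompt fragment.
SKILL_FRAGMENTS: dict[str, str] = {
    "seerr": "## Seerr Media Tools\n...",
}

# Hypothetical module-level registry, keyed by agent_id.
_AGENTS: dict[str, "Agent"] = {}


@dataclass
class Agent:
    agent_id: str
    description: str
    skills: list[str] = field(default_factory=list)
    base_prompt: str = ""

    def build_system_prompt(self) -> str:
        """Join base_prompt with the fragment of every skill that has one."""
        fragments = [SKILL_FRAGMENTS[s] for s in self.skills if s in SKILL_FRAGMENTS]
        return "\n\n".join([self.base_prompt, *fragments])


def register(agent: Agent) -> Agent:
    """Add an agent to the registry; duplicate IDs are rejected (an assumption)."""
    if agent.agent_id in _AGENTS:
        raise ValueError(f"duplicate agent_id: {agent.agent_id}")
    _AGENTS[agent.agent_id] = agent
    return agent


def get_agent(agent_id: str) -> Agent:
    return _AGENTS[agent_id]


# A module-level call like this is what makes importing the module register the agent.
register(Agent(
    agent_id="media-agent",
    description="Media assistant with Seerr integration",
    skills=["seerr"],
    base_prompt="You are a media assistant...",
))
```

With this shape, `get_agent("media-agent").build_system_prompt()` returns the base prompt followed by the Seerr fragment, separated by a blank line.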

### 2. Skill

A **Skill** bundles one capability: a prompt fragment, its tool definitions, and the handler that executes them. Skills are defined in `skills/`.

```python
# skills/seerr.py
Skill(
    name="seerr",
    description="Seerr integration — trending, discover, request media, submit issues",
    prompt_fragment="## Seerr Media Tools\n...",
    tools=[...],          # OpenAI function-calling schema
    execute=_execute,     # async handler: tool_name + args → ToolResult
)
```

- `prompt_fragment` — injected into the agent's system prompt.
- `tools` — list of OpenAI function definitions (name, description, parameters).
- `execute` — async callable that routes tool calls to API handlers.
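
How a tool call might be routed to the right skill can be sketched as below. Only the names `Skill`, `ToolResult`, `_execute` and `execute_tool` come from this document; the `ToolResult` fields, the `_SKILLS` registry, the wrapped `{"type": "function", ...}` tool shape, and the lookup logic are assumptions for illustration.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Optional


@dataclass
class ToolResult:
    # Hypothetical shape; the real fields are not shown in this document.
    content: str
    ok: bool = True


@dataclass
class Skill:
    name: str
    description: str
    prompt_fragment: str = ""
    tools: list[dict[str, Any]] = field(default_factory=list)
    execute: Optional[Callable[[str, dict[str, Any]], Awaitable[ToolResult]]] = None


_SKILLS: dict[str, Skill] = {}  # hypothetical registry, keyed by skill name


async def _execute(tool_name: str, args: dict[str, Any]) -> ToolResult:
    """Toy handler: a real one would call the Seerr API here."""
    if tool_name == "seerr_trending":
        return ToolResult(content=f"trending {args.get('kind', 'all')}")
    return ToolResult(content=f"unknown tool: {tool_name}", ok=False)


_SKILLS["seerr"] = Skill(
    name="seerr",
    description="Seerr integration",
    tools=[{"type": "function", "function": {
        "name": "seerr_trending",
        "description": "List trending media",
        "parameters": {"type": "object", "properties": {"kind": {"type": "string"}}},
    }}],
    execute=_execute,
)


async def execute_tool(skill_names: list[str], tool_name: str, args: dict[str, Any]) -> ToolResult:
    """Run tool_name through the first listed skill that declares it."""
    for skill_name in skill_names:
        skill = _SKILLS.get(skill_name)
        if skill is None or skill.execute is None:
            continue  # unknown skill, or a prompt-only skill without tools
        if any(t["function"]["name"] == tool_name for t in skill.tools):
            return await skill.execute(tool_name, args)
    return ToolResult(content=f"no skill provides {tool_name}", ok=False)
```

Returning a failed `ToolResult` instead of raising is a design choice in this sketch: it lets the LLM see the error text and recover on the next pass.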

### 3. Graph

Each agent gets a **compiled LangGraph StateGraph** built by
`core/graph.py:create_agent_graph()`. The graph is compiled lazily on the
first request for that agent and cached in `app.state.agent_graphs` for the
lifetime of the process.
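
The lazy compile-and-cache step can be illustrated with a plain dict. In this sketch the signature of `get_agent_graph()` is hypothetical (the real dependency lives in `api/dependencies.py`), `SimpleNamespace` stands in for the FastAPI app object, and `fake_create_agent_graph` replaces the real builder.

```python
from types import SimpleNamespace
from typing import Any, Callable


def get_agent_graph(app: Any, agent_id: str, build: Callable[[str], Any]) -> Any:
    """Return the cached graph for agent_id, building it on first use."""
    cache = app.state.agent_graphs
    if agent_id not in cache:
        cache[agent_id] = build(agent_id)  # runs once per agent per process
    return cache[agent_id]


# Stand-in for the FastAPI app; only `state.agent_graphs` is used here.
app = SimpleNamespace(state=SimpleNamespace(agent_graphs={}))
build_calls: list[str] = []


def fake_create_agent_graph(agent_id: str) -> object:
    """Records each build so the caching effect is observable."""
    build_calls.append(agent_id)
    return object()


first = get_agent_graph(app, "media-agent", fake_create_agent_graph)
second = get_agent_graph(app, "media-agent", fake_create_agent_graph)
```

Both calls return the same object and the builder runs only once, which is the behaviour the paragraph above describes.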

| Graph node / edge | What it does |
|---|---|
| `agent_node` | Converts state messages to OpenAI dicts, calls the LLM with the agent's system prompt + tool definitions, returns an `AIMessage` |
| `tool_node` | Reads `tool_calls` from the last AI message, calls `execute_tool()` from the skill system, returns `ToolMessage` results |
| `_should_continue` | Conditional edge — returns `"tool_node"` if the AI message has `tool_calls`, else `END` |
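
The routing in the table can be emulated without LangGraph as a plain loop. This is an illustrative sketch, not the code in `core/graph.py`: `run_loop`, the `max_passes` guard, the `END` string, and the scripted `fake_*` nodes are assumptions, and messages are plain dicts rather than LangChain message objects.

```python
from typing import Any, Callable

END = "__end__"  # stand-in for LangGraph's END sentinel

Message = dict[str, Any]
State = dict[str, list[Message]]


def _should_continue(state: State) -> str:
    """Route to tool_node when the last AI message requests tools."""
    last = state["messages"][-1]
    return "tool_node" if last.get("tool_calls") else END


def run_loop(
    state: State,
    agent_node: Callable[[State], Message],
    tool_node: Callable[[State], list[Message]],
    max_passes: int = 10,
) -> State:
    """Alternate agent_node and tool_node until no tool calls remain."""
    for _ in range(max_passes):
        state["messages"].append(agent_node(state))
        if _should_continue(state) == END:
            return state
        state["messages"].extend(tool_node(state))
    raise RuntimeError("tool loop did not terminate")


def fake_agent_node(state: State) -> Message:
    """Scripted LLM: request a tool first, answer in text once a result exists."""
    if any(m["type"] == "tool" for m in state["messages"]):
        return {"type": "ai", "content": "Here are the trending movies.", "tool_calls": []}
    return {"type": "ai", "content": "", "tool_calls": [
        {"id": "call_1", "name": "seerr_trending", "args": {"kind": "movie"}}]}


def fake_tool_node(state: State) -> list[Message]:
    """Answer every tool call on the last AI message with a canned result."""
    calls = state["messages"][-1]["tool_calls"]
    return [{"type": "tool", "tool_call_id": c["id"], "content": "1. Example Movie"}
            for c in calls]


final = run_loop(
    {"messages": [{"type": "human", "content": "What are trending movies?"}]},
    fake_agent_node,
    fake_tool_node,
)
```

The resulting message types are human, ai, tool, ai: one tool round trip followed by a text-only answer.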

### 4. State

Defined in `core/state.py`:

```python
class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
```

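LangGraph's `add_messages` reducer appends new messages to the list and
replaces any existing message whose ID matches an incoming one. Each node
therefore returns only the messages it produced (an `AIMessage` from
`agent_node`, `ToolMessage` results from `tool_node`) rather than the whole
history.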
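
The reducer semantics can be emulated in a few lines. This is a simplified stand-in written for illustration, not LangGraph's implementation: `merge_messages` is a hypothetical name, messages are plain dicts, and details such as assigning IDs to messages that lack one are left out.

```python
from typing import Any

Message = dict[str, Any]


def merge_messages(existing: list[Message], incoming: list[Message]) -> list[Message]:
    """Append incoming messages; an incoming message replaces one with the same id."""
    merged = list(existing)
    index_by_id = {m["id"]: i for i, m in enumerate(merged) if m.get("id") is not None}
    for msg in incoming:
        msg_id = msg.get("id")
        if msg_id is not None and msg_id in index_by_id:
            merged[index_by_id[msg_id]] = msg  # same id: replace in place
        else:
            if msg_id is not None:
                index_by_id[msg_id] = len(merged)
            merged.append(msg)  # new id (or no id): append
    return merged


history = [{"id": "m1", "type": "human", "content": "What are trending movies?"}]
history = merge_messages(history, [{"id": "m2", "type": "ai", "content": "draft"}])
history = merge_messages(history, [
    {"id": "m2", "type": "ai", "content": "final"},     # replaces the draft
    {"id": "m3", "type": "tool", "content": "results"},  # appended
])
```

After the last call the list holds three messages in their original order, with `m2` carrying the updated content.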

### 5. Message Conversion

Because we use the raw `openai` client (not `langchain-openai`), messages must
be converted between LangChain and OpenAI formats at every LLM call:

- **LangChain → OpenAI** (`_lc_role_to_openai`, `_langchain_tc_to_openai`):
  Maps `type` → `role` and converts top-level `name`/`args` tool-calls into
  the nested `function` sub-object that the OpenAI API expects.

- **OpenAI → LangChain** (inside `agent_node`):
  Converts the `ChatCompletionMessage` response into an `AIMessage` with
  LangChain-format `tool_calls` (top-level `name`/`args`/`id`).

---

## Full Request Flow

### Step-by-step: "What are trending movies?"

```
1. OpenWebUI sends:
   POST /v1/chat/completions
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"}
     ],
     "stream": false
   }

2. chat_completions():
   → _resolve_agent(model="media-agent")
     → get_agent("media-agent") → Agent(skills=["media_info", "seerr", "triage"])
   → get_agent_graph("media-agent", request)
     → looks up app.state.agent_graphs["media-agent"]
     → first call → create_agent_graph() compiles the graph with 7 Seerr tools
   → run_agent_with_tools(request, messages, agent_id)
     → _invoke_graph(graph, messages)

3. Graph — Pass 1 (agent_node):
   → LLM receives: [system prompt] + [user: "What are trending movies?"]
   → LLM responds with tool_calls: seerr_trending(kind="movie")
   → agent_node returns AIMessage with tool_calls in LangChain format

4. Graph — _should_continue:
   → AIMessage has tool_calls → route to "tool_node"

5. Graph — tool_node:
   → Reads tool_call: name="seerr_trending", args={"kind": "movie"}
   → execute_tool(["media_info", "seerr", "triage"], "seerr_trending", ...)
   → Seerr API → GET /api/v1/discover/trending?mediaType=movie
   → Returns ToolMessage with formatted results including [tmdb:IDs]

6. Graph — Pass 2 (agent_node):
   → LLM receives previous exchange + tool result
   → LLM responds with text only (no tool_calls)
   → agent_node returns AIMessage(content="Here are the top trending movies!...")

7. Graph — _should_continue:
   → No tool_calls → route to END

8. chat_completions() returns:
   { "choices": [{"message": {"role": "assistant", "content": "Here are the top..."}}] }
```

### Step-by-step: "Request the 2026 one" (multi-turn context)

```
1. OpenWebUI sends the FULL history:
   {
     "model": "media-agent",
     "messages": [
       {"role": "user", "content": "What are trending movies?"},
       {"role": "assistant", "content": "Here are the top 10 trending movies!
        1. **Mortal Kombat II** (2026) [tmdb:931285] — ..."},
       {"role": "user", "content": "could you request the mortal kombat one?"},
       {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
       {"role": "user", "content": "the 2026 one"}
     ]
   }

2. chat_completions():
   → req.messages contains the ENTIRE conversation history
   → graph.ainvoke({"messages": all_messages})
   → agent_node prepends system prompt and sends everything to the LLM

3. LLM reasons from full context:
   - Previously listed Mortal Kombat II (2026) with [tmdb:931285]
   - The user said "request the mortal kombat one" → I searched and showed 4 options
   - Now they say "the 2026 one" → that matches Mortal Kombat II (2026) [tmdb:931285]
   - I should call seerr_request_media(kind="movie", title="Mortal Kombat II", tmdb_id=931285)

4. tool_node executes the request → ✅ Success
```
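
Because the full history arrives with every request, the client is what carries conversation state between turns. A minimal client-side sketch follows, assuming a hypothetical `Conversation` helper that is not part of this codebase:

```python
from typing import Any


class Conversation:
    """Client-side transcript; the whole list is resent on every turn."""

    def __init__(self, model: str) -> None:
        self.model = model
        self.messages: list[dict[str, str]] = []

    def build_request(self, user_text: str, stream: bool = False) -> dict[str, Any]:
        """Append the user turn and return the body for POST /v1/chat/completions."""
        self.messages.append({"role": "user", "content": user_text})
        return {"model": self.model, "messages": list(self.messages), "stream": stream}

    def record_response(self, response: dict[str, Any]) -> str:
        """Store the assistant reply so the next request includes it."""
        reply = response["choices"][0]["message"]
        self.messages.append({"role": reply["role"], "content": reply["content"]})
        return reply["content"]


chat = Conversation("media-agent")
chat.build_request("What are trending movies?")
chat.record_response({"choices": [{"message": {
    "role": "assistant",
    "content": "1. Mortal Kombat II (2026) [tmdb:931285]",
}}]})
second_body = chat.build_request("the 2026 one")
```

In the history shown above only user and assistant text is resent, so the `[tmdb:931285]` tag in the assistant's reply appears to be what lets the LLM recover the ID on a later turn.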

---

## Streaming

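The streaming path differs from the non-streaming path only in delivery. The graph still runs to completion; the finished text is then emitted piece by piece: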

```
chat_completions(stream=True)
  → _stream_graph(graph, messages)
    → graph.ainvoke(state)        # runs graph to completion (tools execute silently)
    → yields content character-by-character via SSE
```

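For true token-level streaming (tokens appear as the LLM generates them),
`agent_node` would have to stream the LLM call itself and forward tokens while
the graph is still running. The most direct route is `langchain-openai`'s
`ChatOpenAI` in place of the raw `openai` client. The current approach is a
pragmatic middle ground: it avoids another dependency while still giving the
SSE client incremental output, although the first character cannot arrive
until the whole graph run, tool calls included, has finished.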
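
The "run to completion, then emit in pieces" approach can be sketched as an async generator. The name `stream_text_as_sse` and the `chunk_size` parameter are assumptions, and the chunk payload is trimmed to the fields a client typically reads (a real chunk also carries `id` and `created`).

```python
import asyncio
import json
from typing import AsyncIterator


async def stream_text_as_sse(text: str, model: str, chunk_size: int = 1) -> AsyncIterator[str]:
    """Yield an already-complete reply as OpenAI-style SSE chunk events."""
    for start in range(0, len(text), chunk_size):
        chunk = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{
                "index": 0,
                "delta": {"content": text[start:start + chunk_size]},
                "finish_reason": None,
            }],
        }
        yield f"data: {json.dumps(chunk)}\n\n"
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
    # Final chunk: empty delta plus a finish_reason, then the [DONE] sentinel.
    done = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    }
    yield f"data: {json.dumps(done)}\n\n"
    yield "data: [DONE]\n\n"


async def _collect(text: str) -> list[str]:
    return [event async for event in stream_text_as_sse(text, "media-agent")]


events = asyncio.run(_collect("Hi!"))
```

In a FastAPI handler a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`; raising `chunk_size` trades smoothness for fewer events.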

---

## File Map

| File | Responsibility |
|---|---|
| `main.py` | FastAPI app, singleton creation, router mounting |
| `api/v1/chat.py` | Endpoints — resolves agent, invokes graph, formats responses |
| `api/dependencies.py` | `get_llm_client()`, `get_agent_graph()` — FastAPI `Depends` |
| `core/graph.py` | `create_agent_graph()` — builds the StateGraph |
| `core/state.py` | `AgentState` TypedDict |
| `core/llm.py` | `create_client()` — OpenAI client factory |
| `core/config.py` | Environment variable loader |
| `agents/` | Agent definitions (dataclass + self-registration) |
| `skills/` | Skill definitions (prompt fragments + tools + executors) |