Added Quick Connect auth from Jellyfin; still needs more cleanup before pushing to prod #2
+58
-218
@@ -1,235 +1,75 @@
|
||||
# API Architecture — Agent + Skill + Graph Pipeline
|
||||
# Gateway Architecture — Agent + Skill + Graph Pipeline
|
||||
|
||||
This document explains how the API routes user messages through the
|
||||
agent / skill / LangGraph pipeline to produce responses.
|
||||
This is the **interface layer** of the Agents project. Everything that connects
|
||||
the outside world to the agent system lives here — REST APIs, Discord bot,
|
||||
and authentication.
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
## Directory Map
|
||||
|
||||
| Path | Description | Docs |
|---|---|---|
| `gateway/v1/` | REST API endpoints — chat, agent listing, OpenAI-compatible completions | [v1.md](v1/v1.md) |
| `gateway/discord/` | Discord bot connector — in-process DM handler with LangGraph integration | [discord.md](discord/discord.md) |
| `gateway/auth/` | Auth service registry + Jellyfin Quick Connect implementation | [auth.md](auth/auth.md) |
|
||||
|
||||
---
|
||||
|
||||
## Supporting Modules
|
||||
|
||||
| Path | Purpose |
|---|---|
| `gateway/dependencies.py` | FastAPI `Depends` providers — `get_llm_client()`, `get_agent_graph()` |
| `src/config.py` | `.env` loader and config accessor |
| `src/llm.py` | OpenAI-compatible client factory (DeepSeek) |
| `src/state.py` | LangGraph `AgentState` TypedDict |
| `src/graph.py` | LangGraph StateGraph factory — agent_node, tool_node, routing |
| `src/tools_adapter.py` | Wraps skill tools as LangChain `@tool` functions |
| `src/auth_store.py` | SQLite persistence for Discord → service auth linking |
| `agents/` | Agent definitions (dataclass + registry) |
| `agents/skills/` | Skill definitions — prompt fragments, tool schemas, executors |
|
||||
|
||||
---
|
||||
|
||||
## High-Level Request Flow
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ OpenWebUI / Client │
|
||||
│ POST /v1/chat/completions { model, messages, stream } │
|
||||
└──────────────────────────────┬──────────────────────────────────┘
|
||||
┌──────────────────────────────┐
|
||||
│ Client (OpenWebUI / HTTP) │
|
||||
└──────────────┬───────────────┘
|
||||
│ POST /v1/chat/completions
|
||||
▼
|
||||
┌──────────────────────────────┐
|
||||
│ gateway/v1/chat.py │ ← resolves agent, invokes graph
|
||||
└──────────────┬───────────────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────────────────────────────────────────┐
|
||||
│ api/v1/chat.py — chat_completions() │
|
||||
│ │
|
||||
│ 1. _resolve_agent(req.model) → Agent │
|
||||
│ 2. get_agent_graph(agent_id) → compiled StateGraph │
|
||||
│ 3. graph.ainvoke(state) or _stream_graph(graph, messages) │
|
||||
└──────────────────────────────┬───────────────────────────────────┘
|
||||
┌──────────────────────────────┐
|
||||
│ LangGraph StateGraph │ ← src/graph.py
|
||||
│ ┌──────────┐ ┌──────────┐│
|
||||
│ │agent_node│──▶│tool_node ││
|
||||
│ │(LLM call)│◀──│(skills) ││
|
||||
│ └──────────┘ └──────────┘│
|
||||
└──────────────┬───────────────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────────────────────────────────────────┐
|
||||
│ LangGraph StateGraph (core/graph.py) │
|
||||
│ │
|
||||
│ ┌──────────────┐ tool_calls? ┌──────────────┐ │
|
||||
│ │ agent_node │ ───────────────▶ │ tool_node │ │
|
||||
│ │ (LLM call) │ ◀─────────────── │ (skill exec) │ │
|
||||
│ └──────┬───────┘ └──────────────┘ │
|
||||
│ │ no tool_calls │
|
||||
│ ▼ │
|
||||
│ [END] │
|
||||
└──────────────────────────────────────────────────────────────────┘
|
||||
|
||||
## Key Concepts
|
||||
|
||||
### 1. Agent
|
||||
|
||||
An **Agent** is a persona + skill bundle. Defined in `agents/`.
|
||||
|
||||
```python
|
||||
# agents/media_agent.py
|
||||
Agent(
|
||||
agent_id="media-agent",
|
||||
description="Media assistant with Seerr integration",
|
||||
skills=["media_info", "seerr", "triage"],
|
||||
base_prompt="You are a media assistant...",
|
||||
)
|
||||
┌──────────────────────────────┐
|
||||
│ agents/skills/ │ ← Seerr API, Jellyfin API, etc.
|
||||
└──────────────────────────────┘
|
||||
```
|
||||
|
||||
- `agent_id` — unique name, exposed as a model in OpenWebUI
|
||||
- `skills` — list of skill names to load
|
||||
- `base_prompt` — starting system prompt, combined with skill fragments
|
||||
- `build_system_prompt()` — merges base_prompt + all skill prompt fragments
|
||||
|
||||
Agents self-register at import time via `agents/__init__.py`'s `register()`.
|
||||
`main.py` calls `load_all_agents()` at startup to import every agent and skill
|
||||
module.
|
||||
|
||||
### 2. Skill
|
||||
|
||||
A **Skill** is a capability bundle. Defined in `skills/`.
|
||||
|
||||
```python
|
||||
# skills/seerr.py
|
||||
Skill(
|
||||
name="seerr",
|
||||
description="Seerr integration — trending, discover, request media, submit issues",
|
||||
prompt_fragment="## Seerr Media Tools\n...",
|
||||
tools=[...], # OpenAI function-calling schema
|
||||
execute=_execute, # async handler: tool_name + args → ToolResult
|
||||
)
|
||||
```
|
||||
|
||||
- `prompt_fragment` — injected into the agent's system prompt.
|
||||
- `tools` — list of OpenAI function definitions (name, description, parameters).
|
||||
- `execute` — async callable that routes tool calls to API handlers.
|
||||
|
||||
### 3. Graph
|
||||
|
||||
Each agent gets a **compiled LangGraph StateGraph** built by
|
||||
`core/graph.py:create_agent_graph()`. The graph is compiled lazily on the
|
||||
first request and cached on `app.state.agent_graphs` for the lifetime of the
|
||||
process.
|
||||
|
||||
| Graph node / edge | What it does |
|---|---|
| `agent_node` | Converts state messages to OpenAI dicts, calls the LLM with the agent's system prompt + tool definitions, returns an `AIMessage` |
| `tool_node` | Reads `tool_calls` from the last AI message, calls `execute_tool()` from the skill system, returns `ToolMessage` results |
| `_should_continue` | Conditional edge — returns `"tool_node"` if the AI message has `tool_calls`, else `END` |
|
||||
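The routing rule in the table can be sketched in a few lines. This is a dependency-free illustration, not the project's code: `END` here is a plain string standing in for LangGraph's end sentinel, and the function accepts dicts as well as message objects only so it runs without LangChain installed.

```python
from typing import Any

# Stand-in for LangGraph's END sentinel so the sketch needs no third-party imports.
END = "__end__"


def should_continue(state: dict[str, Any]) -> str:
    """Pick the next node from the last message in the state."""
    last = state["messages"][-1]
    # Message objects expose tool_calls as an attribute; dicts are accepted
    # here purely for illustration.
    if isinstance(last, dict):
        tool_calls = last.get("tool_calls")
    else:
        tool_calls = getattr(last, "tool_calls", None)
    # A non-empty tool_calls list means the LLM asked for a tool run.
    return "tool_node" if tool_calls else END
```

An empty list counts as "no tool calls", so a text-only reply routes to the end of the graph.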
|
||||
### 4. State
|
||||
|
||||
Defined in `core/state.py`:
|
||||
|
||||
```python
|
||||
class AgentState(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
```
|
||||
|
||||
LangGraph's `add_messages` reducer appends new messages and replaces messages
|
||||
with matching IDs (so tool-call results overwrite their placeholders).
|
||||
|
||||
### 5. Message Conversion
|
||||
|
||||
Because we use the raw `openai` client (not `langchain-openai`), messages must
|
||||
be converted between LangChain and OpenAI formats at every LLM call:
|
||||
|
||||
- **LangChain → OpenAI** (`_lc_role_to_openai`, `_langchain_tc_to_openai`):
|
||||
Maps `type` → `role` and converts top-level `name`/`args` tool-calls into
|
||||
the nested `function` sub-object that the OpenAI API expects.
|
||||
|
||||
- **OpenAI → LangChain** (inside `agent_node`):
|
||||
Converts the `ChatCompletionMessage` response into an `AIMessage` with
|
||||
LangChain-format `tool_calls` (top-level `name`/`args`/`id`).
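A minimal sketch of the two tool-call conversions described above, using plain dicts. The function names are illustrative and are not the helpers in the graph module; the shapes follow the top-level `name`/`args`/`id` layout on the LangChain side and the nested `function` object on the OpenAI side.

```python
import json
from typing import Any


def langchain_tc_to_openai(tool_call: dict[str, Any]) -> dict[str, Any]:
    """Nest a LangChain-style tool call under the `function` key."""
    return {
        "id": tool_call["id"],
        "type": "function",
        "function": {
            "name": tool_call["name"],
            # The OpenAI format carries arguments as a JSON-encoded string.
            "arguments": json.dumps(tool_call["args"]),
        },
    }


def openai_tc_to_langchain(tool_call: dict[str, Any]) -> dict[str, Any]:
    """Flatten an OpenAI-style tool call back to top-level name/args/id."""
    fn = tool_call["function"]
    return {
        "id": tool_call["id"],
        "name": fn["name"],
        "args": json.loads(fn["arguments"]),
    }
```

The two functions are inverses for JSON-serializable arguments, which makes a round-trip check a cheap regression test.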
|
||||
For a detailed step-by-step walkthrough of the graph execution (including
|
||||
multi-turn context and tool-calling loops), see [v1.md](v1/v1.md).
|
||||
|
||||
---
|
||||
|
||||
## Full Request Flow
|
||||
## Startup
|
||||
|
||||
### Step-by-step: "What are trending movies?"
|
||||
`main.py` is the entry point. At startup it:
|
||||
|
||||
```
|
||||
1. OpenWebUI sends:
|
||||
POST /v1/chat/completions
|
||||
{
|
||||
"model": "media-agent",
|
||||
"messages": [
|
||||
{"role": "user", "content": "What are trending movies?"}
|
||||
],
|
||||
"stream": false
|
||||
}
|
||||
|
||||
2. chat_completions():
|
||||
→ _resolve_agent(model="media-agent")
|
||||
→ get_agent("media-agent") → Agent(skills=["media_info", "seerr", "triage"])
|
||||
→ get_agent_graph("media-agent", request)
|
||||
→ looks up app.state.agent_graphs["media-agent"]
|
||||
→ first call → create_agent_graph() compiles the graph with 7 Seerr tools
|
||||
→ run_agent_with_tools(request, messages, agent_id)
|
||||
→ _invoke_graph(graph, messages)
|
||||
|
||||
3. Graph — Pass 1 (agent_node):
|
||||
→ LLM receives: [system prompt] + [user: "What are trending movies?"]
|
||||
→ LLM responds with tool_calls: seerr_trending(kind="movie")
|
||||
→ agent_node returns AIMessage with tool_calls in LangChain format
|
||||
|
||||
4. Graph — _should_continue:
|
||||
→ AIMessage has tool_calls → route to "tool_node"
|
||||
|
||||
5. Graph — tool_node:
|
||||
→ Reads tool_call: name="seerr_trending", args={"kind": "movie"}
|
||||
→ execute_tool(["media_info", "seerr", "triage"], "seerr_trending", ...)
|
||||
→ Seerr API → GET /api/v1/discover/trending?mediaType=movie
|
||||
→ Returns ToolMessage with formatted results including [tmdb:IDs]
|
||||
|
||||
6. Graph — Pass 2 (agent_node):
|
||||
→ LLM receives previous exchange + tool result
|
||||
→ LLM responds with text only (no tool_calls)
|
||||
→ agent_node returns AIMessage(content="Here are the top trending movies!...")
|
||||
|
||||
7. Graph — _should_continue:
|
||||
→ No tool_calls → route to END
|
||||
|
||||
8. chat_completions() returns:
|
||||
{ "choices": [{"message": {"role": "assistant", "content": "Here are the top..."}}] }
|
||||
```
|
||||
|
||||
### Step-by-step: "Request the 2026 one" (multi-turn context)
|
||||
|
||||
```
|
||||
1. OpenWebUI sends the FULL history:
|
||||
{
|
||||
"model": "media-agent",
|
||||
"messages": [
|
||||
{"role": "user", "content": "What are trending movies?"},
|
||||
{"role": "assistant", "content": "Here are the top 10 trending movies!
|
||||
1. **Mortal Kombat II** (2026) [tmdb:931285] — ..."},
|
||||
{"role": "user", "content": "could request the mortal kombat one?"},
|
||||
{"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
|
||||
{"role": "user", "content": "the 2026 one"}
|
||||
]
|
||||
}
|
||||
|
||||
2. chat_completions():
|
||||
→ req.messages contains the ENTIRE conversation history
|
||||
→ graph.ainvoke({"messages": all_messages})
|
||||
→ agent_node prepends system prompt and sends everything to the LLM
|
||||
|
||||
3. LLM reasons from full context:
|
||||
- Previously listed Mortal Kombat II (2026) with [tmdb:931285]
|
||||
- The user said "request the mortal kombat one" → I searched and showed 4 options
|
||||
- Now they say "the 2026 one" → that matches Mortal Kombat II (2026) [tmdb:931285]
|
||||
- I should call seerr_request_media(kind="movie", title="Mortal Kombat II", tmdb_id=931285)
|
||||
|
||||
4. tool_node executes the request → ✅ Success
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Streaming
|
||||
|
||||
Streaming works slightly differently from the sync path:
|
||||
|
||||
```
|
||||
chat_completions(stream=True)
|
||||
→ _stream_graph(graph, messages)
|
||||
→ graph.ainvoke(state) # runs graph to completion (tools execute silently)
|
||||
→ yields content character-by-character via SSE
|
||||
```
|
||||
|
||||
For true token-level streaming (tokens appear as the LLM generates them),
|
||||
the agent_node would need to use `langchain-openai`'s `ChatOpenAI` instead of
|
||||
the raw `openai` client. The current approach is a pragmatic middle ground
|
||||
that avoids adding another dependency while still giving the SSE client
|
||||
incremental output.
|
||||
|
||||
---
|
||||
|
||||
## File Map
|
||||
|
||||
| File | Responsibility |
|
||||
|---|---|
|
||||
| `main.py` | FastAPI app, singleton creation, router mounting |
|
||||
| `api/v1/chat.py` | Endpoints — resolves agent, invokes graph, formats responses |
|
||||
| `api/dependencies.py` | `get_llm_client()`, `get_agent_graph()` — FastAPI `Depends` |
|
||||
| `core/graph.py` | `create_agent_graph()` — builds the StateGraph |
|
||||
| `core/state.py` | `AgentState` TypedDict |
|
||||
| `core/llm.py` | `create_client()` — OpenAI client factory |
|
||||
| `core/config.py` | Environment variable loader |
|
||||
| `agents/` | Agent definitions (dataclass + self-registration) |
|
||||
| `skills/` | Skill definitions (prompt fragments + tools + executors) |
|
||||
1. Loads `.env` → creates the LLM client (DeepSeek) → stores on `app.state.llm_client`
|
||||
2. Calls `load_all_agents()` → imports every agent and skill module (they self-register)
|
||||
3. Imports `gateway.auth.jellyfin` → self-registers the Jellyfin auth service
|
||||
4. Mounts routers: `/v1/*` (chat endpoints) and `/api/v1/auth/*` (auth endpoints)
|
||||
5. Starts the Discord bot as a background asyncio task (lifespan)
|
||||
|
||||
@@ -0,0 +1,152 @@
|
||||
# Auth — Service Registry & Persistence
|
||||
|
||||
The authentication system lets Discord users link their accounts to external
|
||||
services (currently **Jellyfin**) so the agent can perform actions on their
|
||||
behalf (e.g. checking watch history).
|
||||
|
||||
---
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
gateway/auth/ gateway/v1/auth.py
|
||||
┌──────────────────────┐ ┌──────────────────────────────┐
|
||||
│ AuthService (ABC) │ │ GET /api/v1/auth/login │
|
||||
│ ├─ JellyfinAuth │◀─────────│ POST /api/v1/auth/login │
|
||||
│ └─ (Plex, Seerr…) │ │ GET /api/v1/auth/status │
|
||||
│ │ │ GET /api/v1/auth/reset │
|
||||
└─────────┬────────────┘ └──────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
src/auth_store.py
|
||||
┌──────────────────────┐
|
||||
│ SQLite │
|
||||
│ ├─ link_tokens │ one-time tokens sent via Discord DM
|
||||
│ └─ user_auth │ per-user, per-service credentials
|
||||
└──────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Files
|
||||
|
||||
| File | Purpose |
|---|---|
| `gateway/auth/__init__.py` | Abstract `AuthService` base class + global registry |
| `gateway/auth/jellyfin.py` | Jellyfin implementation — Quick Connect + username/password |
| `gateway/v1/auth.py` | REST endpoints for the web-based login flow |
| `src/auth_store.py` | SQLite persistence for link tokens and stored credentials |
|
||||
|
||||
---
|
||||
|
||||
## Flow: Discord User Links Jellyfin
|
||||
|
||||
```
|
||||
Discord DM Web Browser Jellyfin Server
|
||||
│ │ │
|
||||
│ 1. /login jellyfin │ │
|
||||
│ ──────────────────────────────▶│ │
|
||||
│ Bot creates link token in │ │
|
||||
│ SQLite, DMs the user a URL │ │
|
||||
│ │ │
|
||||
│ 2. User clicks link │ │
|
||||
│ ◀─────────────────────────────▶│ │
|
||||
│ │ GET /api/v1/auth/login │
|
||||
│ │ ?service=jellyfin │
|
||||
│ │ &token=xxx&discord_id=123 │
|
||||
│ │ │
|
||||
│ │ 3. Serve Quick Connect form │
|
||||
│ │ ◀──────────────────────────── │
|
||||
│ │ │
|
||||
│ │ 4. Initiate Quick Connect │
|
||||
│ │ ─────────────────────────────▶│
|
||||
│ │ POST /QuickConnect/Initiate │
|
||||
│ │ ◀── { Code: "ABC123" } │
|
||||
│ │ │
|
||||
│ 5. User enters code in │ │
|
||||
│ Jellyfin app │ │
|
||||
│ │ │
|
||||
│ │ 6. Poll: is it authorized? │
|
||||
│ │ ─────────────────────────────▶│
|
||||
│ │ GET /QuickConnect/Connect │
|
||||
│ │ ◀── Authenticated + Token │
|
||||
│ │ │
|
||||
│ 7. auth_store saves: │ │
|
||||
│ (discord_id, jellyfin, │ │
|
||||
│ AccessToken, username) │ │
|
||||
│ │ │
|
||||
│ 8. "✅ Linked to Jellyfin!" │ │
|
||||
│ ◀───────────────────────────── │ │
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## AuthService Base Class
|
||||
|
||||
```python
from abc import ABC

class AuthService(ABC):
    name: str          # "jellyfin"
    display_name: str  # "Jellyfin"

    def render_login_form(self, token, discord_id) -> str: ...
    async def authenticate(self, form_data) -> "AuthResult": ...
```
|
||||
|
||||
Add a new service (e.g. Plex, Seerr) by subclassing `AuthService`, dropping
|
||||
the module in `gateway/auth/`, and calling `register_auth_service()` at import
|
||||
time. The REST endpoints and auth store work generically — no changes needed.
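The registration pattern could look roughly like the sketch below. The registry dict, `get_auth_service()`, and the `PlexAuth` class are hypothetical names introduced for illustration, and the base class is trimmed to one method; only `AuthService` and `register_auth_service()` are named in this document.

```python
from abc import ABC, abstractmethod

# Hypothetical module-level registry keyed by service name.
_REGISTRY: dict[str, "AuthService"] = {}


class AuthService(ABC):
    name: str
    display_name: str

    @abstractmethod
    def render_login_form(self, token: str, discord_id: int) -> str: ...


def register_auth_service(service: AuthService) -> None:
    """Make a service discoverable by name."""
    _REGISTRY[service.name] = service


def get_auth_service(name: str) -> AuthService:
    """Look up a registered service; hypothetical helper."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise LookupError(f"unknown auth service: {name}") from None


class PlexAuth(AuthService):  # illustrative subclass, not part of the project
    name = "plex"
    display_name = "Plex"

    def render_login_form(self, token: str, discord_id: int) -> str:
        return f"<form data-token='{token}' data-user='{discord_id}'></form>"


# Runs at import time, so importing the module is enough to register it.
register_auth_service(PlexAuth())
```

Because lookups go through the registry, generic endpoints only need the `service` query parameter to find the right implementation.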
|
||||
|
||||
---
|
||||
|
||||
## Current Implementation: Jellyfin
|
||||
|
||||
`gateway/auth/jellyfin.py` supports two flows:
|
||||
|
||||
| Method | How it works |
|---|---|
| **Quick Connect** (primary) | Calls Jellyfin's `/QuickConnect/Initiate` → polls `/QuickConnect/Connect` → stores the `AccessToken` |
| **Username/Password** (fallback) | Renders an HTML form → user submits credentials → calls `/Users/AuthenticateByName` → stores the `AccessToken` |
|
||||
|
||||
The stored credentials include:
|
||||
- `external_user_id` — Jellyfin user ID
|
||||
- `external_name` — Jellyfin username
|
||||
- `credentials` dict — `{"AccessToken": "...", "ServerURL": "..."}`
|
||||
|
||||
---
|
||||
|
||||
## Auth Store (SQLite)
|
||||
|
||||
Two tables in `data/auth.db`:
|
||||
|
||||
```sql
-- One-time tokens for the web login flow (expire after 10 min)
CREATE TABLE link_tokens (
    token       TEXT PRIMARY KEY,
    discord_id  INTEGER NOT NULL,
    service     TEXT NOT NULL,
    created_at  TEXT NOT NULL,
    used        INTEGER DEFAULT 0
);

-- Per-user, per-service stored credentials
CREATE TABLE user_auth (
    discord_id        INTEGER NOT NULL,
    service           TEXT NOT NULL,
    external_user_id  TEXT,
    external_name     TEXT,
    credentials       TEXT,  -- JSON
    created_at        TEXT NOT NULL,
    PRIMARY KEY (discord_id, service)
);
```
|
||||
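To make the one-time, 10-minute semantics of `link_tokens` concrete, here is a sketch against an in-memory database. It is an assumption about how the store might behave, not the contents of `src/auth_store.py`: the `db` and `now` parameters and the `consume_link_token()` helper are hypothetical, and timestamps are assumed to be ISO 8601 strings in UTC.

```python
import secrets
import sqlite3
from datetime import datetime, timedelta, timezone

TOKEN_TTL = timedelta(minutes=10)  # expiry stated in the schema comment

SCHEMA = """
CREATE TABLE link_tokens (
    token TEXT PRIMARY KEY,
    discord_id INTEGER NOT NULL,
    service TEXT NOT NULL,
    created_at TEXT NOT NULL,
    used INTEGER DEFAULT 0
)
"""


def create_link_token(db, discord_id, service, now=None):
    """Insert a fresh random token and return it."""
    now = now or datetime.now(timezone.utc)
    token = secrets.token_urlsafe(32)
    db.execute(
        "INSERT INTO link_tokens (token, discord_id, service, created_at) "
        "VALUES (?, ?, ?, ?)",
        (token, discord_id, service, now.isoformat()),
    )
    return token


def consume_link_token(db, token, now=None):
    """Return (discord_id, service) once; None if unknown, used, or expired."""
    now = now or datetime.now(timezone.utc)
    row = db.execute(
        "SELECT discord_id, service, created_at, used "
        "FROM link_tokens WHERE token = ?",
        (token,),
    ).fetchone()
    if row is None or row[3]:
        return None
    if now - datetime.fromisoformat(row[2]) > TOKEN_TTL:
        return None
    # Mark the token as spent so the same link cannot be replayed.
    db.execute("UPDATE link_tokens SET used = 1 WHERE token = ?", (token,))
    return row[0], row[1]
```

Checking `used` and the age in application code keeps the schema unchanged; a periodic cleanup of expired rows would be a separate concern.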
|
||||
---
|
||||
|
||||
## Skill-Level Auth Gating
|
||||
|
||||
Skills can declare `requires_auth=["jellyfin"]`. When a tool is executed,
the skill system checks the auth store. If the user isn't linked:

1. The tool returns `ToolResult.fail("Please login first using /login jellyfin")`
2. The LLM relays this message to the user in Discord
3. The user types `/login jellyfin`, completes the Quick Connect flow, and retries the request once the account is linked
|
||||
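A rough sketch of the gating check follows. The `ToolResult` class here is a minimal stand-in with only the `fail()` constructor mentioned above, and `check_auth()` with its `linked_services` argument is a hypothetical helper; the real skill system reads the linked services from the auth store.

```python
from dataclasses import dataclass


@dataclass
class ToolResult:  # minimal stand-in for the project's ToolResult
    ok: bool
    content: str

    @classmethod
    def fail(cls, message: str) -> "ToolResult":
        return cls(ok=False, content=message)


def check_auth(requires_auth, linked_services):
    """Return a failing ToolResult for the first unlinked service, else None."""
    for service in requires_auth:
        if service not in linked_services:
            return ToolResult.fail(f"Please login first using /login {service}")
    return None
```

Returning the failure as a normal tool result, rather than raising, lets the LLM see the message and pass it on to the user.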
@@ -0,0 +1,73 @@
|
||||
# Discord — Connector
|
||||
|
||||
The Discord module embeds a Discord bot **in-process** alongside FastAPI.
|
||||
It uses the same LangGraph graphs and LLM client as the REST API — there is
|
||||
no HTTP loopback, no separate process, and no code duplication.
|
||||
|
||||
---
|
||||
|
||||
## Files
|
||||
|
||||
| File | Purpose |
|---|---|
| `bot.py` | Discord `Client` subclass (`AgentBot`) — DM handler, command parser, graph invoker, Quick Connect orchestrator |
| `conversation.py` | In-memory conversation history store, keyed by Discord user ID |
|
||||
|
||||
---
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
Discord Gateway (websocket)
|
||||
│ DM: "What's trending?"
|
||||
▼
|
||||
discord.Client.on_message()
|
||||
│ 1. Check: is this a DM? shares a guild? not a command?
|
||||
│ 2. Build message history from ConversationStore
|
||||
│ 3. Append user message
|
||||
▼
|
||||
_create_agent_graph(agent_id="media-agent")
|
||||
│ Uses the exact same create_agent_graph() from src/graph.py
|
||||
│ as the REST API — same LLM client, same tools, same cache.
|
||||
▼
|
||||
graph.ainvoke({"messages": [...]})
|
||||
│ LangGraph runs agent_node → tool_node → agent_node → END
|
||||
▼
|
||||
Response text → split into ≤2000-char Discord messages → sent to user
|
||||
```
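The last step of the diagram, splitting a long reply into messages of at most 2000 characters, might be implemented along these lines. `split_message()` is a hypothetical helper, and preferring newline boundaries is an assumption about what reads best in Discord rather than documented behavior.

```python
def split_message(text: str, limit: int = 2000) -> list[str]:
    """Split text into chunks of at most `limit` characters."""
    chunks = []
    while len(text) > limit:
        # Prefer the last newline inside the window; otherwise hard-cut.
        cut = text.rfind("\n", 0, limit)
        if cut <= 0:
            cut = limit
        chunks.append(text[:cut])
        # Drop the newline(s) the split happened on.
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks
```

Each chunk can then be sent as its own message in order.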
|
||||
|
||||
---
|
||||
|
||||
## Commands
|
||||
|
||||
Commands are DMs that start with `/`. The bot parses them before hitting the
|
||||
LLM:
|
||||
|
||||
| Command | Action |
|---|---|
| `/login <service>` | Generate a one-time auth link, DM it to the user |
| `/jellyfin login` | Alias for `/login jellyfin` |
| `/help` | Show available agents and commands |
| `/<agent_id>` | Switch to a different agent for future messages |
|
||||
|
||||
---
|
||||
|
||||
## Auth Flow (Quick Connect)
|
||||
|
||||
When a user types `/login jellyfin`:
|
||||
|
||||
1. Bot generates a one-time token by calling `auth_store.create_link_token(discord_id, "jellyfin")`
2. Bot DMs the user: `https://<BASE_URL>/api/v1/auth/login?service=jellyfin&token=...&discord_id=...`
3. User clicks the link → FastAPI serves the Jellyfin login form (or Quick Connect prompt)
4. User authenticates → credentials stored in `auth_store`
5. Future tool calls (e.g. `watch_history`) automatically use the stored Jellyfin session
|
||||
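Building the link that the bot sends could be as small as the sketch below. `build_login_url()` is a hypothetical helper; the path and query parameter names are taken from the URL shown in the steps above, and the base URL in the usage line is a placeholder.

```python
from urllib.parse import urlencode


def build_login_url(base_url: str, service: str, token: str, discord_id: int) -> str:
    """Assemble the one-time login link for a Discord user."""
    # urlencode escapes any characters that are unsafe in a query string.
    query = urlencode({"service": service, "token": token, "discord_id": discord_id})
    return f"{base_url.rstrip('/')}/api/v1/auth/login?{query}"


url = build_login_url("https://example.test", "jellyfin", "abc", 123)
```

Using `urlencode` rather than string concatenation matters once tokens contain characters such as `+` or `=`.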
|
||||
---
|
||||
|
||||
## Conversation Persistence
|
||||
|
||||
- Per-user history stored in `ConversationStore` (in-memory dict)
|
||||
- Max history length configurable via `DISCORD_MAX_HISTORY` env var (default: 7)
|
||||
- Oldest messages are silently dropped when the limit is exceeded
|
||||
- History is NOT persisted across restarts (future: could use SQLite)
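
The behavior listed above maps naturally onto a bounded deque per user. This is a sketch under that assumption, not the contents of `conversation.py`; the method names `append()` and `get()` are hypothetical.

```python
import os
from collections import defaultdict, deque

# Default of 7 matches the documented DISCORD_MAX_HISTORY default.
MAX_HISTORY = int(os.environ.get("DISCORD_MAX_HISTORY", "7"))


class ConversationStore:
    """In-memory, per-user message history with a fixed maximum length."""

    def __init__(self, max_history: int = MAX_HISTORY) -> None:
        # deque(maxlen=n) silently discards the oldest item when full.
        self._history = defaultdict(lambda: deque(maxlen=max_history))

    def append(self, user_id: int, role: str, content: str) -> None:
        self._history[user_id].append({"role": role, "content": content})

    def get(self, user_id: int) -> list[dict]:
        return list(self._history[user_id])
```

Since everything lives in a process-local dict, a restart clears all histories, which matches the last bullet.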
|
||||
@@ -0,0 +1,106 @@
|
||||
# V1 — Chat & Agent API Endpoints
|
||||
|
||||
This is the primary HTTP API surface for the chatbot agent system. It exposes
|
||||
both a custom streaming chat endpoint and an OpenAI-compatible
|
||||
`/chat/completions` endpoint so it works as a drop-in backend for OpenWebUI,
|
||||
LibreChat, or any OpenAI-compatible client.
|
||||
|
||||
---
|
||||
|
||||
## Endpoints
|
||||
|
||||
| Method | Path | Description |
|---|---|---|
| `GET` | `/v1/` | Health check — returns `{"status": "ok"}` |
| `GET` | `/v1/agents` | List all registered agents (id + description) |
| `GET` | `/v1/models` | OpenAI-compatible model list (one entry per agent) |
| `POST` | `/v1/chat` | Chat with an agent — streaming (SSE) |
| `POST` | `/v1/chat/sync` | Chat with an agent — non-streaming |
| `POST` | `/v1/chat/completions` | OpenAI-compatible chat completions (supports `stream: true`) |
|
||||
|
||||
All `/v1/*` endpoints are mounted by `main.py` via:
|
||||
|
||||
```python
|
||||
app.include_router(v1_router, prefix="/v1")
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Agent Resolution
|
||||
|
||||
Each request can target a specific agent. The resolution order is:
|
||||
|
||||
1. **Explicit `agent_id`** field in the request body
|
||||
2. **OpenAI `model` field** (OpenWebUI sends this — mapped to `agent_id` if a matching agent is registered)
|
||||
3. **Fallback** to the `"naked"` agent (a plain LLM with no tools)
|
||||
|
||||
This means an OpenWebUI client can simply set `model: "media-agent"` and get
|
||||
the full Media Agent with Seerr tools.
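
The resolution order can be expressed as a short function. This is a sketch only: `resolve_agent_id()` and the `AGENTS` dict are hypothetical stand-ins for the project's resolver and agent registry, and the sketch assumes an explicit `agent_id` is used as given.

```python
# Stand-in registry; the real one is filled by agents self-registering.
AGENTS = {
    "naked": "Plain LLM with no tools",
    "media-agent": "Media assistant with Seerr integration",
}


def resolve_agent_id(agent_id=None, model=None, registry=AGENTS):
    """Apply the three-step resolution order and return an agent id."""
    # 1. An explicit agent_id in the request body wins.
    if agent_id:
        return agent_id
    # 2. The OpenAI `model` field is used only if it names a registered agent.
    if model in registry:
        return model
    # 3. Otherwise fall back to the tool-less agent.
    return "naked"
```

With this order, a client that sends an unrelated model name still gets a working (tool-less) response instead of an error.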
|
||||
|
||||
---
|
||||
|
||||
## Request Flow
|
||||
|
||||
```
|
||||
Client (OpenWebUI / HTTP)
|
||||
│ POST /v1/chat/completions
|
||||
│ { model: "media-agent", messages: [...], stream: true/false }
|
||||
▼
|
||||
chat_completions()
|
||||
│ 1. _resolve_agent(req.model) → Agent(id="media-agent", skills=[...])
|
||||
│ 2. get_agent_graph("media-agent", request)
|
||||
│ → lazy-compiled LangGraph StateGraph, cached on app.state
|
||||
│ 3. stream=True → _stream_graph(graph, messages) → SSE token stream
|
||||
│ stream=False → _invoke_graph(graph, messages) → plain response
|
||||
▼
|
||||
LangGraph StateGraph (src/graph.py)
|
||||
│
|
||||
├── agent_node: calls LLM with system prompt + tool definitions
|
||||
│ └── LLM returns text OR tool_calls
|
||||
│
|
||||
├── _should_continue: if tool_calls → tool_node, else → END
|
||||
│
|
||||
└── tool_node: executes tool via agents/skills system → ToolMessage
|
||||
└── loops back to agent_node with the result
|
||||
```
|
||||
|
||||
For a detailed walkthrough, see [api.md](../api.md).
|
||||
|
||||
---
|
||||
|
||||
## Streaming
|
||||
|
||||
Two streaming modes exist:
|
||||
|
||||
### SSE (Server-Sent Events) — `/v1/chat`
|
||||
```
|
||||
data: {"token": "Here"}
|
||||
data: {"token": " are"}
|
||||
data: {"token": " the"}
|
||||
...
|
||||
data: [DONE]
|
||||
```
|
||||
|
||||
The graph runs to completion (tools execute silently), then the final text is
|
||||
yielded token-by-token as SSE events.
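
As an illustration of that second phase, the generator below turns an already-complete reply into SSE frames in the format shown above. It is a sketch, not the endpoint's code: `sse_events()` is a hypothetical name, and "token" is approximated as a whitespace-delimited chunk since the actual chunking rule is not specified here.

```python
import json
import re


def sse_events(text: str):
    """Yield SSE frames for a finished reply, then the [DONE] marker."""
    # Each chunk keeps its leading whitespace: "Here", " are", " the".
    for token in re.findall(r"\s*\S+", text):
        payload = json.dumps({"token": token})
        # An SSE event is a `data:` line terminated by a blank line.
        yield f"data: {payload}\n\n"
    yield "data: [DONE]\n\n"
```

A client concatenating the `token` values in order recovers the reply text, minus any trailing whitespace.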
|
||||
|
||||
### OpenAI-compatible — `/v1/chat/completions` with `stream: true`
|
||||
```
|
||||
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
|
||||
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"!"}}]}
|
||||
data: [DONE]
|
||||
```
|
||||
|
||||
> **Future improvement:** true token-level streaming (tokens appear as the LLM
|
||||
> generates them) would require using `langchain-openai`'s `ChatOpenAI` in
|
||||
> place of the raw `openai` client. The current approach avoids adding that
|
||||
> dependency.
|
||||
|
||||
---
|
||||
|
||||
## Dependencies
|
||||
|
||||
Endpoints receive shared singletons via FastAPI `Depends`:
|
||||
|
||||
- **`get_llm_client(request)`** → returns `request.app.state.llm_client` (OpenAI client singleton, created once in `main.py`)
|
||||
- **`get_agent_graph(agent_id, request)`** → returns a lazy-compiled LangGraph from `request.app.state.agent_graphs`
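
The lazy-compile-and-cache behavior of `get_agent_graph` can be sketched without FastAPI. The signature below differs from the real dependency: `state` stands in for `request.app.state`, and `build` is a hypothetical factory playing the role of the graph constructor.

```python
from types import SimpleNamespace


def get_agent_graph(agent_id, state, build):
    """Return the cached graph for agent_id, building it on first use."""
    graphs = state.agent_graphs
    if agent_id not in graphs:
        # First request for this agent: compile once and keep the result.
        graphs[agent_id] = build(agent_id)
    return graphs[agent_id]


# Usage with a stand-in for app.state and a dummy factory.
state = SimpleNamespace(agent_graphs={})
graph = get_agent_graph("media-agent", state, build=lambda agent_id: object())
```

Keeping the cache on the application state means every caller that shares that state also shares the compiled graphs.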