diff --git a/gateway/api.md b/gateway/api.md index 1153989..5f3479f 100644 --- a/gateway/api.md +++ b/gateway/api.md @@ -1,235 +1,75 @@ -# API Architecture — Agent + Skill + Graph Pipeline +# Gateway Architecture — Agent + Skill + Graph Pipeline -This document explains how the API routes user messages through the -agent / skill / LangGraph pipeline to produce responses. +This is the **interface layer** of the Agents project. Everything that connects +the outside world to the agent system lives here — REST APIs, Discord bot, +and authentication. --- -## Overview +## Directory Map -``` -┌─────────────────────────────────────────────────────────────────┐ -│ OpenWebUI / Client │ -│ POST /v1/chat/completions { model, messages, stream } │ -└──────────────────────────────┬──────────────────────────────────┘ - │ - ▼ -┌──────────────────────────────────────────────────────────────────┐ -│ api/v1/chat.py — chat_completions() │ -│ │ -│ 1. _resolve_agent(req.model) → Agent │ -│ 2. get_agent_graph(agent_id) → compiled StateGraph │ -│ 3. graph.ainvoke(state) or _stream_graph(graph, messages) │ -└──────────────────────────────┬───────────────────────────────────┘ - │ - ▼ -┌──────────────────────────────────────────────────────────────────┐ -│ LangGraph StateGraph (core/graph.py) │ -│ │ -│ ┌──────────────┐ tool_calls? 
┌──────────────┐ │ -│ │ agent_node │ ───────────────▶ │ tool_node │ │ -│ │ (LLM call) │ ◀─────────────── │ (skill exec) │ │ -│ └──────┬───────┘ └──────────────┘ │ -│ │ no tool_calls │ -│ ▼ │ -│ [END] │ -└──────────────────────────────────────────────────────────────────┘ +| Path | Description | Docs | +|---|---|---| +| `gateway/v1/` | REST API endpoints — chat, agent listing, OpenAI-compatible completions | [v1.md](v1/v1.md) | +| `gateway/discord/` | Discord bot connector — in-process DM handler with LangGraph integration | [discord.md](discord/discord.md) | +| `gateway/auth/` | Auth service registry + Jellyfin Quick Connect implementation | [auth.md](auth/auth.md) | -## Key Concepts +--- -### 1. Agent +## Supporting Modules -An **Agent** is a persona + skill bundle. Defined in `agents/`. - -```python -# agents/media_agent.py -Agent( - agent_id="media-agent", - description="Media assistant with Seerr integration", - skills=["media_info", "seerr", "triage"], - base_prompt="You are a media assistant...", -) -``` - -- `agent_id` — unique name, exposed as a model in OpenWebUI -- `skills` — list of skill names to load -- `base_prompt` — starting system prompt, combined with skill fragments -- `build_system_prompt()` — merges base_prompt + all skill prompt fragments - -Agents self-register at import time via `agents/__init__.py`'s `register()`. -`main.py` calls `load_all_agents()` at startup to import every agent and skill -module. - -### 2. Skill - -A **Skill** is a capability bundle. Defined in `skills/`. - -```python -# skills/seerr.py -Skill( - name="seerr", - description="Seerr integration — trending, discover, request media, submit issues", - prompt_fragment="## Seerr Media Tools\n...", - tools=[...], # OpenAI function-calling schema - execute=_execute, # async handler: tool_name + args → ToolResult -) -``` - -- `prompt_fragment` — injected into the agent's system prompt. -- `tools` — list of OpenAI function definitions (name, description, parameters). 
-- `execute` — async callable that routes tool calls to API handlers. - -### 3. Graph - -Each agent gets a **compiled LangGraph StateGraph** built by -`core/graph.py:create_agent_graph()`. The graph is compiled lazily on the -first request and cached on `app.state.agent_graphs` for the lifetime of the -process. - -| Graph node / edge | What it does | +| Path | Purpose | |---|---| -| `agent_node` | Converts state messages to OpenAI dicts, calls the LLM with the agent's system prompt + tool definitions, returns an `AIMessage` | -| `tool_node` | Reads `tool_calls` from the last AI message, calls `execute_tool()` from the skill system, returns `ToolMessage` results | -| `_should_continue` | Conditional edge — returns `"tool_node"` if the AI message has `tool_calls`, else `END` | - -### 4. State - -Defined in `core/state.py`: - -```python -class AgentState(TypedDict): - messages: Annotated[list, add_messages] -``` - -LangGraph's `add_messages` reducer appends new messages and replaces messages -with matching IDs (so tool-call results overwrite their placeholders). - -### 5. Message Conversion - -Because we use the raw `openai` client (not `langchain-openai`), messages must -be converted between LangChain and OpenAI formats at every LLM call: - -- **LangChain → OpenAI** (`_lc_role_to_openai`, `_langchain_tc_to_openai`): - Maps `type` → `role` and converts top-level `name`/`args` tool-calls into - the nested `function` sub-object that the OpenAI API expects. - -- **OpenAI → LangChain** (inside `agent_node`): - Converts the `ChatCompletionMessage` response into an `AIMessage` with - LangChain-format `tool_calls` (top-level `name`/`args`/`id`). 
+| `gateway/dependencies.py` | FastAPI `Depends` providers — `get_llm_client()`, `get_agent_graph()` | +| `src/config.py` | `.env` loader and config accessor | +| `src/llm.py` | OpenAI-compatible client factory (DeepSeek) | +| `src/state.py` | LangGraph `AgentState` TypedDict | +| `src/graph.py` | LangGraph StateGraph factory — agent_node, tool_node, routing | +| `src/tools_adapter.py` | Wraps skill tools as LangChain `@tool` functions | +| `src/auth_store.py` | SQLite persistence for Discord → service auth linking | +| `agents/` | Agent definitions (dataclass + registry) | +| `agents/skills/` | Skill definitions — prompt fragments, tool schemas, executors | --- -## Full Request Flow - -### Step-by-step: "What are trending movies?" +## High-Level Request Flow ``` -1. OpenWebUI sends: - POST /v1/chat/completions - { - "model": "media-agent", - "messages": [ - {"role": "user", "content": "What are trending movies?"} - ], - "stream": false - } - -2. chat_completions(): - → _resolve_agent(model="media-agent") - → get_agent("media-agent") → Agent(skills=["media_info", "seerr", "triage"]) - → get_agent_graph("media-agent", request) - → looks up app.state.agent_graphs["media-agent"] - → first call → create_agent_graph() compiles the graph with 7 Seerr tools - → run_agent_with_tools(request, messages, agent_id) - → _invoke_graph(graph, messages) - -3. Graph — Pass 1 (agent_node): - → LLM receives: [system prompt] + [user: "What are trending movies?"] - → LLM responds with tool_calls: seerr_trending(kind="movie") - → agent_node returns AIMessage with tool_calls in LangChain format - -4. Graph — _should_continue: - → AIMessage has tool_calls → route to "tool_node" - -5. Graph — tool_node: - → Reads tool_call: name="seerr_trending", args={"kind": "movie"} - → execute_tool(["media_info", "seerr", "triage"], "seerr_trending", ...) - → Seerr API → GET /api/v1/discover/trending?mediaType=movie - → Returns ToolMessage with formatted results including [tmdb:IDs] - -6. 
Graph — Pass 2 (agent_node): - → LLM receives previous exchange + tool result - → LLM responds with text only (no tool_calls) - → agent_node returns AIMessage(content="Here are the top trending movies!...") - -7. Graph — _should_continue: - → No tool_calls → route to END - -8. chat_completions() returns: - { "choices": [{"message": {"role": "assistant", "content": "Here are the top..."}}] } +┌──────────────────────────────┐ +│ Client (OpenWebUI / HTTP) │ +└──────────────┬───────────────┘ + │ POST /v1/chat/completions + ▼ +┌──────────────────────────────┐ +│ gateway/v1/chat.py │ ← resolves agent, invokes graph +└──────────────┬───────────────┘ + │ + ▼ +┌──────────────────────────────┐ +│ LangGraph StateGraph │ ← src/graph.py +│ ┌──────────┐ ┌──────────┐│ +│ │agent_node│──▶│tool_node ││ +│ │(LLM call)│◀──│(skills) ││ +│ └──────────┘ └──────────┘│ +└──────────────┬───────────────┘ + │ + ▼ +┌──────────────────────────────┐ +│ agents/skills/ │ ← Seerr API, Jellyfin API, etc. +└──────────────────────────────┘ ``` -### Step-by-step: "Request the 2026 one" (multi-turn context) - -``` -1. OpenWebUI sends the FULL history: - { - "model": "media-agent", - "messages": [ - {"role": "user", "content": "What are trending movies?"}, - {"role": "assistant", "content": "Here are the top 10 trending movies! - 1. **Mortal Kombat II** (2026) [tmdb:931285] — ..."}, - {"role": "user", "content": "could request the mortal kombat one?"}, - {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."}, - {"role": "user", "content": "the 2026 one"} - ] - } - -2. chat_completions(): - → req.messages contains the ENTIRE conversation history - → graph.ainvoke({"messages": all_messages}) - → agent_node prepends system prompt and sends everything to the LLM - -3. 
LLM reasons from full context: - - Previously listed Mortal Kombat II (2026) with [tmdb:931285] - - The user said "request the mortal kombat one" → I searched and showed 4 options - - Now they say "the 2026 one" → that matches Mortal Kombat II (2026) [tmdb:931285] - - I should call seerr_request_media(kind="movie", title="Mortal Kombat II", tmdb_id=931285) - -4. tool_node executes the request → ✅ Success -``` +For a detailed step-by-step walkthrough of the graph execution (including +multi-turn context and tool-calling loops), see [v1.md](v1/v1.md). --- -## Streaming +## Startup -Streaming works slightly differently from the sync path: +`main.py` is the entry point. At startup it: -``` -chat_completions(stream=True) - → _stream_graph(graph, messages) - → graph.ainvoke(state) # runs graph to completion (tools execute silently) - → yields content character-by-character via SSE -``` - -For true token-level streaming (tokens appear as the LLM generates them), -the agent_node would need to use `langchain-openai`'s `ChatOpenAI` instead of -the raw `openai` client. The current approach is a pragmatic middle ground -that avoids adding another dependency while still giving the SSE client -incremental output. - ---- - -## File Map - -| File | Responsibility | -|---|---| -| `main.py` | FastAPI app, singleton creation, router mounting | -| `api/v1/chat.py` | Endpoints — resolves agent, invokes graph, formats responses | -| `api/dependencies.py` | `get_llm_client()`, `get_agent_graph()` — FastAPI `Depends` | -| `core/graph.py` | `create_agent_graph()` — builds the StateGraph | -| `core/state.py` | `AgentState` TypedDict | -| `core/llm.py` | `create_client()` — OpenAI client factory | -| `core/config.py` | Environment variable loader | -| `agents/` | Agent definitions (dataclass + self-registration) | -| `skills/` | Skill definitions (prompt fragments + tools + executors) | +1. Loads `.env` → creates the LLM client (DeepSeek) → stores on `app.state.llm_client` +2. 
Calls `load_all_agents()` → imports every agent and skill module (they self-register) +3. Imports `gateway.auth.jellyfin` → self-registers the Jellyfin auth service +4. Mounts routers: `/v1/*` (chat endpoints) and `/api/v1/auth/*` (auth endpoints) +5. Starts the Discord bot as a background asyncio task (lifespan) diff --git a/gateway/auth/auth.md b/gateway/auth/auth.md new file mode 100644 index 0000000..df39c50 --- /dev/null +++ b/gateway/auth/auth.md @@ -0,0 +1,152 @@ +# Auth — Service Registry & Persistence + +The authentication system lets Discord users link their accounts to external +services (currently **Jellyfin**) so the agent can perform actions on their +behalf (e.g. checking watch history). + +--- + +## Architecture + +``` +gateway/auth/ gateway/v1/auth.py +┌──────────────────────┐ ┌──────────────────────────────┐ +│ AuthService (ABC) │ │ GET /api/v1/auth/login │ +│ ├─ JellyfinAuth │◀─────────│ POST /api/v1/auth/login │ +│ └─ (Plex, Seerr…) │ │ GET /api/v1/auth/status │ +│ │ │ GET /api/v1/auth/reset │ +└─────────┬────────────┘ └──────────────────────────────┘ + │ + ▼ +src/auth_store.py +┌──────────────────────┐ +│ SQLite │ +│ ├─ link_tokens │ one-time tokens sent via Discord DM +│ └─ user_auth │ per-user, per-service credentials +└──────────────────────┘ +``` + +--- + +## Files + +| File | Purpose | +|---|---| +| `gateway/auth/__init__.py` | Abstract `AuthService` base class + global registry | +| `gateway/auth/jellyfin.py` | Jellyfin implementation — Quick Connect + username/password | +| `gateway/v1/auth.py` | REST endpoints for the web-based login flow | +| `src/auth_store.py` | SQLite persistence for link tokens and stored credentials | + +--- + +## Flow: Discord User Links Jellyfin + +``` +Discord DM Web Browser Jellyfin Server + │ │ │ + │ 1. /login jellyfin │ │ + │ ──────────────────────────────▶│ │ + │ Bot creates link token in │ │ + │ SQLite, DMs the user a URL │ │ + │ │ │ + │ 2. 
User clicks link │ │ + │ ◀─────────────────────────────▶│ │ + │ │ GET /api/v1/auth/login │ + │ │ ?service=jellyfin │ + │ │ &token=xxx&discord_id=123 │ + │ │ │ + │ │ 3. Serve Quick Connect form │ + │ │ ◀──────────────────────────── │ + │ │ │ + │ │ 4. Initiate Quick Connect │ + │ │ ─────────────────────────────▶│ + │ │ POST /QuickConnect/Initiate │ + │ │ ◀── { Code: "ABC123" } │ + │ │ │ + │ 5. User enters code in │ │ + │ Jellyfin app │ │ + │ │ │ + │ │ 6. Poll: is it authorized? │ + │ │ ─────────────────────────────▶│ + │ │ GET /QuickConnect/Connect │ + │ │ ◀── Authenticated + Token │ + │ │ │ + │ 7. auth_store saves: │ │ + │ (discord_id, jellyfin, │ │ + │ AccessToken, username) │ │ + │ │ │ + │ 8. "✅ Linked to Jellyfin!" │ │ + │ ◀───────────────────────────── │ │ +``` + +--- + +## AuthService Base Class + +```python +class AuthService(ABC): + name: str # "jellyfin" + display_name: str # "Jellyfin" + + def render_login_form(token, discord_id) -> str: ... + async def authenticate(form_data) -> AuthResult: ... +``` + +Add a new service (e.g. Plex, Seerr) by subclassing `AuthService`, dropping +the module in `gateway/auth/`, and calling `register_auth_service()` at import +time. The REST endpoints and auth store work generically — no changes needed. 
+ +--- + +## Current Implementation: Jellyfin + +`gateway/auth/jellyfin.py` supports two flows: + +| Method | How it works | +|---|---| +| **Quick Connect** (primary) | Calls Jellyfin's `/QuickConnect/Initiate` → polls `/QuickConnect/Connect` → stores the `AccessToken` | +| **Username/Password** (fallback) | Renders an HTML form → user submits credentials → calls `/Users/AuthenticateByName` → stores the `AccessToken` | + +The stored credentials include: +- `external_user_id` — Jellyfin user ID +- `external_name` — Jellyfin username +- `credentials` dict — `{"AccessToken": "...", "ServerURL": "..."}` + +--- + +## Auth Store (SQLite) + +Two tables in `data/auth.db`: + +```sql +-- One-time tokens for the web login flow (expire after 10 min) +CREATE TABLE link_tokens ( + token TEXT PRIMARY KEY, + discord_id INTEGER NOT NULL, + service TEXT NOT NULL, + created_at TEXT NOT NULL, + used INTEGER DEFAULT 0 +); + +-- Per-user, per-service stored credentials +CREATE TABLE user_auth ( + discord_id INTEGER NOT NULL, + service TEXT NOT NULL, + external_user_id TEXT, + external_name TEXT, + credentials TEXT, -- JSON + created_at TEXT NOT NULL, + PRIMARY KEY (discord_id, service) +); +``` + +--- + +## Skill-Level Auth Gating + +Skills can declare `requires_auth=["jellyfin"]`. When a tool is executed, +the skill system checks the auth store. If the user isn't linked: + +1. The tool returns `ToolResult.fail("Please login first using /login jellyfin")` +2. The LLM relays this message to the user in Discord +3. The user types `/login jellyfin` → Quick Connect flow → re-linked → try again diff --git a/gateway/discord/discord.md b/gateway/discord/discord.md new file mode 100644 index 0000000..7afbe2f --- /dev/null +++ b/gateway/discord/discord.md @@ -0,0 +1,73 @@ +# Discord — Connector + +The Discord module embeds a Discord bot **in-process** alongside FastAPI. 
+It uses the same LangGraph graphs and LLM client as the REST API — there is +no HTTP loopback, no separate process, and no code duplication. + +--- + +## Files + +| File | Purpose | +|---|---| +| `bot.py` | Discord `Client` subclass (`AgentBot`) — DM handler, command parser, graph invoker, Quick Connect orchestrator | +| `conversation.py` | In-memory conversation history store, keyed by Discord user ID | + +--- + +## Architecture + +``` +Discord Gateway (websocket) + │ DM: "What's trending?" + ▼ +discord.Client.on_message() + │ 1. Check: is this a DM? shares a guild? not a command? + │ 2. Build message history from ConversationStore + │ 3. Append user message + ▼ +_create_agent_graph(agent_id="media-agent") + │ Uses the exact same create_agent_graph() from src/graph.py + │ as the REST API — same LLM client, same tools, same cache. + ▼ +graph.ainvoke({"messages": [...]}) + │ LangGraph runs agent_node → tool_node → agent_node → END + ▼ +Response text → split into ≤2000-char Discord messages → sent to user +``` + +--- + +## Commands + +Commands are DMs that start with `/`. The bot parses them before hitting the +LLM: + +| Command | Action | +|---|---| +| `/login ` | Generate a one-time auth link, DM it to the user | +| `/jellyfin login` | Alias for `/login jellyfin` | +| `/help` | Show available agents and commands | +| `/` | Switch to a different agent for future messages | + +--- + +## Auth Flow (Quick Connect) + +When a user types `/login jellyfin`: + +1. Bot generates a one-time token via `auth_store` +2. Bot calls `auth_store.create_link_token(discord_id, "jellyfin")` +3. Bot DMs the user: `https:///api/v1/auth/login?service=jellyfin&token=...&discord_id=...` +4. User clicks the link → FastAPI serves the Jellyfin login form (or Quick Connect prompt) +5. User authenticates → credentials stored in `auth_store` +6. Future tool calls (e.g. 
`watch_history`) automatically use the stored Jellyfin session + +--- + +## Conversation Persistence + +- Per-user history stored in `ConversationStore` (in-memory dict) +- Max history length configurable via `DISCORD_MAX_HISTORY` env var (default: 7) +- Oldest messages are silently dropped when the limit is exceeded +- History is NOT persisted across restarts (future: could use SQLite) diff --git a/gateway/v1/v1.md b/gateway/v1/v1.md new file mode 100644 index 0000000..a627a02 --- /dev/null +++ b/gateway/v1/v1.md @@ -0,0 +1,106 @@ +# V1 — Chat & Agent API Endpoints + +This is the primary HTTP API surface for the chatbot agent system. It exposes +both a custom streaming chat endpoint and an OpenAI-compatible +`/chat/completions` endpoint so it works as a drop-in backend for OpenWebUI, +LibreChat, or any OpenAI-compatible client. + +--- + +## Endpoints + +| Method | Path | Description | +|---|---|---| +| `GET ` | `/v1/` | Health check — returns `{"status": "ok"}` | +| `GET ` | `/v1/agents` | List all registered agents (id + description) | +| `GET ` | `/v1/models` | OpenAI-compatible model list (one entry per agent) | +| `POST` | `/v1/chat` | Chat with an agent — streaming (SSE) | +| `POST` | `/v1/chat/sync` | Chat with an agent — non-streaming | +| `POST` | `/v1/chat/completions` | OpenAI-compatible chat completions (supports `stream: true`) | + +All `/v1/*` endpoints are mounted by `main.py` via: + +```python +app.include_router(v1_router, prefix="/v1") +``` + +--- + +## Agent Resolution + +Each request can target a specific agent. The resolution order is: + +1. **Explicit `agent_id`** field in the request body +2. **OpenAI `model` field** (OpenWebUI sends this — mapped to `agent_id` if a matching agent is registered) +3. **Fallback** to the `"naked"` agent (a plain LLM with no tools) + +This means an OpenWebUI client can simply set `model: "media-agent"` and get +the full Media Agent with Seerr tools. 
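
The resolution order above can be sketched as a small pure function. The real `_resolve_agent` is not shown in this document, so the function name, its parameters, and the decision to let an unregistered explicit `agent_id` fall through to the next rule are all assumptions.

```python
from typing import Iterable, Optional

# Fallback agent named in the resolution order above.
FALLBACK_AGENT_ID = "naked"


def resolve_agent_id(
    agent_id: Optional[str],
    model: Optional[str],
    registered: Iterable[str],
) -> str:
    """Pick an agent id: explicit field, then OpenAI `model`, then fallback."""
    known = set(registered)
    # 1. Explicit `agent_id` in the request body (assumed to require registration).
    if agent_id and agent_id in known:
        return agent_id
    # 2. OpenAI-style `model` field, used only if it names a registered agent.
    if model and model in known:
        return model
    # 3. Plain LLM with no tools.
    return FALLBACK_AGENT_ID


registered_agents = {"media-agent", "naked"}
print(resolve_agent_id(None, "media-agent", registered_agents))
print(resolve_agent_id(None, "some-other-model", registered_agents))
```

The first call resolves through the `model` field, which is the OpenWebUI case; the second falls back to the plain agent because no registered agent matches.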
+ +--- + +## Request Flow + +``` +Client (OpenWebUI / HTTP) + │ POST /v1/chat/completions + │ { model: "media-agent", messages: [...], stream: true/false } + ▼ +chat_completions() + │ 1. _resolve_agent(req.model) → Agent(id="media-agent", skills=[...]) + │ 2. get_agent_graph("media-agent", request) + │ → lazy-compiled LangGraph StateGraph, cached on app.state + │ 3. stream=True → _stream_graph(graph, messages) → SSE token stream + │ stream=False → _invoke_graph(graph, messages) → plain response + ▼ +LangGraph StateGraph (src/graph.py) + │ + ├── agent_node: calls LLM with system prompt + tool definitions + │ └── LLM returns text OR tool_calls + │ + ├── _should_continue: if tool_calls → tool_node, else → END + │ + └── tool_node: executes tool via agents/skills system → ToolMessage + └── loops back to agent_node with the result +``` + +For a detailed walkthrough, see [api.md](../api.md). + +--- + +## Streaming + +Two streaming modes exist: + +### SSE (Server-Sent Events) — `/v1/chat` +``` +data: {"token": "Here"} +data: {"token": " are"} +data: {"token": " the"} +... +data: [DONE] +``` + +The graph runs to completion (tools execute silently), then the final text is +yielded token-by-token as SSE events. + +### OpenAI-compatible — `/v1/chat/completions` with `stream: true` +``` +data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]} +data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"!"}}]} +data: [DONE] +``` + +> **Future improvement:** true token-level streaming (tokens appear as the LLM +> generates them) would require using `langchain-openai`'s `ChatOpenAI` in +> place of the raw `openai` client. The current approach avoids adding that +> dependency. 
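
As an illustration of the OpenAI-compatible stream format shown above, the sketch below turns an already completed response text into `chat.completion.chunk` events followed by `[DONE]`. The function name, the placeholder completion id, and the whitespace-based splitting are assumptions; the actual `_stream_graph` may chunk the text differently.

```python
import json
import re
from typing import Iterator


def sse_chunks(text: str, completion_id: str = "chatcmpl-example") -> Iterator[str]:
    """Yield SSE lines in the chat.completion.chunk shape, then [DONE]."""
    # Split into word-plus-trailing-space pieces so joining them restores the text.
    for piece in re.findall(r"\S+\s*|\s+", text):
        payload = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "choices": [{"index": 0, "delta": {"content": piece}}],
        }
        yield f"data: {json.dumps(payload)}\n\n"
    # Sentinel that tells OpenAI-compatible clients the stream is over.
    yield "data: [DONE]\n\n"


for event in sse_chunks("Hello world!"):
    print(event, end="")
```

Because the graph has already finished by the time this runs, the client sees incremental output without the server needing token-level streaming from the LLM.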
+ +--- + +## Dependencies + +Endpoints receive shared singletons via FastAPI `Depends`: + +- **`get_llm_client(request)`** → returns `request.app.state.llm_client` (OpenAI client singleton, created once in `main.py`) +- **`get_agent_graph(agent_id, request)`** → returns the agent's lazily compiled LangGraph graph, cached in `request.app.state.agent_graphs`
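
The two providers above might be sketched as follows. This is a framework-free approximation: `SimpleNamespace` objects stand in for the FastAPI `Request`, and the extra `build_graph` parameter is a hypothetical stand-in for the graph factory in `src/graph.py`, added only so the caching behaviour can be shown without the real dependencies.

```python
from types import SimpleNamespace
from typing import Any, Callable


def get_llm_client(request: Any) -> Any:
    """Return the client singleton stored on app.state at startup."""
    return request.app.state.llm_client


def get_agent_graph(
    agent_id: str,
    request: Any,
    build_graph: Callable[[str], Any],  # hypothetical; not in the documented signature
) -> Any:
    """Compile the agent's graph on first use, then serve it from the cache."""
    cache = request.app.state.agent_graphs
    if agent_id not in cache:
        cache[agent_id] = build_graph(agent_id)
    return cache[agent_id]


# Minimal stand-ins for the app and request objects.
build_calls = []


def fake_build(agent_id: str) -> dict:
    build_calls.append(agent_id)
    return {"graph_for": agent_id}


state = SimpleNamespace(llm_client=object(), agent_graphs={})
request = SimpleNamespace(app=SimpleNamespace(state=state))

first = get_agent_graph("media-agent", request, fake_build)
second = get_agent_graph("media-agent", request, fake_build)
print(first is second, build_calls)
```

Storing the cache on `app.state` ties the compiled graphs to the lifetime of the process, so each agent pays the compilation cost only on its first request.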