diff --git a/gateway/api.md b/gateway/api.md
index 1153989..5f3479f 100644
--- a/gateway/api.md
+++ b/gateway/api.md
@@ -1,235 +1,75 @@
-# API Architecture — Agent + Skill + Graph Pipeline
+# Gateway Architecture — Agent + Skill + Graph Pipeline
 
-This document explains how the API routes user messages through the
-agent / skill / LangGraph pipeline to produce responses.
+This is the **interface layer** of the Agents project. Everything that connects
+the outside world to the agent system lives here: the REST API, the Discord bot,
+and authentication.
 
 ---
 
-## Overview
+## Directory Map
 
-```
-┌─────────────────────────────────────────────────────────────────┐
-│                      OpenWebUI / Client                         │
-│  POST /v1/chat/completions  { model, messages, stream }         │
-└──────────────────────────────┬──────────────────────────────────┘
-                               │
-                               ▼
-┌──────────────────────────────────────────────────────────────────┐
-│  api/v1/chat.py  —  chat_completions()                          │
-│                                                                  │
-│  1. _resolve_agent(req.model)  →  Agent                          │
-│  2. get_agent_graph(agent_id)  →  compiled StateGraph            │
-│  3. graph.ainvoke(state)  or  _stream_graph(graph, messages)     │
-└──────────────────────────────┬───────────────────────────────────┘
-                               │
-                               ▼
-┌──────────────────────────────────────────────────────────────────┐
-│  LangGraph StateGraph  (core/graph.py)                           │
-│                                                                  │
-│   ┌──────────────┐   tool_calls?    ┌──────────────┐            │
-│   │  agent_node  │ ───────────────▶ │  tool_node   │            │
-│   │  (LLM call)  │ ◀─────────────── │ (skill exec) │            │
-│   └──────┬───────┘                  └──────────────┘            │
-│          │ no tool_calls                                         │
-│          ▼                                                       │
-│        [END]                                                     │
-└──────────────────────────────────────────────────────────────────┘
+| Path | Description | Docs |
+|---|---|---|
+| `gateway/v1/` | REST API endpoints — chat, agent listing, OpenAI-compatible completions | [v1.md](v1/v1.md) |
+| `gateway/discord/` | Discord bot connector — in-process DM handler with LangGraph integration | [discord.md](discord/discord.md) |
+| `gateway/auth/` | Auth service registry + Jellyfin Quick Connect implementation | [auth.md](auth/auth.md) |
 
-## Key Concepts
+---
 
-### 1. Agent
+## Supporting Modules
 
-An **Agent** is a persona + skill bundle. Defined in `agents/`.
-
-```python
-# agents/media_agent.py
-Agent(
-    agent_id="media-agent",
-    description="Media assistant with Seerr integration",
-    skills=["media_info", "seerr", "triage"],
-    base_prompt="You are a media assistant...",
-)
-```
-
-- `agent_id` — unique name, exposed as a model in OpenWebUI
-- `skills` — list of skill names to load
-- `base_prompt` — starting system prompt, combined with skill fragments
-- `build_system_prompt()` — merges base_prompt + all skill prompt fragments
-
-Agents self-register at import time via `agents/__init__.py`'s `register()`.
-`main.py` calls `load_all_agents()` at startup to import every agent and skill
-module.
-
-### 2. Skill
-
-A **Skill** is a capability bundle. Defined in `skills/`.
-
-```python
-# skills/seerr.py
-Skill(
-    name="seerr",
-    description="Seerr integration — trending, discover, request media, submit issues",
-    prompt_fragment="## Seerr Media Tools\n...",
-    tools=[...],          # OpenAI function-calling schema
-    execute=_execute,     # async handler: tool_name + args → ToolResult
-)
-```
-
-- `prompt_fragment` — injected into the agent's system prompt.
-- `tools` — list of OpenAI function definitions (name, description, parameters).
-- `execute` — async callable that routes tool calls to API handlers.
-
-### 3. Graph
-
-Each agent gets a **compiled LangGraph StateGraph** built by
-`core/graph.py:create_agent_graph()`.  The graph is compiled lazily on the
-first request and cached on `app.state.agent_graphs` for the lifetime of the
-process.
-
-| Graph node / edge | What it does |
+| Path | Purpose |
 |---|---|
-| `agent_node` | Converts state messages to OpenAI dicts, calls the LLM with the agent's system prompt + tool definitions, returns an `AIMessage` |
-| `tool_node` | Reads `tool_calls` from the last AI message, calls `execute_tool()` from the skill system, returns `ToolMessage` results |
-| `_should_continue` | Conditional edge — returns `"tool_node"` if the AI message has `tool_calls`, else `END` |
-
-### 4. State
-
-Defined in `core/state.py`:
-
-```python
-class AgentState(TypedDict):
-    messages: Annotated[list, add_messages]
-```
-
-LangGraph's `add_messages` reducer appends new messages and replaces messages
-with matching IDs (so tool-call results overwrite their placeholders).
-
-### 5. Message Conversion
-
-Because we use the raw `openai` client (not `langchain-openai`), messages must
-be converted between LangChain and OpenAI formats at every LLM call:
-
-- **LangChain → OpenAI** (`_lc_role_to_openai`, `_langchain_tc_to_openai`):
-  Maps `type` → `role` and converts top-level `name`/`args` tool-calls into
-  the nested `function` sub-object that the OpenAI API expects.
-
-- **OpenAI → LangChain** (inside `agent_node`):
-  Converts the `ChatCompletionMessage` response into an `AIMessage` with
-  LangChain-format `tool_calls` (top-level `name`/`args`/`id`).
+| `gateway/dependencies.py` | FastAPI `Depends` providers — `get_llm_client()`, `get_agent_graph()` |
+| `src/config.py` | `.env` loader and config accessor |
+| `src/llm.py` | OpenAI-compatible client factory (DeepSeek) |
+| `src/state.py` | LangGraph `AgentState` TypedDict |
+| `src/graph.py` | LangGraph StateGraph factory — agent_node, tool_node, routing |
+| `src/tools_adapter.py` | Wraps skill tools as LangChain `@tool` functions |
+| `src/auth_store.py` | SQLite persistence for Discord → service auth linking |
+| `agents/` | Agent definitions (dataclass + registry) |
+| `agents/skills/` | Skill definitions — prompt fragments, tool schemas, executors |
 
 ---
 
-## Full Request Flow
-
-### Step-by-step: "What are trending movies?"
+## High-Level Request Flow
 
 ```
-1. OpenWebUI sends:
-   POST /v1/chat/completions
-   {
-     "model": "media-agent",
-     "messages": [
-       {"role": "user", "content": "What are trending movies?"}
-     ],
-     "stream": false
-   }
-
-2. chat_completions():
-   → _resolve_agent(model="media-agent")
-     → get_agent("media-agent") → Agent(skills=["media_info", "seerr", "triage"])
-   → get_agent_graph("media-agent", request)
-     → looks up app.state.agent_graphs["media-agent"]
-     → first call → create_agent_graph() compiles the graph with 7 Seerr tools
-   → run_agent_with_tools(request, messages, agent_id)
-     → _invoke_graph(graph, messages)
-
-3. Graph — Pass 1 (agent_node):
-   → LLM receives: [system prompt] + [user: "What are trending movies?"]
-   → LLM responds with tool_calls: seerr_trending(kind="movie")
-   → agent_node returns AIMessage with tool_calls in LangChain format
-
-4. Graph — _should_continue:
-   → AIMessage has tool_calls → route to "tool_node"
-
-5. Graph — tool_node:
-   → Reads tool_call: name="seerr_trending", args={"kind": "movie"}
-   → execute_tool(["media_info", "seerr", "triage"], "seerr_trending", ...)
-   → Seerr API → GET /api/v1/discover/trending?mediaType=movie
-   → Returns ToolMessage with formatted results including [tmdb:IDs]
-
-6. Graph — Pass 2 (agent_node):
-   → LLM receives previous exchange + tool result
-   → LLM responds with text only (no tool_calls)
-   → agent_node returns AIMessage(content="Here are the top trending movies!...")
-
-7. Graph — _should_continue:
-   → No tool_calls → route to END
-
-8. chat_completions() returns:
-   { "choices": [{"message": {"role": "assistant", "content": "Here are the top..."}}] }
+┌──────────────────────────────┐
+│  Client (OpenWebUI / HTTP)   │
+└──────────────┬───────────────┘
+               │ POST /v1/chat/completions
+               ▼
+┌──────────────────────────────┐
+│  gateway/v1/chat.py          │  ← resolves agent, invokes graph
+└──────────────┬───────────────┘
+               │
+               ▼
+┌──────────────────────────────┐
+│  LangGraph StateGraph        │  ← src/graph.py
+│  ┌──────────┐   ┌──────────┐│
+│  │agent_node│──▶│tool_node ││
+│  │(LLM call)│◀──│(skills)  ││
+│  └──────────┘   └──────────┘│
+└──────────────┬───────────────┘
+               │
+               ▼
+┌──────────────────────────────┐
+│  agents/skills/              │  ← Seerr API, Jellyfin API, etc.
+└──────────────────────────────┘
 ```
 
-### Step-by-step: "Request the 2026 one" (multi-turn context)
-
-```
-1. OpenWebUI sends the FULL history:
-   {
-     "model": "media-agent",
-     "messages": [
-       {"role": "user", "content": "What are trending movies?"},
-       {"role": "assistant", "content": "Here are the top 10 trending movies!
-        1. **Mortal Kombat II** (2026) [tmdb:931285] — ..."},
-       {"role": "user", "content": "could request the mortal kombat one?"},
-       {"role": "assistant", "content": "There are several Mortal Kombat entries! ..."},
-       {"role": "user", "content": "the 2026 one"}
-     ]
-   }
-
-2. chat_completions():
-   → req.messages contains the ENTIRE conversation history
-   → graph.ainvoke({"messages": all_messages})
-   → agent_node prepends system prompt and sends everything to the LLM
-
-3. LLM reasons from full context:
-   - Previously listed Mortal Kombat II (2026) with [tmdb:931285]
-   - The user said "request the mortal kombat one" → I searched and showed 4 options
-   - Now they say "the 2026 one" → that matches Mortal Kombat II (2026) [tmdb:931285]
-   - I should call seerr_request_media(kind="movie", title="Mortal Kombat II", tmdb_id=931285)
-
-4. tool_node executes the request → ✅ Success
-```
+For a step-by-step walkthrough of graph execution, including the tool-calling
+loop, see the Request Flow section of [v1.md](v1/v1.md).
 
 ---
 
-## Streaming
+## Startup
 
-Streaming works slightly differently from the sync path:
+`main.py` is the entry point. At startup it:
 
-```
-chat_completions(stream=True)
-  → _stream_graph(graph, messages)
-    → graph.ainvoke(state)        # runs graph to completion (tools execute silently)
-    → yields content character-by-character via SSE
-```
-
-For true token-level streaming (tokens appear as the LLM generates them),
-the agent_node would need to use `langchain-openai`'s `ChatOpenAI` instead of
-the raw `openai` client.  The current approach is a pragmatic middle ground
-that avoids adding another dependency while still giving the SSE client
-incremental output.
-
----
-
-## File Map
-
-| File | Responsibility |
-|---|---|
-| `main.py` | FastAPI app, singleton creation, router mounting |
-| `api/v1/chat.py` | Endpoints — resolves agent, invokes graph, formats responses |
-| `api/dependencies.py` | `get_llm_client()`, `get_agent_graph()` — FastAPI `Depends` |
-| `core/graph.py` | `create_agent_graph()` — builds the StateGraph |
-| `core/state.py` | `AgentState` TypedDict |
-| `core/llm.py` | `create_client()` — OpenAI client factory |
-| `core/config.py` | Environment variable loader |
-| `agents/` | Agent definitions (dataclass + self-registration) |
-| `skills/` | Skill definitions (prompt fragments + tools + executors) |
+1. Loads `.env` → creates the LLM client (DeepSeek) → stores it on `app.state.llm_client`
+2. Calls `load_all_agents()` → imports every agent and skill module (they self-register)
+3. Imports `gateway.auth.jellyfin` → self-registers the Jellyfin auth service
+4. Mounts routers: `/v1/*` (chat endpoints) and `/api/v1/auth/*` (auth endpoints)
+5. Starts the Discord bot as a background asyncio task inside the FastAPI lifespan handler
diff --git a/gateway/auth/auth.md b/gateway/auth/auth.md
new file mode 100644
index 0000000..df39c50
--- /dev/null
+++ b/gateway/auth/auth.md
@@ -0,0 +1,152 @@
+# Auth — Service Registry & Persistence
+
+The authentication system lets Discord users link their accounts to external
+services (currently **Jellyfin**) so the agent can perform actions on their
+behalf (e.g. checking watch history).
+
+---
+
+## Architecture
+
+```
+gateway/auth/                     gateway/v1/auth.py
+┌──────────────────────┐          ┌──────────────────────────────┐
+│  AuthService (ABC)   │          │  GET  /api/v1/auth/login     │
+│  ├─ JellyfinAuth     │◀─────────│  POST /api/v1/auth/login     │
+│  └─ (Plex, Seerr…)   │          │  GET  /api/v1/auth/status    │
+│                      │          │  GET  /api/v1/auth/reset     │
+└─────────┬────────────┘          └──────────────────────────────┘
+          │
+          ▼
+src/auth_store.py
+┌──────────────────────┐
+│  SQLite              │
+│  ├─ link_tokens      │  one-time tokens sent via Discord DM
+│  └─ user_auth        │  per-user, per-service credentials
+└──────────────────────┘
+```
+
+---
+
+## Files
+
+| File | Purpose |
+|---|---|
+| `gateway/auth/__init__.py` | Abstract `AuthService` base class + global registry |
+| `gateway/auth/jellyfin.py` | Jellyfin implementation — Quick Connect + username/password |
+| `gateway/v1/auth.py` | REST endpoints for the web-based login flow |
+| `src/auth_store.py` | SQLite persistence for link tokens and stored credentials |
+
+---
+
+## Flow: Discord User Links Jellyfin
+
+```
+Discord DM                        Web Browser                     Jellyfin Server
+    │                                 │                                │
+    │  1. /login jellyfin             │                                │
+    │  ──────────────────────────────▶│                                │
+    │  Bot creates link token in      │                                │
+    │  SQLite, DMs the user a URL     │                                │
+    │                                 │                                │
+    │  2. User clicks link            │                                │
+    │  ◀─────────────────────────────▶│                                │
+    │                                 │  GET /api/v1/auth/login        │
+    │                                 │  ?service=jellyfin             │
+    │                                 │  &token=xxx&discord_id=123     │
+    │                                 │                                │
+    │                                 │  3. Serve Quick Connect form   │
+    │                                 │  ◀──────────────────────────── │
+    │                                 │                                │
+    │                                 │  4. Initiate Quick Connect     │
+    │                                 │  ─────────────────────────────▶│
+    │                                 │  POST /QuickConnect/Initiate   │
+    │                                 │  ◀── { Code: "ABC123" }        │
+    │                                 │                                │
+    │  5. User enters code in         │                                │
+    │     Jellyfin app                │                                │
+    │                                 │                                │
+    │                                 │  6. Poll: is it authorized?    │
+    │                                 │  ─────────────────────────────▶│
+    │                                 │  GET /QuickConnect/Connect     │
+    │                                 │  ◀── Authenticated + Token     │
+    │                                 │                                │
+    │  7. auth_store saves:           │                                │
+    │     (discord_id, jellyfin,      │                                │
+    │      AccessToken, username)     │                                │
+    │                                 │                                │
+    │  8. "✅ Linked to Jellyfin!"    │                                │
+    │  ◀───────────────────────────── │                                │
+```
+
+---
+
+## AuthService Base Class
+
+```python
+class AuthService(ABC):
+    name: str           # "jellyfin"
+    display_name: str   # "Jellyfin"
+
+    def render_login_form(self, token, discord_id) -> str: ...
+    async def authenticate(self, form_data) -> AuthResult: ...
+```
+
+Add a new service (e.g. Plex, Seerr) by subclassing `AuthService`, dropping
+the module in `gateway/auth/`, and calling `register_auth_service()` at import
+time. The REST endpoints and auth store work generically — no changes needed.
+
+---
+
+## Current Implementation: Jellyfin
+
+`gateway/auth/jellyfin.py` supports two flows:
+
+| Method | How it works |
+|---|---|
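+| **Quick Connect** (primary) | Calls Jellyfin's `/QuickConnect/Initiate` → polls `/QuickConnect/Connect` until the code is authorized → obtains the `AccessToken` (Jellyfin issues it via `/Users/AuthenticateWithQuickConnect`) → stores it |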
+| **Username/Password** (fallback) | Renders an HTML form → user submits credentials → calls `/Users/AuthenticateByName` → stores the `AccessToken` |
+
+The stored credentials include:
+- `external_user_id` — Jellyfin user ID
+- `external_name` — Jellyfin username
+- `credentials` dict — `{"AccessToken": "...", "ServerURL": "..."}`
+
+---
+
+## Auth Store (SQLite)
+
+Two tables in `data/auth.db`:
+
+```sql
+-- One-time tokens for the web login flow (expire after 10 min)
+CREATE TABLE link_tokens (
+    token TEXT PRIMARY KEY,
+    discord_id INTEGER NOT NULL,
+    service TEXT NOT NULL,
+    created_at TEXT NOT NULL,
+    used INTEGER DEFAULT 0
+);
+
+-- Per-user, per-service stored credentials
+CREATE TABLE user_auth (
+    discord_id INTEGER NOT NULL,
+    service TEXT NOT NULL,
+    external_user_id TEXT,
+    external_name TEXT,
+    credentials TEXT,  -- JSON
+    created_at TEXT NOT NULL,
+    PRIMARY KEY (discord_id, service)
+);
+```
+
+---
+
+## Skill-Level Auth Gating
+
+Skills can declare `requires_auth=["jellyfin"]`. When a tool is executed,
+the skill system checks the auth store. If the user isn't linked:
+
+1. The tool returns `ToolResult.fail("Please login first using /login jellyfin")`
+2. The LLM relays this message to the user in Discord
+3. The user types `/login jellyfin` → Quick Connect flow → linked → try again
diff --git a/gateway/discord/discord.md b/gateway/discord/discord.md
new file mode 100644
index 0000000..7afbe2f
--- /dev/null
+++ b/gateway/discord/discord.md
@@ -0,0 +1,73 @@
+# Discord — Connector
+
+The Discord module embeds a Discord bot **in-process** alongside FastAPI.
+It uses the same LangGraph graphs and LLM client as the REST API — there is
+no HTTP loopback, no separate process, and no code duplication.
+
+---
+
+## Files
+
+| File | Purpose |
+|---|---|
+| `bot.py` | Discord `Client` subclass (`AgentBot`) — DM handler, command parser, graph invoker, Quick Connect orchestrator |
+| `conversation.py` | In-memory conversation history store, keyed by Discord user ID |
+
+---
+
+## Architecture
+
+```
+Discord Gateway (websocket)
+  │  DM: "What's trending?"
+  ▼
+discord.Client.on_message()
+  │  1. Check: is this a DM? shares a guild? not a command?
+  │  2. Build message history from ConversationStore
+  │  3. Append user message
+  ▼
+_create_agent_graph(agent_id="media-agent")
+  │  Uses the exact same create_agent_graph() from src/graph.py
+  │  as the REST API — same LLM client, same tools, same cache.
+  ▼
+graph.ainvoke({"messages": [...]})
+  │  LangGraph runs agent_node → tool_node → agent_node → END
+  ▼
+Response text → split into ≤2000-char Discord messages → sent to user
+```
+
+---
+
+## Commands
+
+Commands are DMs that start with `/`. The bot parses and handles them before
+anything is sent to the LLM:
+
+| Command | Action |
+|---|---|
+| `/login <service>` | Generate a one-time auth link, DM it to the user |
+| `/jellyfin login` | Alias for `/login jellyfin` |
+| `/help` | Show available agents and commands |
+| `/<agent_id>` | Switch to a different agent for future messages |
+
+---
+
+## Auth Flow (Quick Connect)
+
+When a user types `/login jellyfin`:
+
+1. Bot calls `auth_store.create_link_token(discord_id, "jellyfin")` to generate a one-time token
+2. The token is saved in the SQLite `link_tokens` table and expires after 10 minutes
+3. Bot DMs the user: `https://<BASE_URL>/api/v1/auth/login?service=jellyfin&token=...&discord_id=...`
+4. User clicks the link → FastAPI serves the Jellyfin login form (or Quick Connect prompt)
+5. User authenticates → credentials stored in `auth_store`
+6. Future tool calls (e.g. `watch_history`) automatically use the stored Jellyfin session
+
+---
+
+## Conversation Persistence
+
+- Per-user history stored in `ConversationStore` (in-memory dict)
+- Max history length configurable via `DISCORD_MAX_HISTORY` env var (default: 7)
+- Oldest messages are silently dropped when the limit is exceeded
+- History is NOT persisted across restarts (future: could use SQLite)
diff --git a/gateway/v1/v1.md b/gateway/v1/v1.md
new file mode 100644
index 0000000..a627a02
--- /dev/null
+++ b/gateway/v1/v1.md
@@ -0,0 +1,106 @@
+# V1 — Chat & Agent API Endpoints
+
+This is the primary HTTP API surface for the chatbot agent system. It exposes
+both a custom streaming chat endpoint and an OpenAI-compatible
+`/chat/completions` endpoint so it works as a drop-in backend for OpenWebUI,
+LibreChat, or any OpenAI-compatible client.
+
+---
+
+## Endpoints
+
+| Method | Path | Description |
+|---|---|---|
+| `GET ` | `/v1/` | Health check — returns `{"status": "ok"}` |
+| `GET ` | `/v1/agents` | List all registered agents (id + description) |
+| `GET ` | `/v1/models` | OpenAI-compatible model list (one entry per agent) |
+| `POST` | `/v1/chat` | Chat with an agent — streaming (SSE) |
+| `POST` | `/v1/chat/sync` | Chat with an agent — non-streaming |
+| `POST` | `/v1/chat/completions` | OpenAI-compatible chat completions (supports `stream: true`) |
+
+All `/v1/*` endpoints are mounted by `main.py` via:
+
+```python
+app.include_router(v1_router, prefix="/v1")
+```
+
+---
+
+## Agent Resolution
+
+Each request can target a specific agent. The resolution order is:
+
+1. **Explicit `agent_id`** field in the request body
+2. **OpenAI `model` field** (OpenWebUI sends this — mapped to `agent_id` if a matching agent is registered)
+3. **Fallback** to the `"naked"` agent (a plain LLM with no tools)
+
+This means an OpenWebUI client can simply set `model: "media-agent"` and get
+the full Media Agent with Seerr tools.
+
+---
+
+## Request Flow
+
+```
+Client (OpenWebUI / HTTP)
+  │  POST /v1/chat/completions
+  │  { model: "media-agent", messages: [...], stream: true/false }
+  ▼
+chat_completions()
+  │  1. _resolve_agent(req.model) → Agent(id="media-agent", skills=[...])
+  │  2. get_agent_graph("media-agent", request)
+  │     → lazy-compiled LangGraph StateGraph, cached on app.state
+  │  3. stream=True  → _stream_graph(graph, messages)  → SSE token stream
+  │     stream=False → _invoke_graph(graph, messages)   → plain response
+  ▼
+LangGraph StateGraph  (src/graph.py)
+  │
+  ├── agent_node: calls LLM with system prompt + tool definitions
+  │   └── LLM returns text OR tool_calls
+  │
+  ├── _should_continue: if tool_calls → tool_node, else → END
+  │
+  └── tool_node: executes tool via agents/skills system → ToolMessage
+      └── loops back to agent_node with the result
+```
+
+For the directory map and high-level architecture, see [api.md](../api.md).
+
+---
+
+## Streaming
+
+Two streaming modes exist:
+
+### SSE (Server-Sent Events) — `/v1/chat`
+```
+data: {"token": "Here"}
+data: {"token": " are"}
+data: {"token": " the"}
+...
+data: [DONE]
+```
+
+The graph first runs to completion (tools execute silently); only then is the
+final text emitted in small pieces as SSE events.
+
+### OpenAI-compatible — `/v1/chat/completions` with `stream: true`
+```
+data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
+data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"!"}}]}
+data: [DONE]
+```
+
+> **Future improvement:** true token-level streaming (tokens appear as the LLM
+> generates them) would require using `langchain-openai`'s `ChatOpenAI` in
+> place of the raw `openai` client. The current approach avoids adding that
+> dependency.
+
+---
+
+## Dependencies
+
+Endpoints receive shared singletons via FastAPI `Depends`:
+
+- **`get_llm_client(request)`** → returns `request.app.state.llm_client` (OpenAI client singleton, created once in `main.py`)
+- **`get_agent_graph(agent_id, request)`** → returns a lazy-compiled LangGraph from `request.app.state.agent_graphs`