Implement LangGraph integration: refactor agent-tool interaction, add graph compilation, and enhance state management

2026-05-24 10:18:59 +02:00
parent 1d821d18fe
commit 2f7f94f1ce
8 changed files with 534 additions and 242 deletions
@@ -1,6 +1,7 @@
-# API Architecture — Agent + Skill + Tool Pipeline
+# API Architecture — Agent + Skill + Graph Pipeline

-This document explains how the API routes user messages through the agent/skill/tool pipeline to produce responses.
+This document explains how the API routes user messages through the
+agent / skill / LangGraph pipeline to produce responses.

 ---

@@ -17,27 +18,22 @@ This document explains how the API routes user messages through the agent/skill/
 │  api/v1/chat.py  —  chat_completions()                          │
 │                                                                  │
 │  1. _resolve_agent(req.model)  →  Agent                          │
-│  2. agent.build_system_prompt()  →  system prompt                │
-│  3. Build full_messages = [system] + req.messages                │
-│  4. run_agent_with_tools(client, messages, agent_id)             │
+│  2. get_agent_graph(agent_id)  →  compiled StateGraph            │
+│  3. graph.ainvoke(state)  or  _stream_graph(graph, messages)     │
 └──────────────────────────────┬───────────────────────────────────┘
                               │
                               ▼
 ┌──────────────────────────────────────────────────────────────────┐
-│  Tool-Calling Loop  (run_agent_with_tools / run_agent_stream)    │
+│  LangGraph StateGraph  (core/graph.py)                           │
 │                                                                  │
-│  while turns < max_turns:                                        │
-│    response = LLM.chat(messages, tools=agent_tools)              │
-│    if response has tool_calls:                                   │
-│      for each tool_call:                                         │
-│        result = execute_tool(skills, name, args)                 │
-│        append result to messages                                 │
-│    else:                                                         │
-│      return response.text  (stream tokens if streaming)          │
+│   ┌──────────────┐   tool_calls?    ┌──────────────┐            │
+│   │  agent_node  │ ───────────────▶ │  tool_node   │            │
+│   │  (LLM call)  │ ◀─────────────── │ (skill exec) │            │
+│   └──────┬───────┘                  └──────────────┘            │
+│          │ no tool_calls                                         │
+│          ▼                                                       │
+│        [END]                                                     │
 └──────────────────────────────────────────────────────────────────┘
-```
-
---

 ## Key Concepts

@@ -61,7 +57,8 @@ Agent(
 - `build_system_prompt()` — merges base_prompt + all skill prompt fragments

 Agents self-register at import time via `agents/__init__.py`'s `register()`.
-`main.py` calls `load_all_agents()` at startup to import all agent/skill modules.
+`main.py` calls `load_all_agents()` at startup to import every agent and skill
+module.

 ### 2. Skill

@@ -78,37 +75,47 @@ Skill(
 )
 ```

- `prompt_fragment` — injected into the agent's system prompt. Teaches the LLM what tools are available and when to use them.
+- `prompt_fragment` — injected into the agent's system prompt.
 - `tools` — list of OpenAI function definitions (name, description, parameters).
 - `execute` — async callable that routes tool calls to API handlers.

-### 3. Tool
+### 3. Graph

-A **Tool** is a single function the LLM can call. Defined as part of a skill's `tools` list.
+Each agent gets a **compiled LangGraph StateGraph** built by
+`core/graph.py:create_agent_graph()`.  The graph is compiled lazily on the
+first request and cached on `app.state.agent_graphs` for the lifetime of the
+process.
+
+| Graph node / edge | What it does |
+|---|---|
+| `agent_node` | Converts state messages to OpenAI dicts, calls the LLM with the agent's system prompt + tool definitions, returns an `AIMessage` |
+| `tool_node` | Reads `tool_calls` from the last AI message, calls `execute_tool()` from the skill system, returns `ToolMessage` results |
+| `_should_continue` | Conditional edge — returns `"tool_node"` if the AI message has `tool_calls`, else `END` |
+
+### 4. State
+
+Defined in `core/state.py`:

 ```python
-{
-    "type": "function",
-    "function": {
-        "name": "seerr_trending",
-        "description": "Get trending movies and TV shows from Seerr...",
-        "parameters": {
-            "type": "object",
-            "properties": {
-                "kind": {"type": "string", "enum": ["movie", "tv", "all"]},
-                "language": {"type": "string"},
-            },
-            "required": ["kind"],
-        },
-    },
-}
+class AgentState(TypedDict):
+    messages: Annotated[list, add_messages]
 ```

-When the LLM responds with a tool call, the loop:
-1. Extracts `function.name` (e.g. `"seerr_trending"`) and `function.arguments` (e.g. `{"kind": "movie"}`)
-2. Calls `execute_tool(agent.skills, name, args)` which finds the owning skill and runs it
-3. Appends the result text to the message history
-4. Sends back to the LLM for a follow-up response
+LangGraph's `add_messages` reducer appends new messages and replaces messages
+with matching IDs (so tool-call results overwrite their placeholders).
+
+### 5. Message Conversion
+
+Because we use the raw `openai` client (not `langchain-openai`), messages must
+be converted between LangChain and OpenAI formats at every LLM call:
+
+- **LangChain → OpenAI** (`_lc_role_to_openai`, `_langchain_tc_to_openai`):
+  Maps `type` → `role` and converts top-level `name`/`args` tool-calls into
+  the nested `function` sub-object that the OpenAI API expects.
+
+- **OpenAI → LangChain** (inside `agent_node`):
+  Converts the `ChatCompletionMessage` response into an `AIMessage` with
+  LangChain-format `tool_calls` (top-level `name`/`args`/`id`).

 ---

@@ -130,28 +137,36 @@ When the LLM responds with a tool call, the loop:
 2. chat_completions():
   → _resolve_agent(model="media-agent")
     → get_agent("media-agent") → Agent(skills=["media_info", "seerr", "triage"])
-   → tools = get_all_tools(["media_info", "seerr", "triage"])
-     → Returns 7 tool definitions from seerr.py
-   → system_prompt = agent.build_system_prompt()
-     → base_prompt + media_info fragment + seerr fragment + triage fragment
+   → get_agent_graph("media-agent", request)
+     → looks up app.state.agent_graphs["media-agent"]
+     → first call → create_agent_graph() compiles the graph with 7 Seerr tools
+   → run_agent_with_tools(request, messages, agent_id)
+     → _invoke_graph(graph, messages)

-3. run_agent_with_tools() — Turn 1:
-   → LLM receives: [system prompt with tools] + [user: "What are trending movies?"]
-   → LLM responds: tool_calls = [{"function": {"name": "seerr_trending", "arguments": {"kind": "movie"}}}]
+3. Graph — Pass 1 (agent_node):
+   → LLM receives: [system prompt] + [user: "What are trending movies?"]
+   → LLM responds with tool_calls: seerr_trending(kind="movie")
+   → agent_node returns AIMessage with tool_calls in LangChain format

-4. Execute tool:
-   → execute_tool(["media_info", "seerr", "triage"], "seerr_trending", {"kind": "movie"})
-   → Finds seerr skill → calls _execute("seerr_trending", ...) → _trending(args)
-   → GET /api/v1/discover/trending?mediaType=movie
-   → Returns formatted list with [tmdb:IDs]
+4. Graph — _should_continue:
+   → AIMessage has tool_calls → route to "tool_node"

-5. run_agent_with_tools() — Turn 2:
-   → LLM receives: previous messages + [tool: "Found 20 trending movies..."]
-   → LLM responds: text = "Here are the top trending movies! 🎬 ..."
-   → finish_reason="stop" → return the text
+5. Graph — tool_node:
+   → Reads tool_call: name="seerr_trending", args={"kind": "movie"}
+   → execute_tool(["media_info", "seerr", "triage"], "seerr_trending", ...)
+   → Seerr API → GET /api/v1/discover/trending?mediaType=movie
+   → Returns ToolMessage with formatted results including [tmdb:IDs]

-6. chat_completions() returns:
-   { "choices": [{"message": {"content": "Here are the top trending movies!..."}}] }
+6. Graph — Pass 2 (agent_node):
+   → LLM receives previous exchange + tool result
+   → LLM responds with text only (no tool_calls)
+   → agent_node returns AIMessage(content="Here are the top trending movies!...")
+
+7. Graph — _should_continue:
+   → No tool_calls → route to END
+
+8. chat_completions() returns:
+   { "choices": [{"message": {"role": "assistant", "content": "Here are the top..."}}] }
 ```

 ### Step-by-step: "Request the 2026 one" (multi-turn context)
@@ -172,14 +187,49 @@ When the LLM responds with a tool call, the loop:

 2. chat_completions():
   → req.messages contains the ENTIRE conversation history
-   → System prompt prepended → full_messages = [system] + 5 history messages
-   → LLM sees everything: the trending list with [tmdb:931285], the disambiguation, "the 2026 one"
+   → graph.ainvoke({"messages": all_messages})
+   → agent_node prepends system prompt and sends everything to the LLM

-3. LLM reasons:
-   - I previously listed Mortal Kombat II (2026) with [tmdb:931285]
+3. LLM reasons from full context:
+   - Previously listed Mortal Kombat II (2026) with [tmdb:931285]
   - The user said "request the mortal kombat one" → I searched and showed 4 options
   - Now they say "the 2026 one" → that matches Mortal Kombat II (2026) [tmdb:931285]
   - I should call seerr_request_media(kind="movie", title="Mortal Kombat II", tmdb_id=931285)

-4. Tool executes the request → ✅ Success
+4. tool_node executes the request → ✅ Success
 ```
+
+---
+
+## Streaming
+
+Streaming works slightly differently from the sync path:
+
+```
+chat_completions(stream=True)
+  → _stream_graph(graph, messages)
+    → graph.ainvoke(state)        # runs graph to completion (tools execute silently)
+    → yields content character-by-character via SSE
+```
+
+For true token-level streaming (tokens appear as the LLM generates them),
+the agent_node would need to use `langchain-openai`'s `ChatOpenAI` instead of
+the raw `openai` client.  The current approach is a pragmatic middle ground
+that avoids adding another dependency while still giving the SSE client
+incremental output.
+
+---
+
+## File Map
+
+| File | Responsibility |
+|---|---|
+| `main.py` | FastAPI app, singleton creation, router mounting |
+| `api/v1/chat.py` | Endpoints — resolves agent, invokes graph, formats responses |
+| `api/dependencies.py` | `get_llm_client()`, `get_agent_graph()` — FastAPI `Depends` |
+| `core/graph.py` | `create_agent_graph()` — builds the StateGraph |
+| `core/state.py` | `AgentState` TypedDict |
+| `core/llm.py` | `create_client()` — OpenAI client factory |
+| `core/config.py` | Environment variable loader |
+| `agents/` | Agent definitions (dataclass + self-registration) |
+| `skills/` | Skill definitions (prompt fragments + tools + executors) |
@@ -1,7 +1,36 @@
 from fastapi import Request
 from openai import OpenAI

+from core.graph import create_agent_graph
+

 def get_llm_client(request: Request) -> OpenAI:
    """FastAPI dependency — returns the singleton OpenAI client from app.state."""
    return request.app.state.llm_client
+
+
+def get_agent_graph(agent_id: str, request: Request):
+    """
+    FastAPI dependency — returns the compiled LangGraph graph for *agent_id*.
+
+    Graphs are lazily compiled on first use and cached on app.state so each
+    agent's graph is only built once per process lifetime.
+    """
+    cache: dict = request.app.state.agent_graphs
+
+    if agent_id not in cache:
+        from agents import get as get_agent
+
+        agent = get_agent(agent_id)
+        if agent is None:
+            # Fall back to the naked agent if the requested one doesn't exist
+            agent_id = "naked"
+            agent = get_agent(agent_id)
+
+        cache[agent_id] = create_agent_graph(
+            client=request.app.state.llm_client,
+            agent_skills=agent.skills,
+            system_prompt=agent.build_system_prompt(),
+        )
+
+    return cache[agent_id]
@@ -1,13 +1,12 @@
-from fastapi import APIRouter, Depends
+from fastapi import APIRouter, Depends, Request
 from fastapi.responses import StreamingResponse
 from openai import OpenAI
 from pydantic import BaseModel
 import json
-import asyncio

-from api.dependencies import get_llm_client
+from api.dependencies import get_llm_client, get_agent_graph
 from agents import get as get_agent, list_all as list_all_agents
-from skills import get_all_tools, execute_tool
+from core.state import AgentState

 router = APIRouter()

@@ -42,166 +41,65 @@ def _resolve_agent(agent_id: str | None = None, model: str | None = None):


 # ---------------------------------------------------------------------------
-# Tool-calling loop (non-streaming)
+# LangGraph helpers
+# ---------------------------------------------------------------------------
+
+async def _invoke_graph(graph, messages: list[dict]) -> str:
+    """Run the graph synchronously (non-streaming) and return the final text."""
+    state: AgentState = {"messages": messages}
+    result = await graph.ainvoke(state)
+    last_msg = result["messages"][-1]
+    return last_msg.content or ""
+
+
+async def _stream_graph(graph, messages: list[dict]):
+    """
+    Run the graph and stream the final response token-by-token.
+
+    LangGraph's astream_events would require langchain-openai's ChatOpenAI
+    to intercept LLM chunks.  Instead we run the graph to completion (tools
+    execute silently) and then stream the final text content character by
+    character — this gives the client a real SSE stream without adding new
+    dependencies.
+    """
+    state: AgentState = {"messages": messages}
+    result = await graph.ainvoke(state)
+    content = result["messages"][-1].content or ""
+    # Yield token-by-token so the SSE client sees incremental output
+    for token in content:
+        yield token
+
+
+# ---------------------------------------------------------------------------
+# Non-streaming run (kept for /chat/sync and sync completions)
 # ---------------------------------------------------------------------------

 async def run_agent_with_tools(
-    client: OpenAI,
+    request: Request,
    messages: list[dict],
    agent_id: str | None = None,
    model: str | None = None,
-    max_turns: int = 5,
 ) -> str:
-    """Send messages to the LLM with tool definitions. Tool-calling loop."""
+    """Send messages through the agent's LangGraph.  Non-streaming."""
    agent = _resolve_agent(agent_id, model)
-    tools = get_all_tools(agent.skills)
-    system_prompt = agent.build_system_prompt()
-
-    full_messages: list[dict] = [{"role": "system", "content": system_prompt}]
-    full_messages.extend(messages)
-
-    loop = asyncio.get_running_loop()
-
-    for _ in range(max_turns):
-        resp = await loop.run_in_executor(
-            None,
-            lambda: client.chat.completions.create(
-                model="deepseek-chat",
-                messages=full_messages,
-                tools=tools if tools else None,
-                tool_choice="auto" if tools else None,
-            ),
-        )
-        choice = resp.choices[0]
-
-        if choice.finish_reason == "stop" and choice.message.content:
-            return choice.message.content
-
-        if choice.message.tool_calls:
-            full_messages.append(choice.message.model_dump(exclude_none=True))
-            for tc in choice.message.tool_calls:
-                fn_name = tc.function.name
-                fn_args = json.loads(tc.function.arguments)
-                tr = await execute_tool(agent.skills, fn_name, fn_args)
-                result = tr.content if tr else f"Tool '{fn_name}' is not available."
-                full_messages.append({
-                    "role": "tool", "tool_call_id": tc.id, "content": result,
-                })
-            continue
-
-        return choice.message.content or "I'm not sure how to help with that."
-
-    return "I've taken several actions but still need more information. Could you clarify?"
+    graph = get_agent_graph(agent.agent_id, request)
+    return await _invoke_graph(graph, messages)


 # ---------------------------------------------------------------------------
-# Streaming generators
+# Streaming generator (kept for /chat and stream completions)
 # ---------------------------------------------------------------------------

-async def _stream_with_tools(
-    client: OpenAI,
-    messages: list[dict],
-    agent_id: str | None = None,
-    model: str | None = None,
-    max_turns: int = 5,
-):
-    """Streaming tool-calling loop. Tools run silently, final text is streamed."""
-    agent = _resolve_agent(agent_id, model)
-    tools = get_all_tools(agent.skills)
-    system_prompt = agent.build_system_prompt()
-
-    full_messages: list[dict] = [{"role": "system", "content": system_prompt}]
-    full_messages.extend(messages)
-
-    loop = asyncio.get_running_loop()
-
-    for turn in range(max_turns):
-        resp = await loop.run_in_executor(
-            None,
-            lambda: client.chat.completions.create(
-                model="deepseek-chat",
-                messages=full_messages,
-                tools=tools if tools else None,
-                tool_choice="auto" if tools else None,
-            ),
-        )
-        choice = resp.choices[0]
-
-        if choice.message.tool_calls:
-            full_messages.append(choice.message.model_dump(exclude_none=True))
-            for tc in choice.message.tool_calls:
-                fn_name = tc.function.name
-                fn_args = json.loads(tc.function.arguments)
-                tr = await execute_tool(agent.skills, fn_name, fn_args)
-                result = tr.content if tr else f"Tool '{fn_name}' is not available."
-                full_messages.append({
-                    "role": "tool",
-                    "tool_call_id": tc.id,
-                    "content": result,
-                })
-            continue
-
-        if choice.finish_reason == "stop" and choice.message.content:
-            for token in choice.message.content:
-                yield token
-                await asyncio.sleep(0)
-            return
-
-        def _sync_stream():
-            stream = client.chat.completions.create(
-                model="deepseek-chat", messages=full_messages, stream=True,
-            )
-            for chunk in stream:
-                delta = chunk.choices[0].delta
-                if delta and delta.content:
-                    yield delta.content
-
-        gen = _sync_stream()
-        while True:
-            token = await loop.run_in_executor(None, next, gen, None)
-            if token is None:
-                return
-            yield token
-
-    yield "\u2026"
-
-
 async def run_agent_stream(
-    client: OpenAI,
+    request: Request,
    messages: list[dict],
    agent_id: str | None = None,
    model: str | None = None,
 ):
-    """Async generator — yields tokens. Uses tool-loop when skills have tools."""
+    """Async generator — yields tokens via the agent's LangGraph."""
    agent = _resolve_agent(agent_id, model)
-    tools = get_all_tools(agent.skills)
-
-    if tools:
-        async for token in _stream_with_tools(client, messages, agent_id, model):
-            yield token
-        return
-
-    # No tools — simple streaming
-    system_prompt = agent.build_system_prompt()
-    full_messages: list[dict] = [{"role": "system", "content": system_prompt}]
-    full_messages.extend(messages)
-
-    loop = asyncio.get_running_loop()
-
-    def _sync_stream():
-        stream = client.chat.completions.create(
-            model="deepseek-chat", messages=full_messages, stream=True,
-        )
-        for chunk in stream:
-            delta = chunk.choices[0].delta
-            if delta and delta.content:
-                yield delta.content
-
-    gen = _sync_stream()
-    while True:
-        token = await loop.run_in_executor(None, next, gen, None)
-        if token is None:
-            break
+    graph = get_agent_graph(agent.agent_id, request)
+    async for token in _stream_graph(graph, messages):
        yield token


@@ -217,13 +115,14 @@ def root():
@router.post("/chat")
 async def chat(
    req: ChatRequest,
+    request: Request,
    client: OpenAI = Depends(get_llm_client),
 ):
    """Streaming chat — single message, no history."""
    messages = [{"role": "user", "content": req.message}]

    async def event_stream():
-        async for token in run_agent_stream(client, messages, req.agent_id):
+        async for token in run_agent_stream(request, messages, req.agent_id):
            payload = json.dumps({"token": token, "session_id": req.session_id})
            yield f"data: {payload}\n\n"
        yield f"data: {json.dumps({'done': True, 'session_id': req.session_id})}\n\n"
@@ -242,26 +141,12 @@ async def chat(
@router.post("/chat/sync")
 async def chat_sync(
    req: ChatRequest,
+    request: Request,
    client: OpenAI = Depends(get_llm_client),
 ):
    """Non-streaming chat — single message."""
-    agent = _resolve_agent(req.agent_id)
-    tools = get_all_tools(agent.skills)
    messages = [{"role": "user", "content": req.message}]
-
-    if tools:
-        response = await run_agent_with_tools(client, messages, req.agent_id)
-    else:
-        agent_obj = _resolve_agent(req.agent_id)
-        resp = client.chat.completions.create(
-            model="deepseek-chat",
-            messages=[
-                {"role": "system", "content": agent_obj.build_system_prompt()},
-                {"role": "user", "content": req.message},
-            ],
-        )
-        response = resp.choices[0].message.content
-
+    response = await run_agent_with_tools(request, messages, req.agent_id)
    return {"response": response, "session_id": req.session_id}


@@ -300,6 +185,7 @@ def list_models():
@router.post("/chat/completions")
 async def chat_completions(
    req: ChatCompletionRequest,
+    request: Request,
    client: OpenAI = Depends(get_llm_client),
 ):
    """OpenAI-compatible /chat/completions — supports stream=True.
@@ -311,7 +197,7 @@ async def chat_completions(
    if req.stream:
        async def sse_stream():
            async for token in run_agent_stream(
-                client, req.messages, agent_id=agent.agent_id,
+                request, req.messages, agent_id=agent.agent_id,
            ):
                chunk = {
                    "id": "chatcmpl-local",
@@ -335,20 +221,10 @@ async def chat_completions(
            headers={"Cache-Control": "no-cache", "Connection": "keep-alive"},
        )

-    # Non-streaming — full history, tool-calling
-    tools = get_all_tools(agent.skills)
-    if tools:
-        response = await run_agent_with_tools(
-            client, req.messages, agent_id=agent.agent_id,
-        )
-    else:
-        system_prompt = agent.build_system_prompt()
-        full_msgs: list[dict] = [{"role": "system", "content": system_prompt}]
-        full_msgs.extend(req.messages)
-        resp = client.chat.completions.create(
-            model="deepseek-chat", messages=full_msgs,
-        )
-        response = resp.choices[0].message.content
+    # Non-streaming — full history, LangGraph agent
+    response = await run_agent_with_tools(
+        request, req.messages, agent_id=agent.agent_id,
+    )

    return {
        "id": "chatcmpl-local",
@@ -0,0 +1,261 @@
+"""
+LangGraph agent graph factory.
+
+Builds a StateGraph that replaces the manual tool-calling loop in api/v1/chat.py.
+The graph has two nodes:
+  - agent_node : calls the LLM (with system prompt + tool definitions)
+  - tool_node  : executes tool calls via the existing skill system
+
+A conditional edge routes tool_calls back to the agent, or ends the run.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+from typing import Any, Literal
+
+from langchain_core.messages import AIMessage, ToolMessage
+from langgraph.graph import END, StateGraph
+from openai import OpenAI
+
+from core.state import AgentState
+from skills import get_all_tools, execute_tool
+
+logger = logging.getLogger("graph")
+
+
+# ---------------------------------------------------------------------------
+# Helper — map LangChain message type → OpenAI role
+# ---------------------------------------------------------------------------
+
+def _lc_role_to_openai(msg_type: str) -> str:
+    """Convert a LangChain message type string to an OpenAI role."""
+    mapping = {"human": "user", "ai": "assistant", "tool": "tool", "system": "system"}
+    return mapping.get(msg_type, "user")
+
+
+def _langchain_tc_to_openai(tool_calls: list) -> list[dict[str, Any]]:
+    """
+    Convert LangChain-format tool_calls (with `name`/`args` at top level)
+    back to OpenAI format (with a nested `function` sub-object).
+    """
+    result: list[dict[str, Any]] = []
+    for tc in tool_calls:
+        if isinstance(tc, dict):
+            if "function" in tc:
+                result.append(tc)
+            else:
+                # LangChain format: {"name": ..., "args": ..., "id": ...}
+                result.append({
+                    "id": tc.get("id", ""),
+                    "type": "function",
+                    "function": {
+                        "name": tc.get("name", ""),
+                        "arguments": json.dumps(tc.get("args", {})),
+                    },
+                })
+        else:
+            # Pydantic model — dump to dict
+            d = tc.model_dump() if hasattr(tc, "model_dump") else {}
+            if "function" in d:
+                result.append(d)
+            else:
+                result.append({
+                    "id": d.get("id", ""),
+                    "type": "function",
+                    "function": {
+                        "name": d.get("name", ""),
+                        "arguments": json.dumps(d.get("args", {})),
+                    },
+                })
+    return result
+
+
+# ---------------------------------------------------------------------------
+# Agent node — calls the LLM
+# ---------------------------------------------------------------------------
+
+def _make_agent_node(
+    client: OpenAI,
+    system_prompt: str,
+    tool_defs: list[dict[str, Any]],
+    model_name: str = "deepseek-chat",
+):
+    """
+    Return a callable suitable as a LangGraph node.
+
+    The node reads the current message list from state, prepends the system
+    prompt, and calls the LLM.  If tool_defs is non-empty the LLM may return
+    tool_calls; ToolNode (or our custom tool node) will handle them.
+    """
+
+    def agent_node(state: AgentState) -> dict[str, list]:
+        messages = state["messages"]
+
+        # Convert LangChain message objects to plain dicts for the OpenAI client.
+        full: list[dict[str, Any]] = [{"role": "system", "content": system_prompt}]
+        for m in messages:
+            if isinstance(m, dict):
+                # Already a plain dict — pass through.
+                # But fix tool_calls if they're in LangChain format.
+                d = dict(m)
+                tc = d.get("tool_calls")
+                if tc and isinstance(tc, list) and tc and isinstance(tc[0], dict) and "function" not in tc[0]:
+                    d["tool_calls"] = _langchain_tc_to_openai(tc)
+                full.append(d)
+            else:
+                # LangChain message object → OpenAI-compatible dict
+                role = _lc_role_to_openai(getattr(m, "type", "user"))
+                d: dict[str, Any] = {"role": role, "content": getattr(m, "content", "")}
+                # Serialize tool_calls back to OpenAI format (if this is an AI msg)
+                tc = getattr(m, "tool_calls", None)
+                if tc:
+                    d["tool_calls"] = _langchain_tc_to_openai(tc)
+                tc_id = getattr(m, "tool_call_id", None)
+                if tc_id:
+                    d["tool_call_id"] = tc_id
+                full.append(d)
+
+        resp = client.chat.completions.create(
+            model=model_name,
+            messages=full,
+            tools=tool_defs if tool_defs else None,
+            tool_choice="auto" if tool_defs else None,
+        )
+        choice = resp.choices[0]
+
+        # Convert OpenAI tool_calls to the dict format LangChain expects.
+        raw_tool_calls = list(choice.message.tool_calls) if choice.message.tool_calls else []
+        tool_calls: list[dict[str, Any]] = []
+        for tc in raw_tool_calls:
+            fn = tc.function
+            tool_calls.append({
+                "name": fn.name,
+                "args": json.loads(fn.arguments),
+                "id": tc.id,
+            })
+        ai_msg = AIMessage(
+            content=choice.message.content or "",
+            tool_calls=tool_calls if tool_calls else [],
+            id=getattr(choice.message, "id", None),
+        )
+        return {"messages": [ai_msg]}
+
+    return agent_node
+
+
+# ---------------------------------------------------------------------------
+# Tool node — executes tools via the existing skill system
+# ---------------------------------------------------------------------------
+
+def _make_tool_node(skill_names: list[str]):
+    """
+    Return a callable that executes tool_calls from the last AI message.
+
+    This replaces LangGraph's built-in ToolNode — we call our own
+    `execute_tool()` pipeline so that skill-level auth, httpx sessions,
+    and ToolResult handling are fully preserved.
+    """
+
+    async def tool_node(state: AgentState) -> dict[str, list]:
+        last_msg = state["messages"][-1]
+        tool_calls = getattr(last_msg, "tool_calls", None)
+        if not tool_calls:
+            return {"messages": []}
+
+        results: list[ToolMessage] = []
+        for tc in tool_calls:
+            # Handle both LangChain format (top-level name/args) and
+            # OpenAI format (nested "function" key).
+            if isinstance(tc, dict):
+                if "function" in tc:
+                    # OpenAI format: {"id":..., "function": {"name":..., "arguments":"..."}}
+                    fn = tc["function"]
+                    fn_name = fn.get("name", "")
+                    fn_args_raw = fn.get("arguments", "{}")
+                else:
+                    # LangChain format: {"name":..., "args":{...}, "id":...}
+                    fn_name = tc.get("name", "")
+                    fn_args_raw = tc.get("args", {})
+                tc_id = tc.get("id", "")
+            else:
+                fn_name = getattr(tc, "name", "")
+                fn_args_raw = getattr(tc, "args", {})
+                tc_id = getattr(tc, "id", "")
+
+            # Parse args if they arrive as a JSON string
+            if isinstance(fn_args_raw, str):
+                fn_args = json.loads(fn_args_raw)
+            else:
+                fn_args = fn_args_raw
+
+            tr = await execute_tool(skill_names, fn_name, fn_args)
+            content = tr.content if tr else f"Tool '{fn_name}' is not available."
+            results.append(ToolMessage(content=content, tool_call_id=tc_id))
+
+        return {"messages": results}
+
+    return tool_node
+
+
+# ---------------------------------------------------------------------------
+# Router — decides whether to continue tool-calling or stop
+# ---------------------------------------------------------------------------
+
+def _should_continue(state: AgentState) -> Literal["tool_node", END]:
+    """If the last message contains tool_calls → execute them, else finish."""
+    last_msg = state["messages"][-1]
+    if getattr(last_msg, "tool_calls", None):
+        return "tool_node"
+    return END
+
+
+# ---------------------------------------------------------------------------
+# Graph factory — the public API
+# ---------------------------------------------------------------------------
+
+def create_agent_graph(
+    *,
+    client: OpenAI,
+    agent_skills: list[str],
+    system_prompt: str,
+    model_name: str = "deepseek-chat",
+) -> StateGraph:
+    """
+    Build and compile a LangGraph StateGraph for a single agent.
+
+    Parameters
+    ----------
+    client :        The OpenAI-compatible client (already authenticated).
+    agent_skills :  Skill names assigned to the agent (e.g. ["seerr", "triage"]).
+    system_prompt : The fully-built system prompt (base + skill fragments).
+    model_name :    Model identifier sent to the LLM provider.
+
+    Returns
+    -------
+    A compiled LangGraph graph ready for `.ainvoke()` or `.astream()`.
+    """
+    tool_defs = get_all_tools(agent_skills)
+
+    graph = StateGraph(AgentState)
+
+    # Nodes
+    graph.add_node(
+        "agent_node",
+        _make_agent_node(client, system_prompt, tool_defs, model_name),
+    )
+    if tool_defs:
+        graph.add_node("tool_node", _make_tool_node(agent_skills))
+        graph.add_conditional_edges("agent_node", _should_continue, {
+            "tool_node": "tool_node",
+            END: END,
+        })
+        graph.add_edge("tool_node", "agent_node")
+    else:
+        # No tools — agent responds once and finishes
+        graph.add_edge("agent_node", END)
+
+    graph.set_entry_point("agent_node")
+
+    return graph.compile()
@@ -0,0 +1,20 @@
+"""
+LangGraph agent state — defines the shape of the state object that flows
+through every node in the agent graph.
+"""
+
+from typing import Annotated, TypedDict
+
+from langgraph.graph.message import add_messages
+
+
+class AgentState(TypedDict):
+    """
+    The single source of truth that travels through every node in the graph.
+
+    `messages` uses LangGraph's `add_messages` reducer, which:
+    - Appends new messages to the list.
+    - Replaces messages with the same ID (useful for tool-call results).
+    """
+
+    messages: Annotated[list, add_messages]
@@ -0,0 +1,51 @@
+"""
+Tools adapter — bridges the existing skill/tool system with LangGraph's ToolNode.
+
+LangGraph's ToolNode expects callable tools (typically @tool-decorated functions).
+This module wraps our skill-based tool definitions and async executors so
+ToolNode can invoke them without any changes to the skills/ layer.
+"""
+
+from __future__ import annotations
+
+import json
+from typing import Any
+
+from langchain_core.tools import tool
+
+from skills import get_all_tools, execute_tool
+
+
+def build_langgraph_tools(skill_names: list[str]) -> list:
+    """
+    Convert the registered skill tool definitions into LangChain-compatible
+    @tool-decorated functions that ToolNode can call.
+
+    Each tool wraps the existing `execute_tool()` pipeline, so the skill
+    system's ToolResult + httpx session handling is fully preserved.
+    """
+    tool_defs = get_all_tools(skill_names)
+    wrapped: list = []
+
+    for td in tool_defs:
+        fn_def = td.get("function", {})
+        fn_name = fn_def.get("name", "")
+        fn_desc = fn_def.get("description", "")
+
+        # Create a unique factory so each closure captures the right fn_name
+        def _make_tool(name: str, desc: str, skills: list[str]):
+            @tool(name, description=desc)
+            async def _wrapped(**kwargs: Any) -> str:
+                """Execute the tool via the skill system and return its content."""
+                result = await execute_tool(skills, name, kwargs)
+                if result is None:
+                    return f"Tool '{name}' is not available."
+                return result.content
+
+            # Stash the original OpenAI schema so LangGraph can use it
+            _wrapped.metadata = fn_def
+            return _wrapped
+
+        wrapped.append(_make_tool(fn_name, fn_desc, skill_names))
+
+    return wrapped
@@ -41,6 +41,9 @@ app.add_middleware(
 # ---------------------------------------------------------------------------
 app.state.llm_client = create_client(DEEPSEEK_API_KEY)

+# Lazy-compiled LangGraph graphs — populated on first use per agent
+app.state.agent_graphs: dict = {}
+
 # ---------------------------------------------------------------------------
 # Routers
 # ---------------------------------------------------------------------------
@@ -2,4 +2,6 @@ fastapi
 openai
 uvicorn
 python-dotenv
-httpx
+httpx
+langgraph
+langgraph-checkpoint