Middleware - Docs by LangChain

Middleware provides tighter control over what happens inside the agent. The core agent loop involves calling a model, letting it choose tools to execute, and finishing when it calls no more tools.

Middleware exposes hooks before and after each of those steps.

What can middleware do?

Monitor

Track agent behavior with logging, analytics, and debugging

Modify

Transform prompts, tool selection, and output formatting

Control

Add retries, fallbacks, and early termination logic

Enforce

Apply rate limits, guardrails, and PII detection

Add middleware by passing it to create_agent:

from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware, HumanInTheLoopMiddleware

agent = create_agent(
    model="gpt-4o",
    tools=[...],
    middleware=[
        SummarizationMiddleware(...),
        HumanInTheLoopMiddleware(...)
    ],
)

Refer to the docs below for a list of parameters and configuration details for each type of middleware.

Built-in middleware

LangChain provides prebuilt middleware for common use cases:

Summarization

Automatically summarize conversation history when approaching token limits.

Perfect for:

Long-running conversations that exceed context windows
Multi-turn dialogues with extensive history
Applications where preserving full conversation context matters

The summarization middleware monitors message token counts and automatically summarizes older messages when thresholds are reached. You can configure when summarization triggers and how much context to preserve.

Trigger conditions control when summarization runs. You can specify:

A single condition object (all properties must be met - AND logic)
An array of conditions (any condition must be met - OR logic)

Each condition can use fraction (of model’s context size), tokens (absolute count), or messages (message count). At least one property must be specified per condition.

Keep conditions control how much context to preserve after summarization. Specify exactly one of:

fraction - Fraction of model’s context size to keep
tokens - Absolute token count to keep
messages - Number of recent messages to keep

from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware


# Single condition: trigger if tokens >= 4000 AND messages >= 10
agent = create_agent(
    model="gpt-4o",
    tools=[weather_tool, calculator_tool],
    middleware=[
        SummarizationMiddleware(
            model="gpt-4o-mini",
            trigger={"tokens": 4000, "messages": 10},
            keep={"messages": 20},
        ),
    ],
)

# Multiple conditions: trigger if (tokens >= 5000 AND messages >= 3) OR (tokens >= 3000 AND messages >= 6)
agent2 = create_agent(
    model="gpt-4o",
    tools=[weather_tool, calculator_tool],
    middleware=[
        SummarizationMiddleware(
            model="gpt-4o-mini",
            trigger=[
                {"tokens": 5000, "messages": 3},
                {"tokens": 3000, "messages": 6},
            ],
            keep={"messages": 20},
        ),
    ],
)

# Using fractional limits based on model context size
agent3 = create_agent(
    model="gpt-4o",
    tools=[weather_tool, calculator_tool],
    middleware=[
        SummarizationMiddleware(
            model="gpt-4o-mini",
            trigger={"fraction": 0.8},  # Trigger at 80% of context size
            keep={"fraction": 0.3},  # Keep 30% of context size
        ),
    ],
)
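
The AND/OR trigger semantics described above can be sketched in plain Python. This is an illustrative model only, not LangChain's implementation: the `condition_met` and `should_summarize` helpers, their arguments, the default `context_size`, and the `>=` comparison are all assumptions (the comparison follows the comments in the examples above).

```python
def condition_met(cond, tokens, messages, context_size):
    """One condition dict: every property it specifies must be met (AND)."""
    checks = []
    if "tokens" in cond:
        checks.append(tokens >= cond["tokens"])
    if "messages" in cond:
        checks.append(messages >= cond["messages"])
    if "fraction" in cond:
        checks.append(tokens >= cond["fraction"] * context_size)
    if not checks:
        raise ValueError("A condition must specify at least one property")
    return all(checks)


def should_summarize(trigger, tokens, messages, context_size=128_000):
    """A list of conditions: any single condition being met is enough (OR)."""
    if trigger is None:
        # No trigger configured: summarization never fires automatically.
        return False
    conditions = [trigger] if isinstance(trigger, dict) else trigger
    return any(condition_met(c, tokens, messages, context_size) for c in conditions)
```

With `trigger={"tokens": 4000, "messages": 10}`, a conversation of 5000 tokens and 5 messages does not trigger under this model, because the message count is not yet met.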

Configuration options

model

string | BaseChatModel

required

Model for generating summaries. Can be a model identifier string (e.g., 'openai:gpt-4o-mini') or a BaseChatModel instance. See init_chat_model for more information.

trigger

dict | list[dict]

Conditions for triggering summarization. Can be:

A single condition dict (all properties must be met - AND logic)
A list of condition dicts (any condition must be met - OR logic)

Each condition can include:

fraction (float): Fraction of model’s context size (0-1)
tokens (int): Absolute token count
messages (int): Message count

At least one property must be specified per condition. If not provided, summarization will not trigger automatically.

keep

dict

default:"{messages: 20}"

How much context to preserve after summarization. Specify exactly one of:

fraction (float): Fraction of model’s context size to keep (0-1)
tokens (int): Absolute token count to keep
messages (int): Number of recent messages to keep

token_counter

function

Custom token counting function. Defaults to character-based counting.

summary_prompt

string

Custom prompt template for summarization. Uses built-in template if not specified. The template should include {messages} placeholder where conversation history will be inserted.

trim_tokens_to_summarize

number

default:"4000"

Maximum number of tokens to include when generating the summary. Messages will be trimmed to fit this limit before summarization.

summary_prefix

string

Prefix to add to the summary message. If not provided, a default prefix is used.

max_tokens_before_summary

number

deprecated

Deprecated: Use trigger: {"tokens": value} instead. Token threshold for triggering summarization.

messages_to_keep

number

deprecated

Deprecated: Use keep: {"messages": value} instead. Recent messages to preserve.

Human-in-the-loop

Pause agent execution for human approval, editing, or rejection of tool calls before they execute.

Perfect for:

High-stakes operations requiring human approval (database writes, financial transactions)
Compliance workflows where human oversight is mandatory
Long-running conversations where human feedback is used to guide the agent

from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware
from langgraph.checkpoint.memory import InMemorySaver


agent = create_agent(
    model="gpt-4o",
    tools=[read_email_tool, send_email_tool],
    checkpointer=InMemorySaver(),
    middleware=[
        HumanInTheLoopMiddleware(
            interrupt_on={
                # Require approval, editing, or rejection for sending emails
                "send_email_tool": {
                    "allowed_decisions": ["approve", "edit", "reject"],
                },
                # Auto-approve reading emails
                "read_email_tool": False,
            }
        ),
    ],
)

Configuration options

interrupt_on

dict

required

Mapping of tool names to approval configs. Values can be True (interrupt with default config), False (auto-approve), or an InterruptOnConfig object.

description_prefix

string

default:"Tool execution requires approval"

Prefix for action request descriptions

InterruptOnConfig options:

allowed_decisions

list[string]

List of allowed decisions: 'approve', 'edit', or 'reject'

description

string | callable

Static string or callable function for custom description

Important: Human-in-the-loop middleware requires a checkpointer to maintain state across interruptions.

See the human-in-the-loop documentation for complete examples and integration patterns.

Anthropic prompt caching

Reduce costs by caching repetitive prompt prefixes with Anthropic models.

Perfect for:

Applications with long, repeated system prompts
Agents that reuse the same context across invocations
Reducing API costs for high-volume deployments

Learn more about Anthropic Prompt Caching strategies and limitations.

from langchain_anthropic import ChatAnthropic
from langchain_anthropic.middleware import AnthropicPromptCachingMiddleware
from langchain.agents import create_agent
from langchain_core.messages import HumanMessage


LONG_PROMPT = """
Please be a helpful assistant.

<Lots more context ...>
"""

agent = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-5-20250929"),
    system_prompt=LONG_PROMPT,
    middleware=[AnthropicPromptCachingMiddleware(ttl="5m")],
)

# cache store
agent.invoke({"messages": [HumanMessage("Hi, my name is Bob")]})

# cache hit, system prompt is cached
agent.invoke({"messages": [HumanMessage("What's my name?")]})

Configuration options

type

string

default:"ephemeral"

Cache type. Only 'ephemeral' is currently supported.

ttl

string

default:"5m"

Time to live for cached content. Valid values: '5m' or '1h'

min_messages_to_cache

number

default:"0"

Minimum number of messages before caching starts

unsupported_model_behavior

string

default:"warn"

Behavior when using non-Anthropic models. Options: 'ignore', 'warn', or 'raise'

Model call limit

Limit the number of model calls to prevent infinite loops or excessive costs.

Perfect for:

Preventing runaway agents from making too many API calls
Enforcing cost controls on production deployments
Testing agent behavior within specific call budgets

from langchain.agents import create_agent
from langchain.agents.middleware import ModelCallLimitMiddleware


agent = create_agent(
    model="gpt-4o",
    tools=[],  # Add tools as needed
    middleware=[
        ModelCallLimitMiddleware(
            thread_limit=10,  # Max 10 calls per thread (across runs)
            run_limit=5,  # Max 5 calls per run (single invocation)
            exit_behavior="end",  # Or "error" to raise exception
        ),
    ],
)

Configuration options

thread_limit

number

Maximum model calls across all runs in a thread. Defaults to no limit.

run_limit

number

Maximum model calls per single invocation. Defaults to no limit.

exit_behavior

string

default:"end"

Behavior when limit is reached. Options: 'end' (graceful termination) or 'error' (raise exception)

Tool call limit

Control agent execution by limiting the number of tool calls, either globally across all tools or for specific tools.

Perfect for:

Preventing excessive calls to expensive external APIs
Limiting web searches or database queries
Enforcing rate limits on specific tool usage
Protecting against runaway agent loops

By default, limits apply globally across all tools. To limit a specific tool instead, set tool_name. For each limit, specify one or both of:

Thread limit (thread_limit) - Max calls across all runs in a conversation. Persists across invocations. Requires a checkpointer.
Run limit (run_limit) - Max calls per single invocation. Resets each turn.
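
The difference between the two limits can be modeled with two counters: one that survives across invocations and one that resets at the start of each run. The `CallBudget` class below is a hypothetical plain-Python sketch of that idea, not the middleware's actual implementation. In particular, it assumes blocked calls are not counted, which the text above does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallBudget:
    thread_limit: Optional[int] = None  # None means no thread limit
    run_limit: Optional[int] = None     # None means no run limit
    thread_count: int = 0               # persists across runs (checkpointed state)
    run_count: int = 0                  # reset for every invocation

    def start_run(self) -> None:
        """Called at the start of each invocation: only the run counter resets."""
        self.run_count = 0

    def try_call(self) -> bool:
        """Return True and count the call if both limits still allow it."""
        if self.thread_limit is not None and self.thread_count >= self.thread_limit:
            return False
        if self.run_limit is not None and self.run_count >= self.run_limit:
            return False
        self.thread_count += 1
        self.run_count += 1
        return True
```

With `thread_limit=5` and `run_limit=3`, the first run allows three calls; the second run allows only two more before the thread limit is reached.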

Exit behaviors:

Behavior	Effect	Best For
`'continue'` (default)	Blocks exceeded calls with error messages, agent continues	Most use cases - agent handles limits gracefully
`'error'`	Raises exception immediately	Complex workflows where you want to handle the limit error manually
`'end'`	Stops with ToolMessage + AI message	Single-tool scenarios (errors if other tools pending)

from langchain.agents import create_agent
from langchain.agents.middleware import ToolCallLimitMiddleware


# Global limit: max 20 calls per thread, 10 per run
global_limiter = ToolCallLimitMiddleware(
    thread_limit=20,
    run_limit=10,
)

# Tool-specific limit with default "continue" behavior
search_limiter = ToolCallLimitMiddleware(
    tool_name="search",
    thread_limit=5,
    run_limit=3,
)

# Thread limit only (no per-run limit)
database_limiter = ToolCallLimitMiddleware(
    tool_name="query_database",
    thread_limit=10,
)

# Strict enforcement with "error" behavior
web_scraper_limiter = ToolCallLimitMiddleware(
    tool_name="scrape_webpage",
    run_limit=2,
    exit_behavior="error",
)

# Immediate termination with "end" behavior
critical_tool_limiter = ToolCallLimitMiddleware(
    tool_name="delete_records",
    run_limit=1,
    exit_behavior="end",
)

# Use multiple limiters together
agent = create_agent(
    model="gpt-4o",
    tools=[search_tool, database_tool, scraper_tool],
    middleware=[
        global_limiter,
        search_limiter,
        database_limiter,
        web_scraper_limiter
    ],
)

Configuration options

tool_name

string

Name of specific tool to limit. If not provided, limits apply to all tools globally.

thread_limit

number

Maximum tool calls across all runs in a thread (conversation). Persists across multiple invocations with the same thread ID. Requires a checkpointer to maintain state. None means no thread limit.

run_limit

number

Maximum tool calls per single invocation (one user message → response cycle). Resets with each new user message. None means no run limit.

Note: At least one of thread_limit or run_limit must be specified.

exit_behavior

string

default:"continue"

Behavior when limit is reached:

'continue' (default) - Block exceeded tool calls with error messages, let other tools and the model continue. The model decides when to end based on the error messages.
'error' - Raise a ToolCallLimitExceededError exception, stopping execution immediately
'end' - Stop execution immediately with a ToolMessage and AI message for the exceeded tool call. Only works when limiting a single tool; raises NotImplementedError if other tools have pending calls.

Model fallback

Automatically fall back to alternative models when the primary model fails.

Perfect for:

Building resilient agents that handle model outages
Cost optimization by falling back to cheaper models
Provider redundancy across OpenAI, Anthropic, etc.

from langchain.agents import create_agent
from langchain.agents.middleware import ModelFallbackMiddleware


agent = create_agent(
    model="gpt-4o",  # Primary model
    tools=[],  # Add tools as needed
    middleware=[
        ModelFallbackMiddleware(
            "gpt-4o-mini",  # Try first on error
            "claude-3-5-sonnet-20241022",  # Then this
        ),
    ],
)

Configuration options

first_model

string | BaseChatModel

required

First fallback model to try when the primary model fails. Can be a model identifier string (e.g., 'openai:gpt-4o-mini') or a BaseChatModel instance.

*additional_models

string | BaseChatModel

Additional fallback models to try in order if previous models fail
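
The ordering described above (primary first, then each fallback in turn) can be sketched in plain Python. This is a minimal model under stated assumptions, not the middleware's code: plain callables stand in for chat models, `call_with_fallbacks` is a hypothetical helper, and re-raising the last error when every model fails is an assumption.

```python
from typing import Callable, Sequence


def call_with_fallbacks(models: Sequence[Callable[[str], str]], prompt: str) -> str:
    """Try each model in order and return the first successful response."""
    if not models:
        raise ValueError("At least one model is required")
    last_error = None
    for model in models:
        try:
            return model(prompt)
        except Exception as exc:  # any failure moves on to the next model
            last_error = exc
    # Every model failed: surface the most recent error (assumed behavior).
    raise last_error


def primary(prompt: str) -> str:
    raise RuntimeError("primary model unavailable")


def backup(prompt: str) -> str:
    return f"backup: {prompt}"


print(call_with_fallbacks([primary, backup], "hello"))
```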

PII detection

Detect and handle Personally Identifiable Information in conversations.

Perfect for:

Healthcare and financial applications with compliance requirements
Customer service agents that need to sanitize logs
Any application handling sensitive user data

from langchain.agents import create_agent
from langchain.agents.middleware import PIIMiddleware


agent = create_agent(
    model="gpt-4o",
    tools=[],  # Add tools as needed
    middleware=[
        # Redact emails in user input
        PIIMiddleware("email", strategy="redact", apply_to_input=True),
        # Mask credit cards (show last 4 digits)
        PIIMiddleware("credit_card", strategy="mask", apply_to_input=True),
        # Custom PII type with regex
        PIIMiddleware(
            "api_key",
            detector=r"sk-[a-zA-Z0-9]{32}",
            strategy="block",  # Raise error if detected
        ),
    ],
)

Configuration options

pii_type

string

required

Type of PII to detect. Can be a built-in type (email, credit_card, ip, mac_address, url) or a custom type name.

strategy

string

default:"redact"

How to handle detected PII. Options:

'block' - Raise exception when detected
'redact' - Replace with [REDACTED_TYPE]
'mask' - Partially mask (e.g., ****-****-****-1234)
'hash' - Replace with deterministic hash

detector

function | regex

Custom detector function or regex pattern. If not provided, uses built-in detector for the PII type.

apply_to_input

boolean

default:"True"

Check user messages before model call

apply_to_output

boolean

default:"False"

Check AI messages after model call

apply_to_tool_results

boolean

default:"False"

Check tool result messages after execution
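
The four strategies listed under strategy can be illustrated with a small regex-based sketch. Everything here is hypothetical and only approximates the documented behavior: `handle_pii`, the two example patterns, the `ValueError` raised for 'block', and the exact hash format are assumptions, not the middleware's real detectors or output.

```python
import hashlib
import re

# Hypothetical example patterns, far simpler than a production detector.
EMAIL = r"[\w.+-]+@[\w-]+\.[\w.]+"
CARD = r"\d{4}-\d{4}-\d{4}-\d{4}"


def _mask(value: str) -> str:
    """Replace every letter or digit except the last four with '*'."""
    total = sum(ch.isalnum() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isalnum():
            out.append(ch if seen >= total - 4 else "*")
            seen += 1
        else:
            out.append(ch)  # keep separators such as '-'
    return "".join(out)


def handle_pii(text: str, pii_type: str, pattern: str, strategy: str = "redact") -> str:
    regex = re.compile(pattern)
    if strategy == "block":
        if regex.search(text):
            raise ValueError(f"Detected {pii_type} in text")
        return text
    if strategy == "redact":
        return regex.sub(lambda m: f"[REDACTED_{pii_type.upper()}]", text)
    if strategy == "mask":
        return regex.sub(lambda m: _mask(m.group()), text)
    if strategy == "hash":
        def digest(m):
            short = hashlib.sha256(m.group().encode()).hexdigest()[:8]
            return f"<{pii_type}_hash:{short}>"
        return regex.sub(digest, text)
    raise ValueError(f"Unknown strategy: {strategy}")
```

Because the hash is derived only from the matched value, the same email always maps to the same token, which keeps records correlatable without exposing the original text.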

To-do list

Equip agents with task planning and tracking capabilities for complex multi-step tasks.

Perfect for:

Complex multi-step tasks requiring coordination across multiple tools
Long-running operations where progress visibility is important

Just as humans are more effective when they write down and track tasks, agents benefit from structured task management to break down complex problems, adapt plans as new information emerges, and provide transparency into their workflow. You may have noticed patterns like this in Claude Code, which writes out a to-do list before tackling complex, multi-part tasks.

This middleware automatically provides agents with a write_todos tool and system prompts to guide effective task planning.

from langchain.agents import create_agent
from langchain.agents.middleware import TodoListMiddleware
from langchain_core.messages import HumanMessage
from langchain_core.tools import tool


@tool
def read_file(file_path: str) -> str:
    """Read contents of a file."""
    with open(file_path) as f:
        return f.read()


@tool
def write_file(file_path: str, content: str) -> str:
    """Write content to a file."""
    with open(file_path, 'w') as f:
        f.write(content)
    return f"Wrote {len(content)} characters to {file_path}"


@tool
def run_tests(test_path: str) -> str:
    """Run tests and return results."""
    # Simplified for example
    return "All tests passed!"


agent = create_agent(
    model="gpt-4o",
    tools=[read_file, write_file, run_tests],
    middleware=[TodoListMiddleware()],
)

result = agent.invoke({
    "messages": [HumanMessage("Refactor the authentication module to use async/await and ensure all tests pass")]
})

# The agent will use write_todos to plan and track:
# 1. Read current authentication module code
# 2. Identify functions that need async conversion
# 3. Refactor functions to async/await
# 4. Update function calls throughout codebase
# 5. Run tests and fix any failures

print(result["todos"])  # Track the agent's progress through each step

Configuration options

system_prompt

string

Custom system prompt for guiding todo usage. Uses built-in prompt if not specified.

tool_description

string

Custom description for the write_todos tool. Uses built-in description if not specified.

LLM tool selector

Use an LLM to intelligently select relevant tools before calling the main model.

Perfect for:

Agents with many tools (10+) where most aren’t relevant per query
Reducing token usage by filtering irrelevant tools
Improving model focus and accuracy

from langchain.agents import create_agent
from langchain.agents.middleware import LLMToolSelectorMiddleware


agent = create_agent(
    model="gpt-4o",
    tools=[tool1, tool2, tool3, tool4, tool5, ...],  # Many tools
    middleware=[
        LLMToolSelectorMiddleware(
            model="gpt-4o-mini",  # Use cheaper model for selection
            max_tools=3,  # Limit to 3 most relevant tools
            always_include=["search"],  # Always include certain tools
        ),
    ],
)

Configuration options

model

string | BaseChatModel

Model for tool selection. Can be a model identifier string (e.g., 'openai:gpt-4o-mini') or a BaseChatModel instance. See init_chat_model for more information. Defaults to the agent’s main model.

system_prompt

string

Instructions for the selection model. Uses built-in prompt if not specified.

max_tools

number

Maximum number of tools to select. Defaults to no limit.

always_include

list[string]

List of tool names to always include in the selection

Tool retry

Automatically retry failed tool calls with configurable exponential backoff.

Perfect for:

Handling transient failures in external API calls
Improving reliability of network-dependent tools
Building resilient agents that gracefully handle temporary errors

from langchain.agents import create_agent
from langchain.agents.middleware import ToolRetryMiddleware


agent = create_agent(
    model="gpt-4o",
    tools=[search_tool, database_tool],
    middleware=[
        ToolRetryMiddleware(
            max_retries=3,  # Retry up to 3 times
            backoff_factor=2.0,  # Exponential backoff multiplier
            initial_delay=1.0,  # Start with 1 second delay
            max_delay=60.0,  # Cap delays at 60 seconds
            jitter=True,  # Add random jitter to avoid thundering herd
        ),
    ],
)

Configuration options

max_retries

number

default:"2"

Maximum number of retry attempts after the initial call (3 total attempts with default)

tools

list[BaseTool | str]

Optional list of tools or tool names to apply retry logic to. If None, applies to all tools.

retry_on

tuple[type[Exception], ...] | callable

default:"(Exception,)"

Either a tuple of exception types to retry on, or a callable that takes an exception and returns True if it should be retried.

on_failure

string | callable

default:"return_message"

Behavior when all retries are exhausted. Options:

'return_message' - Return a ToolMessage with error details (allows LLM to handle failure)
'raise' - Re-raise the exception (stops agent execution)
Custom callable - Function that takes the exception and returns a string for the ToolMessage content

backoff_factor

number

default:"2.0"

Multiplier for exponential backoff. Each retry waits initial_delay * (backoff_factor ** retry_number) seconds. Set to 0.0 for constant delay.

initial_delay

number

default:"1.0"

Initial delay in seconds before first retry

max_delay

number

default:"60.0"

Maximum delay in seconds between retries (caps exponential backoff growth)

jitter

boolean

default:"True"

Whether to add random jitter (±25%) to delay to avoid thundering herd
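
To make the backoff parameters concrete, the sketch below computes the sequence of delays they produce. It is a plain-Python illustration, not the middleware's code: `retry_delays` is a hypothetical helper, and it assumes that retry_number starts at 0, that a backoff_factor of 0.0 is special-cased to a constant delay, and that jitter is applied after the max_delay cap.

```python
import random


def retry_delays(max_retries=2, backoff_factor=2.0, initial_delay=1.0,
                 max_delay=60.0, jitter=False, rng=None):
    """Return the delay (in seconds) before each retry attempt."""
    rng = rng or random.Random()
    delays = []
    for retry_number in range(max_retries):
        if backoff_factor == 0.0:
            delay = initial_delay  # constant delay
        else:
            delay = initial_delay * (backoff_factor ** retry_number)
        delay = min(delay, max_delay)  # cap exponential growth
        if jitter and delay > 0:
            delay *= 1 + rng.uniform(-0.25, 0.25)  # +/-25% random jitter
        delays.append(delay)
    return delays


print(retry_delays(max_retries=3))  # [1.0, 2.0, 4.0]
```

Under these assumptions, the settings in the example above (max_retries=3, backoff_factor=2.0, initial_delay=1.0) wait roughly 1, 2, and 4 seconds before jitter.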

LLM tool emulator

Emulate tool execution using an LLM for testing purposes, replacing actual tool calls with AI-generated responses.

Perfect for:

Testing agent behavior without executing real tools
Developing agents when external tools are unavailable or expensive
Prototyping agent workflows before implementing actual tools

from langchain.agents import create_agent
from langchain.agents.middleware import LLMToolEmulator


agent = create_agent(
    model="gpt-4o",
    tools=[get_weather, search_database, send_email],
    middleware=[
        # Emulate all tools by default
        LLMToolEmulator(),

        # Or emulate specific tools
        # LLMToolEmulator(tools=["get_weather", "search_database"]),

        # Or use a custom model for emulation
        # LLMToolEmulator(model="claude-sonnet-4-5-20250929"),
    ],
)

Configuration options

tools

list[str | BaseTool]

List of tool names (str) or BaseTool instances to emulate. If None (default), ALL tools will be emulated. If empty list, no tools will be emulated.

model

string | BaseChatModel

default:"anthropic:claude-3-5-sonnet-latest"

Model to use for generating emulated tool responses. Can be a model identifier string (e.g., 'openai:gpt-4o-mini') or a BaseChatModel instance. See init_chat_model for more information.

Context editing

Manage conversation context by trimming, summarizing, or clearing tool uses.

Perfect for:

Long conversations that need periodic context cleanup
Removing failed tool attempts from context
Custom context management strategies

from langchain.agents import create_agent
from langchain.agents.middleware import ContextEditingMiddleware, ClearToolUsesEdit


agent = create_agent(
    model="gpt-4o",
    tools=[],  # Add tools as needed
    middleware=[
        ContextEditingMiddleware(
            edits=[
                ClearToolUsesEdit(
                    trigger=2000,        # Lower threshold for demo (default is 100K)
                    keep=3,              # Keep 3 most recent tool results
                    clear_tool_inputs=False,  # Keep tool call arguments for context
                    exclude_tools=[],    # No tools excluded from clearing
                    placeholder="[cleared]",  # Placeholder for cleared results
                ),
            ],
        ),
    ],
)

Configuration options

edits

list[ContextEdit]

default:"[ClearToolUsesEdit()]"

List of ContextEdit strategies to apply

token_count_method

string

default:"approximate"

Token counting method. Options: 'approximate' or 'model'

ClearToolUsesEdit options:

trigger

number

default:"100000"

Token count that triggers the edit. When the conversation exceeds this token count, older tool outputs will be cleared.

clear_at_least

number

default:"0"

Minimum number of tokens to reclaim when the edit runs. If set to 0, clears as much as needed.

keep

number

default:"3"

Number of most recent tool results that must be preserved. These will never be cleared.

clear_tool_inputs

boolean

default:"False"

Whether to clear the originating tool call parameters on the AI message. When True, tool call arguments are replaced with empty objects.

exclude_tools

list[string]

default:"()"

List of tool names to exclude from clearing. These tools will never have their outputs cleared.

placeholder

string

default:"[cleared]"

Placeholder text inserted for cleared tool outputs. This replaces the original tool message content.
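
The interaction of trigger, keep, exclude_tools, and placeholder can be sketched on plain dictionaries. This is a simplified model, not the ClearToolUsesEdit implementation: the message shape, the `clear_old_tool_results` helper, and the four-characters-per-token estimate are assumptions, and clear_at_least and clear_tool_inputs are ignored.

```python
def clear_old_tool_results(messages, trigger, keep=3, exclude_tools=(),
                           placeholder="[cleared]"):
    """Replace older tool outputs with a placeholder once the context is too large."""
    # Rough token estimate (assumption): about four characters per token.
    approx_tokens = sum(len(m["content"]) // 4 for m in messages)
    if approx_tokens <= trigger:
        return messages  # below the threshold: nothing to do

    # Tool results that are allowed to be cleared, oldest first.
    candidates = [
        i for i, m in enumerate(messages)
        if m["role"] == "tool" and m.get("name") not in exclude_tools
    ]
    # Preserve the `keep` most recent candidates.
    to_clear = set(candidates[:-keep] if keep > 0 else candidates)

    return [
        {**m, "content": placeholder} if i in to_clear else m
        for i, m in enumerate(messages)
    ]
```

The function returns a new list and leaves the input messages unmodified, which makes the before/after states easy to compare in tests.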

Custom middleware

Build custom middleware by implementing hooks that run at specific points in the agent execution flow. You can create middleware in two ways:

Decorator-based - Quick and simple for single-hook middleware
Class-based - More powerful for complex middleware with multiple hooks

Decorator-based middleware

For simple middleware that only needs a single hook, decorators provide the quickest way to add functionality:

from langchain.agents.middleware import before_model, after_model, wrap_model_call
from langchain.agents.middleware import AgentState, ModelRequest, ModelResponse, dynamic_prompt
from langchain.messages import AIMessage
from langchain.agents import create_agent
from langgraph.runtime import Runtime
from typing import Any, Callable


# Node-style: logging before model calls
@before_model
def log_before_model(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    print(f"About to call model with {len(state['messages'])} messages")
    return None

# Node-style: validation after model calls
@after_model(can_jump_to=["end"])
def validate_output(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    last_message = state["messages"][-1]
    if "BLOCKED" in last_message.content:
        return {
            "messages": [AIMessage("I cannot respond to that request.")],
            "jump_to": "end"
        }
    return None

# Wrap-style: retry logic
@wrap_model_call
def retry_model(
    request: ModelRequest,
    handler: Callable[[ModelRequest], ModelResponse],
) -> ModelResponse:
    for attempt in range(3):
        try:
            return handler(request)
        except Exception as e:
            if attempt == 2:
                raise
            print(f"Retry {attempt + 1}/3 after error: {e}")

# Wrap-style: dynamic prompts
@dynamic_prompt
def personalized_prompt(request: ModelRequest) -> str:
    user_id = request.runtime.context.get("user_id", "guest")
    return f"You are a helpful assistant for user {user_id}. Be concise and friendly."

# Use decorators in agent
agent = create_agent(
    model="gpt-4o",
    middleware=[log_before_model, validate_output, retry_model, personalized_prompt],
    tools=[...],
)

Available decorators

Node-style (run at specific execution points):

@before_agent - Before agent starts (once per invocation)
@before_model - Before each model call
@after_model - After each model response
@after_agent - After agent completes (once per invocation)

Wrap-style (intercept and control execution):

@wrap_model_call - Around each model call
@wrap_tool_call - Around each tool call

Convenience decorators:

@dynamic_prompt - Generates dynamic system prompts (equivalent to @wrap_model_call that modifies the prompt)

When to use decorators

Use decorators when

• You need a single hook
• No complex configuration

Use classes when

• Multiple hooks needed
• Complex configuration
• Reuse across projects (config on init)

Class-based middleware

Two hook styles

Node-style hooks

Run sequentially at specific execution points. Use for logging, validation, and state updates.

Wrap-style hooks

Intercept execution with full control over handler calls. Use for retries, caching, and transformation.

Node-style hooks

Run at specific points in the execution flow:

before_agent - Before agent starts (once per invocation)
before_model - Before each model call
after_model - After each model response
after_agent - After agent completes (up to once per invocation)

Example: Logging middleware

from langchain.agents.middleware import AgentMiddleware, AgentState
from langgraph.runtime import Runtime
from typing import Any

class LoggingMiddleware(AgentMiddleware):
    def before_model(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        print(f"About to call model with {len(state['messages'])} messages")
        return None

    def after_model(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        print(f"Model returned: {state['messages'][-1].content}")
        return None

Example: Conversation length limit

from langchain.agents.middleware import AgentMiddleware, AgentState
from langchain.messages import AIMessage
from langgraph.runtime import Runtime
from typing import Any

class MessageLimitMiddleware(AgentMiddleware):
    def __init__(self, max_messages: int = 50):
        super().__init__()
        self.max_messages = max_messages

    def before_model(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        if len(state["messages"]) >= self.max_messages:
            return {
                "messages": [AIMessage("Conversation limit reached.")],
                "jump_to": "end"
            }
        return None

Wrap-style hooks

Intercept execution and control when the handler is called:

wrap_model_call - Around each model call
wrap_tool_call - Around each tool call

You decide if the handler is called zero times (short-circuit), once (normal flow), or multiple times (retry logic).

Example: Model retry middleware

from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse
from typing import Callable

class RetryMiddleware(AgentMiddleware):
    def __init__(self, max_retries: int = 3):
        super().__init__()
        self.max_retries = max_retries

    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        for attempt in range(self.max_retries):
            try:
                return handler(request)
            except Exception as e:
                if attempt == self.max_retries - 1:
                    raise
                print(f"Retry {attempt + 1}/{self.max_retries} after error: {e}")
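
The retry example calls the handler more than once. The opposite case, calling it zero times, can be sketched as a cache that short-circuits repeated requests. This is a plain-Python illustration of the wrap pattern only: strings stand in for ModelRequest and ModelResponse, and `make_cached_wrapper` is a hypothetical helper rather than a LangChain API.

```python
from typing import Callable


def make_cached_wrapper() -> Callable[[str, Callable[[str], str]], str]:
    cache: dict[str, str] = {}

    def wrap(request: str, handler: Callable[[str], str]) -> str:
        if request in cache:
            # Short-circuit: the handler is called zero times.
            return cache[request]
        # Normal flow: the handler is called exactly once.
        response = handler(request)
        cache[request] = response
        return response

    return wrap


calls = []


def fake_model(request: str) -> str:
    calls.append(request)
    return request.upper()


wrap = make_cached_wrapper()
first = wrap("hello", fake_model)
second = wrap("hello", fake_model)  # served from the cache
```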

Example: Dynamic model selection

from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse
from langchain.chat_models import init_chat_model
from typing import Callable

class DynamicModelMiddleware(AgentMiddleware):
    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        # Use different model based on conversation length
        if len(request.messages) > 10:
            request.model = init_chat_model("gpt-4o")
        else:
            request.model = init_chat_model("gpt-4o-mini")

        return handler(request)

Example: Tool call monitoring

from langchain.tools.tool_node import ToolCallRequest
from langchain.agents.middleware import AgentMiddleware
from langchain_core.messages import ToolMessage
from langgraph.types import Command
from typing import Callable

class ToolMonitoringMiddleware(AgentMiddleware):
    def wrap_tool_call(
        self,
        request: ToolCallRequest,
        handler: Callable[[ToolCallRequest], ToolMessage | Command],
    ) -> ToolMessage | Command:
        print(f"Executing tool: {request.tool_call['name']}")
        print(f"Arguments: {request.tool_call['args']}")

        try:
            result = handler(request)
            print("Tool completed successfully")
            return result
        except Exception as e:
            print(f"Tool failed: {e}")
            raise

Custom state schema

Middleware can extend the agent’s state with custom properties. Define a custom state type and set it as the state_schema:

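from langchain.agents import create_agent
from langchain.agents.middleware import AgentState, AgentMiddleware, hook_config
from langchain_core.messages import HumanMessage
from typing_extensions import NotRequired
from typing import Any

class CustomState(AgentState):
    model_call_count: NotRequired[int]
    user_id: NotRequired[str]

class CallCounterMiddleware(AgentMiddleware[CustomState]):
    state_schema = CustomState

    # Declare "end" as an allowed jump target so the early exit below takes effect
    @hook_config(can_jump_to=["end"])
    def before_model(self, state: CustomState, runtime) -> dict[str, Any] | None: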
        # Access custom state properties
        count = state.get("model_call_count", 0)

        if count > 10:
            return {"jump_to": "end"}

        return None

    def after_model(self, state: CustomState, runtime) -> dict[str, Any] | None:
        # Update custom state
        return {"model_call_count": state.get("model_call_count", 0) + 1}

agent = create_agent(
    model="gpt-4o",
    middleware=[CallCounterMiddleware()],
    tools=[...],
)

# Invoke with custom state
result = agent.invoke({
    "messages": [HumanMessage("Hello")],
    "model_call_count": 0,
    "user_id": "user-123",
})

Execution order

When you use multiple middleware, the order in which you list them determines the order in which their hooks run:

agent = create_agent(
    model="gpt-4o",
    middleware=[middleware1, middleware2, middleware3],
    tools=[...],
)

Execution flow:

Before hooks run in order:

middleware1.before_agent()
middleware2.before_agent()
middleware3.before_agent()

Agent loop starts

middleware1.before_model()
middleware2.before_model()
middleware3.before_model()

Wrap hooks nest like function calls:

middleware1.wrap_model_call() → middleware2.wrap_model_call() → middleware3.wrap_model_call() → model

After hooks run in reverse order:

middleware3.after_model()
middleware2.after_model()
middleware1.after_model()

Agent loop ends

middleware3.after_agent()
middleware2.after_agent()
middleware1.after_agent()

Key rules:

before_* hooks: First to last
after_* hooks: Last to first (reverse)
wrap_* hooks: Nested (first middleware wraps all others)
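
The three rules above can be reproduced with a small stand-alone simulation. This is only a sketch of the ordering, not LangChain's implementation: `TracingMiddleware` and `run_model_step` are hypothetical names, and the hooks take no state or runtime arguments so the snippet runs without LangChain installed.

```python
from typing import Callable, List


class TracingMiddleware:
    """Stand-in middleware that records when each hook fires (not a LangChain class)."""

    def __init__(self, name: str, trace: List[str]) -> None:
        self.name = name
        self.trace = trace

    def before_model(self) -> None:
        self.trace.append(f"{self.name}.before_model")

    def after_model(self) -> None:
        self.trace.append(f"{self.name}.after_model")

    def wrap_model_call(self, handler: Callable[[], str]) -> str:
        self.trace.append(f"{self.name}.wrap_model_call:enter")
        result = handler()
        self.trace.append(f"{self.name}.wrap_model_call:exit")
        return result


def run_model_step(middleware: List[TracingMiddleware], trace: List[str]) -> str:
    """Run one simulated model step, applying the three ordering rules."""
    # before_* hooks: first to last
    for m in middleware:
        m.before_model()

    def call_model() -> str:
        trace.append("model")
        return "response"

    # wrap_* hooks: build the chain from the inside out, so the first
    # middleware in the list ends up as the outermost wrapper
    handler: Callable[[], str] = call_model
    for m in reversed(middleware):
        # Bind m and the current handler as defaults to avoid late binding
        handler = lambda m=m, inner=handler: m.wrap_model_call(inner)
    result = handler()

    # after_* hooks: last to first
    for m in reversed(middleware):
        m.after_model()
    return result


trace: List[str] = []
stack = [TracingMiddleware(f"middleware{i}", trace) for i in (1, 2, 3)]
run_model_step(stack, trace)
print("\n".join(trace))
```

Because the wrap chain is built from the inside out, `middleware1` enters first and exits last, which is why a retry or fallback placed first in the list sees the effects of every middleware after it.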

Agent jumps

To exit early from middleware, return a dictionary with jump_to:

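from langchain.agents.middleware import AgentMiddleware, AgentState, hook_config
from langchain_core.messages import AIMessage
from typing import Any

class EarlyExitMiddleware(AgentMiddleware):
    # can_jump_to declares which targets this hook is allowed to jump to
    @hook_config(can_jump_to=["end"])
    def before_model(self, state: AgentState, runtime) -> dict[str, Any] | None:
        # should_exit is a placeholder for your own condition check
        if should_exit(state):
            return {
                "messages": [AIMessage("Exiting early due to condition.")],
                "jump_to": "end"
            }
        return None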

Available jump targets:

'end': Jump to the end of the agent execution
'tools': Jump to the tools node
'model': Jump to the model node (or the first before_model hook)

Important: Jumping to 'model' from before_model or after_model causes all before_model middleware to run again.

To enable jumping, decorate your hook with @hook_config(can_jump_to=[...]), listing every target the hook may return:

from langchain.agents.middleware import AgentMiddleware, AgentState, hook_config
from typing import Any

class ConditionalMiddleware(AgentMiddleware):
    @hook_config(can_jump_to=["end", "tools"])
    def after_model(self, state: AgentState, runtime) -> dict[str, Any] | None:
        if some_condition(state):
            return {"jump_to": "end"}
        return None

Best practices

Keep middleware focused - each should do one thing well
Handle errors gracefully - don’t let middleware errors crash the agent
Use appropriate hook types:
- Node-style for sequential logic (logging, validation)
- Wrap-style for control flow (retry, fallback, caching)
Clearly document any custom state properties
Unit test middleware independently before integrating
Consider execution order - place critical middleware first in the list
Use built-in middleware when possible instead of reinventing the wheel
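
As an illustration of testing middleware in isolation: a wrap-style hook only needs a request and a handler callable, so it can be exercised with a fake handler and no model at all. The sketch below mirrors the retry example earlier on this page. `RetryLogic` and `FlakyHandler` are hypothetical stand-ins, and `RetryLogic` deliberately does not subclass AgentMiddleware so the snippet runs without LangChain; in a real project you would test your AgentMiddleware subclass the same way.

```python
from typing import Any, Callable


class RetryLogic:
    """Stand-in with the same wrap_model_call shape as the retry example."""

    def __init__(self, max_retries: int = 3) -> None:
        self.max_retries = max_retries

    def wrap_model_call(self, request: Any, handler: Callable[[Any], Any]) -> Any:
        for attempt in range(self.max_retries):
            try:
                return handler(request)
            except Exception:
                # Re-raise only once the final attempt has failed
                if attempt == self.max_retries - 1:
                    raise


class FlakyHandler:
    """Fake handler that fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def __call__(self, request: Any) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("simulated model error")
        return f"ok: {request}"


def test_retries_until_success() -> None:
    handler = FlakyHandler(failures=2)
    result = RetryLogic(max_retries=3).wrap_model_call("ping", handler)
    assert result == "ok: ping"
    assert handler.calls == 3


def test_raises_after_last_attempt() -> None:
    handler = FlakyHandler(failures=5)
    try:
        RetryLogic(max_retries=3).wrap_model_call("ping", handler)
    except RuntimeError:
        assert handler.calls == 3
    else:
        raise AssertionError("expected RuntimeError")


test_retries_until_success()
test_raises_after_last_attempt()
```

Counting calls on the fake handler checks both the success path and the point at which the middleware gives up, without any network access.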

Examples

Dynamically selecting tools

Select relevant tools at runtime to improve performance and accuracy.

Benefits:

Shorter prompts - Reduce complexity by exposing only relevant tools
Better accuracy - Models choose correctly from fewer options
Permission control - Dynamically filter tools based on user access

from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse
from typing import Callable


class ToolSelectorMiddleware(AgentMiddleware):
    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        """Middleware to select relevant tools based on state/context."""
        # Select a small, relevant subset of tools based on state/context
        relevant_tools = select_relevant_tools(request.state, request.runtime)
        request.tools = relevant_tools
        return handler(request)

agent = create_agent(
    model="gpt-4o",
    tools=all_tools,  # All available tools need to be registered upfront
    # Middleware can be used to select a smaller subset that's relevant for the given run.
    middleware=[ToolSelectorMiddleware()],
)
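
The snippet above leaves select_relevant_tools undefined. One possible shape for it, shown purely as an assumption, is a keyword match against the latest message. Everything here is hypothetical: `FakeTool` stands in for real tool objects (only a `name` attribute is used), `TOOL_KEYWORDS` and `ALL_TOOLS` are made-up registries, and state is assumed to be a dict with a "messages" list.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class FakeTool:
    """Stand-in for a tool object; only the name attribute is used here."""
    name: str


# Hypothetical registry: tool name -> keywords that make the tool relevant
TOOL_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "search_web": ("search", "look up", "find"),
    "send_email": ("email", "mail"),
    "create_ticket": ("ticket", "issue", "bug"),
}

ALL_TOOLS: List[FakeTool] = [FakeTool(name) for name in TOOL_KEYWORDS]


def _latest_text(state: Dict[str, Any]) -> str:
    """Return the lowercased content of the last message, or an empty string."""
    messages = state.get("messages", [])
    if not messages:
        return ""
    last = messages[-1]
    # Accept both dict-style messages and objects with a content attribute
    if isinstance(last, dict):
        content = last.get("content", "")
    else:
        content = getattr(last, "content", "")
    return str(content).lower()


def select_relevant_tools(state: Dict[str, Any], runtime: Any = None) -> List[FakeTool]:
    """Return tools whose keywords appear in the latest message.

    Falls back to every tool when nothing matches, so the model is
    never left without tools. The runtime argument is unused in this sketch.
    """
    text = _latest_text(state)
    matched = [
        tool for tool in ALL_TOOLS
        if any(keyword in text for keyword in TOOL_KEYWORDS[tool.name])
    ]
    return matched or list(ALL_TOOLS)


state = {"messages": [{"role": "user", "content": "Please file a bug ticket"}]}
print([tool.name for tool in select_relevant_tools(state)])
```

Substring matching is crude, and the fallback to the full list is a design choice: it trades a longer prompt for never hiding a tool the model might need.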

Extended example: GitHub vs GitLab tool selection

from dataclasses import dataclass
from typing import Literal, Callable

from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse
from langchain_core.tools import tool


@tool
def github_create_issue(repo: str, title: str) -> dict:
    """Create an issue in a GitHub repository."""
    return {"url": f"https://github.com/{repo}/issues/1", "title": title}

@tool
def gitlab_create_issue(project: str, title: str) -> dict:
    """Create an issue in a GitLab project."""
    return {"url": f"https://gitlab.com/{project}/-/issues/1", "title": title}

all_tools = [github_create_issue, gitlab_create_issue]

@dataclass
class Context:
    provider: Literal["github", "gitlab"]

class ToolSelectorMiddleware(AgentMiddleware):
    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        """Select tools based on the VCS provider."""
        provider = request.runtime.context.provider

        if provider == "gitlab":
            selected_tools = [t for t in request.tools if t.name == "gitlab_create_issue"]
        else:
            selected_tools = [t for t in request.tools if t.name == "github_create_issue"]

        request.tools = selected_tools
        return handler(request)

agent = create_agent(
    model="gpt-4o",
    tools=all_tools,
    middleware=[ToolSelectorMiddleware()],
    context_schema=Context,
)

# Invoke with GitHub context
agent.invoke(
    {
        "messages": [{"role": "user", "content": "Open an issue titled 'Bug: where are the cats' in the repository `its-a-cats-game`"}]
    },
    context=Context(provider="github"),
)

Key points:

Register all tools upfront
Middleware selects the relevant subset per request
Use context_schema for configuration requirements

Additional resources

Middleware API reference - Complete guide to custom middleware
Human-in-the-loop - Add human review for sensitive operations
Testing agents - Strategies for testing safety mechanisms
