Async Tool Patterns for AI Agents: Real-Time Events

Here’s what every agentic system does right now: call a tool, freeze, wait for the result, then wait for the human to send another message. Observe, act, wait, re-prompt. The conversation stops while the tool runs.

That blocking pattern isn’t a minor inconvenience. It’s a fundamental ceiling on what agents can do. Every tool call becomes a wall. The agent can’t respond to new input, can’t surface partial results, can’t handle anything else until the original call resolves. For a two-second tool call, that’s fine. For anything that takes minutes, it breaks the whole experience.

A developer just posted a method that breaks that loop. Tools that push events back to the agent in real time. No blocking. No polling. No re-prompting.

The Old Loop vs the New Pattern

Standard tool flow: agent calls tool, tool runs, agent blocks, result arrives, conversation resumes. If the tool takes three minutes, you sit in silence for three minutes. The agent isn’t thinking. It isn’t available. It’s just… gone.

What makes that painful in practice is that users don’t stop having needs while the tool runs. They want to ask a follow-up. They want to cancel and try something different. They want to know if it’s still working. The blocking model has no answer for any of that.

Async bidirectional tool use changes the shape entirely. The tool runs in the background. It fires events to the agent whenever something happens. The agent keeps talking to the user. When the tool has news, it surfaces it without the user sending a single extra message. The agent stays present the entire time instead of disappearing behind a loading screen.

The example in the post makes it concrete. You tell the agent: “run the installer and update me every 25%.” The agent starts the install. You chat. The agent reports 25% done and then 50% done as events arrive. You ask questions mid-install. The agent handles both streams at once. That’s a fundamentally different experience than staring at a spinner and waiting.

How to Set This Up

The mechanism is simpler than the theory makes it sound. Two configuration changes teach the model to treat tool calls as open channels instead of one-shot requests.

Step 1: Add a special developer prompt

Drop this into your system instructions:

All tools support multiple asynchronous tool results. You must suspend their tool flow while waiting for the results and can continue regular interactions in the meantime.

The author tested over 30 variations. This is the cleanest version that consistently works. The key mechanic here is telling the model to treat pending tool calls as suspended state it can return to, rather than blocking execution that must resolve before anything else can happen. Without this framing, the model treats injected results as orphaned noise.

Step 2: Define tools with async expectations

In your tool description, tell the model how many results to expect:

This tool returns 2 asynchronous results. You must wait for both to arrive before continuing.

That one line is enough to make the model treat incoming results as continuations of a suspended call, not orphaned messages to ignore. If your tool pushes variable numbers of updates, you can express that too: “this tool returns between 1 and N results depending on execution time.” The model handles the ambiguity better than you might expect.

Step 3: Build or proxy an async MCP server

Standard MCP closes the connection after a tool call returns. You need a server that stays alive, buffers delayed results, and pushes them when they arrive. Four implementation paths:

📦 Prompt-mediated + custom harness — simplest, best for proof-of-concept
🔌 Prompt-mediated + proxy — wraps your existing harness, most compatible path
⚙️ Fully managed + custom harness — cleanest architecture, most complex to build
🧩 Fully managed + proxy — existing harness untouched, proxy handles all async state

For testing today, the prompt-mediated approach with a proxy is the fastest path to a working demo. It lets you validate the pattern against your actual use case before committing to a custom harness build. Start there, then graduate to a fully managed approach once you know the tradeoffs matter for your workload.

Where This Actually Helps

The use cases are clearer than the theory. Long-running installs that report progress mid-conversation. GUI automation where the tool detects state changes and notifies the agent. Background jobs that push alerts instead of waiting to be polled. Multi-step workflows where the environment changes while the conversation continues. Data pipelines that process in chunks and surface intermediate results so the user can redirect early instead of waiting for a full run to finish.

Any workflow where the user currently has to babysit a tool by re-prompting for status updates is a candidate. This pattern automates that entirely.

What to Watch Out For

Two real friction points from the community. First: context bloat. Every injected mid-conversation tool result adds tokens, and at scale the model can start conflating suspended calls. If you’re running multiple long-running tools simultaneously, the interleaved results can get noisy fast. Design your event payloads to be minimal and clearly tagged to their originating call.

Second: interleaving. If both the tool stream and the user can trigger agent responses, you need a priority or locking layer or responses get tangled. This is especially sharp in production, where latency jitter means the tool event and the user message can arrive within milliseconds of each other. A simple queue with explicit ordering handles most of this, but don’t ship without thinking it through.

The MCP team is working on multi-response capabilities that would eventually make the custom server layer unnecessary. Proof-of-concept scripts are on GitHub using the OpenAI API. Claude untested — if you try it, report back.

The pattern is real. The friction is manageable. Worth building on!

Frequently Asked Questions

Q: How do you prevent context bloat with multiple concurrent tool calls?

Context bloat is a real concern when running 3, 4+ concurrent calls, the model can start mixing up state between suspended execution frames. Solutions include batching results, summarizing intermediate states between completions, or limiting truly concurrent calls. The key is keeping the context window focused on active state, not drowning in partial tool responses.

Q: What happens if the user sends a message while a tool is actively executing?

Without coordination, you get race conditions where the agent receives conflicting instructions simultaneously. A real-world example: a payment execution agent saw ~3% double-spend errors until adding a simple mutex with 200ms user-input debounce. You need clear priority rules (e.g., user input always wins) and audit logs showing who triggered what action and when.

Q: Can the agent handle multiple tools running at the same time?

Yes, but the real challenge isn’t running them concurrently, it’s defining whose input “wins” when both a user message and a tool event want to trigger the agent. Without a coordination layer, you risk interleaved partial states where the agent acts on stale information or takes conflicting actions.

Q: How many concurrent tool interactions can realistically run before things break?

Based on implementer feedback, context quality noticeably degrades after 3, 4 concurrent calls; the model starts conflating suspended frames. A practical limit seems to be 2, 3 truly concurrent interactions, with additional calls queued sequentially. Your mileage may vary depending on model capacity and state management.

Expanding Agentic Capabilities: Multiple Bidirectional Async Tool Interactions During Live Conversations (Working on Codex support)
by u/NovatarTheViolator in PromptEngineering

The Old Loop vs the New Pattern

How to Set This Up

Where This Actually Helps

What to Watch Out For

Frequently Asked Questions

Related: