Decode LLM APIs with Claude Code: Python Clients to Curl

Quick Start

What you’ll learn: How to use Claude Code to read through Python client libraries for major LLM providers and generate raw curl commands that reveal exactly how their APIs work under the hood.

What you need: Familiarity with Python, curl, and at least one LLM API. Access to Claude Code helps but isn’t strictly required.

Why this matters: If you’re building tools that talk to multiple LLM providers, you need to understand the raw JSON each one expects and returns, especially for newer features like server-side tool execution.

Simon Willison just published a new research repo that documents exactly how the major LLM APIs behave at the HTTP level. He’s working on a major redesign of his popular LLM Python library, and the approach he used to research these APIs is worth stealing for your own projects.

Here’s the core problem Willison faced: his LLM library provides an abstraction layer over hundreds of models from dozens of vendors through its plugin system. But vendors like Anthropic, OpenAI, Google, and Mistral have shipped new features (server-side tool execution, for example) that the current abstraction layer can’t handle. Before redesigning that layer, he needed to understand exactly what each API does at the wire level.

His method is practical and reproducible. Here’s how to do it yourself.

Step 1: Pick Your Target Libraries

Start with the official Python client libraries for the LLM providers you care about. Willison chose four: Anthropic, OpenAI, Gemini, and Mistral.

Why this matters: Python SDKs are the most maintained and feature-complete clients for most LLM providers. They contain the ground truth for what each API actually supports.

Step 2: Use Claude Code to Read the Client Libraries

Point Claude Code at each Python client library’s source code. Have it analyze the request/response patterns, parameter structures, and endpoint configurations.

Why this matters: These libraries are large codebases. Having an AI read through them and extract the relevant HTTP patterns saves hours of manual code archaeology.

Step 3: Generate Raw curl Commands

From the client library analysis, craft curl commands that hit each API’s raw JSON endpoints directly. Cover both streaming and non-streaming modes.

Why this matters: curl commands strip away all SDK abstractions and show you exactly what goes over the wire. This is the only way to truly compare how different providers handle the same feature.

Step 4: Test Across Multiple Scenarios

Run your curl commands against a range of scenarios: basic completion, streaming, tool use, and any provider-specific features you need to support.

Why this matters: Each provider implements features differently. Server-side tool execution, for instance, has completely different JSON structures across Anthropic, OpenAI, and Gemini. You won’t discover these differences from documentation alone.

Step 5: Capture and Store the Raw Outputs

Save both the scripts and the captured JSON outputs in a dedicated repo. This becomes your reference material for designing abstractions.

Why this matters: Raw API responses are your design documents. When you’re building an abstraction layer, you need to constantly reference what each provider actually returns, not what their docs say they return.

Key Takeaways

Don’t design abstractions from documentation. Docs lag behind reality. Read the actual client code and test against live endpoints.
Claude Code is genuinely useful for code archaeology. Pointing it at a large SDK and asking it to extract HTTP patterns is a high-leverage use of AI.
curl is your universal translator. When comparing APIs across providers, strip away the SDKs and look at raw HTTP. The differences become obvious.
Version your research. Willison’s approach of storing both scripts and outputs in a repo means the research is reproducible and trackable as APIs evolve.

What’s Next

This research feeds directly into a redesigned abstraction layer for the LLM library that can handle modern features like server-side tool execution across all supported providers. If you’re building multi-provider LLM tooling, Willison’s repo is worth bookmarking as a reference for how these APIs actually behave.

The full scripts and captured outputs are available in the research-llm-apis repository on Simon Willison’s GitHub.

Read original article