AI Insider Trading: Structured Data for Smarter Analysis

Asking Claude “analyze insider trading trends” gets you a textbook summary. Feeding it structured JSON from actual SEC filings gets you something hedge funds pay subscription fees for. That gap is the entire point of a workflow shared by u/zack_code in r/PromptEngineering. And it cuts to something most people miss about how AI actually works. The model hasn’t changed. The capability hasn’t changed. What changed is how you’re using it, and that distinction matters more than any model update you’ll read about this week.

The Real Problem With AI Financial Analysis

Bad input, bad output. Always. Vague questions to a language model are like asking a brilliant analyst to work from memory with no data in front of them. They’ll sound smart, but they won’t tell you anything you couldn’t find on Investopedia. The problem isn’t intelligence. It’s context.

Language models generate the most probable continuation of whatever you give them. Ask a vague question, and the most probable answer is a vague, general response. That’s not a bug. That’s the system working exactly as designed. When you give a model rich, structured data and ask it a specific question about that specific data, you’re changing what “the most probable answer” looks like. You’re steering it toward something useful.

Think about how a good analyst operates. You don’t walk into their office and say “tell me about the market.” You put a spreadsheet in front of them, point to a column, and ask a specific question. The model is no different. Context is everything. The more precise the input, the more precise the output.

Old Way vs. This Way

Old way: Pay for Bloomberg-level institutional subscriptions, or manually dig through SEC Form 4 filings, which are raw XML, inconsistently formatted, and require hours of cleanup before you can ask a single useful question. Most retail investors never bother because the friction is too high. The data exists, but getting it into a usable shape takes time most people don’t have.

This way: A scraper pulls insider trading data from Dataroma and outputs JSON where every record already includes insider name, title, ticker, company, transaction type, shares, price, total value, and filing date. No cleanup. Ready to paste into Claude. The friction drops from hours to seconds. That’s the real unlock: not AI being smarter, but AI getting better inputs to work with.

Data quality matters here too. Dataroma aggregates data from SEC filings and presents it in a structured, readable format, and the scraper converts that into clean JSON you can hand directly to a model. When the data comes in clean, the model spends its capacity on analysis, not on trying to interpret messy formatting.
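To make the record shape concrete, here is a minimal sketch of one record and a loader for it. The key names (`insider_name`, `transaction_type`, and so on) and every sample value are assumptions for illustration; the post lists the fields but not their exact keys, so the real scraper output may differ.

```python
import json
from dataclasses import dataclass

# Made-up sample record. Key names are assumed, not taken from the scraper.
SAMPLE_JSON = """
[
  {
    "insider_name": "Jane Doe",
    "title": "CEO",
    "ticker": "EXMP",
    "company": "Example Corp",
    "transaction_type": "Purchase",
    "shares": 10000,
    "price": 12.5,
    "total_value": 125000.0,
    "filing_date": "2024-01-15"
  }
]
"""


@dataclass(frozen=True)
class InsiderTrade:
    insider_name: str
    title: str
    ticker: str
    company: str
    transaction_type: str
    shares: int
    price: float
    total_value: float
    filing_date: str  # ISO date string such as "2024-01-15"


def load_trades(raw: str) -> list[InsiderTrade]:
    """Parse a JSON array of records; raises TypeError on missing or extra keys."""
    return [InsiderTrade(**record) for record in json.loads(raw)]


trades = load_trades(SAMPLE_JSON)
print(trades[0].ticker, trades[0].total_value)
```

Loading into a fixed schema first is optional, but it catches a renamed or missing field before the data ever reaches the model.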

🔍 The Workflow (Step by Step)

Step 1: Get the structured data

Run an SEC insider trading scraper pointed at Dataroma. The output is clean JSON with all the fields you need. No formatting work required. If you’re comfortable with Python, a basic requests and BeautifulSoup scraper gets you there in under 50 lines. If you’re not a developer, no-code tools can pull structured data from public sites on a schedule, so the technical barrier is lower than it sounds.
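As a rough sketch of the table-to-JSON step, the example below parses an HTML table into records. Two caveats: it uses Python's standard-library `html.parser` instead of requests and BeautifulSoup so it runs with nothing installed, and it works on a made-up inline table, because the actual page markup isn't described in the original post. The column names and the `table_to_records` helper are hypothetical, and the fetching step is left out.

```python
import json
from html.parser import HTMLParser

# Made-up HTML standing in for a fetched page; real markup will differ.
SAMPLE_HTML = """
<table>
  <tr><th>Insider Name</th><th>Ticker</th><th>Transaction Type</th><th>Shares</th><th>Price</th></tr>
  <tr><td>Jane Doe</td><td>EXMP</td><td>Purchase</td><td>10,000</td><td>12.50</td></tr>
  <tr><td>John Roe</td><td>EXMP</td><td>Purchase</td><td>2,500</td><td>12.75</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Turn the first row into keys and every later row into a dict."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    keys = [h.lower().replace(" ", "_") for h in header]
    records = []
    for row in body:
        record = dict(zip(keys, row))
        record["shares"] = int(record["shares"].replace(",", ""))
        record["price"] = float(record["price"])
        record["total_value"] = record["shares"] * record["price"]
        records.append(record)
    return records


records = table_to_records(SAMPLE_HTML)
print(json.dumps(records, indent=2))
```

Converting `shares` and `price` to numbers at this stage matters: the model can compare sizes far more reliably when the values arrive as numbers than as strings like "10,000".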

Step 2: Prompt with specificity, not vagueness

Instead of asking “what’s interesting here?”, ask:

📊 Which sectors show cluster buying activity right now?
Which insiders are making large purchases relative to their historical transaction size?
Which companies have multiple insiders buying simultaneously?
Which transactions look unusual in size, timing, or insider seniority?

Notice what these questions have in common. They’re comparative. They ask the model to evaluate one thing against another, which is exactly the kind of reasoning where language models outperform keyword search or simple filters. Cluster buying is meaningful because it’s unusual. Large purchases relative to history are meaningful because context makes size relevant. These questions only make sense with structured data in front of the model, and the model handles them well when that data is clean.
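One way to package the data and the questions into a single prompt is sketched below. The wrapper wording, the `<data>` delimiters, and the `build_prompt` helper are assumptions for illustration, not the original poster's exact prompt.

```python
import json

QUESTIONS = [
    "Which sectors show cluster buying activity right now?",
    "Which insiders are making large purchases relative to their "
    "historical transaction size?",
    "Which companies have multiple insiders buying simultaneously?",
    "Which transactions look unusual in size, timing, or insider seniority?",
]


def build_prompt(records, questions=QUESTIONS):
    """Combine the JSON records and numbered questions into one prompt string."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    data = json.dumps(records, indent=2)
    return (
        "Below is insider trading data as a JSON array.\n"
        "Answer using only this data, and name the ticker and insider "
        "behind every claim.\n\n"
        f"<data>\n{data}\n</data>\n\n"
        f"Questions:\n{numbered}\n"
    )


# Made-up record, just to show the resulting prompt.
prompt = build_prompt([{"ticker": "EXMP", "insider_name": "Jane Doe"}])
print(prompt)
```

Keeping the questions in a list makes it easy to add, drop, or reorder them without touching the rest of the prompt.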

Step 3: Request structured output

Ask Claude to format the response as an executive overview followed by a ranked list of notable transactions, each with a reason it stands out. That structure forces the model to prioritize instead of just listing. An unstructured output gives you a wall of text you still have to interpret yourself. A structured output is already a decision-support document. The format request takes five seconds and doubles the usefulness of what comes back.
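The format request can live in a reusable snippet that gets appended to whatever prompt you already have. The exact wording and the sentence count below are assumptions, and `with_output_format` is a hypothetical helper name.

```python
# Assumed wording; adjust the sections to whatever you want to read each day.
OUTPUT_FORMAT = """\
Format your response as:
1. Executive overview: 3 to 5 sentences on the overall picture.
2. Notable transactions: a ranked list, most significant first. For each
   entry give the insider, the ticker, the transaction, and one sentence
   on why it stands out.
"""


def with_output_format(prompt: str) -> str:
    """Append the output-format instructions to an existing prompt."""
    return f"{prompt.rstrip()}\n\n{OUTPUT_FORMAT}"


full_prompt = with_output_format("Analyze the insider trading data above.\n")
print(full_prompt)
```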

Step 4: Automate the whole thing

Wire it through Make.com to run daily. Scraper fires, data goes to Claude, summary lands in your inbox or Notion. The ongoing effort drops to near zero. Set it up once on a Sunday and it runs every morning before you check your phone. That’s the compounding benefit: the workflow handles the repetitive work so you only engage when something actually warrants attention.

What You Actually Get

A breakdown that would take an analyst an hour to produce manually, for next to nothing compared with what institutional data subscriptions cost.

What the breakdown actually contains is worth spelling out. You get a ranked list of insider transactions with context for why each one matters. You get sector-level patterns you wouldn’t catch by looking at individual filings. You get flagged anomalies: a CEO buying ten times more than their typical transaction size, or three executives at the same company all buying within a week of each other. These are signals that exist in public data. They’re just buried under friction most people never push through.

The real insight isn’t that Claude is powerful. It’s that language models are only as useful as the data you hand them. Structure the input, structure the prompt, and the output starts looking like something you’d actually pay for. Pick a sector you already follow and try it tonight. The prompt structure is above. You just need the data feeding in clean.
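The “ten times their typical transaction size” flag is simple arithmetic, which makes it easy to sanity-check anything the model surfaces. The sketch below uses the median of an insider's past purchases as the baseline; that choice, the `size_vs_history` helper, and all the numbers are assumptions for illustration. It also assumes you have kept earlier records for the same insider, which a single day's pull may not include.

```python
from statistics import median


def size_vs_history(trade_value, past_values):
    """Return how many times larger a trade is than the median past trade.

    Returns None when there is no history to compare against.
    """
    if not past_values:
        return None
    return trade_value / median(past_values)


# Made-up dollar values for one insider's earlier purchases.
past_purchases = [40_000, 50_000, 60_000]
ratio = size_vs_history(500_000, past_purchases)
print(ratio)  # 10.0
```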

Frequently Asked Questions

Q: Does this work with ChatGPT, or does it only work with Claude?

The approach works with both Claude and ChatGPT, though the exact output quality may vary. The key is feeding structured JSON data with a well-written prompt that specifies exactly what analysis you want. If you’re already using one service, stick with it. The real win is the data structure and prompt design, not the LLM choice.

Q: How do I know the AI analysis is actually accurate if it’s automated?

Start by running it on a batch of transactions you already understand, then compare the AI’s findings against your own analysis. The automation works best for surfacing patterns (sector trends, unusual transaction sizes) rather than making trading decisions. Once the results stay consistent across a few runs, you can put more trust in the daily automation.
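“Consistent on a few runs” can be given a number. One option, sketched below, is to collect the tickers each run flags and measure how much the sets overlap using Jaccard similarity (shared items divided by all items). The metric, the helper names, and the tickers are assumptions for illustration, not part of the original workflow.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of items common to both collections, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty results agree completely
    return len(a & b) / len(a | b)


def mean_pairwise_overlap(runs):
    """Average Jaccard similarity over every pair of runs."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Made-up tickers flagged by three runs over the same input data.
runs = [
    ["EXMP", "ABCD", "WXYZ"],
    ["EXMP", "ABCD", "QRST"],
    ["EXMP", "ABCD", "WXYZ"],
]
score = mean_pairwise_overlap(runs)
print(round(score, 2))  # 0.67
```

A score near 1.0 means the runs flag nearly the same tickers; a low score suggests the prompt or the data needs tightening before you rely on the output.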

Q: Can I apply this to specific sectors like biotech, or does it only work for large-cap stocks?

This works for any publicly traded company whose insiders file their transactions with the SEC. Biotech is a great use case since insider buying can be a signal in sectors with fewer traditional news catalysts. Just adjust your scraper to filter by sector or tickers you care about, and the same JSON-feeding approach applies.
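The fields listed earlier don't include a sector, so the simplest filter is a ticker watchlist applied before the data goes to the model. A minimal sketch, with a hypothetical `filter_by_tickers` helper, an assumed `ticker` key, and made-up tickers:

```python
def filter_by_tickers(records, watchlist):
    """Keep only records whose ticker is on the watchlist (case-insensitive)."""
    wanted = {ticker.upper() for ticker in watchlist}
    return [r for r in records if r["ticker"].upper() in wanted]


# Made-up records and watchlist.
records = [
    {"ticker": "EXMP", "insider_name": "Jane Doe"},
    {"ticker": "ABCD", "insider_name": "Sam Low"},
    {"ticker": "wxyz", "insider_name": "Ann Poe"},
]
biotech = filter_by_tickers(records, ["exmp", "WXYZ"])
print([r["ticker"] for r in biotech])  # ['EXMP', 'wxyz']
```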

Q: How reliable is a daily automation if the data or AI output changes?

The scraper gives you consistent, structured input (which rarely changes), but LLM outputs can vary slightly day-to-day. To reduce noise, run analysis on weekly batches instead of daily if consistency matters more to you, or set up a Make.com automation that only alerts you when specific thresholds are met (e.g., 3+ insiders buying the same stock within 7 days).
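The example threshold (3+ insiders buying the same stock within 7 days) can also be checked in plain code before anything is sent to the model or to an alert. The sketch below is one possible reading of that rule: it counts distinct insiders, treats "Purchase" as the buy label, and uses the filing date as a stand-in for the trade date since that is the date the records carry. The key names, the label, and the sample data are all assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta


def find_clusters(records, min_insiders=3, window_days=7):
    """Map each ticker to the insiders in its first qualifying buying cluster.

    A cluster is at least `min_insiders` distinct insiders with purchases
    filed no more than `window_days` apart.
    """
    buys = defaultdict(list)
    for r in records:
        if r["transaction_type"].lower() == "purchase":
            filed = date.fromisoformat(r["filing_date"])
            buys[r["ticker"]].append((filed, r["insider_name"]))

    window = timedelta(days=window_days)
    clusters = {}
    for ticker, events in buys.items():
        events.sort()
        # Slide a window that starts at each filing date in turn.
        for start, _ in events:
            insiders = {name for day, name in events
                        if start <= day <= start + window}
            if len(insiders) >= min_insiders:
                clusters[ticker] = sorted(insiders)
                break
    return clusters


# Made-up records for illustration.
records = [
    {"ticker": "EXMP", "insider_name": "Jane Doe",
     "transaction_type": "Purchase", "filing_date": "2024-01-15"},
    {"ticker": "EXMP", "insider_name": "John Roe",
     "transaction_type": "Purchase", "filing_date": "2024-01-17"},
    {"ticker": "EXMP", "insider_name": "Ann Poe",
     "transaction_type": "Purchase", "filing_date": "2024-01-19"},
    {"ticker": "EXMP", "insider_name": "Kim Moe",
     "transaction_type": "Sale", "filing_date": "2024-01-16"},
    {"ticker": "ABCD", "insider_name": "Sam Low",
     "transaction_type": "Purchase", "filing_date": "2024-01-15"},
    {"ticker": "ABCD", "insider_name": "Sam Low",
     "transaction_type": "Purchase", "filing_date": "2024-01-16"},
]
clusters = find_clusters(records)
print(clusters)  # {'EXMP': ['Ann Poe', 'Jane Doe', 'John Roe']}
```

Because the rule is deterministic, it gives the same answer every day for the same data, which is the stability the LLM summary alone can't guarantee.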

How I use structured SEC insider trading data to get actually useful analysis out of Claude
by u/zack_code in PromptEngineering
