GPT-Image-2 Workflow: Build an OpenAI NotebookLM Alternative

NotebookLM has been my go-to AI tool for months. Drop in a messy PDF, get back clean visuals, podcasts, and structured summaries. The magic comes from Gemini 3.1 paired with Nano Banana 2, and honestly, nothing else came close. Until last week.

That’s when I saw this incredible build from an AI professional who decided to test whether GPT-Image-2 could go head-to-head with Google’s stack. The original poster swapped out every Google model in a node-based app he had vibe-coded earlier and rebuilt the whole thing on OpenAI. The results genuinely surprised me, especially because most people still use GPT-Image-2 for single posters and nothing more.

Here’s the kicker: it works. The slides work. The posters work. The podcasts work. And it all runs in one connected workflow.

Why this matters

Most folks assumed GPT-Image-2 was a single-image generator, good for posters and one-off visuals. The expert proved it can carry an entire NotebookLM-style pipeline, slides and all. That opens up a serious alternative for anyone who wants OpenAI’s stack instead of Google’s.

If you use visual references, the slides will adhere to the precise style you want.

The step-by-step rebuild

Here’s exactly how the creator pulled this off, broken down into clear stages so you can follow the logic:

Start with a working node-based app. The author had already vibe coded a NotebookLM-style app using Google AI Studio. Why this step matters: you need a baseline that already handles the document-to-output workflow before you swap models. Don’t start from scratch.
Replace every Google model with its OpenAI counterpart. Gemini 3.1 gets swapped for GPT-5.5. Nano Banana 2 gets replaced by GPT-Image-2. TTS shifts to GPT-4o-mini-TTS. Why this step matters: clean one-to-one swaps let you isolate which model is responsible for which output, so debugging stays manageable.
Adapt the system prompts for each node. OpenAI models respond differently from Google’s, so prompts that worked for Gemini won’t work as-is for GPT-5.5. Why this step matters: skipping this is one of the most common reasons model swaps fail. Different model families need different instruction styles.
Tune the parameters per node. Temperature, max tokens, and image generation settings all need tweaking. Why this step matters: default settings rarely produce the same quality across model families, and small adjustments can produce large quality gains.
Connect the pipeline end to end. User uploads a document, GPT-5.5 drafts the story or presentation outline, GPT-Image-2 generates slides plus posters and infographics, and GPT-4o-mini-TTS turns the story into a podcast. Why this step matters: NotebookLM’s strength is the connected experience. Stopping at one output kills the magic.
Feed visual references into the slide nodes. The mind behind it noticed that without references, GPT-Image-2 slides drift in style. With references, they nail the exact look you want. Why this step matters: this is the single biggest unlock for slide quality. Don’t skip it.
Use Claude Code as your build copilot. The creator did all the heavy lifting on the code through Claude Code running Opus 4.7. Why this step matters: vibe coding a multi-node app from scratch is brutal alone. A capable AI pair makes the difference between shipping and stalling.
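The model-swap, prompt, and parameter steps above can be pictured as one per-node config table. The sketch below is a minimal Python illustration under stated assumptions, not the creator’s code: the NodeConfig structure, the swap_to_openai helper, the model id strings (including the placeholder google-tts, since the post doesn’t name the original TTS model), and every prompt and parameter value are hypothetical. It shows strict one-to-one model swaps plus per-node prompt and parameter overrides kept in one place.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NodeConfig:
    """Model choice, system prompt, and parameters for one node (hypothetical structure)."""
    model: str
    system_prompt: str
    params: dict


# Google-side baseline. All model id strings here are illustrative assumptions.
GOOGLE_NODES = {
    "outline": NodeConfig("gemini-3.1", "Draft a slide outline from the document.", {"temperature": 0.7}),
    "slides": NodeConfig("nano-banana-2", "Render one slide per outline item.", {"size": "1536x1024"}),
    "podcast": NodeConfig("google-tts", "Read the story aloud.", {"voice": "default"}),
}

# One-to-one swap table: each Google model maps to exactly one OpenAI model,
# so any change in output can be traced back to a single node.
MODEL_SWAPS = {
    "gemini-3.1": "gpt-5.5",
    "nano-banana-2": "gpt-image-2",
    "google-tts": "gpt-4o-mini-tts",
}

# Per-node overrides: prompts and parameters re-tuned for the new model family.
# Values are hypothetical starting points, not tested settings.
OPENAI_OVERRIDES = {
    "outline": {
        "system_prompt": "You are a presentation writer. Return a numbered outline, one line per slide.",
        "params": {"temperature": 0.4, "max_tokens": 2000},
    },
    "slides": {"params": {"size": "1536x1024", "quality": "high"}},
    "podcast": {"params": {"voice": "alloy"}},
}


def swap_to_openai(nodes: dict) -> dict:
    """Return a new node table with models swapped and per-node overrides applied."""
    swapped = {}
    for name, cfg in nodes.items():
        if cfg.model not in MODEL_SWAPS:
            # Fail loudly instead of silently keeping a Google model in the graph.
            raise KeyError(f"No OpenAI counterpart mapped for {cfg.model!r} (node {name!r})")
        new_cfg = replace(cfg, model=MODEL_SWAPS[cfg.model])
        new_cfg = replace(new_cfg, **OPENAI_OVERRIDES.get(name, {}))
        swapped[name] = new_cfg
    return swapped


if __name__ == "__main__":
    for name, cfg in swap_to_openai(GOOGLE_NODES).items():
        print(name, "->", cfg.model, cfg.params)
```

Keeping the swap table separate from the overrides means you can first confirm that the swap alone runs, then tune prompts and parameters node by node.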

What the final app actually does

Once everything is wired up, the workflow runs like this:

Document upload: User drops in one or more source files
Story generation: GPT-5.5 writes a narrative or presentation outline
Slide creation: GPT-Image-2 produces the slides based on your prompt
Visual assets: GPT-Image-2 also handles posters and infographics
Audio output: GPT-4o-mini-TTS converts the story into a podcast
One unified workflow: Everything happens inside the same node graph
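The connected workflow above can be sketched as a small node graph in which each model sits behind a plain function, so the OpenAI models from the post could be plugged in without touching the wiring. This is a minimal Python sketch, not the app’s actual code: NotebookPipeline, PipelineResult, and the fake_* stand-ins are hypothetical, the stand-ins run offline instead of calling any API, and reading the outline aloud as the podcast script is a simplification of the story step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PipelineResult:
    outline: List[str]
    slides: List[bytes]
    posters: List[bytes]
    podcast_audio: bytes


@dataclass
class NotebookPipeline:
    # Each node is a plain callable so the model behind it can be swapped freely.
    draft_outline: Callable[[str], List[str]]           # text node (GPT-5.5 in the post)
    render_image: Callable[[str, List[bytes]], bytes]   # image node (GPT-Image-2 in the post)
    synthesize_speech: Callable[[str], bytes]           # TTS node (GPT-4o-mini-TTS in the post)
    # Style reference images passed to every image call to keep slides consistent.
    style_references: List[bytes] = field(default_factory=list)

    def run(self, documents: List[str], poster_prompts: Optional[List[str]] = None) -> PipelineResult:
        # 1. Document upload: merge all source files into one context string.
        source = "\n\n".join(documents)
        # 2. Story generation: the text node returns one outline item per slide.
        outline = self.draft_outline(source)
        # 3. Slide creation: one image call per outline item, with style references.
        slides = [self.render_image(f"Slide: {item}", self.style_references) for item in outline]
        # 4. Visual assets: posters and infographics reuse the same image node.
        posters = [self.render_image(p, self.style_references) for p in (poster_prompts or [])]
        # 5. Audio output (simplified): the outline itself is read as the podcast script.
        audio = self.synthesize_speech(" ".join(outline))
        return PipelineResult(outline, slides, posters, audio)


# Offline stand-ins for the real model calls (hypothetical, for illustration only).
def fake_outline(text: str) -> List[str]:
    return [line.strip() for line in text.splitlines() if line.strip()][:5]


def fake_image(prompt: str, refs: List[bytes]) -> bytes:
    return f"{prompt}|refs={len(refs)}".encode()


def fake_tts(script: str) -> bytes:
    return script.encode()


if __name__ == "__main__":
    pipeline = NotebookPipeline(fake_outline, fake_image, fake_tts, style_references=[b"brand-style.png"])
    result = pipeline.run(["Intro\nMethods", "Results"], poster_prompts=["Poster: key findings"])
    print(result.outline)
    print(result.slides[0])
```

In a real build, each stand-in would wrap the corresponding API call. Storing the style references on the pipeline object means every slide and poster call receives them, which mirrors the visual-references tip.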

The honest limitations

This isn’t a perfect replacement, and the original poster is upfront about it. Controlling the precision of GPT-Image-2 slides is still tricky, just as it is with Nano Banana. You’ll get slides that look gorgeous but occasionally place text or elements in spots you didn’t ask for. That said, the creator argues that GPT-Image-2’s applications should reach far beyond what we see with Nano Banana 2 today, because OpenAI’s image model is showing range that hasn’t been fully explored yet.

The author’s only real complaint after generating hundreds of images? The name. GPT-Image-2 doesn’t exactly roll off the tongue.

Why I’m sharing this

I was blown away when I saw the slide outputs because everyone in my feed treats GPT-Image-2 as a single-poster tool. This contributor showed it can drive a full multimodal pipeline, and the implications are big for anyone building content workflows, internal training tools, or research summarizers.

The takeaway is simple: if you’ve been waiting for an OpenAI-native NotebookLM, the building blocks are already here. You just need to wire them together properly.

Check out the full LinkedIn post to see the example slides and the actual app in action. The visual references trick alone is worth the read.
