I just stumbled onto one of the coolest AI workflow experiments I’ve seen in months. Imagine starting with an image of empty land and ending with a fully built ancient temple, then asking AI to fill in everything that happened in between. That’s exactly what this LinkedIn creator pulled off with the Temple of Heaven in Beijing, and the breakdown of how it was built is pure gold.
The original poster has been running these kinds of experiments for over a year, and according to him, the results have mostly been disappointing, because AI doesn’t really know how a building or an object was constructed. But something shifted recently. Thanks to ChatGPT’s storyboard capabilities, this kind of project became way more feasible. I was genuinely fascinated reading through his process.
The Stack Behind the Magic
Before getting into the steps, here’s the toolset the creator combined:
- Claude for the deep research phase
- ChatGPT for generating storyboards and images
- Seedance 2.0 Omni Reference for turning the storyboard into actual video
Three tools, each doing what it does best. No single AI tries to handle the whole pipeline, and that’s exactly why the output feels coherent.
The Step-by-Step Workflow
Here’s the exact process the author followed, broken down so you can replicate it on any building, object, or process you want to visualize.
- Research the construction process with Claude. Generate a markdown file that specifies how the building was actually constructed. Claude was excellent at this part because it pulls together accurate historical and architectural details into a clean, structured format. Rationale: Without real context, AI just hallucinates the build process. The markdown becomes your source of truth.
- Convert the markdown into a storyboard inside ChatGPT. The post’s author calls this the easiest part of the workflow. ChatGPT takes the structured research and turns it into visual scenes that map to each construction phase. Rationale: Storyboards force a logical sequence. They translate abstract steps into visual beats the video model can follow.
- Generate the video using Seedance 2.0 Omni Reference. Feed the storyboard frames into Seedance and let it animate the transitions between each construction stage. Rationale: Seedance handles motion and timelapse interpolation in a way most other video models still can’t.
- Test multiple input combinations to find what works best. The creator ran four versions to compare how much control each setup gave him versus how much the model hallucinated, which I’ll break down next. Rationale: You learn what each input contributes by changing which inputs the model receives.
- Extend the duration for complex processes. After chatting with architect June Chow, the original poster switched from 15 seconds to 1 minute. Longer timelapses simply read better when the process is intricate. Rationale: Compressing a complex build into 15 seconds loses the storytelling. Give it room to breathe.
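To make the first two steps concrete, here is a minimal Python sketch of how a research markdown file could be split into ordered construction phases and turned into one scene description per phase. This is not the creator’s actual tooling (he did this inside Claude and ChatGPT). The `## ` heading layout, the names `Phase`, `parse_phases`, and `storyboard_prompts`, and the sample notes are all assumptions for illustration, and the sample text is placeholder content rather than real historical detail.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """One construction phase taken from the research markdown."""
    title: str
    notes: str


def parse_phases(markdown: str) -> list[Phase]:
    """Split a markdown document into phases, one per '## ' heading.

    Assumes (hypothetically) that each phase is a level-2 heading
    followed by one or more lines of notes.
    """
    phases: list[Phase] = []
    title = None
    body: list[str] = []
    for line in markdown.splitlines():
        if line.startswith("## "):
            # A new heading closes the previous phase, if any.
            if title is not None:
                phases.append(Phase(title, " ".join(body)))
            title = line[3:].strip()
            body = []
        elif title is not None and line.strip():
            body.append(line.strip())
    if title is not None:
        phases.append(Phase(title, " ".join(body)))
    return phases


def storyboard_prompts(phases: list[Phase]) -> list[str]:
    """Turn each phase into a numbered scene description, in build order."""
    total = len(phases)
    return [
        f"Scene {i} of {total}: {p.title}. {p.notes}"
        for i, p in enumerate(phases, start=1)
    ]


# Placeholder research notes, not real historical detail.
SAMPLE_MARKDOWN = """\
# Construction notes (placeholder)

## Site preparation
Empty land is cleared and leveled.

## Foundation
The base platform is laid.

## Structure
The main structure rises on the platform.
"""

if __name__ == "__main__":
    for prompt in storyboard_prompts(parse_phases(SAMPLE_MARKDOWN)):
        print(prompt)
```

The point of the sketch is the ordering: because every scene carries its position in the sequence, whatever consumes the storyboard gets the build order explicitly instead of having to guess it.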
The Four Versions He Tested
This is where things get really interesting. The creator ran the same project through four different input setups to see what actually moves the needle:
- Version 1: Start frame + end frame + storyboard fed into Seedance
- Version 2: Storyboard only into Seedance
- Version 3: Start frame + end frame + story text into Seedance
- Version 4: Start frame + end frame only into Seedance
His favorite was Version 1, because it offered the most control over the final output. Versions 1, 2, and 3 all produced accurate construction procedures, since the storyboard or the story text gave the model real context.
Version 4 was a pure AI hallucination of how the temple was made, because the prompt gave the model no context about the process.
That single observation is worth saving. Context is everything. Skip the research and storyboard, and you get a pretty video that’s architectural nonsense.
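The four setups can also be written down as a small comparison table in code. This Python sketch only encodes which inputs each version received, as listed above, and flags which versions carried any process context. The `InputSet` class and its field names are hypothetical, and nothing here calls Seedance or any other real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputSet:
    """Which inputs one test version fed to the video model."""
    name: str
    start_frame: bool
    end_frame: bool
    storyboard: bool
    story_text: bool

    @property
    def has_process_context(self) -> bool:
        """True when the model is told how the build proceeds."""
        return self.storyboard or self.story_text

    @property
    def has_frame_anchors(self) -> bool:
        """True when both the start and end frames are supplied."""
        return self.start_frame and self.end_frame


# The four versions exactly as listed in the post.
VERSIONS = [
    InputSet("Version 1", True, True, storyboard=True, story_text=False),
    InputSet("Version 2", False, False, storyboard=True, story_text=False),
    InputSet("Version 3", True, True, storyboard=False, story_text=True),
    InputSet("Version 4", True, True, storyboard=False, story_text=False),
]


def summarize(versions: list[InputSet]) -> list[str]:
    """One line per version: frame anchors and process context, yes or no."""
    def flag(value: bool) -> str:
        return "yes" if value else "no"

    return [
        f"{v.name}: frames={flag(v.has_frame_anchors)}, "
        f"context={flag(v.has_process_context)}"
        for v in versions
    ]


if __name__ == "__main__":
    for line in summarize(VERSIONS):
        print(line)
```

Laid out this way, Version 4 is the only row with frames but no context, which matches the one version that hallucinated the build.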
Why This Workflow Actually Matters
I think this is a big deal for a few reasons. Process visualization has always been one of the hardest things to fake with AI. Static images? Easy. Short clips? Getting easier. But showing HOW something was made step by step, with accurate sequencing? That used to require actual 3D artists and weeks of work.
Now you can prototype a proof-of-concept in an afternoon. Or you can scale it up into a polished, high-accuracy professional render. Same workflow, different fidelity dials.
Where You Could Apply This
The Temple of Heaven is a fun demo, but the underlying method opens up a ton of use cases:
- Architectural pitches showing how a proposed building would come together
- Educational content explaining how historical structures were built
- Product assembly timelapses for ecommerce or manuals
- Engineering presentations breaking down complex systems
- Real estate marketing for new developments
Anywhere you’d normally need a process video, this stack can probably deliver a draft.
The Big Takeaway
The creator’s conclusion sums it up perfectly: it’s totally feasible to make timelapse AI videos that show a process, as long as you feed the AI the right context and the right visuals. Quick proof-of-concept or full professional render, the principle is the same.
AI doesn’t replace your judgment about what should happen on screen. It executes the vision you hand it. The better your inputs, the better the output. That’s the whole game.
Check out the full LinkedIn post to see all four video versions side by side. Watching the difference between Version 1 and Version 4 is honestly the best argument for context-driven prompting I’ve seen this year.