Typing is often the biggest bottleneck in the creative process, slowing down the flow of ideas from your brain to the machine. We tend to think that effective prompt engineering requires sitting in front of a monitor, carefully crafting syntax and editing words as we go. However, I recently came across a fascinating post by an AI expert who turns this assumption on its head with a voice-first approach.
This innovator argues that your voice is the most efficient tool for capturing ideas in their raw, unfiltered state. The core concept is “prompt engineering via voice notes,” which leans on modern Large Language Models (LLMs) to do the heavy lifting. Instead of treating the AI as a compiler that needs perfect input, the author treats it as an intelligent editor that thrives on context. By recording long, stream-of-consciousness audio files (what the creator calls “rambles”) and feeding them into a multimodal model like Gemini, you can bypass the friction of the keyboard entirely. The AI takes in the noise, the nuance, and the messy thoughts, then turns them into structured prompts or content.
🧠 The Velocity of Unfiltered Thought
The first major insight from this methodology is that speed matters more than precision during the ideation phase. Most people type significantly slower than they speak, so typing acts as a brake on your cognitive output. The LinkedIn user emphasizes that voice capture grabs thoughts in their “original form,” preserving the full resolution of your idea before your internal editor tries to shrink it down for the keyboard. When you type, you are constantly backspacing and correcting, which interrupts the flow state. By switching to voice, the expert allows ideas to spill out rapidly. This volume of information is beneficial for an LLM: the more context and background you provide (even if it seems like rambling), the better the model can infer your intent. The messiness that humans find annoying is gold for a context-hungry AI.
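To put rough numbers on that gap, here is a back-of-the-envelope sketch. The rates are illustrative assumptions (about 40 words per minute typing and 150 speaking), not figures from the original post, so substitute your own.

```python
# Illustrative assumptions, not measurements from the post.
TYPING_WPM = 40
SPEAKING_WPM = 150


def minutes_to_capture(word_count: int, words_per_minute: float) -> float:
    """Return the minutes needed to get word_count words out at a given rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute


words = 600  # a hypothetical brain dump
typed = minutes_to_capture(words, TYPING_WPM)
spoken = minutes_to_capture(words, SPEAKING_WPM)
print(f"typed: {typed:.1f} min, spoken: {spoken:.1f} min")
# prints "typed: 15.0 min, spoken: 4.0 min"
```

Under these assumed rates, the same 600 words take roughly a quarter of the time to speak, and that is before counting the backspacing that typing invites.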
⚙️ Using LLMs as Distillation Engines
A common misconception is that a prompt must be concise to be effective, but this creator demonstrates that modern models excel at distillation. The author explains that while raw audio isn’t the best final product, it is the perfect raw material. The workflow uses an LLM to transcribe the chaotic audio input and then structure it. This is a profound shift in how we interact with these tools. Rather than spending twenty minutes trying to write the perfect three-sentence prompt, the expert spends two minutes talking through the problem, the desired outcome, the constraints, and the tone. The AI is then instructed to take that transcript and formulate the best possible prompt or solution from it. It turns the LLM into a collaborative partner that organizes your thoughts better than you could in real time.
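The two stages described here (transcribe, then distill) could be wired together roughly like this. It is a minimal sketch, not the author's actual setup: `ramble_to_prompt`, `DISTILL_INSTRUCTION`, and the two stand-in functions are hypothetical names, and the transcription and model calls are passed in as plain callables so you can plug in whichever tool you actually use.

```python
from typing import Callable

# Hypothetical wording; adjust to taste.
DISTILL_INSTRUCTION = (
    "Below is a raw transcript of a spoken ramble. Identify the problem, "
    "the desired outcome, the constraints, and the tone, then write the "
    "best possible prompt based on them."
)


def ramble_to_prompt(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    complete: Callable[[str], str],
) -> str:
    """Transcribe the audio, then ask the model to structure the transcript."""
    transcript = transcribe(audio).strip()
    if not transcript:
        raise ValueError("transcription came back empty")
    request = f"{DISTILL_INSTRUCTION}\n\nTranscript:\n{transcript}"
    return complete(request)


# Stand-in callables so the sketch runs without any external service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_complete(request: str) -> str:
    return f"[model output for {len(request)} characters of input]"


print(ramble_to_prompt(
    b"so um I need a launch post, friendly tone, under 200 words",
    fake_transcribe,
    fake_complete,
))
```

Keeping the two stages as separate callables means you can swap the transcription tool without touching the distillation step, or skip the first stage entirely when a multimodal model accepts audio directly.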
🚶 The “Anywhere” Workflow
This approach fundamentally changes the physical constraints of work. The original poster humorously describes wandering around Singapore, looking like a “crazy man” talking to his phone with a suspicious smile. This vivid image highlights a practical reality: you no longer need a desk to be productive. By decoupling prompt engineering from the screen, this professional generates ideas, posts, solutions, and training topics while walking or commuting. It turns dead time into creative time. If you can verbalize the structure of a problem while making coffee or walking the dog, you have done 90% of the work. The remaining 10% is simply handing the audio file to the AI to formalize it. This mobility allows for a more dynamic relationship with work, where inspiration can be captured and processed the moment it strikes.
⚠️ Nuances and Potential Friction
While this workflow is powerful, there are a few challenges to consider before diving in. First, you must get comfortable with the sound of your own voice and the feeling of speaking to a machine in public, which can feel awkward initially. Privacy is another factor: the author jokes about looking like he is talking to himself, but the more serious concern is who else is listening when you discuss sensitive business strategies. Additionally, the quality of the output depends heavily on the transcription capability of the tool you use. You need a model that can accurately parse technical terms and handle accents without hallucinating words that change the meaning of your prompt.
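One cheap way to catch transcription slips is to check that the technical terms you know you said made it into the transcript. The sketch below is an illustration, not part of the original post: `missing_terms` and the sample glossary are hypothetical, and the check is a simple case-insensitive whole-word match, so it flags spelling variants as well as outright misses.

```python
import re


def missing_terms(transcript: str, expected_terms: list[str]) -> list[str]:
    """Return the expected terms that never appear as whole words or phrases."""
    missing = []
    for term in expected_terms:
        # Match the term only when it is not glued to other word characters.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if not re.search(pattern, transcript, flags=re.IGNORECASE):
            missing.append(term)
    return missing


transcript = "We should cache the embeddings in redis before calling the re-ranker."
print(missing_terms(transcript, ["Redis", "embeddings", "reranker"]))
# prints ['reranker']
```

A non-empty result is a cue to skim the transcript before handing it to the model, not proof that the transcription is wrong.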
Action Plan: The Voice-to-Prompt Workflow
If you want to replicate the success of this industry pro, here is a simple way to structure your voice notes for maximum effect:
- 📌 Open a high-quality voice recorder or a multimodal AI app (like ChatGPT or Gemini).
- 📌 Start by clearly stating your role and the goal. (e.g., “I am acting as a marketing strategist, and I need to write a post about…”).
- 📌 Ramble freely. Pour out every constraint, idea, worry, and requirement you have. Do not self-edit.
- 📌 End the note with a specific instruction for the AI. (e.g., “Take everything I just said, remove the fluff, and turn it into a structured prompt that I can use to generate the final asset.”).
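The structure above (role and goal, free ramble, closing instruction) can also be written down as a small template, which is handy if you want to paste a transcript into a chat window instead of uploading audio. This is a sketch under assumptions: `VoiceNote` and its field names are hypothetical, and the opener and closing wording simply reuse the example phrases from the steps above.

```python
from dataclasses import dataclass

# Closing instruction taken from the last step of the list.
DEFAULT_CLOSING = (
    "Take everything I just said, remove the fluff, and turn it into a "
    "structured prompt that I can use to generate the final asset."
)


@dataclass
class VoiceNote:
    role: str
    goal: str
    ramble: str
    closing_instruction: str = DEFAULT_CLOSING

    def as_text(self) -> str:
        """Join opener, ramble, and closing instruction in the spoken order."""
        opener = f"I am acting as {self.role}, and I need to {self.goal}."
        return "\n\n".join([opener, self.ramble.strip(), self.closing_instruction])


note = VoiceNote(
    role="a marketing strategist",
    goal="write a post about voice-first prompting",
    ramble="So the audience is busy founders, keep it casual, no jargon...",
)
print(note.as_text())
```

The ramble stays unedited on purpose; only the first and last parts are fixed so the model always knows who is speaking and what to do with the rest.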
It is fascinating to see how creators are pushing the boundaries of human-AI interaction. I highly recommend checking out the full post to see exactly how the author is doubling down on this habit for the coming year 🚀