Google AI Leak Exposes Gemini 3.0 System Prompt & API

This accidental leak effectively hands us the unreleased manual for Google’s most advanced AI architecture. While testing the new Flash model in Google AI Studio, a user encountered a massive system prompt that exposed the internal logic driving the platform.
I just saw this incredible post from an AI professional known as robdapcguy, who captured the raw output when the system unexpectedly revealed its instructions.

This isn’t just a simple prompt; it is a comprehensive technical specification for the upcoming Gemini 3.0 ecosystem. The document outlines exact coding standards, new library requirements, and how Google orchestrates complex multimodal tasks like video generation and real-time audio.
It provides a rare glimpse into how the "Code Assistant" is programmed to behave like a world-class senior engineer.

The Shift to @google/genai and "Thinking" Models

The most immediate takeaway is the mandatory migration to a new SDK and the introduction of granular control over the model’s reasoning process. The leaked instructions explicitly deprecate older libraries in favor of @google/genai and reveal a sophisticated configuration for "thinking" models.

💡 The "Thinking Budget" Protocol

The leaked prompt introduces a concept called thinkingBudget specifically for the Gemini 3 and 2.5 series. This is a crucial evolution in how developers can manage model latency versus intelligence.

Separation of Concerns: The system distinguishes between maxOutputTokens (the cap on what the model may generate) and thinkingBudget (the tokens set aside for internal reasoning). The prompt warns that if you set maxOutputTokens without also setting a thinkingBudget that leaves room for the answer, reasoning can use up the allowance and the response might come back empty.
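To make that warning concrete, here is a minimal sketch of the arithmetic. It assumes thinking tokens are drawn from the same allowance as maxOutputTokens, which is how the warning reads but is not something this summary can confirm. The helper names visibleOutputTokens and assertRoomForAnswer are made up for illustration.

```typescript
// Sketch only: assumes thinking tokens count against maxOutputTokens.
interface GenerationLimits {
  maxOutputTokens: number;
  thinkingBudget: number;
}

// Tokens left for the visible answer after the thinking budget is reserved.
function visibleOutputTokens(limits: GenerationLimits): number {
  return Math.max(0, limits.maxOutputTokens - limits.thinkingBudget);
}

// Fails fast when the configuration leaves no room for a visible answer.
function assertRoomForAnswer(limits: GenerationLimits): void {
  if (visibleOutputTokens(limits) === 0) {
    throw new Error(
      `thinkingBudget (${limits.thinkingBudget}) leaves no room within ` +
        `maxOutputTokens (${limits.maxOutputTokens})`,
    );
  }
}

// Example: a 200-token cap with 100 reserved for thinking leaves 100 tokens.
const remaining = visibleOutputTokens({ maxOutputTokens: 200, thinkingBudget: 100 });
```

Running a check like this before sending a request is cheaper than debugging an empty response afterwards.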

Budget Limits: The prompt also spells out specific ceilings. It lists a maximum thinking budget of 32,768 tokens for Gemini 2.5 Pro and 24,576 tokens for the Flash variants.

Latency Control: The instructions explain that developers can set the budget to 0 to disable thinking entirely for fast, low-latency tasks like simple Q&A. Conversely, for complex coding or math tasks, assigning a high token count allows the model to perform deep reasoning before generating a single character of output.
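As a rough sketch of how the ceilings and the zero-budget switch could be combined, the helper below clamps a requested budget to the limits quoted above. The thinkingConfig wrapper shape and the "pro" / "flash" tier labels are assumptions for illustration, not something taken verbatim from the leak.

```typescript
// Ceilings quoted in the leaked prompt (tokens).
const MAX_THINKING_BUDGET = { pro: 32768, flash: 24576 } as const;
type ModelTier = keyof typeof MAX_THINKING_BUDGET;

interface ThinkingConfig {
  thinkingConfig: { thinkingBudget: number };
}

// Returns a config fragment with the budget clamped to the tier's ceiling.
// A budget of 0 is passed through unchanged, which disables thinking.
function buildThinkingConfig(tier: ModelTier, requested: number): ThinkingConfig {
  if (!Number.isInteger(requested) || requested < 0) {
    throw new RangeError("thinkingBudget must be a non-negative integer");
  }
  const budget = Math.min(requested, MAX_THINKING_BUDGET[tier]);
  return { thinkingConfig: { thinkingBudget: budget } };
}

// Fast Q&A: thinking disabled.
const quickAnswer = buildThinkingConfig("flash", 0);
// Hard coding task: ask for more than allowed and get the ceiling.
const deepReasoning = buildThinkingConfig("pro", 50000);
```

Clamping silently is a design choice; throwing on an over-budget request would be just as reasonable if you prefer loud failures.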

✅ Veo 3.1 and Native Audio Controls

The leak confirms that the Veo video generation model is being integrated directly into the API workflow with specific version tags like veo-3.1-fast-generate-preview. The level of control exposed here is impressive.

Video Generation Specifics: The prompt details the exact configuration needed to generate video. Developers must specify numberOfVideos (strictly 1), resolution (720p or 1080p), and aspect ratio. It even handles image-to-video tasks where a starting and ending frame can be defined using base64 encoded strings.
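Here is a hedged sketch of what assembling such a request might look like. The field names image, lastFrame, imageBytes, and mimeType are my assumptions about the request shape, the builder buildVeoRequest is hypothetical, and the "16:9" value is only an example since the summary above does not list the allowed aspect ratios.

```typescript
type VeoResolution = "720p" | "1080p";

interface Base64Image {
  imageBytes: string; // base64-encoded image data
  mimeType: string;
}

interface VeoRequest {
  model: string;
  prompt: string;
  image?: Base64Image; // optional starting frame (assumed field name)
  config: {
    numberOfVideos: 1; // the prompt allows exactly one video
    resolution: VeoResolution;
    aspectRatio: string;
    lastFrame?: Base64Image; // optional ending frame (assumed field name)
  };
}

interface VeoOptions {
  prompt: string;
  resolution: VeoResolution;
  aspectRatio: string;
  startFrame?: Base64Image;
  endFrame?: Base64Image;
}

// Builds a request object; it does not call any API.
function buildVeoRequest(options: VeoOptions): VeoRequest {
  // Runtime check for callers that bypass the type system.
  if (options.resolution !== "720p" && options.resolution !== "1080p") {
    throw new RangeError(`Unsupported resolution: ${String(options.resolution)}`);
  }
  if (options.aspectRatio.trim() === "") {
    throw new RangeError("aspectRatio is required");
  }
  const request: VeoRequest = {
    model: "veo-3.1-fast-generate-preview",
    prompt: options.prompt,
    config: {
      numberOfVideos: 1,
      resolution: options.resolution,
      aspectRatio: options.aspectRatio,
    },
  };
  if (options.startFrame) request.image = options.startFrame;
  if (options.endFrame) request.config.lastFrame = options.endFrame;
  return request;
}

const textToVideo = buildVeoRequest({
  prompt: "A paper boat drifting down a rainy street",
  resolution: "720p",
  aspectRatio: "16:9",
});
```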

Native Audio & TTS: The document references a gemini-2.5-flash-preview-tts model. It outlines a detailed speechConfig allowing for specific voice selection (names like "Kore," "Puck," and "Zephyr").
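For a feel of what that speechConfig might look like, here is a small sketch. The nesting (voiceConfig, prebuiltVoiceConfig, voiceName) and the responseModalities field follow my understanding of the public SDK and should be treated as assumptions; the helper buildTtsConfig is hypothetical.

```typescript
const TTS_MODEL = "gemini-2.5-flash-preview-tts";

// Voices named in the article; the leak suggests there are others.
const VOICES_NAMED_IN_LEAK = ["Kore", "Puck", "Zephyr"];

interface TtsConfig {
  responseModalities: string[];
  speechConfig: {
    voiceConfig: { prebuiltVoiceConfig: { voiceName: string } };
  };
}

// Builds the config fragment for a single-voice TTS request (assumed shape).
function buildTtsConfig(voiceName: string): TtsConfig {
  if (voiceName.trim() === "") {
    throw new Error("voiceName must not be empty");
  }
  return {
    responseModalities: ["AUDIO"],
    speechConfig: {
      voiceConfig: { prebuiltVoiceConfig: { voiceName } },
    },
  };
}

const koreConfig = buildTtsConfig(VOICES_NAMED_IN_LEAK[0]);
```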

Live API Setup: There is a massive section dedicated to the "Live API" for real-time voice interactions. It describes how to handle PCM audio streams, manage specific sample rates (16,000 Hz for input, 24,000 Hz for output), and use a cursor system (nextStartTime) to ensure gapless audio playback during a conversation.
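The nextStartTime cursor is easier to grasp in code. The sketch below is my own reconstruction of the idea, not the leaked implementation: each incoming chunk is scheduled to start exactly where the previous one ends, or right now if playback has fallen behind. It assumes 16-bit little-endian PCM (the article only says "PCM") and times measured in seconds, as a Web Audio clock would report them. The class GaplessScheduler and the function pcm16ToFloat32 are hypothetical names.

```typescript
const OUTPUT_SAMPLE_RATE = 24000; // Hz, the output rate quoted above

// Converts 16-bit little-endian PCM bytes to floats in [-1, 1).
// The 16-bit format is an assumption.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

// Tracks where the next chunk should begin so chunks play back to back.
class GaplessScheduler {
  private nextStartTime = 0;

  // Returns the start time (seconds) for a chunk and advances the cursor
  // by the chunk's duration.
  schedule(sampleCount: number, now: number, sampleRate: number = OUTPUT_SAMPLE_RATE): number {
    const startAt = Math.max(this.nextStartTime, now);
    this.nextStartTime = startAt + sampleCount / sampleRate;
    return startAt;
  }
}

const scheduler = new GaplessScheduler();
const firstStart = scheduler.schedule(24000, 0); // 1 s chunk, starts at 0
const secondStart = scheduler.schedule(12000, 0.2); // queued behind the first, starts at 1
```

In a browser, the returned start time would typically be handed to something like an AudioBufferSourceNode's start call, with now taken from the audio context's clock.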

📌 The "Senior Engineer" Persona & Strict Coding Rules

Perhaps the most useful part for developers today is seeing how Google prompts its own agent to write high-quality code. The persona is set as a "world-class senior frontend engineer" with a focus on aesthetics.

The XML Update Method: To prevent the AI from rewriting entire files and wasting tokens, the system uses a strict XML format. The agent must output the full file path followed by a check_circle indicator, and then only the code that needs updating. This makes the diffing process efficient.

Project Structure Enforcement: The prompt explicitly bans the creation of a nested src/ directory, forcing the agent to treat the current directory as the root. This standardization prevents the common AI habit of hallucinating new folder structures.

Library Discipline: The prompt lists a "Prohibited" section, explicitly banning imports like GoogleGenerativeAI from the old package. It forces the use of import { GoogleGenAI } from "@google/genai"; and standardizes how API keys are handled (exclusively via process.env.API_KEY, never hardcoded).

This leak is a goldmine for anyone building on Google’s stack. It shows us exactly how to format requests for the newest models before the official documentation has even caught up!

Check out the full raw prompt in the source link to see every technical parameter.

💡 FAQ & Troubleshooting

Which SDK and initialization methods are required according to this instruction set?

The instructions strictly enforce the use of the @google/genai library. Developers must initialize the client using const ai = new GoogleGenAI({apiKey: process.env.API_KEY}). Older methods and types, such as GoogleGenerativeAI, models.create, and genAI.getGenerativeModel, are explicitly marked as deprecated and incorrect. The API key must be accessed exclusively via process.env.API_KEY without generating UI elements for user input.
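If you want to audit your own code against that list, a tiny scanner is enough. This is a hypothetical helper of my own (findProhibitedUsages is not part of any SDK), and it only does plain pattern matching on source text, so treat it as a rough first pass rather than a real linter.

```typescript
// Identifiers the leaked prompt marks as deprecated, per the answer above.
const PROHIBITED_PATTERNS: { name: string; pattern: RegExp }[] = [
  { name: "GoogleGenerativeAI", pattern: /\bGoogleGenerativeAI\b/ },
  { name: "models.create", pattern: /\bmodels\.create\b/ },
  { name: "genAI.getGenerativeModel", pattern: /\bgetGenerativeModel\b/ },
];

// Returns the names of prohibited usages found in a source string.
function findProhibitedUsages(source: string): string[] {
  return PROHIBITED_PATTERNS
    .filter(({ pattern }) => pattern.test(source))
    .map(({ name }) => name);
}

const legacySnippet = `
import { GoogleGenerativeAI } from "@google/generative-ai";
const model = genAI.getGenerativeModel({ model: "gemini-1.5-pro" });
`;

const currentSnippet = `
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: process.env.API_KEY });
`;

const legacyHits = findProhibitedUsages(legacySnippet);
const currentHits = findProhibitedUsages(currentSnippet);
```

Here legacyHits flags the old class name and the old model accessor, while currentHits comes back empty.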

What specific model versions are utilized in this configuration?

The text maps common aliases to specific preview versions. “Gemini Pro” is mapped to gemini-3-pro-preview (for complex tasks like coding and math), and “Gemini Flash” is mapped to gemini-flash-latest. Video generation utilizes veo-3.1-generate-preview or its “fast” variant. The instructions explicitly prohibit the use of older 1.5 series models like gemini-1.5-pro.
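That mapping boils down to a lookup table. Here is a minimal sketch using only the model IDs quoted above; the function resolveModel and the decision to reject anything starting with gemini-1.5 are my own illustration of the rule, not code from the leak.

```typescript
// Alias table taken from the model names quoted in the article.
const MODEL_ALIASES = new Map<string, string>([
  ["gemini pro", "gemini-3-pro-preview"],
  ["gemini flash", "gemini-flash-latest"],
]);

// Resolves a friendly alias to a model ID and rejects the banned 1.5 series.
// Names that are not aliases are passed through unchanged.
function resolveModel(name: string): string {
  const trimmed = name.trim();
  const resolved = MODEL_ALIASES.get(trimmed.toLowerCase()) ?? trimmed;
  if (resolved.startsWith("gemini-1.5")) {
    throw new Error(`Model "${resolved}" belongs to the prohibited 1.5 series`);
  }
  return resolved;
}

const proModel = resolveModel("Gemini Pro"); // "gemini-3-pro-preview"
const flashModel = resolveModel("Gemini Flash"); // "gemini-flash-latest"
```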

How does the “Thinking Config” feature work?

This feature is available exclusively for Gemini 3 and 2.5 series models. The thinkingBudget parameter dictates how many tokens the model uses for reasoning before generating a response. For example, Gemini 2.5 Pro has a maximum thinking budget of 32,768 tokens. If defining maxOutputTokens manually, you must also set a thinkingBudget to ensure the model reserves enough tokens for the final output.

Is this text confirmed to be the model’s core system prompt?

There is disagreement regarding the nature of the text. While the original poster identified it as a leaked system prompt or instruction file for the Gemini 3.0 Flash Code Assistant, other users argue that it appears to be an application-level prompt or a “UI bug” rather than the fundamental system prompt of the model itself.

How should video generation requests be handled?

Video generation using the veo-3.1 models requires an asynchronous polling mechanism. The code must initiate a generateVideos operation and loop while checking operation.done. Once complete, the video URI is retrieved, and the final download URL must have the API key appended to it manually (e.g., ${downloadLink}&key=${process.env.API_KEY}) to authorize the fetch request.
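The polling flow can be sketched without touching the real SDK. In the example below, the start and refresh callbacks stand in for whatever calls kick off generateVideos and re-fetch the operation; the flattened VideoOperation shape, the videoUri field, the 10-second default interval, and both function names are assumptions made for illustration.

```typescript
// Simplified stand-in for the SDK's operation object (assumed shape).
interface VideoOperation {
  done: boolean;
  videoUri?: string;
}

// Starts a job, then polls until operation.done is true and returns the URI.
async function waitForVideo(
  start: () => Promise<VideoOperation>,
  refresh: (operation: VideoOperation) => Promise<VideoOperation>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  pollIntervalMs: number = 10000,
): Promise<string> {
  let operation = await start();
  while (!operation.done) {
    await sleep(pollIntervalMs);
    operation = await refresh(operation);
  }
  if (!operation.videoUri) {
    throw new Error("Operation finished without a video URI");
  }
  return operation.videoUri;
}

// Appends the API key the way the article describes. The "&" assumes the
// download link already carries a query string.
function buildDownloadUrl(downloadLink: string, apiKey: string): string {
  return `${downloadLink}&key=${apiKey}`;
}
```

Injecting sleep keeps the loop easy to test with a fake clock; in production you would likely also want a timeout or a maximum number of polls, which the article does not mention.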

Google AI Studio Leaked System Prompt: 12/18/25
by u/robdapcguy