For two years, the AI race was about chatbots: ask a question, get an answer. At Google I/O 2026 on May 19, Google declared that race over. The headline launches — Gemini 3.5 and Gemini Omni — are not built to chat with you. They are built to act for you: planning multi-step work, spinning up sub-agents, running code, and even generating fully edited video from a mix of text, images, and audio. If you have been wondering whether “agentic AI” is marketing fluff or a genuine platform shift, the Gemini 3.5 era is the clearest answer yet.

Here is what actually shipped, what the benchmarks and pricing look like, how Gemini Omni’s any-to-any generation works, and how you can start building with these models today — plus the pitfalls to avoid before you wire an autonomous agent into anything important.

What Is Gemini 3.5? A Quick Definition

Gemini 3.5 is Google’s family of agentic AI models announced at Google I/O 2026. Rather than only answering prompts, Gemini 3.5 models break complex tasks into multi-step plans, delegate work to sub-agents, call external tools, and execute long-horizon workflows autonomously — across coding, document analysis, and enterprise automation — with a context window of up to 1 million tokens.

The family launched in stages. Gemini 3.5 Flash — the fast, lower-cost tier — went generally available the same day it was announced and became the default model in the Gemini app and AI Mode in Google Search. Gemini 3.5 Pro, the heavyweight reasoning tier, was confirmed at I/O and is rolling out in June 2026. Alongside them, Google introduced Gemini Spark, a personal AI agent powered by 3.5 Flash, and Gemini Omni, an any-to-any multimodal generation model. You can read Google’s own framing in the official Gemini 3.5 announcement on the Google blog.

Why does this matter to you as a developer or tech professional? Because the unit of work is changing. With chatbot-era models, you wrote prompts. With agentic models, you define goals, tools, and guardrails — and the model handles the orchestration in between.

Gemini 3.5 Flash: Benchmarks, Speed, and Pricing

The surprise of I/O 2026 was that the Flash tier — historically the “cheap and cheerful” option — now posts frontier-level numbers. Google reports that Gemini 3.5 Flash outperforms the previous flagship, Gemini 3.1 Pro, on the benchmarks that matter most for agentic work:

  • Terminal-Bench 2.1: 76.2% — a benchmark measuring how well a model completes real tasks in a command-line environment, a strong proxy for autonomous coding ability.
  • GDPval-AA: 1656 Elo — an evaluation of economically valuable, real-world professional tasks.
  • MCP Atlas: 83.6% — measuring tool use through the Model Context Protocol, the open standard agents use to talk to external services.
  • CharXiv Reasoning: 84.2% — chart and figure understanding, a core multimodal reasoning test.
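For readers new to MCP, a tool exposed over the protocol is described by a name, a human-readable description, and a JSON Schema for its inputs (`inputSchema`). That metadata is what an agent reads when deciding which tool to call and with what arguments. The sketch below is illustrative only: `get_repo_stars` is a hypothetical tool, the dict mirrors the shape of one entry in an MCP `tools/list` response, and `missing_required_args` is a hypothetical helper, not part of any MCP SDK.

```python
# A hypothetical MCP tool definition, shaped like one entry in a
# `tools/list` response: name, description, and a JSON Schema for inputs.
GET_REPO_STARS_TOOL = {
    "name": "get_repo_stars",  # hypothetical tool name
    "description": "Return the current GitHub star count for a repository.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "owner": {"type": "string", "description": "Repository owner"},
            "repo": {"type": "string", "description": "Repository name"},
        },
        "required": ["owner", "repo"],
    },
}


def missing_required_args(tool: dict, arguments: dict) -> list[str]:
    """Hypothetical helper: list required inputs absent from a tool call.

    A real server would run full JSON Schema validation; this only checks
    the `required` list, which is enough to reject obviously malformed calls.
    """
    required = tool["inputSchema"].get("required", [])
    return [name for name in required if name not in arguments]


print(missing_required_args(GET_REPO_STARS_TOOL, {"owner": "example"}))
```

A benchmark like MCP Atlas is, in effect, testing whether the model reliably produces calls that satisfy schemas like this one and composes them toward a goal.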

Speed is the other half of the story. Google claims roughly 4x the output tokens per second of comparable frontier models. That matters enormously for agents: an autonomous workflow might involve dozens of model calls chained together, so per-call latency compounds. A 4x faster model can be the difference between an agent that finishes a task in two minutes and one that takes eight.
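The compounding is simple arithmetic: when each call in a chain must finish before the next starts, total generation time is the sum of the per-call times. The sketch below assumes hypothetical figures (20 sequential calls, 1,500 output tokens per call, and a baseline throughput picked to reproduce the two-versus-eight-minute example) and ignores network, time-to-first-token, and tool-execution overhead. Only the 4x ratio comes from Google's claim.

```python
def chain_generation_seconds(calls: int, tokens_per_call: int, tokens_per_second: float) -> float:
    """Total time spent generating output across a chain of sequential calls.

    Assumes each call must finish before the next begins and ignores
    network latency, time-to-first-token, and tool execution time.
    """
    return calls * tokens_per_call / tokens_per_second


# Hypothetical workload: 20 chained calls, 1,500 output tokens each.
CALLS, TOKENS_PER_CALL = 20, 1_500
BASELINE_TPS = 62.5            # hypothetical baseline throughput (tokens/s)
FASTER_TPS = BASELINE_TPS * 4  # the claimed ~4x speedup

baseline = chain_generation_seconds(CALLS, TOKENS_PER_CALL, BASELINE_TPS)
faster = chain_generation_seconds(CALLS, TOKENS_PER_CALL, FASTER_TPS)
print(f"baseline: {baseline / 60:.0f} min, 4x faster: {faster / 60:.0f} min")
```

Because the total scales linearly with the number of calls, the gap between the two models widens as agent workflows get longer.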

Gemini 3.5 Flash Pricing and Context Window

Gemini 3.5 Flash is priced at $1.50 per million input tokens and $9.00 per million output tokens, which Google positions as less than half the cost of comparable frontier models. It supports a context window of up to 1 million tokens with outputs up to 65,000 tokens — enough to hold a large codebase, a stack of contracts, or hours of conversation history in a single request.
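To turn those prices into a per-request figure, you can multiply token counts by the per-million rates. This sketch hard-codes the list prices quoted above and assumes every token is billed at those flat rates. The two token counts are hypothetical; the second fills the full 1M-token context and the 65,000-token output limit.

```python
# Gemini 3.5 Flash list prices quoted above, in USD per million tokens.
INPUT_PRICE_PER_M = 1.50
OUTPUT_PRICE_PER_M = 9.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the list-price cost of one request from its token counts."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical request: 200K tokens of context in, 4K tokens out.
print(f"${request_cost_usd(200_000, 4_000):.3f}")
# A maximal request: a full 1M-token context and a 65K-token response.
print(f"${request_cost_usd(1_000_000, 65_000):.3f}")
```

Even the maximal request costs only about $2.09, so the expense in agentic systems comes from repeating large requests across many calls, not from any single call.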

Where Gemini 3.5 Pro Fits

Google confirmed that Gemini 3.5 Pro is in active development and rolling out in June 2026. The pattern from previous generations suggests Pro will target the deepest reasoning tasks — research synthesis, hard mathematics, and architecture-level code design — while Flash handles the high-volume agentic workloads. If Flash already beats the older Pro tier, the new Pro is positioned as the model you reach for when correctness matters more than cost or speed.

Gemini Omni Explained: Any-to-Any Multimodal Generation

If Gemini 3.5 is the brain of the agentic era, Gemini Omni is its creative engine. Omni is Google’s first any-to-any multimodal model: you can combine images, audio, video, and text as input and generate high-quality output, starting with cinematic video. Think of it as collapsing what used to be three separate tools — a language model, an image model, and a video model — into one system that reasons across all of them at once.

Three capabilities make Omni different from earlier video generators:

  • Grounded world knowledge. Omni reasons about what should happen next in a scene by combining an intuitive understanding of physics with Gemini’s knowledge of history, science, and cultural context. A ball does not just move — it arcs, bounces, and decelerates plausibly.
  • Conversational editing. You edit video with natural language, and every instruction builds on the last. Characters stay consistent across edits, which has been the Achilles’ heel of AI video since the category emerged.
  • Built-in provenance. Every video carries Google’s imperceptible SynthID watermark, and you can verify that a clip was generated by Omni through the Gemini app, Gemini in Chrome, or Google Search. As synthetic video becomes indistinguishable from camera footage, verifiable watermarking shifts from a nice-to-have to essential infrastructure.

Rollout is aggressive: Gemini Omni Flash is shipping to all Google AI Plus, Pro, and Ultra subscribers globally through the Gemini app and Google Flow, and at no cost to users of YouTube Shorts and the YouTube Create app. Full technical details live on the Gemini Omni page at Google DeepMind.

The strategic signal: Google is not selling Omni as a standalone video tool. It is wiring generation directly into the surfaces where billions of people already create — Search, YouTube, and the Gemini app — and letting agents call it as just another tool.

Gemini Spark: Your Personal Agent That Runs 24/7

The third pillar of the announcement is Gemini Spark, a general-purpose personal AI agent built on Gemini 3.5 Flash. Unlike a chatbot session that ends when you close the tab, Spark runs continuously, reasoning over information in your connected apps and taking action on your behalf — under your direction.

In practice, Spark lets you:

  • Delegate complex, multi-step work (“research these five vendors and draft a comparison doc”).
  • Set recurring tasks that execute on a schedule without re-prompting.
  • Teach the agent new skills it can reuse later.
  • Stay in control through proactive status updates and explicit approval requirements before consequential actions.

Spark is in beta, rolling out first to trusted testers and then to Google AI Ultra subscribers in the US. That cautious rollout is telling: an agent with standing access to your email, calendar, and files is a fundamentally higher-stakes product than a chatbot, and Google knows it. The approval-gate design — the agent proposes, you confirm — is the current industry consensus for keeping autonomous systems on a leash.
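The propose-then-approve pattern is easy to adopt in your own agents. Here is a minimal sketch, not Spark's implementation: `ProposedAction`, `run_with_approval`, and `deny_all` are hypothetical names, and the `approve` callback stands in for whatever confirmation UI you use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    """An action the agent wants to take, described before it runs."""
    name: str
    description: str
    irreversible: bool


def run_with_approval(
    action: ProposedAction,
    execute: Callable[[], str],
    approve: Callable[[ProposedAction], bool],
) -> str:
    """Propose-then-approve: reversible actions run directly; irreversible
    ones run only if the `approve` callback (a human, in practice) says yes."""
    if action.irreversible and not approve(action):
        return f"skipped: {action.name} (not approved)"
    return execute()


def deny_all(action: ProposedAction) -> bool:
    """Hypothetical reviewer that rejects everything, for offline testing."""
    return False


draft = ProposedAction("draft_doc", "draft a comparison doc", irreversible=False)
send = ProposedAction("send_email", "email the doc to five vendors", irreversible=True)
print(run_with_approval(draft, lambda: "drafted", deny_all))
print(run_with_approval(send, lambda: "sent", deny_all))
```

Injecting the approval decision as a callback keeps the gate testable offline. It also lets you swap in a chat prompt, a push notification, or a ticketing step without touching the agent logic.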

How to Build with the Gemini 3.5 API

Enough theory — here is how you actually call Gemini 3.5 Flash. The model is available to developers through the Gemini API in Google AI Studio, in Android Studio, and inside Google Antigravity. The example below uses the official Python SDK to run an agentic request with Google Search grounding and code execution enabled:

# Install the SDK first:  pip install google-genai
from google import genai
from google.genai import types

# Create a client (reads GEMINI_API_KEY from your environment if omitted)
client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=(
        "Find the three most-starred open-source MCP servers, "
        "then write and run a Python snippet that ranks them "
        "by stars-per-month since their first release."
    ),
    config=types.GenerateContentConfig(
        # Built-in tools: the model decides when to search or run code
        tools=[
            types.Tool(google_search=types.GoogleSearch()),
            types.Tool(code_execution=types.ToolCodeExecution()),
        ],
        # Control how much the model "thinks" before acting
        thinking_config=types.ThinkingConfig(thinking_budget=2048),
    ),
)

print(response.text)

This single request quietly demonstrates the agentic shift. You did not write a search query, parse results, or author the ranking script — you stated a goal. The model planned the steps, invoked Google Search for live data, generated Python, executed it in a sandbox, and returned a synthesized answer. Beyond the built-in tools shown here, Gemini 3.5 also integrates Google Maps and URL context, and connects to third-party platforms like Shopify, Box, and Databricks for enterprise workflows.
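To make the task concrete, here is roughly the kind of ranking script the model is being asked to write and run. It is only a sketch. The repositories and star counts are placeholders, not real MCP servers. Months are approximated as 30.44 days, and the script the model actually generates may differ.

```python
from datetime import date

# Placeholder data: (repository, current stars, first release date).
# These are NOT real MCP servers or real star counts.
repos = [
    ("example/server-a", 12_000, date(2024, 11, 25)),
    ("example/server-b", 9_000, date(2025, 6, 1)),
    ("example/server-c", 15_000, date(2024, 12, 1)),
]


def stars_per_month(stars: int, first_release: date, today: date) -> float:
    """Average stars gained per month since first release (30.44-day months)."""
    months = max((today - first_release).days / 30.44, 1.0)  # floor at one month
    return stars / months


today = date(2026, 5, 19)  # fixed "today" so the output is reproducible
ranked = sorted(repos, key=lambda r: stars_per_month(r[1], r[2], today), reverse=True)
for name, stars, released in ranked:
    print(f"{name}: {stars_per_month(stars, released, today):.0f} stars/month")
```

Note that the newer `server-b` outranks `server-a` despite having fewer total stars. Normalizing by age changes the answer, which is why the prompt asks for it.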

For larger systems, two new pieces of infrastructure matter. Antigravity 2.0 is a standalone desktop application that acts as a central hub for agent interaction, supporting parallel sub-agent execution, scheduled background tasks, and integrations across AI Studio, Android, and Firebase. And the Managed Agents API on Google’s Agent Platform lets you run custom agents inside secure, Google-hosted environments — so a multi-step workflow can execute autonomously without you managing the sandbox, state, or scaling yourself.
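Parallel sub-agent execution matters for the same reason raw speed does. Independent subtasks that run concurrently take roughly as long as the slowest one in wall-clock time, not the sum of all of them. The sketch below uses plain `asyncio` with a hypothetical `run_sub_agent` coroutine that sleeps to simulate work. It is not the Antigravity or Managed Agents API, whose interfaces are not described here.

```python
import asyncio
import time


async def run_sub_agent(task: str, seconds: float) -> str:
    """Hypothetical sub-agent: sleeps to stand in for model and tool calls."""
    await asyncio.sleep(seconds)
    return f"done: {task}"


async def orchestrate(tasks: dict[str, float]) -> list[str]:
    """Fan out independent subtasks concurrently and gather their results."""
    return await asyncio.gather(*(run_sub_agent(t, s) for t, s in tasks.items()))


# Hypothetical research job split into independent vendor lookups.
subtasks = {"vendor A": 0.3, "vendor B": 0.2, "vendor C": 0.1}
start = time.perf_counter()
results = asyncio.run(orchestrate(subtasks))
elapsed = time.perf_counter() - start
print(results)
print(f"wall time ~{elapsed:.1f}s (sequential would be ~{sum(subtasks.values()):.1f}s)")
```

`asyncio.gather` returns results in the order the coroutines were passed, so they line up with the subtasks even though vendor C finishes first.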

Gemini 3.5 Flash vs. Pro vs. Omni: Which Model for Which Job?

With three headline models (plus Spark sitting on top of them), choosing the right one is the first real decision you will make. This comparison reflects what Google has shipped or confirmed as of June 2026:

| Aspect | Gemini 3.5 Flash | Gemini 3.5 Pro | Gemini Omni |
|---|---|---|---|
| Primary job | Agentic workflows, coding, high-volume tasks | Deepest reasoning and hardest problems | Any-to-any generation, starting with video |
| Availability | Generally available since May 19, 2026 | Rolling out June 2026 | Omni Flash rolling out to AI Plus/Pro/Ultra |
| Context window / inputs | Up to 1M tokens (65K output) | Expected 1M+ tokens | Multimodal input: text, image, audio, video |
| Pricing | $1.50 / $9.00 per million tokens (in/out) | Not yet announced | Subscription tiers; free on YouTube Shorts |
| Standout strength | ~4x output speed at under half the cost | Maximum accuracy for high-stakes work | Consistent characters, language-driven editing |
| Best for | Production agents, apps, automation | Research, architecture, complex analysis | Creators, marketing, video prototyping |

The practical default for most developers in 2026 is simple: start with Flash. Its benchmark profile means it is no longer the “compromise” tier, and its speed and cost structure are exactly what chained agentic calls need. Escalate to Pro only when a task demonstrably fails on Flash.
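"Start with Flash, escalate on failure" can be expressed as a small routing function. The sketch assumes a hypothetical `call_model(model, prompt)` function, stubbed here so the example runs offline, and a task-specific `is_acceptable` check. `gemini-3.5-flash` matches the ID in the API example above. `gemini-3.5-pro` is an assumed ID, since Pro had not shipped at launch.

```python
from typing import Callable

FLASH = "gemini-3.5-flash"
PRO = "gemini-3.5-pro"  # assumed model ID; Pro had not shipped at launch


def route(
    prompt: str,
    call_model: Callable[[str, str], str],
    is_acceptable: Callable[[str], bool],
) -> tuple[str, str]:
    """Try Flash first; escalate to Pro only if Flash's answer fails the check.

    Returns (model_used, answer).
    """
    answer = call_model(FLASH, prompt)
    if is_acceptable(answer):
        return FLASH, answer
    return PRO, call_model(PRO, prompt)


def fake_call_model(model: str, prompt: str) -> str:
    """Offline stub with hypothetical behavior: Flash 'fails' on proofs."""
    if "proof" in prompt and model == FLASH:
        return "incomplete"
    return f"{model} answer"


def looks_complete(answer: str) -> bool:
    return answer != "incomplete"


print(route("summarize this changelog", fake_call_model, looks_complete))
print(route("write a formal proof", fake_call_model, looks_complete))
```

With the real SDK, `call_model` could wrap `client.models.generate_content(model=model, contents=prompt).text`. The quality of `is_acceptable` (tests, schema checks, a grader) decides how often you pay for Pro.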

Why the Agentic Gemini Era Changes How You Build

It is worth pausing on the architectural implication, because it changes day-to-day engineering work. In the chatbot era, your application owned the control flow: you called the model, inspected the output, decided the next step, and called again. In the agentic era, the model owns much of that loop. Gemini 3.5 can decompose a task, assign sub-agents, carry context across steps, and pick tools — your job shifts to three things:

  1. Goal specification. Writing clear, verifiable objectives (“all tests pass and the diff is under 200 lines”) rather than step-by-step instructions.
  2. Tool design. Exposing safe, well-documented capabilities — increasingly via the Model Context Protocol — that the agent can compose.
  3. Guardrails and review. Defining what the agent may do autonomously versus what requires human approval, and logging everything in between.
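Those three responsibilities correspond to three things your code supplies to an agent loop. The first is a goal predicate that can be checked mechanically, the second is a registry of allowed tools, and the third is a set of guardrails such as a step limit and an audit log. Below is a minimal sketch under those assumptions. The `plan_next_step` stub stands in for the model's decision-making, the toy tools only mutate a dict, and every name is hypothetical rather than part of a Gemini API.

```python
from typing import Callable

State = dict  # hypothetical: e.g. {"tests_passed": False, "diff_lines": 0}


def goal_met(state: State) -> bool:
    """Verifiable objective: all tests pass and the diff is under 200 lines."""
    return state["tests_passed"] and state["diff_lines"] < 200


# Tool design: the only capabilities the agent may compose (toy versions).
def apply_patch(state: State) -> State:
    return {**state, "diff_lines": state["diff_lines"] + 40}


def run_tests(state: State) -> State:
    return {**state, "tests_passed": state["diff_lines"] >= 80}


TOOLS: dict[str, Callable[[State], State]] = {"apply_patch": apply_patch, "run_tests": run_tests}


def plan_next_step(state: State) -> str:
    """Stub for the model's choice of tool; a real agent would ask the model."""
    return "run_tests" if state["diff_lines"] >= 80 else "apply_patch"


def agent_loop(state: State, max_steps: int = 10) -> tuple[State, list[str]]:
    log: list[str] = []  # guardrail: every step is recorded for review
    for _ in range(max_steps):  # guardrail: hard cap on iterations
        if goal_met(state):
            break
        tool = plan_next_step(state)
        if tool not in TOOLS:  # guardrail: refuse unregistered tools
            log.append(f"rejected: {tool}")
            break
        state = TOOLS[tool](state)
        log.append(tool)
    return state, log


final, log = agent_loop({"tests_passed": False, "diff_lines": 0})
print(goal_met(final), log)
```

The loop never decides on its own that it is done. It stops when the goal predicate passes, the step cap is hit, or the model asks for a tool it was not given, and the caller can re-check `goal_met` on the final state.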

A useful analogy: prompt engineering was like giving turn-by-turn directions to a driver. Agent engineering is like dispatching a courier — you specify the destination, the package, and the rules of the road, then verify delivery. The skill ceiling moves from wording prompts to designing systems.

Common Pitfalls to Avoid with Gemini 3.5 and Omni

Early adopters are already hitting predictable failure modes. Save yourself the debugging time:

  • Treating agent output as verified. An agent that runs for ten minutes across twenty tool calls produces an answer that feels authoritative. It can still be wrong. Always design a verification step — tests, schema validation, or human review — at the end of the loop.
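  • Ignoring cost compounding. $1.50/$9.00 per million tokens sounds cheap until an agent loops. Each call typically re-sends the accumulated context, so a workflow that fills the 1M-token window pays about $1.50 in input alone on every call; twenty such iterations cost roughly $30 before any output tokens. Set hard caps on iterations and token spend per task.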
  • Over-scoping permissions. Do not give an agent write access to production systems “to be efficient.” Mirror Gemini Spark’s own design: propose-then-approve for anything irreversible.
  • Stuffing the 1M-token context unnecessarily. Huge context is a capability, not a strategy. Retrieval of the relevant 20K tokens usually beats dumping 800K tokens of repo into every call — in both cost and answer quality.
  • Assuming Omni output is rights-cleared and disclosure-free. SynthID watermarking identifies AI-generated video, but it does not absolve you of disclosure obligations in advertising or political content, and platform policies differ. Check before you publish.
  • Skipping the model-routing decision. Sending every request to the biggest model is the new “premature optimization in reverse.” Route simple tasks to Flash and reserve Pro for what genuinely needs it.
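For the cost-compounding pitfall, a hard cap can live in a small guard object that every model call passes through. This is a sketch that assumes you can read input and output token counts after each call; the Python SDK exposes per-response usage via `response.usage_metadata`, but that wiring is omitted here. `Budget` and `BudgetExceeded` are hypothetical names, and the default prices are the Flash list prices quoted earlier.

```python
class BudgetExceeded(RuntimeError):
    """Raised when an agent run exceeds its iteration or spend cap."""


class Budget:
    """Hard caps on iterations and dollar spend for one agent task."""

    def __init__(self, max_iterations: int, max_usd: float,
                 input_price_per_m: float = 1.50, output_price_per_m: float = 9.00):
        self.max_iterations = max_iterations
        self.max_usd = max_usd
        self.input_price_per_m = input_price_per_m
        self.output_price_per_m = output_price_per_m
        self.iterations = 0
        self.spent_usd = 0.0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Record one completed model call; raise once either cap is exceeded."""
        self.iterations += 1
        self.spent_usd += (input_tokens * self.input_price_per_m
                           + output_tokens * self.output_price_per_m) / 1_000_000
        if self.iterations > self.max_iterations:
            raise BudgetExceeded(f"iteration cap {self.max_iterations} exceeded")
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded(f"spend ${self.spent_usd:.2f} exceeds cap ${self.max_usd:.2f}")


# Hypothetical runaway loop: each call re-sends an 800K-token context.
budget = Budget(max_iterations=50, max_usd=5.00)
try:
    while True:
        budget.charge(input_tokens=800_000, output_tokens=2_000)
except BudgetExceeded as err:
    print(f"stopped after {budget.iterations} calls: {err}")
```

Because `charge` runs after a call completes, a run can overshoot its cap by at most one call. If that matters, estimate the next call's cost from its input size and check before sending it.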

Frequently Asked Questions About Gemini 3.5 and Gemini Omni

What is the difference between Gemini 3.5 Flash and Gemini 3.5 Pro?

Gemini 3.5 Flash is the fast, lower-cost model optimized for agentic and coding workloads; it shipped generally available at I/O 2026 and is the default in the Gemini app. Gemini 3.5 Pro is the higher-end reasoning tier, confirmed by Google and rolling out in June 2026 for the hardest analytical tasks.

Is Gemini Omni free to use?

Partially. Gemini Omni Flash is included for Google AI Plus, Pro, and Ultra subscribers through the Gemini app and Google Flow, and it is available at no cost inside YouTube Shorts and the YouTube Create app. Full API access and higher tiers follow Google’s standard subscription and usage pricing.

What does “agentic AI” actually mean?

Agentic AI describes models that pursue goals autonomously rather than answering single prompts. An agentic model like Gemini 3.5 plans multi-step tasks, calls tools (search, code execution, third-party APIs), evaluates intermediate results, and continues until the goal is met — with human approval gates for consequential actions.

How much does the Gemini 3.5 API cost for developers?

Gemini 3.5 Flash is priced at $1.50 per million input tokens and $9.00 per million output tokens through the Gemini API, which Google says is less than half the cost of comparable frontier models. Pricing for Gemini 3.5 Pro had not been announced at launch.

Can you detect videos made with Gemini Omni?

Yes. Every Omni-generated video embeds SynthID, Google’s imperceptible digital watermark. You can verify whether a video came from Gemini Omni using the Gemini app, Gemini in Chrome, or Google Search — an increasingly important check as synthetic video becomes photorealistic.

What is Gemini Spark and who can use it?

Gemini Spark is Google’s personal AI agent built on Gemini 3.5 Flash. It runs continuously, connects to your apps, executes delegated and recurring tasks, and asks for explicit approval before sensitive actions. It is in beta, starting with trusted testers and Google AI Ultra subscribers in the US.

Conclusion: The Agentic Era Is a Platform Shift, Not a Feature

Google I/O 2026 drew a clean line under the chatbot era. Gemini 3.5 brings frontier-level agentic capability to the cheap, fast tier — with a 1M-token context, standout coding and tool-use benchmarks, and pricing built for chained autonomous calls. Gemini Omni extends the same intelligence into any-to-any generation, making consistent, editable, watermarked video a native model capability rather than a bolt-on. And Gemini Spark previews where this lands for everyday users: a persistent agent that works while you do not.

For developers, the takeaways are concrete. Default to Gemini 3.5 Flash and escalate only when needed. Invest your effort in goal specification, tool design, and guardrails rather than prompt wordsmithing. Cap costs and verify outputs, because autonomy without verification is just fast failure. The teams that internalize those habits now will be the ones shipping reliable agentic products while everyone else is still debugging runaway loops. The agentic Gemini era is not coming — as of May 2026, it is the default.