Google has introduced implicit caching, a feature that helps developers save money when using its Gemini 2.5 Pro and Gemini 2.5 Flash AI models. It automatically reuses repeated parts of earlier requests, and Google says this can cut the cost of that repeated content by up to 75%.
This is welcome news for developers who build apps, chatbots, or tools on Google’s AI. Many were upset about high bills under Google’s older system, and the new feature is meant to fix that problem.

How the New Tool Works
Reusing Work to Save Time and Money
Imagine you hand a friend the same long menu every day before asking a question about it. Instead of reading the whole menu again, your friend remembers it and only has to listen to the new question. That’s roughly how implicit caching works.
Here’s the breakdown:
- Automatic Savings: The tool checks whether a new request starts the same way as an earlier one. If it does, it reuses the work already done on the shared beginning instead of repeating it.
- No Extra Work: Developers don’t have to set anything up. It works by itself.
- Cheaper Bills: Google says this can cut costs by up to 75% on the repeated parts of a request.
For example, a pizza app that sends Gemini the same menu and instructions before every customer question, such as “What toppings do you have?”, can now save money on that repeated part.
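To make the "starts the same" idea concrete, here is a minimal sketch that counts how much of a new request matches the beginning of an earlier one. It is only an illustration: the function name `shared_prefix_len` is hypothetical, splitting on whitespace stands in for real tokenization, and the article does not describe how Google actually performs the match.

```python
def shared_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count how many leading tokens two requests have in common."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


# Assumption: whitespace-separated words stand in for real model tokens.
menu = "You are the assistant for a pizza shop. Menu: margherita, pepperoni, veggie."
first = (menu + " Question: What toppings do you have?").split()
second = (menu + " Question: Do you deliver?").split()

reusable = shared_prefix_len(first, second)
print(f"{reusable} of {len(second)} tokens match the earlier request")
```

In this toy case the menu and instructions are shared and only the question differs, which is the kind of request the discount is aimed at.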
Why This Helps Developers
Building apps with AI can get expensive. Every time the AI answers a question, it costs money. With implicit caching:
- Startups Save Cash: Small companies can build apps without huge bills.
- Faster Apps: Reusing earlier work can speed things up.
- Less Repetition: The AI doesn’t have to process the same text again and again.
Google’s Gemini 2.5 Flash model now costs $0.15 per million input “tokens” (pieces of words), and the Pro model costs $1.25 per million. Implicit caching makes those prices even lower for requests that repeat earlier content.
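The effect on a bill can be estimated with simple arithmetic. The sketch below assumes the per-million prices quoted above are in US dollars and that the full 75% discount applies to every cached token. Both are assumptions for illustration, and the function name `input_cost` is hypothetical.

```python
def input_cost(total_tokens: int, cached_tokens: int,
               price_per_million: float, cache_discount: float = 0.75) -> float:
    """Estimate the cost when some tokens are billed at a discounted rate."""
    fresh_tokens = total_tokens - cached_tokens
    cached_price = price_per_million * (1 - cache_discount)
    total = fresh_tokens * price_per_million + cached_tokens * cached_price
    return total / 1_000_000


# Hypothetical workload: 1,000,000 tokens, 800,000 of them repeated.
without_cache = input_cost(1_000_000, 0, price_per_million=0.15)
with_cache = input_cost(1_000_000, 800_000, price_per_million=0.15)
print(f"no caching: {without_cache:.4f}, with caching: {with_cache:.4f}")
```

The total saving depends on how much of each request is repeated, so the overall bill falls by less than 75% unless nearly everything is cached.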
What Went Wrong Before
Google’s old system, called explicit caching, made developers do all the work. They had to manually pick which content to cache and reuse. It was confusing, and sometimes it made bills higher, not lower.
Last week, many developers complained online. Google apologized and promised to fix it. Now, implicit caching does the job automatically.
Tips to Save Even More
Google says developers should:
- Put repeating words first: Like starting questions with “Show me the menu for…”
- Keep changing parts last: Like adding “…today” after the fixed part.
- Check bills: Confirm that caching is actually working and that costs are going down.
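These tips can be sketched in a few lines. The example below keeps the fixed text in one constant at the front of every prompt, appends the changing part last, and computes what share of a request was served from cache. The names `STABLE_PREFIX`, `build_prompt`, and `cache_hit_ratio` are hypothetical, and the token counts are assumed to come from whatever usage report the API returns; the article does not name those fields.

```python
# Fixed text goes first so that every request starts the same way.
STABLE_PREFIX = (
    "You are the ordering assistant for a pizza shop.\n"
    "Show me the menu for "
)


def build_prompt(changing_part: str) -> str:
    """Append the part that varies per request after the fixed prefix."""
    return STABLE_PREFIX + changing_part


def cache_hit_ratio(prompt_tokens: int, cached_tokens: int) -> float:
    """Share of prompt tokens reported as cached (0.0 if nothing was sent)."""
    if prompt_tokens <= 0:
        return 0.0
    return cached_tokens / prompt_tokens


print(build_prompt("today"))
# Hypothetical usage numbers copied from a billing or usage report.
print(f"cache hit ratio: {cache_hit_ratio(1000, 750):.0%}")
```

A ratio that stays near zero across many similar requests would suggest the prompts are not sharing a common beginning and may need restructuring.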
What Developers Should Watch Out For
- Not Perfect Yet: Google’s 75% savings claim has not been independently confirmed.
- Big Jobs Cost More: Savings fall if a task uses more than 200,000 tokens (≈150,000 words).

Why This Matters
AI costs are a big challenge for apps and startups. Google’s new tool makes it cheaper to build useful things with AI. If it works well, other companies, such as OpenAI, may follow Google’s lead. For now, developers should try the tool, follow Google’s tips, and monitor their bills.