They don't mention it in the post, but it looks like this includes a price increase for the Gemini 2.5 Flash model.
For 2.5 Flash Preview https://web.archive.org/web/20250616024644/https://ai.google...
$0.15/million input text / image / video
$1.00/million audio
Output: $0.60/million non-thinking, $3.50/million thinking
The new prices for Gemini 2.5 Flash ditch the difference between thinking and non-thinking and are now: https://ai.google.dev/gemini-api/docs/pricing
$0.30/million input text / image / video (2x more)
$1.00/million audio (same)
$2.50/million output - significantly more than the old non-thinking price, less than the old thinking price.
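To make that concrete for a hypothetical workload of 1M input and 1M output text tokens:
Old non-thinking: $0.15 + $0.60 = $0.75
Old thinking: $0.15 + $3.50 = $3.65
New unified price: $0.30 + $2.50 = $2.80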
At one point, when they made Gemini Pro free on AI Studio, Gemini was the model of choice for many people, I believe.
Somehow it's gotten worse since then, and I'm back to using Claude for serious work.
Gemini is like that guy who keeps talking but has no idea what he's actually talking about.
I still use Gemini for brainstorming, though I take its suggestions with several grains of salt. It's also useful for generating prompts that I can then refine and use with Claude.
I am very impressed with Gemini and stopped using OpenAI. Sometimes I ping all three major models on OpenRouter, but 90% of my usage is on Gemini now. Compare that to 90% ChatGPT last year.
Love to see it, this takes Flash Lite from "don't bother" territory for writing code to potentially useful. (Besides being inexpensive, Flash Lite is fast -- almost always sub-second, to as low as 200ms. Median around 400ms IME.)
Brokk (https://brokk.ai/) currently uses Flash 2.0 (non-Lite) for Quick Edits, we'll evaluate 2.5 Lite now.
ETA: I don't have a use case for a thinking model that is dumber than Flash 2.5, since thinking negates the big speed advantage of small models. Curious what other people use that for.
Curious to hear what folks are doing with Gemini outside of the coding space and why you chose it. Are you building your app so you can swap the underlying GenAI easily? Do you "load balance" your usage across other providers for redundancy or cost savings? What would happen if there was ever some kind of spot market for LLMs?
I switched to 2.5 Flash (non-think) for most of my projects because it was such a good model with good pricing.
Cost is an important factor, so I'm hoping flash-lite is sufficient, even though it's sometimes more than 50% worse in relevant benchmarks, which sucks.
Was also just looking at 4.1-mini, but that's more expensive and often scores around the same as flash-lite in benchmarks (except coding, which I don't care about).
Crazy to think that even after this move by Google, OpenAI is still the worse option for me, at least regarding the API. Outside the API, I'm actually using ChatGPT (o3/o4-mini; 4o is a joke) a lot more again lately, after 2.5 Pro got nerfed.
I had a great-ish result from 2.5 Pro the other day. I asked it to date an old photograph, and it successfully read the partial headline on a newspaper in the background (which I had initially thought was too small/blurry to make out) and identified the 1980s event it was reporting. Impressive. But then it confidently hallucinated the date of the article (which I later verified by checking in an archive).
Roughly a 6.7X jump in the price of audio input processing compared to 2.0 Flash-Lite
Gemini 2.5 Flash Lite (Audio Input) - $0.5/million tokens
Gemini 2.0 Flash Lite (Audio Input) - $0.075/million tokens
Wonder what led to such a high bump in Audio token processing
I run a batch inference/LLM data processing service and we do a lot of work around cost and performance profiling of (open-weight) models.
One odd disconnect that still exists in LLM pricing is that providers charge linearly with respect to token consumption, while the underlying attention compute grows quadratically with sequence length.
At this point, since a lot of models have converged around the same model architecture, inference algorithms, and hardware - the chosen costs are likely due to a historical, statistical analysis of the shape of customer requests. In other words, I'm not surprised to see costs increase as providers gather more data about real-world user consumption patterns.
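A rough way to see the disconnect (an illustrative sketch only; the constants are made up and not any provider's real cost model):

    # Illustrative sketch: attention compute grows quadratically with sequence
    # length, while per-token billing grows linearly. Constants are made up.

    def attention_flops(seq_len: int, d_model: int = 4096) -> float:
        # Self-attention does O(n^2 * d) work over the sequence.
        return seq_len ** 2 * d_model

    def billed_cost(seq_len: int, price_per_million: float = 0.30) -> float:
        # Providers bill a flat rate per token, regardless of sequence length.
        return seq_len / 1_000_000 * price_per_million

    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} tokens  flops={attention_flops(n):.2e}  billed=${billed_cost(n):.4f}")

Going from 10k to 100k tokens is 10x the bill but roughly 100x the attention work, which is why long-context requests are where the linear-pricing assumption breaks down.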
Wishing they'd release the Gemini Diffusion model. It would quickly replace the default model for Aider.
For anyone who was expecting more news: the GA models benchmark basically the same as the last preview models. It's really just Google telling us that we'll get fewer API errors and that this checkpoint will stick around for longer.
I'm glad that they standardized pricing for the thinking vs non-thinking variant. A couple weeks ago I accidentally spent thousands of extra dollars by forgetting to set the thinking budget to zero. Forgetting a single config parameter should not automatically raise the model cost 5X.
[edit] I'm less excited about this because it looks like their solution was to dramatically raise the base price on the non-thinking variant.
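For context, the parameter in question. A minimal sketch with the google-genai Python SDK, assuming ThinkingConfig behaves the same on the GA model as it did on the preview:

    # Sketch: disable thinking on Gemini 2.5 Flash so no thinking tokens are
    # generated (and, on the old preview pricing, so the cheaper non-thinking
    # output rate applied). Assumes the google-genai SDK's ThinkingConfig.
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads GEMINI_API_KEY from the environment

    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents="Summarize this support ticket in one sentence: ...",
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_budget=0),
        ),
    )
    print(response.text)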
Blended price (assuming 3:1 for input:output tokens) is 3.24x of what was stated before [1], and now nearly 5x of 2.0 Flash. Makes 2.0 Flash a still competitive option for many use-cases, particularly ones that aren't coding-heavy I think. A slightly poorer performing model can net perform better through multiple prompt passes. Bummer, was hoping 2.5 Flash would be a slam dunk choice.
[1] - https://web.archive.org/web/20250616024644/https://ai.google...
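Working the blended numbers above through (text pricing, 3 input tokens per output token, and assuming 2.0 Flash's $0.10/$0.40 rates):
Old 2.5 Flash (non-thinking): (3 x $0.15 + $0.60) / 4 = $0.26/M blended
New 2.5 Flash: (3 x $0.30 + $2.50) / 4 = $0.85/M blended (~3.24x the old blended rate)
2.0 Flash: (3 x $0.10 + $0.40) / 4 = $0.175/M blended (new 2.5 Flash is ~4.9x this)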
Good luck using 2.5 for anything non-trivial.
I have about 500,000 news articles I am parsing. OpenAI models work well, but I found Gemini made fewer mistakes.
Problem is, they give me a terrible 10k RPD (requests per day) limit. To increase to the next tier they require a minimum amount of spending, but I can't reach that amount even when maxing out the RPD limit for multiple days in a row.
I emailed them twice and completed their forms, but everyone knows how this works. So now I'm back at OpenAI, with a model that makes a few more mistakes but won't 403 me after half an hour of use because of their limits.
Not sure where else to post this, but when attempting to use any of the Gemini 2.5 models via API, I receive an "empty content" response about 50% of the time. To be clear, the API responds successfully, but the `content` returned by the LLM is just an empty string.
Has anyone here had any luck working around this problem?
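Right now I'm considering just retrying blindly, something like this minimal sketch (google-genai Python SDK, assuming the empty responses are transient), but that feels wasteful:

    # Minimal retry sketch (assumption: the empty responses are transient).
    from google import genai

    client = genai.Client()

    def generate_with_retry(prompt, model="gemini-2.5-flash", max_tries=3):
        for _ in range(max_tries):
            response = client.models.generate_content(model=model, contents=prompt)
            if response.text and response.text.strip():
                return response.text
        raise RuntimeError(f"Empty content after {max_tries} attempts")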
Anyone else unable to access 2.5-pro via api? I'm currently getting "Publisher Model `projects/349775993245/locations/us-west4/publishers/google/models/gemini-2.5-pro` was not found or your project does not have access to it. Please ensure you are using a valid model version."
It's a bummer that 2.5 Pro is still removed from the free tier of the API.
been testing gemini flash lite. latency is good, responses land under 400ms most times. useful for low-effort rewrites or boilerplate filler. quality isn't stable though: context drifts after 4-5 turns, especially with anything recursive or structured. tried tagging it into prompt chains but the fallback logic ends up too aggressive. good for assist, not for logic; wouldn't anchor anything serious on it yet
Is there a Codex/Claude Code competitor on the way?
I dream of a day when LLM naming follows a convention.
Classic bait-and-switch to make developers build things on top of models for 2 months, and then raise input price by 2x and output by 4x. But hey, it's Google, wouldn't expect anything else from an advertising company.
I cancelled ChatGPT early this year and switched to Gemini. With Gemini making rapid progress, I wonder if OpenAI has already lost the battle.
Gemini 2.5 doesn't get enough credit for the quality of its writing in non-code (e.g. law) topics. It's definitely a notch below Claude 4, but well ahead of ChatGPT 4o, 4.5, and o3.
I tried using the three new models to transcribe the audio of this morning's Gemini Twitter Space.
I got very strong results from 2.5 Pro and 2.5 Flash, but 2.5 Flash Lite sadly got stuck in a loop until it ran out of output tokens:
Um, like, what did the cows bring to you? Nothing. And then, um, and then, uh, and then, uh, and then, uh, and then, uh, and then, uh, and then, uh, and then, uh, and...
Notes on my results (including the transcripts which worked, which included timestamps and guessed speaker names) here: https://simonwillison.net/2025/Jun/17/gemini-2-5/#transcribi...
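For anyone who wants to try the same thing, the rough shape of the API call looks like this (a sketch with the google-genai Python SDK, not the exact tooling I used; the prompt wording is approximate):

    # Sketch: upload an audio file and ask a Gemini 2.5 model to transcribe it.
    from google import genai

    client = genai.Client()

    audio = client.files.upload(file="twitter_space.m4a")
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[
            audio,
            "Transcribe this audio. Include timestamps and guess speaker names.",
        ],
    )
    print(response.text)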
Considering moving from Groq Llama 3.3 70b to Gemini 2.5 Flash Lite for one of my use cases. Results are coming in great, and it's very fast (important for my real-time user perception needs).
What kind of rate limits do these new Gemini models have?
I am always disappointed when I compare the answers to the same queries on 2.5 Pro vs. o4-mini/o3. But trying out the same query in AI Studio gives much better results, closer to OpenAI's models. What is wrong with 2.5 Pro in the Gemini app? I can't believe that the model in their consumer app would produce the same benchmark results as 2.5 Pro in the API or AI Studio.
Which are what?
I mean the model names are always a bit odd, but flash-lite is particularly good!
I have a huge background.js file from a now-removed browser extension that the devs turned into a single line. Around 800KB in a single-line file, I think...
I tried a lot of free tools to refactor it, but they all lose the context window quickly.
I really wish all the AI companies would down tools on all other development until they work out file downloads, FTP, SFTP, git - ANY way to access the files other than copy-paste and "download file".
The workflow is crushingly tedious.
And no I don’t want to use an AI IDE or some other tool. I like the UI of Gemini chat and AI Studio and I want them improved.
Gemini strangely says you cannot upload all sorts of file types.
But it accepts them just fine if you upload a zip file... which you can only do in AI Studio.
I need an AI model to be able to keep track of the AI model names.
2.5 Flash Lite seems better at everything compared to 2.0 Flash Lite, with the only exception being SimpleQA, so there is probably a small tradeoff: a bit of pop-culture knowledge given up in exchange for better coding, math, science, reasoning, and multimodal performance.