In its latest push to redefine the AI landscape, Google announced Gemini 2.0 Flash Thinking, a multimodal reasoning model capable of tackling complex problems quickly and transparently.
In a post on the social network X, Google CEO Sundar Pichai wrote: “Our most sophisticated model yet :)”
And in its developer documentation, Google explains that “Thinking Mode has stronger reasoning skills in its answers than the base Gemini 2.0 Flash model,” referring to the company’s latest and greatest model, which was released just eight days ago.
The new model supports only 32,000 tokens of input (approximately 50 to 60 pages of text) and can produce 8,000 tokens per response. In a side panel in Google AI Studio, the company says it is best suited for “multimodal understanding, thinking” and “coding.”
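For developers who want a sense of what calling the model within those limits might look like, here is a minimal sketch using Google’s google-generativeai Python SDK. The model ID “gemini-2.0-flash-thinking-exp” and the 8,000-token output cap are assumptions based on the figures reported above and early experimental releases, not confirmed product details.

```python
# Minimal, illustrative sketch using Google's python SDK (google-generativeai).
# The model ID and token cap below are assumptions and may change.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # replace with your own API key

model = genai.GenerativeModel(
    model_name="gemini-2.0-flash-thinking-exp",  # assumed experimental model ID
    generation_config={
        "max_output_tokens": 8000,  # the article cites an 8,000-token output cap
    },
)

response = model.generate_content(
    "How many times does the letter R appear in the word 'strawberry'?"
)
print(response.text)
```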
Full details about the model’s training process, architecture, licensing, and cost have yet to be released. Currently, Google AI Studio does not display a cost per token.
More accessible and transparent reasoning
In contrast to competing reasoning models such as OpenAI’s o1 and o1-mini, Gemini 2.0 Flash Thinking allows users to access its step-by-step reasoning via a drop-down menu, providing clearer and more transparent insight into how the model reaches its conclusions.

By allowing users to see how decisions are made, Gemini 2.0 Flash Thinking addresses long-standing concerns about AI operating as a “black box” and puts the model, whose licensing terms remain unclear, on a more level playing field with open-source models from competitors.
My early, simple tests showed that the model quickly (within one to three seconds) and correctly answered questions that have been notoriously difficult for other AI models, such as counting the number of Rs in the word “strawberry.”
In another test, comparing two decimal numbers (9.9 and 9.11), the model systematically broke down the problem into smaller steps, from analyzing whole numbers to comparing decimals.
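For reference, the comparison itself is easy to verify outside the model; a short, purely illustrative Python check confirms the answer the model reasoned its way to:

```python
# Quick check of the comparison the model was asked to perform: 9.9 vs. 9.11.
# Decimal avoids binary floating-point quirks in the illustration.
from decimal import Decimal

print(Decimal("9.9") > Decimal("9.11"))  # True: whole parts tie, and 0.90 > 0.11
```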
These results are supported by independent third-party analysis from LM Arena, which named Gemini 2.0 Flash Thinking the top-performing model across all LLM categories.
Native support for image uploads and analysis
In a further improvement over OpenAI’s competing o1 family, Gemini 2.0 Flash Thinking is designed to process images from the outset.
o1 started as a text-only model but has since been expanded to handle image and file uploads. Both models can currently return only text.
According to the developer documentation, Gemini 2.0 Flash Thinking also does not currently support grounding with Google Search or integration with other Google apps and external third-party tools.
Gemini 2.0 Flash Thinking’s multimodal capability expands its potential use cases and enables it to address scenarios that combine different data types.
For example, in one test, the model solved a puzzle that required analysis of text and visual elements, demonstrating its versatility in integrating and reasoning across formats.
Developers can use these features through Google AI Studio and Vertex AI, where the model is available for experimentation.
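As a rough illustration of what such a multimodal request might look like in code, here is a hedged sketch using the same Python SDK. It is not an official Google example: the model ID is an assumption, and image support in the experimental release may differ, so the developer documentation remains the authority.

```python
# Illustrative sketch of a multimodal (image + text) request via google-generativeai.
# The model ID is assumed; check the developer documentation for current details.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")  # assumed model ID
puzzle_image = PIL.Image.open("puzzle.png")  # any local image file

# Text and image parts are passed together as a single prompt list.
response = model.generate_content(
    ["Solve the puzzle shown in this image and explain your reasoning.",
     puzzle_image]
)
print(response.text)  # the model currently returns text only
```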
As the AI landscape becomes increasingly competitive, Gemini 2.0 Flash Thinking could mark the beginning of a new era for problem-solving models. Its ability to process diverse data types, show its reasoning visibly, and deliver scalable performance makes it a serious contender in the reasoning AI market, competing with OpenAI’s o1 family and beyond.