Google Cloud has unveiled its latest AI models, Gemini 1.5 Flash and Gemini 1.5 Pro, designed to revolutionize the creation of AI-driven solutions within organizations.
These models, now publicly available, promise enhanced performance, scalability, and cost efficiency. With capabilities like rapid response times, extensive context handling, and provisioned throughput, Gemini 1.5 aims to support various applications, from retail chatbots to document parsing.
Thomas Kurian, Google Cloud Chief Executive, states: “Provisioned throughput allows us to essentially reserve inference capacity for customers. If they want to reserve a certain amount of capacity, for example, if they’re running a large event and they’re seeing a big ramp in users, as we’re seeing with some of our social media platform customers, they are able to reserve capacity ahead of time, so they don’t start seeing exceptions from a service-level point of view. And that’s a big step forward in assuring them when we take our models into general availability, giving them an assurance on a service-level objective, both with regard to response time as well as availability and uptime.”
Gemini 1.5 Flash
Gemini 1.5 Flash offers lower latency, reasonable pricing, and a context window well suited to retail chat agents, document parsing, and bots that can synthesize entire repositories. According to Google, Gemini 1.5 Flash is on average 40 percent faster than GPT-3.5 Turbo when processing a 10,000-character input. It is also four times cheaper than OpenAI’s model, and context caching is enabled for inputs larger than 32,000 characters.
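To make those comparisons concrete, here is a small back-of-the-envelope sketch. Only the ratios (40 percent faster, four times cheaper) come from the article; the baseline latency and price below are hypothetical placeholders, not real GPT-3.5 Turbo figures.

```python
# Illustrative arithmetic only: the 40%-faster and 4x-cheaper ratios are
# from Google's claims; the baseline numbers are hypothetical.

baseline_latency_s = 2.0           # hypothetical baseline latency (seconds)
baseline_cost_per_call = 0.004     # hypothetical baseline cost (USD)

flash_latency_s = baseline_latency_s * (1 - 0.40)   # "40 percent faster"
flash_cost_per_call = baseline_cost_per_call / 4    # "four times cheaper"

print(f"Flash latency: {flash_latency_s:.2f}s, cost: ${flash_cost_per_call:.4f}")
```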
Gemini 1.5 Pro
Gemini 1.5 Pro, for its part, gives developers a context window of 2 million tokens, the largest among the major AI models. This means the model can take in and analyze far more text before responding. As Kurian illustrated, a high-definition two-hour video can be fed to the model and understood as a single item, without chopping it into segments; developers can likewise upload nearly a full day of audio, one to two hours of video, more than 60,000 lines of code, or 1.5 million words. Many businesses have reported substantial benefits from this capacity.
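A quick sanity check shows how the “1.5 million words” example lines up with a 2-million-token window. The tokens-per-word ratio used here is a common rough heuristic (about 0.75 words per token), not an official Gemini figure.

```python
# Back-of-the-envelope check of the 2-million-token window against the
# article's "1.5 million words" example; the tokens-per-word ratio is a
# rough community heuristic, not an official number.

CONTEXT_WINDOW = 2_000_000        # tokens (Gemini 1.5 Pro, per the article)

words = 1_500_000                 # "1.5 million words"
estimated_tokens = words * 4 // 3 # heuristic: ~4 tokens per 3 words

print(estimated_tokens, estimated_tokens <= CONTEXT_WINDOW)
```

Under that heuristic, 1.5 million words lands almost exactly at the window’s 2-million-token limit, which is consistent with the article’s framing.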
Kurian goes on to explain that the choice between Flash and Pro depends on how a given customer will use them. A customer who wants to treat a two-hour video as a single piece of content would use Gemini 1.5 Pro. A customer who needs ultra-low latency, by contrast, would use Flash, which is built to be faster and to deliver more predictable latency.
Google is also launching context caching in public preview for both Gemini 1.5 Pro and Flash. Context caching will allow models to store and reuse information they already have without recomputing everything from scratch whenever they receive a request. This feature is helpful for long conversations or documents and can reduce input costs by up to 75%.
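The savings can be sketched with simple arithmetic. This assumes the “up to 75%” reduction applies to the cached portion of a prompt, for example a large document reused across many requests; the token counts and the per-token price below are hypothetical.

```python
# Sketch of potential context-caching savings. The 75% discount is the
# article's "up to 75%" figure; all token counts and prices are
# hypothetical, chosen only to illustrate the mechanism.

INPUT_PRICE_PER_1K = 0.00035   # hypothetical price per 1,000 input tokens
CACHE_DISCOUNT = 0.75          # "up to 75%" reduction on cached input

cached_tokens = 900_000        # e.g. a large document reused across turns
fresh_tokens = 1_000           # the new question in each request

full_cost = (cached_tokens + fresh_tokens) / 1000 * INPUT_PRICE_PER_1K
cached_cost = (cached_tokens / 1000 * INPUT_PRICE_PER_1K * (1 - CACHE_DISCOUNT)
               + fresh_tokens / 1000 * INPUT_PRICE_PER_1K)

print(f"per-request input cost: {full_cost:.5f} -> {cached_cost:.5f}")
```

When most of the prompt is cached, the per-request input cost drops toward the discounted rate, which is why the feature matters most for long conversations and large reused documents.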
Finally, with provisioned throughput, developers can scale their usage of Google’s Gemini models more reliably. This feature determines how many queries or texts a model can process over a given period. Previously, developers were billed under a pay-as-you-go model; now they have the option of provisioned throughput, which gives them better predictability and reliability for production workloads.
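The value of a reservation is easiest to see as a capacity check against a traffic ramp, like the live-event scenario Kurian describes. This is a toy illustration with made-up numbers; actual provisioned throughput on Vertex AI is purchased as reserved capacity, not configured in code like this.

```python
# Toy illustration of reserved capacity vs. a traffic ramp. All numbers
# are hypothetical; this does not reflect how Vertex AI provisioned
# throughput is actually purchased or enforced.

provisioned_qps = 50                      # hypothetical reserved queries/second
event_traffic_qps = [10, 25, 48, 60, 40]  # hypothetical ramp during a live event

# Seconds where demand exceeds the reservation would risk service-level
# exceptions under pure pay-as-you-go capacity.
overloaded = [qps for qps in event_traffic_qps if qps > provisioned_qps]
print("samples over capacity:", len(overloaded))
```

A customer expecting the 60-qps spike would size the reservation above it, which is exactly the predictability the feature is meant to provide.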