The artificial intelligence (AI) revolution has changed the digital landscape, for better or for worse. Microsoft-backed OpenAI sits at the top of the AI chain with its most advanced model, GPT-4, leaving behind mega-corporations like Google.
Following the tremendous surge in ChatGPT’s popularity, Google introduced its own conversational AI model, Bard. Though Bard was supposed to be ChatGPT’s rival, it could not compete and was soon forgotten. Since then, Google has been trailing in the AI race.
But this is no longer the case. Google’s subsidiary, Google DeepMind, has announced the release of Gemini, its most advanced AI model, which will rival OpenAI’s GPT series.
What is Google DeepMind?
In 2010, Demis Hassabis, Shane Legg, and Mustafa Suleyman founded DeepMind Technologies Limited in London. Initially, DeepMind worked on brain-inspired neural networks, with a particular focus on playing video games from the 80s and 90s more skillfully than human experts. The startup’s purpose was to develop a general-purpose AI learning system that could be applied to any task.
In 2014, Google acquired the AI research laboratory for $400 million. One year after the acquisition, DeepMind’s founders began negotiating for more independence from the parent company. The talks ran for years and reportedly ended without the autonomy they sought.
In April 2023, DeepMind merged with Google Brain, a division of Google AI, to form Google DeepMind, which closed the question of increased autonomy.
Google DeepMind has spearheaded numerous pivotal AI breakthroughs, like AlphaGo, PaLM 2, AlphaFold, and now Gemini, its most advanced model.
Google DeepMind Technologies
Here is a look at the main technologies from Google DeepMind:
1. SynthID
SynthID is a tool developed by Google DeepMind in partnership with Google Research. It watermarks AI-generated images and audio. The watermark is imperceptible to humans, but detection tools can pick it up.
SynthID works in two ways:
- Watermarking: It embeds a digital watermark in AI-generated images and audio. Its coverage is expected to expand further.
- Identification: It detects the digital watermark in AI-generated content to help assess originality.
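The two modes can be illustrated with a deliberately simple sketch. This is not how SynthID works internally; it uses a classic least-significant-bit trick as a stand-in, and the names `SIGNATURE`, `embed_watermark`, and `detect_watermark` are hypothetical.

```python
# Toy least-significant-bit watermark. An illustrative stand-in only,
# NOT SynthID's actual technique.

SIGNATURE = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical watermark bits


def embed_watermark(pixels, bits):
    """Watermarking: hide one bit in the lowest bit of each leading pixel."""
    if len(bits) > len(pixels):
        raise ValueError("watermark is longer than the image")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit  # clear the lowest bit, then set it
    return marked


def detect_watermark(pixels, bits):
    """Identification: check whether the leading pixels carry the bits."""
    if len(bits) > len(pixels):
        return False
    return all((pixels[i] & 1) == bit for i, bit in enumerate(bits))


image = [200, 13, 77, 154, 91, 240, 8, 66, 120, 33]  # grayscale values 0-255
marked = embed_watermark(image, SIGNATURE)

# Each pixel changes by at most 1 out of 255, which the eye cannot see.
max_change = max(abs(a - b) for a, b in zip(image, marked))

print(detect_watermark(marked, SIGNATURE))
print(detect_watermark(image, SIGNATURE))
```

A mark like this would not survive cropping or compression, which is one reason a production system needs a far more robust approach.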
2. Phenaki
Phenaki is a tool that lets users generate a full video from text prompts. It overcomes challenges like high computational costs, variable video lengths, and the limited availability of high-quality text-video data by leveraging two components:
- An encoder-decoder model that compresses videos into small, discrete units called tokens. This allows it to handle videos of different lengths.
- A bi-directional masked transformer that starts from text converted into tokens and translates those text tokens into new video tokens. Phenaki then decodes the video tokens to create the desired video.
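To make the idea of video tokens concrete, here is a minimal sketch that maps each frame to the index of its nearest entry in a small codebook and back. It is a toy under stated assumptions: real frames are images rather than two-number tuples, Phenaki's tokenizer is a learned neural network, and `CODEBOOK`, `encode`, and `decode` are hypothetical names.

```python
# Toy tokenizer: each "frame" is a 2-number tuple and each token is the
# index of the closest codebook entry. Illustrative only.

CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # hypothetical learned codes


def encode(frames):
    """Turn a clip of any length into a list of discrete token ids."""
    tokens = []
    for frame in frames:
        distances = [
            sum((a - b) ** 2 for a, b in zip(frame, code)) for code in CODEBOOK
        ]
        tokens.append(distances.index(min(distances)))
    return tokens


def decode(tokens):
    """Turn token ids back into (approximate) frames."""
    return [CODEBOOK[token] for token in tokens]


short_clip = [(0.1, 0.0), (0.9, 0.1)]
long_clip = [(0.1, 0.0), (0.9, 0.1), (0.2, 0.8), (0.0, 0.1)]

short_tokens = encode(short_clip)
long_tokens = encode(long_clip)

print(short_tokens, long_tokens)
print(decode(long_tokens))
```

Because a clip becomes a plain sequence of ids, clips of different lengths simply yield sequences of different lengths, which is what makes a token-based transformer a natural fit.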
3. AlphaFold
AlphaFold is one of Google DeepMind’s greatest inventions to date. It predicts a protein’s structure from its amino acid sequence. It relies on a sophisticated deep learning system trained on a very large collection of known protein structures and the amino acids that make them up.
After four years of development, AlphaFold can accurately predict the shape of a protein, even one it has never seen before.
4. Imagen
Imagen is a text-to-image generator developed by Google DeepMind. Built on large language models, it can create lifelike images. The images Imagen creates are hyperrealistic and rival those generated by OpenAI’s DALL-E 2, Midjourney, and VQ-GAN+CLIP.
5. AlphaZero and MuZero
AlphaZero and MuZero are AI systems built to master board and video games. AlphaZero is the successor of AlphaGo, the AI that defeated professional Go player Lee Sedol.
AlphaZero works through reinforcement learning: it learns a game by playing against itself thousands or millions of times. According to Google, it mastered chess in just 9 hours, shogi in 12 hours, and Go in 13 days.
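Learning by self-play can be sketched on a much smaller game. The example below is not AlphaZero's algorithm, which pairs a deep neural network with tree search; it substitutes plain tabular Q-learning on a toy stone-taking game (remove 1 to 3 stones, whoever takes the last stone wins). All names and parameter values are assumptions chosen for illustration.

```python
import random

MAX_TAKE = 3  # a player may remove 1 to 3 stones per turn


def legal_moves(pile):
    return range(1, min(MAX_TAKE, pile) + 1)


def train(episodes=20000, max_pile=10, alpha=0.5, epsilon=0.3, seed=0):
    """Self-play: one shared value table plays both sides of every game."""
    rng = random.Random(seed)
    q = {}  # (pile, move) -> estimated value for the player about to move
    for _ in range(episodes):
        pile = rng.randint(1, max_pile)
        while pile > 0:
            moves = list(legal_moves(pile))
            if rng.random() < epsilon:
                move = rng.choice(moves)  # explore
            else:
                move = max(moves, key=lambda m: q.get((pile, m), 0.0))
            remaining = pile - move
            if remaining == 0:
                target = 1.0  # taking the last stone wins
            else:
                # The opponent moves next, so its best outcome is our loss.
                target = -max(
                    q.get((remaining, m), 0.0) for m in legal_moves(remaining)
                )
            old = q.get((pile, move), 0.0)
            q[(pile, move)] = old + alpha * (target - old)
            pile = remaining
    return q


def best_move(q, pile):
    return max(legal_moves(pile), key=lambda m: q.get((pile, m), 0.0))


q_table = train()
learned = {pile: best_move(q_table, pile) for pile in (5, 6, 7)}
print(learned)
```

Nobody tells the program the winning strategy (leave the opponent a multiple of four stones); it emerges from the table playing against itself.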
MuZero can master the games mentioned above without being told the rules. It learns a model of the game environment it is playing in and then plans what to do next.
6. AlphaGo
AlphaGo was trained to play Go, an ancient Chinese board game that requires immense creativity and strategy. In 2016, this AI system made history when it defeated world champion Go player Lee Sedol in a five-game match.
Its success paved the way for AlphaZero and MuZero.
7. PaLM 2
PaLM 2 is an advanced large language model that can carry out difficult tasks. It can write code, solve math problems, handle reasoning and classification tasks, translate, and work in multiple languages.
8. PaLM-SayCan
PaLM-SayCan is the result of a collaboration between Google AI and Everyday Robots. Its goal is to create robots that can understand everyday speech and carry out tasks accordingly. It follows a three-step strategy: a prompt is given, the robot interprets it, and the robot then executes the task.
9. Universal Speech Model
The Universal Speech Model, or USM, is an advanced speech model developed by Google DeepMind. According to Google, it is “trained on 12 million hours of speech and 28 billion sentences of text, spanning 300+ languages.” It is used by YouTube and is capable of automatic speech recognition (ASR).
10. WaveNet
WaveNet is a text-to-speech generator. Trained on recordings of human speech, it builds sound patterns by predicting which sounds are likely to come next. The output sounds natural because it includes sounds like breathing and lip-smacking, and it captures tone of voice and the different ways people speak around the world.
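Generating audio by predicting what comes next can be sketched with a simple lookup table. This is only a stand-in: WaveNet makes its predictions with a deep neural network, whereas the toy below just counts which sample followed each pair of samples in a training signal. The names `fit`, `generate`, and `training_wave` are hypothetical.

```python
from collections import Counter, defaultdict


def fit(samples, context=2):
    """Count which sample follows each run of `context` samples."""
    counts = defaultdict(Counter)
    for i in range(len(samples) - context):
        history = tuple(samples[i:i + context])
        counts[history][samples[i + context]] += 1
    return counts


def generate(counts, seed, length):
    """Extend `seed` one sample at a time, always picking the most
    frequently seen follower of the most recent samples."""
    output = list(seed)
    context = len(seed)
    while len(output) < length:
        followers = counts.get(tuple(output[-context:]))
        if not followers:
            break  # this history never appeared in training
        output.append(followers.most_common(1)[0][0])
    return output


# A tiny triangle wave stands in for quantized audio samples.
training_wave = [0, 1, 2, 1, 0, 1, 2, 1, 0]
model = fit(training_wave)
generated = generate(model, seed=(0, 1), length=8)
print(generated)
```

Each new sample depends only on the samples before it, which is the same one-step-at-a-time pattern the paragraph above describes.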
11. AlphaDev
AlphaDev uses reinforcement learning to find better computer science algorithms. The system aims to automate algorithm design and take the load off human experts, as the task is often challenging and time-consuming.
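For a sense of the kind of routine such a search targets, consider sorting exactly three values with a fixed sequence of compare-and-swap steps. The sketch below is a textbook three-element sorting network, not a sequence discovered by AlphaDev; `SORT3_NETWORK` and `sort3` are hypothetical names.

```python
from itertools import permutations

# Fixed compare-and-swap steps for exactly three items. A textbook
# sorting network, shown only to illustrate the problem space.
SORT3_NETWORK = [(0, 1), (1, 2), (0, 1)]


def sort3(values):
    items = list(values)
    for i, j in SORT3_NETWORK:
        if items[i] > items[j]:
            items[i], items[j] = items[j], items[i]
    return items


# A candidate routine is only useful if it is correct on every input
# ordering, so check all six of them.
all_sorted = all(sort3(p) == sorted(p) for p in permutations([1, 2, 3]))
print(sort3([3, 1, 2]), all_sorted)
```

Routines this small run constantly inside larger programs, so removing even one step can matter, and checking every possible ordering is how a candidate is confirmed correct.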
12. AlphaCode
AlphaCode is an AI system that writes computer code from natural-language descriptions. It was created to make programming easier and faster, taking the heavy work of coding off programmers. AlphaCode is well suited to problems that require deep, logical thinking.
13. AlphaTensor
AlphaTensor is a model developed by Google DeepMind. It searches for new, more efficient algorithms for fundamental operations such as matrix multiplication, using reinforcement learning to find the best sequence of steps.
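A well-known example of a "new way" to multiply matrices is Strassen's 1969 scheme, which multiplies two 2x2 matrices with 7 scalar multiplications instead of the usual 8. It long predates AlphaTensor and is shown here only to illustrate the kind of shortcut being searched for; the function names are hypothetical.

```python
def multiply_naive(a, b):
    """Standard 2x2 product: 8 scalar multiplications."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def multiply_strassen(a, b):
    """Strassen's 2x2 product: 7 scalar multiplications (m1 to m7)."""
    m1 = (a[0][0] + a[1][1]) * (b[0][0] + b[1][1])
    m2 = (a[1][0] + a[1][1]) * b[0][0]
    m3 = a[0][0] * (b[0][1] - b[1][1])
    m4 = a[1][1] * (b[1][0] - b[0][0])
    m5 = (a[0][0] + a[0][1]) * b[1][1]
    m6 = (a[1][0] - a[0][0]) * (b[0][0] + b[0][1])
    m7 = (a[0][1] - a[1][1]) * (b[1][0] + b[1][1])
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(multiply_naive(A, B))
print(multiply_strassen(A, B))
```

Saving one multiplication looks minor, but applied recursively to large matrices split into blocks, it reduces the total work noticeably.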
14. AlphaStar
AlphaStar was developed to master StarCraft II, one of the most popular and challenging real-time strategy games of all time. It is considered a breakthrough in game-playing AI, being the first to reach Grandmaster level, a rank reserved for the top 0.1% of human players.
15. Gemini
Gemini is a large language model (LLM) developed by Google DeepMind. It is the most powerful LLM Google has developed to date and is considered a big step forward for computer intelligence. Gemini is the first AI model to outperform human experts on MMLU (Massive Multitask Language Understanding). It is designed to compete directly with OpenAI’s GPT models.
Google DeepMind, a subsidiary of Google, is dedicated to pushing the boundaries of AI research and development.
Google DeepMind is a subsidiary of Google that aims to create general artificial intelligence (AI) systems that can be used for different purposes.
DeepMind is famous for its innovative AI technologies and systems. It rose to prominence after AlphaGo defeated a professional Go player in 2016.
DeepMind, a startup founded in 2010, was acquired by Google in 2014 for $400 million. In April 2023, it merged with Google Brain, a division of Google AI, to form Google DeepMind.