
Google Unveils Gemini 1.5 Pro: A Game-Changer in AI Technology

(Image: Google)

Google has launched the large multimodal model (LMM) 'Gemini 1.5 Pro'. Its defining feature is a context window about five times larger than the largest currently available, which lets it process huge amounts of text or images at once. For the first time in a while, Google has signaled confidence that it holds a competitive edge over OpenAI's 'GPT-4'.

Reuters and Bloomberg reported on the 15th (local time) that Google had released an upgraded version of 'Gemini 1.0 Pro', which launched on December 7 last year.

According to these reports, Gemini 1.5 Pro stands out for its ability to process large amounts of data, with a context window that holds 1 million tokens at once. That is about 31 times the 32,000 tokens of the current 'Gemini 1.0 Pro', about 8 times the 128,000 of OpenAI's 'GPT-4 Turbo', and 5 times the 200,000 of Anthropic's 'Claude 2.1', the previous largest.
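The size comparisons above can be checked with a few lines of Python. The token counts are the figures cited in this article; the helper `window_ratios` and the constant names are hypothetical, used only for this illustration.

```python
# Context-window sizes (in tokens) as cited in the article.
GEMINI_15_PRO_WINDOW = 1_000_000
PREVIOUS_WINDOWS = {
    "Gemini 1.0 Pro": 32_000,
    "GPT-4 Turbo": 128_000,
    "Claude 2.1": 200_000,
}


def window_ratios(new_window: int, previous: dict[str, int]) -> dict[str, float]:
    """Return how many times larger new_window is than each earlier window."""
    return {name: new_window / size for name, size in previous.items()}


if __name__ == "__main__":
    for name, ratio in window_ratios(GEMINI_15_PRO_WINDOW, PREVIOUS_WINDOWS).items():
        print(f"{name}: {ratio:.1f}x")
```

The ratios come out to roughly 31x, 8x and 5x, which is why "five times" only holds against the largest previous window, not against every competitor.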

This means the amount of data the model can take in at once has grown enormously. According to Google, a single input can hold more than 700,000 words of text, more than 30,000 lines of code, 1 hour of video, or 11 hours of audio.

Google emphasized that this will greatly broaden the range of possible uses. In a demo video, it showed Gemini 1.5 Pro summarizing the plot of a 44-minute movie in 55 seconds.

Demo of Google's Gemini 1.5 Pro summarizing the plot of a 44-minute movie (Photo: Google)

Performance has also improved markedly over the previous version. Gemini 1.5 Pro performs on par with 'Gemini 1.0 Ultra', Google's top-tier model released last week, and outperformed the current Gemini 1.0 Pro on 87% of benchmark tests.

It also runs faster and more efficiently thanks to a 'Mixture of Experts (MoE)' architecture. MoE splits a large language model (LLM) into smaller specialist subnetworks ('experts'), and a routing network activates only the few experts best suited to each input instead of the whole model. As a result, computing cost and response time are much lower than when running the entire large model.
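To make the routing idea concrete, here is a minimal Python sketch of generic top-k gating, the scheme used in published sparse MoE models. Google has not disclosed how Gemini 1.5 Pro's router works, so this is an illustration under assumptions: the function `moe_forward`, the toy experts (each simply scales its input), and the linear router weights are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector,
    experts: List[Callable[[Vector], Vector]],
    router_weights: List[Vector],
    k: int = 2,
) -> Tuple[Vector, List[int]]:
    """Hypothetical top-k MoE layer: returns the output and the experts that ran."""
    # Router: one score per expert (dot product of the input with a routing vector).
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    # Keep only the k highest-scoring experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Gate weights are a softmax over the selected experts' scores only.
    gates = softmax([scores[i] for i in top])
    out = [0.0] * len(x)  # assumes every expert preserves the input dimension
    for g, i in zip(gates, top):
        y = experts[i](x)
        out = [o + g * y_j for o, y_j in zip(out, y)]
    return out, top


if __name__ == "__main__":
    # Four toy experts that scale the input by 1, 2, 3 and 4.
    experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
    router = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
    out, used = moe_forward([1.0, 0.0], experts, router, k=2)
    print("experts used:", used, "output:", out)
```

With four toy experts and `k=2`, only two experts are evaluated per input. In a real model each expert would be a large feed-forward block, so skipping the unselected ones is where the cost and latency savings come from.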

Google CEO Sundar Pichai also threw his weight behind the launch. "This launch is one of the breakthroughs that will revitalize Google's business," he said. "This model can help people significantly expand the questions they ask."

"With the larger context window, filmmakers can ask the AI to judge their films the way a critic would, and the range of uses becomes almost limitless," he said.

Gemini 1.5 Pro is available as a preview through 'Google AI Studio', Google's development tool for developers, and 'Vertex AI', a platform that lets companies deploy AI models.

When the official version launches, Google plans to offer a standard context window of 128,000 tokens, with larger windows available depending on the pricing option purchased.

Meanwhile, Google's announcement came two days after OpenAI announced a new version of ChatGPT with a 'long-term memory' feature, and one day after reports that OpenAI was developing a search tool to compete with Google.

The two companies' rivalry in LMMs drew attention even before Gemini launched in December last year. At the time, there were even reports that Google had delayed the release because it was unsure whether Gemini outperformed OpenAI's 'GPT-4V' LMM.

This upgrade, however, gives Google a relatively clear advantage, one that CEO Pichai was quick to emphasize. As the competition between the two companies intensifies, AI technology as a whole is advancing rapidly.

Reporter Park Chan cpark@aitimes.com
