Google's ambitious entry into the realm of generative artificial intelligence (AI) comes in the form of Gemini, a recently launched platform developed by Google's AI research labs, DeepMind and Google Research. This guide aims to provide an objective overview of Gemini, exploring its features, capabilities, and how it compares to existing competitors.
Gemini is a family of three models: Gemini Ultra (the flagship model), Gemini Pro (a lighter version), and Gemini Nano (a smaller model for mobile devices). Unlike previous Google models, Gemini is "natively multimodal," capable of processing not only text but also audio, images, and video. This sets it apart from models like Google's LaMDA, which is text-centric.
Google's branding does not make this clear, but Bard is not Gemini itself; it is an interface for accessing certain Gemini models. To draw a parallel with OpenAI's products, Bard is akin to ChatGPT, while Gemini corresponds to the underlying language model. Gemini is also separate from Imagen-2, Google's text-to-image model, a distinction that has confused some users.
Gemini's multimodal nature implies a wide range of potential applications, from transcribing speech to captioning images and videos to generating artwork. However, real-world applications are still limited, and Google's track record, including a controversial video showcasing Gemini, has left some observers skeptical about what the platform can actually do today.
Gemini Ultra: Primarily showcased through Google-led demos, Ultra is set to launch more broadly later this year. It promises applications in physics homework assistance, problem-solving, and information extraction from scientific papers.
Gemini Pro: Publicly available, Pro offers improvements in reasoning and understanding over its predecessor, LaMDA. However, users have reported challenges with math problems and factual errors.
Gemini Nano: A smaller, efficient version designed for mobile devices, currently powering features like summarization in Recorder and Smart Reply in Gboard on the Pixel 8 Pro.
While Google claims Gemini is superior on benchmarks, practical use and early impressions suggest mixed results. Gemini's benchmark performance is only marginally better than GPT-4's, and users have noted issues with basic facts, translations, and coding suggestions.
Gemini Pro is currently free to use in Bard, in AI Studio, and in preview on Vertex AI. Once it exits preview in Vertex AI, it will cost $0.0025 per character for input and $0.00005 per character for output.
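To make the per-character billing concrete, here is a minimal sketch of the arithmetic. The two rates are simply the figures quoted above and should be checked against Google's current price list before being relied on; the function name `estimate_cost` and the constants are hypothetical and not part of any Google API.

```python
from decimal import Decimal

# Rates as quoted in the text above (USD per character). These are
# illustrative inputs, not authoritative pricing.
INPUT_RATE = Decimal("0.0025")
OUTPUT_RATE = Decimal("0.00005")


def estimate_cost(input_chars: int, output_chars: int) -> Decimal:
    """Estimate the USD cost of one request under per-character billing."""
    if input_chars < 0 or output_chars < 0:
        raise ValueError("character counts must be non-negative")
    # Input and output are billed separately, each at its own rate.
    return input_chars * INPUT_RATE + output_chars * OUTPUT_RATE


if __name__ == "__main__":
    # A 1,000-character prompt that produces a 2,000-character reply.
    print(estimate_cost(1000, 2000))
```

At the quoted rates, a 1,000-character prompt with a 2,000-character reply comes to $2.60, almost all of it from the input side. `Decimal` is used instead of floats so that very small per-character rates do not pick up rounding noise.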
Gemini Pro is accessible in Bard and Vertex AI, with AI Studio providing developers with customization options. Gemini Nano is currently featured in Pixel 8 Pro and will expand to other devices. Duet AI for Developers and other Google development tools will incorporate Gemini in early 2024.