In today’s announcement at Google I/O, Google Cloud unveiled two major features: a new foundation AI model for image generation and an AI model fine-tuning capability that uses human feedback.
Vertex AI, launched last year, provides businesses with a cloud-based platform and a set of tools for building, training and using AI models. Some of the tools are advanced and designed for developers, who use application programming interfaces to embed AI and machine learning models into their own apps, while others let less tech-savvy users take advantage of AI capabilities.
“We want to make it as easy as possible to interact with, to tune, to customize, to deploy these models, and make it very simple to do that,” Nenshad Bardoliwalla, director of Vertex AI at Google Cloud, told SiliconANGLE in an interview. The first part of Google’s solution is the Model Garden, which is where the new generative AI models reside. “We provide both Google models as well as open source models, so that there’s a wide variety of models, but they’re also ensured from an enterprise perspective in terms of enterprise governance and safety and the like,” he said.
The first new model, Imagen, is a text-to-image generative AI that lets customers create and customize images simply by typing natural language prompts describing what they want it to generate.
For example, a user could prompt the AI to create a fashion product for sale, such as a dress or a handbag. A prompt like “studio photo of a teal colored bucket bag,” perhaps for an app being designed in-house, would produce a set of potential handbags that a product team could use to come up with ideas.
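As a rough sketch of what that might look like from a developer’s side, the snippet below uses the Vertex AI Python SDK’s preview image generation model to request several candidate images for that prompt. The model version, project ID and output handling are assumptions for illustration, not a definitive recipe.

# Illustrative sketch: generating candidate product images with Imagen through
# the Vertex AI Python SDK (preview). Model version and project are assumptions.
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

vertexai.init(project="my-gcp-project", location="us-central1")  # hypothetical project

model = ImageGenerationModel.from_pretrained("imagegeneration@002")  # assumed model name
response = model.generate_images(
    prompt="studio photo of a teal colored bucket bag",
    number_of_images=4,  # several candidates for the product team to review
)
for i, image in enumerate(response.images):
    image.save(f"bucket_bag_{i}.png")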
From there, it’s possible to keep editing the image with prompts, asking the AI to change specific portions such as the color, the texture or the handles, or to add a tassel. It can even recolor the handles individually, or the tassel after it has been added. This works because the AI is capable of breaking apart the image, contextually understanding the object or scene and modifying it at a granular level.
This doesn’t even require an image generated by the AI: users can upload their own image and have the AI customize it by entering prompts.
If a user has a photo of a handbag they want the model to work from, it can be uploaded and the AI instructed to use it. That allows the image generation model to place real-world objects into scenes for numerous use cases, such as advertising. A branded handbag could be quickly added to a table amid other objects, or edited with a text prompt to change its handles, color or material and see what a variation might look like.
The Imagen model is also capable of captioning images provided to it. For example, if a user created or uploaded an image of a green handbag next to a bowl of fruit, the AI could generate a caption such as: “A green handbag sitting on a white table next to a bowl of fruit.” Captions are currently available in more than 300 languages.
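A hedged sketch of that captioning flow, assuming the preview vision models in the Vertex AI Python SDK, might look like the following; the class name, model version and file path are illustrative.

# Illustrative sketch: captioning an uploaded product image with the preview
# captioning model. Class name, model name and file path are assumptions.
from vertexai.preview.vision_models import Image, ImageCaptioningModel

caption_model = ImageCaptioningModel.from_pretrained("imagetext@001")  # assumed model name
image = Image.load_from_file("green_handbag.png")                      # hypothetical local file
captions = caption_model.get_captions(image=image, number_of_results=1, language="en")
print(captions[0])
# e.g. "A green handbag sitting on a white table next to a bowl of fruit."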
Google Cloud also unveiled two other new generative AI models: Codey, a text-to-code generator that allows customers to train a model on their own codebase to produce new code, and Chirp, a speech-to-text model for building voice-enabled applications with superior speech recognition.
Codey’s code generation model supports more than 20 coding languages, including Go, Java, JavaScript, Python and TypeScript. Developers will be able to access it inside their cloud development interface and have it suggest the next lines of code based on context, or have it generate code from a prompt. They can also enter a “code chat” and hold a conversation with a Codey bot that can assist with debugging, producing documentation, learning new concepts and answering code-related questions.
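The snippet below is a minimal sketch of both modes, assuming the preview Codey models in the Vertex AI Python SDK; the model names and prompts are illustrative.

# Illustrative sketch: Codey code completion and "code chat" through the
# Vertex AI Python SDK (preview). Model names are assumptions.
from vertexai.preview.language_models import CodeGenerationModel, CodeChatModel

codegen = CodeGenerationModel.from_pretrained("code-bison@001")  # assumed model name
completion = codegen.predict(
    prefix="# Python: a function that validates an email address\n",
    max_output_tokens=256,
)
print(completion.text)

chat = CodeChatModel.from_pretrained("codechat-bison@001").start_chat()  # assumed model name
reply = chat.send_message("Why does `items = []; print(items[0])` raise IndexError?")
print(reply.text)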
Trained on millions of hours of audio, Chirp is a speech model that supports more than 100 languages and is designed to help organizations engage with their customers more inclusively in their native languages. It makes it possible to build virtual contact center agents capable of understanding human speech, and it can also caption videos and offer voice assistance in apps.
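For developers, a transcription call against Chirp might look roughly like the sketch below, which assumes the Cloud Speech-to-Text v2 client with the “chirp” model; the project, region, recognizer and file name shown are placeholders.

# Illustrative sketch: transcribing a recording with the Chirp model through the
# Cloud Speech-to-Text v2 API. Project, region and file name are assumptions.
from google.api_core.client_options import ClientOptions
from google.cloud import speech_v2
from google.cloud.speech_v2.types import cloud_speech

# Chirp is typically served from a regional endpoint (assumed us-central1 here).
client = speech_v2.SpeechClient(
    client_options=ClientOptions(api_endpoint="us-central1-speech.googleapis.com")
)
config = cloud_speech.RecognitionConfig(
    auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
    language_codes=["en-US"],
    model="chirp",
)
with open("customer_call.wav", "rb") as audio_file:  # hypothetical recording
    audio_bytes = audio_file.read()

response = client.recognize(
    request=cloud_speech.RecognizeRequest(
        recognizer="projects/my-gcp-project/locations/us-central1/recognizers/_",
        config=config,
        content=audio_bytes,
    )
)
for result in response.results:
    print(result.alternatives[0].transcript)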
Developers will also gain access to the Embeddings API for text and images, which will allow them to build recommendation engines, text classifiers and other sophisticated apps.
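A minimal sketch of that workflow, assuming the preview text embedding model in the Vertex AI Python SDK (the model name and sample strings are illustrative):

# Illustrative sketch: text embeddings as the building block for a recommender
# or classifier. The preview model name is an assumption.
from vertexai.preview.language_models import TextEmbeddingModel

embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")
embeddings = embedding_model.get_embeddings([
    "teal bucket bag with leather handles",
    "green handbag next to a bowl of fruit",
])
vectors = [e.values for e in embeddings]  # dense float vectors for similarity search
print(len(vectors), len(vectors[0]))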
These new models, including Imagen, are available in Vertex AI’s Model Garden and Generative AI Studio, a managed environment where developers can interact with, train and tune AI models. From Model Garden, developers can also add AI models they have developed using Studio directly into their software using application programming interfaces. Both Model Garden and Generative AI Studio are available in public preview today.
Using human feedback to make AI models work better
Google Cloud is also launching a new fine-tuning mechanism for AI text generation, called “reinforcement learning from human feedback,” that uses human feedback to help build better models.
Fine-tuning an AI model is the process of customizing it to behave in a certain way. Ordinarily, this means adjusting it so that it produces more accurate information and performs well on a specific task, such as producing summaries, writing web code, composing music or finding insights in financial reports.
For example, a developer might want an AI that accurately summarizes documents, so they would adjust its parameters until it produces good summaries. That is often done by preparing additional task-specific data beforehand or adjusting the model directly to fit the desired behavior, and it can be quite time-consuming because it tends to involve a lot of experimentation.
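That conventional workflow might look roughly like the sketch below, which assumes the Vertex AI Python SDK’s preview tune_model method and a JSONL dataset of input/output pairs; the bucket path, step count and location parameters are assumptions and may differ in practice.

# Illustrative sketch: conventional supervised tuning on task-specific examples.
# Bucket path, step count and location parameters are assumptions.
from vertexai.preview.language_models import TextGenerationModel

base_model = TextGenerationModel.from_pretrained("text-bison@001")

# Each JSONL line pairs a document with the summary the model should learn:
# {"input_text": "Summarize: <document text>", "output_text": "<reference summary>"}
base_model.tune_model(
    training_data="gs://my-bucket/summarization_examples.jsonl",  # hypothetical bucket
    train_steps=100,
    tuning_job_location="europe-west4",
    tuned_model_location="us-central1",
)
# Once the tuning job completes, the tuned model can be deployed and queried
# like any other Vertex AI text model.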
With Vertex AI’s new mechanism, it’s now possible to tune the AI instead with a reward system based on human feedback, steering it toward the behavior the developer wants.
In the example above, the AI would be asked to summarize documents, and both the document and the summary would be shown to people. If a summary matched the readers’ expectations, they would give it a thumbs up; if not, a thumbs down. All of that feedback would then be fed back into a preferences file that could be used to inform the AI how to behave.
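To make the shape of that feedback concrete, here is a hypothetical sketch of how thumbs-up and thumbs-down judgments could be collected into such a preference file; the schema and helper code are illustrative rather than Vertex AI’s actual format.

# Illustrative sketch: collecting thumbs-up/down judgments into a preference
# dataset for RLHF. The JSONL schema shown here is hypothetical, not Vertex AI's.
import json

feedback = [
    {
        "prompt": "Summarize: <document text>",
        "candidate_a": "<tight two-sentence summary>",
        "candidate_b": "<rambling five-paragraph summary>",
        "human_choice": "a",  # the reviewer gave candidate A the thumbs up
    },
]

with open("preferences.jsonl", "w") as f:
    for record in feedback:
        f.write(json.dumps(record) + "\n")

# A reward model trained on these preferences scores new outputs, and
# reinforcement learning nudges the base model toward higher-scoring summaries.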
“All the human has to do is give a thumbs up saying ‘yup’ this is the summary I like, or ‘nope’ this is not the summary I like and then we feed that in the reinforcement learning against the base model,” said Bardoliwalla. “We boost the signal of the thumbs up for the tighter summaries, and we deemphasize the summaries for the longer summaries so the model learns what our preferences are.”
In this manner, it could be used to produce better summaries of documents, create natural-sounding blog posts from content or even generate tweets. The same sort of human-led preferences could also be used to instill a company’s values into an AI, using feedback from employees to help reduce toxicity or other dangerous outputs and adding an extra guardrail to AI responses.