Google has demonstrated Imagen, an artificial intelligence system that can create images based on text input. The idea is that users can enter any descriptive text and the AI will turn it into an image. Imagen was created by the Brain Team at Google Research, and the company says it offers "an unprecedented degree of photorealism and a deep level of language understanding."
This isn't the first time we've seen AI models like this. OpenAI's DALL-E generated headlines as well as images because of how skillfully it can turn text into visuals. Google's version, however, aims to produce more realistic images.
To evaluate Imagen against other text-to-image models (including DALL-E 2, VQ-GAN + CLIP, and Latent Diffusion Models), the researchers created a benchmark called . It consists of 200 text prompts that were fed into each model, and human reviewers were then asked to rate the resulting images. According to Google, the reviewers "prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment."
It's worth noting that the examples shown on the website are curated. As such, they may be the best images the model has created and may not accurately reflect most of the images it generates.
Like DALL-E, Imagen is not available to the public, and Google doesn't believe it is suitable for general use yet, for several reasons. For one thing, text-to-image models are typically trained on large datasets that are scraped from the web and are not curated, which introduces a number of problems.
"While this approach has enabled rapid algorithmic advances in recent years, datasets of this nature often reflect social stereotypes, oppressive viewpoints, and derogatory, or otherwise harmful, associations to marginalized identity groups," the researchers wrote. "While a subset of our training data was filtered to remove noise and undesirable content, such as pornographic imagery and toxic language, we also utilized LAION-400M dataset which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes."
As a result, they said, Imagen has inherited the "social biases and limitations of large language models" and may depict "harmful stereotypes and representation." The team said preliminary findings suggest that the AI encodes social biases, including a tendency to create images of people with lighter skin tones and to place them in certain gender-stereotypical roles. The researchers also note that Imagen could be misused if it were made publicly available as is.
However, the team may eventually allow the public to enter text into a version of the model to create their own images. "In future work we will explore a framework for responsible externalization that balances the value of external auditing with the risks of unrestricted open-access," the researchers wrote.
You can try Imagen on a limited basis, though. In , you can build a description from a set of predefined phrases. Users can choose whether the image should be a portrait or an oil painting, the type of animal shown, the clothes it wears, the action it performs and the setting. So if you've ever wanted to see Imagen's interpretation of an oil painting of a mysterious panda wearing sunglasses and a black leather jacket while surfing at the beach, this is your chance.
All products recommended by Engadget are handpicked by our editorial team, independently of the parent company. Some of our stories include affiliate links. If you buy something through one of these links, we may earn an affiliate commission.