Midjourney, OpenAI’s Dall-E 2 and Stability AI’s Stable Diffusion often struggle to render text.
(Bloomberg) — A new artificial intelligence startup is challenging more established rivals by solving a vexing problem: letting anyone create an image of a cat wearing a t-shirt with a clever slogan you can actually read.
Like its peers, Ideogram can whip up startlingly realistic images from short text prompts in a matter of seconds. But Ideogram, a Toronto-based startup that launched in August, can also go a step further and render text within those images. It can generate an image of a protester holding a legible sign or a cute cat in a t-shirt that clearly says, “Ask me about my AI startup.” Resolving what may seem like a niche technical issue has broad implications for the industry. When asked to render words in pictures, other popular AI image generators such as Midjourney, OpenAI’s Dall-E 2 and Stability AI’s Stable Diffusion often show nonsense.
Ideogram’s launch has the potential to shake up an increasingly crowded field of AI image generators — and also points to the next phase of this rapidly improving technology. A new version of Dall-E, set to be rolled out by OpenAI in October and currently available through Microsoft’s Bing Image Creator, appears similarly capable. OpenAI released an image showing an avocado patient without a pit saying to a spoon therapist, “I just feel so empty inside.” Stability AI can also represent text in images with software called DeepFloyd IF, but it is not easily accessible to most people.
Ideogram’s team includes several former Google employees who helped create the tech giant’s image-generation service, Imagen. The startup, which raised $16.5 million in seed funding in a round led by Andreessen Horowitz and Index Ventures, isn’t just focused on churning out images with text. Ideogram is also trying to make it easier for anyone to use AI to create compelling images, without typing the kind of complicated descriptions that spawned the phrase “prompt engineering.”
“Our goal is to make it as easy and as simple as possible for people to be able to engage in creative expression,” chief executive officer and co-founder Mohammad Norouzi told Bloomberg News. Norouzi said 1.1 million people have signed up for the free service since its launch, generating more than 80 million images so far (new users may have to sign up for a waitlist). Users enter their commands for the software on Ideogram’s website, and the service will respond by generating four images at a time.
With its features, Ideogram could eventually compete for business from marketers and creative professionals. However, by producing text and making it easier to churn out all kinds of pictures with AI, the startup also risks being used to spread misinformation, further undermining the credibility of images online.
It only took a moment to generate a reasonably realistic depiction of Albert Einstein holding up a sign that said, “Ask me anything” — similar to the kind of image people post as proof of their identity when conducting a question-and-answer session on Reddit. It’s not hard to imagine doing the same with a living public figure.
“I think that’s very reasonable to worry about,” said Nathan Lambert, a research scientist at Hugging Face, Inc., who writes regularly about AI studies. Midjourney, for example, has previously been shown to be easily tricked into producing misinformation despite safeguards added to prevent it.
Norouzi said the potential for bad behavior is a “serious concern” for Ideogram. He doesn’t want its AI to be used to spread election-related disinformation, for instance, but like many in the tech industry, he also argues free speech is important. Ideogram’s small team tries to stop the spread of offensive content by automatically filtering certain images it produces (those its software deems inappropriate) and instead showing a picture of a cat holding up a sign that says “maybe not safe.”
All images users create with Ideogram, and all prompts they submit, are currently public. The company hopes this choice will help build a community around the product and encourage decent behavior. Even without a search function, however, it’s not hard to find images that skirt the line between family-friendly and NSFW, such as depictions of female celebrities covered in “body paint.”
Ideogram users mostly appear to be harnessing its ability to yield text for creative purposes. There are posters and T-shirt designs, holiday greetings, faux needlepoint and tarot cards. The demand is so high that users are frequently forced to wait 30 seconds or longer between image generations as the service struggles to keep up (an issue that has prompted some users to create images of protesters holding up signs that say things like, “YOU NEED MORE SERVERS”).
“They have figured out how to truly unleash infinite, high-quality creativity from people who would never have considered themselves artists,” said Anjney Midha, a general partner at Andreessen Horowitz who invested in Ideogram before joining the venture capital firm.
Producing crisp-looking images that include legible text has long been a challenge for other popular AI image generators. Anima Anandkumar, a professor at the California Institute of Technology, explained it as an issue of “garbage in, garbage out” — a phrase often used to reference the idea that bad training data tends to yield bad results.
Before a generative image system can respond to a written prompt, it has to be fed mounds of images — including pictures of tons of different objects — and corresponding written captions. Pictures of apples or flowers might be included in different lighting and at different angles to help AI determine those concepts, Anandkumar noted. But text within those images may be of varying quality, incomplete or poorly lit, and there isn’t typically a ton of it in the images used to develop these tools. That leads to a poor grasp of the concept of what text is.
“This could be fixed with getting better data — getting data that’s text-centric,” Anandkumar said.
Norouzi didn’t explain exactly how Ideogram is able to produce text better than competitors. In general, Norouzi noted, generative AI tools that can take in written prompts and spit out text or images have improved as the scale of the model and its training data has increased. He said Ideogram instructs its model to pay attention to details such as quotation marks that are included in prompts. Norouzi would not detail the sources of its training data but said the company tried to include images that have text in them and has its own internal datasets.
“Our model tries to create text in the context of other objects and figure out its own typography — how to fit text to the constraints of the canvas,” Norouzi said.
This can be seen in some of the images users have made with Ideogram, ranging from an illuminated light bulb with “great idea” rendered inside it in neon letters to a cake covered in candles with the message “Happy birthday Andres” on the sides in a fondant-like font. For now, the text in these images is limited mainly to English, but Norouzi hopes to be able to generate text in numerous languages and alphabets over time.
On the company’s Discord channel, where Norouzi often chats with users, he has said the startup intends to let people generate images privately. Its text capability may also eventually help the company make money from businesses that want to use it to design logos and other marketing products.
Norouzi said the startup is planning to roll out a paid offering at some point that will let people use its service more quickly — and perhaps help it shoulder the high computing costs of building and operating AI.
“It’s not something we want to do quickly. We just got started,” Norouzi said. “But because of the economics of how things work in the AI space, that is inevitable.”
©2023 Bloomberg L.P.