May 25, 2022

Early last year, OpenAI showed off a remarkable new AI model called DALL-E (a portmanteau of WALL-E and Dalí) that can draw almost anything, in almost any style. But the results were rarely something you would want to hang on the wall. Now DALL-E 2 is out, and it does what its predecessor did so much better that the difference is genuinely startling. The new capabilities, however, come with new restrictions to prevent abuse.

DALL-E was detailed in our original post, but the bottom line is that it can take fairly complex prompts, like “a bear riding a bike through a mall, next to a picture of a cat stealing from Liberty.” It attempts to match the prompt as closely as it can, and across hundreds of outputs, some are likely to meet the user’s standards.

DALL-E 2 does the same, turning a text prompt into a remarkably accurate image. But it has learned some new tricks.

First, it is simply better at the original task. The images coming out the other end of DALL-E 2 are several times larger and more detailed. And it is faster, despite producing more imagery, which means more options can be generated in the few seconds a user is willing to wait.
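DALL-E 2 had no public API at the time of writing, but as a rough, hypothetical sketch of the workflow described above (one prompt in, several candidate images out at a chosen size), here is what a call could look like if the eventual interface resembles OpenAI’s Images API in the openai Python package; the prompt, model name and parameters are illustrative assumptions, not a documented DALL-E 2 integration.

    # Illustrative sketch only: DALL-E 2 had no public API when this was written.
    # Assumes an interface shaped like OpenAI's later Images API (`openai` package);
    # the client reads its key from the OPENAI_API_KEY environment variable.
    from openai import OpenAI

    client = OpenAI()

    result = client.images.generate(
        model="dall-e-2",
        prompt="a bear riding a bike through a mall",
        n=4,               # several candidates generated from one request
        size="1024x1024",  # DALL-E 2 outputs are larger than the original model's
    )

    for image in result.data:
        print(image.url)   # each URL points to one candidate image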

A sea otter in the style of “Girl with a Pearl Earring.” Not bad.

Part of this improvement comes from switching to a diffusion model, an approach to image generation that starts with pure noise and refines it over time, repeatedly nudging it a little closer to the requested image until no noise is left. But it is also simply a smaller, more efficient model, according to some of the engineers who worked on it.
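To make the “start from noise and refine it” idea concrete, here is a toy Python sketch of a reverse-diffusion loop. It is not OpenAI’s code; the denoiser function is a hypothetical stand-in for the trained network, which in a real system predicts and removes a little noise at each step, conditioned on the text prompt.

    # Toy sketch of reverse diffusion, for intuition only (not OpenAI's implementation).
    import numpy as np

    def denoiser(noisy_image: np.ndarray, prompt: str, step: int) -> np.ndarray:
        """Hypothetical stand-in for the trained network.

        A real model predicts the noise present at this step (given the prompt)
        so it can be subtracted; this stub just damps the image slightly.
        """
        return noisy_image * 0.99

    def generate(prompt: str, size: int = 64, steps: int = 1000) -> np.ndarray:
        # Start from pure Gaussian noise...
        image = np.random.randn(size, size, 3)
        # ...and repeatedly nudge it toward the requested image, step by step,
        # until (in a trained model) essentially no noise remains.
        for step in reversed(range(steps)):
            image = denoiser(image, prompt, step)
        return image

    sample = generate("a sea otter in the style of Girl with a Pearl Earring")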

Second, DALL-E 2 can do what OpenAI calls “inpainting,” which is essentially a smart replacement of a selected area of an image. Say you have a photo of your home, but there are dirty dishes on the table. Just select that area and describe what you want instead: “an empty wooden table” or “a table with no dishes on it,” whichever makes sense. Within a few seconds, the model shows you several interpretations of that prompt, and you can pick whichever looks best.
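As a rough illustration of what such an inpainting request could look like, again assuming an interface shaped like OpenAI’s later Images API, a mask image marks the region to repaint and the prompt describes what should replace it; the file names here are hypothetical.

    # Illustrative sketch only; assumes the shape of OpenAI's later Images API.
    from openai import OpenAI

    client = OpenAI()

    result = client.images.edit(
        image=open("kitchen_table.png", "rb"),  # the original photo (hypothetical file)
        mask=open("table_mask.png", "rb"),      # transparent pixels mark the area to replace
        prompt="an empty wooden table",
        n=4,                                    # several interpretations to choose from
        size="1024x1024",
    )

    for image in result.data:
        print(image.url)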

You may be familiar with something similar in Photoshop, Content-Aware Fill. But that tool is best at filling a space with more of what already surrounds it, say if you want to replace a bird with clear sky and not bother with the clone stamp. DALL-E 2’s possibilities are far broader: it can invent new things, like a different kind of bird or a cloud, or, in the case of the table, a vase of flowers or a dropped ketchup bottle. It is not hard to come up with useful applications for this.

Notably, the model will include things like appropriate lighting and shadows, or choose the right materials, because it is aware of the rest of the scene. I use “aware” loosely here; no one, not even its creators, knows how DALL-E represents these concepts internally. But for this purpose, what matters is that the results behave as if the model makes sense of the scene.

Examples: teddy bears in the ukiyo-e style, and a specialty flower shop.

The third new feature is “variations,” which is aptly named: you give the system an example image, and it generates as many variations on it as you need, from very close approximations to impressionistic riffs. You can also give it a second image, and it will cross-pollinate the two, combining the most striking aspects of each. In the demo I was shown, street murals served as the source images, and the variations reflected the artist’s style for the most part, even if on inspection it was clear which was the original.
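A variations request, under the same assumption about the eventual API shape, would take only a source image and a count; no prompt is needed, since the model riffs on the example itself. The file name is hypothetical.

    # Illustrative sketch only; assumes the shape of OpenAI's later Images API.
    from openai import OpenAI

    client = OpenAI()

    result = client.images.create_variation(
        image=open("street_mural.png", "rb"),  # the example image to riff on (hypothetical file)
        n=6,                                   # how many variations to generate
        size="1024x1024",
    )

    for image in result.data:
        print(image.url)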

It is hard to overstate the quality of these images compared with other generators I have seen. While they almost always have the “tells” you expect from AI-generated imagery, those tells are less obvious, and the rest of the image is far better than the best produced by other systems.

Almost anything

I said at the top that DALL-E 2 can draw “almost anything,” though there is no real technical limitation preventing the model from attempting anything you can imagine. But OpenAI is aware of the risk of deepfakes and other misuse of AI-generated images, and it has therefore added some restrictions to its latest model.

DALL-E 2 is currently running on a hosted platform, an invitation-only test environment where developers can try it in a controlled way. That is partly so that every prompt submitted to the model can be checked against a content policy, which they say prohibits anything “non-G-rated.”

That means no: hate, threats, violence, self-harm, explicit or “shocking” imagery, illegal activities, deception (such as fake news), political figures or situations, medical or disease-related imagery, or general spam. In fact, much of this would be difficult to produce anyway, because violating images were excluded from the training set: DALL-E 2 can render a Shiba Inu in a beret, but it does not even know what a missile strike looks like.

In addition to evaluating prompts, every resulting image will (for now) be reviewed by human inspectors. That certainly does not scale, but the team told me it is part of the learning process. They are not entirely sure where the limits should be, so for now they are keeping the platform small and hosted in-house.

Over time, DALL-E 2 will likely evolve into an API that can be called like OpenAI’s other services, but the team said they want to be sure it makes sense before taking off the training wheels.

You can learn more about DALL-E 2 and try out some semi-interactive examples in the OpenAI blog post.
