Stable Diffusion XL opens its public beta and reaches Midjourney level: it can draw hands and render text, and no longer needs long prompts

Since Midjourney released v5, its generated images have improved significantly in realism and in details such as fingers, and it has also made progress in prompt accuracy, aesthetic diversity, and language understanding.

In contrast, although Stable Diffusion is free and open source, users have to write a long prompt every time and rely on repeated generation attempts ("card draws") to get a high-quality image.

Stability AI has announced that Stable Diffusion XL, which is still under development, is open for public testing and can currently be tried for free on the Clipdrop platform.

Trial link: https://clipdrop.co/stable-diffusion

Emad Mostaque, founder and CEO of Stability AI, said that the model is still in training and will be open-sourced once the parameters are stable; SD-XL will perform better on image details such as a “handshake”, and it is almost completely controllable.

Stable Diffusion XL is not the name of the final release, and it is not v3, because the architecture of SD-XL is very similar to that of the SD-v2 series.

Minimalistic home gym with rubber flooring, wall-mounted TV, weight bench, medicine ball, dumbbells, yoga mats, high-tech equipment, high detail, organized and efficient.

The following are some officially released SD-XL examples; the image quality is clearly already very good.

But sometimes less is not more. Some netizens think that SD-XL imposes too many rules in order to get rid of “bad taste”, leaving less and less room for customization, which does not match most people's preferences. Stable Diffusion v1.5 is currently still the most popular base model in the community.

Netizens expressed hope that the new version of SD will be compatible with the embeddings, hypernetworks, and LoRA models of SD 2.1, since retraining everything from scratch would be too painful.

Some netizens also believe that SD-XL performs about as well as the models shared by users on the Civitai website, and that the new model's results are not particularly impressive, only average.

SD-XL: Open source version of Midjourney

Stability AI has not disclosed many specifics about the Stable Diffusion XL model. At present, it is only known that its structure is similar to that of the v2 model, but at a larger scale with more parameters.

SD-v2.1 has 900 million parameters, and SD-XL has about 2.3 billion. Emad said that a smaller distilled version may also be released alongside the official version.
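
To put the two parameter counts in perspective, the short sketch below computes the scale factor and a rough size for the raw weights. It assumes 16-bit weights (2 bytes per parameter), which is an assumption made here for illustration and not something the article states; actual memory use during generation would be higher.

```python
# Parameter counts as quoted in the article (approximate figures).
SD_V21_PARAMS = 900_000_000
SD_XL_PARAMS = 2_300_000_000

# Assumption for illustration only: weights stored as 16-bit floats.
BYTES_PER_PARAM_FP16 = 2


def weight_size_gb(num_params, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Approximate size of the raw weights in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


scale_factor = SD_XL_PARAMS / SD_V21_PARAMS

print(f"SD-XL has about {scale_factor:.1f}x the parameters of SD-v2.1")
print(f"SD-v2.1 weights: ~{weight_size_gb(SD_V21_PARAMS):.1f} GB at 16-bit")
print(f"SD-XL weights:   ~{weight_size_gb(SD_XL_PARAMS):.1f} GB at 16-bit")
```

Under these assumptions SD-XL is roughly two and a half times the size of SD-v2.1, which is consistent with the mention of a possible smaller distilled version.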

The SD-XL improvements over previous versions are as follows:

High-quality images can be generated from short, descriptive prompts
Generated images match the prompt more closely
Human anatomy in the images is more plausible
SD-XL produces more aesthetically pleasing images than v2.1 and, to a lesser extent, v1.5
Negative prompts are optional
Generated portraits are more realistic
Text in images is clearer

Note that SD-XL may not be compatible with plugins built for previous versions.

Legible text

The v1 series and v2.1 Stable Diffusion models are unable to generate readable text in images.

While the text generated by SD-XL isn’t always accurate, it’s a huge improvement.

Photo of a woman sitting in a restaurant holding a menu that says “Menu”

Photo of a man holding a sign that says “Stable Diffusion”

a young female holding a sign that says “Stable Diffusion”, highlights in hair, sitting outside restaurant, brown eyes, wearing a dress, side light

Better body structure

Stable Diffusion has always struggled with human anatomy: extra legs and missing arms are all too common. Usually it is necessary to use the inpaint function to correct image details, or to use the OpenPose function of ControlNet to copy the pose of the human body from a reference image.

For example, when SD-v1.5 generates yoga images, the human bodies are often distorted.

Photo of a woman in yoga outfit, triangle pose, beach in evening, rim lighting

Although the images generated by SD-XL are not perfect, human poses have improved significantly.

More aesthetic

For example, given the same house theme, SD-XL generates more symmetrical and visually more pleasing photos.

SD-XL also shows a notable improvement in portrait photos.

photo shot of a woman

An image that better fits the prompt

SD-XL can better understand input prompts and generate more accurate images.

Take duotone (two-color) as an example: SD-v1.5 only generates black-and-white images, while SD-XL can generate duotone images in a variety of colors.

The ability to understand prompts has improved compared to the v1 model.

duotone portrait of a woman

Because SD-XL belongs to the v2 series of models, its text model is larger and understands prompts better than that of the v1 model.

In the example below, the v1.5 model consistently fails to understand that the image has two subjects (robot and human), whereas the SD-XL model generates a sensible image (although the robot is not big enough).

big robot friend sitting next to a human, ghost in the shell style, anime wallpaper

a young man, highlights in hair, brown eyes, in white shirt and blue jean on a beach with a volcano in background

Art style

In terms of art style, SD-XL has not improved significantly; compared with previous versions, it has its own strengths and weaknesses.

For example, the two models render Edward Hopper-style images from different viewpoints.

New York city by Edward Hopper

In the style of Leonid Afremov, SD-v1.5 is more accurate; SD-XL lacks his unmistakable, brightly colored brushstrokes.

New York city by Leonid Afremov

In the William-Adolphe Bouguereau style, SD-v1.5 and SD-XL produce broadly similar content, with SD-XL coming closer to Bouguereau's classic academic paintings and showing more facial detail.

Portrait of beautiful woman by William-Adolphe Bouguereau

Style change problem

After adding some irrelevant keywords, the style of the model may suddenly change.

For example, first generate a photo-style image.

a young man, highlights in hair, brown eyes, in white shirt and blue jean on a beach with a volcano in background

After adding a yellow scarf, the image style becomes a cartoon style.

a young man, highlights in hair, brown eyes, wearing a yellow scarf, in white shirt and blue jean on a beach with a volcano in background
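
When comparing two runs like these, it can help to confirm exactly which keyword differs between the prompts. The sketch below assumes prompts are written as comma-separated keyword lists; the helper names are hypothetical and are not part of any Stable Diffusion tool.

```python
def split_keywords(prompt):
    """Split a comma-separated prompt into trimmed, non-empty keywords."""
    return [part.strip() for part in prompt.split(",") if part.strip()]


def added_keywords(base_prompt, new_prompt):
    """Return keywords that appear in new_prompt but not in base_prompt."""
    base = set(split_keywords(base_prompt))
    return [kw for kw in split_keywords(new_prompt) if kw not in base]


# The two prompts from this section.
base_prompt = (
    "a young man, highlights in hair, brown eyes, "
    "in white shirt and blue jean on a beach with a volcano in background"
)
new_prompt = (
    "a young man, highlights in hair, brown eyes, wearing a yellow scarf, "
    "in white shirt and blue jean on a beach with a volcano in background"
)

print(added_keywords(base_prompt, new_prompt))  # ['wearing a yellow scarf']
```

Here the only difference is the scarf keyword, so the shift from photo style to cartoon style cannot be attributed to any other change in the prompt.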

This glitch may be specific to the preview version and may no longer occur after the official release.
