Since Midjourney released v5, its generated images have become significantly more realistic, with better details such as fingers, and it has also improved in prompt accuracy, aesthetic diversity, and language understanding.
In contrast, although Stable Diffusion is free and open source, users have to write a long prompt every time and rely on repeated generation attempts ("rerolls") to get high-quality images.
Stability AI previously announced that Stable Diffusion XL, which is still under development, is open for public testing and can currently be tried for free on the Clipdrop platform.
Trial link: https://clipdrop.co/stable-diffusion
Emad Mostaque, founder and CEO of Stability AI, said that the model is still in training and will be open-sourced once the parameters are stable; SD-XL will perform better on image details such as “handshake”, and it is almost completely controllable.
Stable Diffusion XL is not the name of the final release, and it is not v3, because the SD-XL architecture is very similar to that of the SD-v2 series.
Prompt: Minimalistic home gym with rubber flooring, wall-mounted TV, weight bench, medicine ball, dumbbells, yoga mats, high-tech equipment, high detail, organized and efficient.
The following are some officially released SD-XL examples; the image quality is already very good.
But sometimes less is not more. Some users think that SD-XL imposes too many rules in order to get rid of “bad taste”, leaving less and less room for customization, which does not suit most people's preferences. Stable Diffusion v1.5 is currently still the most popular base model in the community.
Users also hope that the new version of SD will be compatible with the embeddings, hypernetworks, and LoRA models of SD 2.1, since retraining them from scratch would be painful.
Others believe that SD-XL performs about as well as the models users share on the Civitai website, and that the new model is not particularly impressive, merely average.
SD-XL: Open source version of Midjourney
Stability AI has not disclosed much specific information about the Stable Diffusion XL model. So far it is only known that its structure is similar to the v2 models, but at a larger scale and with more parameters.
SD-v2.1 has 900 million parameters, and SD-XL has about 2.3 billion. Emad said that the official release may also include a smaller distilled version.
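As a rough back-of-the-envelope check on these figures, the sketch below computes the scale factor between the two models and the approximate size of the raw weights. The 16-bit weight storage is an assumption made for illustration, not something Stability AI has stated, and the function names are hypothetical.

```python
# Parameter counts quoted in the article.
SD_V21_PARAMS = 900_000_000    # SD-v2.1
SD_XL_PARAMS = 2_300_000_000   # SD-XL (approximate)

# Assumption (not from the article): weights are stored as 16-bit floats.
BYTES_PER_PARAM_FP16 = 2


def scale_factor(new_params: int, old_params: int) -> float:
    """How many times larger the new model is, by parameter count."""
    return new_params / old_params


def weight_size_gb(params: int, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Approximate size of the raw weights in gigabytes (10**9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    ratio = scale_factor(SD_XL_PARAMS, SD_V21_PARAMS)
    print(f"SD-XL is about {ratio:.1f}x the size of SD-v2.1")
    print(f"SD-v2.1 weights: ~{weight_size_gb(SD_V21_PARAMS):.1f} GB at 16 bits per parameter")
    print(f"SD-XL weights:   ~{weight_size_gb(SD_XL_PARAMS):.1f} GB at 16 bits per parameter")
```

Under these assumptions SD-XL is roughly 2.6 times larger, which helps explain the interest in a smaller distilled version. Actual memory use during generation would be higher than the weight size alone.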
The SD-XL improvements over previous versions are as follows:
- Generates high-quality images from short, descriptive prompts
- Generates images that follow the prompt more closely
- Renders human anatomy more plausibly
- Compared to v2.1 and v1.5 (to a lesser extent), SD-XL produces images that are more in line with the Fox aesthetic
- Negative prompts are optional
- Generated portraits are more realistic
- Text in images is clearer
Note that SD-XL may not be compatible with plugins built for previous versions.
Legible text
The v1 series and v2.1 Stable Diffusion models cannot generate readable text in images.
While the text generated by SD-XL isn’t always accurate, it’s a huge improvement.
Prompt: Photo of a woman sitting in a restaurant holding a menu that says “Menu”
Prompt: Photo of a man holding a sign that says “Stable Diffusion”
Prompt: a young female holding a sign that says “Stable Diffusion”,highlights in hair, sitting outside restaurant, brown eyes, wearing a dress, side light
Better body structure
Stable Diffusion has always struggled with human anatomy: extra legs or missing arms are all too common. Usually the image details have to be corrected afterwards with the inpaint function, or the pose has to be copied from a reference image with the OpenPose function of ControlNet.
For example, when SD-v1.5 generates yoga images, the bodies are often distorted.
Prompt: Photo of a woman in yoga outfit, triangle pose, beach in evening, rim lighting
Although the images generated by SD-XL are not perfect, human poses are significantly improved.
More aesthetic
For example, given the same house theme, SD-XL generates more symmetrical and visually pleasing photos.
SD-XL is also notably better at portrait photos.
Prompt: photo shot of a woman
An image that better fits the prompt
SD-XL can better understand input prompts and generate more accurate images.
Take duotone (two-color) images as an example: SD-v1.5 only generates black-and-white images, while SD-XL can generate duotone images in a range of colors.
The ability to understand prompts has improved compared to the v1 model.
Prompt: duotone portrait of a woman
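To make the term concrete: a duotone image maps each pixel's brightness onto a gradient between two colors, so a black-and-white image is just the special case where those two colors are black and white. The sketch below is a plain image-processing illustration of that idea, not a description of how either model works internally; the `luminance` and `duotone` helpers and the example colors are assumptions made for this illustration.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def luminance(pixel: RGB) -> float:
    """Perceived brightness of an RGB pixel, normalized to the range 0..1.

    Uses the common Rec. 601 luma weights.
    """
    r, g, b = pixel
    return (0.299 * r + 0.587 * g + 0.114 * b) / 255.0


def duotone(pixel: RGB, shadow: RGB, highlight: RGB) -> RGB:
    """Map a pixel onto the gradient between a shadow and a highlight color."""
    t = luminance(pixel)
    # Linear interpolation per channel: dark pixels move toward `shadow`,
    # bright pixels toward `highlight`.
    channels = [round(s + (h - s) * t) for s, h in zip(shadow, highlight)]
    return (channels[0], channels[1], channels[2])


if __name__ == "__main__":
    navy = (0, 0, 128)       # hypothetical shadow color
    yellow = (255, 220, 0)   # hypothetical highlight color
    for pixel in [(0, 0, 0), (128, 128, 128), (255, 255, 255)]:
        print(pixel, "->", duotone(pixel, navy, yellow))
```

Passing black as the shadow and white as the highlight reduces this to an ordinary grayscale conversion, which is the only kind of output the article attributes to SD-v1.5 for this prompt.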
Because SD-XL belongs to the v2 series of models, its text model is larger and understands prompts better than the v1 models.
In the example below, the v1.5 model consistently fails to render the two subjects in the image (robot and human), whereas the SD-XL model generates a sensible image (although the robot is not big enough).
Prompt: big robot friend sitting next to a human, ghost in the shell style, anime wallpaper
Prompt: a young man, highlights in hair, brown eyes, in white shirt and blue jean on a beach with a volcano in background
Art style
In terms of art style, SD-XL has not improved significantly; it and the previous versions each have their own strengths and weaknesses.
For example, the two models render Edward Hopper-style images from different angles.
Prompt: New York city by Edward Hopper
For the style of Leonid Afremov, SD-v1.5 is more accurate; SD-XL lacks his unmistakable, colorful palette brushstrokes.
Prompt: New York city by Leonid Afremov
For the William-Adolphe Bouguereau style, SD-v1.5 and SD-XL produce somewhat similar content, with SD-XL being closer to Bouguereau's classic academic paintings and showing more facial detail.
Prompt: Portrait of beautiful woman by William-Adolphe Bouguereau
Style change problem
Adding a seemingly irrelevant keyword to the prompt can suddenly change the style of the generated image.
For example, first generate a photo-style image.
Prompt: a young man, highlights in hair, brown eyes, in white shirt and blue jean on a beach with a volcano in background
After a yellow scarf is added to the prompt, the image switches to a cartoon style.
Prompt: a young man, highlights in hair, brown eyes, wearing a yellow scarf, in white shirt and blue jean on a beach with a volcano in background
This glitch may be an issue with the preview version that will not persist after the official release.
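When a style shift like this appears, one simple way to pin down the trigger is to diff the two prompts keyword by keyword. The sketch below assumes prompts are comma-separated lists of keywords, as in the examples above; `split_prompt` and `added_keywords` are hypothetical helper names, not part of any Stable Diffusion tool.

```python
from typing import List


def split_prompt(prompt: str) -> List[str]:
    """Split a comma-separated prompt into trimmed, non-empty keywords."""
    return [part.strip() for part in prompt.split(",") if part.strip()]


def added_keywords(before: str, after: str) -> List[str]:
    """Return the keywords present in `after` but not in `before`, in order."""
    seen = set(split_prompt(before))
    return [kw for kw in split_prompt(after) if kw not in seen]


if __name__ == "__main__":
    photo_prompt = (
        "a young man, highlights in hair, brown eyes, "
        "in white shirt and blue jean on a beach with a volcano in background"
    )
    cartoon_prompt = (
        "a young man, highlights in hair, brown eyes, wearing a yellow scarf, "
        "in white shirt and blue jean on a beach with a volcano in background"
    )
    print(added_keywords(photo_prompt, cartoon_prompt))
```

For the two prompts above, the only difference reported is the scarf keyword, which makes it the natural candidate to remove or rephrase when trying to restore the photo style.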
References: