The AI Image Generation Showdown: A Hands-On Comparison to Find the True Image Artist

baoshi.rao

In the early hours of October 11, design software giant Adobe announced a series of image generation models, led by Firefly Image 2, its new-generation image generator. The model renders people more convincingly, with better skin, hair, eyes, hands, and body structure, and offers richer colors, improved dynamic range, and greater user control over the output.

Previously, on September 21st, OpenAI announced an upgrade to its image generation tool DALL-E. The new version, DALL-E 3, significantly improves image generation quality, particularly in generating text on images.

In the international image generation arena, Midjourney and DALL-E are often seen as the two main competitors. The release of Adobe Firefly 2 means another strong competitor has entered the field, forming a three-way competition.

Adobe launched the beta version of the Firefly model in March this year, but some image analysts criticized it for lagging behind competitors such as Midjourney and DALL-E 2 in output quality. They attributed part of this gap to Adobe's commitment to training only on licensed and public domain content.

Below is a comparison of images generated by Adobe Firefly, Midjourney, and DALL-E 2, with the prompt: A valley, a fairy-tale treehouse village covered, matte painting, highly detailed, dynamic lighting, cinematic, realism, photorealistic, sunset, detailed, high contrast, denoised, centered.

▲Adobe Firefly, Midjourney, and DALL-E 2 generated image comparison (Source: Muhammad Usman, mdorazio)

The comparison above shows that Midjourney generates the most detailed and richest image, while DALL-E 2's output resembles an oil painting: less realistic but still acceptable. Firefly's result, by contrast, was disappointing. It failed to follow most of the prompt and showed poor overall quality, including color noise around object outlines.

With this update, Firefly 2 has significantly improved image generation quality and accuracy, especially in portrait rendering.

▲Firefly 2 vs. Firefly 1 generated image comparison (Source: Adobe)

So how does Firefly 2 perform in other respects? Can it compete with DALL-E 3 and Midjourney and help Adobe secure a place in the generative AI field? What are the characteristics and advantages of these three image generators? The overseas outlet Gold Penguin recently offered some answers by comparing the generators' output across eight categories.

Overall, each of the three image generators has its own style and strengths. DALL-E 3 excels at text generation and is better suited to high-context prompts; Adobe Firefly 2 produces the most realistic results, with the best portrait detail and photorealism; Midjourney often takes a more 'artistic' direction, which makes it a good source of inspiration.

The table below summarizes the characteristics of these three image generators in terms of usability, output quality, and speed for readers' reference. In short: Firefly 2 is more realistic, Midjourney is more artistic, and DALL-E 3 is more intuitive.

Comparison of the three AI image generators (Source: Gold Penguin, translated by Zhidongxi)

Today, we pit these three contenders against each other in a big showdown.

First up is contestant number one, Adobe Firefly Image 2, the next-generation image generation model launched by Adobe on October 11.

Adobe's position in the image processing field is unquestionable. Backed by Adobe, the Firefly series has garnered significant attention since its launch.

It is reported that Firefly 2 enhances image rendering quality by improving human skin, hair, eyes, hands, and other body structures in images, generating higher-quality outputs for users.

The Firefly 2 model introduces three major new features: Generative Match, Photo Settings, and Prompt Guidance.

It supports text prompts in over 100 languages and includes new subscription plans featuring 'fast' generation credits.

▲Firefly 2's Generative Match feature (Image source: Adobe)

The second contender, DALL-E 3, is also noteworthy.

DALL-E 3, launched by OpenAI in the early hours of September 21, is an upgraded version of the text-to-image tool, boasting enhanced prompt comprehension and superior text processing compared to its predecessors.

As the developer behind the phenomenal ChatGPT, OpenAI has sparked a wave of AIGC (AI-generated content) enthusiasm.

The upgraded DALL-E 3 is natively integrated into ChatGPT, significantly enhancing both products. On October 3, Microsoft announced that DALL-E 3 would be available for free to all Bing Chat and Bing Image Creator users, further lowering the barrier to entry.

Notably, DALL-E 3 has improved its ability to 'generate text on images,' something Firefly 2 and Midjourney currently cannot do reliably.

▲DALL-E 3 can generate accurate text on images (Image source: OpenAI)

Midjourney, the third contender, may not have a well-known company behind it like the other two, but the strength of its image quality has made it a phenomenon in the field, reaching 10 million users and $100 million in revenue within a year.

Midjourney was founded in August 2021 by David Holz, co-founder of the motion controller company Leap Motion. It is renowned for its detailed outputs, extensive customization through prompt engineering parameters, and nuanced features. The latest version, 5.2, was released on June 23.

The highlight of Midjourney 5.2 is the Zoom Out feature, which lets users extend the canvas of an upscaled image beyond its original boundaries without altering the original content. The newly added area is filled in based on the prompt and guided by the original image.

▲Midjourney's Zoom Out feature (Image source: Midjourney)

In September, Midjourney's CEO revealed to the media that Midjourney 6 would be released within the year, promising a significant leap in quality.

Next, let’s compare the image generation capabilities of Adobe Firefly 2, Midjourney, and DALL-E 3 across eight categories: realistic portraits, architectural design, landscapes, surrealism, abstract concepts, stylized art, vector graphic design, and text generation.

1. Realistic Portraits

First, let's look at the highly praised portraits from Adobe Firefly 2. The two sets of images below were generated using the following prompts: a close-up of a tired college student; a portrait of a woman in a yellow shirt.

▲Realistic portrait (Image source: Gold Penguin)

▲Realistic portrait (Image source: X blogger @saana_ai)

It's evident that the portraits generated by Adobe Firefly 2 are remarkably realistic, with clear facial expressions, distinct skin and hair textures, and excellent lighting effects.

Midjourney's output is also impressive, but compared with Firefly 2 it looks softer, with slightly less refined skin texture. For the first prompt, Midjourney's image contains a minor rendering error in the books on the desk, though it is not very noticeable.

The portraits generated by DALL-E 3 are somewhat weaker, lacking texture in the skin and hair. For the first prompt, DALL-E 3 overemphasized the student's fatigue, making the "dark circles" look exaggerated.

It's worth noting that none of these images triggered the "uncanny valley" effect, which is a significant advantage.

2. Architectural Design

Moving on to architectural design, the first set of prompts was: "A fashionable brick-walled Manhattan-style loft with a sunken living room, viewed from a wide-angle overhead perspective."


▲ Architectural Design (Image source: Gold Penguin)

For the first set of prompts, none of the three image generators fully grasped the intended concept. They all created a Manhattan-style loft but struggled to accurately depict the sunken living room feature.

Adobe Firefly 2 excels at lighting: its shadows match their light sources, and the two blend together seamlessly.

Midjourney's greatest strength lies in its attention to detail. From the books on the first floor to the paintings on the second floor, everything aligns with the design of a typical loft-style apartment.

DALL-E 3's lighting appears somewhat exaggerated, with a softer texture. However, it is the only generator that attempted to interpret the prompt 'sunken living room,' albeit with some inaccuracies in execution.

The second set of prompts was: bedroom, large windows, modern furniture, gray and gold, luxurious, mid-century modern style.

▲ Architectural Design (Image source: X blogger @chaseleantj)

For the second set of prompts, all three image generators performed well. However, DALL-E 3's generated images showed less emphasis on the 'luxurious' and 'gold' aspects compared to the other two generators.

3. Landscape

For landscapes, the first prompt is a short string of keywords: wildflower meadow sunset landscape.

▲ Landscape (Image source: Gold Penguin)

For the first prompt, Adobe Firefly 2's output looks vivid but is too similar to meadow images found online. On closer inspection, the wildflowers are also flawed: none of them is rendered properly.

Midjourney's meadow features very vibrant colors but leans towards stylization, resembling a painting more than a realistic photograph.

DALL-E 3 places stronger emphasis on the "sunset" prompt, presenting an overall orange hue that creates a majestic impression. While not the most colorful, it offers delicate textures.

The second prompt is more detailed: A drone aerial view of the stunning terrestrial landscape of Bora Bora, with sparkling water under sunlight.

▲Landscape (Image source: X blogger @chaseleantj)

For the second set of prompts, Firefly 2 and Midjourney generated similar images with a grand, epic feel, though the latter rendered trees with more detail.

DALL-E 3's water looks somewhat rough. The image emphasizes "sunlight" but lacks the strong cast shadows that intense lighting would produce, so it looks flat.

4. Surrealism

After looking at realism, let's examine surrealism. The prompt for the following image is: A surrealistic oil painting of a large firefly inside a house made of denim.

▲Surrealism (Image source: Gold Penguin)

For the first set of prompts, the three generators adopted completely different approaches.

Adobe Firefly 2's work heavily draws from children's books, resembling the style of picture books for kids.

Midjourney combined real-world imagery with fantastical concepts. Unlike the other two images, it focused on an interior perspective, making the "denim" element less prominent. Additionally, Midjourney even rendered the fireflies with a denim texture. This might slightly deviate from the prompt description, but the tester expressed appreciation for this creative interpretation.

DALL-E 3's approach was more artistic, blurring the boundaries of the house and creating a new narrative. It also "invented" some details, such as two moons and pocket windows.

Let's try a more abstract prompt: shocked, beautiful alien, sci-fi, futuristic, light brown and amber colors.

▲Surrealism (Image source: X blogger @saana_ai)

For the second set of prompts, the three generators also displayed distinct styles.

Adobe Firefly 2 maintained an illustration-like style, while Midjourney and DALL-E 3 leaned more towards "realism." However, DALL-E 3 ignored the "amber color" prompt, and the generated images appeared closer to "robots" rather than "aliens."

5. Abstract Concepts

Surrealist prompts still supply some concrete details, so next let's try a purely abstract concept. The prompt for the image below is: visualization of infinity.

▲Abstract Concept (Image source: Gold Penguin)

"Infinity" cannot be depicted literally, but the three images each try to represent the concept in a different way.

Adobe Firefly 2 and DALL-E 3 both chose spiral expressions. Firefly 2 resembles a visualization of the Fibonacci sequence, while DALL-E 3's generated image is more psychedelic, with rich colors that look like a complex tie-dye shirt.

Midjourney's image tells a story, featuring a human figure walking toward light, surrounded by vine-like or branch-like elements.

6. Stylized Art

In interpreting some stylized art, the three tools also show distinct approaches. The prompt for the first set of images was: Dadaism-style illustration of women fighting for equality.


▲Stylized Art (Source: Gold Penguin)

Dadaism emerged in the early 20th century, specifically during World War I. Dada art is characterized by unconventional materials, collage, assemblage, and performance, aiming to provoke and shock audiences while questioning the meaning and purpose of art and society.

Adobe Firefly 2's output does not resemble Dadaist art at all, and the results stayed much the same even after several adjustments to the prompt.

Midjourney and DALL-E 3, however, understood the context, and their works convincingly imitate Dadaism.

Midjourney leans toward collage, in the manner of the famous German artist Hannah Höch; DALL-E 3 tends to imitate the French artist Marcel Duchamp. Both artists were prominent figures in the Dada movement.

Let's also look at pixel art, using the prompt: Chibi pixel art on a white background, RPG game assets featuring a dragon sorcerer armor wielding the power of fire, surrounded by a matching set of items.

▲Stylized artwork (Source: X blogger @chaseleantj)

DALL-E 3 handles the pixel art style exceptionally well. It satisfies almost every element of the prompt, producing a Chibi character, pixel art, and a matching set of items.

Firefly 2 successfully created pixel art but ignored the "white background" and "set of items" parts of the prompt.

Midjourney's output wasn't even pixelated.

7. Vector Graphic Design

Next is vector graphic design, which is more practical for office use. First, the AI tools were asked to draw one of their own kind, with the prompt: A flat vector illustration of an AI assistant.

▲Vector Graphic Design (Image source: Gold Penguin)

Adobe Firefly 2 once again misunderstood the prompt. Its output is still vector art, but it fails to represent the key concept, "AI assistant."

Midjourney and DALL-E 3 produced outputs closer to traditional vector art. The former depicts an AI assistant helping humans with their work, while the latter focuses on the "AI assistant" itself.

Notably, DALL-E 3 even added sensible text on its own, without being prompted to.

Testing a more concrete prompt: "A simple flat vector illustration on a white background featuring a woman and a small dog sitting at a desk with a laptop."

Illustration of a woman and dog working (Source: Gold Penguin)

▲ Vector graphic design (Image source: X blogger @chaseleantj)

For the second prompt, all three models performed well overall.

On closer inspection, both Firefly 2 and Midjourney show minor flaws. In Firefly 2's image, the woman's left hand appears to be "missing"; in Midjourney's image, the dog's ears are too pointed, making it look more like a cat.

DALL-E 3's output has a flatter design style with clean color blocks, making it well suited to presentations and promotional materials.

8. Text Generation

Finally, we tested DALL-E 3's much-touted text generation capability with the prompt: A custom sticker design on a white background featuring the name "Rachel" in an elegant font, adorned with watercolor butterflies, daisies, and soft pastel colors.

▲Text generation (Image source: X blogger @chaseleantj)

In text generation, DALL-E 3 won overwhelmingly. Both Firefly 2 and Midjourney failed to generate accurate text, though Firefly 2 came slightly closer than Midjourney.

Firefly 2 and DALL-E 3 conveyed the 'sticker' element more clearly, both using white outlines to suggest it. Firefly 2 handled the watercolor style best.

Notably, Firefly 2 seems to consistently ignore the 'white background' instruction, 'stubbornly' substituting a light green background.

Generative AI is reshaping the field of artistic creation. With an image generator, anyone can step into a new world of art simply by writing a text prompt, and creative professionals can save significant time and explore more imaginative possibilities.

As a veteran creative software giant, Adobe has reinforced its deep expertise in image editing through this series of updates. Firefly 2 performs significantly better than its predecessor and can now go toe-to-toe with Midjourney and DALL-E 3.

Meanwhile, in China, models such as Baidu's ERNIE and iFlytek's Spark have launched image generation capabilities and opened them to the public. The well-known Chinese imaging software company Meitu is also actively investing in generative AI: on October 9 it released its self-developed Vision Model 3.0, which improves image generation quality and adds smart prompt suggestions.

Healthy competition gives users more choices and drives products to keep iterating and evolving. Perhaps a year from now we will look back and realize how "naive" today's image generation results were.