Music ChatGPT 2.0 is Here! AI Composer Faces Challenges, Jay Chou's Hit Songs Fail in Hands-on Test

    baoshi.rao

    The news of 200 musicians signing a protest letter had just surfaced when Stability AI's new music tool emerged! The newly released Stable Audio 2.0 can create music up to 3 minutes long, transforming your hums into musical compositions. However, many netizens and musicians who tried it expressed disappointment...

    The echoes of over 200 musicians signing an open letter to protest against Suno had not yet faded when another AI music tool appeared.

    Stability AI has also entered the fray of AI music! It seems that the departure of core developers has not slowed down its pace of product releases. Just moments ago, Stability AI released Stable Audio 2.0.

    With just a single natural language instruction, it can produce high-quality, fully structured musical compositions in 44.1 kHz stereo quality.

    Moreover, each track can be up to 3 minutes long! In comparison, Suno can only create tracks up to 2 minutes, making Stable Audio 2.0 clearly superior in this aspect.

    Additionally, Stable Audio 2.0's audio-to-audio functionality is currently matched only by Meta's MusicGen; even Suno does not offer it.

    Good news: the model is now available for free on the official Stable Audio website and will soon be accessible via the Stable Audio API.

    By the way, here's a key point: The music created with Stable Audio is commercially usable!

    However, the pricing is quite steep: The Pro version costs $11.99/month, the Studio version $29.99/month, and the top-tier Max version goes up to $89.99/month.

    Our team tried "remixing" a song by Jay Chou ourselves. The prompt was as follows:

    Post-Rock, Guitars, Drum Kit, Bass, Strings, Euphoric, Up-Lifting, Moody, Flowing, Raw, Epic, Sentimental, 125 BPM

    We then uploaded a melody from "Nocturne" and let Audio 2.0 generate a track from it.

    The result was far from ideal (most likely, of course, because the editor is not a professional musician). What about humming a song instead? The editor hummed a few lines of "Waiting for You After Class" and uploaded the recording with this prompt:

    Blues, R&B woman, singer

    The result was not just different from the original song; it was completely unrelated.

    A beatboxer's performance sounds like a whole band is behind him

    As soon as the news broke, musicians everywhere jumped at the opportunity!

    For example, one beatboxer combined his beatboxing with music generated by Audio 2.0, achieving the effect of an entire band all by himself.

    And whether it's beatboxing or complete songs, everything is generated by Audio 2.0.

    A Japanese netizen used Audio 2.0 to create a song in the style of "Touhou Chireiden."

    Shugo Nozaki commented after trying it out: unlike Suno, Audio 2.0 seems to stick to simple prompts and simplify the songs.

    In short, this model can not only create audio from text but also from audio.

    Melody, accompaniment, independent tracks, sound effects... there's nothing it can't do.

    Complete track creation

    With a creation time of up to 3 minutes, Stable Audio 2.0 ensures that each piece has a clear structure, including an introduction, main body, and conclusion, along with stereo sound effects to make the composition more vivid and immersive.

    For example, the following piece of music has a very complete structure, with a soothing and ethereal style that is highly relaxing.

    A beautiful piano arpeggio grows to a full beautiful orchestral piece

    In another example, a melancholic movement begins with a piano melody, followed by orchestral phrases that build to a climax before gradually returning to tranquility.

    Piano melody begins a melancholic journey, full orchestral climax, the swells of the orchestral instrumentals.

    As long as you provide a specific prompt, it can generate music that fully meets the requirements. Whatever you can imagine in your mind, it can create.

    This feeling is like playing cyber instruments in a virtual studio in the metaverse!

    For example, this 127 BPM Tech House track combines a synthesizer arpeggio with melodies woven from Rhodes electric piano chords. It also features syncopated percussion and foley percussion, a heavy house kick pattern, natural percussion, and the flowing feel of a walking bass. The entire track unfolds in a mysterious, understated atmosphere, as if the listener were setting out to explore the unknown.

    Tech House, underground UK rave, 127 BPM, synthesizer arpeggio, beautiful Rhodes piano chords and melodies, epic sweeping string section, syncopated percussion and foley percussion, house kick pattern, drum machine, natural percussion, breaks, walking bass, Mysterious, Mystical, Low-key

    Additionally, this 125 BPM post-rock piece features meticulously recorded drum kits and electric bass, occasionally interspersed with soaring harmonies, creating an overall grand and climactic atmosphere.

    Post Rock, echoing electric guitars with chorus, well recorded drum-kit, Electric Bass, occasional soaring harmonies, Moving, Epic, Climactic, 125 BPM

    This Nu-Disco track combines funky emotional piano with rich string quartet arrangements and multi-layered drum patterns. Additionally, the modern touch of G-Funk bass and synthesizers makes it perfectly suited for club environments.

    Nu-Disco, funky emotional Piano, lush string quartet, well layered Drum Machine, well-arranged composition, funky G-Funk bass, Synthersizers, Modern, Club-orientated, 115 BPM

    Interestingly, Audio 2.0 might also generate lyrics with vocals, but unfortunately, we can't input our own lyrics—we have to use whatever it provides.

    This is somewhat disappointing...

    Here's a male pop song created by Gorden Sun.

    You have a melody in your mind? Just hum it to Stable Audio 2.0, and it will generate a sample for you directly!

    The melody can be transformed into drums or bass guitars.

    Or perform some beatboxing, and it will instantly turn into a Lofi hip hop beat.

    This new model significantly enhances the capability to produce sounds and audio effects.

    Whether it's simulating the light tapping of a keyboard, the cheers of a crowd, or the background hum of city streets, it can add new layers to music.

    Additionally, if we already have an audio sample in one style and want to transform it into another, we can simply upload it to Audio 2.0 and describe what we want, and it will generate the new version automatically. Whether the change concerns the overall style of the music or the tone of specific sections, Audio 2.0 can tailor the result to our needs!

    From now on, the creative freedom and imagination of artists and music producers can be fully unleashed!

    In fact, as early as September 2023, the company had already launched version 1.0, which became the first commercially successful AI music tool.

    At that time, Stable Audio 1.0 was named one of the best inventions of 2023 by Time magazine.

    However, the recent uproar caused by musicians protesting against Suno has also sounded an alarm regarding music copyright issues.

    How does Stability AI address this problem?

    Stability AI says it has measures in place: Stable Audio 2.0 is trained exclusively on a licensed dataset from the AudioSparx music library, opt-out requests are honored, and the company promises fair compensation for creators.

    Why can Stable Audio 2.0 create such structurally complete musical works? The reason lies in its adoption of a uniquely designed technical architecture.

    To this end, researchers have comprehensively optimized the system to ensure its superior performance when processing long-duration audio.

    Through a novel and efficient compression technique, they have compressed the original audio data into a shorter format, thereby enhancing processing efficiency. Additionally, they introduced an advanced 'Diffusion Transformer' technique, which is more adept at handling continuous long audio data compared to previous methods. Similar technology is also employed in Stable Diffusion 3.

    The combination of these two major technologies enables the model to accurately capture and reproduce the complex structures in music.

    The autoencoder can compress audio and reconstruct it back to its original state. It captures and replicates key features while filtering out less important details, thereby generating more coherent works.

    The Diffusion Transformer (DiT) can progressively refine random noise into structured data, identifying complex patterns and relationships. Combined with an autoencoder, it gains the ability to process longer sequences, creating deeper and more accurate interpretations from the input.
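
    To get a feel for why compression matters, the arithmetic below counts how many raw samples a 3-minute, 44.1 kHz track contains and how short the sequence becomes after compression. This is only a sketch: the article does not state the compression ratio, so the downsampling factor of 2048 is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 44_100       # from the article: 44.1 kHz audio
TRACK_SECONDS = 3 * 60        # from the article: tracks up to 3 minutes
ASSUMED_DOWNSAMPLING = 2048   # ASSUMPTION: not stated in the article


def raw_samples_per_channel(seconds, sample_rate=SAMPLE_RATE_HZ):
    """Number of audio samples in one channel of a track."""
    return seconds * sample_rate


def latent_frames(seconds, downsampling=ASSUMED_DOWNSAMPLING,
                  sample_rate=SAMPLE_RATE_HZ):
    """Length of the compressed sequence the diffusion model would process,
    assuming the autoencoder emits one frame per `downsampling` samples."""
    return math.ceil(raw_samples_per_channel(seconds, sample_rate) / downsampling)


if __name__ == "__main__":
    print("raw samples per channel:", raw_samples_per_channel(TRACK_SECONDS))
    print("latent frames (assumed factor):", latent_frames(TRACK_SECONDS))
```

    Under this assumed factor, a sequence of several million samples shrinks to a few thousand frames, which is the scale at which a transformer can attend over the whole track at once.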

    Like version 1.0, version 2.0 is also trained on the vast audio library provided by AudioSparx.

    This audio library covers over 800,000 files, rich in content, including various types of music, sound effects, and individual instrument tracks, along with relevant textual descriptions. All artists on the AudioSparx platform have the opportunity to choose whether to allow their works to participate in Stable Audio's training process.

    Moreover, to safeguard creators' copyrights, Stability AI works with Audible Magic and applies its content recognition technology to the audio that users upload.

    This technology identifies and matches audio content in real time, which helps prevent infringement and protect the rights of every creator.

    Netizens' reaction: Without lyrics, it lacks soul

    Although it was heavily promoted, Audio 2.0 has faced criticism from some netizens after its release.

    The most obvious issue is that it cannot generate lyrics like Suno does.

    This feels like half of its soul has been taken away.

    Some netizens also complained that they don't consider it good music. It's like an AI-generated image—upon closer inspection, many flaws become apparent. In their view, excellent composers should be paid for creating flawless music, even if they're more expensive than AI.

    Indeed, many have pointed out that its music quality is subpar and cannot compare to Suno.

    In fact, many music generators produce better results than it does.

    "But I've already been spoiled by Suno."

    Music App Founder's Trial Experience: Somewhat Disappointed

    Ezra, the founder of a music app, recorded his detailed experience after trying Audio 2.0.

    Video link: https://www.audiocipher.com/post/stable-audio-ai#viewer-85l4b974663

    He conducted the following experiments to try out Audio 2.0's music generation capabilities across various genres.

    His first experiment involved capturing a simple rhythm from recorded input to see if Audio 2.0's Drum Solo feature could generate more interesting percussion concepts from its prompt library.

    The results of the first experiment were somewhat disappointing. While the generated music did show clear style and timbre transitions, it failed to produce the requested "drum solo." He tried a second time using "drum and bass" as the prompt. This time, Audio 2.0 produced different drum sounds, with both outputs featuring modified captured tones.

    In the second experiment, he recorded himself humming a simple ten-second melody.

    He then compared the uploaded audio waveform with Audio 2.0's output.

    It's evident that the loudest parts of the input signal precisely correspond to similar waveforms in the output. However, he mentioned that the style transfer effect was not actually good. The output sounded similar to his own humming but with a slightly different timbre.
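
    One way to put a number on the observation that loud passages line up is to compare the loudness envelopes of the two recordings. The sketch below is not the method used in the trial, which relied on visually comparing waveforms; it uses synthetic sine tones as stand-ins for the hummed input and the generated output, and all function names are illustrative.

```python
import math


def rms_envelope(samples, window):
    """RMS level of each consecutive, non-overlapping window."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + window]) / window)
        for i in range(0, len(samples) - window + 1, window)
    ]


def pearson(a, b):
    """Pearson correlation of two sequences (truncated to equal length)."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat envelope carries no shape to compare
    return cov / math.sqrt(var_a * var_b)


def make_tone(freq_hz, gains, sample_rate=8000, segment_s=0.25):
    """Synthetic stand-in signal: a sine tone whose loudness changes per segment."""
    seg_len = int(sample_rate * segment_s)
    return [
        gain * math.sin(2 * math.pi * freq_hz * (seg * seg_len + k) / sample_rate)
        for seg, gain in enumerate(gains)
        for k in range(seg_len)
    ]


# Different pitch (timbre stand-in), similar loudness contour.
hum = make_tone(220.0, [0.2, 0.9, 0.4, 1.0])
generated = make_tone(330.0, [0.1, 0.8, 0.3, 0.9])
similarity = pearson(rms_envelope(hum, 2000), rms_envelope(generated, 2000))

if __name__ == "__main__":
    print(f"envelope correlation: {similarity:.3f}")
```

    A correlation close to 1 means the loud and quiet passages coincide even when pitch or timbre differ, which matches what he described: the output followed the contour of his humming without really changing its style.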

    Overall, his first two experiments were largely unsuccessful.

    In the third experiment, he took a different approach by uploading a 30-second recording of an accordion piece he composed.

    This recording was clear and resonant, featuring chords and melody. The output from Audio 2.0 can be considered successful.

    However, the prompt specifically requested Gypsy jazz with bass and drums. What was produced was an acoustic jazz guitar piece that included what sounded like a xylophone, but no bass or drums.

    This time, the melody accuracy was about 90%, but there were some strange notes that weren't present in the original recording. Sometimes it would lose the main theme or jump into the melody too early or too late. On the other hand, Stable Audio has indeed innovated on the simple i-iv-V7-i chord progression, incorporating some surprising reharmonizations.

    So, if our goal is to come up with new chord arrangements, there's no doubt it would be a treasure trove of a tool.
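
    For readers unfamiliar with the notation, the i-iv-V7-i progression mentioned above can be spelled out in a concrete key. The sketch below assumes A minor purely for illustration; the article does not say which key the accordion piece was in, and the helper names are hypothetical.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets of each chord tone above the chord root.
QUALITIES = {
    "min": (0, 3, 7),        # minor triad
    "dom7": (0, 4, 7, 10),   # dominant seventh chord
}

# (roman numeral, semitones above the tonic, chord quality)
PROGRESSION = [
    ("i", 0, "min"),
    ("iv", 5, "min"),
    ("V7", 7, "dom7"),
    ("i", 0, "min"),
]


def spell_progression(tonic):
    """Return [(numeral, [note names])] for i-iv-V7-i in the given minor key.

    Notes are spelled with sharps only, which is adequate for A minor
    (the key assumed in this example).
    """
    tonic_index = NOTES.index(tonic)
    chords = []
    for numeral, offset, quality in PROGRESSION:
        root = (tonic_index + offset) % 12
        tones = [NOTES[(root + step) % 12] for step in QUALITIES[quality]]
        chords.append((numeral, tones))
    return chords


if __name__ == "__main__":
    for numeral, tones in spell_progression("A"):
        print(f"{numeral:>3}: {' '.join(tones)}")
```

    In A minor this gives Am, Dm, E7, Am. A reharmonization keeps the melody but swaps some of these chords for others that still fit the melody notes.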
