Stability AI Launches AI Music Generation Tool Stable Audio

baoshi.rao

London-based startup Stability AI on Wednesday launched Stable Audio, a new product that uses artificial intelligence to generate customized music tracks and sound effects.

Stable Audio employs a diffusion-based AI model that can generate customized audio files from simple text inputs within seconds. Users can specify music styles, instruments, tones, and other characteristics, and the system will automatically create matching songs, sound effects, or instrumental parts.

Stability AI tested the tool with input text such as "post-rock, guitar, drum kit, bass, strings, upbeat, uplifting, emotional, smooth, raw, epic, sentimental, 125BPM." The result was a fast-paced atmospheric rock song with a BPM of 125. According to Stability, this demonstrates that Stable Audio can generate songs in various styles, including ambient, techno, and electronic dance music.

Unlike previous AI-based music generators, Stable Audio appears capable of producing musically coherent pieces up to 90 seconds long at a professional-quality sample rate of 44.1 kHz.

The generated audio samples sound remarkably realistic, and it is hard to tell that no human composer was involved. According to Stability AI, an Nvidia A100 GPU can generate 95 seconds of audio in less than a second.
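
To put those figures in perspective, the following back-of-the-envelope sketch works out how many sample values a 95-second clip at 44.1 kHz contains and what the quoted speed implies. The stereo channel count and the one-second generation time are assumptions chosen for illustration (the article only says "less than a second"), and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 44_100    # sample rate quoted in the article
CLIP_SECONDS = 95          # clip length quoted for the A100 figure
CHANNELS = 2               # assumption: stereo output
GENERATION_SECONDS = 1.0   # assumption: upper bound for "less than a second"


def total_samples(seconds, sample_rate_hz, channels):
    """Number of individual sample values in a clip of the given length."""
    return seconds * sample_rate_hz * channels


def realtime_factor(clip_seconds, generation_seconds):
    """Seconds of audio produced per second of compute time."""
    return clip_seconds / generation_seconds


samples = total_samples(CLIP_SECONDS, SAMPLE_RATE_HZ, CHANNELS)
factor = realtime_factor(CLIP_SECONDS, GENERATION_SECONDS)
print(f"{samples:,} sample values, at least {factor:.0f}x faster than real time")
```

Under these assumptions, the model would be producing several million sample values in under a second, which is why the speed claim stands out.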

The technology could play a role in a range of applications, particularly in creative fields such as film production and game development. Because it is accessible through a web browser, even users unfamiliar with AI can use it easily.

To achieve this quality, Stability AI trained its model on a music library provided by AudioSparx, which contains approximately 800,000 songs, sound effects, and instrumental clips. The collaboration is built on a revenue-sharing agreement between the music library and Stability AI. In return, creators whose songs were used in training can share in Stable Audio's profits through AudioSparx.

Reportedly, creators were asked for permission before their songs were used in training. This decision may be a response to the widespread criticism Stability faced in the copyright debate over the training data for Stable Diffusion.

According to Stability AI, users can freely use tracks created with Stable Audio for personal purposes. Commercial use requires a paid subscription. The company targets creative professionals like filmmakers or game developers who need quick access to suitable background music.

Stability AI also plans to release an open-source music model trained on a different dataset.

Unlike the popular image model Stable Diffusion, Stable Audio is not open-source. However, the FAQ states that an open-source model trained on other datasets will be released soon.

Stable Audio follows Dance Diffusion, a music generation model released in 2022 by Harmonai with support from Stability. According to Stability AI, however, Stable Audio itself is a model developed from scratch by its audio division, which was established in April 2022.

Using diffusion models for music is not a new idea. Stable Audio's strength, however, lies in its ability to generate pieces of varying lengths, because track length was taken into account during training.

Stable Audio is available only through the recently launched web interface. It offers 20 free tracks per month for personal use, each up to 45 seconds long. For $11.99 per month, users get 500 tracks of up to 90 seconds each, along with commercial licensing.
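
The arithmetic behind the two tiers can be sketched as follows. This is only an illustration based on the numbers quoted above; the `Plan` class and the plan names "free" and "paid" are hypothetical labels, not Stability AI's own terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str                # hypothetical label, not an official plan name
    monthly_price_usd: float
    tracks_per_month: int
    max_track_seconds: int

    def max_audio_minutes(self) -> float:
        """Upper bound on minutes of audio per month if every track is full length."""
        return self.tracks_per_month * self.max_track_seconds / 60

    def price_per_track(self) -> float:
        """Monthly price divided by the monthly track allowance."""
        return self.monthly_price_usd / self.tracks_per_month


FREE = Plan("free", 0.0, 20, 45)
PAID = Plan("paid", 11.99, 500, 90)

for plan in (FREE, PAID):
    print(
        f"{plan.name}: up to {plan.max_audio_minutes():.0f} min/month, "
        f"${plan.price_per_track():.3f} per track"
    )
```

On these figures, the paid tier works out to roughly two and a half cents per track if the full monthly allowance is used.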

The lack of content filters could easily lead to plagiarism

The tool could also be used to forge songs by popular artists. So far, record companies have successfully fought against such AI creations, but the legal status remains unclear.

In an interview with TechCrunch, Stability AI insists that it wants to use the technology responsibly. AudioSparx's database doesn't contain popular songs, but many tracks are tagged with the styles of well-known artists. Unlike in Google's MusicLM, famous artists' names aren't blocked, at least not yet.

Whether Stable Audio can generate returns for Stability AI, whose business has so far been operating at a loss, remains to be seen. In any case, the quality of the AI-generated audio is remarkable.