AI Music Generation on Vultr Cloud GPU
Introduction
AI music generation is an innovative synthesis of art and science. By analyzing vast collections of existing musical compositions, AI models, often based on deep learning techniques, have become adept at generating music across different genres and styles. With the advances in such models and algorithms, you can generate attractive music and sounds in a few minutes.
AudioCraft and Bark are two open-source text-to-audio tools you can use together to generate a soundtrack and matching lyrics audio from text prompts. In addition, tools such as FFmpeg let you mix the generated melody and lyrics into a single final output file.
This article explains how to perform AI music generation tasks on a Vultr Cloud GPU server. You then mix the generated soundtrack and lyrics to create a single output file you can download and share, subject to the respective tools' licenses.
Prerequisites
Before you begin, make sure you:
- Deploy a Debian NVIDIA A100 Cloud GPU server on Vultr with at least:
  - 1/7 GPU
  - 10GB GPU RAM
  - 15GB memory
- Access the server using SSH
- Create a non-root user and switch to the user account
- Update the server
Set Up the Server
Install FFmpeg
$ sudo apt install ffmpeg
Install the Python virtual environment package
$ sudo apt install python3.11-venv
Create a new Python virtual environment
$ python3 -m venv myenv
Activate the environment
$ source myenv/bin/activate
Upgrade the Pip package manager
$ pip install --upgrade pip
Using pip, install the necessary dependency packages
$ pip install torch==2.0.1 audiocraft==0.0.2 bark==0.1.5 protobuf==4.24.2
The above command installs the following packages:
- torch: The PyTorch deep learning Python library
- audiocraft: A PyTorch library used for deep learning on audio generation
- bark: A transformer-based text-to-audio model
- protobuf: A package required for loading the AudioCraft model
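Optionally, verify that PyTorch detects the server GPU before moving on. The following is a quick check you can run in a Python shell; the torch.cuda calls used here are standard PyTorch functions, and the expected values noted in the comments assume the GPU is accessible to your session.
$ python3
>>> import torch
>>> torch.__version__               # Should report 2.0.1
>>> torch.cuda.is_available()       # Should return True when the GPU is accessible
>>> torch.cuda.get_device_name(0)   # Reports the detected GPU model
>>> exit()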
Generate the Lyrics Audio
To create a full audio file, generate the lyrics audio first, then add a melody as described in the steps below.
Access the Python Shell
$ python3
Import the bark library and necessary packages to your session
>>> from bark import SAMPLE_RATE, generate_audio, preload_models
>>> from scipy.io.wavfile import write as write_wav
Download and load all bark text-to-audio models
>>> preload_models()
The download process may take between 1 and 2 minutes to complete, and the total model size is above 10GB
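If the full Bark checkpoints exceed your available GPU memory, you can load the smaller checkpoints instead. This is a minimal sketch that assumes your installed Bark version follows the upstream Suno Bark API, where preload_models accepts *_use_small flags; skip this step if your version differs.
>>> preload_models(text_use_small=True, coarse_use_small=True, fine_use_small=True)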
Define your lyrics using the lyrics_text variable
>>> lyrics_text = '''
... In the realm of the digital, where clouds converge,
... Vultr's brilliance shines, a power to emerge.
... Bytes and data swirling in cosmic dance,
... Unveiling solutions, fate is given a chance.
... '''
Replace the above lyrics with your desired text
Call the Bark generate_audio function to generate the lyrics audio array
>>> audio_array = generate_audio(lyrics_text)
Save the generated audio to a local file. Replace lyrics.wav with your desired filename
>>> write_wav('lyrics.wav', SAMPLE_RATE, audio_array)
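Bark can also render the lyrics with one of its bundled voice presets through the optional history_prompt parameter of generate_audio. The preset name below is an example drawn from the upstream Bark voice prompt library; treat it as an assumption and swap in any preset available in your installation.
>>> audio_array = generate_audio(lyrics_text, history_prompt='v2/en_speaker_6')
>>> write_wav('lyrics-voice.wav', SAMPLE_RATE, audio_array)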
Exit the Python Shell
>>> exit()
List files in your working directory
$ ls
Verify that your generated lyrics audio file is available
Generate the Soundtrack
To generate a soundtrack you can combine with your lyrics audio, choose the AudioCraft pre-trained model you want to apply (see the selection example after the list below). As of September 2023, the following models are available:
- Text-to-music only models:
  - A small model with 300M parameters
  - A medium model with 1.5B parameters
  - A large model with 3.3B parameters
- A melody model with 1.5B parameters that supports melody-guided music generation
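As an illustration, the models above map to short identifiers you pass to MusicGen.get_pretrained in the Python shell. The identifier strings shown below are an assumption based on the audiocraft 0.0.2 release used in this article; newer AudioCraft releases rename them (for example, facebook/musicgen-small).
>>> from audiocraft.models import MusicGen
>>> model = MusicGen.get_pretrained('small')   # Or 'medium', 'large', 'melody'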
This section uses the melody model to generate a soundtrack based on your text prompt as described below.
Access the Python Shell
$ python3
Import the audiocraft libraries
>>> from audiocraft.models import MusicGen
>>> from audiocraft.data.audio import audio_write
Load your target model. For this article, use the melody model
>>> model = MusicGen.get_pretrained("melody")
Set the soundtrack length
>>> model.set_generation_params(duration=14)
It's recommended to generate a soundtrack with the same length as your lyrics audio. The above code generates a 14-second track that matches the lyrics audio length
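If you prefer to compute the duration instead of hard-coding it, you can read the length of your generated lyrics file and pass it to set_generation_params. This is a minimal sketch that assumes lyrics.wav is in your working directory and that SciPy, installed as a Bark dependency, is available.
>>> from scipy.io.wavfile import read as read_wav
>>> sample_rate, data = read_wav('lyrics.wav')
>>> model.set_generation_params(duration=len(data) / sample_rate)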
Define the soundtrack prompt with your desired text
>>> melody_prompt = 'modern and forward-looking, with a blend of electronic and acoustic elements'
Generate the soundtrack using the generate function from the AudioCraft library
>>> audio_array = model.generate([melody_prompt], progress=True)
Export the generated soundtrack to a file. Replace melody-track with your desired filename
>>> audio_write('melody-track', audio_array[0].cpu(), model.sample_rate)
Close the Python console
>>> exit()
List files in your working directory
$ ls
Verify that a new melody-track.wav file is available
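Because the melody model supports melody-guided generation, you can also condition the soundtrack on an existing recording. The sketch below follows the upstream MusicGen example code and is an assumption for the audiocraft 0.0.2 release, so verify generate_with_chroma against your installed version; reference.wav is a hypothetical reference melody you supply, and torchaudio is installed as an AudioCraft dependency. Run it in a new python3 shell.
>>> import torchaudio
>>> from audiocraft.models import MusicGen
>>> from audiocraft.data.audio import audio_write
>>> model = MusicGen.get_pretrained("melody")
>>> model.set_generation_params(duration=14)
>>> melody, sr = torchaudio.load('reference.wav')   # Hypothetical reference melody file you provide
>>> audio_array = model.generate_with_chroma(
...     ['modern and forward-looking, with a blend of electronic and acoustic elements'],
...     melody[None],   # Add a batch dimension: [1, channels, samples]
...     sr,
...     progress=True,
... )
>>> audio_write('melody-guided-track', audio_array[0].cpu(), model.sample_rate)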
Mix the Generated Lyrics and the Soundtrack
After you generate and export the necessary audio files to your directory, use ffmpeg to combine the lyrics with your soundtrack and create a single output file as described below.
Using ffmpeg, normalize the lyrics audio file to a standard volume to match your soundtrack
$ ffmpeg -i lyrics.wav -filter:a loudnorm lyrics_norm.wav
Normalize the soundtrack file volume
$ ffmpeg -i melody-track.wav -filter:a loudnorm melody_norm.wav
Mix the normalized audio inputs to create a single stereo output file. The amerge filter combines the channels of both inputs into one stream, and the pan filter downmixes the merged channels into a stereo pair
$ ffmpeg -i melody_norm.wav -i lyrics_norm.wav -filter_complex "[0:a][1:a]amerge=inputs=2,pan=stereo|c0<c0+c1+c2+c3|c1<c0+c1+c2+c3[a]" -map "[a]" output.mp3
When successful, verify that a new output.mp3 file is available in your working directory
Deactivate the Python virtual environment
$ deactivate
Download the Generated Music File
To download a copy of your generated music file to your computer, use a secure file transfer protocol such as SFTP
, FTP
, Rsync
, or SCP
. In this section, use Secure Copy (SCP) to download the mixed music file to your computer as described below.
In a new terminal window, use scp to download the output.mp3 file from your user home directory to your computer's working directory
$ scp example-user@SERVER-IP:~/output.mp3 .
When the download is complete, find the file on your computer and open it using a media application such as VLC to listen to the generated music
Conclusion
In this article, you generated AI music on a Vultr Cloud GPU server. Depending on your use case, you can change the lyrics and soundtrack prompts to match your needs. The music generation process takes a few minutes to complete. For more information about the generation tools, visit the official documentation pages for AudioCraft, Bark, and FFmpeg.