Tags: Constant-Q transform spectrogram images, Music style transfer, Selective audio remixing
Abstract:
Previous research on music-related generation and transformation has commonly targeted single-instrument or single-melody music. Here, in contrast, five music genres are used with the goal of achieving selective remixing by applying domain transfer methods to spectrogram images of music. A pipeline architecture composed of two independent generative adversarial network models was created. The first, CycleGAN, performs style transfer on constant-Q transform spectrogram images by applying features from one of five genres to the spectrogram. The second network turns the spectrogram into a real-valued tensor representation, which is approximately reconstructed back into audio. The system outputs four seconds of music at a time, and these segments can be concatenated to recreate a full-length music track. The system was evaluated through a number of experiments and a survey. Due to the increased complexity involved in processing high-sample-rate music with homophonic or polyphonic audio textures, the system’s audio output was considered to be low quality, but the style transfer produced noticeable selective remixing on most of the music tracks used for evaluation.
Music Style Transfer Using Constant-Q Transform Spectrograms
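The abstract states that the system outputs four seconds of music at a time and that these outputs can be concatenated into a full-length track. The following is a minimal sketch of that segmenting and rejoining step only, not the authors' implementation. The function names, the zero-padding of the final segment, and the representation of audio as a plain list of samples are all assumptions made for illustration; the style transfer and audio reconstruction stages are omitted.

```python
from typing import List, Sequence

SEGMENT_SECONDS = 4  # segment length stated in the abstract


def split_into_segments(samples: Sequence[float], sample_rate: int) -> List[List[float]]:
    """Cut a mono signal into consecutive four-second segments.

    Hypothetical helper: the last segment is zero-padded to full length
    (an assumption; the source does not say how a short tail is handled).
    """
    seg_len = SEGMENT_SECONDS * sample_rate
    segments = []
    for start in range(0, len(samples), seg_len):
        chunk = list(samples[start:start + seg_len])
        chunk.extend([0.0] * (seg_len - len(chunk)))  # pad a short tail
        segments.append(chunk)
    return segments


def concatenate_segments(segments: List[List[float]], original_length: int) -> List[float]:
    """Join processed segments end to end and trim the padding added above."""
    track = [sample for segment in segments for sample in segment]
    return track[:original_length]
```

In a full pipeline, each segment would pass through both networks before `concatenate_segments` is called; here the segments are rejoined unchanged to show that the round trip preserves the track length.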