Research on High-Accuracy Polyphonic Music Recognition Algorithm Based on Long Short-Term Memory

Zhan, Shubo

doi:10.3233/ATDE240504

Abstract

This paper addresses the low pitch resolution of polyphonic music audio recognition. It proposes a method that uses time-frequency spectrograms to extract the main melody and other elements of a polyphonic piece, at a pitch resolution significantly finer than that of traditional semitone-level identification. First, a Long Short-Term Memory (LSTM) network is used to create the time-frequency spectrogram of the musical signal. An adaptive edge distortion processing technique then binarizes the spectrogram, reducing distortion at note edges. The binarized spectrogram is analyzed with a sophisticated Simulated Annealing (SA) algorithm, which converts the transformation from the discrete domain to the continuous domain to achieve accurate note placement. Finally, a density-based clustering algorithm (DBSCAN) combined with fundamental frequency extraction is used to extract the musical information. The results show that the algorithm delivers high resolution in both the time and frequency dimensions for polyphonic music, with an average frequency-domain error below 6 Hz and a temporal error under 80 ms.
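The final stage described above, density-based clustering of a binarized spectrogram, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract gives no parameters, so the function names (dbscan, notes_from_mask), the eps and min_pts values, the frame hop hop_s, the bin width bin_hz, and the toy mask are all assumptions chosen for illustration. The per-note frequency is taken as the median active bin, which is a simple stand-in for the fundamental frequency extraction step and not the method the paper uses.

```python
from statistics import median


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN on 2-D points; returns one label per point (-1 = noise)."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels


def notes_from_mask(mask, hop_s, bin_hz, eps=1.5, min_pts=3):
    """Cluster the active cells of a binary spectrogram into note events.

    mask[f][t] is truthy when frequency bin f is active in frame t.
    hop_s (seconds per frame) and bin_hz (Hz per bin) are assumed values.
    """
    points = [(t, f) for f, row in enumerate(mask)
              for t, on in enumerate(row) if on]
    labels = dbscan(points, eps, min_pts)
    notes = []
    for c in sorted(set(labels) - {-1}):
        frames = [t for (t, _), lab in zip(points, labels) if lab == c]
        bins = [f for (_, f), lab in zip(points, labels) if lab == c]
        notes.append({
            "onset_s": min(frames) * hop_s,
            "offset_s": (max(frames) + 1) * hop_s,
            "freq_hz": median(bins) * bin_hz,  # stand-in for F0 extraction
        })
    return notes


# Toy mask (6 bins x 10 frames): two sustained notes and one isolated speck.
toy_mask = [[0] * 10 for _ in range(6)]
for t in range(0, 4):
    toy_mask[2][t] = 1
for t in range(5, 9):
    toy_mask[4][t] = 1
toy_mask[0][9] = 1  # isolated cell, expected to be rejected as noise

notes = notes_from_mask(toy_mask, hop_s=0.01, bin_hz=10.0)
for note in notes:
    print(note)
```

Clustering in bin-index space keeps eps in the same units on both axes. A real spectrogram would need eps and min_pts tuned to its frame rate and frequency resolution.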
