Bass vs. Drums

Authors: Khadisha Dabayeva, Lina Marangattil, Zenon Hanappi, Felicia Gulda
Posted: 16 June 2020

Let’s play some bass and drums

In this blog post we describe our take on Teachable Machine Audio: training it to classify drum and bass recordings.

1. Corpus creation

To train the model we prepared a dataset consisting of background noise recordings plus 118 samples of drums and bass. The bass training data for model 1 consists mainly of one-string sounds downloaded from the open NSynth Dataset, while the bass training data used in model 2 are 50 samples of live recordings.

Figure 1. Corpus creation

2. Characterization / key figures of your corpus

First things first: gather the audio samples. Felicia recorded bass tracks and Zenon recorded drum tracks; we also tested prerecorded instrument samples from an open library. To train the machine, we created a dataset with different classes.

Secondly, we recorded background noise samples, because the model needs something to compare the other classes against and a way to detect whether any sound is occurring at all. We decided to record the background noise in a silent room.

In the third step, the classes were renamed and uploaded together with the bass and drum recordings. These are the examples the program learns from. The pictures of the recordings visualize the various amplitudes, pitches, and frequencies of the sound. These spectrograms were fed to the machine learning model. As you can see, we were very generous in feeding data to our classes so they would become better at classifying. The next step was to train the model. When the input mode is switched on in the preview, it again shows the spectrogram of the sounds currently being picked up and indicates in the output area below whether they are background noise, drums, or bass.
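The spectrogram step described above can be sketched in a few lines of Python. This is only an illustration of the kind of transform Teachable Machine applies, not the tool's actual code; the 16 kHz sample rate and the synthetic 110 Hz tone are our own assumptions for the example.

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic stand-in for a recorded sample: one second of a 110 Hz tone
# (roughly the A string of a bass guitar) sampled at 16 kHz.
sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
audio = np.sin(2 * np.pi * 110.0 * t)

# Short-time Fourier transform: a grid of signal power over
# frequency bins (rows) and time frames (columns).
freqs, times, power = spectrogram(audio, fs=sr, nperseg=512)

# For a pure tone, the energy is concentrated around its frequency.
peak_freq = freqs[power.mean(axis=1).argmax()]
print(f"peak energy near {peak_freq:.0f} Hz in a {power.shape} grid")
```

A percussive drum hit spreads its energy across many frequency bins at once, while a sustained bass note keeps it in a narrow band over time, which is exactly the difference the classifier picks up on.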

3. Training and testing

Model 1:
After the first training the results were pretty good, with accuracies between 80 and 100% (Figure 2). It should be kept in mind, however, that the testing data was very similar to our training data. During the testing phase we also found that the model recognized the drums more accurately (video 1).
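The caveat above, that test clips resembled the training clips, is usually handled by holding out a random split of the corpus before training. A minimal sketch with numpy, using invented file names:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-ins for the 118 recorded clips of one class (file names invented).
clips = [f"clip_{i:03d}.wav" for i in range(118)]

# Shuffle, then hold out 20% as a test set the model never trains on,
# so the accuracy estimate is not inflated by near-duplicate material.
shuffled = rng.permutation(clips)
n_test = len(clips) // 5
test_set, train_set = shuffled[:n_test], shuffled[n_test:]
print(f"{len(train_set)} training clips, {len(test_set)} test clips")
```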

Video 1. Drums Testing

The bass guitar was recognized with a higher rate of accuracy when the recordings were one-string sounds like those in the training data set (video 2).

Video 2. Bass Testing with 1-string sounds.

In other cases, when the sound came from a YouTube video of solo bass guitar playing, the results were not precise (video 3). In particular, samples that are highly dynamic and contain lots of short mixed sounds were identified as drums, as their waveform spectrograms look quite similar. Long and more melodic sounds were identified correctly.

Video 3. Bass Failing Test

Thus, we decided to extend our training data and add more complex sounds from different performances. As a result, the training dataset for the bass guitar class grew to 150 samples (Figure 3).
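The drum bias described above is easiest to spot when accuracy is computed per class rather than overall. A minimal sketch with numpy; the labels and predictions below are invented for illustration:

```python
import numpy as np

CLASSES = ["background", "drums", "bass"]

# Invented ground-truth labels and model predictions for ten test clips.
y_true = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 1, 2, 2, 1, 1])  # two bass clips mistaken for drums

overall = float((y_true == y_pred).mean())
print(f"overall: {overall:.0%}")

# Per-class accuracy exposes a bias that the overall number hides.
for idx, name in enumerate(CLASSES):
    mask = y_true == idx
    per_class = float((y_pred[mask] == idx).mean())
    print(f"{name}: {per_class:.0%}")
```

Here the overall accuracy looks fine at 80%, yet the bass class sits at only 50%, which is the same pattern we saw before extending the dataset.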

Figure 3. Extended training dataset

The diversification led to a remarkable improvement in bass recognition (video 4). We also tested this model with voice and with beats on a table: the voice was classified as bass guitar, while the beats were classified as drums, as expected. The model distinguishes percussive sounds from melodic, stringed ones at a good level. One suggestion for further work is to add more instrument classes and sounds.

Video 4. Bass Identification after Dataset Extension

Model 2:
The aim with model 2 was to draw a comparison between self-recorded samples and samples downloaded from the open NSynth Dataset.
In model 2, only 50 self-recorded drum and self-recorded bass audio samples made up the training data. This time around, louder and livelier background noises were included.

Video 5. Live-Bass Identification Accuracy

As can be seen in the video, Model 2 was tested with a live bass performance, playing melodies and one-string sounds/single notes at a variety of frequencies.
It should be noted that while the bass is played, the model identifies the fret buzz as background noise and/or drums, which drops the bass recognition accuracy to 80% or lower.

Image 4. Bass Testing with 1-String Sounds of Model 2

Image 4 shows that live-played one-string sounds on the bass guitar yield the highest rate of accuracy.

Image 5. Accuracy and Loss Graphs from Model 2

Overall, the model identifies the bass with a high accuracy of around 80-100%, even though the training data had fewer samples than model 1, and it is able to successfully distinguish between percussion and the lowest-pitched guitar.

One suggestion to improve the model is to include a broader variety of background noises, as well as a higher number of drum samples, so that drum recognition can be more reliably separated from background noise. Another issue that needs to be addressed is that more variety in bass instruments and playing styles is needed for the model to distinguish better between drums and fret buzz.
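One common way to damp transient misclassifications like the fret buzz is to average the class probabilities over a short window and only accept a label above a confidence threshold. Teachable Machine does not expose this; the sketch below is our own post-processing idea, and the class names, threshold, and probability values are invented:

```python
import numpy as np

CLASSES = ["background", "drums", "bass"]
THRESHOLD = 0.6  # arbitrary confidence cut-off chosen for this sketch

def smoothed_label(prob_frames, threshold=THRESHOLD):
    """Average per-frame class probabilities over a short window and
    return a label only if the winning class clears the threshold."""
    mean_probs = np.asarray(prob_frames).mean(axis=0)
    best = int(mean_probs.argmax())
    return CLASSES[best] if mean_probs[best] >= threshold else "uncertain"

# Invented example: a bass note where one frame (the fret buzz)
# momentarily looks like drums.
frames = [
    [0.1, 0.1, 0.8],  # clearly bass
    [0.1, 0.7, 0.2],  # buzz frame, briefly classified as drums
    [0.1, 0.1, 0.8],  # clearly bass
]
print(smoothed_label(frames))       # the three-frame window as a whole
print(smoothed_label(frames[1:2]))  # the buzz frame judged on its own
```

Judged frame by frame, the buzz frame flips the label to drums; averaged over the window, the note is still labelled bass.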


One of the difficulties was getting this project started. Firstly, the program is not supported on some browsers, such as Google Chrome or Safari, so we decided to run the experiment in Firefox. Secondly, judging by the fact that none of us managed to upload files from the drive or from the computer as suggested by the program itself, this part of the system needs improvement. It was not a big problem, though, as we used the built-in microphone and played the audio from the computer. We think this is even better, because we did not train on "perfectly recorded soundtracks" but on a dataset that was close to reality, with all its background noise.

Our Models: Drums vs. Bass

If you are interested in testing out our model, grab yourself an instrument or use your bare hands to make some noise.

Model 1

Model 2


Despite its shortcomings in operation, the Teachable Machine Audio has indeed fulfilled our intention of proving that AI can classify bass and drum sounds. Differences in the levels of accuracy could be observed visually in the spectrograms, due to the various amplitudes and frequencies of the test samples from different sources. By adding more diverse input audio, the recognition of the tested samples improved. Another of our assumptions was confirmed: noises that sound similar to an instrument, in this case the drums, can mistakenly be detected as that instrument. In conclusion, this training model has helped us understand the details of sound spectrograms and the kinds of patterns that correspond to different sounds. Through our own trial and error, we managed to figure out all the intricacies and pitfalls of the program.