Over the summers of 2019 and 2020 I worked at Ithaca College's JimiLab, researching music technology. The team has been developing a web app called Localify, which generates playlists featuring local artists and recommends upcoming concerts in the area based on a user's Spotify listening history. For this project we collaborated with Thorsten Joachims and a few of his students at Cornell University.
Localify uses collaborative filtering (CF) to generate playlists. This is quite difficult with local artists, since CF requires a large amount of user listening data, and that data is very sparse for obscure local acts. To get around this issue, we have been researching the viability of content-based (CB) recommendation. With CB, we recommend songs based on how they sound compared to songs we know the user likes. This approach carries no bias toward a song's play count, and is ideally a fairer, more merit-based way of recommending music.
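The core idea can be sketched in a few lines: represent each song as an audio feature vector, then rank unheard local songs by how close they sit to the songs the user already likes. This is only an illustration, not Localify's actual pipeline; the feature vectors and song names below are made up.

```python
import numpy as np

# Hypothetical audio embeddings (e.g. output of a learned feature extractor).
# Values are illustrative only.
liked_songs = {
    "liked_a": np.array([0.9, 0.1, 0.3]),
    "liked_b": np.array([0.8, 0.2, 0.4]),
}
local_candidates = {
    "local_1": np.array([0.85, 0.15, 0.35]),  # sounds close to the liked songs
    "local_2": np.array([0.10, 0.90, 0.80]),  # sounds very different
}

def score(candidate_vec, liked_vecs):
    # Smaller mean L2 distance to the user's liked songs => better match.
    return np.mean([np.linalg.norm(candidate_vec - v) for v in liked_vecs])

liked_vecs = list(liked_songs.values())
ranked = sorted(local_candidates,
                key=lambda name: score(local_candidates[name], liked_vecs))
print(ranked)  # the sonically similar candidate ranks first
```

Note that nothing here depends on how many listens a song has, which is exactly what makes this attractive for artists with little streaming history.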
In pursuit of this, we developed an experiment using triplet networks. Triplet networks are well suited to one-shot learning because they produce meaningful embeddings that can then be compared by taking their L2 distance. You can read the fine details in our paper, which I had the pleasure of presenting at Machine Learning for Music Discovery (ML4MD) 2020, a workshop at the International Conference on Machine Learning (ICML).
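For readers unfamiliar with the setup: a triplet network is trained on (anchor, positive, negative) examples, and the loss pushes the anchor's embedding closer to the positive than to the negative by at least some margin. A minimal sketch of the standard triplet loss, using L2 distance as in our work (the margin value here is just a placeholder):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Standard triplet loss: the anchor should be at least `margin`
    # closer (in L2 distance) to the positive than to the negative.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# A well-separated triplet incurs zero loss...
easy = triplet_loss(np.array([0.0, 0.0]),
                    np.array([0.1, 0.0]),
                    np.array([1.0, 0.0]))
# ...while a negative that sits too close to the anchor is penalized.
hard = triplet_loss(np.array([0.0, 0.0]),
                    np.array([0.1, 0.0]),
                    np.array([0.15, 0.0]))
print(easy, hard)
```

Once trained this way, the embedding space itself becomes the similarity measure: recommending for a new song is just a nearest-neighbor lookup, with no listening history required.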
Now that a few years have passed, I would love to revisit this kind of project with a different approach. In our training set, what we deemed 'similar' songs were simply two songs by the same artist. With more resources, it would have been ideal to annotate song pairs manually through a listening study, since some artists vary widely in how sonically similar their songs are.
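Concretely, our triplet sampling amounted to something like the sketch below: the anchor and positive are drawn from the same artist's catalog, and the negative from a different artist. The catalog and track IDs here are invented for illustration. The weakness mentioned above is visible in the code: nothing checks that two same-artist tracks actually sound alike.

```python
import random

# Hypothetical artist -> track-ID catalog; illustrative only.
catalog = {
    "artist_a": ["a1", "a2", "a3"],
    "artist_b": ["b1", "b2"],
    "artist_c": ["c1", "c2", "c3"],
}

def sample_triplet(catalog, rng=random):
    # Positive pair = two distinct songs by the same artist;
    # negative = any song by a different artist.
    eligible = [a for a, tracks in catalog.items() if len(tracks) >= 2]
    artist = rng.choice(eligible)
    anchor, positive = rng.sample(catalog[artist], 2)
    other = rng.choice([a for a in catalog if a != artist])
    negative = rng.choice(catalog[other])
    return anchor, positive, negative
```

A listening study would replace the same-artist assumption with human similarity judgments, at the cost of far more annotation effort.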