Botify Grime - Playlist Song Recommendations
In addition to divvying up my playlist into smaller playlists, I wanted a way to generate my own recommendations for any given playlist. This led to the creation of a Botify offshoot: Botify Grime.
Web Scraping Tool for Music Genres
I want the recommendation engine to play primarily off of two things: genres and desired track features. To tackle genres first, we need an aggregate list of every Spotify genre. Spotify does not publish this list itself, but Glenn McDonald, a data engineer at Spotify (his title was "data alchemist"), does. His site https://everynoise.com/ (Every Noise at Once) maps out every Spotify genre, so the first step is to scrape the visible information about each genre.
For each genre this means its name, its position on the map (the left and top pixel offsets, which serve as x-y coordinates), its font size, and its color as both a hex value and an RGB value. The font size and RGB values are used later in the code.
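The hex and RGB forms carry the same information: each pair of hex digits in #rrggbb is one color channel written in base 16. A minimal sketch of that conversion, using hypothetical helper names (hex_to_rgb, rgb_string) that are not part of the original notebook, checked against colors that appear in the scraped output:

```python
def hex_to_rgb(hex_color: str) -> tuple[int, int, int]:
    """Convert a '#rrggbb' string into an (r, g, b) tuple of ints in 0-255."""
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected #rrggbb, got {hex_color!r}")
    # Each pair of hex digits is one channel: rr, gg, bb
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def rgb_string(hex_color: str) -> str:
    """Format a hex color the way the colors_rgb column does, e.g. 'rgb(173, 136, 7)'."""
    r, g, b = hex_to_rgb(hex_color)
    return f"rgb({r}, {g}, {b})"


print(hex_to_rgb("#ad8807"))  # (173, 136, 7)
print(rgb_string("#459a28"))  # rgb(69, 154, 40)
```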
Define a scraping function: A function named scrape_genres takes a URL as an argument and performs the following tasks:
- Sends an HTTP GET request to the URL and fetches the page.
- Parses the HTML response with the Beautiful Soup library.
- Selects every div with class genre and reads its text and inline style.
- Extracts the font size, color, and position from each style string and strips the » link marker from the genre name.
- Converts the extracted data into a DataFrame using pandas.
Clean and save data: The scraped genre names are cleaned of unnecessary characters, and the resulting DataFrame is saved as enao_genres.csv. We then split off a smaller DataFrame containing only the genre, RGB color, font size, and position columns and save it as enao_graph.csv, which is used later when we need the font size and RGB values.
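The font size, color, and position all come from each div's inline style attribute. The scraper reads them with plain substring splits such as style.split('color:'), which would also match inside a declaration like background-color: if the page ever used one. A sketch of a slightly stricter alternative, assuming declarations are separated by ; and written as name: value; parse_inline_style is a hypothetical helper and the style string is made up to match the pop row:

```python
def parse_inline_style(style: str) -> dict[str, str]:
    """Split an inline CSS style such as 'color: #ad8807; top: 4850px' into a dict."""
    props = {}
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue  # skip empty pieces, e.g. after a trailing semicolon
        name, value = declaration.split(":", 1)
        props[name.strip().lower()] = value.strip()
    return props


# Hypothetical style string, modeled on the values scraped for "pop"
example = "color: #ad8807; top: 4850px; left: 787px; font-size: 160%"
props = parse_inline_style(example)
print(props["font-size"], props["color"], props["top"], props["left"])
# 160% #ad8807 4850px 787px

# Exact property names mean a background-color declaration cannot shadow color
print(parse_inline_style("background-color: #000; color: #fff;")["color"])  # #fff
```

Looking properties up by exact name also makes the order of declarations in the style string irrelevant.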
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd


def scrape_genres(url):
    """Scrape every genre label on an Every Noise at Once map page into a DataFrame."""
    try:
        # A timeout keeps a stalled connection from hanging the notebook
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
        soup = BeautifulSoup(response.text, 'html.parser')
        genres_elems = soup.find_all("div", class_="genre")
        genres_objs = []
        for genre in genres_elems:
            # Size, color, and position are all stored in the div's inline style
            style = genre['style']
            font_size = style.split('font-size:')[1].split(';')[0].strip()
            color_str = style.split('color:')[1].split(';')[0].strip()
            # "#rrggbb" -> three 0-255 channel values
            r, g, b = tuple(int(color_str[i:i + 2], 16) for i in (1, 3, 5))
            top = style.split('top:')[1].split(';')[0].strip()
            left = style.split('left:')[1].split(';')[0].strip()
            genres_objs.append({
                "genre": genre.text.replace("»", "").strip(),  # drop the » link marker
                "font_size": font_size,
                "color": color_str,
                "colors_rgb": f"rgb({r}, {g}, {b})",
                "top": top,
                "left": left,
            })
        return pd.DataFrame(genres_objs)
    except (requests.RequestException, KeyError, IndexError, ValueError) as e:
        # Network errors, a missing style attribute, or an unexpected style format
        print(f"An error occurred while scraping genres: {e}")
        return None


url = "https://everynoise.com/engenremap.html"
genres_df = scrape_genres(url)
if genres_df is not None:
    genres_df.to_csv("enao_genres.csv", index=False)
genres_df.head()
```
| | genre | font_size | color | colors_rgb | top | left |
|---|---|---|---|---|---|---|
| 0 | pop | 160% | #ad8807 | rgb(173, 136, 7) | 4850px | 787px |
| 1 | rap | 144% | #a88903 | rgb(168, 137, 3) | 5759px | 1070px |
| 2 | rock | 141% | #ab711a | rgb(171, 113, 26) | 11449px | 564px |
| 3 | urbano latino | 134% | #bd9002 | rgb(189, 144, 2) | 3341px | 1170px |
| 4 | hip hop | 134% | #ad7e09 | rgb(173, 126, 9) | 6978px | 1085px |
```python
import pandas as pd

genres_df = pd.read_csv("enao_genres.csv")
# Use the rgb(...) string as the color column for the graph export
genres_df['color'] = genres_df['colors_rgb']
enao_graph = genres_df[['genre', 'color', 'font_size', 'left', 'top']]
enao_graph.to_csv("enao_graph.csv", index=False)
print(enao_graph)
```
```
                      genre              color  font_size    left      top
0                       pop   rgb(173, 136, 7)       160%   787px   4850px
1                       rap   rgb(168, 137, 3)       144%  1070px   5759px
2                      rock  rgb(171, 113, 26)       141%   564px  11449px
3             urbano latino   rgb(189, 144, 2)       134%  1170px   3341px
4                   hip hop   rgb(173, 126, 9)       134%  1085px   6978px
...                     ...                ...        ...     ...      ...
6173     yunnan traditional   rgb(69, 154, 40)       100%   714px  17793px
6174  classical string trio  rgb(25, 173, 130)       100%   381px  21422px
6175         string quintet  rgb(69, 166, 181)       100%   494px  18809px
6176      quartetto d'archi   rgb(52, 164, 95)       100%   430px  18166px
6177        youth orchestra  rgb(38, 145, 172)       100%   175px  20806px

[6178 rows x 5 columns]
```
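Because enao_graph.csv stores colors as rgb(r, g, b) strings, which contain commas, pandas quotes that field when writing the CSV, and any later code that needs the channels has to turn the string back into numbers. A minimal sketch of reading such rows back, assuming the column layout printed above; parse_rgb and the two-row sample are hypothetical, not taken from the notebook:

```python
import csv
import io
import re

# Matches strings such as "rgb(173, 136, 7)" and captures the three channels
RGB_PATTERN = re.compile(r"rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)")


def parse_rgb(value: str) -> tuple[int, int, int]:
    """Turn 'rgb(r, g, b)' back into a tuple of ints, rejecting malformed input."""
    match = RGB_PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not an rgb() string: {value!r}")
    r, g, b = (int(part) for part in match.groups())
    if max(r, g, b) > 255:
        raise ValueError(f"channel out of range: {value!r}")
    return r, g, b


# Two rows in the shape of enao_graph.csv; the color field is quoted because it contains commas
sample_csv = io.StringIO(
    'genre,color,font_size,left,top\n'
    'pop,"rgb(173, 136, 7)",160%,787px,4850px\n'
    'rock,"rgb(171, 113, 26)",141%,564px,11449px\n'
)
colors = {row["genre"]: parse_rgb(row["color"]) for row in csv.DictReader(sample_csv)}
print(colors)  # {'pop': (173, 136, 7), 'rock': (171, 113, 26)}
```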
```python
import pandas as pd

genres_df = pd.read_csv("enao_genres.csv")
# Strip the "px" unit so left/top become integer coordinates
genres_df['left'] = genres_df['left'].str.replace("px", "", regex=False).astype(int)
genres_df['top'] = genres_df['top'].str.replace("px", "", regex=False).astype(int)
genres_df = genres_df.rename(columns={'left': 'x', 'top': 'y'})
# Keep the original hex color for plotting
df = genres_df[['genre', 'color', 'x', 'y']]
print(df)
```
```
                      genre    color     x      y
0                       pop  #ad8807   787   4850
1                       rap  #a88903  1070   5759
2                      rock  #ab711a   564  11449
3             urbano latino  #bd9002  1170   3341
4                   hip hop  #ad7e09  1085   6978
...                     ...      ...   ...    ...
6173     yunnan traditional  #459a28   714  17793
6174  classical string trio  #19ad82   381  21422
6175         string quintet  #45a6b5   494  18809
6176      quartetto d'archi  #34a45f   430  18166
6177        youth orchestra  #2691ac   175  20806

[6178 rows x 4 columns]
```
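After this step x and y are integers, but font_size is still a string such as 160%. If the font size is later used as a numeric weight, it needs the same unit-stripping treatment. A sketch with hypothetical helpers strip_px and percent_to_scale, which raise an error instead of silently producing a wrong number when a value carries an unexpected unit:

```python
def strip_px(value: str) -> int:
    """Convert a CSS pixel offset such as '4850px' into the integer 4850."""
    if not value.endswith("px"):
        raise ValueError(f"expected a px value, got {value!r}")
    return int(value[:-2])


def percent_to_scale(value: str) -> float:
    """Convert a CSS percentage such as '160%' into a scale factor (1.6)."""
    if not value.endswith("%"):
        raise ValueError(f"expected a percentage, got {value!r}")
    return float(value[:-1]) / 100


xs = [strip_px(v) for v in ["787px", "1070px", "564px"]]
sizes = [percent_to_scale(v) for v in ["160%", "144%", "100%"]]
print(xs)     # [787, 1070, 564]
print(sizes)  # [1.6, 1.44, 1.0]
```

With pandas, genres_df['font_size'].map(percent_to_scale) would apply the same check to every row.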
Genre Scatter Plot Visualization
Using the scraped data, we can plot our own small version of Every Noise's genre map with plotly.express.
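```python
import plotly.express as px

fig = px.scatter(df,
                 x='x',
                 y='y',
                 color='color',
                 # Use each hex code as the literal marker color instead of
                 # treating the 6,178 codes as categories with a default palette
                 color_discrete_map='identity',
                 hover_name='genre',
                 title='Visual Mapping of Genres')
# CSS "top" grows downward, so flip the y-axis to match the layout on everynoise.com
fig.update_yaxes(autorange="reversed")
# Show the plot
fig.show()
```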
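On everynoise.com, top is a CSS offset that grows downward, so plotting it directly as y draws the map upside down relative to the site. Plotly can reverse the axis with fig.update_yaxes(autorange="reversed"); another option is to flip the data before plotting. A sketch of the data flip on plain (left, top) pairs, using a hypothetical helper screen_to_cartesian and coordinates taken from the scraped output:

```python
def screen_to_cartesian(points):
    """Flip (left, top) screen offsets, where top grows downward, into (x, y) with y growing upward."""
    if not points:
        return []
    max_top = max(top for _, top in points)
    # The lowest point on the page (largest top) lands on y = 0
    return [(left, max_top - top) for left, top in points]


# (left, top) for pop, rock and youth orchestra, taken from the scraped output
points = [(787, 4850), (564, 11449), (175, 20806)]
print(screen_to_cartesian(points))
# [(787, 15956), (564, 9357), (175, 0)]
```

Subtracting from the maximum keeps every y value non-negative, which can be convenient if the coordinates are reused outside Plotly.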