The problem
TikTok has more than a billion monthly active users. A song can go from zero to ten million views in 48 hours. But that visibility doesn’t guarantee a single stream on Spotify.
There was a lot of narrative around TikTok and music, but very little quantitative evidence. No one had precisely measured how much TikTok virality really affects sustained performance on Spotify, nor whether that impact works the same way for every artist.
That was the starting point for this project.
The approach
I designed a complete, end-to-end analysis pipeline of three linked phases, from data collection through to strategic visualisation.
| Phase | Tool | Stage | Work | Output |
|---|---|---|---|---|
| Phase 1 | Python | Data preparation | Enrichment via Spotify API and Last.fm API. Cleaning and feature engineering. | Clean dataset + derived variables |
| Phase 2 | R | Statistical analysis | Correlations between virality and commercial success. Profile clustering. | Models + clusters + statistical findings |
| Phase 3 | Power BI | Visualisation | Dashboard with 4 views (Overview, Artists, Genres, Songs), synchronised filters, and a contextual help panel. | Interactive dashboard with KPIs and filters |
Phase 1 · Data preparation — Python
I started from the public Kaggle dataset "Most Streamed Spotify Songs 2024" (4,601 songs) and enriched it through the Spotify and Last.fm APIs with music genres, artist metadata, and additional engagement metrics. I then applied structural cleaning, Unicode normalisation, percentile filtering to remove songs with no statistical signal, and imputation of anomalous values. Finally, I built a set of my own derived variables: a composite virality index, cross-platform conversion ratios, and normalised engagement metrics. The final dataset contained 4,143 clean and enriched songs.
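As an illustration of this kind of feature engineering, here is a minimal sketch using only the Python standard library. It is not the project's code: the field names, the sample values, the 10th-percentile cut-off, and the equal weighting inside the composite index are all assumptions made for the example.

```python
import math
from statistics import mean, pstdev

# Hypothetical rows: field names and values are illustrative, not the dataset's schema.
songs = [
    {"title": "a", "spotify_streams": 900_000_000, "tiktok_views": 400_000_000, "tiktok_likes": 60_000_000},
    {"title": "b", "spotify_streams": 150_000_000, "tiktok_views": 300_000_000, "tiktok_likes": 30_000_000},
    {"title": "c", "spotify_streams": 40_000_000, "tiktok_views": 10_000_000, "tiktok_likes": 1_500_000},
    {"title": "d", "spotify_streams": 5_000, "tiktok_views": 2_000, "tiktok_likes": 100},
]


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def zscores(values):
    """Standardise values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# 1. Percentile filter: drop songs whose TikTok views fall below the cut-off
#    (the 10th percentile is an assumed threshold).
cutoff = percentile([s["tiktok_views"] for s in songs], 10)
kept = [s for s in songs if s["tiktok_views"] >= cutoff]

# 2. Cross-platform conversion ratio and a normalised engagement metric.
for s in kept:
    s["streams_per_view"] = s["spotify_streams"] / s["tiktok_views"]
    s["like_rate"] = s["tiktok_likes"] / s["tiktok_views"]

# 3. Composite virality index: mean of z-scored log views and z-scored like rate
#    (the choice of components and the equal weights are assumptions).
z_views = zscores([math.log10(s["tiktok_views"]) for s in kept])
z_likes = zscores([s["like_rate"] for s in kept])
for s, zv, zl in zip(kept, z_views, z_likes):
    s["virality_index"] = (zv + zl) / 2

for s in kept:
    print(s["title"], round(s["streams_per_view"], 3), round(s["virality_index"], 3))
```

Standardising each component before averaging keeps a metric measured in hundreds of millions from drowning out one measured as a rate between 0 and 1.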
Phase 2 · Statistical analysis — R
With the clean dataset, I ran four linked analyses: logarithmic correlations between TikTok and Spotify metrics, a systematic comparison between two artist profiles (New Pop vs Traditional), k-means clustering to identify success archetypes, and a multiple linear regression model with cross-validation to quantify TikTok’s predictive power over Spotify. All the code is documented and reproducible.
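The first of these analyses, a correlation on log-transformed metrics, can be sketched as follows. The project ran it in R; this is a standard-library Python illustration with made-up numbers, and it assumes strictly positive values (a `log1p`-style transform would be needed if zeros are present).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def log_correlation(xs, ys):
    """Pearson correlation of log10-transformed values (inputs must be > 0)."""
    return pearson([math.log10(x) for x in xs], [math.log10(y) for y in ys])


# Hypothetical TikTok views and Spotify streams spanning several orders of magnitude.
views = [1e4, 1e5, 1e6, 1e7, 1e8]
streams = [2e4, 1e5, 3e6, 8e6, 2e8]

r_raw = pearson(views, streams)          # dominated by the single largest song
r_log = log_correlation(views, streams)  # every order of magnitude counts equally

print(f"raw r = {r_raw:.3f}, log-log r = {r_log:.3f}")
```

Platform metrics are heavily right-skewed, so a correlation on raw counts is driven by a few very large hits. Taking logs first measures whether the two metrics scale together across the whole catalogue.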
Phase 3 · Visualisation — Power BI
I built an interactive dashboard with four views: Overview, Artists, Genres, and Songs, with dynamic filters and actionable KPIs, designed for a non-technical business user.
The findings
Finding 1 · TikTok does predict success on Spotify — but not in the way I expected
The cross-validated regression model explains 35% of the variance in Spotify performance (R² = 0.35). In marketing and consumer behaviour, where many unobserved variables are at play, explaining roughly a third of the variance is a meaningful result.
The three strongest predictors are:
- Composite TikTok engagement · β = +0.73 · The main driver of Spotify success
- Streams-per-view conversion · β = +0.67 · Retention efficiency is everything
- Release year · β = −0.30 · Each more recent release year lowers expected performance (growing saturation)
This confirms that TikTok predicts Spotify, and that a composite index outperforms any single metric. But the most revealing coefficient is the negative one for release year: it is consistent with a saturating market in which recent releases start at a structural disadvantage.
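To make "cross-validated R²" concrete, here is a minimal sketch of k-fold cross-validation. It is deliberately simplified and is not the project's model: it fits a single predictor instead of a multiple regression, uses synthetic data with an arbitrary slope of 0.6, and pools the out-of-fold predictions into one R² (tools such as caret typically average a per-fold metric instead).

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(y_true, y_pred):
    """Share of variance in y_true explained by y_pred."""
    my = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def cross_validated_r2(xs, ys, k=5, seed=0):
    """Pooled out-of-fold R²: each point is predicted by a model that never saw it."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = intercept + slope * xs[i]
    return r_squared(ys, preds)


# Synthetic, standardised data: one predictor plus noise (all values are made up).
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [0.6 * x + rng.gauss(0, 1) for x in xs]

cv_r2 = cross_validated_r2(xs, ys)
print(f"cross-validated R² = {cv_r2:.2f}")
```

Because every prediction comes from a model fitted without that observation, the resulting R² estimates performance on unseen songs and is usually a little lower than the in-sample figure.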
Finding 2 · Two routes to success, not one
The k-means clustering analysis identified two clearly differentiated archetypes:
- Cluster A (High-performing) · Songs with an average release year of 2019, an established audience, and a stable conversion of 0.92 streams per view. This is the traditional success model: cumulative, predictable, and less dependent on virality.
- Cluster B (TikTok-driven) · More recent songs (average 2023) that go viral thanks to specific trends. Lower baseline engagement, but conversion efficiency is more than double: 2.05 streams per view. These are songs that convert attention into listening extremely efficiently when they land.
The result has clear strategic implications: a Cluster A song needs to build an audience patiently. A Cluster B song needs to maximise its viral window in the first few days. Confusing the two strategies is inefficient.
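For readers unfamiliar with k-means, the following sketch shows the technique on six made-up points. It is an illustration in standard-library Python, not the R code used in the project: the two features (standardised engagement and standardised conversion), the values, and the random initialisation are assumptions.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on tuples of floats."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct observations
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return labels, centroids


# Hypothetical standardised features: (engagement_z, streams_per_view_z).
points = [
    (1.0, -0.8), (1.2, -1.0), (0.9, -0.9),   # higher engagement, lower conversion
    (-1.0, 1.0), (-1.1, 0.8), (-0.9, 1.1),   # lower engagement, higher conversion
]

labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Features should be put on a comparable scale before clustering, since k-means relies on Euclidean distance and an unscaled variable such as raw view counts would otherwise dominate the result.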
Finding 3 · The New Pop paradox
This was the most surprising finding of the project — and the one that adds the most analytical value.
New Pop — post-2018 emerging artists with high dependence on TikTok, such as Sabrina Carpenter, Chappell Roan, or Olivia Rodrigo — clearly dominates on TikTok. Higher like rate (0.152 vs 0.118), and a much higher median number of views (364M vs 211M). That was expected.
But when I analyse conversion to Spotify, the opposite happens: traditional artists convert about 79% better in terms of streams per view (1.255 vs 0.703); put the other way, New Pop converts about 44% less. Total Spotify streams are practically the same between the two groups.
That means New Pop creates a lot of noise on TikTok, but does not turn it into habitual listening with the same efficiency. I call this the New Pop paradox: a tactically advantageous position in visibility, but strategic vulnerability in conversion.
New Pop has solved the problem of being seen. It has not yet solved the problem of being listened to sustainably.
Finding 4 · Genres and seasonality: the invisible factor behind success
From the 98 genres identified in the dataset, three clear archetypes emerge:
- TikTok-centric — speedcore, neoperreo, hyperpop: high native virality, low conversion to sustained streaming
- Spotify-centric — mainstream pop, classic R&B: steady playback flow, less viral dependence
- Hybrids — country rap, dark pop: the most strategically efficient, because they combine virality with conversion
The calendar matters too. Releases concentrate in the first half of the year (Q1: 1,155 · Q2: 1,165 vs Q3: 905 · Q4: 918), likely reflecting industry cycles linked to awards and post-Christmas consumption peaks. Release timing is part of the strategy.
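The size of that first-half skew follows directly from the quarterly counts quoted above. A short check, using only those four figures:

```python
# Quarterly release counts as reported in the text.
releases = {"Q1": 1155, "Q2": 1165, "Q3": 905, "Q4": 918}

total = sum(releases.values())
first_half = releases["Q1"] + releases["Q2"]
second_half = releases["Q3"] + releases["Q4"]
first_half_share = first_half / total

print(total)                      # 4143, matching the size of the final dataset
print(first_half, second_half)    # 2320 1823
print(f"{first_half_share:.1%}")  # 56.0%
```

The four quarters add up to the 4,143 songs in the cleaned dataset, and about 56% of them were released between January and June.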
The conclusion
For years, the music industry has been obsessed with virality as the objective. This project shows quantitatively that virality is a necessary condition, but not a sufficient one. The real indicator of sustainable success is what I call efficient virality: the ability to convert fleeting attention into a consumption habit.
For artists, labels, and managers, this changes the strategic conversation. It’s not enough to optimise for TikTok. You have to optimise for conversion.
The dashboard
The project includes an interactive Power BI dashboard with four operational views designed for a business user:
- Overview view · Global KPIs (streams, views, engagement, conversion), impact by musical era (CD, Internet MP3, Streaming, Viral algorithmic), Top 5 most viral songs, New Pop vs Traditional artists, and a card for the most viral song
- Artists view · Top 10 artists with metric selector (streams, views, likes, posts, engagement, virality, conversion), New Pop vs Traditional comparison, and a detailed table by artist
- Genres view · Treemap of viral genres, Spotify vs TikTok consumption comparison, conversion vs virality scatter plot, and a detailed table by genre
- Songs view · Top 10 songs, a detailed table per song (title, artist, duration, streams, views, conversion, engagement, popularity, virality), and a positional scatter plot coloured by cluster (High-performing vs TikTok-driven) with size proportional to the virality index
Additionally, the dashboard includes a collapsible filter panel with integrated search (by musical era, year, artist, genre, or cluster) synchronised across all pages, and a contextual help button explaining key variables and analysis limitations.
Stack and methodology
| Tool | Use in the project |
|---|---|
| Python 3 · pandas, requests, numpy | Cleaning, API enrichment, feature engineering |
| Spotify API + Last.fm API | Artist metadata, music genres, durations, and images |
| R · tidyverse, cluster, caret | Log correlations, k-means clustering, cross-validated linear regression |
| Power BI · DAX | Interactive dashboard with 4 views, synchronised filters, and a snowflake schema model |
| Kaggle Dataset (Nelgiriyewithana, 2024) | Base data source |
Dataset: 4,143 songs · 98 genres · bespoke derived variables
Model: Multiple linear regression · Cross-validation · R² = 0.35
Clustering: K-means · K=2 determined by the elbow method and silhouette analysis
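As a reminder of what silhouette analysis measures when choosing K, here is a minimal standard-library sketch. It is illustrative only: the six points and the two candidate labelings are made up, and the project's own selection was done in R.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points; needs at least two clusters."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical standardised features: (engagement_z, streams_per_view_z).
points = [
    (1.0, -0.8), (1.2, -1.0), (0.9, -0.9),
    (-1.0, 1.0), (-1.1, 0.8), (-0.9, 1.1),
]

two_clusters = [0, 0, 0, 1, 1, 1]    # the natural split
three_clusters = [0, 0, 2, 1, 1, 1]  # one tight group split further

score_k2 = silhouette_score(points, two_clusters)
score_k3 = silhouette_score(points, three_clusters)
print(f"K=2: {score_k2:.2f}  K=3: {score_k3:.2f}")
```

A score near 1 means points sit much closer to their own cluster than to the next one. Splitting a compact group lowers the score, which is the signal used to prefer the smaller K.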