I submitted this data visualization to the monthly data vis battle on r/dataisbeautiful. The dataset was movies with jump scares. This was my first time both doing data visualization and working with D3.
A Reddit user kindly provided the dataset as a CSV file. After reviewing it, I was most interested in plotting the IMDB and Jump Scare scores over time.
Since I was familiar with SVGs, it was fairly easy to get started with D3. There are lots of examples and extensive documentation.
I decided to represent IMDB scores as purple, and Jump Scare scores as blue.
Observations
IANADS (I Am Not A Data Scientist), but it looks like the average IMDB score trends down over time. However, this might be caused by people only bothering to rate the good older movies on IMDB: no one wants to take the time to go rate bad movies from the 1960s and 70s. I am not familiar enough with the genre to know how many older movies could be missing.
It was also interesting to see the maximum Jump Scare score increase over time.
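One way to check both observations with numbers rather than by eye is to compute the mean and maximum score per release year. The sketch below is only an illustration: the function name `summarizeByYear` and the field names `year`, `imdb`, and `jumpScare` are assumptions, not the dataset's actual column names, and it expects the scores to already be numbers rather than the strings a CSV parser returns by default.

```javascript
// Hypothetical helper: per-year mean and max of one numeric field.
// Field names are assumed for illustration; adjust to the real CSV columns.
function summarizeByYear(movies, field) {
  const byYear = new Map();
  for (const movie of movies) {
    // Start a fresh accumulator the first time a year is seen.
    const stats = byYear.get(movie.year) ??
      { year: movie.year, sum: 0, count: 0, max: -Infinity };
    stats.sum += movie[field];
    stats.count += 1;
    stats.max = Math.max(stats.max, movie[field]);
    byYear.set(movie.year, stats);
  }
  // Sort chronologically and turn the running sums into means.
  return [...byYear.values()]
    .sort((a, b) => a.year - b.year)
    .map(({ year, sum, count, max }) => ({ year, mean: sum / count, max }));
}
```

Calling `summarizeByYear(movies, "imdb")` would give one row per year that could be drawn as a trend line. It would not settle the rating-bias question above, since it can only summarize the movies that are in the file.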
Challenges
PROBLEM Having both datasets on screen at the same time created a lot of dots/data points.
SOLUTION I created a slider that controlled the opacity of each dataset. You can drag the slider to the left to only view IMDB scores, and to the right to view Jump Scare scores.
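A minimal sketch of how a slider position could be turned into one opacity per dataset. This is not the code from the chart; the function name `sliderToOpacities` and the exact mapping (both datasets fully visible at the midpoint, fading linearly toward each end) are assumptions chosen for illustration.

```javascript
// Hypothetical helper: maps a slider position in [0, 1] to two opacities.
// 0   -> only IMDB scores visible
// 0.5 -> both datasets at full (max) opacity
// 1   -> only Jump Scare scores visible
function sliderToOpacities(position, maxOpacity = 1) {
  // Clamp so out-of-range input cannot produce negative opacities.
  const t = Math.min(1, Math.max(0, position));
  return {
    imdb: maxOpacity * Math.min(1, 2 * (1 - t)),
    jumpScare: maxOpacity * Math.min(1, 2 * t),
  };
}
```

In D3, the two values could then be applied to each group of dots with `selection.style("opacity", value)` inside the slider's `input` event handler.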
PROBLEM A lot of dots overlap each other, because movies released in the same year often have the same score.
SOLUTION I lowered the opacity of each dot to make the "dense" patches of dots more clear. I also created a tooltip that shows a list of all movies sharing the same data point on hover.
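The tooltip list depends on knowing which movies land on the same dot. One way to get that, sketched here under assumptions, is to group the rows by year and score before drawing. The function name `groupByPoint` and the field names `title`, `year`, and `score` are hypothetical, and the sample rows are placeholders rather than real entries from the dataset.

```javascript
// Hypothetical helper: groups movies that share the same (year, score) pair
// so a single dot can carry the full list of titles for its tooltip.
function groupByPoint(movies) {
  const groups = new Map();
  for (const movie of movies) {
    const key = `${movie.year}|${movie.score}`;
    if (!groups.has(key)) {
      groups.set(key, { year: movie.year, score: movie.score, titles: [] });
    }
    groups.get(key).titles.push(movie.title);
  }
  // One entry per distinct dot, in first-seen order.
  return [...groups.values()];
}

// Placeholder rows, not real data.
const sampleMovies = [
  { title: "Movie A", year: 2001, score: 6.5 },
  { title: "Movie B", year: 2001, score: 6.5 },
  { title: "Movie C", year: 2001, score: 7.0 },
];
const points = groupByPoint(sampleMovies);
```

With the rows grouped this way, each dot is bound to one entry, and its `titles` array is what the tooltip lists on hover. The array length could also drive the dot's opacity or size instead of relying on stacked transparency.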
Thoughts for my next data vis project
- I would like to investigate alternate ways to display overlapping data, such as swarmplots. I also want to consider when it's worthwhile to simplify data to a line graph instead of showing all data points.
- The chart is responsive, but the tooltip gets cut off on smaller screens. For future versions, I should code in a check that will orient the tooltip to the opposite side of the data point when needed. The same check should apply to data near the bottom of the graph.
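The tooltip check from the last item could look roughly like the sketch below. It is a sketch under assumptions, not the chart's code: the function name `placeTooltip`, the object shapes, and the 10-pixel offset are all made up for illustration, and coordinates are taken to be pixels measured from the chart's top-left corner.

```javascript
// Hypothetical helper: returns the tooltip's top-left corner, flipping it to
// the opposite side of the data point when it would overflow the chart.
function placeTooltip(point, tooltip, chart, offset = 10) {
  // Default placement: below and to the right of the point.
  let left = point.x + offset;
  let top = point.y + offset;

  // Would overflow the right edge: flip to the left of the point.
  if (left + tooltip.width > chart.width) {
    left = point.x - offset - tooltip.width;
  }
  // Would overflow the bottom edge: flip above the point.
  if (top + tooltip.height > chart.height) {
    top = point.y - offset - tooltip.height;
  }

  // On very small charts the flipped position can go negative, so pin to 0.
  return { left: Math.max(0, left), top: Math.max(0, top) };
}
```

For example, `placeTooltip({ x: 380, y: 290 }, { width: 150, height: 80 }, { width: 400, height: 300 })` flips on both axes, because the default placement would run past the right and bottom edges.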