Because the authors of this book all share an interest in data journalism, we thought it quite likely that they would form the nucleus of an advocacy network. We further conjectured that this network would have a strongly connected nucleus as well as a large reach.
All but one of the authors have a Twitter account, so we decided to use Twitter user profile information as a proxy for our network measurements. We harvested the Twitter user profile data for each author and also the user profiles of all their friends. (Friends are the people you follow on Twitter, as opposed to your followers, who follow you.)
In this data set, every person is a node and every connection (following someone) is an edge. Author nodes and author-to-author edges are coloured and sized to make them more visible. We then used this data set to create a force-directed network chart in D3.
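As a rough illustration of the node and edge structure described above, here is a minimal Python sketch. It is not the code used for the cover: the account names, the `friends_of` mapping and the `build_graph` helper are all hypothetical, and the real data came from Twitter user profiles.

```python
# Hypothetical sample data: each author mapped to the accounts they follow.
friends_of = {
    "author_a": ["author_b", "reader_1", "reader_2"],
    "author_b": ["author_a", "reader_2"],
}


def build_graph(friends_of):
    """Turn an author -> friends mapping into node and edge lists."""
    authors = set(friends_of)
    nodes = {}
    edges = []
    for author, friends in friends_of.items():
        nodes[author] = {"id": author, "is_author": True}
        for friend in friends:
            # A friend may also be an author; flag it so it can be highlighted.
            nodes.setdefault(friend, {"id": friend, "is_author": friend in authors})
            edges.append({
                "source": author,
                "target": friend,
                # Author-to-author edges are the ones to colour and size up.
                "author_link": friend in authors,
            })
    return list(nodes.values()), edges


nodes, edges = build_graph(friends_of)
author_links = [e for e in edges if e["author_link"]]
print(len(nodes), "nodes,", len(edges), "edges,", len(author_links), "author-to-author")
```

The `is_author` and `author_link` flags are one possible way to carry the colouring and sizing information through to whatever tool draws the chart.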
The network chart supports our hypothesis: it clearly shows both a strongly connected nucleus (the coloured author nodes and links) and a very large reach (the white nodes and links).
The data for this graph was retrieved in October 2015. Some data may have been updated subsequently.
Note: The image above is static - check back next week for the interactive version!
Watch the video
The video shows the force-directed layout algorithm at work. The nodes start in randomised positions, and the mathematics of mutually opposing forces works to find a stable layout for the network. For the book cover image and the video, we seeded the random number generator so that the start and finish layouts would always be the same, given the same input data. The video plays back at 8x the original recording speed. The first 10 seconds are the most dramatic, with large changes between iterations. The remaining minute shows much more gradual changes, as the network slowly settles.
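The idea of a seeded, repeatable force-directed layout can be sketched in a few lines of Python. This is an illustration only, not the D3 code behind the cover: it uses a simplified scheme in the style of Fruchterman and Reingold, and the function name, parameters and sample graph are all assumptions.

```python
import math
import random


def force_layout(nodes, edges, seed=42, iterations=200, k=1.0):
    """Simplified force-directed layout with a seeded random start."""
    rng = random.Random(seed)  # same seed -> same starting positions
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    temperature = 0.1  # caps how far a node may move per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels each other.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Connected nodes attract each other.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node, limited by the current temperature.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            step = min(length, temperature)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature *= 0.98  # cool down so the layout settles gradually
    return pos


# Hypothetical four-node graph.
sample_nodes = ["a", "b", "c", "d"]
sample_edges = [("a", "b"), ("b", "c")]
layout_1 = force_layout(sample_nodes, sample_edges, seed=7)
layout_2 = force_layout(sample_nodes, sample_edges, seed=7)
print(layout_1 == layout_2)
```

Because the only source of randomness is the seeded generator, two runs on the same input with the same seed give identical layouts. The shrinking temperature mirrors what the video shows: large early movements followed by a slow settling.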
We also asked our great friends at Plotly to take a crack at the data, and they created this gorgeous K-means cluster chart.
Plotly is a platform for creating, sharing, and discussing interactive charts. It's collaborative, free, and entirely online. Plotly has packages and support for Python, matplotlib, R, ggplot2, MATLAB, Excel, and more.
The chart below was created with Plotly's R client. It's a k-means clustering of the Twitter friends and followers for every author and friend shown in the network graph above.
About the K-means cluster chart
The chart uses the same base data as the network graph above. It uses k-means clustering based on an original post on R-Bloggers: Cluster your Twitter Data with R and k-means. It uses logarithmic scales on both axes and Color Brewer colours. Hover on the marks to see people's names and Twitter handles.
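For readers who want to see the technique itself, here is a minimal Python sketch of k-means clustering on log-scaled friend and follower counts. It is not the R code used for the chart: the account figures are made up, the `kmeans` helper is a plain implementation of Lloyd's algorithm, and two clusters are used only because the sample is tiny.

```python
import math
import random

# Hypothetical accounts: (friends, followers).
accounts = [
    (10, 20), (15, 12), (30, 25),                # small accounts
    (900, 50000), (1200, 80000), (700, 120000),  # large accounts
]

# Work in log10 space, matching the logarithmic axes of the chart.
points = [(math.log10(fr), math.log10(fo)) for fr, fo in accounts]


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm: returns (assignments, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignments = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignments, centroids


labels, centres = kmeans(points, k=2)
for (friends, followers), label in zip(accounts, labels):
    print(friends, followers, "cluster", label)
```

Clustering on the logarithms rather than the raw counts keeps a handful of very large accounts from dominating the distance calculation.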
People on the left of the chart (cluster 3) have relatively low numbers of friends and followers. Given the results from our network graph, it is not surprising that most of our authors (coloured white) are in the same cluster (cluster 2), with generally larger numbers of friends or followers or both. To the right of the chart (clusters 1 and 4) people have a large number of followers. See if you can spot some famous people in these clusters!
Hover on the chart to see the tools panel. You can use the tools to pan and zoom the chart, to focus in on areas of interest.
The R code to generate the plot is on GitHub. Contact Carson if you need more help getting started with R and Plotly.
Cluster chart credits
Story idea: Jack Parmer (firstname.lastname@example.org)
Visualisation: Carson Sievert (email@example.com)
Original chart: Justin Marciszewski (firstname.lastname@example.org)