A big interest of mine is social networks. I have a research project right now looking at the network structure of blogs (data still being collected…will be for another 2-3 months) and recently I got interested in the Last.fm friends network.
One of the ideas behind Last.fm is that they can provide music recommendations based on what people like you enjoy. On top of this they built a social aspect to their system where a user can have friends and see what they are listening to. Presumably this friends network also propagates their recommendations. An obvious question then is whether there actually is a user network, or whether friendships are localized to small groups with little contact with anyone else.
To answer this question I used the VERY open properties of Last.fm and, with the help of their API, began collecting data on relationships between users. I first crawled the main last.fm site and pulled out 364,499 user names. Then I wrote a php script that systematically pulled all the friend information for each user. Respecting the TOS and limiting myself to 1 query per second, I’m still collecting data (6 days, 17 hours, and 34 minutes in) with about 8 more days to go. As of now, I have data for 166,332 users (99,030 of whom have any friends). That results in 843,514 friends, an average of 8.52 friends / user among the users who have at least one friend.
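For anyone who wants to try something similar, here is a minimal sketch of a throttled collection loop. It is written in Python rather than the php script mentioned above, and `fetch_friends` is a hypothetical stand-in for whatever call returns one user's friend list; it is not a real Last.fm API function. The clock and sleep functions are parameters only so the loop can be tested without waiting.

```python
import time

def crawl_friends(usernames, fetch_friends, min_interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Collect each user's friend list, starting at most one request
    per min_interval seconds.

    fetch_friends is a hypothetical callable: username -> list of friends.
    """
    friends_by_user = {}
    last_request = None
    for name in usernames:
        if last_request is not None:
            # Wait out whatever is left of the interval since the last request.
            wait = min_interval - (clock() - last_request)
            if wait > 0:
                sleep(wait)
        last_request = clock()
        friends_by_user[name] = fetch_friends(name)
    return friends_by_user

def average_friends(friends_by_user):
    """Average friend count over users who have at least one friend."""
    counts = [len(friends) for friends in friends_by_user.values() if friends]
    return sum(counts) / len(counts) if counts else 0.0
```

The average quoted above follows the same rule as `average_friends`: 843,514 friends divided by the 99,030 users who have any friends gives roughly 8.52.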
Conveniently, another one of my interests is visualizing large data sets. Unfortunately, I have no computer science training, only the tidbits of programming (php, flash/actionscript, perl) that I picked up as needed, and so I’m stuck with using pre-built visualization software. This wouldn’t normally be a problem, but given that I’m dealing with this much data, a custom-built app would be very nice.
I’ve played around with the Tulip visualization package a bit before and so I figured I’d give it another go. Unfortunately, when I dumped all my data into Tulip, I got a big fat crash. Turns out there’s no way it (or my system) can handle this much. After playing around with a few other software packages (Pajek, UCINET, JUNG, and SoNIA) I decided to just look at a sample of my data instead.
So I picked the first 25,000 relationships in my data set (technically a random sample since I collected the data in a random manner), and after some tweaking (needed to get rid of duplicate reverse relationships) came up with 2,310 seed users with 19,008 friends. This resulted in 24,036 relationships at an average of 10.41 friends / user (min = 1, max = 159…see distribution histogram below). I dumped that into Tulip and got some neat results. (If anyone knows how to increase the resolution of the saved images in Tulip, please let me know).
Each red square represents a user and each line represents a relationship.
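The “duplicate reverse relationships” tweak amounts to treating a friendship as undirected, so that (A, B) and (B, A) count once. A minimal sketch of one way to do that in Python, assuming the raw data is a list of (user, friend) name pairs (the function name and the data layout are illustrative, not the actual script):

```python
def dedupe_relationships(pairs):
    """Keep one copy of each friendship, treating (a, b) and (b, a)
    as the same undirected relationship. First-seen order is kept."""
    seen = set()
    unique = []
    for a, b in pairs:
        # Put each pair in a canonical order so both directions match.
        key = (a, b) if a <= b else (b, a)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

For example, `dedupe_relationships([("amy", "bob"), ("bob", "amy")])` keeps a single `("amy", "bob")` relationship.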
(Click on each image for the full version).
It’s pretty clear from this image that the user network is quite strong. There are certainly clusters of close friends, but also plenty of cluster inter-connectivity. Let’s zoom in a bit and see more detail:
Here we can see that there are more prominent users (size of square = # of friends) who have friends “orbiting” around them. This alone is pretty neat, but we can further see that the orbitals are also connected with other users, suggesting that there is a large amount of interconnectivity. Another zoom:
Again we see a similar pattern: seeds and orbitals.
But are all users connected? Well, not really. I cropped a part of the original image which had the following:
What we see here are all the users who are not connected to the “main network.” Now this can result from two things: 1) these users might just not have any connections to the main network, 2) because I’m looking at a sample of the data, I am missing the connections between these users and the main network. I suspect it’s a combination of both. Let’s zoom in a bit and see exactly what’s going on with these folks:
What we see is that even for these outliers, there appear to be some networks. We can see a big one right in the middle with a couple of dozen users and smaller 2-user “networks” throughout. Pretty neat!
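Picking out the “main network” and the smaller disconnected groups does not have to be done by eye; it is the standard connected-components problem. Below is a minimal sketch in Python using a breadth-first search, assuming the relationships are available as a list of (user, friend) pairs. This is not what Tulip does internally, just one common way to get the same grouping.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Group users into connected components, largest first.

    edges is an iterable of (user, friend) pairs, treated as undirected.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from an unvisited user collects its whole group.
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(members)

    components.sort(key=len, reverse=True)
    return components
```

The first component returned would be the main network; the rest are the outlying groups, down to the 2-user pairs. Because this is a sample, some of those small components may only look isolated because their links to the main network were not sampled.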
Finally, you may be interested in knowing what the distribution of # of friends looks like. Well, here you go.
Clearly there is a long tail, with the majority of users having only a few friends.
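A distribution like this is just a tally of how many users have each friend count. A minimal Python sketch, assuming a deduplicated list of (user, friend) pairs; note that it counts friends for every user that appears in the list, which is an assumption and may differ from counting seed users only:

```python
from collections import Counter

def friend_count_distribution(edges):
    """Return a Counter mapping 'number of friends' to 'number of users'."""
    degree = Counter()
    for a, b in edges:
        # Each relationship adds one friend to both users.
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())
```

The same per-user counts are what a “size of square = # of friends” rendering needs.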
What’s next?
I would love to figure out a way to visualize all the data. If someone out there has the technical skills to do this (and presumably the computing power) I’d be more than happy to collaborate.
I’m also beginning to collect the listening history of users (again thanks to the wonderful API) and hope to examine music listening patterns as they relate to the network. That’ll be a much bigger problem because of the volume of data. I collected a small amount just to see what it would look like and for the 183 users that I checked, I already have 1,179,480 track plays. Scaling up to ~300k users is a bit much. Regardless, I may use the friends data to identify a sub-network of friends and track their listening patterns to see how they influence one another.
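To put a rough number on “a bit much”, the 183-user sample can be extrapolated linearly. This is only a back-of-the-envelope sketch in Python; it assumes those 183 users are typical of everyone else, which may well not be true.

```python
def estimate_total_plays(sample_plays, sample_users, target_users):
    """Linear extrapolation: average plays per sampled user times target users."""
    return sample_plays / sample_users * target_users

per_user = 1_179_480 / 183                      # about 6,445 plays per user
total = estimate_total_plays(1_179_480, 183, 300_000)
print(f"{per_user:,.0f} plays per user, about {total / 1e9:.1f} billion plays in total")
```

That works out to somewhere around 1.9 billion track plays for ~300k users, which is why restricting the analysis to a sub-network of friends looks more practical.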
If you like what you see here, drop me a comment.

[...] Prof has generated some pretty pictures showing the friends network on last.fm. This is very interesting because most social networks keep [...]
Love the graphs! I’ve been doing some investigation into this type of data myself. I’ll share my findings when I’ve got some results that are as interesting as yours.
Great visuals here… thanks!
Thanks Richard and Jayne. I got a suggestion to try a different visualization package (GUESS) so I’ll see if I can run a larger sample this weekend.
-AP
Cool! I did a project for my Network Analysis class last semester using GUESS and it worked out pretty well on a fairly small dataset (about 600 nodes). It would be really cool to see how the entire network looks, compared to just a sample. Have you run any component finding algorithms on it yet?
If you want to see how I analyzed my network, check out my paper, and feel free to email me!
http://www.hung-truong.com/research.html
Looks interesting. We did a similar thing for YouTube communities, and we got similar results, in the sense that the majority of users have a small # of friends.
[...] What Does The Last.fm Friends Network Look Like? (tags: lastfm social data visualization) [...]
[...] quite excited by the number of people who found the Last.fm post interesting. I never expected that large of a response! A reader pointed me towards a new software [...]
Great post and the visualizations are really cool. Fred Wilson (avc.blogs.com) had a conversation around visualizing his song listening habits, tangentially related to this post.
Here is one of the commenters posting about his analysis and visualizations, Lee Byron at CMU: http://www.megamu.com/lastfm/
And here is an online app developed around Lee’s work: http://lastgraph.aeracode.org/about/
Thanks for starting up this conversation.
Hey Chris,
What Lee did is amazing. I’m looking to do something similar but on a larger scale (look at multiple users) as well as see if I can map the relationships b/w friends as they affect song popularity. It would be really interesting to see, for example, how my preference for a song changes over time and how that affects my friends’ preference for a song. I suspect this will be pretty difficult, but certainly doable.
Thanks for the great link!
[...] here: Vizzies of Last.fm’s friend network. I don’t friend on Last.fm. [...]
[...] is interested in this should venture a look at the site of “anonymous Prof”, who [has] a detailed and, as the headline promises, nicely visualized [...]
[...] an update on my attempt to visualize more of the data I collected last week. Despite my best efforts, I just can’t get any visualization programs to run all of the data I [...]
I think the thing that needs to be mentioned is that the ‘friends’ capability of last.fm is kind of tangential to the overall site functionality.
That is, you can be a member of last.fm and enjoy all of its tracking/streaming/radio/user comparing functionality and never add a single friend.
This puts this data on a somewhat different level than say, the equivalent dataset for MySpace, whose sole function is connectivity.
[...] this blog probably know, I’m quite interested in visualizing social networks (for example, Last.fm). One interesting application of this is the visualization of historical movements. I initially [...]
[...] to understand. After seeing some pretty impressive data visualizations of social networks for Last.fm and Scouta, it occurred to us that it might be interesting to try and visualize the StumbleUpon [...]
[...] in conjunction with one named TULIP. When he used more data, his computer crashed. The result is a great visualization, excerpts of which he has published on his blog. One can not only see, or rather guess at, who [...]
[...] Anonymous Prof » Blog Archive » What Does The Last.fm Friends Network Look Like? Visualisation of the Last.fm social network using Tulip software (tags: data visualization last.fm socialnetworking) [...]