Archive for the ‘Visualization’ Category

A Brief 3D Tour of Classical Music History

As those of you who have been following this blog probably know, I’m quite interested in visualizing social networks (for example, Last.fm). One interesting application of this is the visualization of historical movements. I initially thought of looking at the relationships between artists as a function of when they created their works and whom they collaborated with, but quickly realized that such data would be nearly impossible (at least for me) to collect or compile. And so I turned to a related area: classical music.

While there are certainly plenty of classical composers, the breadth is smaller, and I suspected the relationships might already have been cataloged. A quick Google search proved me right. Charles H. Smith, Professor of Library Public Services at Western Kentucky University, compiled such a database in 1993 called The Classical Music Navigator! Unfortunately, since the site was created back in Web 1.0 times, the data were not accessible via an API or even via a conventional web crawler (due to formatting inconsistencies caused by static web pages). As such, I contacted Prof. Smith and he was kind enough to send me the underlying data (though still in a format that required a few hours of parsing).

Anyway, after playing around for a while, I was able to map the relationships between 444 classical composers starting from Hildegard in 1098 through the present day. Using Tulip to do the heavy lifting, I came up with the following:

What we see here is a 3D representation of the 444 composers. Each white sphere is a composer and each line represents a connection, where a connection is a point of influence. In other words, every time The Classical Music Navigator indicated that composer A was influenced by composer B, a link was created. This resulted in 2,618 direct relationships. The size of each sphere represents the number of direct influences that composer has had. Line color encodes time: the bluer the line, the more recent the composer (bottom), and the redder the line, the older the composer (top).

Because I don’t know how to export Tulip data into any kind of 3D software (I’m not sure if it’s even possible), I decided to make a video tour which not only showed the 3D’ness of the visualization, but also highlighted some neat aspects of it.

[Video: 3D tour of the visualization (Google Video)]

If you’ve made it this far, then you might be interested in some more information about the data. Let’s start with a histogram of the number of direct relationships per composer.

Nothing surprising here: a strong positive skew, suggesting that most composers had little direct influence.

Interestingly, when one uses only the number of direct relationships as a metric for influence, one misses out on all the indirect influence a composer has had. What I mean by this is that composer A may have influenced composer B, who in turn influenced composers C, D, and E. Looking only at direct relationships, we would say that composer A was not influential, as he had only 1 connection. However, if we look at the indirect effect (the effect A had on everyone B went on to influence), we find that A actually had 4 connections (B, C, D, and E). Following this type of logic, we come up with a different interpretation of the results. Here is a histogram of the number of indirect relationships each composer has had:

And now a scatter plot of both direct (x-axis) and indirect (y-axis)

What we see from the scatter plot is that there are quite a few composers who had very few direct relationships (on the left) but very many indirect relationships (at the top). Of course, we would expect that older composers would have strong indirect influences just by virtue of being around earlier. We can examine this hypothesis by looking at a scatter plot of the # of indirect relationships by the birth year of the composer.

Here we see a clear negative relationship, suggesting that, yes, indirect influence is a function of birth year, but the relationship is not quite perfect. Composers with an indirect influence of about 200 appear across the whole time range, and some composers have no influence at all.
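For anyone who wants to reproduce the two counts discussed above, here is a minimal sketch in Python. It is not the code I ran (Tulip did the heavy lifting); it assumes the data has already been reduced to (influencer, influenced) pairs, and the function names are made up for illustration. The direct count is taken to be the number of composers someone influenced, and the indirect count is everyone reachable by following influence links forward.

```python
from collections import defaultdict, deque

def build_influence_graph(pairs):
    """Map each composer to the set of composers they directly influenced.

    `pairs` is an iterable of (influencer, influenced) tuples (assumed format).
    """
    graph = defaultdict(set)
    for influencer, influenced in pairs:
        graph[influencer].add(influenced)
        graph.setdefault(influenced, set())  # make sure every composer is a key
    return graph

def direct_count(graph, composer):
    """Number of composers directly influenced by `composer`."""
    return len(graph.get(composer, ()))

def indirect_count(graph, composer):
    """Number of composers reachable from `composer` via influence links.

    Breadth-first search; the composer is not counted as influencing himself,
    even if the data contains a cycle.
    """
    seen = set()
    queue = deque([composer])
    while queue:
        current = queue.popleft()
        for nxt in graph.get(current, ()):
            if nxt not in seen and nxt != composer:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)

# The toy example from the text: A influenced B, who influenced C, D, and E.
graph = build_influence_graph([("A", "B"), ("B", "C"), ("B", "D"), ("B", "E")])
a_direct = direct_count(graph, "A")      # 1
a_indirect = indirect_count(graph, "A")  # 4
```

Note that the indirect count includes the direct connections, matching the way A's 4 connections (B, C, D, and E) are counted in the example.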

If I recreate the visualization using the # of indirect influences to determine the size of each node, I get a slightly different picture.

Here we see that the composers at the top generally have much more influence than the composers at the bottom, as we’d expect. However, we also see that many composers throughout the visualization have had a large influence on classical music.

If you find this interesting, I suggest you take a deeper look at the data by either going to the Classical Music Navigator and playing around or by downloading Tulip and playing with the model yourself.

I’m making the Tulip data freely available here.


Congrats To Matt


A recent article at the MIT Technology Review featured some of Matthew Hurst’s visualizations of the blogosphere. They are quite stunning. Congrats!


Songs Expressed with Graphs

Andrew over at Information Aesthetics recently posted about a very funny Flickr set of graphs that represent song lyrics. Definitely worth a look.


Music Preference Dispersion Through the Last.fm Network: Data Collection Approach

First, an update on my attempt to visualize more of the data I collected last week. Despite my best efforts, I just can’t get any visualization program to handle all of the data I have (now over 1,000,000 friendships). Once I go past about 25,000-50,000 users, the processing time goes way beyond anything my system can handle. Too bad. If someone has a supercomputer somewhere running social network visualization software, let me know and I’ll send you my data.

However, I’m far from ready to call it quits. Instead, I’m taking a new approach. Here’s the plan:

1. Rather than collecting data on all users, I will start with one or two “seed” users and perform a snowballing data collection process out to 2 degrees of separation. This should yield about 5,000 friendships per seed (in fact I did this for one seed already and it came to 4,844 friendships).
2. I will also collect data on users who are not part of that particular social network to use as a control.
3. Then I will collect the music listening history for each user. This, from my experience, will take a while.
4. Then I will attempt to map the diffusion of new song introductions to this network.
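To make the snowballing step concrete, here is a rough sketch in Python of collecting friendships two steps out from a seed. It is an illustration, not my actual collection program: `get_friends` is a hypothetical callable standing in for whatever fetches a user's friend list, and the toy network at the bottom is invented.

```python
from collections import deque

def snowball(seed, get_friends, max_depth=2):
    """Breadth-first snowball sample starting from `seed`.

    `get_friends(user)` is a hypothetical callable returning that user's
    friends. Users at `max_depth` are recorded but not expanded further.
    Returns (depth_by_user, friendships), where friendships is a set of
    unordered pairs so each friendship is counted once.
    """
    depth = {seed: 0}
    friendships = set()
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        if depth[user] >= max_depth:
            continue  # reached the edge of the sample
        for friend in get_friends(user):
            friendships.add(frozenset((user, friend)))
            if friend not in depth:
                depth[friend] = depth[user] + 1
                queue.append(friend)
    return depth, friendships

# Invented toy network: "d" is three steps from the seed, so it is left out.
toy_network = {
    "seed": ["a", "b"],
    "a": ["seed", "c"],
    "b": ["seed"],
    "c": ["a", "d"],
    "d": ["c"],
}
depth, friendships = snowball("seed", lambda u: toy_network.get(u, []))
```

Storing each friendship as an unordered pair avoids double counting when both users list each other as friends.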

The idea is that as a new song emerges, it has to start somewhere. If social networks help spread new songs/artists, then I would expect to see a spreading of preferences throughout the network. So the probability of a “friend” (or a friend of a friend) choosing to listen to a song should be greater than the probability of a non-friend doing so.
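As a first pass at that comparison, one could simply compare the share of network users who listened to a song with the share of control users who did. The sketch below is only an illustration in Python: the function name and all of the user names and data are made up, and a real analysis would need a proper statistical test rather than two raw rates.

```python
def listen_rate(users, listeners):
    """Fraction of `users` who appear in `listeners` (0.0 if `users` is empty)."""
    users = set(users)
    if not users:
        return 0.0
    return len(users & set(listeners)) / len(users)

# Made-up example data: users in the seed's network, control users outside it,
# and the set of users who listened to some new song.
network_users = {"a", "b", "c", "d"}
control_users = {"w", "x", "y", "z"}
listeners = {"a", "b", "c", "x"}

network_rate = listen_rate(network_users, listeners)  # 3 of 4 listened
control_rate = listen_rate(control_users, listeners)  # 1 of 4 listened
```

If the network really does help spread songs, the first rate should come out noticeably higher than the second.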

This should let me do some interesting statistical analyses as well as produce some cool visualization movies. Imagine a mesh with each node representing a user. Because I have a time series of listening history, I can have each node “light up” when that user listens to a song. If there is no network effect, then the lighting up should be random. However, if there is, then I should observe a spreading of “lights” throughout the network.
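Here is a small Python sketch of how the “lighting up” could be computed from the listening history. Everything in it is an assumption for illustration: `first_listen` maps each user to the time of their first listen to a song, `friends` maps each user to a list of friends, and both function names are invented.

```python
def light_up_frames(first_listen, time_steps):
    """For each time step, the set of users who have listened by that time."""
    return [
        {user for user, t in first_listen.items() if t <= step}
        for step in time_steps
    ]

def fraction_preceded_by_friend(first_listen, friends):
    """Share of listeners who had at least one friend listen strictly earlier."""
    if not first_listen:
        return 0.0
    preceded = 0
    for user, t in first_listen.items():
        if any(
            f in first_listen and first_listen[f] < t
            for f in friends.get(user, ())
        ):
            preceded += 1
    return preceded / len(first_listen)

# Invented example: u1 listens first, then the song reaches u2 and u3 through
# friendships, while u4 (no friends in the sample) listens independently.
friends = {"u1": ["u2"], "u2": ["u1", "u3"], "u3": ["u2"], "u4": []}
first_listen = {"u1": 1, "u2": 2, "u3": 3, "u4": 2}

frames = light_up_frames(first_listen, [1, 2, 3])
share = fraction_preceded_by_friend(first_listen, friends)  # 2 of 4 listeners
```

Each entry in `frames` is one frame of the movie. The second function gives a crude number to compare against what random lighting up would produce on the same network.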

I’m leaving for a conference tomorrow so my blogging rate will slow down a bit (back next Monday), but I’ll have my data-collecting programs running in the meantime. Hopefully, when I return I’ll have more data than I’ll know what to do with!


Visualizing Last.fm Listening History

A commenter pointed me to LastGraph, an amazing web app which builds a beautiful chart of your Last.fm listening history. It also makes available the chart of anyone who ran the app.

Here’s an example of one for user “cblake” (note: pdf file):

lastgraph.jpg

Pretty impressive!

