Archive for the ‘Social Networks’ Category

The Social Science of Social Networks


The folks over at The Situationist posted some great excerpts from a recent Edge video of Nicholas Christakis’s discussion on the social science of social networks.

Nicholas has been studying the psychology behind social networks for some time now, and he thinks that one of the shifts in thinking he brings to the table is the way he fundamentally looks at social networks. Rather than thinking about them as static, he sees them as a combination of topology and change. This ever-shifting, dynamic nature of social networks is what makes them so fascinating and so powerful.

You can watch the entire video and read the whole transcript here or read the great summary by The Situationist here.


Congrats To Matt


A recent article at the MIT Technology Review featured some of Matthew Hurst’s visualizations of the blogosphere. They are quite stunning. Congrats!


What’s The Value of a Digg?

For those of you who are impatient, here’s the punch line. On average, a single digg increases traffic by 0.10%. So a story that gets 3,000 diggs increases total traffic to the linked site by 300%. Now for those of you who want to know the entire story, read on.

All too often you see websites going down due to the “digg effect.” There’s even a Wikipedia page for it (though for some reason it’s titled “Slashdot effect”). You also often see comments on digg like “only 84 diggs and it’s already down!” promptly dugg down because, as other observant commenters point out, “you’re an idiot, 1 digg <> 1 click.” This is, of course, due to the fact that you get plenty of free riders on digg (yours truly included) who read tons of stories but never digg them up.

This led me to the obvious question: what, then, is the value of a single digg? In other words, how much traffic is generated when someone diggs a site? With the help of the digg API and the ALEXA data service provided by Amazon, I decided to answer this question.

Here’s what I did:

1. I picked some random date in January (the 12th if you’re curious) and worked back in time (to Oct 6, 2007…again for no particular reason), collecting basic info on 5,794 stories that were “popular” (made it to the front page).
2. ALEXA only provides pageview data for the top 100,000 sites on the internet (as per them), so I checked the list of sites I collected against their database and came back with 1,999 stories that I could follow.
3. I then collected all the digg histories for these stories as well as the pageviews (from ALEXA) for each website for the 7 days before and the 7 days after a digg submission.
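Step 2 above boils down to a filter: keep a story only if its linked site is in the set of sites with pageview data. Here is a minimal sketch of that filter. It does not call the digg API or ALEXA; the function names, the story dictionaries, and the sample data are all assumptions made up for illustration.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the host of a URL, lowercased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def trackable_stories(stories, tracked_domains):
    """Keep only stories whose linked site appears in the tracked set."""
    return [s for s in stories if domain_of(s["url"]) in tracked_domains]


# Made-up sample data, for illustration only.
stories = [
    {"id": 1, "url": "http://www.mozilla.com/firefox/"},
    {"id": 2, "url": "http://anonymousprof.com/post"},
]
tracked = {"mozilla.com"}  # stands in for the top-100,000 list

kept = trackable_stories(stories, tracked)
```

In the real data set this kind of filter is what would take the 5,794 popular stories down to the 1,999 that could be followed.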

The idea was to use the ALEXA data as a proxy for total pageviews and compare that to the digg rate per story.

First, some basic info:

As per ALEXA, a pageview reflects 1 user per million. So we don’t have the exact number of people visiting a site since we don’t know how many users are out there, but we can easily track % changes. It wouldn’t be fair to group all the websites together since the impact of a digg for a large site (say mozilla.com…currently ranked the 44th most visited site on the net) is likely smaller than for a small site (say anonymousprof.com…currently far from being ranked) due to their large difference in existing traffic. To combat this problem, I created buckets for website size based on the distribution of my sample (see below).

digg_histogrampagiews.gif

This resulted in the following groupings:
Small Size: 5 or fewer pageviews/million (n = 664)
Mid Size: more than 5, up to 100 pageviews/million (n = 831)
Large Size: the rest, more than 100 pageviews/million (n = 504)

It’s also reasonable to assume that stories with different overall digg amounts should have different impacts on traffic. Following the same procedure I made 3 more cuts at the data.

digg_histogrampdiggs.gif

Small # of Diggs: 700 or fewer (n = 714)
Mid # of Diggs: more than 700, up to 1,500 (n = 819)
Large # of Diggs: the rest, more than 1,500 (n = 466)

Now to the fun stuff. For all of these analyses I will show three different sets of data, one for each of the website sizes previously specified. I tried pooling everything, but the charts become incomprehensible.

Let’s start with the number of pageviews for the 15-day period that runs from 7 days before a post to 7 days after it.

Each line represents a bucket of digg sizes. For example, the black line represents all stories that had a maximum of 700 diggs throughout their lifetime.

digg_large.gif

We see that small websites clearly benefit from being dugg and, not surprisingly, the bigger the story on digg (the more total diggs) the greater the increase in traffic. This is less true for the medium sized sites, and, apparently, not true at all for the big guys. None of this is surprising since we would expect smaller sites to benefit the most. The big guys are already getting lots of the traffic that would have come from digg.

The problem with the previous charts is that we don’t really get to see the value to the sites because we don’t know what a pageview/million really is. Instead, let’s look at the same data but as % gain per day. In order to do this we need a reference point and so we will average the pageviews for the 7 days prior to the story appearing on digg and assume that that is baseline traffic. We then compute the gain per day relative to this baseline and look at the % difference. I’ll also include the total # of diggs per day on the secondary axis (dashed lines) so you can see exactly where the traffic is coming from.
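
As a rough sketch of that baseline calculation, assuming each site's traffic arrives as a list of 15 daily values with the first 7 being the days before the story appeared (the function name and the sample numbers are made up for illustration):

```python
from statistics import mean


def pct_gain_per_day(pageviews, days_before=7):
    """Percent gain over baseline for the submission day and each day after.

    pageviews: daily values, where the first `days_before` entries are the
    days before the story appeared on digg.
    """
    baseline = mean(pageviews[:days_before])
    if baseline == 0:
        raise ValueError("baseline traffic is zero; % gain is undefined")
    return [100.0 * (pv - baseline) / baseline for pv in pageviews[days_before:]]


# Made-up series: flat at 10 pageviews/million, then a spike that decays.
series = [10] * 7 + [30, 20, 15, 10, 10, 10, 10, 10]
gains = pct_gain_per_day(series)
```

With this made-up series the baseline is 10, so the submission day comes out as a 200% gain, the next day as 100%, then 50%, and then back to 0%.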

digg_smallgain.gif

digg_midgain.gif

digg_largegain.gif

Here we can clearly see that the benefit to smaller sites is greater than for larger sites. Interestingly, large sites seem to suffer from stories that don’t get too many diggs. I suspect this is just noise, but worth noting nonetheless.

So far, I have yet to answer my initial question: what’s the value of a single digg? The simple computation would be to sum the total gain in pageviews and divide that by the number of diggs (as I did in the opening paragraph). However, that would be misleading because the effect should be different for different sized sites. Also, because the frequency of digging goes down with time, each additional digg likely reflects a larger increase in traffic. So let’s plot the gain in traffic/digg by size of site across time.
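
One way to read "gain in traffic per digg across time" is to divide each day's % gain by the diggs received that same day. The sketch below assumes exactly that; the function name, the handling of zero-digg days, and the sample numbers are illustrative assumptions, not a record of the exact computation behind the charts.

```python
def gain_per_digg(pct_gains, diggs_per_day):
    """Divide each day's % gain by that day's digg count.

    Days with no diggs have no defined value and are returned as None.
    """
    values = []
    for gain, diggs in zip(pct_gains, diggs_per_day):
        values.append(gain / diggs if diggs > 0 else None)
    return values


# Made-up numbers: traffic decays, but diggs dry up even faster.
per_digg = gain_per_digg([200.0, 100.0, 50.0], [1000, 200, 0])
```

In this made-up case the value per digg rises from 0.2% to 0.5% even though total traffic is falling, because the denominator shrinks faster than the numerator.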

digg_smallvalue.gif

digg_midvalue.gif

digg_largevalue.gif

Pretty cool! For small guys, the value increases with time since the denominator (# of diggs) falls quickly. This makes sense if we look back a few charts and notice that overall traffic dies down by this point. So for every digg late in the life of a story we see a large % gain. The mid size guys are all over the place, but we can still see that, on the whole, each digg helps a little bit. What’s really surprising though, is the value of a digg for large sites. It’s negative! I’m not sure I have a good hypothesis for this one, but I’ll be glad to hear some in the comments.

Finally, we can ignore the time difference and just collapse the value of the digg by website size and digg size. (Error bars reflect standard errors.)

digg_allcomparison.gif

We see here that, as predicted, the effect of a single digg is greater for smaller sites than for larger ones. In fact, the overall benefit for large sites is pretty much non-existent. Sorry Gawker. And, like I said at the outset, the overall effect is 0.10% increase in traffic per digg.
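
For reference, here is a small sketch of the two pieces of arithmetic involved: a mean with its standard error for a group of per-story values, and the scaling of a per-digg figure up to a whole story. The function names and the three sample values are assumptions for illustration; only the 0.10% and 3,000-digg figures come from this post.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_se(values):
    """Mean and standard error of the mean (sample stdev / sqrt(n))."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values)) if len(values) > 1 else float("nan")
    return m, se


def total_gain_pct(diggs, gain_per_digg_pct):
    """Scale an average per-digg gain up to a whole story."""
    return diggs * gain_per_digg_pct


# Made-up per-story values (% gain per digg) for one bucket.
m, se = mean_and_se([0.05, 0.10, 0.15])

# The punch line from the top of the post: 3,000 diggs at 0.10% each.
punch_line = total_gain_pct(3000, 0.10)
```

Note that the linear scaling is only as good as the average behind it; the bucketed results above suggest the per-digg value differs a lot by site size.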

What’s also interesting is that it doesn’t appear that getting a tremendous amount of diggs helps that much more than just getting a lot of diggs. It looks like hitting the front page is all that matters. Once you’re there, there’s little difference between having 800 and 5,000 diggs.

Of course, there are some clear limitations to this analysis:

1. Because Alexa only has data for the top 100,000 sites I can’t see the effect of digging a really tiny site like mine. Though if this story hits the front page, I can give you an even more detailed analysis using my own data.
2. I’m only looking at stories that hit the front page. What about all those stories that never make it? Clearly they lead to traffic, but how much? That’ll have to be the topic of another discussion.
3. I don’t know what a pageview/million really is, so it’s hard to say something like: “1 digg = X clicks.” The best I can do is calculate a % gain.

If you have thoughts on how I can improve this analysis, please leave me a comment. I spent a lot of time working on this, but I’m sure it can be improved.


Music Preference Dispersion Through the Last.fm Network: Data Collection Approach

First, an update on my attempt to visualize more of the data I collected last week. Despite my best efforts, I just can’t get any visualization programs to run all of the data I have (now over 1,000,000 friendships). It seems like when I go past about 25,000-50,000 users, the time to process the data goes way beyond anything my system can handle. Too bad. If someone has a supercomputer somewhere running social network visualization software on it, let me know and I’ll send you my data.

However, I’m far from ready to call it quits. Instead, I’m taking a new approach. Here’s the plan:

1. Rather than collecting data on all users, I will start with one or two “seed” users and perform a snowballing data collection process out to 2 degrees of separation (friends, and friends of friends). This should yield about 5,000 friendships per seed (in fact I did this for one seed already and it came to 4,844 friendships).
2. I will also collect data on users who are not part of that particular social network to use as a control.
3. Then I will collect the music listening history for each user. This, from my experience, will take a while.
4. Then I will attempt to map the diffusion of new song introductions to this network.
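
The snowball step in the plan above can be sketched as a breadth-first walk that stops expanding once it is two hops from the seed. This is a minimal sketch, not the actual collector: `get_friends` is a hypothetical callable standing in for whatever fetches a user's friend list, and the toy graph is made up.

```python
from collections import deque


def snowball(seed, get_friends, max_depth=2):
    """Collect users and friendships within max_depth hops of the seed.

    get_friends: hypothetical callable that returns the friends of a user.
    Returns (depth_by_user, edges), with edges stored as unordered pairs.
    """
    depth = {seed: 0}
    edges = set()
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        if depth[user] >= max_depth:
            continue  # users at the outer ring are recorded but not expanded
        for friend in get_friends(user):
            edges.add(frozenset((user, friend)))
            if friend not in depth:
                depth[friend] = depth[user] + 1
                queue.append(friend)
    return depth, edges


# Made-up friendship graph for illustration.
toy = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b", "e"], "e": ["d"]}
depth, edges = snowball("a", lambda u: toy.get(u, []))
```

In the toy graph, user "e" is three hops from the seed and so never gets collected, while "d" is collected but its own friend list is not expanded.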

The idea is that as a new song emerges it has to start somewhere. If social networks help spread new songs/artists, then I would expect to see a spreading of preferences throughout the network. So the probability of a “friend” (or a friend of a friend) choosing to listen to a song should be greater than the probability of a non-friend doing so.

This should let me do some interesting statistical analyses as well as produce some cool visualization movies. Imagine a mesh with each node representing a user. Because I have a time series of listening history I can have each node “light up” when that user listens to a song. If there is no network effect, then the lighting up should be random. However, if there is, then I should observe a spreading of “lights” throughout the network.
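
The simplest version of that comparison is a pair of rates: the share of an early listener's friends who also listened, against the share of everyone else. The sketch below assumes that framing; the function name, the data layout, and the toy data are made up, and a real analysis would also need timing information and a significance test.

```python
def listen_rates(origin, friends_of, listeners, all_users):
    """Share of the origin's friends vs. non-friends who listened to a song.

    friends_of: dict mapping a user to an iterable of friends.
    listeners: set of users who listened to the song.
    """
    friends = set(friends_of.get(origin, ()))
    others = set(all_users) - friends - {origin}

    def rate(group):
        return len(group & listeners) / len(group) if group else float("nan")

    return rate(friends), rate(others)


# Made-up data: "a" hears the song first.
all_users = {"a", "b", "c", "d", "e", "f"}
friends_of = {"a": ["b", "c"]}
listeners = {"a", "b", "c", "d"}

friend_rate, other_rate = listen_rates("a", friends_of, listeners, all_users)
```

If the friend rate is not noticeably higher than the rate among non-friends, the "lights" are probably turning on at random rather than spreading along friendships.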

I’m leaving for a conference tomorrow so my blogging rate will slow down a bit (back next Monday), but I’ll have my data-collecting programs running in the meantime. Hopefully, when I return I’ll have more data than I’ll know what to do with!


What’s Popular on Different Social Sites?

story1.gif

SocialMediaTrader has their monthly roundup of what’s popular on different social media sites. It’s a pretty neat analysis, but I would love to see comparisons across time (here’s last month’s)…or at least the raw data so I can do it (pulling data from graphs is a bit too tedious…even for me). You can eyeball some differences across the months, but mostly it’s consistent, with technology, politics, and general news leading the way on all the sites. Good job guys.

