Research meta-housekeeping: On HuffPo and BAI 2.0

Yesterday morning, I wrote my first piece for the Huffington Post.  I also posted a note to the Blogosphere Authority Index site, explaining that the rankings have been suspended while I tinker with the tracking system.*  There’s a relationship between the two.  Take a look at the toolbar listing under “share this story” in the screencap below:


1,139 people “liked” the story.  480 shared it.  163 tweeted it, 63 e-mailed it, and 4 Google+’ed it.  The post also attracted 14 comments.**

That’s a lot of community activity.  The Blogosphere Authority Index, though, would register almost none of it.  The BAI algorithm draws upon four types of public data: passive (blogroll) hyperlinks, active (in-text) hyperlinks, total site traffic, and community activity (total number of comments).  Of all the activity above, only the 14 comments would count.  When I designed the BAI in 2007, those were the right sources to track.  Content wasn’t easily shareable on Facebook or Twitter.  Both platforms existed, but deep software integration was still years away.

Blogging at the Huffington Post is a different experience from blogging at ShoutingLoudly.  There’s no “share this story” toolbar at SL.  I announce these posts on Twitter and Facebook, but any social media traction they get is strictly D.I.Y.  Facebook isn’t integrated.  And ShoutingLoudly isn’t *quite* the hub that the Huffington Post is (if AOL wants to purchase the site too, I’m sure all of us authors are willing to listen!).  When I launched the BAI, the Huffington Post was a blog with aspirations towards being a media operation.  Now, it’s a full-fledged media operation with bloggy roots.

And that is why I’ve taken the current BAI offline to focus on BAI 2.0.  When I designed the BAI, the goal was to make it “swappable.”  I knew what the best available metrics were at the time, and I knew they would not stay the best available metrics.  The idea was to create a system that could be reengineered without too much headache.

But it’s still a bigger headache than I thought it would be.  The current metrics (Sitemeter/Alexa for site traffic, blogroll crawls for network centrality, Technorati for hyperlinks, and hand-counting/automated counting of blog comments) simply aren’t good enough anymore.  Blogrolls are too static.  They provide a decent map of blog clusters, but no real measure of changes in influence.  Facebook and Twitter have become core tools for sharing and discussion.  They have to be factored into the ranking system.

That’s going to take some time, particularly because it’s practically impossible to automate the data collection on the more sophisticated sites.  The top sites tend to use customized platforms, which means hand-counting their thousands of reader comments.  I can’t simultaneously run the current BAI and design the next BAI.

So, with apologies to my fellow researchers who want to study the blogosphere in the 2012 election, the dataset is on hiatus (I can already foresee some very disappointed doctoral students in 2014, finding out that the dataset has a hole in it).  The February 2012 snapshot is a decent stand-in for the state of the blogosphere: past research shows that there isn’t a lot of month-to-month fluctuation among the elite blogs.  After three and a half years of data collection, though, it’s time to get under the hood and tinker with the mechanics some more.

Blogging at a major site today works differently than blogging at a major site in 2007.  The architecture has changed, and that has to be factored into how we measure blog influence.



*They’ve actually been suspended since March.  I just got around to posting the note yesterday, though.

**That screencap is from yesterday afternoon.  The post now has over 2,400 likes.  Which is probably more people than will read my book.  …I can’t actually decide how to feel about that.