
Long Tails: just how we roll

(Interested in this kind of thing? Also check out Christina's forthcoming talk, via Bora.)

If we assume that people blogroll sites that they think (a) are good and (b) are relevant to their audience, then it seems fair to also assume that, by aggregating and analyzing blogrolls from blogs tagged in Nature.com Blogs (did I mention Nature.com Blogs already? Did you sign up?), we can come up with some sort of 'top ten blogs' in each area.

I wrote some scripts to do the relevant scraping. Here are the results. Note that ranks can be tied - The Loom and Aetiology share fourth place, for example:

Across all subject areas

Blog                                Rank
Pharyngula                          1
The Panda's Thumb                   2
RealClimate                         3
The Loom                            4
Aetiology                           4
A Blog Around The Clock             5
Cosmic Variance                     6
Adventures in Ethics and Science    7
Respectful Insolence                8
The Intersection                    8
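
The tied ranks above look like what is usually called dense ranking: blogs on the same number of blogrolls share a rank, and the next distinct count takes the next rank with no gap. Here is a minimal sketch of that scheme in Python. It is an assumption about how the ranks were produced rather than the actual script; `dense_rank` is a hypothetical helper and the counts are made-up placeholders, not the real numbers.

```python
def dense_rank(counts):
    """Return (blog, rank) pairs; ties share a rank and no ranks are skipped."""
    # Distinct counts, highest first: position in this list + 1 is the rank.
    distinct = sorted(set(counts.values()), reverse=True)
    rank_of = {count: i + 1 for i, count in enumerate(distinct)}
    # Order blogs by count (descending), then by name for a stable listing.
    ordered = sorted(counts, key=lambda blog: (-counts[blog], blog))
    return [(blog, rank_of[counts[blog]]) for blog in ordered]


# Placeholder counts for illustration only (not the real blogroll numbers).
example_counts = {
    "Blog A": 40,
    "Blog B": 38,
    "Blog C": 19,
    "Blog D": 12,
    "Blog E": 12,
    "Blog F": 9,
}

ranked = dense_rank(example_counts)
for blog, rank in ranked:
    print(f"{blog}\t{rank}")
```

With these placeholder counts, Blog D and Blog E tie and the blog after them takes the very next rank, which is the same pattern as The Loom and Aetiology followed by A Blog Around The Clock.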

I left out the raw numbers for now (at this stage it's just an experiment) but can tell you that Pharyngula and The Panda's Thumb are way ahead of the competition... they're on twice as many blogrolls as RealClimate (also a very popular choice).

It all looks very Long Tail'ish...

[Figure: blogrolls_long_tail.png]

... that is to say that there are a very small number of blogs on lots of blogrolls and a very large number of blogs on few blogrolls.

We've been tracking links between science blogs since Nature.com Blogs launched but there isn't enough data to do a proper comparison between blogroll popularity and incoming link popularity yet. I'd be interested to see what kind of correlation there is: do you link more often to the blogs on your blogroll? Are there some blogs that you add because you feel you should? Does much reciprocal blogrolling go on (almost certainly, I'd guess)? Is there a blogrolling equivalent to prominently displaying an unread copy of Ulysses on your bookshelf even though all you read is Stephen King?

(tables for individual subject areas are below the fold)

Chemistry

Blog                        Rank
Molecule of the Day         1
Useful Chemistry            1
Developing Intelligence     2
The Sceptical Chymist       2
Genomicron                  2
easternblot.net             2
Stoat                       3
petermr's blog              4
The Culture of Chemistry    4
chem-bla-ics                4

Life Sciences

Blog                                                   Rank
Pharyngula                                             1
The Panda's Thumb                                      2
Aetiology                                              3
A Blog Around The Clock                                4
Living the Scientific Life (Scientist, Interrupted)    5
Eye on DNA                                             6
The Other 95%                                          7
Laelaps                                                7
Gene Expression                                        8
Not Exactly Rocket Science                             8

Physics

Blog                        Rank
Cosmic Variance             1
Bad Astronomy               2
Uncertain Principles        3
Biocurious                  3
Good Math, Bad Math         4
Cocktail Party Physics      4
Angry Physics               5
Shtetl-Optimized            5
Backreaction                6
A Photon In The Darkness    7

Bioinformatics

Blog                                         Rank
business|bytes|genes|molecules               1
Public Rambling                              1
What You're Doing Is Rather Desperate        1
nodalpoint.org - A bioinformatics weblog     2
The Hyphal Tip                               3
Flags and Lollipops - Bioinformatics Blog    3
Omics! Omics!                                3
The Seven Stones                             3
Suicyte Notes                                4
Microarray and bioinformatics                4


Comments

Hi, Euan. What do you mean when you say, "blogrolls from blogs tagged in Nature.com Blogs"? You mean you went to each Nature.com Blog, looked at the blogs in each one's blogroll, went to each of *those* blogs, and then tabulated the blogs listed in *their* blogrolls?

Hey Amos,

Sorry, should have been a bit clearer about the methodology, which was basically:

1) download list of blogs (and their subject areas) from Nature Blogs

2) visit each one, pull out its blogroll (by downloading half a dozen posts published on different dates, working out which part of the pages never changed and then pulling out links from there)

3) filter out non-blogs and blogs not listed in the Nature Blogs index
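
Steps 2 and 3 can be sketched roughly as below. This is a simplification and not the actual script: it treats "the part of the page that never changed" as the set of links present on every sampled page, and it takes already-downloaded HTML strings so that no network access is needed. `LinkCollector`, `links_in`, `guess_blogroll` and the example.* addresses are all hypothetical names for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def links_in(html):
    """Return the set of link targets found in one HTML page."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def guess_blogroll(pages, indexed_blogs):
    """Links that appear on every sampled page and are in the blog index."""
    if not pages:
        return set()
    link_sets = [links_in(page) for page in pages]
    # Step 2 (approximation): keep only links common to all sampled pages.
    constant_links = set.intersection(*link_sets)
    # Step 3: drop anything that is not a blog listed in the index.
    return constant_links & set(indexed_blogs)


# Two sampled posts from one hypothetical blog: the post link changes,
# the sidebar links stay the same.
pages = [
    '<a href="http://example.org/post-1">Post 1</a>'
    '<a href="http://blog-a.example/">Blog A</a>'
    '<a href="http://shop.example/">Shop</a>',
    '<a href="http://example.org/post-2">Post 2</a>'
    '<a href="http://blog-a.example/">Blog A</a>'
    '<a href="http://shop.example/">Shop</a>',
]
indexed = {"http://blog-a.example/", "http://blog-b.example/"}

blogroll = guess_blogroll(pages, indexed)
print(blogroll)
```

Note that any link missing from even one sampled page is dropped, so this approach only sees sidebar links that are identical on every page.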

There was quite a lot of munging of URIs involved (so that scienceblogs.com/ matched www.scienceblogs.com, for example).
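
As an illustration of the kind of munging involved, here is a minimal sketch using Python's standard `urllib.parse`. The specific rules (ignore the scheme, lowercase the host, drop a leading "www." and any trailing slash) are assumptions chosen to make the example above match; the real rules were likely more involved. `normalize_blog_uri` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def normalize_blog_uri(uri):
    """Reduce a blog URI to a comparable 'host/path' key (assumed rules)."""
    uri = uri.strip()
    # urlsplit only recognises the host when a scheme or leading '//' is present.
    if "://" not in uri and not uri.startswith("//"):
        uri = "//" + uri
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Treat 'host/' and 'host' as the same thing.
    path = parts.path.rstrip("/")
    return host + path


a = normalize_blog_uri("scienceblogs.com/")
b = normalize_blog_uri("www.scienceblogs.com ")
c = normalize_blog_uri("http://www.ScienceBlogs.com/pharyngula/")
print(a, b, c)
```

Both of the first two forms reduce to the same key, so they would be counted as one blog; the path is kept so that different blogs on a shared host stay distinct.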

Nice.

But Neurotopia and Of Two Minds are not physics blogs, they are neuroscience blogs (more life science than anything else).

There may be a serious flaw in your methodology. If you search for parts of the page that do not change you will never include any rotating blogroll. My blogroll includes about 300 sites, but displays only 50 at a time.

Can I claim my blogs yet?

Is this affected by the fall in popularity of the blogroll (static) in favour of imported widgets for RSS readers (now standard on Blogger (Google) blogs but not, sadly, on Typepad as yet)? I think bloggers and others are tending nowadays to subscribe and unsubscribe to blogs in their RSS readers and not bothering to update their static blogrolls. In the "old days of blogging", blogrolls were ways to get bloggers to refer to each other, etc -- but now, many conversations go on separately at FriendFeed, Twitter, Nature Network, etc, as well as at the blog itself.

Must be hard to capture a lot of this ephemeral activity for analytical purposes.

Another point: Nature Network blogs don't even have blogrolls. Will be interesting to see what happens to your analysis when they do. But for the moment, they are presumably disproportionately excluded from a blogroll-based analysis such as this?

Maxine: there's nothing to stop Network blogs being blogrolled by others which would make them show up here - OTOH I bet you're right in that you're far less likely to be blogrolled if you don't maintain a blogroll yourself.

Greg: you're right, the methodology definitely can't capture rotating blogrolls. I don't think it's too serious a flaw if the numbers of non-rotating blogrolls are big enough but it's a good point and something that would definitely be worth tackling properly if we wanted to use ranks like these for anything formal. It'd be dumb to not use all the data available.

What Bora mentioned is definitely a serious flaw: those aren't physics blogs! Must have a tag gremlin... will correct tomorrow.
