<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
   <channel>
      <title>Nascent</title>
      <link>http://blogs.nature.com/wp/nascent/</link>
      <description>Nature Publishing Group&apos;s blog on web technology and science</description>
      <language>en</language>
      <copyright>Copyright 2009</copyright>
      <lastBuildDate>Wed, 01 Jul 2009 12:11:12 -0500</lastBuildDate>
      <generator>http://www.sixapart.com/movabletype/?v=3.2</generator>
      <docs>http://blogs.law.harvard.edu/tech/rss</docs> 

            <item>
         <title>&quot;I am not a scientist, I am a number&quot;</title>
         <description><![CDATA[<p>On Monday I was at the BioLINK Special Interest Group at the Intelligent Systems for Molecular Biology meeting in Stockholm. Amongst the many thought-provoking talks was one by Phil Bourne, he of the <a href="http://www.pdb.org">Protein Data Bank</a>, <a href="http://www.scivee.tv">SciVee</a> and other goodies.  Phil made a cogent plea for a system of unique identifiers for scientists.</p>]]></description>
         <link>http://blogs.nature.com/wp/nascent/2009/07/i_am_not_a_scientist_i_am_a_nu.html</link>
         <guid>http://blogs.nature.com/wp/nascent/2009/07/i_am_not_a_scientist_i_am_a_nu.html</guid>
         <category>Publishing</category>
         <pubDate>Wed, 01 Jul 2009 12:11:12 -0500</pubDate>
      </item>
            <item>
         <title>Welcome to the Streamosphere</title>
         <description><![CDATA[<p><img alt="river-of-news.jpg" src="http://blogs.nature.com/wp/nascent/river-of-news.jpg" width="300" height="150" style='margin: 0px 0px 10px 10px;' align='right'/>Web publishing as a discipline has few tenets but I think <b>release early, release often</b> and <b>don't be afraid to fail</b> are pretty sound. That was the philosophy behind <a href='http://www.nextgenerationscience.com/reference-management/connotea-saving-references-made-simple/'>Connotea</a> when <a href='http://network.nature.com/people/timo/profile'>Timo</a> and <a href='http://www.academia.edu'>Ben Lund</a> launched it in 2004 and it's the spirit in which I've just put up an early version of <a href='http://streamosphere.nature.com'>Streamosphere</a>. </p>

<p>Streamosphere is a pet side project which I'm running according to what I guess you could call the <a href='http://www.paulgraham.com/startuplessons.html'>Paul Graham principles</a> (it'd be disingenuous to say "as a start-up", as most start-ups don't have NPG-level resources. OTOH we lack a foosball table and free M&amp;Ms). Think of it as a pre-alpha alpha.</p>

<p><b>The elevator pitch</b></p>

<p>Streamosphere lets you track scientific discussion on the web, in real time.</p>

<p><b>What it does</b></p>

<p>If you visit <a href='http://streamosphere.nature.com/preview.php#24'>streamosphere.nature.com/preview.php#24</a> you'll see a page of stacked timelines like these:</p>

<div style='text-align: center; margin: 10px 0px 10px 0px;'>
<img alt="Picture 5.png" src="http://blogs.nature.com/wp/nascent/Picture%205.png" width="600" height="99" />
</div>

<p>Each timeline shows discussion around a particular item, for now always a web page. The portrait on the left is of one of the people who first started talking about the item. The slice of time in which the discussion was active (people were leaving comments, tweeting, liking or bookmarking it) is coloured a shade of magnolia. Behind the active slice is a graph - this shows you how much activity there was at any one point.</p>

<p>Click on an item's active slice to pop up more details about it, including an activity breakdown and a selection of associated comments and tweets. If the item is a video or photograph it should be embedded in the popup. If the item description is in a foreign language, hover your mouse cursor over it to get the English translation.</p>

<div style='text-align: center; margin: 10px 0px 10px 0px;'>
<img alt="Picture 6.png" src="http://blogs.nature.com/wp/nascent/Picture%206.png" width="400" height="285" />
</div>

<p>Streamosphere only ever shows the most active items in a given time period. Use the controls on the right hand side of the screen to see the most active items in the past few hours, day, week or month. You can also filter items by domain or by keywords in their description.</p>
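
<p>A minimal sketch of the "most active items in a given time period" idea, assuming updates are available as (URL, timestamp) pairs. The function name and the sample data are hypothetical; this is not Streamosphere's actual code:</p>

```python
from collections import Counter
from datetime import datetime, timedelta

def most_active(updates, now, window, top_n=3):
    """Return the top_n (url, count) pairs seen within `window` before `now`."""
    cutoff = now - window
    counts = Counter(url for url, when in updates if when >= cutoff)
    return counts.most_common(top_n)

# Hypothetical sample updates: (url, time the update was seen)
now = datetime(2009, 6, 16, 12, 0)
updates = [
    ("http://example.org/paper", now - timedelta(minutes=5)),
    ("http://example.org/paper", now - timedelta(minutes=50)),
    ("http://example.org/video", now - timedelta(hours=3)),
    ("http://example.org/old", now - timedelta(days=40)),
]
past_day = most_active(updates, now, timedelta(days=1))
past_hour = most_active(updates, now, timedelta(hours=1))
```

<p>Switching between "past few hours", "day", "week" and "month" then only means changing the window passed in.</p>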

<p>In smaller time periods you'll see some items that aren't anything to do with science: recently there's been stuff about Iran and a viral video, for example. I'm not sure if this is a bug or a feature, or how to filter out non-science stuff if that's a requirement - suggestions welcome.</p>

<p>In the future I'd like to see the page update dynamically as new activity gets tracked but for now to refresh the page you need to reload or choose a new time period.</p>

<p><b>How it works</b></p>

<p>Streamosphere tracks ~ 4k accounts on half a dozen different social media sites including Friendfeed, Twitter and bookmarking services like Delicious. The account owners have all self-identified (sometimes implicitly) as scientists or people interested in science.</p>

<p>It uses a combination of polling, web hooks (via <a href='http://www.gnip.com/'>GNIP</a>) and <a href='http://en.wikipedia.org/wiki/Simple_Update_Protocol'>SUP</a> feeds to aggregate public updates from tracked accounts as soon after they happen as possible. Average latency is ~ 3 minutes for Friendfeed and a few seconds for Twitter. </p>

<p>Right now there's only one view on the data: by item. Items are the URIs associated with or mentioned in updates: if I tweet "I love http://lolcats.com" and you bookmark it on delicious then the streamosphere database will record a single item (lolcats.com) associated with two updates.</p>
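
<p>The item/update relationship described above could be sketched like this. The trailing-slash normalisation is a crude assumption made for illustration, and all names are hypothetical:</p>

```python
from collections import defaultdict

def group_by_item(updates):
    """Map each URL to the list of (service, user) updates that mention it."""
    items = defaultdict(list)
    for service, user, url in updates:
        key = url.rstrip("/")  # crude normalisation, an assumption
        items[key].append((service, user))
    return dict(items)

updates = [
    ("twitter", "me", "http://lolcats.com"),       # I tweet the link
    ("delicious", "you", "http://lolcats.com/"),   # you bookmark it
]
items = group_by_item(updates)
```

<p>Here <code>items</code> holds a single item (lolcats.com) associated with two updates.</p>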

<p>Items are currently always websites. In the future I'd like to add views for users and topics; these are non-trivial because of problems with account owner disambiguation and with classifying short messages, respectively.</p>

<p>Owner disambiguation relies on the <a href='http://code.google.com/apis/socialgraph/'>Google Social Graph API</a>. We need to disambiguate owners because otherwise the same person could post a single link on multiple services and Streamosphere would believe it's amazingly popular. </p>
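
<p>To illustrate why disambiguation matters, here is a small sketch that counts people rather than accounts. The account-to-owner mapping is a hypothetical lookup table standing in for whatever the Social Graph API returns; no real API calls are shown:</p>

```python
def distinct_owner_count(accounts, account_owner):
    """Count distinct people, not accounts, that shared an item."""
    # Accounts with no known owner are treated as their own owner.
    owners = {account_owner.get(account, account) for account in accounts}
    return len(owners)

# Hypothetical mapping from (service, username) to a single owner id
account_owner = {
    ("twitter", "jsmith"): "person-1",
    ("friendfeed", "john.smith"): "person-1",
    ("delicious", "adoe"): "person-2",
}
shared_by = [("twitter", "jsmith"), ("friendfeed", "john.smith"), ("delicious", "adoe")]
popularity = distinct_owner_count(shared_by, account_owner)
```

<p>Three updates collapse to two people, so the same person posting one link on two services doesn't inflate the count.</p>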

<p>Sometimes users have set up rules to automatically route updates from one service to another (e.g. they share an item on Google Reader which appears in their Friendfeed stream which gets pushed out to their Twitter account). Rules like this are the bane of Streamosphere's existence - it's non-trivial to detect this kind of thing and handle it correctly.</p>

<p>I'm collecting hashtags, tags and extracting key terms from all updates but don't quite know what to do with them yet - still need a good algorithm to detect trending topics. Links are extracted from updates but right now there's no disambiguation for papers (<a href='http://www.nodalpoint.org/2006/12/15/buggotea_redundant_links_in_connotea'>Buggotea</a> is alive and well in Streamosphere). There's a best effort attempt to resolve shortened URLs though occasionally one will slip through.</p>
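
<p>A rough sketch of pulling hashtags and links out of an update's text with regular expressions. The patterns, the function name and the sample short URL are assumptions for illustration; resolving shortened URLs would need a network request and is left out:</p>

```python
import re

HASHTAG = re.compile(r"#(\w+)")
LINK = re.compile(r"https?://\S+")

def extract(update_text):
    """Return (hashtags, links) found in one update."""
    hashtags = [tag.lower() for tag in HASHTAG.findall(update_text)]
    # Trim punctuation that often trails a link at the end of a sentence.
    links = [link.rstrip(".,;)") for link in LINK.findall(update_text)]
    return hashtags, links

tags, links = extract("Great talk at #ISMB on #bioinformatics: http://bit.ly/abc123.")
```

<p>A pattern this simple will also pick up URL fragments such as <code>#24</code> as hashtags, so a real version would need to strip links before looking for tags.</p>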

<p>There's no API but if anybody has a good use for the data I'm happy to set something up using GNIP or long polling to support real time updates if necessary - just send me a use case.</p>]]></description>
         <link>http://blogs.nature.com/wp/nascent/2009/06/welcome_to_the_streamosphere.html</link>
         <guid>http://blogs.nature.com/wp/nascent/2009/06/welcome_to_the_streamosphere.html</guid>
         <category>Social software</category>
         <pubDate>Tue, 16 Jun 2009 06:48:22 -0500</pubDate>
      </item>
            <item>
         <title>Which web 2.0 services do scientists use?</title>
         <description><![CDATA[<p>Which web services are scientists actively contributing to?</p>

<p>There are ~ 1,240 Friendfeeders in science related rooms (the-life-scientists, scienceapps, science-2-0, science-online...). What percentage have listed usernames associated with the science related tools supported by Friendfeed?</p>

<p><img alt="Picture 10.png" src="http://blogs.nature.com/wp/nascent/Picture%2010.png" width="400" height="360" /></p>

<table border='1' cellspacing='0' cellpadding='4'>
<tr><th>Service</th><th>Count</th></tr>
<tr><td>citeulike</td><td>41</td></tr>
<tr><td>connotea</td><td>31</td></tr>
<tr><td>delicious</td><td>431</td></tr>
<tr><td>digg</td><td>208</td></tr>
<tr><td>googlereader</td><td>394</td></tr>
<tr><td>reddit</td><td>68</td></tr>
<tr><td>slideshare</td><td>143</td></tr>
<tr><td>twitter</td><td>675</td></tr>
<tr><td>youtube</td><td>341</td></tr>
</table>
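
<p>For anyone who wants the counts as percentages, a short sketch of the arithmetic, assuming the approximate total of 1,240 Friendfeeders quoted above:</p>

```python
TOTAL = 1240  # approximate number of Friendfeeders quoted in the post

counts = {
    "citeulike": 41, "connotea": 31, "delicious": 431, "digg": 208,
    "googlereader": 394, "reddit": 68, "slideshare": 143,
    "twitter": 675, "youtube": 341,
}

# Share of the ~1,240 users listing each service, to one decimal place
percentages = {name: round(100 * n / TOTAL, 1) for name, n in counts.items()}

for name, pct in sorted(percentages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:13s}{pct:5.1f}%")
```

<p>Because the total is itself an estimate, the percentages are only good for broad comparisons.</p>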

<p><b>Why this dataset isn't very good...</b></p>

<p>There's a bias towards services formally supported by Friendfeed - it's easy to add feeds from supported services. Connotea and CiteULike aren't formally supported though you can add your library RSS feeds manually. Many Friendfeed users won't bother to do this.</p>

<p>People may be contributing to services (like YouTube...) for reasons that have nothing to do with science. </p>

<p>People who use Friendfeed aren't a representative sample of scientists (though they may well be a representative sample of blog friendly, web savvy scientists).</p>

<p>People sometimes remove their Twitter feeds from Friendfeed to help keep the conversations that they start there in one place.</p>

<p>I picked the set of services to look at, which is why you don't see, say, Wikipedia or OpenWetWare above (some preliminary analysis suggested that the numbers would be negligible).</p>

<p><b>That said...</b></p>

<p>We can still use it to guess at broad trends.</p>

<p>Roughly a third of Friendfeed scientists (431 of ~ 1,240) have delicious bookmarks. Don't discount non-academic bookmarking services as a source of paper metadata.</p>

<p>A similar number use the share functionality in Google Reader. </p>

<p>Despite rumors to the contrary not everybody is on Twitter.</p>

<p>A surprising (to me) number of people are uploading and favouriting items on Slideshare.</p>]]></description>
         <link>http://blogs.nature.com/wp/nascent/2009/05/what_web_20_do_scientists_use.html</link>
         <guid>http://blogs.nature.com/wp/nascent/2009/05/what_web_20_do_scientists_use.html</guid>
         <category>Social software</category>
         <pubDate>Fri, 29 May 2009 06:06:06 -0500</pubDate>
      </item>
            <item>
         <title>We&apos;re hiring (May/June 2009 edition)</title>
         <description><![CDATA[<p>Interested in a senior position on the growing, hard-working, award-winning Nature.com team?  If so, we have two vacancies that you should check out: <a href="http://www.nature.com/naturejobs/science/jobs/98207-Head-of-Online-Communities">Head of Online Communities</a> and <a href="http://www.nature.com/naturejobs/science/jobs/98208-Assistant-Publisher">Assistant Publisher</a>.</p>

<p>Enquiries and CVs to the contact address given in the ads, or to me via <a href="http://network.nature.com/people/timo/profile">my Nature Network page</a>.</p>]]></description>
         <link>http://blogs.nature.com/wp/nascent/2009/05/were_hiring_mayjune_2009_editi_1.html</link>
         <guid>http://blogs.nature.com/wp/nascent/2009/05/were_hiring_mayjune_2009_editi_1.html</guid>
         <category>Recruiting</category>
         <pubDate>Fri, 29 May 2009 05:01:23 -0500</pubDate>
      </item>
            <item>
         <title>Gobbledygook Interview</title>
         <description><![CDATA[<p><a href="http://network.nature.com/people/mfenner/blog/2009/05/25/oai-pmh-interview-with-tony-hammond"><img alt="gobbledygook-interview.jpg" src="http://blogs.nature.com/wp/nascent/images/gobbledygook-interview.jpg" class="mt-image-none" style="" height="290" width="472" /></a><br />I was interviewed by <a href="http://network.nature.com/people/mfenner/profile">Martin Fenner</a>, a Clinical Fellow in Oncology at Hannover Medical School, for his column <a href="http://network.nature.com/people/mfenner/blog/2009/05/25/oai-pmh-interview-with-tony-hammond">Gobbledygook</a> on <a href="http://network.nature.com/">Nature Network</a>. The interview is mainly about our new <a href="http://www.nature.com/oai/">OAI-PMH service</a> (which I blogged about earlier <a href="http://blogs.nature.com/wp/nascent/2009/05/a_catalog_for_naturecom.html">here</a>) but also touches on the broader picture of <a href="http://www.nature.com/libarries/public_interfaces.html">Public Interfaces</a>.</p>]]></description>
         <link>http://blogs.nature.com/wp/nascent/2009/05/gobbledygook_interview.html</link>
         <guid>http://blogs.nature.com/wp/nascent/2009/05/gobbledygook_interview.html</guid>
         <category>Technology</category>
         <pubDate>Tue, 26 May 2009 04:03:29 -0500</pubDate>
      </item>
            <item>
         <title>Wolfram|Alpha has potential, but I can&apos;t see scientists using it for a while yet</title>
         <description><![CDATA[<p><img alt="hal9000.jpg" src="http://blogs.nature.com/wp/nascent/hal9000.jpg" width="250" height="187" style='float: left; margin: 0px 10px 10px 0px;' />Wolfram|Alpha should have launched officially by the time you read this, though it has been live since Friday evening. The execution is slick. The different result visualizations are a great idea. It's loaded up with cool widgets and APIs. Most of the time the servers don't fall over (despite some <a href='http://twitter.com/rasmus/status/1814055437'>glaring security holes</a>). To quote FriendFeeder <a href='http://friendfeed.com/iddux'>Iddo Friedberg</a> it's "a free, somewhat simple interface to Mathematica". Free for personal, non-commercial use, anyway. If you've got any questions about the GDP of Singapore then wolframalpha.com is <i>the</i> place to go.</p>

<p>I think that it's a very interesting project and that it's important to bear in mind that as the homepage says:</p>

<blockquote>Today's Wolfram|Alpha is the <b>first step</b> in an ambitious, long-term project to make all systematic knowledge immediately computable by anyone</blockquote> (emphasis mine)

<p>WA certainly has lots of potential but was anybody who used it over the weekend <i>not</i> left mildly let down? You'd have thought that we'd all have learned not to believe interweb hype after the <a href='http://bits.blogs.nytimes.com/2008/05/12/powerset-debuts-with-search-of-wikipedia/'>Powerset</a> and <a href='http://www.cuil.com'>Cuil</a> launches but even if you took all the pre-launch media guff with a liberal sprinkling of salt it was hard not to expect much from Alpha. A breathless Andrew Johnson suggested that it was "the biggest internet revolution for a generation" in <a href='http://www.independent.co.uk/life-style/gadgets-and-tech/news/an-invention-that-could-change-the-internet-for-ever-1678109.html'>The Independent</a>: "Wolfram Alpha has the potential to become one of the biggest names on the planet". </p>

<p>Personally I was disappointed because I'd been expecting the wrong thing. I'd assumed that WA was akin to <a href='http://en.wikipedia.org/wiki/Cyc'>Cyc</a>, which is a computational engine that takes a large manually curated database of "common sense" facts and relations and uses it to infer new knowledge. For example: searching photos for "someone at risk for skin cancer" might return a photo captioned "girl reclining on a beach". Reclining at the beach implies suntanning and suntanning implies a risk of skin cancer. </p>

<p>A few years back a Paul Allen venture called <a href='http://www.projecthalo.com/halotempl.asp?cid=04&newsid=4'>Project</a> <a href='http://www.researchchannel.org/prog/displayevent.aspx?rID=3558'>Halo</a> took the engine behind Cyc and taught it facts and rules from chemistry textbooks; it took a lot of time and money but the resulting system had a good go at answering college level chemistry exam questions.</p>

<p>It turns out that WA doesn't do anything like this. One of the most interesting posts about the system that I've read comes from <a href='http://www.semanticuniverse.com/blogs-i-was-positively-impressed-wolfram-alpha.html'>Doug Lenat</a> who perhaps not coincidentally is the founder of Cyc. Lenat was impressed by WA but notes that it's a different beast altogether:</p>

<blockquote>It does not have an ontology, so what it knows about, say, GDP, or population, or stock price, is no more nor less than the equations that involve that term... [it's] able to report the number of cattle in Chicago but not (even a lower bound on) the number of mammals because it doesn't know taxonomy and reason that way</blockquote>

<p>If a connection isn't represented by a manually curated equation it isn't represented at all. Apparently the Mathematica theorem prover is currently turned off as it's too computationally expensive.</p>

<blockquote>One example of this is: "How old was Obama when Mitterrand was elected president of France?"  It can tell you demographic information about Obama, if you ask, and it can tell you information about Mitterrand (including his ruleStartDate), but doesn't make or execute the plan to calculate a person's age on a certain date given his birth date, which is what is being asked for in this query.</blockquote>
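
<p>For reference, the calculation Lenat describes is small once the two dates are known. This sketch supplies them by hand (Obama's date of birth and the date of the 1981 French presidential run-off are added here, not taken from the quote), and the function name is hypothetical:</p>

```python
from datetime import date

def age_on(birth, when):
    """Whole years elapsed between `birth` and `when`."""
    had_birthday = (when.month, when.day) >= (birth.month, birth.day)
    return when.year - birth.year - (0 if had_birthday else 1)

obama_born = date(1961, 8, 4)
mitterrand_elected = date(1981, 5, 10)
age = age_on(obama_born, mitterrand_elected)  # 19
```

<p>The arithmetic is trivial; the hard part is working out from the question that this is the plan to execute.</p>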

<p>It might seem harsh to criticize WA for not being what people (OK, I) wanted it to be but bear in mind that Wolfram's About and FAQ pages suggest that WA is an amazing leap forward that brings "expert level knowledge" to everybody and "implements every known model, method, and algorithm" - it's not like they were managing expectations particularly well.</p>

<p>Even if the computational inference part is lacking the system is still potentially useful as a well presented structured data almanac - but I'm not convinced that it's a winner for life sciences data.</p>

<p><b>Wolfram|Alpha for genetics questions</b></p>

<p>If I search for "DISC1" I get back information about the human gene (genetics coverage in WA is lacking, despite Stephen Wolfram using a sequence search in the video demo. Only the human genome is available). It tells me the transcripts, reference sequence, the coordinates of DISC1, protein functions and a list of nearby genes. </p>

<p>That data is useless without proper citations, though. What genome assembly release are the gene coordinates on? Are the "nearby genes" nearby on the same assembly, or do they come from a different source? Who and what predicted the transcripts, and what data did they use? Were the protein functions confirmed by work in the lab or just predicted by algorithm (if so, what's the confidence score)?</p>

<p>The "sources" link at the bottom provides a bunch of high level papers describing different genome databases but doesn't specifically match these to elements of data on the page: furthermore there's a disclaimer suggesting that actually the data could be from somewhere else entirely that isn't listed. Not much help.</p>

<p>What happens with contradictory data? The GDP of North Korea varies depending on who I ask. How does WA - or rather whoever curates that data for WA - decide which version of the answer to show? </p>

<p>I'm also worried about how current the data is. Lenat mentions that:</p>

<blockquote>In a small number of cases, he also connects via API to third party information, but mostly for realtime data such as a current stock price or current temperature.  Rather than connecting to and relying on the current or future Semantic Web, Alpha computes its answers primarily from [Wolfram's] own curated data to the extent possible; [Stephen Wolfram] sees Alpha as the home for almost all the information it needs, and will use to answer users' queries.</blockquote>

<p>I can see why you wouldn't want to rely on connections to third party data sources for anything that looks like a search engine; users expect a quick response. But in fast moving scientific fields the systematic knowledge that's useful to researchers isn't static like dates of birth or melting points - datapoints get updated, corrected and deleted all the time. Does Wolfram bulk import whole datasets regularly? If I correct an error in a record at the NCBI when will Wolfram pick it up?</p>

<p>Can a monolithic, generalized datastore run by Wolfram staff work as well as smaller specialized databases run by experts? What's the incentive for the specialized databases to release data to Wolfram in the first place, given that WA will be a commercial product? </p>

<p>(For more science-tinged coverage there's lots of Wolfram|Alpha chatter on <a href='http://friendfeed.com/mndoci/ee30b0ad/some-early-wolfram-alpha-driven-thoughts'>Friendfeed</a> and a <a href='http://friendfeed.com/wolfram-alpha-in-ls'>new room</a> dedicated to collecting life sciences feedback for Wolfram, and <a href='http://mndoci.com/blog/2009/05/16/some-early-wolframalpha-driven-thoughts/'>Deepak</a> has a good blog post out.)
</p>]]></description>
         <link>http://blogs.nature.com/wp/nascent/2009/05/wolframalpha_has_potential_but.html</link>
         <guid>http://blogs.nature.com/wp/nascent/2009/05/wolframalpha_has_potential_but.html</guid>
         <category>Technology</category>
         <pubDate>Mon, 18 May 2009 09:11:44 -0500</pubDate>
      </item>
            <item>
         <title>Public Interfaces</title>
         <description><![CDATA[<p><a href="http://www.nature.com/libraries/public_interfaces/index.html" border="0"><img alt="public-interfaces-menu.jpg" src="http://blogs.nature.com/wp/nascent/images/public-interfaces-menu.jpg" class="mt-image-left" style="margin: 0pt 20px 20px 0pt; float: left;" height="435" width="195" /></a>Last week we added a new section under our Librarian Gateway - <a href="http://www.nature.com/libraries/public_interfaces/index.html">Public Interfaces</a>.<br /><br />This is the beginning of a general documentation facility to cover all the various interfaces we are using on the nature.com platform for discovery and linking.<br /><br />The aim is to consolidate technical documentation on the sometimes bewildering array of acronyms and to provide additional references to sources of information for users. Although this is listed under the Librarian Gateway, the technologies listed here will be of interest not only to digital librarians but also to other communities.<br /><br />We have kicked this section off with some well-established routes into nature.com: DOI and OpenURL. We have also added some of the newer means for disclosing metadata through self-describing content: META tags for HTML, and XMP for PDF. And we've also provided details on our new OAI-PMH service, as well as saying something about our long-standing stable of RSS feeds.<br /><br />Suggestions on what should be added to these pages, or other changes we could make, would be more than welcome.
</p>]]></description>
         <link>http://blogs.nature.com/wp/nascent/2009/05/public_interfaces.html</link>
         <guid>http://blogs.nature.com/wp/nascent/2009/05/public_interfaces.html</guid>
         <category>Publishing</category>
         <pubDate>Mon, 18 May 2009 03:18:14 -0500</pubDate>
      </item>
            <item>
         <title>Google, Obama and God: good. H1N1, Elsevier and Merck: bad.</title>
         <description><![CDATA[<p>(see <a href="http://blogs.nature.com/wp/nascent/2009/05/sentiment_analysis_on_science.html">the previous post</a> for background on these tables) </p>

<h2>Entities associated with negative emotions</h2>
<table border='2' cellspacing='0' cellpadding='5'>
<tr><td>Term</td><td>Sum score</td><td>Blogs mentioning entity</td></tr><tr><td><a href='#?details=H1N1'>H1N1</a></td><td>-7.50</td><td>15</td></tr><tr><td><a href='#?details=Elsevier'>Elsevier</a></td><td>-4.52</td><td>5</td></tr><tr><td><a href='#?details=Merck'>Merck</a></td><td>-3.79</td><td>8</td></tr><tr><td><a href='#?details=CDC'>CDC</a></td><td>-2.34</td><td>7</td></tr><tr><td><a href='#?details=Dana'>Dana</a></td><td>-2.00</td><td>2</td></tr><tr><td><a href='#?details=Japan'>Japan</a></td><td>-2.00</td><td>2</td></tr><tr><td><a href='#?details=McCaffery'>McCaffery</a></td><td>-2.00</td><td>2</td></tr><tr><td><a href='#?details=WSJ'>WSJ</a></td><td>-2.00</td><td>2</td></tr><tr><td><a href='#?details=Sci'>Sci</a></td><td>-1.98</td><td>3</td></tr>
<tr><td><a href='#?details=Jacqui Smith'>Jacqui Smith</a></td><td>-1.54</td><td>2</td></tr><tr><td><a href='#?details=James Corbett'>James Corbett</a></td><td>-1.51</td><td>2</td></tr><tr><td><a href='#?details=Israel'>Israel</a></td><td>-1.50</td><td>2</td></tr><tr><td><a href='#?details=iPhone'>iPhone</a></td><td>-1.33</td><td>2</td></tr><tr><td><a href='#?details=Alzheimer'>Alzheimer</a></td><td>-1.19</td><td>2</td></tr><tr><td><a href='#?details=HIV'>HIV</a></td><td>-1.12</td><td>2</td></tr></table>

<p><a href='http://en.wikipedia.org/wiki/Influenza_A_virus_subtype_H1N1'>H1N1</a> is the subtype of the "swine flu" influenza virus. </p>

<p>Elsevier and Merck published <a href='http://www.earlham.edu/~peters/fos/2009/05/elsevier-and-merck-published-fake.html'>fake journals</a> in Australia. "Dana" is <a href='http://blogs.discovermagazine.com/badastronomy/2009/05/06/giving-vaccines-a-shot-in-the-arm/'>Dana McCaffery</a>, who tragically died of whooping cough because of the low vaccination rates in New South Wales.</p>

<p>"WSJ" is the Wall Street Journal which published a flawed explanation of quantum entanglement that got <a href='http://scienceblogs.com/principles/2009/05/the_wall_street_journal_gets_e.php'>picked up by physics bloggers</a>. Jacqui Smith is the UK's home secretary, currently <a href='http://blogs.nature.com/news/thegreatbeyond/2009/05/uk_will_retain_dna_of_innocent.html'>under fire</a> for (amongst other things) keeping the DNA of innocent people on file for six years after their arrest.</p>

<p>James Corbett is the teacher in California who told his students that creationism was "superstitious nonsense" - he was <a href='http://www.wired.com/wiredscience/2009/05/creationism-dig-violated-students-rights/'>later sued by a student</a> who believed that their First Amendment rights had been violated.</p>

<p>The iPhone is in there because of <a href='http://arstechnica.com/apple/news/2009/05/developers-worried-by-apple-change-to-app-store-review-policy.ars'>changes to App Store policies</a> that may impact smaller developers.</p>

<p>Positive emotions after the break.</p>]]></description>
         <link>http://blogs.nature.com/wp/nascent/2009/05/google_obama_and_god_good_h1n1.html</link>
         <guid>http://blogs.nature.com/wp/nascent/2009/05/google_obama_and_god_good_h1n1.html</guid>
         <category>Blogging</category>
         <pubDate>Tue, 12 May 2009 11:00:08 -0500</pubDate>
      </item>
            <item>
         <title>Sentiment analysis on science blogs</title>
         <description><![CDATA[<p><img alt="FeelingsLP.jpg" src="http://blogs.nature.com/wp/nascent/FeelingsLP.jpg" width="200" height="200" style='margin-right: 10px;' align="left" /> We've been thinking about new features for <a href='http://blogs.nature.com'>Nature.com Blogs</a> recently, after spending a lot of time on the back end doing boring yet vital things like enabling trackbacks for journal articles on nature.com.</p>

<p>One particularly cool new feature (potentially) is <a href='http://en.wikipedia.org/wiki/Sentiment_analysis'>sentiment analysis</a>. Nature.com Blogs already performs entity extraction, pulling out all of the names, places and things mentioned in each blog post. We use this to cluster posts about the same topic together in the <a href='http://blogs.nature.com/stories'>"stories" section</a>. </p>

<p>Sentiment analysis tries to give emotional context to entities. For example, if I blog:</p>

<p>"I love Biology. It rules, Physics drools"</p>

<p>and Nature.com Blogs processes my post then it might store the following metadata alongside it:</p>

<pre>
&lt;entities&gt;
   &lt;entity name=&quot;Biology&quot; emotion=&quot;Positive&quot; score=&quot;0.6&quot; /&gt; 
   &lt;entity name=&quot;Physics&quot; emotion=&quot;Negative&quot; score=&quot;0.3&quot; /&gt;
&lt;/entities&gt;
</pre>

<p>... here "Biology" and "Physics" are the entities; each has an emotion associated with it in the text. There are more positive emotions associated with "biology" than there are negative emotions associated with "physics" - that's the score part.</p>
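
<p>One way per-post metadata like this could be combined across many posts is to add up signed scores per entity. This is a sketch under assumptions: the signing convention, the function name and the second sample post are hypothetical, and it may not be how the proof of concept computes its totals:</p>

```python
from collections import defaultdict

def sum_scores(posts):
    """Add up signed scores per entity and count the posts mentioning each."""
    totals = defaultdict(float)
    mentions = defaultdict(int)
    for entities in posts:
        for name, emotion, score in entities:
            sign = 1 if emotion == "Positive" else -1  # assumed convention
            totals[name] += sign * score
            mentions[name] += 1
    return dict(totals), dict(mentions)

posts = [
    [("Biology", "Positive", 0.6), ("Physics", "Negative", 0.3)],  # the post above
    [("Physics", "Negative", 0.5)],                                # a made-up second post
]
totals, mentions = sum_scores(posts)
```

<p>With these two posts "Physics" ends up with a negative total from two mentions and "Biology" with a positive total from one.</p>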

<p>Sentiment analysis is still a young field and frankly it gets things wrong a lot of the time. It's also difficult to find a system that can do both entity extraction <i>and</i> sentiment analysis properly - to build a proof of concept I had to use a combination of <a href='http://developer.yahoo.com/search/content/V1/termExtraction.html'>Yahoo! Term Extraction</a> and <a href='http://www.openamplify.com'>OpenAmplify</a>. </p>

<p>Having said that, I think results over large datasets are promising. I've run a couple of thousand posts through the proof of concept system and compiled lists of the entities most strongly associated with positive and negative emotions in science blogs this week (published in the next couple of posts). Is this information useful? Interesting? Fun? Misleading? Any suggestions for how it might be presented are welcome!</p>]]></description>
         <link>http://blogs.nature.com/wp/nascent/2009/05/sentiment_analysis_on_science.html</link>
         <guid>http://blogs.nature.com/wp/nascent/2009/05/sentiment_analysis_on_science.html</guid>
         <category>Blogging</category>
         <pubDate>Tue, 12 May 2009 09:46:54 -0500</pubDate>
      </item>
            <item>
         <title>A Catalog for Nature.com</title>
         <description><![CDATA[<p>We're pleased to announce that Nature.com now has an <a href="http://www.nature.com/oai ">OAI-PMH interface</a>. This service implements the <a href="http://www.openarchives.org/OAI/openarchivesprotocol.html">Protocol for Metadata Harvesting</a> from the <a href="http://openarchives.org/">Open Archives Initiative</a>. This means that the Nature.com platform can now be queried by item, by title or by date range and that structured data records will be returned. Articles from over 150 titles can be accessed, dating back to 1869 in the case of <a href="http://www.nature.com/nature">Nature</a> magazine.</p>

<p><img alt="oai-blogs-pam.png" src="http://blogs.nature.com/wp/nascent/images/oai-blogs-pam.png" width="613" height="403" /></p>

<p>Queries are made with a simple request URL (via HTTP GET, although POST is also supported) according to the OAI-PMH protocol. Result sets are in XML using formats defined by W3C XML Schema: either the OAI-PMH base format for metadata exchange, i.e. <a href="http://dublincore.org/">Dublin Core</a>, or an enhanced bibliographic metadata format, i.e. <a href="http://www.idealliance.org/industry_resources/intelligent_content_informed_workflow/about_the_prism_aggregator_message">PRISM Aggregator Message</a> (PAM) format. </p>
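
<p>As a rough illustration of what such a request URL looks like, the sketch below builds a date-range query using the standard OAI-PMH <code>ListRecords</code> verb and the Dublin Core <code>oai_dc</code> prefix against the service endpoint listed in this post. The function name is hypothetical, the dates are arbitrary, and the code only constructs the URL without sending it:</p>

```python
from urllib.parse import urlencode

ENDPOINT = "http://www.nature.com/oai/request"  # service endpoint given in this post

def list_records_url(from_date, until_date, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a date range."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": from_date,
        "until": until_date,
    }
    return ENDPOINT + "?" + urlencode(params)

url = list_records_url("2009-05-01", "2009-05-08")
print(url)
```

<p>The prefix to pass for PAM records is not given here, so the sketch sticks to Dublin Core; a <code>ListMetadataFormats</code> request is the protocol's way of finding out what a repository offers.</p>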

<p>In PAM format the results are very similar to the standard article descriptions published in our RSS feeds, or embedded directly within content entities (either as META tags in HTML, or as XMP packets in PDF). Further details on our use of PAM are given in a <a href="http://www.crossref.org/CrossTech/2009/05/post_2.html">related post</a> on CrossTech.</p>

<p>The Nature.com OAI-PMH service has two access points:</p>
<dl><dt><i>User interface:</i></dt><dd><a href="http://www.nature.com/oai ">http://www.nature.com/oai</a></dd>
<dt><i>Service endpoint:</i></dt><dd><a href="http://www.nature.com/oai/request">http://www.nature.com/oai/request</a></dd></dl>

<p>Special credits go to Jeff Young of OCLC for creating the excellent open-source <a href="http://www.oclc.org/research/software/oai/cat.htm">OAICat</a> software package, and to Nawab Siddiqui of Nature Publishing Group for doing all the heavy lifting in implementing this service for Nature.com.<br />
</p>]]></description>
         <link>http://blogs.nature.com/wp/nascent/2009/05/a_catalog_for_naturecom.html</link>
         <guid>http://blogs.nature.com/wp/nascent/2009/05/a_catalog_for_naturecom.html</guid>
         <category>Technology</category>
         <pubDate>Fri, 08 May 2009 04:28:49 -0500</pubDate>
      </item>
            <item>
         <title>Walls come tumbling down</title>
         <description><![CDATA[<p>A short article of mine will be coming out any day in <i><a href="http://www.alpsp.org/ngen_public/default.asp?ID=198">Learned Publishing</a></i>.  Below is my original draft (so please excuse any typos).  I'll post a link to the final article once it appears online.</p>

<p><b>Update 30/3/09:</b> <a href="http://dx.doi.org/10.1087/2009210">Here's a link to the official published version.</a></p>]]></description>
         <link>http://blogs.nature.com/wp/nascent/2009/03/walls_come_tumbling_down_1.html</link>
         <guid>http://blogs.nature.com/wp/nascent/2009/03/walls_come_tumbling_down_1.html</guid>
         <category>Publishing</category>
         <pubDate>Tue, 17 Mar 2009 05:39:38 -0500</pubDate>
      </item>
            <item>
         <title>Nature Darwin Debate 2: What Price Biodiversity?</title>
         <description><![CDATA[<p>Quick plug for a very interesting Second Life event coming up on Monday: the second in the Nature Darwin Debate series. </p>

<p>Panelists James Lovelock, Michael Meacher MP and Sir Crispin Tickell will join chair Ehsan Masood for a live debate entitled <strong>What Price Biodiversity?</strong> <a href="http://network.nature.com/people/joannascott/blog/2009/02/04/event-the-nature-darwin-debate-are-we-still-evolving">As before</a>, the debate will be held at Kings Place, London (just behind the Nature building) and will be live streamed in Second Life for anyone who can't attend in person. </p>

<p>The last debate was excellent, both in RL and SL, and this one looks like it will be just as good. It is on <strong>Monday, 7pm GMT / 12pm Pacific time</strong> (a shorter time difference than normal: our clocks go forward this weekend, Europe's not until the end of the month) and all are very welcome! For all the details, <a href="http://network.nature.com/people/joannascott/blog/2009/03/03/the-nature-darwin-debate-2-what-price-biodiversity">see our Second Life blog.</a></p>]]></description>
         <link>http://blogs.nature.com/wp/nascent/2009/03/nature_darwin_debate_2_what_pr.html</link>
         <guid>http://blogs.nature.com/wp/nascent/2009/03/nature_darwin_debate_2_what_pr.html</guid>
         <category>Virtual reality</category>
         <pubDate>Sun, 08 Mar 2009 00:27:07 -0500</pubDate>
      </item>
            <item>
         <title>Scientists, Unconferences and Culture Clash</title>
         <description><![CDATA[<p>Someone I know recently emailed me with the following question:</p>

<blockquote>I'm co-organizing a biomedical/healthcare-themed unconference to be held later this year, and culture clash has come up as an issue. Were you involved in any of the SciFoo events? Can you offer any advice for how to approach this? Any hard lessons learned?</blockquote>

<p>Sadly I've not been to a SciFoo event yet, but I have been to plenty of scientific conferences and one or two geek-driven unconferences. From what I hear, there are indeed some differences that emerge when unconferencing with scientists compared to unconferencing with geeks. For a start, an important part of a scientist's career development revolves around making well-argued presentations of their work to their peers in the crucible of the conference. Add in the lecturing role and you have an individual who is very used to standing up in a room and presenting the complete story. </p>

<p>One of the goals of an unconference is perhaps to tease apart the complete and finished story, to look at the spaces in between and to be open to blue-sky thinking. This may lead to a slight mismatch between the kind of conversations that the organizers hope will happen at an unconference and the mode of communication that a scientific group brings with them to the meeting. </p>

<p>I know that the SciFoo invite is very specific about this, and through application of the <a href="http://www.chathamhouse.org.uk/about/chathamhouserule/">Chatham House Rule</a> an environment of open discussion is fostered. </p>

<p>I'm sure many of the people out there reading this blog have some input into the question though, so I thought I would post here and see if any of you enlightened science geeks might have some advice for my friend.</p>]]></description>
         <link>http://blogs.nature.com/wp/nascent/2009/02/scientists_unconferences_and_c.html</link>
         <guid>http://blogs.nature.com/wp/nascent/2009/02/scientists_unconferences_and_c.html</guid>
         <category></category>
         <pubDate>Thu, 26 Feb 2009 06:52:52 -0500</pubDate>
      </item>
            <item>
         <title>Commenting on scientific articles (PLoS edition)</title>
         <description><![CDATA[<p>I've been taking a look at the comments left on PLoS ONE from inception until August '08 (data courtesy of <a href='http://www.scienceblogs.com/clock/2008/08/postpublication_peerreview_in.php'>Bora</a>). Last week's <a href='http://blogs.nature.com/wp/nascent/2009/02/user_generated_content_survey.html'>crowdsourcing</a> paid off and all of the categorization work got done really quickly - thank you if you participated! Pedro Beltrao and Lindsay Morgan were the random reward winners and will be receiving some magnificent Nature-branded marketing crapola shortly.</p>

<p><img alt="plos_breakdown.png" src="http://blogs.nature.com/wp/nascent/plos_breakdown.png" width="400" height="386" /></p>

<h2>Summary</h2>
<ul>
<li>18% of PLoS ONE papers have reader- or author-submitted comments</li>
<li>39% if you count comments added by editors (usually reviewers' comments)</li>
<li>Very few comments are of the 'omg, wow' variety (as opposed to comments on blogs - this one excepted, obviously)</li>
<li>Authors are responsible for a high percentage (~40%) of user-submitted comments</li>
<li>17% of user-submitted comments contain interpretation or a journal-club-style precis</li>
<li>13% of user-submitted comments are direct criticism</li>
<li>11% are direct questions or requests for clarification</li>
<li>These %s are similar to <a href='http://blogs.nature.com/wp/nascent/2008/07/who_leaves_comments_on_scienti_1.html'>what we saw in the BMC dataset</a></li>
<li>The trackbacks protocol is inadequate for picking up blog chatter about papers</li>
</ul>

<p>(more below the fold)</p>]]></description>
         <link>http://blogs.nature.com/wp/nascent/2009/02/commenting_on_scientific_artic.html</link>
         <guid>http://blogs.nature.com/wp/nascent/2009/02/commenting_on_scientific_artic.html</guid>
         <category>Publishing</category>
         <pubDate>Wed, 11 Feb 2009 11:45:23 -0500</pubDate>
      </item>
            <item>
         <title>Crowdsourcing comment categorization, pt 2</title>
         <description><![CDATA[<div style='text-align: center;'>
<img alt="swag1.jpg" src="http://blogs.nature.com/wp/nascent/swag1.jpg" width="225" height="300" /></div>

<p>As a reminder, if you haven't already, please do check out <a href='http://ploscomments.appspot.com'>ploscomments.appspot.com</a> and categorize some comments. Thanks to <a href='http://twitter.com/grace_baynes'>Grace</a> we've gathered an impressive collection of swag for the one or two lucky contributors who will be selected at random once the experiment is finished (and we've removed bogus annotations). </p>

<p>Check it: a USB laser fountain pen from <a href='http://www.nature.com/nmat/index.html'>Materials</a>, <a href='http://www.facebook.com/home.php?#/group.php?gid=8274710679'>It's in my Nature.com</a> T-shirts, <a href='http://www.nature.com/secondnature/archive_pages/2009_02_09.html'>Darwin anniversary</a> Post-its and pens... all for clicking on some buttons.</p>

<p><b><a href='http://ploscomments.appspot.com'>http://ploscomments.appspot.com</a></b></p>]]></description>
         <link>http://blogs.nature.com/wp/nascent/2009/02/crowdsourcing_comment_categori.html</link>
         <guid>http://blogs.nature.com/wp/nascent/2009/02/crowdsourcing_comment_categori.html</guid>
         <category>Social software</category>
         <pubDate>Thu, 05 Feb 2009 09:21:04 -0500</pubDate>
      </item>
      
   </channel>
</rss>
