
BarCamb Cambridge

BarCamp is an ad-hoc gathering born from the desire for people to share and learn in an open environment. The name of the event is an homage to Foo Camp, but unlike its bigger brother, BarCamps are open to anyone who wants to go along. If you go to the BarCamp homepage you’ll see a list of upcoming meetings being held across the globe. Clearly the desire for people to get together to discuss cool ideas is strong.

With all that in mind, last Friday saw a BarCamp take place in my own backyard. Matt Wood from the Sanger Institute organised BarCamb: BarCamp Cambridge. I and about 30 other people gathered on Friday morning at the Sanger Institute. With coffee and cookies on hand, we crowded around the whiteboard for the Foo Camp ritual of scrambling to fill in the schedule slots with talk ideas. Two rooms were available, but in the end there was just enough time in the day to fit all of the sessions into one room, so that everyone could hear all of the discussions. I took notes on most of the talks, but I may have missed one or two, so omission here is no indicator of lack of quality. If you want the full skinny on the meeting I’d advise going along to the next one you can make it to. Anyway, a few days on, here is what I can reconstruct from a very rewarding day:

Matt talks about microformats and science

The rough theme of the meeting was tools for science, though there was a nice diversity in the topics presented. Matt opened with a discussion of the semantic web for science. The gist of his argument is that there are two types of semantic web. There is the Semantic Web with capitals, which comes with all of the specifications in place: full RDF and support for all of the machinery that goes with it. For the short to medium term he identified two significant problems with this. The first is that most researchers don’t have the inclination to learn all of the machinery needed to output data in this format; it is miraculous enough to get them to produce well-formatted HTML (but more on that later). The second is that getting funding in science for Semantic Web projects is hard; funding bodies, outside of computer science, just don’t want to go there at the moment. His solution is to use the lowercase semantic web. This means adding minimal amounts of microformat markup to HTML documents, and creating a marketplace of markup. If your format is good, it can gain de facto acceptance in a me-too way. Put it out there because it is easy to put it out there, and if it is good it will be used. (Bioformats.org, from Matt, is an attempt to do just that with microformats for biology.) Standardisation can come later, or not. In the Q&A a danger with this approach was pointed out: the domain experts may lose control of the translation of a de facto standard into a formal ontology if, when that process happens, they leave it to the computer science people. Later in the day this issue came up again.
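To make the lowercase approach concrete, here is a minimal sketch of the kind of thing Matt was describing: ordinary HTML sprinkled with agreed class names, plus a few lines of Python to scrape the data back out. The “specimen” format and its field names are my own invention for illustration, not anything from bioformats.org.

```python
from html.parser import HTMLParser

# A hypothetical "specimen" microformat: plain HTML with agreed class
# names. The class names here are illustrative, not a published standard.
DOC = """
<div class="specimen">
  <span class="organism">Danio rerio</span> collected on
  <span class="date">2007-07-20</span> at
  <span class="location">Cambridge, UK</span>.
</div>
"""

FIELDS = {"organism", "date", "location"}

class MicroformatParser(HTMLParser):
    """Collect the text of any element whose class is a known field."""
    def __init__(self):
        super().__init__()
        self.current = None   # the field we are inside, if any
        self.record = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in FIELDS:
            self.current = cls

    def handle_data(self, data):
        if self.current:
            self.record[self.current] = data.strip()
            self.current = None

parser = MicroformatParser()
parser.feed(DOC)
print(parser.record)
# {'organism': 'Danio rerio', 'date': '2007-07-20', 'location': 'Cambridge, UK'}
```

The point is that the markup costs authors almost nothing beyond well-formed HTML, and the parsing needs nothing beyond the standard library.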

Laura James, AlertMe and the network of things

Laura James from AlertMe.com spoke next. The company she works for is creating a consumer product that attempts to open up the internet of things. By providing a set of relatively cheap sensors connected to a hub that talks to a server, the owner can set up whatever behaviour they can think of for the network. Sensors include motion sensors, accelerometers, light detectors and sensors that can act as switches. The technology is based on a very low-powered wireless network protocol called ZigBee. The hub runs Linux with Python on top of it, and the basic chip is small.
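Since the hub runs Python, it’s easy to imagine the kind of scriptable behaviour Laura was describing. Here is a toy sketch of an event-to-action rule engine; the event fields and the decorator API are my guesses for illustration, not AlertMe’s actual interface.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Event:
    sensor_id: str   # e.g. "hallway-motion" (name invented)
    kind: str        # "motion", "light", "accelerometer", ...
    value: float

Rule = tuple[Callable[[Event], bool], Callable[[Event], None]]
rules: list[Rule] = []

def when(predicate):
    """Register an action to run whenever the predicate matches an event."""
    def register(action):
        rules.append((predicate, action))
        return action
    return register

@when(lambda e: e.kind == "motion" and e.sensor_id == "hallway-motion")
def alert(event):
    print(f"Motion in the hallway (value={event.value})")

def dispatch(event):
    """Run every registered action whose predicate matches the event."""
    for predicate, action in rules:
        if predicate(event):
            action(event)

dispatch(Event("hallway-motion", "motion", 1.0))
```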

The first app they are trying to sell is a home security application. There was some interesting discussion about how one could match a component in the system to its abstract representation in the web interface. The question was “how can I tell which one is the hallway monitor?”, to which someone replied, “it’s the one in the hallway”. “Ahh,” said the questioner, “you have obviously not lived in a house with children who like to move things around.” Luckily for such bedevilled patrons, one will be able to write on the sensors. Laura said that they are interested in hearing novel ideas for how this kit could be used. Someone suggested that it could be introduced to the lab bench to help with automating tasks, or with automatic data capture, enabling the open-notebook approach to science.

Simon Ford and bootstrapping hardware hacking

Simon Ford then talked about rapid prototyping with microcontrollers. I’ve never done any embedded programming, but Simon has created a platform on an embedded device that, when plugged into a computer, appears as a flash drive. Pulling executable programs onto the device allows them to run. He is also working on a web interface to the compiler for the processor. Right there in front of us he made the LEDs on the device blink. This is the “hello world” of embedded programming, and it usually takes three days to get working. He made a very salient point: reducing the chain of complexity in a process increases confidence in the result. When you go through a lot of steps to get a result (such as a flashing light on a chip), you have an intrinsic skepticism about it; every extra step is a potential barrier. By reducing the “hello world” of hardware from a three-day slog to two minutes, you create a system that people can have a lot more confidence in. He wants to lower the barriers so that software people can extend their ideas to hardware, and hardware people can bridge the gap back to software. With Simon’s lightweight system, developing programs for embedded devices could become something that can be taken into schools and other environments, allowing people with little experience of embedded programming to begin exploring this space.
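Because the device just mounts as a flash drive, the whole deployment step collapses to a file copy, something like the sketch below. The mount point and binary name here are made up for illustration; the workflow, not the paths, is the point.

```python
import shutil
from pathlib import Path

# The device shows up as an ordinary flash drive, so "deploying" a
# program is just a file copy. Both paths are invented for illustration.
BINARY = Path("blink_led.bin")   # output of the (web-based) compiler
DEVICE = Path("/media/MBED")     # wherever the board happens to mount

def deploy(binary: Path, device: Path) -> None:
    if not device.is_dir():
        raise SystemExit(f"Device not mounted at {device}")
    shutil.copy(binary, device / binary.name)
    print(f"Copied {binary.name}; reset the board to run it.")

deploy(BINARY, DEVICE)
```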

Alex Griekspoor and document workflow

Alex Griekspoor gave a presentation on Papers.app, a program he created with Tom Groothuis. It is designed to be the iTunes of PDF paper management, and runs on the Mac. What drove Alex and Tom to build it is their belief in the place of the dedicated desktop program in the scientific workflow. I’d seen Alex talk about this before at Nature, so my notes from the BarCamp are a bit scarce; I will note, however, that they have won three Apple design awards for their work.

Michael Dales and Quentin Stafford-Fraser, why two, no four, no eight screens are better than one

Michael Dales and Quentin Stafford-Fraser presented the work they are doing with Ndiyo. The Guardian has a nice article about what this non-profit company is doing. In a nutshell: multiple monitors, one computer. With Linux as the OS, the idea is to be able to provide a multi-seat internet cafe for the developing world in a box. To achieve this they use an on-chip video compressor that sends the display signal over either USB or ethernet. They demoed a working version of the system, and it was very impressive.

The benefits are legion, not least that this solution can provide up to a factor of 20 power saving per screen over the traditional model of having a PC for every screen.

Jeff Viet and open source CMS

Jeff Viet demoed Drupal as a content management system; during his presentation he created a new content type within Drupal. After experimenting with lots of CMS systems, his conclusion was that open source CMS systems beat the crap out of the commercial ones for what you get for your money, including support: if you have a budget, you can easily get in touch with the authors of the open source systems.

Peter Corbett and conversations with computers

Peter Corbett talked about teaching computers to understand text. He described himself as someone with a desk in both the computer lab and the chemistry lab. He now works on computational linguistics for chemistry, with the aim of automatically recognizing chemical names in chemistry papers and then auto-marking-up those papers, to supplement the markup from publishers. His system can also draw the chemical structures and overlay them, with annotations, on the paper. Among the problems they encounter are brand-new names, compact names, and names with extra hyphens; his program can deal with these kinds of things to a certain extent.

You can go from plain text to something like a connection table using this information-rich markup. The RSC is using the software, along with human clean-up, to create marked-up versions of chemistry papers; the hope is that you can then do semantic search over papers. One of the gems from his talk was a small natural language processing trick. Imagine we are interested in opiates. We could just ask Google for “opiates”, but if you take into account the structure of language and search for phrases like “opiates such as”, you will get a much better result. There are many patterns like this, known as Hearst patterns. He did a pass over abstracts on PubMed for these kinds of patterns to make a network of relationships, and it turns out that you can do reasoning over structures as well as processes using this analysis. A few bits of wisdom from his work: most of the information has come from biochemists rather than chemists, as more biologists are into open science and open databases; chemistry has been mostly captured by commercial interests, and it is hard to get free chemistry data. It is important to define what you are looking for, so that you can evaluate how well the software has done, and it’s important to remember that in a lot of text there is a difference between what you think the world looks like and how it is described in the literature.
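For the curious, the “such as” trick is easy to play with. Here is a minimal Python sketch of one Hearst pattern; the regex and the sample sentence are mine, and real systems use proper linguistic parsing rather than a regex.

```python
import re

# One Hearst pattern: "X such as A, B and C" suggests that A, B and C
# are kinds of X. The sentence below is invented for illustration.
TEXT = ("Opiates such as morphine, codeine and heroin act on "
        "receptors in the central nervous system.")

PATTERN = re.compile(
    r"(?P<category>\w+) such as (?P<members>[\w\s,]+?(?:,? and \w+))"
)

for match in PATTERN.finditer(TEXT):
    category = match.group("category")
    members = re.split(r",\s*|\s+and\s+", match.group("members"))
    for member in members:
        print(f"{member.strip()} is a kind of {category}")
# morphine is a kind of Opiates
# codeine is a kind of Opiates
# heroin is a kind of Opiates
```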

Lunch, mmmmm

Over lunch I chatted with James Smith, who is head of the internet team for Ensembl. One of the issues they are facing is the cost of storing and retrieving very large data sets. This is a growing issue in science in general, so I was pretty interested to hear how they currently deal with it. The machines that sequence DNA are now so fast that it is almost getting to the point where retrieving the stored data for a sequenced gene is slower than re-sequencing the gene. Almost, but not quite.

James Smith and a large ensemble of data

He gave a talk about Ensembl, which grew out of the human genome project about 8 years ago to prevent the commercialization of genomic data; the idea was to have an open source human genome. The Ensembl project takes the raw data from the genomes and adds other data to it, such as reference data from other experiments. The project now has two main products: the Ensembl code and the data produced by the project. They have data on 41 genomes, and the code is also used in contexts apart from this project; there are probably about 100 installed copies worldwide. It’s 1.5 million lines of Perl code, and there is a public MySQL interface to the data. There is also an archive system for looking at old data, and everything is in CVS. There are 35 species in Ensembl; human, mouse, rat and zebrafish were amongst the first to be sequenced, and then there are random mammals, for example hedgehogs, which to my mind is a good example of a random mammal. We learned that the claw of the platypus is poisonous. He described the data load they are running (very high, one of the biggest query sets in the world) and the hardware that is running it (very big, and lots of it). He said that they are moving over to AJAX because people don’t realize that items in the interface are buttons or forms. In spite of data-mining activities, most of the interactions with the site are human.
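As an aside, the public MySQL interface means anyone can poke at the data directly. A sketch of what that looks like from Python is below; the host and anonymous user follow Ensembl’s documented public access, but check their site for the current ports and database names before relying on this.

```python
import pymysql  # third-party: pip install pymysql

# Ensembl exposes its databases over a public, read-only MySQL server.
# Host and user below follow Ensembl's documented anonymous access;
# ports and database names change between releases, so check the docs.
conn = pymysql.connect(
    host="ensembldb.ensembl.org",
    user="anonymous",
    port=3306,
)
with conn.cursor() as cursor:
    # List the human core databases available on this server.
    cursor.execute("SHOW DATABASES LIKE 'homo_sapiens_core%'")
    for (name,) in cursor.fetchall():
        print(name)
conn.close()
```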

James Graham and more HTML

James Graham gave a tour of HTML5. He is just a member of the mailing list, but stressed that the list is open, and anyone who has an opinion on the way HTML5 should work is free to sign up. For me the most interesting nugget from this talk is that HTML4 is underspecified. No one knows how to parse it, because the parsing behaviour, especially for malformed markup, was never specified; all the voodoo of parsing is tied up inside current browser implementations, and this is why there are so many cross-browser compatibility problems. The main advantage HTML5 should give us is the ability to know how to parse both HTML5 and dirty HTML4. Another reason this effort is important is that there is a lot of information tied up in HTML that is never going to make it to XML, and if we don’t want to lose access to this information in the future we need the ability to abstract the understanding of these documents away from specific browsers. Of interest to publishers is that a caption element is being defined that can associate a description of an image with the image. There are a number of new elements, and the decision on which to include was based on a search for the most popular class names in HTML files in Google’s index.
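This is less abstract than it sounds: there are already library implementations of the HTML5 parsing algorithm, for example html5lib in Python, which turns arbitrary tag soup into a single well-defined tree. A quick sketch:

```python
import html5lib  # third-party: pip install html5lib

# Tag soup of the sort HTML4 never specified how to handle. An
# HTML5-style parser produces one well-defined tree from it, the same
# tree in every conforming implementation.
SOUP = "<p>an <b>unclosed <i>mess</b> of tags"

tree = html5lib.parse(SOUP, namespaceHTMLElements=False)
for element in tree.iter():
    print(element.tag, repr(element.text))
```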

Ian Mulvany and building on a social tool for science

I think it was at this point that I gave my talk. I’ve uploaded the slides to SlideShare, and you can see the raw notes that I took during some of the talks on my blog. I’ve also tagged a few pictures on Flickr with barcamb.

Tom Morris and why thinking doesn’t need to be difficult

The last talk of the day was by Tom Morris: a shotgun ride through the semantic web for hackers. Although this talk came at the very end of the day it was excellent, and it capped a great day. It went by pretty quickly, so I only caught a fraction of the wisdom that Tom was delivering. He started by asking what’s cool about microformats? I’m not sure we got to an answer, but it was clear that Tom had a small gripe with the process the microformats community has adopted for accepting new microformats. They ask: what problem does the microformat solve? Tom suggested that perhaps there is no problem at all. What problem did blogging solve? Or Twitter, for christ’s sake? (That got an appreciable laugh from the audience, which I think I twittered.) His point is that often no one knows what use something will have until it is popular. For example, Yahoo Pipes is not practical yet; it is a user experience nightmare and it doesn’t have a clearly defined purpose, but it is useful because there is a lot of data out there via RSS and it gives us room to play. The microformats route is not compatible with this. We should just be putting up data and letting people play; if it doesn’t get used then Darwin will clear up the mess. The amount of interesting data is greater than the possible number of microformats. At this point he walked through an example of creating your own microformat and providing a method, via GRDDL, of allowing anyone to use your data. If your data maps to an existing RDF schema then you can use that today, and if not, as long as it is based on URIs, you can make your own schema. For a free market everyone needs to take part, and the semantic web is only scary if you make it scary. There is an active community of people willing to help at GetSemantic.com.
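The “make your own schema from URIs” idea is lighter weight than it sounds. Here is a sketch using the Python rdflib library; the vocabulary namespace and the talk URI are invented for illustration.

```python
from rdflib import Graph, Literal, Namespace, URIRef

# Roll-your-own vocabulary: any URIs you control will do. This
# namespace and the talk URI are invented for illustration.
EX = Namespace("http://example.org/barcamb/vocab#")

g = Graph()
talk = URIRef("http://example.org/barcamb/talks/semweb-for-hackers")
g.add((talk, EX.speaker, Literal("Tom Morris")))
g.add((talk, EX.topic, Literal("the semantic web for hackers")))

# Serialise as Turtle so other people (and tools) can play with it.
print(g.serialize(format="turtle"))
```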

And that was it; the day was over. It was a lot of fun, and a lot of interesting stuff was presented. My apologies for anything I’ve missed or misrepresented.
