We were amazingly, stupendously lucky enough to have both David Weinberger and Jimmy Wales come to visit yesterday. After lunch they each gave a talk to assembled staff from Nature, and Macmillan and Holtzbrinck (our parent companies). My notes are below. Sadly there was no time to record the discussion that continued in the pub afterwards amid pints of warm beer and plates of deep-fried bar food.
David started with an overview of del.icio.us, which, based on a show of hands, most of the audience knew about. People can tag pages and find other people with similar interests. They can also track postings via RSS.
Technorati also aggregates information from multiple sites using tags.
There are three layers to this kind of information organisation:
- Meaning (tags)
- Web pages (links)
- People (users)
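The three layers above can be sketched in a few lines of Python. This is a minimal, hypothetical model (the usernames, URLs and tags are invented, not del.icio.us data): finding "other people with similar interests" reduces to intersecting tag sets.

```python
# Minimal sketch of the tags -> pages -> users layering.
# All data here are invented for illustration.
bookmarks = {
    # user -> {page_url: set of tags that user applied}
    "alice": {"example.org/folksonomy": {"tagging", "metadata"}},
    "bob":   {"example.org/folksonomy": {"tagging", "web"},
              "example.org/rss": {"syndication"}},
}

def users_for_tag(tag):
    """All users who applied a given tag to any page."""
    return {u for u, pages in bookmarks.items()
            if any(tag in tags for tags in pages.values())}

def similar_users(user):
    """Users sharing at least one tag with `user`."""
    my_tags = set().union(*bookmarks[user].values())
    return {u for u in bookmarks if u != user
            and my_tags & set().union(*bookmarks[u].values())}

print(sorted(users_for_tag("tagging")))  # ['alice', 'bob']
print(sorted(similar_users("alice")))    # ['bob']
```

The point of the layering is that none of these views is privileged: the same flat data can be entered from the tag, the page, or the person.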
Traditionally we have held three beliefs about organising information (my paraphrasing, so may not capture David’s intended meaning exactly):
- The world is constructed in a certain unambiguous way.
- We can categorise things neatly.
- We need experts to help us do this.
As a result, we’ve been sorting (abstract) knowledge the way that we’ve been sorting our (physical) laundry. For example, in the Bettmann Archive, which comprises 11 million photos:
- The photos themselves are held in a particular order
- Using a card catalogue, the information is organised in three other ways
But when everything is bits you can put things in as many categories as you like. E.g., it’s in Amazon’s interest to put the cameras it sells into as many applicable categories as possible in order to make them easier for people to find. In a clothes store, everything is arranged essentially at random and you can’t create a pile of everything relevant to you (i.e., everything that might fit you) as a starting point for choosing something. Online you can do exactly this.
Another key change is that the owner of the object no longer owns the metadata; users now do.
So now knowledge looks less like a tree and more like a pile of leaves. Instead of filtering, we include everything and use the metadata to sort it. In other words, we postpone classification until the user asks for information and then classify according to their needs.
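That "pile of leaves" idea can be made concrete. A hypothetical sketch (the items and tags are invented): rather than filing each item under one branch of a tree up front, keep a flat pile and build a classification only when a user asks for one.

```python
# "Include everything, classify on the way out": a flat pile of items,
# classified on demand against the user's query. Data are invented.
pile = [
    {"title": "Camera A", "tags": {"camera", "digital", "compact"}},
    {"title": "Camera B", "tags": {"camera", "digital", "slr"}},
    {"title": "Tripod",   "tags": {"accessory", "camera"}},
]

def classify_for(query_tags):
    """Build a view of the pile matching the user's needs, on demand."""
    return [item["title"] for item in pile
            if query_tags <= item["tags"]]  # subset test

print(classify_for({"camera", "digital"}))  # ['Camera A', 'Camera B']
print(classify_for({"camera"}))             # all three items
```

The design choice is simply deferral: no information is discarded at ingest, so every future query can cut the pile differently.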
In the case of a book, you might want to know:
- The author
- Everything he wrote
- Everything written about him
- The contents of the book
- Everything on the web
Note the fourth item in that list: the distinction between data and metadata no longer exists (if, indeed, it ever did).
There is a Wikipedia motto: “Wiki is not paper.” Paper is a scarce physical commodity. Britannica has 32 volumes and 65,000 topics. These are limited by the physical world. Wikipedia is limited only by what people are interested in.
Comparison: Doc Searls’ blog links to a lot of other websites; the NY Times doesn’t (except for ads). [This really resonates with me. And we’re just as bad.] Each link to an external site is a tiny act of generosity.
Multi-subjectivity: there are many different points of view, and the web now allows them to talk with one another.
In this age of abundant information, we’re not looking for the best sources, only good-enough ones. That’s why Wikipedia and Google are so popular.
- What? What’s interesting
- How? By talking
- Where? In global conversation
- Why? Because we care
The Wikimedia Foundation is a not-for-profit organisation set up one-and-a-half years ago to provide a free encyclopedia to everyone in the world in their own language. The core principle is free knowledge:
- Decreases individual sense of ownership
- Enhances the popularity of Wikipedia
- Attribution [I think]
The ‘neutral point of view’ policy is also key. This provides a social concept of cooperation. Wikipedia does not take a stand on controversial issues, just tries to reflect them in its output. This seems to work well in practice.
Wikipedia uses only free (as in speech) software. This comes from its geek origins but is also important for ensuring the continuing freedom of the content.
Local chapters (Jimbo listed several throughout Europe) are helping to develop local communities.
English Wikipedia is approaching 1 billion words — larger than Britannica and Encarta combined. German is second (>300k articles). There are 2.5 million articles across 200 languages, though some languages are not really active. There are 75 languages with at least 1,000 articles.
- Wiktionary: Spun off from Wikipedia entries that talked about words instead of topics.
- Wikibooks: Recently got a lot of press attention, but has been going for ~2 years. Aims to drive down the cost of production to allow textbook access in the developing world.
- Wikiquote: Arose from social pressure in the community — some people like to collect quotes.
- Wikimedia Commons: Makes it easier to find photos that people can reuse. >200k objects, mainly photos but also some sound and video.
- Wikinews: Wikipedia does good work on current events. Also, lots of information that can help in providing background (e.g., an article on every Tube stop in London).
~2.5 billion page views a month. Now much higher traffic than About.com (which was sold earlier this year to the NY Times for about $400m [I think]). It’s a similar concept but their site is poor because it contains lots of distracting advertising.
Wikipedia uses over 120 servers in 4 data centres. These are managed by volunteers — there’s always someone available on IRC. Could not afford this level of support if Wikipedia was a commercial or ad-supported site.
Wikipedia’s network of Squid caches and servers is becoming very distributed. Jimbo usually doesn’t know exactly what’s going on(!). The site rarely goes down, but when it does contributions go up(!).
Only 3 employees, everyone else (including Jimbo) is a volunteer.
Wikipedia takes a fairly traditional approach towards writing and editing. It’s not a purely emergent phenomenon arising from individuals who don’t see the big picture. It’s really a community model, not an emergent model.
In the community model, reputation is a natural product of human interactions. (In the emergent model, you need to create a reputation system.)
>50% of all edits to English Wikipedia have been made by 0.7% of all users (615 people). The most active 2% (1,746 people) have made 72.8% of all edits. Jimbo knows most of these people personally. So the core community is relatively small.
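As a quick sanity check on those figures, the two percentages should imply roughly the same total editor base, and they do:

```python
# Back-of-envelope check on the edit-concentration figures quoted above.
total_from_core = 615 / 0.007   # 0.7% of users = 615 people
total_from_top2 = 1746 / 0.02   # 2% of users = 1,746 people

# Both work out to roughly 87,000-88,000 editors in total.
print(round(total_from_core), round(total_from_top2))
```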
How does Wikipedia ensure quality? In effect it’s real-time peer review. All edits go onto the ‘recent changes’ page, which is monitored by hundreds of people. Users can create ‘watch lists’ to monitor areas of interest. The ‘diff’ function automatically shows the changes made, so you don’t need to reread the whole article. Such easy monitoring of the site is very important. Editorially, this is an accountability model, not a gateway model.
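The kind of ‘diff’ view described here can be sketched with Python’s standard difflib (this is not MediaWiki’s actual implementation, and the article text below is invented): a watcher sees only the lines that changed between revisions.

```python
import difflib

# Two hypothetical revisions of an article; only one line differs.
before = ["The Tube opened in 1863.", "It has 270 stations."]
after  = ["The Tube opened in 1863.", "It has 272 stations."]

# unified_diff emits headers plus context, '-' (removed) and '+' (added)
# lines, so a reviewer needn't reread the whole article.
for line in difflib.unified_diff(before, after,
                                 fromfile="revision 1",
                                 tofile="revision 2",
                                 lineterm=""):
    print(line)
```

Run against a watch list, this is what makes accountability cheap: the cost of checking an edit is proportional to the size of the change, not the size of the article.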
Organisation is by the community. No a priori rules and as few assumptions as possible built into the software.
Pages often have to be deleted. This is determined by a ‘votes for deletion’ process, but it’s not a simple voting process, more of a discussion — and the minority view can prevail. This is why it can’t be replaced with an automated system. Contrary to what some developers seem to think, the idea of ‘social software’ is not to replace the ‘social’ with the ‘software’. Also important not to assume how the users will want to work — keep the software flexible. Also, if rules are required then try to enforce them socially, not by diktat.
Wikipedia governance works by a confusing but workable mix of:
- Consensus (minority views count)
- Democracy (but not usually simple voting)
- Aristocracy (those who have built respect in the community have greater sway)
- Monarchy (‘benevolent dictator’ — someone to make a decision when it needs to be made)
However, Jimbo does not see himself as the dictator of Wikipedia — that role for all of human knowledge is too big! So he is slowly introducing more institutional mechanisms, just as the role of the British monarchy has declined over time.
Need to remain pragmatic. There was a case in which some neo-Nazis discovered Wikipedia’s ‘votes for deletion’ system. They tried to swamp the process, but only 18 of them showed up, so it didn’t work. Had there been a danger of them taking over, Jimbo could have banned them. If, for whatever reason, the ‘votes for deletion’ system stops working then a new approach can always replace it.