Nascent

Nature.com adds metadata

Nature.com has now added metadata (using HTML meta tags) to all its newly published pages, including full text, abstracts and landing pages (all bar four titles, which are currently being worked on). Metadata coverage extends back through the Nature archives, with the depth of coverage varying by title. This conforms to Checkpoint 13.2 of the W3C’s Web Content Accessibility Guidelines 1.0, which exhorts content publishers to “provide metadata to add semantic information to pages and sites”.

Metadata is provided in both Dublin Core (DC) and PRISM formats, as well as in a bespoke Google metadata format. The encoding generally follows the DCMI recommendation “Expressing Dublin Core metadata using HTML/XHTML meta and link elements” and the earlier RFC 2731, “Encoding Dublin Core Metadata in HTML”.

The actual HTML metadata sets from an example landing page are presented below. If you view the HTML page source you should see something like this. (Note that you may have to scroll past whitespace emitted by the HTML template generator.)


<link title="schema(DC)" rel="schema.dc" href="https://purl.org/dc/elements/1.1/" />

<meta name="dc.publisher" content="Nature Publishing Group" />

<meta name="dc.language" content="en" />

<meta name="dc.rights" content="&#169; 2008 Nature Publishing Group" />

<meta name="dc.title" content="Crystal structure of squid rhodopsin" />

<meta name="dc.creator" content="Midori Murakami" />

<meta name="dc.creator" content="Tsutomu Kouyama" />

<meta name="dc.identifier" content="doi:10.1038/nature06925" />

<link title="schema(PRISM)" rel="schema.prism" href="https://prismstandard.org/namespaces/1.2/basic/" />

<meta name="prism.copyright" content="&#169; 2008 Nature Publishing Group" />

<meta name="prism.rightsAgent" content="permissions@nature.com" />

<meta name="prism.publicationName" content="Nature" />

<meta name="prism.issn" content="0028-0836" />

<meta name="prism.eIssn" content="1476-4687" />

<meta name="prism.volume" content="453" />

<meta name="prism.number" content="7193" />

<meta name="prism.startingPage" content="363" />

<meta name="prism.endingPage" content="367" />

<meta name="citation_journal_title" content="Nature" />

<meta name="citation_publisher" content="Nature Publishing Group" />

<meta name="citation_authors" content="Midori Murakami, Tsutomu Kouyama" />

<meta name="citation_title" content="Crystal structure of squid rhodopsin" />

<meta name="citation_volume" content="453" />

<meta name="citation_issue" content="7193" />

<meta name="citation_firstpage" content="363" />

<meta name="citation_doi" content="doi:10.1038/nature06925" />
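As an illustration of how the link elements above tie each prefix to its schema, here is a minimal Python sketch that expands prefixed names such as dc.title into full term URIs. It is not the code we run on nature.com: the class `SchemaAwareParser` and the function `expand_terms` are hypothetical names, and the sketch assumes that prefixes are matched case-insensitively and that a term URI is simply the schema href followed by the local name.

```python
from html.parser import HTMLParser


class SchemaAwareParser(HTMLParser):
    """Collect schema declarations and metadata terms from an HTML page."""

    def __init__(self):
        super().__init__()
        self.schemas = {}  # prefix -> namespace URI, from <link rel="schema.PREFIX">
        self.terms = []    # (name, content) pairs, from <meta name=... content=...>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link":
            rel = attributes.get("rel") or ""
            href = attributes.get("href")
            if rel.lower().startswith("schema.") and href:
                prefix = rel[len("schema."):].lower()
                self.schemas[prefix] = href
        elif tag == "meta":
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.terms.append((name, content))


def expand_terms(html_text):
    """Return (term, content) pairs, with declared prefixes expanded to URIs."""
    parser = SchemaAwareParser()
    parser.feed(html_text)
    parser.close()
    expanded = []
    for name, content in parser.terms:
        prefix, dot, local = name.partition(".")
        # Assumption: only names of the form PREFIX.local with a declared
        # schema are expanded; anything else is passed through unchanged.
        namespace = parser.schemas.get(prefix.lower()) if dot else None
        expanded.append((namespace + local if namespace else name, content))
    return expanded


SAMPLE = """
<link title="schema(DC)" rel="schema.dc" href="https://purl.org/dc/elements/1.1/" />
<meta name="dc.title" content="Crystal structure of squid rhodopsin" />
<link title="schema(PRISM)" rel="schema.prism" href="https://prismstandard.org/namespaces/1.2/basic/" />
<meta name="prism.volume" content="453" />
<meta name="citation_volume" content="453" />
"""

if __name__ == "__main__":
    for term, content in expand_terms(SAMPLE):
        print(term, "=", content)
```

Names without a declared prefix, such as the citation_ terms, are left as they are, since no schema link is supplied for them.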

While we do not expect search engines to index these terms directly, and no direct SEO (search engine optimization) is intended, we think the terms offer enough value for applications to make use of them. They are readily accessible to simple scripts. Indeed, RFC 2731 (published in 1999) lists a Perl script in Section 9 that pulls out the metadata name/value pairs. Running this over the example page yields the following output:


@(urc;

@|MISSING ELEMENT NAME; text/css

@|MISSING ELEMENT NAME; text/html; charset=iso-8859-1

@|robots; noarchive

@|keywords; Nature, science, science news, biology, physics, genetics, astronomy, astrophysics, quantum physics, evolution, evolutionary biology, geophysics, climate change, earth science, materials science, interdisciplinary science, science policy, medicine, systems biology, genomics, transcriptomics, palaeobiology, ecology, molecular biology, cancer, immunology, pharmacology, development, developmental biology, structural biology, biochemistry, bioinformatics, computational biology, nanotechnology, proteomics, metabolomics, biotechnology, drug discovery, environmental science, life, marine biology, medical research, neuroscience, neurobiology, functional genomics, molecular interactions, RNA, DNA, cell cycle, signal transduction, cell signalling.

@|description; Nature is the international weekly journal of science: a magazine style journal that publishes full-length research papers in all disciplines of science, as well as News and Views, reviews, news, features, commentaries, web focuses and more, covering all branches of science and how science impacts upon all aspects of society and life.

@|dc.publisher; Nature Publishing Group

@|dc.language; en

@|dc.rights; &#169; 2008 Nature Publishing Group

@|dc.title; Crystal structure of squid rhodopsin

@|dc.creator; Midori Murakami

@|dc.creator; Tsutomu Kouyama

@|dc.identifier; doi:10.1038/nature06925

@|prism.copyright; © 2008 Nature Publishing Group

@|prism.rightsAgent; permissions@nature.com

@|prism.publicationName; Nature

@|prism.issn; 0028-0836

@|prism.eIssn; 1476-4687

@|prism.volume; 453

@|prism.number; 7193

@|prism.startingPage; 363

@|prism.endingPage; 367

@|citation_journal_title; Nature

@|citation_publisher; Nature Publishing Group

@|citation_authors; Midori Murakami, Tsutomu Kouyama

@|citation_title; Crystal structure of squid rhodopsin

@|citation_volume; 453

@|citation_issue; 7193

@|citation_firstpage; 363

@|citation_doi; doi:10.1038/nature06925

@)urc;
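The same name/value pairs can be pulled out with a few lines in other scripting languages. The following is a minimal Python sketch using only the standard library. It is not the RFC 2731 script and does not reproduce its output format exactly: the names `MetaExtractor` and `extract_meta` are hypothetical, meta elements without a name attribute are skipped rather than reported as MISSING ELEMENT NAME, and character references such as &#169; are decoded by the parser.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect (name, content) pairs from <meta name="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip elements such as <meta http-equiv=...> that carry no name.
        if name is not None and content is not None:
            self.pairs.append((name, content))


def extract_meta(html_text):
    """Return the metadata name/value pairs in document order."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.pairs


SAMPLE = """
<meta name="dc.title" content="Crystal structure of squid rhodopsin" />
<meta name="dc.creator" content="Midori Murakami" />
<meta name="dc.creator" content="Tsutomu Kouyama" />
<meta name="prism.volume" content="453" />
<meta name="citation_doi" content="doi:10.1038/nature06925" />
"""

if __name__ == "__main__":
    for name, content in extract_meta(SAMPLE):
        print(f"{name}; {content}")
```

Repeated names such as dc.creator are kept as separate pairs, so the order of authors on the page is preserved.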

We look forward to seeing applications making use of this metadata and providing new value for users.
