Scientific Data

Data Matters : Interview with Jens Kattge

Jens Kattge is group leader of the research group Functional Biogeography at the Max Planck Institute for Biogeochemistry, Germany.

What are the current data sharing practices in your field?

I work in ecology, often with data on the characteristics of plants (‘plant traits’). In this context I see awareness of the relevance of data and of the importance of reusing data, as well as a willingness to share data. However, sharing data does not necessarily mean making them publicly available without restriction; often it means sharing with some degree of control. While data are young, people tend to share on the condition of being involved in resulting publications. As the data become older, or have been reused, people tend to become less reluctant to make their data publicly available without restrictions.

Data Matters presents a series of interviews with scientists, funders and librarians on topics related to data sharing and standards.

In ecology there is some scepticism about quality issues of data that are being reused, because they are often reused in different contexts. This is not a quality problem of the data per se, but a problem of how the data are being reused. For example, a plant trait may have been measured on plants grown in Europe. The same species may also occur in North America. Can the trait values that have been measured in Europe also be used for plants of the same species grown in North America? In some cases this may be appropriate, in other cases it may be difficult.

Are there any barriers to the wider sharing of data in your field?

The example above indicates a problem for data sharing in the domain of ecology and plant traits: depending on the trait and the question of interest, different auxiliary data may be necessary to interpret the measured trait values. In addition to measurement methods and details, these auxiliary data may include the conditions under which the individual plants have been grown – climate, soil and/or disturbances. As these demands for auxiliary data are specific to the different traits and scientific questions, the requirements are hard to standardize. For example, the auxiliary data needed to interpret the ‘growth form’ of a plant are very different from the auxiliary data needed to interpret a photosynthesis measurement on a specific leaf. In both cases the plant species may be relevant for comparison with measurements on the same species or on other species. The plant growth form is stable within a species: in most cases all individuals of a species can be attributed to the same growth form, either tree, shrub, herb or grass. For photosynthesis measurements, by contrast, the conditions during plant growth and during the measurement are important (e.g., soil nutrient availability during plant growth; water availability, light and temperature during growth and during measurement). In addition, plant and leaf diseases and developmental stage (e.g., juvenile versus mature) may be relevant.

Due to these specific requirements, plant trait data are poorly standardized and dispersed over many small datasets, which are difficult to find or not available at all. To overcome this problem we started an initiative called TRY to combine different plant trait datasets and make them available through a single Internet portal in a consolidated format (www.try-db.org). To make data sharing in the context of TRY attractive, we agreed that data providers may decide with whom to share their data and whether they would like to be involved in publications using their data via TRY, and that the original sources of the data should be cited. TRY is thus an example of an incentive-driven approach to data sharing. This seems to be successful, as it has led to an unprecedented collection of plant traits – so far the initiative has combined 220 datasets with trait data for 80,000 of the 350,000 vascular plant species existing today – and the data are frequently used.

Are there any particular problems with the citation of data in ecology?

One incentive for data sharing is the prospect that the original data sources will be cited. Ecology is characterized by a high variability of species and data types. Therefore many kinds of ecological data are compiled in several different and often small original resources. New analyses and publications often reuse data from many sources. Unfortunately, several journals in ecology limit the number of references they accept in the reference list. These references are then preferably used for context citations, while references to data sources are often moved to supplementary material. References in supplementary material are in general not indexed by Web of Science or Google Scholar and therefore provide no credit for the people who made their data available.

In the context of the TRY initiative, traits from more than 20 original data sources are often requested. When results are published, in about half of the cases the authors are not able to cite the original data sources in the main reference list – they have to move them to supplementary material.

The restriction on the number of references seems to be a legacy of printed journals, from a time when printed pages were expensive. Since journals are nowadays primarily distributed in electronic form, this restriction might be loosened. Data references would then become more visible, which would be an incentive for making data easily available.

How do you think a product like Scientific Data can help?

Scientific Data has endorsed the joint declaration of data citation principles, which highlights that sound reproducible scholarship rests upon the foundation of robust accessible data. Citation of data, like the citation of other evidence and sources, is highlighted as good research practice.

Scientific Data, being one of Nature Publishing Group’s journals, will become a high-profile journal, which will be a strong motivation for making data available via a publication in Scientific Data. This will also provide additional motivation for people to do measurements, as it provides an opportunity for additional credit for their work. The combination of the two aspects – the opportunity for high-profile data publications and the consistent citation of original data references – will provide substantial support for an incentive-driven open data policy.

Interview by David Stuart, a freelance writer based in London, UK
