{"id":11677,"date":"2016-11-30T17:00:50","date_gmt":"2016-11-30T17:00:50","guid":{"rendered":"https:\/\/blogs.nature.com\/naturejobs\/?p=11677"},"modified":"2016-11-29T18:00:52","modified_gmt":"2016-11-29T18:00:52","slug":"opening-doors-to-open-data-at-scidata16","status":"publish","type":"post","link":"https:\/\/blogs.nature.com\/naturejobs\/2016\/11\/30\/opening-doors-to-open-data-at-scidata16\/","title":{"rendered":"Opening doors to open data at #scidata16"},"content":{"rendered":"<h2>Want to embrace open data but don\u2019t know where to start? The tools are out there, says Matthew Edmonds.<\/h2>\n<p>The <em><a href=\"https:\/\/blogs.nature.com\/naturejobs\/2016\/07\/18\/scidata16-publishing-better-science-through-better-data-writing-competition#more-10251\">Publishing Better Science through Better Data<\/a><\/em> conference, or #scidata16 for short, took place at the Wellcome Collection in London at the end of October. This one-day event organised by the journal <em>Scientific Data<\/em>, Springer Nature and the Wellcome Trust explored the challenges facing early-career researchers as we enter the era of open data.<\/p>\n<p><a class=\"wpn-image-link\" href=\"https:\/\/blogs.nature.com\/naturejobs\/files\/2016\/11\/analytics-282739_1280.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-11683 wpn-image alignright\" title=\"analytics-282739_1280\" src=\"https:\/\/blogs.nature.com\/naturejobs\/files\/2016\/11\/analytics-282739_1280.png\" alt=\"analytics-282739_1280\" width=\"1280\" height=\"1259\" srcset=\"https:\/\/blogs.nature.com\/naturejobs\/files\/2016\/11\/analytics-282739_1280.png 1280w, https:\/\/blogs.nature.com\/naturejobs\/files\/2016\/11\/analytics-282739_1280-300x295.png 300w, https:\/\/blogs.nature.com\/naturejobs\/files\/2016\/11\/analytics-282739_1280-1024x1007.png 1024w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/a><\/p>\n<p>As a data novice, I arrived without really knowing what to expect. 
The types of experiments I perform generate only small datasets needing a simple statistical test, easily summarised in a graph in the manuscript. The original data can be safely left to gather dust in a shared drive.<!--more--><\/p>\n<p>Or so I thought. As the day progressed, it became clear that most data issues apply no matter what your experiment is. I hadn\u2019t previously considered how other researchers can access, read, understand, use and confirm the reproducibility of my data. Now, we\u2019re entering an era when most journals require publication of raw data and analysis techniques alongside a main article. Fortunately, a post-lunch session of lightning talks illustrated some of the emerging solutions to these problems.<\/p>\n<p>Some results may be years old before they\u2019re published, and others never see the light of day because they\u2019re considered \u201cnegative\u201d or irrelevant. An experimental manipulation that produces no effect may not make it into a journal article, for example. To challenge these views, Dr Rachel Harding is sharing her results in near-real-time through the <a href=\"https:\/\/labscribbles.com\/\">Lab Scribbles<\/a> blog. She makes mini-reports of her work on Huntington\u2019s disease and her data go into the <a href=\"https:\/\/zenodo.org\/\">Zenodo<\/a> repository (which makes any dataset citable). Taking this approach has benefits for both her and the wider community: they don\u2019t have to wait months to see new data, and can offer suggestions for improvements or collaborations before publication. Dr Harding\u2019s lab book has been viewed over 20,000 times from 95 countries. How many people have read yours?<\/p>\n<p>Of course, making sense of someone else\u2019s lab book can be challenging. 
Jo Barratt of Open Knowledge International introduced their concept of <a href=\"https:\/\/frictionlessdata.io\/\">Frictionless Data<\/a>: packaging data to a few simple standards that greatly improve the ease of sharing, reading and usage. Among their tools is Goodtables, which enables quick validation of tabular data (you can test it publicly <a href=\"https:\/\/goodtables.okfnlabs.org\/\">here<\/a>). Just upload a table and a schema (defining the variables and any restrictions, such as integers only) and it will flag any errors before you get into the nitty-gritty of analysis.<\/p>\n<p>And analysis is particularly difficult in some cases. Take image processing, which is important across many scientific disciplines. The research goals of each field can throw up idiosyncratic problems. Research in nanomaterials, for example, can use electron tomography \u2013 a technique that allows 3D characterisation but requires rendering of the data into an image. Visualisation is essential to understanding the data but is highly dependent on the preferences of individual researchers, which are very difficult to describe in writing. To address this problem, Dr Robert Hovden of the University of Michigan developed <a href=\"https:\/\/www.tomviz.org\/\">tomviz<\/a>, which brings the raw data and manipulation steps together in one place. This allows others in the field to see the pipeline from data to model. Dr Hovden has made it open source and independent of operating system, and he says he sees no reason why it couldn\u2019t be used for any similar dataset.<\/p>\n<p>In contrast, neuroimaging datasets require huge number-crunching power to provide outputs relevant to whole networks of neurons, even up to the level of the whole brain. Individual researchers often don\u2019t have the required computing power at their disposal at their institutions, or must wait their turn to use it.
This bottleneck inspired the Montr\u00e9al Neurological Institute to create resources open to anyone. They cover the whole process from data repository (<a href=\"https:\/\/www.loris.ca\/\">LORIS<\/a>) to high-performance computer processing (<a href=\"https:\/\/mcin-cnim.ca\/neuroimagingtechnologies\/cbrain\/\">CBRAIN<\/a>), and importantly maintain compatibility with multinational collaborations such as the European <a href=\"https:\/\/www.humanbrainproject.eu\/en_GB\">Human Brain Project<\/a>.<\/p>\n<p>So what were the lessons for my dormant hard drive and me? Going into #scidata16, I\u2019d never seriously considered sharing my data, but this position is fast becoming inexcusable. Yes, it might be difficult for me to get all of my information out into the big wide world, but creative people are making tools to facilitate that process. Now I will be seeking and using those tools for my own data \u2013 for the benefit of everyone.<\/p>\n<p>&nbsp;<\/p>\n<p><em>Matthew Edmonds is a postdoc at the University of Birmingham, UK, researching how cells with defective mechanisms for repairing damaged DNA can lead to cancer. He finds the pace of change in technology available to researchers astonishing, and tries his best to keep up.
You can keep up with him at <a href=\"https:\/\/twitter.com\/benchmatt\">@benchmatt<\/a>.<a class=\"wpn-image-link\" href=\"https:\/\/blogs.nature.com\/naturejobs\/files\/2016\/11\/Edmonds-Matthew-photo.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-11685 wpn-image alignright\" title=\"Edmonds, Matthew \u2013 photo\" src=\"https:\/\/blogs.nature.com\/naturejobs\/files\/2016\/11\/Edmonds-Matthew-photo.jpg\" alt=\"Edmonds, Matthew - photo\" width=\"209\" height=\"301\" srcset=\"https:\/\/blogs.nature.com\/naturejobs\/files\/2016\/11\/Edmonds-Matthew-photo.jpg 869w, https:\/\/blogs.nature.com\/naturejobs\/files\/2016\/11\/Edmonds-Matthew-photo-208x300.jpg 208w, https:\/\/blogs.nature.com\/naturejobs\/files\/2016\/11\/Edmonds-Matthew-photo-710x1024.jpg 710w\" sizes=\"auto, (max-width: 209px) 100vw, 209px\" \/><\/a><\/em><\/p>\n<p><em>You can access all the slides and videos from <\/em>Publishing Better Science through Better Data 2016<em>, as well as the great visual summary of the day, on the <a href=\"https:\/\/www.nature.com\/openresearch\/scidata16\/\">event website<\/a>.<\/em><\/p>\n<p>&nbsp;<\/p>\n<p><strong>Suggested posts<\/strong><\/p>\n<p class=\"wpn-post-title entry-title article-heading\"><a href=\"https:\/\/blogs.nature.com\/naturejobs\/2016\/10\/10\/how-can-better-data-sharing-and-management-improve-a-career-in-science\/\">How can better data sharing and management improve a career in science?<\/a><\/p>\n<p class=\"wpn-post-title entry-title article-heading\"><a href=\"https:\/\/blogs.nature.com\/naturejobs\/2016\/10\/31\/why-should-we-work-so-hard-to-make-our-work-reproducible\/\">Why should we work so hard to make our work reproducible?<\/a><\/p>\n<p class=\"wpn-post-title entry-title article-heading\"><a href=\"https:\/\/blogs.nature.com\/naturejobs\/2016\/10\/21\/why-dont-scientists-always-share-their-data\/\">Why don\u2019t scientists always share their data?<\/a><\/p>\n<p class=\"wpn-post-title entry-title article-heading\"><a 
href=\"https:\/\/blogs.nature.com\/naturejobs\/2016\/11\/02\/has-big-data-changed-what-it-means-to-be-a-scientist\/\">Has big data changed what it means to be a scientist?<\/a><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Publishing Better Science through Better Data conference, or #scidata16 for short, took place at the Wellcome Collection in London at the end of October. This one-day event organised by the journal Scientific Data, Springer Nature and the Wellcome Trust explored the challenges facing early-career researchers as we enter the era of open data.&nbsp; <a href=\"\/naturejobs\/2016\/11\/30\/opening-doors-to-open-data-at-scidata16#more-11677\" class=\"more-link\">Read more<\/a> <a href=\"https:\/\/blogs.nature.com\/naturejobs\/2016\/11\/30\/opening-doors-to-open-data-at-scidata16\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":90925,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[190,191,385,185,865,65,199,200],"tags":[3163831,3789015,3,306,1189,865,4734979,3382713,4105231,1605,13,4105227,563,4814595,4045575,1373],"class_list":["post-11677","post","type-post","status-publish","format-standard","hentry","category-academia-2","category-admin-2","category-careers-articles","category-collaboration-2","category-data","category-phd","category-research-2","category-technology-2","tag-scidata16","tag-analysis","tag-blog","tag-competition","tag-conference","tag-data","tag-hard-drive","tag-information","tag-matthew-edmonds","tag-open-data","tag-research","tag-scidata","tag-scientific-data","tag-scientific-information","tag-tools","tag-writing-competition"],"_links":{"self":[{"href":"https:\/\/blogs.nature.com\/naturejobs\/wp-json\/wp\/v2\/posts\/11677","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.nature.com\/naturejobs\/wp-json\/wp\/v2\/posts"}],"about":[{
"href":"https:\/\/blogs.nature.com\/naturejobs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.nature.com\/naturejobs\/wp-json\/wp\/v2\/users\/90925"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.nature.com\/naturejobs\/wp-json\/wp\/v2\/comments?post=11677"}],"version-history":[{"count":0,"href":"https:\/\/blogs.nature.com\/naturejobs\/wp-json\/wp\/v2\/posts\/11677\/revisions"}],"wp:attachment":[{"href":"https:\/\/blogs.nature.com\/naturejobs\/wp-json\/wp\/v2\/media?parent=11677"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.nature.com\/naturejobs\/wp-json\/wp\/v2\/categories?post=11677"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.nature.com\/naturejobs\/wp-json\/wp\/v2\/tags?post=11677"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}