Data paper workshop, March 2015, Trondheim

Introduction to writing good data set metadata and to the procedure for writing and submitting data paper manuscripts to the Pensoft Biodiversity Data Journal. Participants bring their own data set(s), and we aim to make at least a good start on writing a complete data paper manuscript by the end of day 2.

Dimitri Brosens from the Belgian GBIF node

A data paper focuses on describing a scientific data set in order to improve accessibility and appropriate reuse of the data by other researchers. A data paper manuscript may include the following information categories.


Dataset (Resource)


People and Organisations

Keyword Set (General Keywords)

Taxonomic Coverage

Geographic Coverage

Temporal Coverage


Intellectual Property Rights

Additional Metadata
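
As a rough illustration, the categories above can be treated as a checklist while drafting. The sketch below is hypothetical: the dictionary keys simply mirror the category names in this list (they are not the element names of the GBIF metadata profile), and the draft values are invented.

```python
# Hypothetical checklist; keys mirror the category names listed above,
# not the actual element names of the GBIF metadata profile.
REQUIRED_CATEGORIES = [
    "Dataset (Resource)",
    "People and Organisations",
    "Keyword Set (General Keywords)",
    "Taxonomic Coverage",
    "Geographic Coverage",
    "Temporal Coverage",
    "Intellectual Property Rights",
    "Additional Metadata",
]


def missing_categories(metadata):
    """Return the categories that are absent or empty in a metadata draft."""
    return [name for name in REQUIRED_CATEGORIES if not metadata.get(name)]


# Invented example draft, for illustration only.
draft = {
    "Dataset (Resource)": "Example vegetation survey dataset",
    "Taxonomic Coverage": "Plantae",
    "Temporal Coverage": "",  # present but still empty
}

print(missing_categories(draft))
```

A check like this only shows which sections still need text; it says nothing about the quality of what has been written.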


Tuesday 24th March (NTNU VM)

On Tuesday we will be in the meeting room Stormrommet at the NTNU University Museum in downtown Trondheim, Erling Skakkes gate 47.


09:30 Coffee and registration

10:00 Introduction to the course

    Overview course content and practical information.

Round-table presentation of your background (maximum 2 minutes each).

10:30 Introduction to data papers

What is a data paper? Why a data paper? The GBIF metadata profile. Tools to use (the Pensoft Writing Tool and GitHub).

  • What & why

  • The GBIF-IPT metadata profile & data paper guidelines

  • Tools to use

11:30 Introduction to journals accepting data papers

Pensoft: Biodiversity Data Journal

Pensoft: ZooKeys; Pensoft: PhytoKeys

Nature: Scientific Data

12:15 Lunch break

13:00 Presentation of participants' datasets

All participants, max 10 min each

15:00 Introduction to data cleaning, visualisation and mapping tools

    OpenRefine, CartoDB, Mapbox, ...

16:00 Starting to write the data papers (all)

17:00 End of day 1


Wednesday 25th March (NINA-building)

On Wednesday we will be in the meeting room Toppskarven in the NINA-building, located on the NTNU university campus Gløshaugen, in the northern part near Lerkendal, Høgskoleringen 9.


09:00 Introduction to day 2

09:30 Continue writing the data papers (all)

11:30 Lunch-seminar (max 30 minutes)

Introducing academic and peer-reviewed publication for biodiversity data sets.

The Global Biodiversity Information Facility (GBIF) and the Norwegian Biodiversity Information Centre (Artsdatabanken) provide a framework for publishing primary biodiversity data. This lunch seminar presents recent developments on an integrated data publishing framework for primary biodiversity data. The seminar will present the new scientific journals from Pensoft and the Nature Publishing Group offering novel academic publication of (biodiversity) data set descriptions, an approach developed and recently released in collaboration with GBIF. We will also present options for persistent archiving of data sets using Norstore, B2SHARE (EUDAT) and DataONE.

A popularized introduction to biodiversity data publishing, open to the public.

12:00 Lunch

    Hegstad Blakstad lunch restaurant in the NINA-building, Høgskoleringen 9.

13:00 Introduction to submitting the data papers

Pensoft BDJ, ZooKeys or PhytoKeys

15:00 Continue writing the data papers (all)

We aim to have all data papers published before the summer holidays.

16:00 End of day 2

16:15 Workshop dinner at NINA-huset (proposed, provided most of you can attend)

    Hegstad Blakstad restaurant in the NINA-building, Høgskoleringen 9.


Organizers (contact persons)

Participants (approximately 23 people)

  • NTNU VM, 13 participants: Anders G. Finstad, Kristian Hassel, Tommy Presthus, Karstein Hårsaker, Marc Daverdin, Egil Aune, Maria Capa, Xio-long Lin, Olav H., Anders Lyngstad, Narjes Yousefi, Dag-Inge Øien, Gunnar Austrheim
  • NINA, 7 participants: Erlend Nilsen (researcher), Siri Sæther (responsible for library and open access), Roald Vang (Environmental Data Section), Frank Hanssen (Environmental Data Section), Graciela Rusch (researcher), Ishita Ahuja (researcher, NINA/NTNU), Elisabet Forsgren (researcher, aquaculture)
  • Artsdatabanken, 1 participant: Wouter Koch
  • [no participants attended from NTNU Institute for biology]
  • GBIF, 3 people: Dimitri Brosens, Dag Endresen, Christian Svindseth


Please remember to prepare a short presentation of your own dataset! A round-table presentation of datasets is included in the agenda after lunch on Tuesday.


Data paper template


Google Doc template

Google Documents provides an efficient platform for collaborative writing of a data paper. You can copy this document [save as] and share it with your co-authors to start writing your own data paper.

Template with examples:

GitHub template link:

Another efficient tool for writing a collaborative data paper is provided by GitHub. GitHub is often used for collaborative development of software code. You can clone the template used by the Belgian GBIF network to start your own data paper.

Pensoft Writing Tool

Pensoft Publishers provides a useful writing tool for collaborative writing of a data paper and easy submission to the Pensoft Biodiversity Data Journal.

Global Registry of Biorepositories (GRBio)

The Global Registry of Biorepositories (GRBio) provides a collaborative registry of biorepositories where biorepository curators and contact persons can register their own collections and reserve their preferred institution code and collection codes.

  • Darwin Core: institutionCode, institutionID, collectionCode, collectionID
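
To show where these identifiers end up, here is a minimal sketch of a single occurrence record written out as CSV with Darwin Core column names (institutionCode, institutionID, collectionCode, collectionID, plus catalogNumber and scientificName). All values are invented placeholders, not codes registered in GRBio, and the layout is only an assumption about how you might organise your own file.

```python
import csv
import io

# Darwin Core terms identifying the institution and the collection.
# All values below are invented placeholders, not registered codes.
record = {
    "institutionCode": "EXAMPLE-INST",
    "institutionID": "http://example.org/institution/1",
    "collectionCode": "EXAMPLE-COLL",
    "collectionID": "http://example.org/collection/1",
    "catalogNumber": "12345",
    "scientificName": "Betula pubescens",
}

# Write a header row and one data row to an in-memory CSV file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record), lineterminator="\n")
writer.writeheader()
writer.writerow(record)

csv_text = buffer.getvalue()
print(csv_text)
```

Using the same institution and collection codes in every record of a dataset makes it easier to trace each record back to the registered collection.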

Vegetation survey data

Some of you may have datasets based on survey data. This is a new data type in GBIF, and improved support for it is under development as a high priority in the 2015 GBIF work plan. In the meantime, we recommend that you start publishing your datasets following the current Darwin Core format.


Suggested reading list and examples


Pensoft: Biodiversity Data Journal (ISSN 1314-2828)

Biodiversity Data Journal is classified as a level 1 (nivå 1) journal in Norway:

Editorial with an introduction to the objective of BDJ:

Nature: Scientific Data (ISSN 2052-4436)


Examples of published data papers


Example data paper from Pensoft: BDJ (ISSN 1314-2828)

Example data paper from Pensoft: PhytoKeys (ISSN 1314-2003)

  • Browse data papers in PhytoKeys
  • Alonso P, Iriondo JM (2014). URJC GB dataset: Community-based seed bank of Mediterranean high-mountain and semi-arid plant species at Universidad Rey Juan Carlos (Spain). PhytoKeys 35: 57–72. doi:10.3897/phytokeys.35.6746
  • García-Sánchez J, Cabezudo B (2013). Herbarium of the University of Malaga (Spain): Vascular Plants Collection. PhytoKeys 26: 7–19. doi:10.3897/phytokeys.26.5396
  • Espinosa M, López J (2013). Herbarium of Vascular Plants Collection of the University of Extremadura (Spain). PhytoKeys 25: 1–13. doi:10.3897/phytokeys.25.5341
  • Desmet P, Brouillet L (2013). Database of Vascular Plants of Canada (VASCAN): a community contributed taxonomic checklist of all vascular plants of Canada, Saint Pierre and Miquelon, and Greenland. PhytoKeys 25: 55–67. doi:10.3897/phytokeys.25.3100
  • Van Landuyt W, Vanhecke L, Brosens D (2012) Florabank1: a grid-based database on vascular plant distribution in the northern part of Belgium (Flanders and the Brussels Capital region). PhytoKeys 12: 59-67. doi:10.3897/phytokeys.12.2849

Example data paper from Pensoft: ZooKeys (ISSN 1313-2970)

  • Browse data papers in ZooKeys
  • Piazza P, Blazewicz-Paszkowycz M, Ghiglione C, Alvaro M, Schnabel K, Schiaparelli S (2014) Distributional records of Ross Sea (Antarctica) Tanaidacea from museum samples stored in the collections of the Italian National Antarctic Museum (MNA) and the New Zealand National Institute of Water and Atmospheric Research (NIWA). ZooKeys 451: 49-60. doi:10.3897/zookeys.451.8373
  • Neubauer T, Kroh A, Harzhauser M, Georgopoulou E, Mandic O (2014) Synopsis of valid species-group taxa for freshwater Gastropoda recorded from the European Neogene. ZooKeys 435: 1-6. doi:10.3897/zookeys.435.8193
  • Martínez-Morales M, Pinilla-Buitrago G, González-García F, Enríquez P, Rangel-Salazar J, Guichard Romero C, Navarro-Sigüenza A, Monterrubio-Rico T, Escalona-Segura G (2014) CracidMex1: a comprehensive database of global occurrences of cracids (Aves, Galliformes) with distribution in Mexico. ZooKeys 420: 87-115. doi:10.3897/zookeys.420.7050
  • Morales Rozo A, Valencia F, Acosta A, Parra J (2014) Birds of Antioquia: Georeferenced database of specimens from the Colección de Ciencias Naturales del Museo Universitario de la Universidad de Antioquia (MUA). ZooKeys 410: 95-103. doi:10.3897/zookeys.410.7109
  • Figueira R, Monteiro M, Reino L, Beja P, Mills M, Bastos-Silveira C, Ramos M, Rodrigues D, Queirós Neves I, Consciência S (2014) The collection and database of Birds of Angola hosted at IICT (Instituto de Investigação Científica Tropical), Lisboa, Portugal. ZooKeys 387: 89-99. doi:10.3897/zookeys.387.6412
  • Gutt J, Piepenburg D, Voß J (2014). Asteroids, ophiuroids and holothurians from the southeastern Weddell Sea (Southern Ocean). ZooKeys 434: 1-15. doi:10.3897/zookeys.434.7622
  • Brosens D, Vankerkhoven F, Ignace D, Wegnez P, Noé N, Heughebaert A, Bortels J & Dekoninck W (2013). FORMIDABEL: The Belgian Ants Database. ZooKeys 306: 59-70. doi:10.3897/zookeys.306.4898
  • Brosens D, Breine J, Van Thuyne G, Belpaire C, Desmet P, Verreycken H (2015). VIS – A database on the distribution of fishes in inland and estuarine waters in Flanders, Belgium. ZooKeys 475: 119-145. doi:10.3897/zookeys.475.8556

Example data paper from Nature: Scientific Data (ISSN 2052-4436)

  • Pigott DM, Golding N, Messina JP, Battle KE, Duda KA, Balard Y, Bastien P, Pratlong F, Brownstein JS, Freifeld CC, Mekaru SR, Madoff LC, George DB, Myers MF & Hay SI (2014). Global database of leishmaniasis occurrence locations, 1960–2012. Sci. Data 1:140036. doi:10.1038/sdata.2014.36.
  • Roquet F, Williams G, Hindell MA, Harcourt R, McMahon C, Guinet C, Charrassin J-B, Reverdin G, Boehme L, Lovell P & Fedak M (2014). A Southern Indian Ocean database of hydrographic profiles obtained with instrumented elephant seals. Sci. Data 1:140028. doi:10.1038/sdata.2014.28.
  • Plooij FX, van de Rijt-Plooij H, Fischer M & Pusey A (2014). Longitudinal recordings of the vocalizations of immature Gombe chimpanzees for developmental studies. Scientific Data 1:140025. doi:10.1038/sdata.2014.25
  • Mazzoldi C, Sambo A & Riginella E (2014). The Clodia database: a long time series of fishery data from the Adriatic Sea. Scientific Data 1:140018. doi:10.1038/sdata.2014.18.
  • Hao Z, AghaKouchak A, Nakhjiri N & Farahmand A (2014). Global integrated drought monitoring and prediction system. Scientific Data 1, Article number: 140001. doi:10.1038/sdata.2014.1
  • Edgar GJ & Stuart-Smith RD (2014). Systematic global assessment of reef fish communities by the Reef Life Survey program. Scientific Data 1, Article number: 140007. doi:10.1038/sdata.2014.7
  • Messina JP, Brady OJ, Pigott DM, Brownstein JS, Hoen AG & Hay SI (2014). A global compendium of human dengue virus occurrence. Scientific Data 1:140004. doi:10.1038/sdata.2014.4


  • Chavan V, Penev L (2011) The data paper: a mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics 12: S2. doi:10.1186/1471-2105-12-S15-S2
  • Chavan V, Penev L & Hobern D (2013) Cultural Change in Data Publishing Is Essential. BioScience 63(6): 419-420. doi:10.1525/bio.2013.63.6.3
  • Costello MJ, Michener WK, Gahegan M, Zhang ZQ, Bourne PE (2013) Biodiversity Data Should Be Published, Cited, and Peer Reviewed. Trends in Ecology & Evolution 28(8): 454–461.
  • Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308
  • Robertson T, Döring M, Guralnick R, Bloom D, Wieczorek J, Braak K, Otegui J, Russell L, Desmet P (2014) The GBIF Integrated Publishing Toolkit: Facilitating the Efficient Publishing of Biodiversity Data on the Internet. PLoS ONE 9(8): e102623. doi:10.1371/journal.pone.0102623
  • Smith V, Georgiev T, Stoev P, Biserkov J, Miller J, Livermore L, Baker E, Mietchen D, Couvreur T, Mueller G, Dikow T, Helgen K, Frank J, Agosti D, Roberts D, Penev L (2013) Beyond dead trees: integrating the scientific process in the Biodiversity Data Journal. Biodiversity Data Journal 1: e995. doi:10.3897/BDJ.1.e995

Collection of relevant slides (on data papers)


GBIF Data publishing framework


Pensoft: Biodiversity Data Journal

NPG: Scientific Data

ViBRANT, Scratchpads and the Natural History Museum in London

Other useful slides to consult

Mapping tools and resources


Practical information

GBIF Norway will cover the costs of travel and accommodation for invited participants travelling from outside of Trondheim.

Travel reimbursement forms (Norwegian):

Travel reimbursement forms (English):



Tags: gbif, data paper, workshop, biodiversity
Published Apr. 23, 2015 8:25 AM - Last modified Sep. 8, 2015 2:50 PM