I spent a few days in Paris, France, with my research data colleagues: almost 600 participants from 38 countries, who gathered for the 6th RDA Plenary. This plenary of the RDA (Research Data Alliance) focused on the need to work with enterprises, and had climate change as its underlying theme.
That is why Barbara Ryan (Secretariat Director, Group on Earth Observations) gave a keynote on the first day. She did not focus only on climate change per se, but explained how she managed to get their data open, and the effect that this has had on usage. “Countries have borders, earth observations have not.”
We were all impressed with the statement that Axelle Lemaire (Minister of State for Digital Technology, French Ministry of Economy, Industry and Digital Technology) made at the start of the conference. When talking about data, she preferred the metaphor of light to that of oil. Data is not a fossil resource that might run dry; it is around in many forms, sometimes a bit diffuse, but it is crucial and needs to be shared to create value. She told us that France will launch a public consultation on 26 September about “the Digital Bill”. A delightful presentation.
On the Plenary day I attended, 23 September (there were three days in total), I was especially curious to see how the working groups that I had attended before had progressed. So I attended the Publishing Data Workflows and the Data Citation groups. The first group gave us a link to their article, and sample cases where Dataverse, Dryad or figshare is used in the publisher’s data workflow. Future work will concentrate on moving forward in the research process, and on analysing how processes for data publishing might work there. The working group invites everybody to contribute their best practices, thoughts and comments.
I think that we as libraries should realize that this is indeed what publishers are doing now (note also the press release issued at the RDA meeting about Mendeley Data and DANS). If we support our researchers with their data management plans and data stewardship, we can advise them on how to keep, store and share their data without giving the content away. I found the remark by William Gunn from Mendeley at the workshop the day before reassuring: “All types of content providers need to focus on value-added services and not paywalls”.
The Data Citation Working Group will shortly report on their 14 recommendations. The idea of RDA was that working groups work on a certain topic for only 18 months, after which the group dissolves and new groups emerge. The difficulty is that people like to continue their work, either because they feel committed to their legacy, or because there are many more ideas to explore or recommendations to make. New to me in this session was the “query store” as a middleman: you need to be able to reproduce your queries, so you give them a persistent identifier, but you also need to be able to retrieve the same data with that query, so you version your data with a timestamp. I also learned that data can be watermarked or carry fingerprints as a protection layer (this related to data from social insurance providers for doctors and hospitals). Another term that was often used was “snapshot”: a version is a snapshot of your database. And I think it was Stefan Proll (or perhaps somebody who asked him a question) who said: “If users do not cite your data, cite your users”.
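To make the query store idea a bit more concrete for myself, here is a minimal Python sketch. It is only my own illustration of the principle described in the session, not the working group's implementation: the class names `VersionedTable` and `QueryStore`, the `hypothetical-pid:` identifier scheme, the checksum on the result, and the toy station data are all assumptions. The data is versioned with a logical timestamp, and each registered query gets an identifier that remembers both the query and the timestamp it ran at.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Row:
    key: str
    value: int
    valid_from: int                  # logical timestamp at which this version was written
    valid_to: Optional[int] = None   # None means this version is still current


@dataclass
class VersionedTable:
    """Toy table that never overwrites: old versions are closed, not deleted."""
    rows: list = field(default_factory=list)
    clock: int = 0

    def upsert(self, key: str, value: int) -> int:
        self.clock += 1
        for row in self.rows:
            if row.key == key and row.valid_to is None:
                row.valid_to = self.clock   # close the previous version
        self.rows.append(Row(key, value, self.clock))
        return self.clock

    def snapshot(self, at: int) -> list:
        """Return the rows as they were at logical timestamp `at`."""
        return [r for r in self.rows
                if r.valid_from <= at and (r.valid_to is None or r.valid_to > at)]


def _digest(obj) -> str:
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


class QueryStore:
    """Hypothetical query store: maps an identifier to (query, timestamp, checksum)."""

    def __init__(self, table: VersionedTable):
        self.table = table
        self.entries = {}

    def _run(self, query: dict, at: int) -> list:
        rows = self.table.snapshot(at)
        return sorted((r.key, r.value) for r in rows if r.value >= query["min_value"])

    def register(self, query: dict):
        at = self.table.clock
        result = self._run(query, at)
        # Assumed identifier scheme; a real service would mint a proper PID.
        pid = "hypothetical-pid:" + _digest({"query": query, "at": at})[:12]
        self.entries[pid] = (query, at, _digest(result))
        return pid, result

    def resolve(self, pid: str) -> list:
        query, at, checksum = self.entries[pid]
        result = self._run(query, at)   # re-execute against the old snapshot
        if _digest(result) != checksum:
            raise ValueError("re-executed result differs from the cited result")
        return result


# Toy data, invented for illustration only.
table = VersionedTable()
table.upsert("station-a", 12)
table.upsert("station-b", 7)

store = QueryStore(table)
pid, cited = store.register({"min_value": 10})

# The data keeps changing after the citation was made...
table.upsert("station-b", 15)
table.upsert("station-a", 3)

# ...but resolving the identifier still returns what was cited.
assert store.resolve(pid) == cited
```

The point of the design is that the identifier refers to the query plus the timestamp, not to a stored copy of the result, so the database can keep changing while older citations stay resolvable.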
I already referred to the workshop on the day preceding the RDA Plenary, which was on e-Infrastructures & RDA for data intensive science. There was some overlap between these two days, which I did not mind at all. A very nice tool, called RD Switchboard, was presented by Amir Aryani from ANDS (Australia). This switchboard connects datasets on the basis of co-authorship or other collaboration (e.g. via funding). Paolo Manghi showed that they already work together with the RD Switchboard by finding connections via the OpenAIRE database, between publications and projects, and between publications and data.
Mark Parsons, the secretary general of RDA, talked (amongst other funny stuff) about infrastructures during the opening session of the preceding day: how we went from systems, to networks, to networked infrastructures. Infrastructures are about bridges, both social and technical, and that is what RDA wants to do: create bridges, and be open! “Preserve the freedom to tinker, that is why choice for open source is important.”
My Paris RDA trip started even a day before that, with the persistent identifiers workshop, organized by DataCite and ePIC. ePIC stands for persistent identifiers for eResearch, and works on data in the full research cycle (what they call referrable data), whereas DataCite provides identifiers for citable data. At the workshop there were presentations about identifiers such as ARK, DOI, Handle, ORCID and ISNI. Domain-specific work often needs identifiers of its own as well: Anne Cambon-Thomsen started a journal for descriptions of bioresources, and Kerstin Lehnert introduced the IGSN, the geosample number.
And we are not there yet. We want to use identifiers for more physical objects, we should always make sure that we refer to the PID in the metadata, and according to Peter Wissenburg, we should also use identifiers for the metadata. Clearly, the most important thing is that these persistent identifiers are linked across platforms, and that we have an open scholarly infrastructure. A project about this has just started: “Technical and Human Infrastructure for Open Research” (THOR). Tobias Weigl wanted to take it even further: “We need an operational transition process. Go from one pid to the other. That is not possible yet.”
New to me, in the presentation by Laura Paglioni from ORCID, was that they will add review information to your ORCID profile. She also showed that there is already a data flow between CrossRef, DataCite and ORCID.
So even though I could not attend the full Plenary, I took away more than enough inspiration!