The Organization Identifier Project: a way forward

The scholarly communications sector has built and adopted a series of open identifier and metadata infrastructure systems to great success.  Content identifiers (through Crossref and DataCite) and contributor identifiers (through ORCID) have become foundational infrastructure to the industry.  But there still seems to be one piece of the infrastructure that is missing.  There is as yet no open, stakeholder-governed infrastructure for organization identifiers and associated metadata.

In order to understand this gap, Crossref, DataCite and ORCID have been collaborating to:

  • Explore the current landscape of organizational identifiers;
  • Collect the use-cases that would benefit our respective stakeholders in the scholarly communications industry;
  • Identify those use-cases that can be more feasibly addressed in the near term; and
  • Explore how the three organizations can collaborate (with each other and with others) to practically address this key missing piece of scholarly infrastructure.

The result of this work is three related papers, which Crossref, DataCite and ORCID are releasing for community review and feedback. The three papers are:

  • Organization Identifier Project: A Way Forward (PDF; GDoc)
  • Organization Identifier Provider Landscape (PDF; GDoc)
  • Technical Considerations for an Organization Identifier Registry (PDF; GDoc)

We invite the community to comment on these papers both via email (oi-project@orcid.org) and at PIDapalooza on November 9th and 10th and at Crossref LIVE16 on November 1st and 2nd. To move The OI Project forward, we will be forming a Community Working Group with the goal of holding an initial meeting before the end of 2016. The Working Group’s main charge is to develop a plan to launch and sustain an open, independent, non-profit organization identifier registry to facilitate the disambiguation of researcher affiliations.

Crossref Use Cases

Crossref has also been discussing the needs of its members over the last year and there is value in focusing on the affiliation name ambiguity problem with research outputs and contributors. In terms of the metadata that Crossref collects, something that is missing has been affiliations for the authors of publications. Over the last couple of years, Crossref has been expanding what it collects – for example, funding and licensing data and ORCID iDs – and this enables a fuller picture of what we are calling the “article nexus”. In order to continue to fill out the metadata we collect – and for our publisher members to use in their own systems and publications – we need an organization identifier.

Another use case for Crossref is identifying funders as part of collecting funder data to enable connecting funding sources with the published scholarly literature. In order to enable the reliable identification of funders in the Crossref system we created the Open Funder Registry that now has over 13,000 funders available as Open Data under a CC0 waiver. While this has been very successful, it is a very narrowly focused registry and is not suitable for a broad, community-run organization identifier registry that addresses the affiliation use case.  In future, our goal will be to merge the Open Funder Registry into the identifier registry that the Organization Identifier Working Group will work on.

By working collaboratively we can define a pragmatic and cost-effective service that will meet a fundamental need of all scholarly communication stakeholders.

Geoffrey Bilder will be focusing his talk at Crossref LIVE16 this week on this initiative, dubbed The OI Project. The talk is scheduled for 2pm UK time and will be live streamed along with the rest of that day’s program.

Linking Publications to Data and Software

TL;DR

Crossref and DataCite provide a service to link publications and data. The easiest way for Crossref members to participate in this is to cite data using DataCite DOIs and to include them in the references within the metadata deposit. These data citations are automatically detected. Alternatively and/or additionally, Crossref members can deposit data citations (regardless of identifier) as a relation type in the metadata. Data & software citations from both methods are freely propagated. This blog post also describes how to retrieve the links collected between publications and data & software.

Announcing PIDapalooza – a festival of identifiers

The buzz is building around PIDapalooza – the first open festival of scholarly research persistent identifiers (PIDs), to be held at the Radisson Blu Saga Hotel Reykjavik on November 9-10, 2016.

PIDapalooza will bring together creators and users of PIDs from around the world to shape the future PID landscape through the development of tools and services for the research community. PIDs support proper attribution and credit, promote collaboration and reuse, enable reproducibility of findings, foster faster and more efficient progress, and facilitate effective sharing, dissemination, and linking of scholarly works.

Linking data and publications

Do you want to see if a CrossRef DOI (typically assigned to publications) refers to DataCite DOIs (typically assigned to data)? Here you go:

http://api.labs.crossref.org/graph/doi/10.4319/lo.1997.42.1.0001

Conversely, do you want to see if a DataCite DOI refers to CrossRef DOIs? Voilà:

http://api.labs.crossref.org/graph/doi/10.1594/pangaea.185321
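If you would rather build these lookup URLs in a script than by hand, here is a minimal Python sketch. It assumes only the URL pattern shown in the two examples above; the helper name `graph_url_for_doi` and its rough sanity check are our own illustration, not part of the API. It deliberately does not fetch or parse anything, since the shape of the response is not described here.

```python
from urllib.parse import quote

# Base URL copied from the examples above. This is an experimental "labs"
# endpoint, so treat the address as an assumption that may change.
GRAPH_DOI_BASE = "http://api.labs.crossref.org/graph/doi/"


def graph_url_for_doi(doi: str) -> str:
    """Return the labs graph URL for a DOI (hypothetical helper)."""
    cleaned = doi.strip()
    # Very rough sanity check: DOIs start with "10." and have a prefix/suffix slash.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Keep the slash between prefix and suffix; percent-encode anything unsafe.
    return GRAPH_DOI_BASE + quote(cleaned, safe="/")


# The two DOIs used in the examples above.
for example_doi in ("10.4319/lo.1997.42.1.0001", "10.1594/pangaea.185321"):
    print(graph_url_for_doi(example_doi))
```

The same function works for either direction, because the endpoint takes a DOI regardless of whether it was registered with CrossRef or DataCite.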

Background

“How can we effectively integrate data into the scholarly record?” This is the question that has, for the past few years, generated an unprecedented amount of handwringing on the part of researchers, librarians, funders and publishers. Indeed, this week I am in Amsterdam to attend the 4th RDA plenary in which this topic will no doubt again garner a lot of deserved attention.

We hope that the small example above will help push the RDA’s agenda a little further. Like the recent ODIN project, it illustrates how we can simply combine two existing scholarly infrastructure systems to build important new functionality for integrating research objects into the scholarly literature.

Does it solve all of the problems associated with citing and referring to data? Can the various workgroups at RDA just cancel their data citation sessions and spend the week riding bikes and gorging on croquettes? Of course not. But my guess is that by simply integrating DataCite and CrossRef in this way, we can make a giant push in the right direction.

There are certainly going to be differences between traditional citation and data citation. Some even claim that citing data isn’t “as simple as citing traditional literature.” But this is a caricature of traditional citation. If you believe this, go off and peruse the MLA, Chicago, Harvard, NLM and APA citation guides. Then read Anthony Grafton’s The Footnote. Are you back yet? Good, so let’s continue…

Citation of any sort is a complex issue, full of subtleties, edge-cases, exceptions, disciplinary variations and kludges. Historically, the way to deal with these edge-cases has been social, not technical. For traditional literature we have simply evolved and documented citation practices which generally make contextually-appropriate use of the same technical infrastructure (footnotes, endnotes, metadata, etc.). I suspect the same will be true in citing data. The solutions will not be technical, they will mostly be social. Researchers and publishers will evolve new, contextually appropriate mechanisms to use existing infrastructure to deal with the peculiarities of data citation.

Does this mean that we will never have to develop new systems to handle data citation? Possibly. But I don’t think we’ll know what those systems are or how they should work until we’ve actually had researchers attempting to use and adapt the tools we have.

Technical background

About five years ago, CrossRef and DataCite explored the possibility of exposing linkages between DataCite and CrossRef DOIs. Accordingly, we spent some time trying to assemble an example corpus that would illustrate the power of interlinking these identifiers. We encountered a slight problem. We could hardly find any examples. At that time, virtually nobody cited data with DataCite DOIs and, if they did, the CrossRef system did not handle them properly. We had to sit back and wait a while.

And now the situation has changed.

This demonstrator harvests DataCite DOIs using their OAI-PMH API and links them in a graph database with CrossRef DOIs. We have exposed this functionality on the “labs” (i.e. experimental) version of our REST API as a graph resource. So…

You can get a list of CrossRef DOIs that refer to DataCite DOIs as follows:

http://api.labs.crossref.org/graph?rel=cites:*&filter=source:crossref,related-source:datacite

And the converse:

http://api.labs.crossref.org/graph?rel=cites:*&filter=source:datacite,related-source:crossref
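Both queries differ only in which registry is the source and which is the related source, so they are easy to generate. The sketch below is a hedged illustration that assumes nothing beyond the two URLs above: the function name `citation_links_url` and the `KNOWN_SOURCES` set are ours, and the query string is assembled by hand so that the `:`, `*` and `,` characters stay exactly as they appear in the examples.

```python
# Endpoint and parameter names are copied from the example URLs above.
# This is an experimental "labs" API, so all of it may change.
GRAPH_BASE = "http://api.labs.crossref.org/graph"

# Only the two sources that appear in the examples (an assumption on our part).
KNOWN_SOURCES = {"crossref", "datacite"}


def citation_links_url(source: str, related_source: str) -> str:
    """Build the graph query for DOIs from `source` that cite DOIs from `related_source`."""
    for name in (source, related_source):
        if name not in KNOWN_SOURCES:
            raise ValueError(f"unknown source: {name!r}")
    # Assembled by hand rather than with urlencode(), which would percent-encode
    # the ':' '*' and ',' characters used in the example URLs.
    return (
        f"{GRAPH_BASE}?rel=cites:*"
        f"&filter=source:{source},related-source:{related_source}"
    )


# CrossRef DOIs that refer to DataCite DOIs, and the converse.
print(citation_links_url("crossref", "datacite"))
print(citation_links_url("datacite", "crossref"))
```

Whether the filter accepts any other source names is not something the examples tell us, which is why the sketch rejects anything else.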

Caveats and Weasel Words

  • We have not finished indexing all the links.
  • The API is currently a very early labs project. It is about as reliable as a devolution promise from Westminster.
  • The API is run on a pair of raspberry-pi’s connected to the internet via bluetooth.
  • It is not fast.
  • The representation and the API are under active development. Things will change. Watch the CrossRef Labs site for updates on this collaboration with DataCite.