As a follow-up to our blog posts on the Crossref REST API we talked to SHARE about the work they’re doing, and how they’re employing the Crossref metadata as a piece of the puzzle. Cynthia Hudson-Vitale from SHARE explains in more detail…
As the linking hub for scholarly content, it’s our job to tame URLs and put in their place something better. Why? Most URLs suffer from link rot and can be created, deleted or changed at any time. And that’s a problem if you’re trying to cite them.
Crossref will be updating its DOI Display Guidelines within the next couple of weeks. This is a big deal. We last made a change in 2011 so it’s not something that happens often or that we take lightly. In short, the changes are to drop “dx” from DOI links and to use “https:” rather than “http:”. An example of the new best practice in displaying a Crossref DOI link is: https://doi.org/10.1629/22161
Hey Ho, “doi:” and “dx” have got to go
The updated Crossref DOI Display guidelines recommend that https://doi.org/ be used and not http://dx.doi.org/ in DOI links. Originally the “dx” separated the DOI resolver from the International DOI Foundation (IDF) website but this has changed and the IDF has already updated its recommendations so we are bringing ours in line with theirs.
We are also recommending the use of HTTPS because it makes for more secure browsing. When you use an HTTPS link, the connection between the person who clicks the DOI and the DOI resolver is secure. This means it can’t be tampered with or eavesdropped on. The DOI resolver will redirect to both HTTP and HTTPS URLs.
Timing and backwards compatibility
We are requesting all Crossref member publishers and anyone using Crossref DOIs to start following the updated guidelines as soon as possible. But realistically we are setting a goal of six months for implementation; we realize that updating systems and websites can take time. We at Crossref will also be updating our systems within six months – we already use HTTPS for some of our services and our new website (coming very soon!) will use HTTPS.
An important point about backwards compatibility is that “http://dx.doi.org/” and “http://doi.org/” are valid and will continue to work forever–or as long as Crossref DOIs continue to work–and we plan to be around a long time.
We need to do better
Reflecting on the 2011 update to the display guidelines it’s fair to say that we have been disappointed. It is still much too common to see unlinked DOIs in the form doi:10.1063/1.3599050 or DOI: 10.1629/22161 or even unlinked in this form: http://dx.doi.org/10.1002/poc.3551
What’s so wrong with this approach? To demonstrate, please click on this DOI doi:10.1063/1.3599050 – oh, you can’t click on it? How about I send you to a real example of a publisher page. What I’d like you to do is click the following link and then copy the DOI you find there and come back – http://dx.doi.org/10.1002/poc.3551.
Are you back? I expect you had to carefully highlight the “10.1063/1.3599050” and then do “edit”, “copy”. That wasn’t too bad but the next step is to put the DOI into an email and send it to someone. But wait – what are they going to do with “10.1063/1.3599050”? It’s useless. If you want it to be useful you’ll have to add “http://doi.org/” or “https://doi.org/” in front.
When publishers follow the guidelines it makes things easier – if you go to https://doi.org/10.1063/1.3599050 you’ll note that you can just right click on the full DOI link on the page and get a full menu of options of what to do with it. One of which is to copy the link and then you can easily paste into an email or anywhere else.
However–putting a positive spin on the spotty adherence to the 2011 update to the DOI display guidelines–everyone has another chance with the latest set of updates to make all the changes at once!
More on HTTPS (future-proofing scholarly linking)
We take providing the central linking infrastructure for scholarly publishing seriously. Because we form the link between publisher sites all over the web, it’s important that we do our bit to enable secure browsing from start to finish. In addition, HTTPS is now a ranking signal for Google, which gives sites using HTTPS a small ranking boost.
The process of enabling HTTPS on publisher sites will be a long one and, given the number of members we have, it may be a while before everyone’s made the transition. But by using HTTPS we are future-proofing scholarly linking on the web.
Some years ago we started the process of making our new services available exclusively over HTTPS. The Crossref Metadata API is HTTPS enabled, and Crossmark and our Assets CDN use HTTPS exclusively. Last year we collaborated with Wikipedia to make all of their DOI links HTTPS. We hope that we’ll start to see more of the scholarly publishing industry doing the same.
So–it’s simple–always make the DOI a full link – https://doi.org/10.1006/jmbi.1995.0238 – even when it’s on the abstract or full text page of the content that the DOI identifies – and use “https://doi.org/”.
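For anyone updating templates or cleaning up reference lists, the conversion can be sketched in a few lines of Python. This is only a minimal sketch: the function name `to_display_link` is our own invention, and it assumes the input is one of the forms mentioned in this post (a bare DOI, a “doi:” prefix, or an older http://dx.doi.org/ or http://doi.org/ link).

```python
import re

# Prefixes commonly seen in front of a DOI, matched case-insensitively:
# "doi:", "DOI: ", "http://dx.doi.org/", "http://doi.org/", "https://doi.org/".
_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def to_display_link(raw: str) -> str:
    """Return the recommended https://doi.org/ link for a DOI in a common form."""
    doi = _PREFIX.sub("", raw.strip()).strip()
    if not doi.startswith("10."):
        # Everything handled here is expected to start with the "10." directory code.
        raise ValueError(f"not a DOI: {raw!r}")
    return "https://doi.org/" + doi


for example in ("doi:10.1063/1.3599050",
                "DOI: 10.1629/22161",
                "http://dx.doi.org/10.1002/poc.3551"):
    print(to_display_link(example))
```

The sketch leaves the DOI itself untouched; DOIs containing characters that are unsafe in URLs would also need percent-encoding, which is omitted here.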
Crossref discourages our members from using DOI-like strings or fake DOIs.
Recently we have seen quite a bit of debate around the use of so-called “fake-DOIs.” We have also been quoted as saying that we discourage the use of “fake DOIs” or “DOI-like strings”. This post outlines some of the cases in which we’ve seen fake DOIs used and why we recommend against doing so.
Using DOI-like strings as internal identifiers
Some of our members use DOI-like strings as internal identifiers for their manuscript tracking systems. These only get registered as real DOIs with Crossref once an article is published. This seems relatively harmless, except that, frequently, the unregistered DOI-like strings for unpublished content (e.g. manuscripts that are under review or were rejected) ‘escape’ into the public as well. People attempting to use these DOI-like strings get understandably confused and angry when they don’t resolve or otherwise work as DOIs. After years of experiencing the frustration that these DOI-like things cause, we have taken to recommending that our members not use DOI-like strings as their internal identifiers.
Using DOI-like strings in access control compliance applications
We’ve also had members use DOI-like strings as the basis for systems that they use to detect and block tools designed to bypass the member’s access control system and bulk-download content. The methods employed by our members have fallen into two broad categories:
- Spider (or robot) traps.
- Proxy bait.
A “spider trap” is essentially a tripwire that allows a site owner to detect when a spider/robot is crawling their site to download content. The technique involves embedding a special trigger URL in a public page on a web site. The URL is embedded such that a normal user should not be able to see it or follow it, but an automated bot (aka “spider”) will detect it and follow it. The theory is that when one of these trap URLs is followed, the website owner can then conclude that the IP address from which it was followed harbours a bot and take action. Usually the action is to inform the organisation from which the bot is connecting and to ask them to block it. But sometimes triggering a spider trap has resulted in the IP address associated with it being instantly cut off. This, in turn, can affect an entire university’s access to said member’s content.
When a spider/bot trap includes a DOI-like string, we have seen some particularly pernicious problems, as such traps can trip up legitimate tools and activities as well. For example, a bibliographic management browser plugin might automatically extract DOIs and retrieve metadata on pages visited by a researcher. If the plugin were to pick up one of these spider-trap DOI-like strings, it might inadvertently trigger the researcher being blocked or, worse, the researcher’s entire university being blocked. In the past, this has even been a problem for Crossref itself. We periodically run tools to test DOI resolution and to ensure that our members are properly displaying DOIs, CrossMarks, and metadata as per their member obligations. We’ve occasionally been blocked when we ran across the spider traps as well.
Using proxy bait is similar to using a spider trap, but it has an important difference. It does not involve embedding specially crafted DOI-like strings on the member’s website itself. The DOI-like strings are instead fed directly to tools designed to subvert the member’s access control systems. These tools, in turn, use proxies on a subscriber’s network to retrieve the “bait” DOI-like string. When the member sees one of these special DOI-like strings being requested from a particular institution, they then know that said institution’s network harbours a proxy. In theory this technique never exposes the DOI-like strings to the public and automated tools should not be able to stumble upon them. However, recently one of our members had some of these DOI-like strings “escape” into the public and at least one of them was indexed by Google. The problem was compounded because people clicking on these DOI-like strings sometimes ended up having their university’s IP address banned from the member’s web site. As you can imagine, there has been a lot of gnashing of teeth. We are convinced, in this case, that the member was doing their best to make sure the DOI-like strings never entered the public. But they did nonetheless. We think this just underscores how hard it is to ensure DOI-like strings remain private and why we recommend our members not use them.
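To see why a well-behaved tool can stumble into a trap, here is a rough sketch of the kind of DOI extraction a plugin might do. Everything in it is hypothetical: the page fragment, the hidden link, the string “10.9999/hypothetical-trap” and the matching pattern are all invented for illustration and do not describe any real member site or any real plugin.

```python
import re

# Hypothetical page fragment: one visible DOI link and one link hidden from readers.
PAGE = """
<p>Cite as <a href="https://doi.org/10.1063/1.3599050">https://doi.org/10.1063/1.3599050</a></p>
<a href="https://doi.org/10.9999/hypothetical-trap" style="display:none"></a>
"""

# Assumed pattern for "something shaped like a DOI" inside markup.
DOI_IN_TEXT = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def extract_doi_like(html: str) -> list:
    """Return every DOI-shaped string in the markup, in order, without duplicates."""
    return list(dict.fromkeys(DOI_IN_TEXT.findall(html)))


print(extract_doi_like(PAGE))
```

The extractor works on the raw markup, so it has no idea that the second link is invisible to a human reader. A tool that went on to resolve everything it found would request the hidden string too.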
Pedantry and terminology
Notice that we have not used the phrase “fake DOI” yet. This is because, internally, at least, we have distinguished between “DOI-like strings” and “fake DOIs.” The terminology might be daft, but it is what we’ve used in the past and some of our members at least will be familiar with it. We don’t expect anybody outside of Crossref to know this.
To us, the following is not a DOI:
It is simply a string of alphanumeric characters that copies the DOI syntax. We call such strings “DOI-like strings.” It is not registered with any DOI registration agency and one cannot look up metadata for it. If you try to “resolve” it, you will simply get an error. Here, you can try it. Don’t worry, clicking on it will not disable access for your university.
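The distinction is easy to lose in code, because a syntax check cannot tell the two apart. A minimal sketch, assuming a simplified pattern of our own (it is not an official definition of DOI syntax) and a sample string invented purely for illustration:

```python
import re

# Assumed shape: "10.", a 4 to 9 digit prefix, "/", then a non-empty suffix.
DOI_SHAPE = re.compile(r"10\.\d{4,9}/\S+")


def looks_like_doi(candidate: str) -> bool:
    """True if the string has the shape of a DOI. Says nothing about registration."""
    return DOI_SHAPE.fullmatch(candidate) is not None


print(looks_like_doi("10.1063/1.3599050"))          # a DOI quoted earlier in this post
print(looks_like_doi("10.9999/invented-for-demo"))  # made-up string with the same shape
```

Both calls return `True`. Only asking the resolver or a registration agency can tell you whether a string that passes this check is actually registered.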
The following is what we have sometimes called a “fake DOI”
It is registered with Crossref, resolves to a fake article in a fake journal called The Journal of Psychoceramics (the study of Cracked Pots) run by a fictitious author (Josiah Carberry) who has a fake ORCID (http://orcid.org/0000-0002-1825-0097) but who is affiliated with a real university (Brown University).
Again, you can try it.
And you can even look up metadata for it.
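For any registered Crossref DOI, a metadata lookup can be sketched with the Python standard library. Treat this as a sketch under assumptions rather than a reference client: the helper names are ours, and the response layout used here (a JSON object whose `message` holds the record, with `title` as a list) is our understanding of the `/works/{doi}` route of the Crossref REST API.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_ROOT = "https://api.crossref.org/works/"


def metadata_url(doi: str) -> str:
    """Build the REST API URL for one DOI, percent-encoding unsafe characters."""
    return API_ROOT + quote(doi, safe="/")


def fetch_title(doi: str) -> str:
    """Fetch the record over HTTPS and return its first title, or "" if none."""
    with urlopen(metadata_url(doi), timeout=10) as response:
        record = json.load(response)
    titles = record["message"].get("title", [])
    return titles[0] if titles else ""


# Building the URL needs no network access; fetch_title() does.
print(metadata_url("10.1629/22161"))
```

A DOI-like string that was never registered has no record to return, so a request for it should fail rather than produce metadata.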
Our dirty little secret is that this “fake DOI” was registered and is controlled by Crossref.
Why does this exist? Aren’t we subverting the scholarly record? Isn’t this awful? Aren’t we at the very least hypocrites? And how does a real university feel about having this fake author and journal associated with them?
Well, the DOI is using a prefix that we use for testing. It follows a long tradition of test identifiers starting with “5”. Fake phone numbers in the US start with “555”. Many credit card companies reserve fake numbers starting with “5”. For example, Mastercard’s are “5555555555554444” and “5105105105105100.”
We have created this fake DOI, the fake journal and the fake ORCID so that we can test our systems and demonstrate interoperable features and tools. The fake author, Josiah Carberry, is a long-running joke at Brown University. He even has a Wikipedia entry. There are also a lot of other DOIs under the test prefix “5555.”
We acknowledge that the term “fake DOI” might not be the best in this case, but it is a term we’ve used internally at least and it is worth distinguishing it from the case of DOI-like strings mentioned above.
But back to the important stuff….
As far as we know, none of our members has ever registered a “fake DOI” (as defined above) in order to detect and prevent the circumvention of their access control systems. If they had, we would consider it much more serious than the mere creation of DOI-like strings. The information associated with registered DOIs becomes part of the persistent scholarly citation record. Many, many third party systems and tools make use of our API and metadata including bibliographic management tools, TDM tools, CRIS systems, altmetrics services, etc. It would be a very bad thing if people started to worry that the legitimate use of registered DOIs could inadvertently block them from accessing content. Crossref DOIs are designed to encourage discovery and access, not block it.
And again, we have absolutely no evidence that any of our members has registered fake DOIs.
But just in case, we will continue to discourage our members from using DOI-like strings and/or registering fake DOIs.
This has been a public service announcement from the identifier dweebs at Crossref.
Unless otherwise noted, included images purchased from The Noun Project
We now have linked clinical trials deposits coming in from five publishers: BioMedCentral, BMJ, Elsevier, National Institute for Health Research and PLOS. It’s still a relatively small pool of metadata – around 4000 DOIs with associated clinical trial numbers – but we’re delighted to see that “threads” of publications are already starting to form.
If you look at this article in The Lancet and click on the CrossMark button you will see that in the Clinical Trials section there are links to three other articles reporting on the same trial: two from the American Heart Journal and one from BMJ’s Heart. Readers can navigate between these four articles in three separate journals using the CrossMark functionality: a new set of links and routes for discovery has appeared.
In another example, three articles from PLOS ONE are threaded together around a trial for the treatment of Type 1 diabetes. And here another PLOS journal, Neglected Tropical Diseases, links through to a PLOS ONE article about the same trial.
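The idea behind a “thread” is simple grouping: every publication that cites the same clinical trial number ends up in the same bucket. The sketch below shows that grouping only. It is not how CrossMark is implemented, and the DOIs and trial numbers in it are placeholders invented for the example, not real deposits.

```python
from collections import defaultdict

# Placeholder records: (publication DOI, clinical trial numbers it reports on).
RECORDS = [
    ("10.5555/example-a", ["TRIAL-0001"]),
    ("10.5555/example-b", ["TRIAL-0001"]),
    ("10.5555/example-c", ["TRIAL-0002"]),
]


def thread_by_trial(records):
    """Group publication DOIs by the clinical trial numbers they cite."""
    threads = defaultdict(list)
    for doi, trial_numbers in records:
        for trial_number in trial_numbers:
            threads[trial_number].append(doi)
    return dict(threads)


def related(doi, records):
    """List the other publications that share at least one trial with `doi`."""
    found = []
    for dois in thread_by_trial(records).values():
        if doi in dois:
            found.extend(d for d in dois if d != doi and d not in found)
    return found


print(thread_by_trial(RECORDS))
print(related("10.5555/example-a", RECORDS))
```

The more publishers deposit trial numbers, the more of these buckets contain publications from several journals, which is what makes the threads useful for readers.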
If you publish in the health sciences please do consider joining this exciting initiative so that we can expand these threads and build up the metadata. Read the tech specs here or drop me an email if you have questions.
I had a great chat with Danielle Padula of Scholastica, a journals platform with an integrated peer-review process that was founded in 2011. We talked about how journals get started with Crossref, and she turned our conversation into a blog post that describes the steps to begin registering content and depositing metadata with us. Since the result is a really useful description of our new member on-boarding process, I want to share it with you here as well. As always, comments and questions are welcome here, at member@Crossref.org, and @CrossrefOrg. – Anna
The internet is in a constant state of change, with new content being added to the web by the minute and old content sometimes getting moved around. While the benefit of publishing scholarly outputs online is that it’s possible to update them at any moment, moving or modifying content can also …
Test out the early preview of Event Data while we continue to develop it. Share your thoughts. And be warned: we may break a few eggs from time to time!
Want to discover which research works are being shared, liked and commented on? What about the number of times a scholarly item is referenced? Starting today, you can whet your appetite with an early preview of the forthcoming Crossref Event Data service. We invite you to start exploring the activity of DOIs as they permeate and interact with the world after publication.
Back in 2014, Geoffrey Bilder blogged about the kick-off of an initiative between Crossref and Wikimedia to better integrate scholarly literature into the world’s largest knowledge space, Wikipedia. Since then, Crossref has been working to coordinate activities with Wikimedia: Joe Wass has worked with them to create a live stream of content being cited in Wikipedia; and we’re including Wikipedia in Event Data, a new service to launch later this year. In that time, we’ve also seen Wikipedia’s importance grow in terms of the volume of DOI referrals.
Alex Stinson, Project Manager for the Wikipedia Library, and guest blogger! This file is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license (Source: Myleen Hollero Photography)
How can we keep this momentum going and continue to improve the way we link Wikipedia articles with the formal literature? We invited Alex Stinson, a project manager at The Wikipedia Library (and one of our first guest bloggers) to explain more: Continue reading “The Wikipedia Library: A Partnership of Wikipedia and Publishers to Enhance Research and Discovery”
What happens to a research work outside of the formal literature? That’s what Event Data will aim to answer when the service launches later this year.
Following the successful DOI Event Tracker pilot in Spring 2014, development has been underway to build our new service, newly re-named Crossref Event Data. It’s an open data service that registers online activity (specifically, events) associated with Crossref metadata. Event Data will collect and store a record of any activity surrounding a research work from a defined set of web sources. The data will be made available as part of our metadata search service or via our Metadata API and normalised across a diverse set of sources. Data will be open, audit-able and replicable. Continue reading “Event Data: open for your interpretation”
We will shortly be adding a new feature to CrossMark. In a section called “Clinical Trials” we will be using new metadata fields to link together all of the publications we know about that reference a particular clinical trial.
Most medical journals make clinical trial registration a prerequisite for publication. Trials should be registered with one of the fifteen WHO-approved public trial registries, or with clinicaltrials.gov, which is run by the US National Library of Medicine. Once registered, a trial is assigned a clinical trial number (CTN) which is subsequently used to identify that trial in any publications that report on it. Continue reading “Linking clinical trials = enriched metadata and increased transparency”