Linking DOIs using HTTPS: The background to Crossref’s new guidelines


Recently we announced that we were making some new recommendations in our DOI display guidelines. One of them was to use the secure HTTPS protocol to link Crossref DOIs, instead of the insecure HTTP.

Some people asked whether the move to HTTPS might affect their ability to measure referrals (i.e. where the people who visit your site come from).

TL;DR: Yes

Yes. If you do not move your DOI links to HTTPS, Crossref, its members and the members of other DOI registration agencies (e.g. DataCite, JLC, CNKI) will find it increasingly difficult to accurately measure referrals. You should link DOIs using HTTPS.

In fact, if you do not support HTTPS on your site now, it is likely that your ability to measure referrals is already impaired. If you do not already have a plan to move your site to HTTPS, you should develop one.

If you have already transitioned your site to HTTPS, you should follow the new guidelines and link DOIs via HTTPS as soon as possible. As it stands, you are not sending any referrer information when DOIs are clicked on and followed from your site. You should also make sure that the URLs you have registered with Crossref are HTTPS URLs, otherwise *you* will not get referrer information on your site when they are followed.

Read on if you want some grody details. We’ll try to keep it as non-technical as possible.

Two protocols, one web

To start with, your web browser supports two closely related protocols: HTTP and HTTPS.

The first, HTTP, is the protocol that the web started out with. It is an unencrypted protocol and it is also easy to intercept and modify. It is also very easy and inexpensive to implement.

The second protocol, HTTPS, is a secure version of the first protocol. It is very difficult to intercept and modify. It has historically been more complex and expensive to implement.

Here you might say: “Great, but HTTPS has been around for a long time. We’ve used it for sensitive transactions like authentication and credit card transactions. Why do we want to use DOI links with HTTPS? Why are you suggesting that we should even consider moving our entire site to HTTPS?”

The pressure to move to HTTPS

The insecure HTTP protocol has become a major vector for a lot of security issues on the web. It allows user web pages to be intercepted and modified between the server and the browser. This flaw is being abused for everything from spying, to inserting unwanted advertisements into web pages, to distributing viruses, ransomware and botnets.

As such, there has been a steady drumbeat of industry encouragement to move to the more secure HTTPS protocol for all website functions.

We are not going to argue all the points here. Instead we will mention the major constituencies that are advocating for a move to HTTPS and provide you with some pointers. We apologise that these are all so US-centric, but a lot of the web’s global direction does seem to be presaged by US adoption trends.

Google

It is probably easiest to start with Google, since its practices tend to focus the attention of those managing websites.

Back in 2014 Google announced that they would slowly move toward including the use of HTTPS as a ranking signal. In 2015 they upped the ante by announcing that they would start indexing HTTPS versions of pages by default. It looks like in early 2017 they will really start to take the gloves off as they modify their Chrome browser to flag sites that do not use HTTPS as being insecure.

Every top website, evah

It looks like Google’s plan is working too. Their 2016 transparency report shows that most top websites have already transitioned to HTTPS and that this translates to approximately 25% of all web traffic worldwide taking place using HTTPS. Indeed, over 50% of all web pages viewed by desktop users are delivered via HTTPS.

Government agencies

The USA’s White House issued a directive instructing all Federal websites to adopt HTTPS. As of December 2016, 64% of federal websites have made the transition.

Libraries

Much of the pressure to move to HTTPS is coming from the library community who have a historical tradition of protecting patron privacy and resisting efforts to censor content. The third principle of the American Library Association’s code of ethics reads:

We protect each library user’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.

Recently there has been a major push by the Electronic Frontier Foundation to get libraries to adopt a number of security and privacy practices, including the use of HTTPS by all library systems as well as those used by library vendors.

What are Crossref members doing about HTTPS?

How big an issue is this? How many of our members have moved to HTTPS? How many plan to? Well, we looked at the URLs that are registered with Crossref and we tested them with both protocols. Eventually we will write a blog post detailing our findings – but the highlights are:

  • Slightly fewer than half of the member domains tested only support HTTP.
  • Slightly fewer than half of the member domains tested support both HTTP and HTTPS.
  • About 370 of the member domains tested only support HTTPS.

The transition to HTTPS and the issue of DOI referrals

The HTTP referrer is a piece of information passed on by a browser that indicates the site from which the user navigated.

So, for example, if a user visiting site A clicks on a link which takes them to site B, site B will then record in its logs that a user visited them from site A. Obviously, this is important information for understanding where your web site traffic comes from.

The default rules for referrals are [1]:

  1. If you link between two sites with the same level of security, all referral information is retained.
  2. When you follow a link from an insecure (HTTP) web site to a secure (HTTPS) site, referral data is passed on to the secure web site.
  3. If you follow a link from a secure (HTTPS) web site to an insecure (HTTP) site, referral data is not passed on to the insecure web site.

So let’s see what the situation would look like with normal links. If we had two sites, A & B, the following table maps the possible combinations of protocols that can be used to link from A to B. So, for example, row #2 reads:

A user browses site A using HTTP and clicks on an HTTPS link to publisher B, who hosts their site using HTTPS.

The last column indicates if the referrer information is passed along by the browser. In the case of row #2, the answer is “yes”. The user has navigated from a less secure site to a more secure site.

#  User views site A using  Site A links to site B using  Browser reports referrer to site B
1  HTTP                     HTTP                          Yes
2  HTTP                     HTTPS                         Yes
3  HTTPS                    HTTP                          No
4  HTTPS                    HTTPS                         Yes
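
The default rules above can be sketched in a few lines of Python. This is only an illustration of the three rules as described in this post, not how any particular browser implements them; the function name `referrer_sent` is made up for the example.

```python
def referrer_sent(source_scheme: str, target_scheme: str) -> bool:
    """Return True if, under the default rules described in this post,
    the browser passes referrer information from source to target."""
    source = source_scheme.lower()
    target = target_scheme.lower()
    # The only combination that loses the referrer is a link from a
    # secure (HTTPS) page to an insecure (HTTP) page.
    return not (source == "https" and target == "http")


if __name__ == "__main__":
    # Print the four combinations in the same order as the table.
    for source in ("http", "https"):
        for target in ("http", "https"):
            answer = "Yes" if referrer_sent(source, target) else "No"
            print(source.upper(), target.upper(), answer)
```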


But this gets a little more complicated with DOIs. In this case publisher A links to publisher B through the DOI system. This means there are two parts to the link. The first (A->doi.org) results in a redirect (A->B). Again we use the last columns to indicate when referrer information is passed along to site B. Again, let’s look at row #2. It reads:

A user browses the site of member A using HTTP and clicks on an HTTP DOI link. The DOI system redirects the browser to member B using an HTTPS link registered with Crossref by member B. The middle column and the last column record whether Crossref and the publisher were able to see referrer information. The answer in both cases is “yes”. In the first case (A->DOI), this is because the link is between two sites at the same security level (HTTP). In the second case, it is because the redirect takes the user from a less secure site (HTTP on A) to a more secure site (HTTPS on B).

#  User views site A using  Site A links DOI using  Browser reports referrer to Crossref [2]  Crossref redirects to site B using [3]  Browser reports referrer to site B
1  HTTP                     HTTP                    Yes                                       HTTP                                    Yes
2  HTTP                     HTTP                    Yes                                       HTTPS                                   Yes
3  HTTP                     HTTPS                   Yes                                       HTTP                                    Yes
4  HTTP                     HTTPS                   Yes                                       HTTPS                                   Yes
5  HTTPS                    HTTP                    No                                        HTTP                                    No
6  HTTPS                    HTTP                    No                                        HTTPS                                   No
7  HTTPS                    HTTPS                   Yes                                       HTTP                                    No
8  HTTPS                    HTTPS                   Yes                                       HTTPS                                   Yes
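
For readers who prefer code to tables, here is a rough Python sketch that reproduces the eight rows. It assumes the behaviour described in this post: a page viewed over HTTPS is never reported to an HTTP destination, and once the referrer has been dropped it stays dropped for the rest of the redirect chain. The function name `referrer_reports` is hypothetical.

```python
from itertools import product
from typing import List


def referrer_reports(site_a: str, hops: List[str]) -> List[bool]:
    """For each hop after site A, say whether the browser still reports
    site A as the referrer under the default rules."""
    reported = True
    results = []
    for hop in hops:
        # A page served over HTTPS is never reported to an HTTP destination,
        # and once the referrer is dropped it does not come back.
        if site_a == "https" and hop == "http":
            reported = False
        results.append(reported)
    return results


if __name__ == "__main__":
    # Enumerate the eight scenarios in the same order as the table:
    # (protocol of site A, protocol of the DOI link, protocol of site B).
    for row, (site_a, doi_link, site_b) in enumerate(
        product(("http", "https"), repeat=3), start=1
    ):
        to_crossref, to_site_b = referrer_reports(site_a, [doi_link, site_b])
        print(row, site_a, doi_link, to_crossref, site_b, to_site_b)
```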


So what does this mean?

Our old display guidelines recommended linking DOIs using HTTP. Rows #1, #2, #5, #6 represent the status quo.

About half of our members support HTTPS. A few support it exclusively and it seems, given the industry pressures mentioned above, those who support both protocols are likely doing so as a transition stage to HTTPS-only sites.

This means that the scenarios represented in row #5 & #6 are already happening. The referral information for any user viewing one of our member sites using HTTPS is being lost when they click on DOIs that use the HTTP protocol. Crossref doesn’t get the referral data and neither does the member whose DOI has been clicked on.

Of course this applies to non-member sites that link to DOIs as well. Wikipedia is the largest referrer of DOIs from outside the industry. In 2015 The Wikimedia Foundation made a highly publicised transition to HTTPS on all of their sites. This means that any of our members who are running HTTP sites have already lost the ability to see any referral information from Wikipedia on their own sites. However, Crossref worked closely with Wikimedia to ensure that, at the very least, Crossref was still able to record Wikimedia referral data on behalf of our members.

A solution

It is largely this work with Wikimedia that has helped us to understand just how important it is for Crossref to get ahead of the curve in helping our community to transition to HTTPS.

As long as our members are running a combination of HTTP and HTTPS sites, there is no way for our community to avoid some disruption in the flow of referral data. And we certainly would never entertain the notion of asking our members to keep using HTTP. The best we can do is recommend a practice that will help smooth the transition to HTTPS. That is what we are doing. Our new recommendation is to move to linking DOIs using HTTPS. This is represented in rows #3, #4, #7 and #8 in the table above.

This is a particularly important step for our members who have already moved to hosting their sites on HTTPS. As long as they are using HTTP DOIs on their site, they will be sending no referral traffic to Crossref, other Crossref members or other users of the DOI infrastructure. This is captured in scenarios #5 and #6.

If our linking guidelines are followed during the industry’s transition to HTTPS, then scenarios #5 and #6 will eventually be replaced with scenario #7. It is still not perfect, but at least it means that, during the transition, publishers who are still running HTTP sites will be able to get some DOI referral data via Crossref. And of course, once our members have widely transitioned to HTTPS, everything will go back to normal and they will be able to see referral data on their own sites as well (i.e. they will have moved from the state represented in row #1 to the state represented in row #8).

In summary, please change your sites to use HTTPS to link DOIs. They should look like this:

https://doi.org/10.7554/eLife.20320
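
If you have many links to update, the change can be scripted. The sketch below is one possible approach in Python, not an official Crossref tool: the function name `to_https_doi_link` is invented for the example, and it only touches links whose host is exactly `doi.org`, leaving every other URL as it is.

```python
from urllib.parse import urlsplit


def to_https_doi_link(url: str) -> str:
    """Rewrite an http://doi.org/... link to https://doi.org/...

    Any URL that is not an HTTP link to doi.org is returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname == "doi.org":
        # Swap only the scheme; the DOI itself (the path) is untouched.
        return parts._replace(scheme="https").geturl()
    return url


if __name__ == "__main__":
    print(to_https_doi_link("http://doi.org/10.7554/eLife.20320"))
```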

FAQ

Q: If I have moved my site to HTTPS, do I need to redeposit my URLs so that they use the HTTPS protocol instead?

A: Yes, if you want to be able to still collect referrer information on your own site (scenario #8) as opposed to via Crossref (scenario #7).


Q: But can’t I avoid redepositing my URLs and get referrer data again if I simply redirect HTTP URLs to HTTPS on my own site?

A: No. The browser will strip referrer information if there is any HTTP step in the redirects. Even if the redirect is done on your own site.
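
As a small illustration of that answer, the Python sketch below checks a redirect chain for an HTTP step. It assumes the user started on an HTTPS page, and both the function name and the `publisher.example` URLs are hypothetical.

```python
from typing import List
from urllib.parse import urlsplit


def referrer_kept_from_https_page(redirect_chain: List[str]) -> bool:
    """Return True only if every URL in the redirect chain uses HTTPS.

    Assumes the user started on an HTTPS page; a single HTTP step
    anywhere in the chain is enough for the referrer to be stripped.
    """
    return all(urlsplit(url).scheme == "https" for url in redirect_chain)


if __name__ == "__main__":
    # The deposited URL is HTTP and the publisher then redirects to HTTPS.
    chain = [
        "https://doi.org/10.7554/eLife.20320",
        "http://publisher.example/article",   # hypothetical deposited URL
        "https://publisher.example/article",  # hypothetical on-site redirect
    ]
    print(referrer_kept_from_https_page(chain))
```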


Q: Can I avoid having to redeposit all my URLs? Can’t Crossref just update the protocol on our existing DOIs for us?

A: Contact support@crossref.org. We’ll see what we can do.


Q: What about all the old PDFs that are out there? They link to DOIs using HTTP.

A: That is true. But links followed from PDFs don’t send referrer information anyway.


Q: And what about my new PDFs? Should I start linking DOIs from them using HTTPS?

A: Probably. But not because of the DOI referrer problem. Simply because HTTPS is a more secure, private, and future-proof protocol.


Q: Don’t some countries block HTTPS?

A: Typically countries block specific sites and/or services. We do not know of any countries that have a blanket block on the HTTPS protocol.


Q: I use a link resolver that uses OpenURL + a cookie pusher to redirect my users to local resources. What do I need to do?

A: You need to change your cookie pusher script to enable the Secure attribute for cookies for HTTPS-linked DOIs.   
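
What this looks like depends entirely on how your cookie pusher is written. As a rough sketch only, here is how the Secure attribute can be set using Python's standard `http.cookies` module; the function name, cookie name and value are all hypothetical placeholders.

```python
from http.cookies import SimpleCookie


def build_resolver_cookie(name: str, value: str, secure: bool) -> str:
    """Build the value of a Set-Cookie header, optionally with the
    Secure attribute so the cookie is only sent over HTTPS."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    if secure:
        # The Secure attribute tells the browser to send this cookie
        # only on HTTPS requests.
        cookie[name]["secure"] = True
    return cookie[name].OutputString()


if __name__ == "__main__":
    # "resolver" and "library-a" are made-up example values.
    print(build_resolver_cookie("resolver", "library-a", secure=True))
```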


Q: Can I use protocol-relative URLs (e.g. //doi.org/10.7554/eLife.20320)?

A: Protocol-relative URLs can be used in HTML HREFs to help ease the transition from HTTP to HTTPS, but use the full protocol in the text of the DOI link itself. So, for example, the following is fine:

<a href="//doi.org/10.7554/eLife.20320">https://doi.org/10.7554/eLife.20320</a>


Q: I hear that HTTP and HTTPS versions of URI identifiers are considered to be different identifiers. Doesn’t this mean that by moving to HTTPS we are essentially doubling the number of DOI-based identifiers out there?

A: Yes. It isn’t a problem that is only being faced by DOIs. Basically all HTTP-URI based identifiers face the same issue. We will put in place appropriate same-as assertions in our metadata and HTTP headers to allow people to understand that the HTTP and HTTPS representations of the DOI point to the same thing.

On a personal note (@gbilder speaking- don’t blame @CrossrefOrg) – it breaks my brain that the official line is that the protocol difference means they are different identifiers. As a practical matter (a concept the W3C seems to be increasingly alienated from), it would be insane for anybody to follow this policy to the letter. You can probably be pretty safe swapping the protocols on DOIs and being sure you will get the same thing.


Q: I see that the Crossref site isn’t running on HTTPS. Are you just a bunch of hypocrites?

A: Yes. The site will be moving to HTTPS-only very soon. Then we won’t be.

 

References

  1. These rules can be tweaked using meta referrer tags (https://www.w3.org/TR/referrer-policy/), but not in any way that both avoids the fundamental problems outlined here and that preserves the security/privacy characteristics that are the very reason to implement HTTPS in the first place.
  2. To be pedantic- it actually passes referrer information to the DOI proxy (https://doi.org/), which in turn is reported to Crossref.
  3. To continue with the pedantry- the DOI proxy does the redirect based on the URL member B has deposited with Crossref.

Using the Crossref REST API. Part 3 (with SHARE)

As a follow-up to our blog posts on the Crossref REST API we talked to SHARE about the work they’re doing, and how they’re employing the Crossref metadata as a piece of the puzzle.  Cynthia Hudson-Vitale from SHARE explains in more detail…


Call for participation: Membership & Fees Committee

Crossref was founded to enable collaboration between publishers.  As our membership has grown and diversified over recent years, it’s becoming even more vital that we take input from a representative cross-section of the membership. This is especially important when considering how fees and policies will affect our diverse members in different ways.

About the M&F Committee

The Membership & Fees Committee (M&F Committee) was established in 2001 and plays an important role in Crossref’s governance.  Made up of 10-12 organizations of both board members and regular members, the group makes recommendations to the board about fees and policies for all of our services. They regularly review existing fees to discuss if any changes are needed. They also review new services while they are being developed, to assess if fees should be charged and if so, what those fees should be. For example, the committee recently made recommendations to the board about the fees for a new service called Event Data that we’ll launch soon, and the deposit fees for preprints – our newest content type.  In addition, the board can also ask the committee to address specific issues about policies and services. Increasingly, the committee works with the outreach team to include research and survey insights.

About committee participation

The M&F Committee meets via one-hour conference calls about six times a year, although this can vary depending on what issues the committee is considering. Often proposals are developed by staff and then reviewed and discussed by the committee, so there is reading to do in preparation for the calls.

This is very important work and in order to ensure that the committee is broadly representative of Crossref’s diverse membership we are seeking expressions of interest from members who would like to serve on the M&F Committee for 2017. Appointments are for one year and members can serve multiple terms.

About you

In view of our commitment to be representative of the membership we are refreshing the committee and want to have engaged and interested people from a diverse set of members join.

If you are interested in joining the committee and helping Crossref fulfil its mission please email feedback@crossref.org with your name, title, organization and a short statement about why you want to serve on the committee by December 19th, 2016.      

Scott Delman, Director of Group Publishing, ACM is the current Chair of the committee and will review the expressions of interest with me, Ed Pentz, Executive Director, to form the committee.

Thanks for your interest.

A look back at LIVE16

Crossref LIVE16 opened with a Mashup Day on 1st November 2016 in London. Attendees from the scholarly communications world met to chat with Crossref team members in an open house atmosphere. The Crossref team put their latest projects on display and were met with questions, comments, and ideas from members and other metadata folks. Here’s what it looked like — you may recognize a few familiar faces. 

Crossref LIVE16 in London


LIVE16 continued with the Conference Day on 2nd November, a plenary session with invited speakers and presentations by the Crossref team. Here are the presentations, in chronological order.

Dario Taraborelli speaks on “Wikipedia’s role in the dissemination of scholarship” 

Ian Calvert speaks on “You don’t have metadata (and how to befriend a data scientist)”

Ed Pentz speaks on “Crossref’s outlook & key priorities” 

Ginny Hendricks speaks on “A vision for membership”

Geoffrey Bilder speaks on “The case of the missing leg” 

Lisa Hart Martin speaks on “The meaning of governance”

Jennifer Lin speaks on “New territories in the Scholarly Research Map”

 

Chuck Koscher speaks on “Relationships and other notable things” 

Carly Strasser speaks on “Funders and Publishers as Agents of Change” 

April Hathcock speaks on “Opening Up the Margins”

 

Your survey feedback

We’re serious about making Crossref LIVE a useful and welcoming annual event for the Crossref membership as well as members of the wider scholarly communications community. That’s why we appreciate responses from the attendees who answered our survey. Here’s what we have learned from your feedback:

Content

  • You want speakers to tell you something new, even if you don’t agree with their points of view
  • Your favorite speakers were those who inspired you
  • You prefer an unscripted presentation style that makes complex topics accessible to all
  • You’re not as interested in the mechanics of Crossref’s annual election as we are

Format

  • You enjoyed the diversity of presenters and would like even more external speakers
  • You want more opportunity to ask us technical questions on the Mashup Day  
  • You want to see panel discussions in addition to individual presentations on the Conference Day
  • Those who attended the Conference Day only wished they had also attended the Mashup Day

Atmosphere

  • You liked the casual atmosphere but wanted more seating and more dessert.  So noted!

LIVE17 will be held next November 14-15 in Asia. Until then, we hope you’ll have the chance to see us at the regional Crossref LIVE events we are planning around the world throughout the year. Our next local event is Crossref LIVE in Brazil, held 13 December in Campinas and 16 December in Sao Paulo. 

URLs and DOIs: a complicated relationship

As the linking hub for scholarly content, it’s our job to tame URLs and put in their place something better. Why? Most URLs suffer from link rot and can be created, deleted or changed at any time. And that’s a problem if you’re trying to cite them.


Preprints are go at Crossref!

As content evolves, connections persist and new links are added

We’re excited to say that we’ve finished the work on our infrastructure to allow members to register preprints. Want to know why we’re doing this? Jennifer Lin explains the rationale in detail in an earlier post, but in short we want to help make sure that:

  • links to these publications persist over time
  • they are connected to the full history of the shared research results
  • the citation record is clear and up-to-date

Doing so will help fully integrate preprint publications into the formal scholarly record.


The Organization Identifier Project: a way forward

The scholarly communications sector has built and adopted a series of open identifier and metadata infrastructure systems to great success. Content identifiers (through Crossref and DataCite) and contributor identifiers (through ORCID) have become foundational infrastructure to the industry. But there still seems to be one piece of the infrastructure that is missing. There is as yet no open, stakeholder-governed infrastructure for organization identifiers and associated metadata.

In order to understand this gap, Crossref, DataCite and ORCID have been collaborating to:

  • Explore the current landscape of organizational identifiers;
  • Collect the use-cases that would benefit our respective stakeholders in the scholarly communications industry;
  • Identify those use-cases that can be more feasibly addressed in the near term; and
  • Explore how the three organizations can collaborate (with each other and with others) to practically address this key missing piece of scholarly infrastructure.

The result of this work is in three related papers being released by Crossref, DataCite and ORCID for community review and feedback. The three papers are:

  • Organization Identifier Project: A Way Forward (PDF; GDoc)
  • Organization Identifier Provider Landscape (PDF; GDoc)
  • Technical Considerations for an Organization Identifier Registry (PDF; GDoc)

We invite the community to comment on these papers both via email (oi-project@orcid.org) and at PIDapalooza on November 9th and 10th and at Crossref LIVE16 on November 1st and 2nd. To move The OI Project forward, we will be forming a Community Working Group with the goal of holding an initial meeting before the end of 2016. The Working Group’s main charge is to develop a plan to launch and sustain an open, independent, non-profit organization identifier registry to facilitate the disambiguation of researcher affiliations.

Crossref Use Cases

Crossref has also been discussing the needs of its members over the last year and there is value in focusing on the affiliation name ambiguity problem with research outputs and contributors. In terms of the metadata that Crossref collects, something that is missing has been affiliations for the authors of publications. Over the last couple of years, Crossref has been expanding what it collects – for example, funding and licensing data and ORCID iDs – and this enables a fuller picture of what we are calling the “article nexus”. In order to continue to fill out the metadata we collect – and for our publisher members to use in their own systems and publications – we need an organization identifier.

Another use case for Crossref is identifying funders as part of collecting funder data to enable connecting funding sources with the published scholarly literature. In order to enable the reliable identification of funders in the Crossref system we created the Open Funder Registry that now has over 13,000 funders available as Open Data under a CC0 waiver. While this has been very successful, it is a very narrowly focused registry and is not suitable for a broad, community-run organization identifier registry that addresses the affiliation use case.  In future, our goal will be to merge the Open Funder Registry into the identifier registry that the Organization Identifier Working Group will work on.

By working collaboratively we can define a pragmatic and cost-effective service that will meet a fundamental need of all scholarly communication stakeholders.

Geoffrey Bilder will be focusing his talk at Crossref LIVE16 this week on this initiative, dubbed The OI Project. The talk is scheduled for 2pm UK time and will be live streamed along with the rest of that day’s program.

Smart alone; brilliant together. Community reigns at Crossref LIVE16

A bit different from our traditional meetings, Crossref LIVE16 next week is the first of a totally new annual event for the scholarly communications community.  Our theme is Smart alone; brilliant together.  We have a broad program of both informal and plenary talks across two days. There will be stations to visit, conversation starters, and entertainment, that highlight what our community can achieve if it works together.

Check out the final program.

We’re now opening the doors to all parties—our 5,000+ publisher members of all shapes and sizes—as well as the technology providers, funders, libraries, and researchers that we work with.  Our aim is to gather the ‘metadata-curious’ and have more opportunities to talk face-to-face to share ideas and information, see live demos, and get to know one another.

Mashup Day – Tuesday 1st November 12-5pm.  With an ‘open house’ vibe, we’ll have several stations where you can visit each Crossref team, a LIVE Lounge, good food, and guest areas run by our friends at DataCite, ORCID, and Turnitin.  We’ll have some special programming too: on-the-hour lightning talks, including a wild talk at 2pm from a primatologist who speaks baboon!

Conference Day – Wednesday 2nd November 9am-5pm.  There is more of a formal plenary agenda this day, with keynote speakers from across the scholarly communications landscape.  Our primary goal is to share Crossref strategy and plans, alongside thought-provoking perspectives from our guest speakers.  We’ll hear from many corners of our community including:

  • Funder program officer, Carly Strasser (Moore Foundation) on “Publishers and funders as agents of change”,
  • Data scientist, Ian Calvert (Digital Science) on “You don’t have metadata”,
  • Open knowledge advocate, Dario Taraborelli (The Wikimedia Foundation) on “Citations for the sum of all human knowledge”, and
  • Scholarly communications librarian, April Hathcock (New York University) on “Opening up the margins”.

For our part, we will set out Crossref’s “strategy and key priorities” (Ed Pentz), “A vision for membership” (me, Ginny Hendricks), “The meaning of governance” (Lisa Hart Martin), “The case of the missing leg” (Geoffrey Bilder), “New territories in the scholarly research map” (Jennifer Lin), and “Relationships and other notable things” (Chuck Koscher).

We will also set aside thirty minutes for the important Crossref annual business meeting, when we will announce the results of the membership’s vote, and welcome new board members.

I can’t wait to welcome you all.

Have you voted?

If you’re a voting member of Crossref you’ll have cast your vote already I hope! I’m so happy to see that people have voted in record numbers although it’s under 7% of our eligible members which is not high… more on member participation next week.