September 17, 2020

A deep dive on DOIs

Edifix has been linking to Crossref since before Edifix existed: Crossref linking is a core element of the eXtyles reference processing on which Edifix is based. Now we’ve extended our Crossref linking both to better handle data citations and to improve how Edifix verifies DOIs for all reference types.

Remind me how Edifix DOI linking works?

Until now, Edifix has found and added DOI links by first parsing unstyled references into component pieces, then performing a metadata lookup on Crossref using all available metadata except an author-provided DOI. For example, with a journal reference, Edifix queries Crossref based on the first author’s surname, article title, journal name, volume, first page, and year. For a conference reference, Edifix often queries Crossref using only the first author’s name, paper title, and year.
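To make the metadata-only lookup concrete, here is a minimal sketch of how parsed reference fields could be turned into a query against Crossref's public REST API (`https://api.crossref.org/works`, using its `query.bibliographic` and `query.author` parameters). This is an illustration under assumptions, not Edifix's actual query. The field names such as `first_author_surname` are hypothetical. Missing fields are simply skipped, which is how a sparser conference reference naturally produces a shorter query.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_metadata_query(ref: dict) -> str:
    """Build a Crossref /works query URL from parsed reference fields.

    Any author-supplied DOI in `ref` is deliberately ignored, mirroring a
    metadata-only lookup. Field names are hypothetical parser outputs.
    """
    # Free-text bibliographic fields; skip anything the parser did not find.
    biblio_keys = ("title", "journal", "volume", "first_page", "year")
    biblio = " ".join(str(ref[k]) for k in biblio_keys if ref.get(k))
    params = {"query.bibliographic": biblio, "rows": 1}  # only the best match
    if ref.get("first_author_surname"):
        params["query.author"] = ref["first_author_surname"]
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


if __name__ == "__main__":
    journal_ref = {
        "first_author_surname": "Smith", "title": "A study", "journal": "J Test",
        "volume": 12, "first_page": 345, "year": 2019,
        "doi": "10.1234/ignored",  # present but never sent
    }
    print(build_metadata_query(journal_ref))
```

A caller would fetch the URL with any HTTP client and read the DOI of the top-ranked item. Deciding whether that match is good enough is a separate, tool-specific step that this sketch does not attempt.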

When a reference included an author-supplied DOI, Edifix did one of three things:

  1. If the author’s DOI matched the DOI returned by the Crossref query, then Edifix simply used the DOI returned by the Crossref query.
  2. If the author’s DOI did not match the DOI returned by the Crossref query, then Edifix used the DOI returned by the Crossref query, and inserted a warning that the author’s DOI had been corrected.
  3. If Crossref returned no match, Edifix silently ignored the author’s DOI.
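The three cases above can be expressed as a small decision function. This is a minimal sketch, not Edifix code. The names `legacy_link` and `normalize_doi` are hypothetical. The normalization step is also an assumption: it strips a resolver prefix and lowercases the DOI, because DOIs are case-insensitive. The sketch makes the weakness visible, since in case 3 the author's DOI simply disappears with no warning.

```python
from typing import Optional, Tuple

# Common resolver/label prefixes that may precede a DOI in a reference.
_PREFIXES = ("https://doi.org/", "http://doi.org/",
             "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """Strip a leading resolver prefix and lowercase (DOIs are case-insensitive)."""
    doi = doi.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def legacy_link(author_doi: str,
                crossref_doi: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (doi_to_link, warning) under the older three-case logic."""
    if crossref_doi is None:
        # Case 3: no Crossref match, so the author's DOI is silently dropped.
        return None, None
    if normalize_doi(author_doi) == normalize_doi(crossref_doi):
        # Case 1: the DOIs agree; use the DOI Crossref returned.
        return crossref_doi, None
    # Case 2: the DOIs disagree; Crossref wins and a correction warning is added.
    return crossref_doi, f"Author DOI {author_doi} corrected to {crossref_doi}"
```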

This means that Edifix has always been good at finding and adding DOIs from citation metadata alone, but until now it hasn’t been as good at flagging problems with author-supplied DOIs.

Until recently, this was usually a valid approach. Authors rarely provided DOIs, and in our experience, the author-provided DOIs we did see had a 20% error rate! But authors increasingly cite online-only materials, such as journal articles published online ahead of print, preprints, and data sets, so we now see author-provided DOIs in references more and more often. It was high time to revisit our methodology.

So how does Edifix DOI linking work now?

We’ve shifted from relying exclusively on metadata queries to a multi-step process that also lets Edifix verify author-supplied DOIs. Here’s how it works:

If the author has provided a DOI, then Edifix will

  1. Query Crossref using available metadata (except the DOI), just as it always has. If Crossref returns a DOI, then Edifix follows the logic described above. If not,
  2. Edifix checks to see if the author-supplied DOI is malformed (e.g., missing the 10.XXXX prefix). If the DOI is malformed, then Edifix adds a warning comment. If the DOI syntax is valid,
  3. Edifix queries Crossref to see if the DOI is registered. If Crossref confirms that the DOI is registered, then Edifix turns the author-supplied DOI into a hyperlink. If Crossref indicates that the DOI is not registered with Crossref,
  4. Edifix queries Crossref again to discover the registration agency for the DOI. If Crossref returns a registration agency, Edifix adds a comment identifying the agency. If not,
  5. Edifix adds a warning comment to indicate that the DOI is unknown, and that the author should be queried for a corrected DOI.
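The five steps above can be sketched as a single function. This is a minimal illustration, not Edifix's implementation. The three Crossref lookups are passed in as plain callables, so the decision logic can be read and run without a network connection. The names `verify_doi`, `metadata_lookup`, `is_registered`, and `find_agency` are hypothetical, as is the wording of each comment. The syntax check uses a loose pattern (`10.`, then a 4 to 9 digit registrant code, then `/` and a non-empty suffix). That pattern is an assumption loosely based on Crossref's published guidance, not the rule Edifix applies.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Assumed loose DOI syntax: "10." + registrant code (optionally subdivided) + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")


@dataclass
class VerificationResult:
    doi: Optional[str] = None                          # DOI to hyperlink, if any
    comments: List[str] = field(default_factory=list)  # warnings/queries to insert


def verify_doi(author_doi: str,
               metadata_lookup: Callable[[], Optional[str]],
               is_registered: Callable[[str], bool],
               find_agency: Callable[[str], Optional[str]]) -> VerificationResult:
    result = VerificationResult()
    doi = author_doi.strip()

    # Step 1: metadata query (excluding the DOI); if it succeeds, apply the old logic.
    crossref_doi = metadata_lookup()
    if crossref_doi is not None:
        result.doi = crossref_doi
        if crossref_doi.lower() != doi.lower():  # DOIs are case-insensitive
            result.comments.append(f"Author DOI {doi} corrected to {crossref_doi}.")
        return result

    # Step 2: reject malformed DOIs (e.g., missing the 10.XXXX prefix).
    if not DOI_PATTERN.match(doi):
        result.comments.append(f"DOI {doi} appears to be malformed.")
        return result

    # Step 3: registered with Crossref? Then link the author's DOI.
    if is_registered(doi):
        result.doi = doi
        return result

    # Step 4: ask which registration agency (if any) knows this DOI.
    # The text says only that a comment is added, so this sketch does not link it.
    agency = find_agency(doi)
    if agency is not None:
        result.comments.append(f"DOI {doi} is registered with {agency}, not Crossref.")
        return result

    # Step 5: unknown everywhere; ask the author for a corrected DOI.
    result.comments.append(f"DOI {doi} is unknown; please query the author for a corrected DOI.")
    return result
```

Injecting the lookups keeps each step independently testable. In production, each callable would wrap an HTTP request to the relevant service.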

ℹ️ Did you know that Crossref is not the only registration agency for DOIs? For example, DataCite DOIs are registered with DataCite, not Crossref.
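One way to ask Crossref which agency registered a DOI is its REST API's `/works/{doi}/agency` route. The sketch below builds that URL and pulls the agency name out of the JSON reply. The response shape in the comment is an assumption based on Crossref's public API and should be checked against its documentation. The function names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import quote

CROSSREF_WORKS = "https://api.crossref.org/works"


def agency_url(doi: str) -> str:
    """URL for Crossref's registration-agency lookup for one DOI."""
    # Percent-encode unusual suffix characters; keep '/' literal as in the DOI itself.
    return f"{CROSSREF_WORKS}/{quote(doi, safe='/')}/agency"


def parse_agency(body: str) -> Optional[str]:
    """Extract the agency name from an /agency JSON response.

    Assumed shape: {"message": {"DOI": ..., "agency": {"id": "datacite", "label": "DataCite"}}}
    Returns None when no agency is present.
    """
    data = json.loads(body)
    agency = data.get("message", {}).get("agency", {})
    return agency.get("label") or agency.get("id")
```

A caller would fetch `agency_url(doi)` with any HTTP client. It would pass a successful response body to `parse_agency`, and treat any unsuccessful response as "no agency found".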

This means that Edifix now verifies every single DOI in a reference list, for every type of reference!