
What academic reference management tools provide successful automatic metadata extraction from a collection of PDF files?

  • It seems like it shouldn't be hard to detect the title of each file and look up the metadata online automatically, but I haven't yet encountered a tool that does this robustly for computer science papers. This is a follow-up question to .

  • Answer:

    I have been using Mendeley Desktop for biology PDFs, and it generally works quite well. If I think its automatic metadata extraction is wrong and I know the paper's PubMed ID (PMID), Mendeley can use it to correct the record. It also supports lookup through a few other databases. I have not tried it with computer science papers, but it might do the trick for you.

Joydeep Banerjee at Quora
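The approach the question describes (take a paper's title and look the rest up online) can be sketched against the public Crossref REST API. This is a minimal sketch, not how Mendeley or any other tool actually works: it assumes you already have the title, queries Crossref's `/works` endpoint with the `query.bibliographic` parameter, and accepts the top hit only if its title closely matches the one you searched for. The helper names and the 0.9 similarity threshold are illustrative assumptions.

```python
import json
from difflib import SequenceMatcher
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works"

def title_query_url(title: str, rows: int = 1) -> str:
    """Build a Crossref search URL that matches free-text bibliographic data such as a title."""
    query = urlencode({"query.bibliographic": title, "rows": rows})
    return f"{CROSSREF_WORKS}?{query}"

def titles_agree(a: str, b: str, threshold: float = 0.9) -> bool:
    """Case- and whitespace-insensitive similarity check (the threshold is an assumption)."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio() >= threshold

def best_match(response: dict, wanted_title: str) -> Optional[dict]:
    """Return the top Crossref hit only if its title resembles the one we searched for."""
    items = response.get("message", {}).get("items", [])
    if not items:
        return None
    top = items[0]
    found_title = (top.get("title") or [""])[0]
    if not titles_agree(wanted_title, found_title):
        return None  # Probably a different paper: leave it for manual review.
    return {
        "title": found_title,
        "doi": top.get("DOI"),
        "container": (top.get("container-title") or [""])[0],
    }

def lookup_title(title: str) -> Optional[dict]:
    """Network call: ask Crossref for one candidate and validate it before trusting it."""
    with urlopen(title_query_url(title), timeout=10) as resp:
        return best_match(json.load(resp), title)
```

A `None` result from `lookup_title` would be treated as "needs manual entry" rather than silently accepting a poor match, which is where title-only lookups tend to go wrong.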


Other answers

PDF files generally do not contain the metadata needed to produce citations in the format a citation style requires. Apart from the title, it is very difficult to extract the correct information from the text of a PDF file, and the title alone may not be enough to find the rest of the information in machine-readable form in a bibliographic database. Citavi (https://www.citavi.com) extracts any metadata embedded in a PDF file, and if the file contains a DOI somewhere on its first pages, it uses the DOI to look up the reference in external databases such as CrossRef or PubMed.

Journal landing pages, on the other hand, very often do contain the bibliographic information in machine-readable form, either as COinS or as HighWire Press tags. So you should use a citation manager that can extract this information from the journal's landing page, and then attach the paper's PDF file to the reference.

Please note that information extracted from a PDF or imported from a landing page may be incomplete or incorrect, so make sure to double-check everything you did not type yourself (just as you would check what you did type). Citavi lets you create tasks such as "Verify bibliographic information" and tag references as "verified".

Patrick Hilt
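The two routes this answer describes, finding a DOI on the first pages of a PDF and reading machine-readable tags from a journal landing page, can be sketched with the Python standard library alone. This is an illustrative sketch, not Citavi's implementation: it assumes the text of the first pages has already been extracted from the PDF by some other tool, uses a DOI pattern modeled on one Crossref has suggested (it can miss older or unusual DOIs, and may capture trailing punctuation such as a closing parenthesis), and handles only HighWire Press `citation_*` meta tags, not COinS. The function and class names are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import quote

# Modeled on a regex Crossref has suggested for modern DOIs; not exhaustive.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

def first_doi(page_texts: List[str], max_pages: int = 2) -> Optional[str]:
    """Return the first DOI-like string found on the first few pages, or None."""
    for text in page_texts[:max_pages]:
        match = DOI_RE.search(text)
        if match:
            # Strip sentence punctuation that the character class also accepts.
            return match.group(0).rstrip(".,;")
    return None

def doi_lookup_url(doi: str) -> str:
    """Crossref REST API URL for a single work, addressed by its DOI."""
    return "https://api.crossref.org/works/" + quote(doi)

class HighwireTagParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> tags from a landing page."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: Dict[str, List[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name.startswith("citation_") and content:
            # Tags such as citation_author repeat once per author, so keep lists.
            self.tags.setdefault(name, []).append(content)

def highwire_tags(html: str) -> Dict[str, List[str]]:
    """Parse landing-page HTML and return its HighWire Press citation tags."""
    parser = HighwireTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags
```

A DOI found with `first_doi` could be passed to `doi_lookup_url`, and the landing-page tags could fill in fields the PDF lacks; either way, the result still needs the manual check the answer recommends.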

Zotero can do it if the PDF has metadata embedded in it. http://www.zotero.org/support/retrieve_pdf_metadata

Stephen Francoeur

Mendeley Desktop is the most hassle-free way to extract metadata from PDFs. Zotero can do the same, but Mendeley offers 2 GB of free storage, while Zotero offers only 100 MB. Also check out Colwiz Desktop; it's an excellent tool.

Karthik Bala

Both RefWorks and Zotero support limited collection of metadata from various sources, including PDFs. However, some people find this functionality frustrating because it depends heavily on the creator of the original document formatting the data correctly and entering it in the appropriate fields. No bibliographic tool can reliably map the raw information about a document to the right fields on the fly.

Oliver Starr
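This answer's point about depending on the document's creator can be made concrete: when authoring software fills in a PDF's embedded title automatically, the value can be a placeholder such as a file name rather than the paper's real title. A tool could screen embedded titles with a heuristic like the one below before trusting them, and fall back to a DOI or title search otherwise. This is a sketch only; the patterns and the 10-character minimum are illustrative assumptions, not rules that RefWorks, Zotero, or any other tool is known to apply.

```python
import re
from typing import Optional

# Illustrative (assumed) patterns for titles filled in by authoring software.
JUNK_TITLE_PATTERNS = [
    re.compile(r"^microsoft (word|powerpoint) - "),
    re.compile(r"\.(docx?|tex|dvi|pdf)$"),
    re.compile(r"^untitled"),
]

def usable_embedded_title(title: Optional[str], min_length: int = 10) -> bool:
    """Heuristic: reject empty, very short, or apparently auto-generated embedded titles."""
    if not title:
        return False
    cleaned = " ".join(title.split()).lower()
    if len(cleaned) < min_length:
        return False
    return not any(pattern.search(cleaned) for pattern in JUNK_TITLE_PATTERNS)
```

A title that fails this check is not necessarily wrong, but it is a signal that the embedded metadata should not be the only source used.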
