Monday, April 13, 2009

bibliography conversion utilities

source http://www.scripps.edu/~cdputnam/software/bibutils/bibutils2.html

description

The bibutils program set converts between various bibliography formats using a common MODS-format XML intermediate. For example, one can convert RIS-format files to BibTeX by doing two transformations: RIS->MODS->BibTeX. By using a common intermediate for N formats, only 2N programs are required (one reader and one writer per format) rather than the N²-N needed to convert directly between every pair of formats. These programs operate on the command line and are styled after standard UNIX-like filters.
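The saving from a shared intermediate is easy to check with a little shell arithmetic. This is only an illustrative sketch; the function name converters_needed is made up for this example and is not part of bibutils.

```shell
# Compare the number of converter programs needed for N formats:
#   via a common intermediate:  2N       (one reader and one writer per format)
#   direct pairwise conversion: N^2 - N  (one program per ordered pair of formats)
# converters_needed is a hypothetical helper, not a bibutils command.
converters_needed() {
    n=$1
    echo "formats=$n hub=$((2 * n)) pairwise=$((n * n - n))"
}

converters_needed 5
```

For five formats this prints `formats=5 hub=10 pairwise=20`, and the gap widens quickly as more formats are added.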

I primarily use these tools at the command line, but they are suitable for scripting and have been incorporated into a number of different bibliographic projects.
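As a rough sketch of what scripting the tools might look like, the RIS->MODS->BibTeX conversion can be wrapped in a small shell function. The wrapper name ris_to_bib is hypothetical, and the sketch assumes that both tools are on your PATH and that xml2bib reads MODS from standard input when no file is given (as bib2xml is documented to do).

```shell
# Convert a RIS file to BibTeX through the MODS intermediate.
# ris_to_bib is a hypothetical wrapper; ris2xml and xml2bib are the bibutils tools.
# Assumption: xml2bib reads from standard input when no file name is given.
ris_to_bib() {
    infile=$1
    outfile=$2
    # Fail early with a clear message if either tool is missing.
    for tool in ris2xml xml2bib; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "ris_to_bib: $tool not found on PATH" >&2
            return 127
        fi
    done
    # First transformation (RIS to MODS) piped into the second (MODS to BibTeX).
    ris2xml "$infile" | xml2bib > "$outfile"
}
```

It would be called as `ris_to_bib refs.ris refs.bib`, where both file names are placeholders. If the pipe does not behave as assumed on your system, writing the MODS output to a temporary file and passing that file to xml2bib does the same job.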



MODS

The XML intermediate is the Library of Congress's Metadata Object Description Schema (MODS) version 3.1. This is a very flexible standard that should prove quite useful as the number of tools that directly interact with it increases. For other programmers working on tools that handle MODS, I've written a quick introduction.



program tools
  • bib2xml - convert BibTeX to MODS XML intermediate
  • biblatex2xml - convert BibLaTeX to MODS XML intermediate
  • copac2xml - convert COPAC format references to MODS XML intermediate
  • end2xml - convert EndNote (Refer format) to MODS XML intermediate
  • endx2xml - convert EndNote XML to MODS XML intermediate
  • isi2xml - convert ISI web of science to MODS XML intermediate
  • med2xml - convert Pubmed XML references to MODS XML intermediate
  • modsclean - a MODS to MODS converter, mostly for testing purposes
  • ris2xml - convert RIS format to MODS XML intermediate
  • xml2ads - convert MODS XML intermediate into Smithsonian Astrophysical Observatory (SAO)/National Aeronautics and Space Administration (NASA) Astrophysics Data System (ADS) reference format (converter submitted by Richard Mathar)
  • xml2bib - convert MODS XML intermediate into BibTeX
  • xml2end - convert MODS XML intermediate into format for EndNote
  • xml2isi - convert MODS XML intermediate to ISI format
  • xml2ris - convert MODS XML intermediate into RIS format
  • xml2wordbib - convert MODS XML intermediate into Word 2007 bibliography format


  • downloads
  • bibutils_4.1_i386.tgz - x86 Linux binaries
  • bibutils_4.1_win.zip - Windows binaries
  • bibutils_4.1_cygwin.zip - Cygwin binaries (compiled on Windows XP)
  • bibutils_4.0_osx.tgz - MacOSX binaries
  • bibutils_4.1_src.tgz - C source code

    The main difference between Bibutils version 4 and Bibutils version 3 is the library API. Documentation for the library API is available here.

    The older Bibutils version 2 generates a non-standard XML intermediate that isn't as useful as MODS. I'm still keeping it available; however, I encourage all users to migrate to the latest version 3 release.

    Starting with the version 3.15 release, the programs have been reorganized into a library that can be plugged into other projects. Documentation for the library API will be available shortly.



  • MODS flags

    Several flags are available for the end2xml, endx2xml, bib2xml, ris2xml, med2xml, and copac2xml programs:

    -h,  --help                display help
    -v,  --version             display version
    -a,  --add-refcount        add "_#", where # is the reference count, to the reference id
    -s,  --single-refperfile   put one reference per file, named by the reference number
    -i,  --input-encoding      interpret the input file as using the requested character set
                               (use without an argument for the current list, derived from
                               character sets at www.kostis.net); unicode is now a
                               character set option
    -u,  --unicode-characters  encode unicode characters directly in the file rather than
                               as XML entities (default)
    -un, --unicode-no-bom      as -u, but don't include a byte order mark
    -x,  --xml-entities        change default UTF8-encoded characters to XML entities
                               (opposite of -u)
    -nl, --no-latex            do not convert latex character combinations (bib2xml)
    -d,  --drop-key            don't put the citation key in the mods id field
    -c,  --corporation-file    with an argument specifying a file containing a list of
                               corporation names to be placed in name elements with
                               type="corporate" instead of type="personal", which
                               eliminates name mangling
         --verbose             verbose output
         --debug               very verbose output (mostly for debugging)


    bib2xml

    bib2xml converts a BibTeX-formatted reference file to an XML-intermediate bibliography file. Specify the file(s) to be converted on the command line. Files that define BibTeX substitution strings should be specified before the files where those substitutions are used (or the definitions should appear in the same file before their use). If no files are specified, then BibTeX information will be read from standard input.

    bib2xml BibTeX_file.bib > output_file.xml
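Because each call writes a single MODS file to standard output, converting a whole directory is a matter of looping over the inputs. The following is a minimal sketch, assuming bib2xml is on your PATH; the helper name bib_dir_to_mods is invented for this example.

```shell
# Convert every .bib file in a directory to MODS XML, one output file per input.
# bib_dir_to_mods is a hypothetical helper; bib2xml is assumed to be on PATH.
bib_dir_to_mods() {
    dir=${1:-.}
    for bib in "$dir"/*.bib; do
        [ -e "$bib" ] || continue      # skip the unexpanded pattern when no .bib files exist
        xml=${bib%.bib}.xml            # refs.bib becomes refs.xml
        if bib2xml "$bib" > "$xml"; then
            echo "converted $bib -> $xml"
        else
            echo "failed: $bib" >&2
        fi
    done
}
```

Calling `bib_dir_to_mods .` converts the files in the current directory. Note that each file is converted on its own here, so substitution strings defined in one file are not visible to the others.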


    copac2xml

    copac2xml converts a COPAC-formatted reference file to a MODS XML-intermediate bibliography file.

    end2xml

    end2xml converts a text EndNote-formatted reference file to an XML-intermediate bibliography file. This program will not work on the binary library; the file needs to be exported first.

    The EndNote tagged format (the "Refer" format export) looks like:

    %0 Journal Article
    %A C. D. Putnam
    %A C. S. Pikaard
    %D 1992
    %T Cooperative binding of the Xenopus RNA polymerase I
    transcription factor xUBF to repetitive ribosomal gene enhancers
    %J Mol Cell Biol
    %V 12
    %P 4970-4980
    %F Putnam1992

    There are very nice instructions for making sure that you are properly exporting this at http://www.sonnysoftware.com/endnoteimport.html

    Usage for end2xml is the same as bib2xml.

    end2xml endnote_file.end > output_file.xml


    endx2xml

    endx2xml converts an EndNote XML-exported reference file to a MODS XML-intermediate bibliography file. This program will not work on the binary library; the file needs to be exported first.

    isi2xml

    isi2xml converts an ISI-web-of-science-formatted reference file to a MODS XML-intermediate bibliography file.

    Usage for isi2xml is the same as bib2xml.

    isi2xml input_file.isi > output_file.xml


    med2xml

    med2xml converts a MEDLINE XML-formatted reference file to a MODS XML-intermediate bibliography file.

    To download references from PubMed, choose the "Display" option "XML" and then select "Send To" "File". This file is in the correct format for med2xml to read.

    ris2xml

    ris2xml converts a RIS-formatted reference file to an XML-intermediate bibliography file. Usage for ris2xml is the same as for end2xml and bib2xml.

    ris2xml ris_file.ris > output_file.xml


    xml2bib

    xml2bib converts the MODS XML bibliography into a BibTeX-formatted reference file. Usage for xml2bib is the same as for the other tools.

    xml2bib xml_file.xml > output_file.bib

    Starting with 3.24, xml2bib output uses lowercase tags and mixed case reference types for better interaction with Emacs. The older behavior with all uppercase tags/reference types can still be generated using the command-line switch -U/--uppercase.

    Command line options:

    • -v, --version ; report version information
    • -h, --help ; report help
    • -fc, --finalcomma ; add final comma in the BibTeX output for those that want it
    • -sd, --singledash ; use one dash instead of two (longer dash in latex) between numbers in page output
    • -b, --brackets ; use brackets instead of quotation marks around field data
    • -w, --whitespace ; add beautifying whitespace to output
    • -s, --single-refperfile ; put one reference per file, named by the reference number
    • -o, --output-encoding ; write the output file using the requested character set (use without an argument for the current list, derived from character sets at www.kostis.net); unicode is now a character set option
    • -U, --uppercase ; use all uppercase for tags (field names) and reference types (pre-3.24 behavior)
    • -sk, --strictkey ; ensure only alphanumeric characters are used in BibTeX reference keys
    • -nl, --no-latex ; do not convert characters that can be converted to latex entities into latex entities
    • -nb, --no-bom ; do not write Byte Order Mark if writing UTF-8

    Example output for each formatting option:

    Default output:

    @Article{Putnam1992,
    author="C. D. Putnam
    and C. S. Pikaard",
    year="1992",
    month="Nov",
    title="Cooperative binding of the
    Xenopus RNA polymerase I transcription
    factor xUBF to repetitive ribosomal
    gene enhancers",
    journal="Mol Cell Biol",
    volume="12",
    pages="4970--4980",
    number="11"}

    Final comma (-fc):

    @Article{Putnam1992,
    author="C. D. Putnam
    and C. S. Pikaard",
    year="1992",
    month="Nov",
    title="Cooperative binding of the
    Xenopus RNA polymerase I transcription
    factor xUBF to repetitive ribosomal
    gene enhancers",
    journal="Mol Cell Biol",
    volume="12",
    pages="4970--4980",
    number="11",}

    Single dash (-sd):

    @Article{Putnam1992,
    author="C. D. Putnam
    and C. S. Pikaard",
    year="1992",
    month="Nov",
    title="Cooperative binding of the
    Xenopus RNA polymerase I transcription
    factor xUBF to repetitive ribosomal
    gene enhancers",
    journal="Mol Cell Biol",
    volume="12",
    pages="4970-4980",
    number="11"}

    Whitespace (-w):

    @Article{Putnam1992,
    author = "C. D. Putnam
    and C. S. Pikaard",
    year = "1992",
    month = "Jan",
    title = "Cooperative binding of
    the Xenopus RNA polymerase I transcription
    factor xUBF to repetitive ribosomal gene
    enhancers",
    journal = "Mol Cell Biol",
    volume = "12",
    pages = "4970--4980"
    }

    Brackets (-b):

    @Article{Putnam1992,
    author={Putnam, C. D.
    and Pikaard, C. S.},
    title={Cooperative binding of the Xenopus
    RNA polymerase I transcription factor xUBF
    to repetitive ribosomal gene enhancers},
    journal={Mol Cell Biol},
    year={1992},
    month={Nov},
    volume={12},
    number={11},
    pages={4970--4980}
    }

    Uppercase (-U):

    @ARTICLE{Putnam1992,
    AUTHOR="Putnam, C. D.
    and Pikaard, C. S.",
    TITLE="Cooperative binding of the Xenopus
    RNA polymerase I transcription factor xUBF
    to repetitive ribosomal gene enhancers",
    JOURNAL="Mol Cell Biol",
    YEAR="1992",
    MONTH="Nov",
    VOLUME="12",
    NUMBER="11",
    PAGES="4970--4980"
    }
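The formatting switches can presumably be combined on one command line. The sketch below uses only the -b and -w switches from the option list and assumes xml2bib is on your PATH; the wrapper name mods_to_bib_pretty is hypothetical and the file names in the usage line are placeholders.

```shell
# Write BibTeX with brackets around field data (-b) and beautifying whitespace (-w).
# mods_to_bib_pretty is a hypothetical wrapper around the bibutils xml2bib program.
# Assumption: the two switches can be given together, before the input file name.
mods_to_bib_pretty() {
    xml2bib -b -w "$1" > "$2"
}
```

It would be called as `mods_to_bib_pretty refs.xml refs.bib`.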



    xml2ris

    xml2ris converts the MODS XML bibliography to an RIS-formatted bibliography file. Usage for xml2ris is the same as for the other tools.

    xml2ris xml_file.xml > output_file.ris


    Command line options:

    • -v, --version ; report version information
    • -h, --help ; report help
    • -s, --single-refperfile ; put one reference per file, named by the reference number
    • -o, --output-encoding ; write the output file using the requested character set (use without an argument for the current list, derived from character sets at www.kostis.net); unicode is now a character set option
    • -nb, --no-bom ; do not write Byte Order Mark if writing UTF-8

    xml2end

    xml2end converts the MODS XML bibliography to a tagged EndNote (Refer-format) bibliography file. Usage for xml2end is the same as for the other tools.

    xml2end xml_file.xml > output_file.end


    Command line options:

    • -v, --version ; report version information
    • -h, --help ; report help
    • -s, --single-refperfile ; put one reference per file, named by the reference number
    • -o, --output-encoding ; write the output file using the requested character set (use without an argument for the current list, derived from character sets at www.kostis.net); unicode is now a character set option
    • -nb, --no-bom ; do not write Byte Order Mark if writing UTF-8

    xml2wordbib

    xml2wordbib converts the MODS XML bibliography to a Word 2007-formatted XML bibliography file. Usage for xml2wordbib is the same as for the other tools.

    xml2wordbib xml_file.xml > output_file.word.xml


    Command line options:

    • -v, --version ; report version information
    • -h, --help ; report help
    • -s, --single-refperfile ; put one reference per file, named by the reference number
    • -o, --output-encoding ; write the output file using the requested character set (use without an argument for the current list, derived from character sets at www.kostis.net); unicode is now a character set option
    • -nb, --no-bom ; do not write Byte Order Mark if writing UTF-8

    faq

    How do I download the files?

    Files can be saved by right-clicking on the link. This will pull up a context-sensitive menu, from which you should choose "Save Link As..." (or whatever the appropriate item is for your web browser). Simply clicking on the links frequently loads the binary into the browser window. Not terribly useful.

    Downloads on this page are going to be archives of all of the executables (as zipped or tarred/gzipped files depending on the architecture).


    The programs don't run for me. What am I doing wrong?

    If you send me this question, I would immediately have to ask for more information. The following items address specific problems.

    • "command not found" The message "command not found" on Linux/UNIX/MacOSX systems indicates that the commands cannot be found. This could mean that the programs are not flagged as being executable (run "chmod ugo+x xml2bib" for the appropriate binaries) or the executables are not in your current path (and have to be specified directly like ./xml2bib). A quick web search on chmod or path variables should provide many detailed resources.

    • I'm running MacOSX and I just get a terminal when I double-click on the programs. Simply put, this is not the way to run the programs. You want to run the terminal first and then issue the commands at the command line. It should be under Applications->Utilities->Terminal on most MacOSX systems I've seen. If you just double-click the program, the terminal corresponds to the input to the tool. Not so useful.

      Some links to get you started running the terminal in a standard UNIX-like fashion are at TerminalBasics.pdf [homepage.mac.com], [macdevcenter.com], and [ee.surrey.ac.uk].

      I'm happy to help with specific questions, but the more knowledgeable you are the easier it will be to help (and I frankly don't have the time to help everyone learn basic UNIX).
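For the "command not found" case described above, both fixes (marking the programs executable and adding their directory to the path) can be sketched in a few lines of shell. The helper name setup_bibutils is invented for this example, and the directory is whatever location you unpacked the archive into.

```shell
# Make the unpacked bibutils programs executable and put their directory on PATH
# for the current shell session only.
# setup_bibutils is a hypothetical helper; pass the absolute path of the unpacked directory.
setup_bibutils() {
    dir=$1
    # The converters are named like bib2xml or xml2bib, plus modsclean.
    for prog in "$dir"/*2xml "$dir"/xml2* "$dir"/modsclean; do
        [ -f "$prog" ] && chmod ugo+x "$prog"
    done
    PATH="$dir:$PATH"
    export PATH
}
```

After running something like `setup_bibutils "$HOME/bibutils"` (the path is a placeholder), `xml2bib` can be typed without the leading `./`. To make the change permanent, the PATH line would need to go into your shell's startup file.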

    I am very interested in bug reports and problems in conversions. Feel free to e-mail me if you run into these issues. The absolute best bug reports provide error messages from the operating systems and/or input and outputs that detail the problems. Please remember that I'm not looking over your shoulder and I cannot read your mind to figure out what you are doing--"It doesn't work." isn't a bug report I can help you with.


    You have a MacOSX version, can you give me a MacOS9 version?

    Sorry. I'd like to, but these programs assume a command-line interface with normal standard in, standard out, and standard error along with command-line arguments. MacOSX is a fundamental change in the operating system with a BSD (UNIX-like) core that I'm taking advantage of to provide a MacOSX binary. On the other hand, I don't know that much about MacOS9, and if it is possible to generate a useful binary from these sources let me know.


    This stuff is great, how can I help?

    OK, I actually don't get this question so often, though I have gotten very useful help through people who have willingly sent useful bug reports and sample problematic data to allow me to test these programs. I willingly accept bug reports, patches, new filters, suggestions on program improvements or better documentation and the like. All I can say is that users (or programmers) who contact me with these sorts of things are far more likely to get their itches scratched.



    license

    All versions of bibutils are released under the GNU General Public License (GPL). In a nutshell, feel free to download, run, and modify these programs as required. If you re-release these, you need to release the modified version of the source. (And I'd appreciate patches as well...if you care enough to make the change, then I'd like to see what you're adding or fixing.)