antoineallard/clean_bibliography

Clean bibliography

This module provides a set of functions to clean and check bib files for use in publications.

Examples

The following scripts illustrate how to use the module:

  • clean_bibfile.py: Removes superfluous fields (which are not included in fields_to_keep.json) from a specified bib file and abbreviates the journal names, if applicable (see abbreviations.txt).

  • extract_entries_with_given_keyword.py: Extracts the entries with a specific tag from an original bib file and saves the cleaned entries in another bib file. When applicable, journal names are abbreviated (see abbreviations.txt).

  • build_pdf_filenames.py: Writes the pdf filename for every article or book in a bib file to a text file. The naming convention is:

    • Article: {abbreviated journal name}.{year}.{volume}.{first page}.{first author last name}.{title}.pdf
    • Book: {first author last name}.{year}.{title}.{edition, if specified}.pdf
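
The naming convention above can be sketched as follows. This is only an illustration, not the code in build_pdf_filenames.py: the helper name build_pdf_filename, the dictionary layout of an entry (with an "ENTRYTYPE" key), the "Last, First" author format, and the sample data are all assumptions.

```python
def build_pdf_filename(entry, abbreviations=None):
    """Hypothetical helper: build a pdf filename from a bib entry given as a dict."""
    abbreviations = abbreviations or {}
    # Assumption: authors are stored as "Last, First and Last, First".
    first_author = entry["author"].split(" and ")[0]
    last_name = first_author.split(",")[0].strip()

    if entry["ENTRYTYPE"] == "article":
        # Fall back to the full journal name when no abbreviation is known.
        journal = abbreviations.get(entry["journal"], entry["journal"])
        # Assumption: pages look like "first--last" or "first-last".
        first_page = entry["pages"].split("-")[0].strip()
        parts = [journal, entry["year"], entry["volume"],
                 first_page, last_name, entry["title"]]
    elif entry["ENTRYTYPE"] == "book":
        parts = [last_name, entry["year"], entry["title"]]
        # The edition is appended only if the entry specifies one.
        if "edition" in entry:
            parts.append(entry["edition"])
    else:
        raise ValueError("only articles and books are handled in this sketch")

    return ".".join(parts) + ".pdf"


# Made-up sample entries, for illustration only.
article = {
    "ENTRYTYPE": "article",
    "author": "Doe, Jane and Roe, Richard",
    "journal": "Physical Review E",
    "year": "2020",
    "volume": "101",
    "pages": "012345--012350",
    "title": "An example title",
}
book = {
    "ENTRYTYPE": "book",
    "author": "Doe, Jane",
    "year": "2018",
    "title": "Example Book",
    "edition": "2",
}

print(build_pdf_filename(article, {"Physical Review E": "Phys. Rev. E"}))
print(build_pdf_filename(book))
```

How the actual script treats spaces, punctuation, or LaTeX markup in titles is not specified here, so the sketch leaves the field values untouched.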

Command-line tool

The module's functionality can be accessed through a command-line interface provided by bibclean.py.

# Cleans the entries in original.bib and writes them in cleaned.bib
python bibclean.py original.bib -o cleaned.bib

# Cleans the entries in original.bib with tag1 and/or tag2 as keywords and writes them in cleaned.bib
python bibclean.py original.bib -t tag1 tag2 -o cleaned.bib

Further details can be found by executing

python bibclean.py --help
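
The tag selection used with the -t option ("tag1 and/or tag2") can be pictured as keeping every entry that carries at least one of the requested tags. The sketch below is not the tool's implementation: the function name has_any_tag, the dictionary layout of an entry, and the comma-separated "keywords" field are assumptions made for illustration.

```python
def has_any_tag(entry, tags):
    """Hypothetical helper: True if the entry carries at least one requested tag."""
    # Assumption: tags live in a comma-separated "keywords" field.
    keywords = {k.strip() for k in entry.get("keywords", "").split(",")}
    return bool(keywords & set(tags))


# Made-up entries, for illustration only.
entries = [
    {"ID": "doe2020", "keywords": "tag1, networks"},
    {"ID": "roe2019", "keywords": "geometry"},
    {"ID": "poe2021"},  # no keywords field at all
]

selected = [e["ID"] for e in entries if has_any_tag(e, ["tag1", "tag2"])]
print(selected)
```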

Customization

The fields to keep are specified in fields_to_keep.json. Note that all fields are kept for entry types not specified in fields_to_keep.json.
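
A minimal sketch of this rule, assuming that fields_to_keep.json maps each entry type to a list of field names (the actual layout of the file may differ) and that an entry is a dictionary with "ENTRYTYPE" and "ID" keys. The helper name filter_fields is hypothetical.

```python
def filter_fields(entry, fields_to_keep):
    """Hypothetical helper: drop fields not listed for the entry's type."""
    allowed = fields_to_keep.get(entry["ENTRYTYPE"])
    if allowed is None:
        # Entry type not specified: all fields are kept.
        return dict(entry)
    # Assumption: the type and citation key are always preserved.
    structural = {"ENTRYTYPE", "ID"}
    return {k: v for k, v in entry.items() if k in allowed or k in structural}


# Assumed structure of fields_to_keep.json, for illustration only.
fields_to_keep = {"article": ["author", "title", "journal", "year"]}

article = {"ENTRYTYPE": "article", "ID": "doe2020", "author": "Doe, Jane",
           "title": "An example title", "journal": "Physical Review E",
           "year": "2020", "abstract": "Not needed in the final bib file."}
thesis = {"ENTRYTYPE": "phdthesis", "ID": "roe2019", "school": "Example University"}

print(filter_fields(article, fields_to_keep))
print(filter_fields(thesis, fields_to_keep))
```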

The fields that must minimally be present in every entry are specified in minimal_fields.json.
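
Such a check could look like the sketch below. It assumes that minimal_fields.json maps each entry type to a list of required field names, which may not match the actual file; the helper name missing_fields and the sample entry are hypothetical.

```python
def missing_fields(entry, minimal_fields):
    """Hypothetical helper: list required fields that are absent or empty."""
    required = minimal_fields.get(entry["ENTRYTYPE"], [])
    return [field for field in required if not entry.get(field)]


# Assumed structure of minimal_fields.json, for illustration only.
minimal_fields = {"article": ["author", "title", "journal", "year"]}

incomplete = {"ENTRYTYPE": "article", "ID": "doe2020",
              "author": "Doe, Jane", "title": "An example title"}

print(missing_fields(incomplete, minimal_fields))
```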

The journal abbreviations are specified in abbreviations.txt. Note that journals for which no abbreviation is provided will trigger a warning message and the original journal name will be kept in the new bib file. The script cleanup_abbreviations.py sorts and cleans abbreviations.txt and should be used whenever a new abbreviation is added.
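
The fallback behaviour described above (warn and keep the original name) can be sketched as follows. The abbreviations are assumed to be already loaded into a dictionary, since the format of abbreviations.txt is not described here; the helper name abbreviate_journal is hypothetical.

```python
import warnings


def abbreviate_journal(name, abbreviations):
    """Hypothetical helper: abbreviate a journal name, or warn and keep it."""
    try:
        return abbreviations[name]
    except KeyError:
        # No abbreviation provided: warn and keep the original name.
        warnings.warn(f"No abbreviation found for journal: {name}")
        return name


# Assumed in-memory form of the abbreviations, for illustration only.
abbreviations = {"Physical Review E": "Phys. Rev. E"}

print(abbreviate_journal("Physical Review E", abbreviations))
```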

About

Code to clean a bib file and extract entries from it based on keywords.
