A Python client for the Softcite software mention recognizer server

Overview

Softcite software mention recognizer client

Python client for using the Softcite software mention recognition service. It can be applied to

  • individual PDF files

  • recursively to a local directory, processing all the encountered PDF

  • to a collection of documents harvested by biblio-glutton-harvester and article-dataset-builder, with the benefit of re-using the collection manifest for injectng metadata and keeping track of progress. The collection can be stored locally or on a S3 storage.

Requirements

The client has been tested with Python 3.5-3.7.

The client requires a working Softcite software mention recognition service. Service host and port can be changed in the config.json file of the client.

Install

cd software_mention_client/

It is advised to setup first a virtual environment to avoid falling into one of these gloomy python dependency marshlands:

virtualenv --system-site-packages -p python3 env

source env/bin/activate

Install the dependencies, use:

pip3 install -r requirements.txt

Usage and options

usage: software_mention_client.py [-h] [--repo-in REPO_IN] [--file-in FILE_IN]
                                  [--file-out FILE_OUT]
                                  [--data-path DATA_PATH] [--config CONFIG]
                                  [--reprocess] [--reset] [--load]
                                  [--diagnostic] [--scorched-earth]

Softcite software mention recognizer client

optional arguments:
  -h, --help            show this help message and exit
  --repo-in REPO_IN     path to a directory of PDF files to be processed by
                        the Softcite software mention recognizer
  --file-in FILE_IN     a single PDF input file to be processed by the
                        Softcite software mention recognizer
  --file-out FILE_OUT   path to a single output the software mentions in JSON
                        format, extracted from the PDF file-in
  --data-path DATA_PATH
                        path to the resource files created/harvested by
                        biblio-glutton-harvester
  --config CONFIG       path to the config file, default is ./config.json
  --reprocess           reprocessed failed PDF
  --reset               ignore previous processing states and re-init the
                        annotation process from the beginning
  --load                load json files into the MongoDB instance, the --repo-
                        in parameter must indicate the path to the directory
                        of resulting json files to be loaded
  --diagnostic          perform a full count of annotations and diagnostic
                        using MongoDB regarding the harvesting and
                        transformation process
  --scorched-earth      remove a PDF file after its successful processing in
                        order to save storage space, careful with this!

The logs are written by default in a file ./client.log, but the location of the logs can be changed in the configuration file (default ./config.json).

Processing local PDF files

For processing a single file., the resulting json being written as file at the indicated output path:

python3 software_mention_client.py --file-in toto.pdf --file-out toto.json

For processing recursively a directory of PDF files, the results will be:

  • written to a mongodb server and database indicated in the config file

  • and in the directory of PDF files, as json files, together with each processed PDF

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/

The default config file is ./config.json, but could also be specified via the parameter --config:

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/ --config ./my_config.json

Processing a collection of PDF harvested by biblio-glutton-harvester

biblio-glutton-harvester and article-dataset-builder creates a collection manifest as a LMDB database to keep track of the harvesting of large collection of files. Storage of the resource can be located on a local file system or on a AWS S3 storage. The software-mention client will use the collection manifest to process these harvested documents.

  • locally:

python3 software_mention_client.py --data-path /mnt/data/biblio-glutton-harvester/data/

--data-path indicates the path to the repository of data harvested by biblio-glutton-harvester.

The resulting JSON files will be enriched by the metadata records of the processed PDF and will be stored together with each processed PDF in the data repository.

If the harvested collection is located on a S3 storage, the access information must be indicated in the configuration file of the client config.json. The extracted software mention will be written in a file with extension .software.json, for example:

-rw-rw-r-- 1 lopez lopez 1.1M Aug  8 03:26 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.pdf
-rw-rw-r-- 1 lopez lopez  485 Aug  8 03:41 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.software.json

If a MongoDB server access information is indicated in the configuration file config.json, the extracted information will additionally be written in MongoDB.

License and contact

Distributed under Apache 2.0 license. The dependencies used in the project are either themselves also distributed under Apache 2.0 license or distributed under a compatible license.

Main author and contact: Patrice Lopez ([email protected])

Battle.net and PlayStation title watcher that reports updates via Discord.

Renovate Renovate is a Battle.net and PlayStation title watcher that reports updates via Discord. Usage Open config_example.json and provide the confi

Ethan 1 Nov 23, 2022
๐€ ๐ฆ๐จ๐๐ฎ๐ฅ๐š๐ซ ๐“๐ž๐ฅ๐ž๐ ๐ซ๐š๐ฆ ๐†๐ซ๐จ๐ฎ๐ฉ ๐ฆ๐š๐ง๐š๐ ๐ž๐ฆ๐ž๐ง๐ญ ๐›๐จ๐ญ ๐ฐ๐ข๐ญ๐ก ๐ฎ๐ฅ๐ญ๐ข๐ฆ๐š๐ญ๐ž ๐Ÿ๐ž๐š๐ญ๐ฎ๐ซ๐ž๐ฌ !!

๐‡๐จ๐ฐ ๐“๐จ ๐ƒ๐ž๐ฉ๐ฅ๐จ๐ฒ For easiest way to deploy this Bot click on the below button ๐Œ๐š๐๐ž ๐๐ฒ ๐’๐ฎ๐ฉ๐ฉ๐จ๐ซ๐ญ ๐†๐ซ๐จ๐ฎ๐ฉ ๐’๐จ๐ฎ๐ซ๐œ๐ž๐ฌ ๐†๐ž๐ง๐ž?

Mukesh Solanki 1 Dec 10, 2021
Python script to backup/convert your Spotify playlists into the XSPF format.

Python script to backup/convert your Spotify playlists into the XSPF format.

Chris Ovenden 4 Jun 09, 2022
Discondelete, is a Discord self-bot to delete dm's or purge all messages from a guild.

Discondelete Discondelete, is a Discord self-bot to delete dm's or purge all messages from a guild. Report Bug ยท Request Feature Table of Contents Abo

core 4 Feb 28, 2022
buys ethereum based on graphics card moving average price on ebay

ebay_trades buys ethereum based on graphics card moving average price on ebay Built as a meme, this application will scrape the first 3 pages of ebay

ConnorCreate 41 Jan 05, 2023
ไผไธšๅพฎไฟกๆถˆๆฏๆŽจ้€็š„pythonๅฐ่ฃ…ๆŽฅๅฃ๏ผŒ่ฎฉไฝ ่ฝปๆพ็”จpythonๅฎž็Žฐๅฏนไผไธšๅพฎไฟก็š„ๆถˆๆฏๆŽจ้€

๐Ÿ‘‹ corpwechat-botๆ˜ฏไธ€ไธชpythonๅฐ่ฃ…็š„ไผไธšๆœบๅ™จไบบ&ๅบ”็”จๆถˆๆฏๆŽจ้€ๅบ“๏ผŒ้€š่ฟ‡ไผไธšๅพฎไฟกๆไพ›็š„apiๅฎž็Žฐใ€‚ ๅˆฉ็”จๆœฌๅบ“๏ผŒไฝ ๅฏไปฅ่ฝปๆพๅœฐๅฎž็ŽฐไปŽๆœๅŠกๅ™จ็ซฏๅ‘้€ไธ€ๆกๆ–‡ๆœฌใ€ๅ›พ็‰‡ใ€่ง†้ข‘ใ€markdown็ญ‰็ญ‰ๆถˆๆฏๅˆฐไฝ ็š„ๅพฎไฟกๆ‰‹ๆœบ็ซฏ๏ผŒ่€Œไธไพ่ต–ไบŽๅ…ถไป–็š„็ฌฌไธ‰ๆ–นๅบ”็”จ๏ผŒๅฆ‚ServerChanใ€‚ ๅฆ‚ๆžœๅ–œๆฌข่ฏฅ้กน็›ฎ๏ผŒ่ฎฐๅพ—็ป™ไธช

Chaopeng 161 Jan 06, 2023
Scheduled Block Checker for Cardano Stakepool Operators

ScheduledBlocks Scheduled Block Checker for Cardano Stakepool Operators Lightweight and Portable Scheduled Blocks Checker for Current Epoch. No cardan

SNAKE (Cardano Stakepool) 4 Oct 18, 2022
A simple bot that lives in your Telegram group, logging messages to a Postgresql database and serving statistical tables and plots to users as Telegram messages.

telegram-stats-bot Telegram-stats-bot is a simple bot that lives in your Telegram group, logging messages to a Postgresql database and serving statist

22 Dec 26, 2022
A VCVideoPlayer Bot for Telegram made with ๐Ÿ’ž By @TeamDeeCoDe

VC Video Player How To Host โœจ Heroku Deploy โœจ The easiest way to deploy this Bot is via Heroku. Credit ๐Ÿ”ฅ |๐Ÿ‡ฎ๐Ÿ‡ณ Louis |๐Ÿ‡ฎ๐Ÿ‡ณ Sammy |๐Ÿ‡ฎ๐Ÿ‡ณ Blaze |๐Ÿ‡ฎ๐Ÿ‡ณ S

TeamDeeCode 6 Feb 28, 2022
Huggingface transformers for discord

disformers Huggingface transformers for discord base source butyr/huggingface-transformer-chatbots install pip install -U disformers example see examp

SpaceDEVofficial 1 Nov 09, 2021
๐Ÿš€ A fast, flexible and lightweight Discord API wrapper for Python.

Krema A fast, flexible and lightweight Discord API wrapper for Python. Installation Unikorn unikorn add kremayard krema -no-confirmation Pip pip insta

Krema 20 Sep 04, 2022
This is Source Code of PdiskUploaderBot

PdiskUploaderBot This is the source code of PdiskUploaderBot. And the developer of this bot is AJTimePyro, His Telegram Channel & Group. You can use t

Abhijeet 8 Oct 20, 2022
Non official, but friendly QvaPay library for the Python language.

Python SDK for the QvaPay API Non official, but friendly QvaPay library for the Python language. Setup You can install this package by using the pip t

Carlos Lugones 17 Nov 25, 2022
A bot i made for a dead com server lol it gets updated daily etc

6ix-Bot-Source A bot i made for a dead com server lol it gets updated daily etc For The UserAgent CMD https://developers.whatismybrowser.com/ thats a

Swiper 9 Mar 10, 2022
Example-bot-discord - Example bot discord xD

example-python-bot-discord Clone this repository Grab a token on Discord's devel

Amitminer 1 Mar 14, 2022
Heroku app to explore boardgame data

A Dashboard for the Board Game Geeks among us Link to Application As many Board Game Geeks like myself track the scores of board game matches I decide

Maarten Grootendorst 20 Nov 23, 2022
The best Discord bot, created for r/Jailbreak

Bloo Setup instructions These instructions assume you are on macOS or Linux. Windows users, good luck. With Docker (recommended!) You will need the fo

GIR 33 Dec 16, 2022
Python written Rule34 API

Python written Rule34 API

1 Nov 11, 2021
A Telegram Bot That Can Find Lyrics Of Song

Lyrics-Search-Bot A Telegram Bot That Can Find Lyrics Of Song A Simple Telegram Bot That Can Extract Lyrics Of Any Songs Deploy Commands start - To St

Muhammed Fazin 11 Oct 21, 2022
Moon-TikTok-Checker - A TikTok Username checking tool that probably 3/4 people use to get rare usernames

Moon Checker (educational Purposes Only) What Is Moon Checker? This is a TikTok

glide 4 Nov 30, 2022