The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords

Last update: Dec 25, 2022

Overview

tiara - The Internet Archive Research Assistant

by Kay Savetz, May 2021.

Searches the Internet Archive, using its full-text search, for new items matching the keywords you specify. Run this script once a day via crontab for daily updates about new items relevant to your ongoing research subjects. It keeps track of the items it has already found, so it only alerts you to new-to-you items. The script writes its findings to an HTML file and can optionally email that file to you via SendGrid or your system mail (e.g. Sendmail or Postfix).
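The "only alert on new-to-you items" behavior can be pictured with a short sketch. This is not tiara's actual code: the function names and the idea of a plain-text file of already-seen identifiers (called seen.txt in the usage note) are assumptions made for illustration. The only external fact relied on is that an item's page lives at https://archive.org/details/ followed by its identifier.

```python
from __future__ import annotations

import html
from pathlib import Path


def load_seen(path: Path) -> set[str]:
    """Return identifiers recorded on earlier runs (empty set on the first run)."""
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def record_new(found: list[str], path: Path) -> list[str]:
    """Return identifiers not seen before, in order, and append them to the file."""
    seen = load_seen(path)
    new_items = []
    for identifier in found:
        if identifier not in seen:
            seen.add(identifier)  # also drops repeats within this run
            new_items.append(identifier)
    if new_items:
        with path.open("a", encoding="utf-8") as handle:
            handle.writelines(f"{identifier}\n" for identifier in new_items)
    return new_items


def render_html(term: str, identifiers: list[str]) -> str:
    """Build an HTML fragment that links each new item found for one search term."""
    rows = "\n".join(
        f'<li><a href="https://archive.org/details/{html.escape(i)}">'
        f"{html.escape(i)}</a></li>"
        for i in identifiers
    )
    return f"<h2>{html.escape(term)}</h2>\n<ul>\n{rows}\n</ul>"
```

Calling record_new(results, Path("seen.txt")) once per search term and passing the returned list to render_html would give a report that shrinks to nothing on days with no new matches.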

Put your keywords in searchlist.txt, one search term per line. Very general terms (like "dogs") produce too many daily hits to be useful. More specific phrases work better.
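As a rough illustration of the one-term-per-line format, the sketch below reads such a file into a list. The function name is hypothetical, and skipping blank lines and repeated terms is an assumption about sensible behavior, not a description of what tiara itself does.

```python
from __future__ import annotations

from pathlib import Path


def read_search_terms(path: Path) -> list[str]:
    """Return one search term per non-blank line, preserving order."""
    terms: list[str] = []
    for line in path.read_text(encoding="utf-8").splitlines():
        term = line.strip()
        # Assumption: blank lines and repeated terms are ignored.
        if term and term not in terms:
            terms.append(term)
    return terms
```

A file containing the lines "Pliny The Elder" and "Atari 800 BASIC" would come back as a two-element list, ready to be searched one term at a time.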

Dependency: the Internet Archive command-line tool (install with pip install internetarchive). The script also requires read-write access to the directory it lives in.

Issue: Internet Archive cannot generate thumbnails for all items. In these cases, you may see a broken image icon.

Issue: Internet Archive's full text search doesn't seem to allow exact phrase matching. So a search for "Pliny The Elder" may turn up items mentioning Pliny The Younger, or with "Pliny" on one page and "elder" on another.

If you find this tool useful, please donate to Internet Archive.

Owner

Kay Savetz

Beyond the Imitation Game collaborative benchmark for enormous language models

Under the hood working of transformers, fine-tuning GPT-3 models, DeBERTa, vision models, and the start of Metaverse, using a variety of NLP platforms: Hugging Face, OpenAI API, Trax, and AllenNLP

Continuously update some NLP practice based on different tasks.

ETM - R package for Topic Modelling in Embedding Spaces

Program for reverse encryption and decryption of a message in Python 3.

Code for modular summarization work published in ACL 2021 by Krishna et al.

Word Bot for JKLM Bomb Party

VMD Audio/Text control with natural language

Parrot is a paraphrase based utterance augmentation framework purpose built to accelerate training NLU models

The aim of this task is to predict someone's English proficiency based on a text input.

wxPython app for converting encodings, modifying and fixing SRT files

Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition

Sapiens is a human antibody language model based on BERT.

Every Google, Azure & IBM text to speech voice for free

Code for CodeT5: a new code-aware pre-trained encoder-decoder model.

Utility for Google Text-To-Speech batch audio files generator. Ideal for prompt files creation with Google voices for application in offline IVRs

Search with BERT vectors in Solr and Elasticsearch

Ukrainian TTS (text-to-speech) using Coqui TTS

PyTorch NLP library based on FastAI

Gathers machine learning and TensorFlow deep learning models for NLP problems, 1.13 < TensorFlow < 2.0