This is a text summarizing tool written in Python

Overview

Summarize

Written by: Ling Li Ya

This is a text summarizing tool written in Python.

User Guide

Some things to note:

  • The application is accessible here.
  • However, due to limited free-tier server resources, the application may crash, so it is advisable that you run this project locally.
  • You might not be able to run the abstractive models after reaching a character limit in HuggingFace Accelerated Inference API. Therefore, it is advisable that you use the Notebooks for replicating our results in the documentation.
  • Note that you might not be able to run Pegasus on the notebook successfully due to the amount of resources required, so it is advisable that you run only the Pegasus model through the application interface.

To run the project locally, please refer to the guide below.

Setup Tutorial Video (Windows)

SummarizeLocalSetup.mp4

for the detailed steps in word, refer to sections below

1. Downloading the project

Either download the .zip file in Google Classroom from our GitHub. image

Then unzip the .zip file. You will see the file summarize-main. image

2. Install prerequisites

You need Python and Node.js installed. Open up command prompt (cmd) and type in the code below.

To check whether Python is installed:

$ python

You will see this is it is installed. Note that your version might be different.
image

Type exit() to exit the Python shell if it is installed.

To check whether Node.js is installed:

$ node

You will see this is it is installed. Note that your version might be different.
image

Otherwise, download Python and/or Node.js here. Run the installer and follow its instructions. Verify your installation.

3. Install project Python dependencies

Double click on summarize-main. Single click on the summarize folder, hold down your shift key, and right click on the folder. Select Open PowerShell window here. image

A PowerShell window will pop up. Then right click on the Makefile in the file explorer and open it with Notepad. image

Something like this will pop up: image

These are the commands to install all the project Python dependencies. Simply copy the command and paste them in the PowerShell window. If you encounter this warning message: image

Simply retype the command with an additional flag pip install -r requirements.txt --use-feature-in-tree-build. Then let it run. image

4. Install our summarize library

We have made our application into a Python library and you need to install it with the command below: image

5. Run the backend server

Be sure that you select the command under the server-dev instead of server-prod. image

6. Prepare the frontend client

Open up another PowerShell window this time by holding shift and right clicking the server folder.

After you have installed Node.js, run the following command to install pnpm.

$ npm install -g pnpm

After installing pnpm, type cd client to go into the client folder in the new PowerShell window.

Then return to your Notepad and run the command pnpm i in the PowerShell window. It will take 10 - 20 seconds to install. image

7. Run the frontend client

Run this command in the PowerShell window to launch the application on localhost:3333 image

You will see this: image

8. Adding API token

To use BART, T5 and Pegasus, you need an API token. We will private message you an API token because it is not supposed to be public.


At the summarize-main project root, right click on an empty space to add a new .txt named .env. image

Click on yes for this warning: image

Open the .env file in Notepad. Type in HUGGING_FACE_API_TOKEN_={your_api_token}. It will look something like this: image

Save the file then refresh the Summarize web application page. image

You will be able to use the models now.

Code folders

  • summarize - The python library for all the algorithm
  • server - The backend server using FastAPI
  • client - The frontend app using Vue3

Misc folders

  • notebooks - A folder to keep all our jupyter notebooks testground
  • data - A folder to keep all datasets needed to train or test the algorithm
  • docs - Keep our documentation files
Owner
Marcus Lee
Currently studying Software Engineering at TARUC, Kuala Lumpur. Mainly code in TypeScript, Golang, Python, Java Interested in Backend & Fullstack Dev.
Marcus Lee
Hotpotato is a recipe portfolio App that assists users to discover and comment new recipes.

Hotpotato Hotpotato is a recipe portfolio App that assists users to discover and comment new recipes. It is a fullstack React App made with a Redux st

Nico G Pierson 13 Nov 05, 2021
🍋 A Python package to process food

Pyfood is a simple Python package to process food, in different languages. Pyfood's ambition is to be the go-to library to deal with food, recipes, on

Local Seasonal 8 Apr 04, 2022
Aml - anti-money laundering

Anti-money laundering Dedect relationship between A and E by tracing through payments with similar amounts and identifying payment chains. For example

3 Nov 21, 2022
JSON and CSV data for Swahili dictionary with over 16600+ words

kamusi JSON and CSV data for swahili dictionary with over 16600+ words. This repo consists of data from swahili dictionary with about 16683 words toge

Jordan Kalebu 8 Jan 13, 2022
A non-validating SQL parser module for Python

python-sqlparse - Parse SQL statements sqlparse is a non-validating SQL parser for Python. It provides support for parsing, splitting and formatting S

Andi Albrecht 3.1k Jan 04, 2023
Returns unicode slugs

Python Slugify A Python slugify application that handles unicode. Overview Best attempt to create slugs from unicode strings while keeping it DRY. Not

Val Neekman 1.3k Jan 04, 2023
Getting git-style versioning working on RDFlib

Getting git-style versioning working on RDFlib

Gabe Fierro 1 Feb 01, 2022
An implementation of figlet written in Python

All of the documentation and the majority of the work done was by Christopher Jones ([emai

Peter Waller 1.1k Jan 02, 2023
Extract knowledge from raw text

Extract knowledge from raw text This repository is a nearly copy-paste of "From Text to Knowledge: The Information Extraction Pipeline" with some cosm

Raphael Sourty 10 Dec 03, 2022
This is a text summarizing tool written in Python

Summarize Written by: Ling Li Ya This is a text summarizing tool written in Python. User Guide Some things to note: The application is accessible here

Marcus Lee 2 Feb 18, 2022
Question answering on russian with XLMRobertaLarge as a service

QA Roberta Ru SaaS Question answering on russian with XLMRobertaLarge as a service. Thanks for the model to Alexander Kaigorodov. Stack Flask Gunicorn

Gladkikh Prohor 21 Jul 04, 2022
Python library for creating PEG parsers

PyParsing -- A Python Parsing Module Introduction The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the t

Pyparsing 1.7k Dec 27, 2022
Find a Doc is a free online resource aimed at helping connect the foreign community in Japan with health services in their native language.

Find a Doc - Localization Find a Doc is a free online resource aimed at helping connect the foreign community in Japan with health services in their n

Our Japan Life 18 Dec 19, 2022
BaseCrack is a tool written in Python that can decode all alphanumeric base encoding schemes.

BaseCrack Decoder For Base Encoding Schemes BaseCrack is a tool written in Python that can decode all alphanumeric base encoding schemes. This tool ca

Mufeed VH 383 Dec 27, 2022
WorldCloud Orçamento de Estado 2022

World Cloud Orçamento de Estado 2022 What it does This script creates a worldcloud, masked on a image, from a txt file How to run it? Install all libr

Jorge Gomes 2 Oct 12, 2021
A python tool to convert Bangla Bijoy text to Unicode text.

Unicode Converter A python tool to convert Bangla Bijoy text to Unicode text. Installation Unicode Converter can be installed via PyPi. Make sure pip

Shahad Mahmud 10 Sep 29, 2022
Wordle strategy: Find frequency of letters appearing in 5-letter words in the English language

Find frequency of letters appearing in 5-letter words in the English language In

Gabriel Apolinário 1 Jan 17, 2022
Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

TextDistance TextDistance -- python library for comparing distance between two or more sequences by many algorithms. Features: 30+ algorithms Pure pyt

Life4 3k Jan 02, 2023
TextStatistics - Get a text file wich contains English text

TextStatistics This program get a text file wich contains English text. The program analyses the text, and print some information. For this program I

2 Nov 15, 2021
Wikipedia Extractive Text Summarizer + Keywords Identification (entropy-based)

Wikipedia Extractive Text Summarizer + Keywords Identification (entropy-based)Wikipedia Extractive Text Summarizer + Keywords Identification (entropy-based)

Kevin Lai 1 Nov 08, 2021