A Python library for generating new text from existing samples.

Last update: May 17, 2022

Overview

ReMarkov is a Python library for generating text from existing samples using Markov chains. You can use it to produce all sorts of writing, from birthday messages and horoscopes to Wikipedia articles and the utterances of your game's NPCs. Everything works without an omnipotent "AI": the code is dead simple and therefore fast.

Check out the examples and feel free to contribute!

Installation

pip3 install remarkov

Example

Scrape the Wikipedia page for "Computer programming" and generate new text from it:

./tools/scrape-wiki.py Computer_programming | remarkov build | remarkov generate

You can also use remarkov programmatically:

from remarkov import create_model

model = create_model()
model.add_text("This is a sample text and this is another.")

print(model.generate().text())
# Example output (generation is random, so yours may differ):
# "This is a sample text and this is a sample text and this is a sample text ..."
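
For intuition about what a Markov chain text generator does, here is a minimal, self-contained sketch. It is not ReMarkov's implementation and does not use its API; the names build_chain and generate are hypothetical and exist only for this illustration. The sketch records which words follow each word in the sample, then random-walks those transitions.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=12, seed=None):
    """Random-walk the chain, starting from a randomly chosen state."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:  # dead end: no word ever followed this state
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Illustrative only: plain whitespace tokenization, case-sensitive.
chain = build_chain("This is a sample text and this is another.")
print(generate(chain, seed=0))
```

In this sketch the word "is" is followed by either "a" or "another.", so the walk can loop back through "a sample text and this is" several times before it ends. A real library is likely to differ in how it tokenizes the text and how it picks start and end states.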

Development

Make sure you run pytest as a module. This adds the current directory to the import path:

python3 -m pytest

This project uses black for source code formatting:

black .

Generate documentation for the project (this uses the original pdoc at pdoc.dev):

git checkout gh-pages
pdoc -t pdoc/template -o public/docs <path_to_remarkov_module>

Run type checks using mypy:

mypy -p remarkov

Publishing is done like this (don't forget to bump the version in setup.py):

pip3 install twine # optional

git tag -a <version>
git push --tags

python3 setup.py clean --all
python3 setup.py sdist bdist_wheel
twine check "dist/*"
twine upload "dist/*"


Comments

Release schedule
[x] Add source code documentation
[x] Improve explanation on website
[x] Adapt syntax highlighting in docs
[x] Generate samples for showcase
    [x] Articles
    [x] Birthday
    [x] Horoscope
    [x] Utterance
[x] Enable gh-pages
opened by lausek

Releases(v0.2.3)

v0.2.3(Jan 15, 2022)
ReMarkov Example Datasets - EN

Based on:

https://github.com/kavgan/OpinRank (Cars, Hotels)

https://github.com/dsnam/markovscope (Horoscopes)

https://github.com/hmi-utwente/video-game-text-corpora (NPC)

ReMarkov Wikipedia Scraper (Blockchain)


