Python Markov Chain chatbot running on Telegram

Overview

Hanasubot

Hanasubot (Japanese 話すボット, talking bot) is a Python chatbot running on Telegram. The bot is based on Markov Chains so it can learn your word instantly, unlike neural network chatbots which require training. It uses a modified version of markovify library for that purporse. However, the output may not make sense at all, though it can sometimes generate hilarious replies.

In theory, the bot can learn in any languages, but for some languages word segmentation is required. The bot currently supports Chinese and Japanese word segmentation, with pkuseg, CkipTagger and mecab. Language detection relies on pycld2.

Hanasubot has a permission system so you can easily stop the bot learning from naughty kids in your group, while still reply them. Users with admin right can erase lines from bot corpus as well.

The bot is designed for Chinese Telegram groups so there are a lot of messages written in Chinese. I18n will happen in future and any help is welcome.

Installation

Python 3.6+ is required.

VENV_PATH=/path/to/your/venv  # Change this
python3 -m venv $VENV_PATH
source $VENV_PATH/bin/activate

pip3 install -r requirements.txt

If you are using Python 3.6, dataclasses 0.8 is required as well:

pip3 install dataclasses==0.8

For Python 3.7 and up, dataclasses is included so no need to install it.

To use CkipTagger for Traditional Chinese tokenization, you have to download the model file (see CkipTagger readme for a detailed guide):

python3 -c "from ckiptagger import data_utils; data_utils.download_data_gdown('./')"

Then unzip to a folder named ckipdata, in the same directory as the Python scripts.

Optionally, you can initialize the user dict for pkuseg and CkipTagger, before start running the bot:

touch ./pkuseg_dict.txt
touch ./ckip_dict.json

Configuration

Copy config.example.py and fill it out. Please check the comments in config file.

cp config.example.py config.py

After that, simply start the bot:

python3 tgbot.py

Bot commands and usage

Simply reply to the bot and it will say some random words if you have collected enough corpus. The bot will also learn from your message instantly. Special commands are as follows.

Require root

  • /reload_config - Reload config file without restarting the bot. Some entries cannot be dynamically reloaded though, see config.example.py for details.

Require admin

  • /erase - Remove lines from corpus. (Non-admins can only erase lines sent by themselves.)
  • /userweight - Set user weight.
  • /ban - Set user right to -1.
  • /restrict - Set user right to 1.
  • /grantnormal - Set user right to 2.
  • /granttrusted -Set user right to 3.
  • /grantadmin - Set user right to 4. Admins are able to add/remove other admins with above commands. See also the user right levels section.

Require trusted

  • /addword_cn - Add a word into pkuseg user dictionary.
  • /addword_tw - Add a word into CkipTagger user dictionary.
  • /rmword_cn - Remove a word from pkuseg user dictionary.
  • /rmword_tw - Remove a word from CkipTagger user dictionary.

Other commands

  • /clddbg - Test language detection of some texts.
  • /cutdbg - Test tokenization of some texts.
  • /policy - See what data is collected by the bot and so on.
  • /reload - Claim your admin rights after you get Telegram group admin.
  • /source - See the source code.
  • /start - Start chatting, useful when you can't find the bot messages to reply.

Database

Initialize

CREATE TABLE IF NOT EXISTS chat(
    chat_id integer PRIMARY KEY,
    chat_tgid integer NOT NULL UNIQUE,
    chat_name text
);
CREATE TABLE IF NOT EXISTS user(
    user_id integer PRIMARY KEY,
    user_tgid integer NOT NULL UNIQUE,
    user_name text,
    user_right integer DEFAULT 2,
    user_weight real DEFAULT 1.0
);
CREATE TABLE IF NOT EXISTS corpus(
    corpus_id integer PRIMARY KEY,
    corpus_time integer,
    corpus_line text NOT NULL UNIQUE,
    corpus_raw integer REFERENCES raw,
    corpus_chat integer REFERENCES chat,
    corpus_user integer REFERENCES user,
    corpus_weight real DEFAULT 1.0
);
CREATE TABLE IF NOT EXISTS raw(
    raw_id integer PRIMARY KEY,
    raw_text text UNIQUE
);

User right levels

  • 5 - root.
  • 4 - admin, can change user rights (except root users), can erase a line from corpus, and can set user_weight and corpus_weight (WIP).
  • 3 - trusted user, can feed the bot via private messages, and can add words into dictionary (for tokenization purposes).
  • 2 - normal user.
  • 1 - restricted user, bot will not write their messages into database.
  • -1 - banned user, bot will not reply to their messages.

TODOs

  • Let admins set corpus_weight
  • Batch /erase

License

MIT

A play store search telegram bot

Play-Store-Bot A play store search telegram bot Made with Python3 (C) @FayasNoushad Copyright permission under MIT License License - https://github.c

Fayas Noushad 17 Oct 28, 2022
A Python 2.7/3.x module for Amcrest Cameras using the SDK HTTP API.

A Python 2.7/3.x module for Amcrest Cameras using the SDK HTTP API. Amcrest and Dahua devices share similar firmwares. Dahua Cameras and NVRs also work with this module.

Marcelo Moreira de Mello 176 Dec 21, 2022
SQS + Lambda를 활용한 문자 메시지 및 이메일, Voice call 호출을 간단하게 구현하는 serverless 템플릿

AWS SQS With Lambda notification 서버 구축을 위한 Poc TODO serverless를 통해 sqs 관련 리소스(람다, sqs) 배포 가능한 템플릿 작성 및 배포 poc차원에서 간단한 rest api 호출을 통한 sqs fifo 큐에 메시지

김세환 4 Aug 08, 2021
Discord Token Checker and Info

Discord Token Checker A simple way to check Discord user tokens and their info in bulk. By Roover#7098. https://discord.gg/W8hnMWY6XP Proxy support co

Roover 3 Dec 09, 2021
This is a simple bot for running Python code through Discord

Python Code Runner Discord Bot This is a simple bot for running Python code through Discord. It was originally developed for the Beginner.Codes Discor

beginner.py 1 Feb 14, 2022
A bot that connects your guild chat to a Discord channel, written in Python.

Guild Chat Bot A bot that connects your guild chat to a discord channel. Uses discord.py and pyCraft Deploy on Railway Railway is a cloud development

Evernote 10 Sep 25, 2022
❄️ Don't waste your money paying for new tokens, once you have used your tokens, clean them up and resell them!

TokenCleaner Don't waste your money paying for new tokens, once you have used your tokens, clean them up and resell them! If you have a very large qua

0xVichy 59 Nov 14, 2022
A file-based quote bot written in Python

Let's Write a Python Quote Bot! This repository will get you started with building a quote bot in Python. It's meant to be used along with the Learnin

0 Jan 20, 2022
Discord ToolBox is a discord bot developed by DJD320 created for the purpose of having some convenient tools in the form of a single bot.

Discord ToolBox Discord ToolBox is a discord bot developed by DJD320 created for the purpose of having some convenient tools in the form of a single b

3 Aug 07, 2021
A Telegram bot for personal utilities

Aqua Aqua is a Telegram bot for personal utilities. Installation Prerequisites: Install Poetry for managing dependencies and fork/clone the repository

Guilherme Vasconcelos 2 Mar 30, 2022
Polar devices Python API and CLI.

loophole - Polar devices API About Python API for Polar devices. Command line interface included. Tested with: A360 Loop M400 Installation pip install

[roscoe] 145 Sep 14, 2022
ANKIT-OS/TG-SESSION-GENERATOR-BOTbisTG-SESSION-GENERATOR-BOT a special repository. Its Is A Telegram Bot To Generate String Session

ANKIT-OS/TG-SESSION-GENERATOR-BOTbisTG-SESSION-GENERATOR-BOT a special repository. Its Is A Telegram Bot To Generate String Session

ANKIT KUMAR 1 Dec 26, 2021
🥀 Find the start of the token !

Discord Token Finder Find half of your target's token with just their ID. Install 🔧 pip install -r requeriments.txt Gui Usage 💻 Go to Discord Setti

zeytroxxx 2 Dec 22, 2021
Powerful and Async API for AnimeWorld.tv 🚀

Powerful and Async API for AnimeWorld.tv 🚀

1 Nov 13, 2021
Tubee is a web application, which runs actions when your subscribed channel upload new videos

Tubee is a web application, which runs actions when your subscribed channel upload new videos, think of it as a better IFTTT but built specifically for YouTube with many enhancements.

Tomy Hsieh 11 Jan 01, 2023
A Django-style ORM idea for manipulating Google Datastore entities

No SeiQueLa ORM EM DESENVOLVIMENTO Uma ideia de ORM no estilo do Django para manipular entidades do Google Datastore. Montando seu modelo: from noseiq

Geraldo Castro 16 Nov 01, 2022
This is a simple code for discord bot !

Discord bot dice roller this is a simple code for discord bot it can roll 1d4, 1d6, 1d8, 1d10, 1d12, 1d20, 1d100 for you in your discord server. Actua

Mostafa Koolabadi 0 Jan 02, 2022
Twitter automation tool for growing organic followers.

Tiwoto Tiwoto is a simple python program that automates some kind of behaviors and keep your account active. Create an .env file in this directory and

Mehmetcan Yildiz 6 Sep 22, 2022
Binance Futures Client

Binance Futures Client

4 Aug 02, 2022