BudouX is the successor to Budou, the machine learning powered line break organizer tool.

BudouX

Standalone. Small. Language-neutral.

It is standalone. It works with no dependency on third-party word segmenters such as the Google Cloud Natural Language API.

It is small. It takes only around 15 KB including its machine learning model, so it is reasonable to use even on the client side.

It is language-neutral. You can train a model for any language by feeding a dataset to BudouX’s training script.

Last but not least, BudouX supports HTML inputs.

Demo

https://google.github.io/budoux/

Natural languages supported by default

  • Japanese

Supported Programming languages

For details about the JavaScript module, please visit the JavaScript README.

Python module

Install

$ pip install budoux

Usage

You can get a list of phrases by feeding a sentence to the parser.

import budoux
parser = budoux.load_default_japanese_parser()
print(parser.parse('今日は天気です。'))
# ['今日は', '天気です。']

You can also translate an HTML string by wrapping phrases with non-breaking markup.

print(parser.translate_html_string('今日は<b>とても天気</b>です。'))
# <span style="word-break: keep-all; overflow-wrap: break-word;">今日は<b><wbr>とても<wbr>天気</b>です。</span>

If you have a custom model, you can use it as follows.

import json
import budoux

with open('/path/to/your/model.json', encoding='utf-8') as f:
  model = json.load(f)
parser = budoux.Parser(model)

A model file for BudouX is a JSON file that contains pairs of a feature and its score extracted by machine learning training. Each score represents the significance of the feature in determining whether to break the sentence at a specific point.
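To get a feel for what such a file contains, the sketch below loads a toy model and ranks its features by the magnitude of their scores. It assumes the model is a flat JSON object mapping feature names to scores, as described above; the actual layout may differ between BudouX versions, and the feature names and scores here are made up for illustration.

```python
import json

# Hypothetical toy model: the feature names and scores are invented
# for illustration and do not come from a real BudouX model file.
toy_model_json = '{"feature_a": 1200, "feature_b": -300, "feature_c": 45}'
model = json.loads(toy_model_json)

# Rank features by the magnitude of their score: a large positive or
# negative score means the feature has a strong influence on the decision.
ranked = sorted(model.items(), key=lambda item: abs(item[1]), reverse=True)
for name, score in ranked:
    print(f"{name}\t{score}")
```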

For more details of the JavaScript model, please refer to the JavaScript module README.

Caveat

BudouX supports HTML inputs and outputs HTML strings with markup that wraps phrases, but it's not meant to be used as an HTML sanitizer. BudouX doesn't sanitize any inputs. Malicious HTML inputs yield malicious HTML outputs. Please use it with an appropriate sanitizer library if you don't trust the input.
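When untrusted input is meant to be plain text rather than markup, one option is to skip HTML mode entirely: segment the text with parser.parse, escape each phrase, and assemble the markup yourself. The sketch below shows only the escaping and assembly step with the standard library; the helper name phrases_to_safe_html is hypothetical, and the wrapper style is copied from the output shown elsewhere in this document. It is not a replacement for a sanitizer when the input is allowed to contain markup.

```python
import html
from typing import List

# Style used by the wrapping <span> in the outputs shown in this document.
WRAPPER_STYLE = "word-break: keep-all; overflow-wrap: break-word;"


def phrases_to_safe_html(phrases: List[str]) -> str:
    """Escape each phrase and join the phrases with <wbr> inside a span.

    Hypothetical helper: in practice the phrases would come from
    parser.parse(untrusted_text).
    """
    escaped = [html.escape(phrase, quote=True) for phrase in phrases]
    return f'<span style="{WRAPPER_STYLE}">' + "<wbr>".join(escaped) + "</span>"


# Any markup inside the untrusted phrases is rendered inert by escaping.
print(phrases_to_safe_html(["今日は", "<b>天気</b>です。"]))
```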

Background

English text has many clues, like spacing and hyphenation, that enable beautiful and readable line breaks. However, some CJK languages lack these clues, and so are notoriously more difficult to process. Without a more careful approach, line breaks can fall at arbitrary positions, often in the middle of a word or a phrase. This is a long-standing issue in typography on the Web, and it degrades readability.

Budou was proposed as a solution to this problem in 2016. It automatically translates CJK sentences into HTML with lexical phrases wrapped in non-breaking markup, so as to semantically control line breaks. Budou has solved this problem to some extent, but it is still difficult to integrate into modern web production workflows.

The biggest barrier to applying Budou to a website is its dependency on third-party word segmenters. A word segmenter is usually a large program that is infeasible to download for every web page request. Making a request to a cloud-based word segmentation service for every sentence is also undesirable, considering the speed and cost. That’s why we need a standalone line break organizer tool equipped with its own segmentation engine, small enough to be bundled in client-side JavaScript code.

BudouX is the successor to Budou, designed to be integrated with your website with no hassle.

How it works

BudouX uses the AdaBoost algorithm to segment a sentence into phrases. It treats the task as a binary classification problem: for each position between two adjacent characters, it predicts whether to break there or not. It uses features such as the characters around the break point, their Unicode blocks, and combinations of them to make a prediction. The trained model, which is encoded as a JSON file, stores pairs of a feature and its significance score. The BudouX parser takes a model file to construct a segmenter and translates input sentences into lists of phrases.
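The idea of scoring each character boundary can be sketched as follows. This is a simplified illustration, not BudouX's actual feature set or scoring code: the feature names (prev:, next:, pair:), the toy scores, and the rule "break when the summed score is positive" are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical toy model: feature name -> score. Not a real BudouX model.
TOY_MODEL: Dict[str, int] = {
    "prev:は": 500,    # favors a break right after the particle は
    "pair:は天": 300,  # favors a break between は and 天
    "next:で": -400,   # discourages a break right before で
}


def features_at(sentence: str, i: int) -> List[str]:
    """Features describing the boundary between sentence[i - 1] and sentence[i]."""
    prev_ch, next_ch = sentence[i - 1], sentence[i]
    return [f"prev:{prev_ch}", f"next:{next_ch}", f"pair:{prev_ch}{next_ch}"]


def segment(sentence: str, model: Dict[str, int]) -> List[str]:
    """Split a sentence wherever the summed feature score is positive."""
    if not sentence:
        return []
    chunks = [sentence[0]]
    for i in range(1, len(sentence)):
        score = sum(model.get(feature, 0) for feature in features_at(sentence, i))
        if score > 0:
            chunks.append(sentence[i])  # start a new phrase
        else:
            chunks[-1] += sentence[i]   # extend the current phrase
    return chunks


print(segment("今日は天気です。", TOY_MODEL))
# ['今日は', '天気です。']
```

With this toy model, only the boundary between は and 天 gets a positive score, so the sentence is split into two phrases.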

Building a custom model

You can build your own custom model for any language by preparing training data in the target language. A training dataset is a large text file that consists of sentences whose phrases are separated by the separator symbol "▁" (U+2581), as in the example below.

私は▁遅刻魔で、▁待ち合わせに▁いつも▁遅刻してしまいます。
メールで▁待ち合わせ▁相手に▁一言、▁「ごめんね」と▁謝れば▁どうにか▁なると▁思っていました。
海外では▁ケータイを▁持っていない。
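To make the format concrete, the sketch below turns one line of such a dataset into the unsegmented sentence plus a break/no-break label for every character boundary. It illustrates how the separators map to classification labels; it is not the logic of scripts/encode_data.py, and the function name and the 1/-1 label convention are assumptions.

```python
from typing import List, Tuple

SEPARATOR = "\u2581"  # "▁", the phrase separator used in the training data


def line_to_labels(line: str) -> Tuple[str, List[int]]:
    """Return the unsegmented sentence and one label per character boundary.

    labels[i] describes the boundary between text[i] and text[i + 1]:
    1 means "break here" and -1 means "do not break".
    """
    phrases = [p for p in line.strip().split(SEPARATOR) if p]
    text = "".join(phrases)
    break_positions = set()
    position = 0
    for phrase in phrases[:-1]:
        position += len(phrase)
        break_positions.add(position)
    labels = [1 if i in break_positions else -1 for i in range(1, len(text))]
    return text, labels


sample = SEPARATOR.join(["海外では", "ケータイを", "持っていない。"])
text, labels = line_to_labels(sample)
print(text)
print(labels)
```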

Assuming the text file is saved as mysource.txt, you can build your own custom model by running the following commands.

$ pip install -r requirements_dev.txt
$ python scripts/encode_data.py mysource.txt -o encoded_data.txt
$ python scripts/train.py encoded_data.txt -o weights.txt
$ python scripts/build_model.py weights.txt -o mymodel.json

Please note that train.py can take a long time to complete, depending on your computing resources. The good news is that the training algorithm is an anytime algorithm, so you can get a weights file even if you interrupt the execution. Even in that case, you can build a valid model file by passing that weights file to build_model.py.
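The anytime property can be illustrated with a toy training loop that writes its current result after every round, so stopping early still leaves a usable file. This is only a sketch of the idea, not the logic of train.py; the function name, the tab-separated output format, and the toy weights are assumptions made for illustration.

```python
import io
from typing import Callable, TextIO


def train_anytime(rounds: int, out: TextIO, interrupted: Callable[[int], bool]) -> int:
    """Toy boosting-style loop that appends one weight per completed round.

    Because every round is flushed to `out` as soon as it finishes, the
    output is usable no matter when the loop stops.
    """
    completed = 0
    for r in range(rounds):
        if interrupted(r):
            break
        # Hypothetical weight for a hypothetical feature chosen in this round.
        out.write(f"toy_feature_{r}\t{1.0 / (r + 1):.4f}\n")
        out.flush()
        completed += 1
    return completed


buffer = io.StringIO()
done = train_anytime(10, buffer, interrupted=lambda r: r == 3)
partial_weights = buffer.getvalue().splitlines()
print(done, partial_weights)
```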

Constructing a training dataset from the KNBC corpus for Japanese

The default model for Japanese (budoux/models/ja_knbc.json) is built using the KNBC corpus. You can create a training dataset, which we name source_knbc.txt here, from that corpus by running the command below.

$ python scripts/load_knbc.py -o source_knbc.txt

Author

Shuhei Iitsuka

Disclaimer

This is not an officially supported Google product.

Comments
  • Implement a simple node.js cli tool.

    I've implemented a simple CLI that works with npm.

    I think budoux is a great tool for Japanese web development, and a CLI tool would make it easier to use. People can format text just by installing Node.js and running npx budoux-cli.

    To test locally:

    $ cd node_cli
    $ npm link
    $ budoux-cli -H 
    

    Output Example.

    $budoux-cli
    Please, pass one text argument to translate at least.
    $budoux-cli --help
    usage: budoux [-h] [-H] [-m JSON] [-d STR] [-V] [TXT]
    
    オプション:
      -H, --html     HTML mode                                                [真偽]
          --version  バージョンを表示                                         [真偽]
          --help     ヘルプを表示                                             [真偽]
    $budoux-cli --version
    0.0.1
    $budoux-cli 今日は天気です。
    今日は
    天気です。
    $budoux-cli '今日は天気です。'
    今日は
    天気です。
    $budoux-cli -H '今日は<b>とても天気</b>です。'
    <span style="word-break: keep-all; overflow-wrap: break-word;">今日は<b><wbr>とても<wbr>天気</b>です。</span>
    
    opened by junseinagao 21
  • Add custom help formatter and shorthand of `--thres`

    $ budoux -h
    usage: budoux [-h] [-H] [-m JSON] [-d STR] [-t THRES] [-V] [TXT]
    
    BudouX is the successor to Budou,
    the machine learning powered line break organizer tool.
    
    positional arguments:
      TXT                      text (default: None)
    
    optional arguments:
      -h, --help               show this help message and exit
      -H, --html               HTML mode (default: False)
      -m JSON, --model JSON    custom model file path (default: /Users/eggplants/prog/budoux/budoux/models/ja-knbc.json)
      -d STR, --delim STR      output delimiter in TEXT mode (default: ---)
      -t THRES, --thres THRES  threshold value to separate chunks (default: 1000)
      -V, --version            show program's version number and exit
    
    
    opened by eggplants 4
  • Unittest CI for Python

    I have added unit-testing CI for Python.

    Note: This project apparently works only on Python 3.9 or later because of type annotations such as list[str], so we have to restrict python_requires in setup.cfg!

    https://github.com/eggplants/budoux/blob/1c5ce47e260e38abece8c1b26ac8af00b1e6541b/setup.cfg#L21

    opened by eggplants 4
  • Issue with custom model

    Description

    Hi there. First, thanks for the lib; the results are impressive for such a small footprint 😄

    The results were not exactly what I wanted for Japanese tokenization, so I decided to train my own model, which was quite simple and straightforward. Sadly, after importing the generated model in JavaScript, it doesn't work.

    import { Parser, loadDefaultJapaneseParser } from 'budoux'
    import model from './mymodel.json'
    
    // obviously the following works
    const parser = loadDefaultJapaneseParser()
    console.log(parser.parse('今日は天気です。'))
    
    // but this doesn't
    const customParser = new Parser(model)
    console.log(customParser.parse('今日は天気です。'))
    

    Uncaught TypeError: this.model.values is not a function or its return value is not iterable at Parser.parse (parser.js:120:47)

    opened by kefniark 2
  • Add py.typed for static analysis with mypy

    The budoux source code contains type hints, but the following error occurs when using mypy.

    $ cat main.py
    import budoux
    parser = budoux.load_default_japanese_parser()
    $ mypy main.py
    main.py:1: error: Skipping analyzing "budoux": module is installed, but missing library stubs or py.typed marker
    main.py:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
    Found 1 error in 1 file (checked 1 source file)
    

    I added a py.typed file as described at the URL in the error message above. After adding the file, the error no longer occurs, as shown below.

    $ mypy main.py
    Success: no issues found in 1 source file
    
    opened by ryu22e 2
  • Change `applyElement` to call `HTMLProcessor`

    This patch changes Parser.applyElement to call the HTMLProcessor class.

    The HTMLProcessorOptions.separator is changed to accept a Node. This is because undefined had double meanings; i.e., use the default (ZWSP) and use the <wbr> element.

    opened by kojiishi 2
  • CLI

    I have implemented CLI.

    $ pip install -e .
    Obtaining file:///home/eggplants/prog/budoux
      Installing build dependencies ... done
      Checking if build backend supports build_editable ... done
      Getting requirements to build wheel ... done
      Preparing metadata (pyproject.toml) ... done
    Installing collected packages: budoux
      Running setup.py develop for budoux
    Successfully installed budoux-0.0.1
    $ budoux -h
    usage: budoux [-h] [-H] [-m JSON] [-d STR] [-V] [TXT]
    
    BudouX is the successor to Budou,
    the machine learning powered line break organizer tool.
    
    positional arguments:
      TXT                    text
    
    optional arguments:
      -h, --help             show this help message and exit
      -H, --html             HTML mode
      -m JSON, --model JSON  custom model file path (default: models/ja-knbc.json)
      -d STR, --delim STR    output delimiter in TEXT mode (default: '---')
      -V, --version          show program's version number and exit
    $ budoux -V
    budoux 0.0.1
    $ budoux 今日は天気です。
    今日は
    天気です。
    $ budoux -H "今日は<b>とても天気</b>です。"
    <span style="word-break: keep-all; overflow-wrap: break-word;">今日は<b ><wbr>とても<wbr>天気</b>です。</span>
    $ echo 今日は天気です。 | budoux
    $ echo -e "今日は天気です。\n昨日は曇りでした。" | budoux
    今日は
    天気です。
    ---
    昨日は
    曇りでした。
    $ budoux # interactive input
    test
    こんにちは
    おはよう
    [^d]
    test
    ---
    こんにちは
    ---
    おはよう
    $
    
    opened by eggplants 2
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this project's lead researcher Kasimir Schulz.

    opened by TrellixVulnTeam 1
  • Add missing export for HTMLProcessor in index.ts

    Hi, I was experimenting with the library and wanted to do something with the HTMLProcessor as described here

    However, it would not let me import and I believe it's because there is a missing export statement from index.ts.

    Hence I added an extra line in.

    opened by Harukaichii 1
  • Fix a mathematical bug

    This change fixes the mathematical bug in the parse method, which eventually removes the necessity of the thres parameter entirely. This also fixes the deviation between the reported metrics during model training and actual quality of the results provided by the parser.

    This PR includes:

    • a small fix to remove line breaks from the training data, which will make the resulting parser robust in processing punctuation marks that often come at the end of sentences.
    • retrained model files based on the change above.
    • updated parser implementation with correct score calculation logic and no thres parameter.

    ⚠️ Breaking change thres won't be available in the parse method and the CLI options any more. Please fix your program if it's relying on the thres parameter.

    opened by tushuhei 1
  • Fix when the `display` property is empty

    This patch handles the case where the display property is empty.

    This occurs when the element is not connected. In that case, HTMLProcessor uses its built-in rules to determine whether the element is inline or block.

    Fixes #74.

    opened by kojiishi 1
Releases(v0.4.0)
  • v0.4.0(Dec 14, 2022)

    What's Changed

    • Traditional Chinese support by @tushuhei in https://github.com/google/budoux/pull/101

    Full Changelog: https://github.com/google/budoux/compare/v0.3.0...v0.4.0

  • v0.3.0(Dec 5, 2022)

    What's Changed

    Faster model training

    We made model training faster by applying JAX's JIT compilation, pooling file writes, etc.

    • Faster training data encoding by @tushuhei in https://github.com/google/budoux/pull/89
    • Add out_span option for better GPU utilization by @tushuhei in https://github.com/google/budoux/pull/90
    • Apply JAX JIT compiling for faster training by @tushuhei in https://github.com/google/budoux/pull/95
    • Check in updated Simplified Chinese model by @tushuhei in https://github.com/google/budoux/pull/99

    Smaller models

    We made models smaller by removing less important features, disabling ASCII encoding, etc.

    • Remove Unicode Block features by @tushuhei in https://github.com/google/budoux/pull/86
    • Disable ASCII encoding when building the model file by @tushuhei in https://github.com/google/budoux/pull/98
    • Output compact model by @tushuhei in https://github.com/google/budoux/pull/100

    Misc

    • encode_data: write without break line join by @tushuhei in https://github.com/google/budoux/pull/91
    • Update unit tests for the encoding script by @tushuhei in https://github.com/google/budoux/pull/92
    • Add more granularity in weight outputs by @tushuhei in https://github.com/google/budoux/pull/93
    • Remove tar module dependency by @tushuhei in https://github.com/google/budoux/pull/96

    Full Changelog: https://github.com/google/budoux/compare/v0.2.1...v0.3.0

  • v0.2.1(Nov 8, 2022)

    What's Changed

    • Fix mypy issue by @tushuhei in https://github.com/google/budoux/pull/83
    • Add missing export for HTMLProcessor in index.ts by @Harukaichii in https://github.com/google/budoux/pull/82
    • Remove P features from JS module by @tushuhei in https://github.com/google/budoux/pull/85
    • Nit fix for mypy issue by @tushuhei in https://github.com/google/budoux/pull/87
    • Version up to 0.2.1 by @tushuhei in https://github.com/google/budoux/pull/88

    New Contributors

    • @Harukaichii made their first contribution in https://github.com/google/budoux/pull/82

    Full Changelog: https://github.com/google/budoux/compare/v0.2.0...v0.2.1

  • v0.2.0(Aug 4, 2022)

    What's Changed

    • Fix a mathematical bug by @tushuhei in https://github.com/google/budoux/pull/78
    • Add .js extension for better module portability by @tushuhei in https://github.com/google/budoux/pull/79
    • Remove the P features by @tushuhei in https://github.com/google/budoux/pull/80
    • Version up to 0.2.0 by @tushuhei in https://github.com/google/budoux/pull/81

    ⚠️ Breaking change

    • thres won't be available in the parse method and the CLI options any more. Please fix your program if it's relying on the thres parameter.
    • The parsing logic is different to older versions due to the fix for a mathematical error and removal of some features around past results. See #78 and #80 for details.

    Full Changelog: https://github.com/google/budoux/compare/v0.1.2...v0.2.0

  • v0.1.2(Jul 19, 2022)

    What's Changed

    • Improve package format by @tushuhei in https://github.com/google/budoux/pull/75
    • Fix when the display property is empty by @kojiishi in https://github.com/google/budoux/pull/76
    • Version up to 0.1.2 by @tushuhei in https://github.com/google/budoux/pull/77

    Full Changelog: https://github.com/google/budoux/compare/v0.1.1...v0.1.2

  • v0.1.1(Jul 14, 2022)

    What's Changed

    • Add isort and pytest to dev dependencies by @eggplants in https://github.com/google/budoux/pull/56
    • --lang option by @eggplants in https://github.com/google/budoux/pull/55
    • Add JavaScript HTMLProcessor class by @kojiishi in https://github.com/google/budoux/pull/58
    • Bump async from 2.6.3 to 2.6.4 in /demo by @dependabot in https://github.com/google/budoux/pull/59
    • Faster encode data by @tushuhei in https://github.com/google/budoux/pull/61
    • Faster preprocess by @tushuhei in https://github.com/google/budoux/pull/62
    • Change applyElement to call HTMLProcessor by @kojiishi in https://github.com/google/budoux/pull/60
    • Normalize weights not to overflow by @tushuhei in https://github.com/google/budoux/pull/63
    • Install Jax for GPU acceleration by @tushuhei in https://github.com/google/budoux/pull/64
    • Add py.typed for static analysis with mypy by @ryu22e in https://github.com/google/budoux/pull/65
    • Update gts to 4.0.0 by @tushuhei in https://github.com/google/budoux/pull/69
    • Fix Mypy GitHub Action by @tushuhei in https://github.com/google/budoux/pull/70
    • Output precision and recall during training by @tushuhei in https://github.com/google/budoux/pull/71
    • Upgrade dependencies by @tushuhei in https://github.com/google/budoux/pull/72
    • Version up to 0.1.1 by @tushuhei in https://github.com/google/budoux/pull/73

    New Contributors

    • @kojiishi made their first contribution in https://github.com/google/budoux/pull/58
    • @ryu22e made their first contribution in https://github.com/google/budoux/pull/65

    Full Changelog: https://github.com/google/budoux/compare/v0.1.0...v0.1.1

  • v0.1.0(Apr 1, 2022)

    • Simplified Chinese support added.
    • The parser now starts the segmentation process from the first character of the input sentence. The old parser started from the third character, assuming that the first phrase should be longer than three characters.
      • While this old assumption holds in many cases in Japanese, it does not apply to Chinese. We removed this assumption with the introduction of the Simplified Chinese model.
  • v0.0.4(Mar 30, 2022)

    What's Changed

    • Add thres arg to Python CLI by @tushuhei in https://github.com/google/budoux/pull/32
    • Add custom help formatter and shorthand of --thres by @eggplants in https://github.com/google/budoux/pull/33
    • Update dependent Node.js packages by @tushuhei in https://github.com/google/budoux/pull/35
    • Update build-demo.yml by @tushuhei in https://github.com/google/budoux/pull/36
    • mypy and flake8 by @eggplants in https://github.com/google/budoux/pull/34
    • Add description about CLI and deploy markdownlint CI by @eggplants in https://github.com/google/budoux/pull/37
    • Update style-check.yml by @tushuhei in https://github.com/google/budoux/pull/38
    • Specify python required version by @eggplants in https://github.com/google/budoux/pull/40
    • Add chunk-size option to reduce memory for model training by @tamanyan in https://github.com/google/budoux/pull/41
    • Bump follow-redirects from 1.14.7 to 1.14.8 in /demo by @dependabot in https://github.com/google/budoux/pull/44
    • Add thres parameter to Node.js CLI by @tushuhei in https://github.com/google/budoux/pull/46
    • Add a license header to .markdownlint.yaml by @tushuhei in https://github.com/google/budoux/pull/47
    • Take split_dataset out from fit by @tushuhei in https://github.com/google/budoux/pull/42
    • Dependencies version up by @tushuhei in https://github.com/google/budoux/pull/50

    New Contributors

    • @tamanyan made their first contribution in https://github.com/google/budoux/pull/41
    • @dependabot made their first contribution in https://github.com/google/budoux/pull/44

    Full Changelog: https://github.com/google/budoux/compare/v0.0.3...v0.0.4

  • v0.0.3(Dec 2, 2021)

    Featured changes

    • Node.js CLI by @junseinagao
    • CI improvements by @eggplants
    • BudouX Web Components by @tushuhei

    What's Changed

    • Fix Typos by @hiro0218 in https://github.com/google/budoux/pull/22
    • Add test CI for NodeJS by @eggplants in https://github.com/google/budoux/pull/19
    • Add badges (PyPI, npm) by @eggplants in https://github.com/google/budoux/pull/21
    • Fix version data by @eggplants in https://github.com/google/budoux/pull/23
    • Add cli test by @eggplants in https://github.com/google/budoux/pull/16
    • Add PR trigger to CI by @eggplants in https://github.com/google/budoux/pull/24
    • Add npm link to test CI by @eggplants in https://github.com/google/budoux/pull/25
    • Export the parser threshold value by @tushuhei in https://github.com/google/budoux/pull/26
    • Update .prettierrc.js by @tushuhei in https://github.com/google/budoux/pull/27
    • Implement a simple node.js cli tool. by @junseinagao in https://github.com/google/budoux/pull/20
    • Refactor tests of node.js cli by @junseinagao in https://github.com/google/budoux/pull/28
    • Add web components by @tushuhei in https://github.com/google/budoux/pull/29
    • Version bump by @tushuhei in https://github.com/google/budoux/pull/30

    New Contributors

    • @hiro0218 made their first contribution in https://github.com/google/budoux/pull/22
    • @junseinagao made their first contribution in https://github.com/google/budoux/pull/20

    Full Changelog: https://github.com/google/budoux/compare/v0.0.2...v0.0.3

  • v0.0.2(Nov 24, 2021)

    What's Changed

    • CLI by @eggplants in https://github.com/google/budoux/pull/6
    • Fix Python code style by @tushuhei in https://github.com/google/budoux/pull/11
    • Fix type hints to work with older Python versions by @tushuhei in https://github.com/google/budoux/pull/13
    • add: unittest CI for Python by @eggplants in https://github.com/google/budoux/pull/14
    • Fix: encoding error in windows by @eggplants in https://github.com/google/budoux/pull/15
    • Use native unittest instead of pytest by @tushuhei in https://github.com/google/budoux/pull/17
    • 0.0.2 release by @tushuhei in https://github.com/google/budoux/pull/18

    New Contributors 🎉

    • @eggplants made their first contribution in https://github.com/google/budoux/pull/6

    Full Changelog: https://github.com/google/budoux/compare/v0.0.1...v0.0.2
