Transform-Invariant Non-Negative Matrix Factorization

Overview

A comprehensive Python package for Non-Negative Matrix Factorization (NMF) with a focus on learning transform-invariant representations.
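For readers new to NMF, the following is a minimal, dependency-free sketch of *plain* (not transform-invariant) NMF. It approximates a non-negative matrix V by the product W·H of two non-negative factors using the classic Lee-Seung multiplicative updates for the Frobenius-norm loss. It is an illustration of the general technique only, not the tnmf API or its optimization backends; the function names `nmf`, `matmul`, `transpose` and `frobenius_error` are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def nmf(v, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize non-negative V (m x n) into W (m x rank) and H (rank x n).

    Lee-Seung multiplicative updates never change the sign of an entry,
    so W and H stay non-negative when initialized non-negative.
    """
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() for _ in range(n)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(m)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


# Usage: factorize a small rank-2 non-negative matrix.
V = [[1, 2, 0], [0, 1, 3], [1, 3, 3]]
W, H = nmf(V, rank=2)
print(frobenius_error(V, W, H))
```

The reconstruction error of these updates is non-increasing from iteration to iteration, which is why they are a common baseline; transform-invariant variants replace the plain product W·H with a model that also accounts for transformed copies of each component.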

The package supports multiple optimization backends and can easily be extended to handle application-specific types of transforms.
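Shifts are one common example of such a transform. The sketch below shows, under that assumption, how a shift-invariant model reconstructs a 1-D signal: each component ("atom") is convolved with a non-negative activation map, so a single atom can explain the same pattern wherever it occurs. The helper names `convolve_full` and `reconstruct`, and the choice to crop the full convolution to the signal length, are hypothetical and not taken from the tnmf code base.

```python
def convolve_full(signal, kernel):
    """Full 1-D discrete convolution (output length len(signal) + len(kernel) - 1)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def reconstruct(atoms, activations, length):
    """Sum of each atom convolved with its activation map, cropped to `length` samples."""
    total = [0.0] * length
    for atom, act in zip(atoms, activations):
        conv = convolve_full(act, atom)
        for t in range(min(length, len(conv))):
            total[t] += conv[t]
    return total


# Usage: one atom placed at shift 0 (weight 1.0) and at shift 5 (weight 0.5).
atom = [1.0, 2.0, 1.0]
activation = [1.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0]
print(reconstruct([atom], [activation], length=len(activation)))
# -> [1.0, 2.0, 1.0, 0.0, 0.0, 0.5, 1.0, 0.5]
```

Other transform types (for example, rotations or scalings of image patches) would swap the convolution for a different operator, while the non-negativity constraints on atoms and activations stay the same.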

General Introduction

A general introduction to Non-Negative Matrix Factorization and the purpose of this package can be found on the corresponding GitHub Pages.

Installation

To use this package, you need Python 3.7 or higher. The package is available via PyPI.

Installation is easiest using pip:

pip install tnmf

Demos and Examples

The package comes with a streamlit demo and a number of examples that demonstrate the capabilities of the TNMF model. They provide a good starting point for your own experiments.

Online Demo

Without requiring any installation, the demo is accessible via streamlit sharing.

Local Execution

Once the package is installed, the demo and the examples can be conveniently executed locally using the tnmf command:

  • To execute the demo, run tnmf demo.
  • To execute a specific example, run tnmf example followed by the name of the example.

To show the list of available examples, type tnmf example --help.

License

Copyright (c) 2021 Merck KGaA, Darmstadt, Germany

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

The full text of the license can be found in the file LICENSE in the repository root directory.

Contributing

Contributions to the package are always welcome and can be submitted via a pull request. Please note that you have to agree to the Contributor License Agreement to contribute.

Working with the Code

To check out the code and set up a working environment with all required Python packages, execute the following commands:

git clone https://github.com/emdgroup/tnmf.git ./tnmf
cd tnmf
python3 -m virtualenv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Now, you should be able to execute the unit tests by calling pytest to verify that the code is running as expected.

Pull Requests

Before creating a pull request, you should always try to ensure that the automated code quality and unit tests do not fail. This section explains how to run them locally to understand and fix potential issues.

Code Style and Quality

Code style and quality are checked using flake8 and pylint. To execute them, change into the repository root directory, run the following commands and inspect their output:

flake8
pylint tnmf

For a pull request to be acceptable, no errors may be reported here.

Unit Tests

Automated unit tests reside inside the folder tnmf/tests. They can be executed via pytest by changing into the repository root directory and running

pytest

Debugging potential failures from the command line might be cumbersome. Most Python IDEs, however, also support pytest natively in their debugger. Again, for a pull request to be acceptable, no failures may be reported here.

Code Coverage

Code coverage in the unit tests is measured using coverage. A coverage report can be created locally from the repository root directory via

coverage run
coverage combine
coverage report

This will output a concise table listing the Python files that are not fully covered by unit tests, along with the line numbers of code that has not been executed. A more detailed, interactive report can be created using

coverage html

Then, you can open the file htmlcov/index.html in a web browser of your choice to navigate through the code annotated with coverage data. The required overall coverage is configured in setup.cfg, under the key fail_under in the section [coverage:report].
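If you want to check the configured threshold programmatically, a small sketch using Python's standard configparser module could look like this. The function name `coverage_threshold` is hypothetical, and the value 90 in the usage example is a placeholder, not the threshold actually used in this repository.

```python
import configparser


def coverage_threshold(setup_cfg_text):
    """Return `fail_under` from the [coverage:report] section, or None if it is not set."""
    # interpolation=None avoids errors on literal '%' characters elsewhere in setup.cfg
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(setup_cfg_text)
    if not parser.has_option("coverage:report", "fail_under"):
        return None
    return parser.getfloat("coverage:report", "fail_under")


# Usage with a placeholder configuration (the real file is setup.cfg in the repository root).
sample = "[coverage:report]\nfail_under = 90\n"
print(coverage_threshold(sample))  # 90.0
```

To inspect the real file, read it first, e.g. coverage_threshold(open("setup.cfg").read()) from the repository root directory.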

Building the Documentation

To build the documentation locally, change into the doc subdirectory and run make html. The documentation then resides at doc/_build/html/index.html.
