Hidden Markov Models in Python, with a scikit-learn-like API

Overview

hmmlearn

hmmlearn is a set of algorithms for unsupervised learning and inference of Hidden Markov Models. For supervised learning of HMMs and similar models, see seqlearn.

Note: This package is under limited-maintenance mode.

Dependencies

The required dependencies to use hmmlearn are

  • Python >= 3.5
  • NumPy >= 1.10
  • scikit-learn >= 0.16

You also need Matplotlib >= 1.1.1 to run the examples and pytest >= 2.6.0 to run the tests.

Installation

Requires a C compiler and Python headers.

To install from PyPI:

pip install --upgrade --user hmmlearn

To install from the repo:

pip install --user git+https://github.com/hmmlearn/hmmlearn

Comments
  • Memory error: HMM for MFCC features

    I am trying to create an audio vocabulary from MFCC features by applying an HMM. The MFCC features come from 10 speakers and I need 50 states per speaker, so I used N = 500 states. This throws a MemoryError, but it works fine with N = 100 states.

    Is the MemoryError caused by the computational limits of my machine or by improper initialization?

    Here is my code:

    import numpy as np
    from hmmlearn import hmm
    import librosa
    import matplotlib.pyplot as plt
    
    def getMFCC(episode):

        filename = getPathToGroundtruth(episode)  # user-defined helper, not shown

        y, sr = librosa.load(filename)  # y: audio time series, sr: sampling rate

        data = librosa.feature.mfcc(y=y, sr=sr)  # shape (n_mfcc, n_frames)

        return data

    def hmm_init(n, data):  # n = number of states, data = MFCC matrix (n_features x n_frames)

        states = []

        model = hmm.GaussianHMM(n_components=n, covariance_type="full")

        # note: with the default init_params, fit() re-initializes these two
        model.transmat_ = np.ones((n, n)) / n

        model.startprob_ = np.ones(n) / n

        fit = model.fit(data.T)  # hmmlearn expects (n_samples, n_features)

        z = fit.decode(data.T, algorithm='viterbi')[1]  # decode returns (logprob, states)

        states.append(z)

        return states

    data = getMFCC(1)  # Provides MFCC features as a numpy array [20 x 56829]

    N = 500

    D = len(data)

    states = hmm_init(N, data)
    
    In [23]: run Final_hmm.py
    ---------------------------------------------------------------------------
    MemoryError                               Traceback (most recent call last)
    /home/elancheliyan/Final_hmm.py in <module>()
         73 D= len(data)
         74 
    ---> 75 states = hmm_init(N,data)
         76 states.dump("states")
         77 
    
    /home/elancheliyan/Final_hmm.py in hmm_init(n, data)
         57     model.startprob_ = np.ones(N) / N
         58 
    ---> 59     fit = model.fit(data.T)
         60 
         61     z=fit.decode(data.T,algorithm='viterbi')[1]
    
    /cal/homes/elancheliyan/.local/lib/python3.5/site-packages/hmmlearn-0.2.1-py3.5-linux-x86_64.egg/hmmlearn/base.py in fit(self, X, lengths)
        434                 self._accumulate_sufficient_statistics(
        435                     stats, X[i:j], framelogprob, posteriors, fwdlattice,
    --> 436                     bwdlattice)
        437 
        438             # XXX must be before convergence check, because otherwise
    
    /cal/homes/elancheliyan/.local/lib/python3.5/site-packages/hmmlearn-0.2.1-py3.5-linux-x86_64.egg/hmmlearn/hmm.py in _accumulate_sufficient_statistics(self, stats, obs, framelogprob, posteriors, fwdlattice, bwdlattice)
        221                                           posteriors, fwdlattice, bwdlattice):
        222         super(GaussianHMM, self)._accumulate_sufficient_statistics(
    --> 223             stats, obs, framelogprob, posteriors, fwdlattice, bwdlattice)
        224 
        225         if 'm' in self.params or 'c' in self.params:
    
    /cal/homes/elancheliyan/.local/lib/python3.5/site-packages/hmmlearn-0.2.1-py3.5-linux-x86_64.egg/hmmlearn/base.py in _accumulate_sufficient_statistics(self, stats, X, framelogprob, posteriors, fwdlattice, bwdlattice)
        620                 return
        621 
    --> 622             lneta = np.zeros((n_samples - 1, n_components, n_components))
        623             _hmmc._compute_lneta(n_samples, n_components, fwdlattice,
        624                                  log_mask_zero(self.transmat_),
    
    MemoryError:
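
    The traceback ends at the allocation of lneta, an array of shape (n_samples - 1, n_components, n_components). A rough size estimate, assuming float64 entries (8 bytes each, the np.zeros default) and the 56829 MFCC frames mentioned above, suggests why N = 500 fails where N = 100 works; lneta_bytes is a hypothetical helper, not part of hmmlearn:

```python
def lneta_bytes(n_samples: int, n_components: int, itemsize: int = 8) -> int:
    """Bytes for an array of shape (n_samples - 1, n_components, n_components).

    itemsize=8 assumes float64 entries, as created by np.zeros by default.
    """
    return (n_samples - 1) * n_components * n_components * itemsize


n_frames = 56829  # MFCC frames reported in the issue (20 x 56829 matrix)
for n_states in (100, 500):
    gb = lneta_bytes(n_frames, n_states) / 1e9
    print(f"N = {n_states}: about {gb:.1f} GB for lneta alone")
# N = 100: about 4.5 GB for lneta alone
# N = 500: about 113.7 GB for lneta alone
```

    Since fit accumulates statistics one sequence at a time (the X[i:j] slice in the traceback), splitting the recording into shorter pieces via the lengths argument would presumably shrink this allocation to the size of the longest piece, at the cost of treating the pieces as independent sequences.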
    
    
    opened by epratheeban 25
  • GMM -> GaussianMixture

    In sklearn GMM was replaced by GaussianMixture. See https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/mixture/gmm.py:

    class GMM(_GMMBase):
        """
        Legacy Gaussian Mixture Model

        .. deprecated:: 0.18
            This class will be removed in 0.20.
            Use :class:`sklearn.mixture.GaussianMixture` instead.
        """

    However, hmmlearn still uses the deprecated GMM class. A pull request is needed to upgrade hmmlearn to work with the newer API.

    opened by chanansh 24
  • Variational inference

    @anntzer I'm leaving an early draft of Variational Inference training for HMMs here so I can get feedback before I keep going.

    Some Notes:

    • I derive from BaseHMM, and am able to reuse most of it, with a few exceptions.
    • VariationalGaussianHMM is still incomplete - only Full Covariance is supported.
    • Tests are lacking.

    Up Next:

    • Finish the different covariance types for Gaussian
    • Add Mixture of Gaussian Emissions

    opened by blckmaxima 23
  • reduce memory consumption during GMMHMM multi-sequence fits

    Hi, today I learned about your package, started to use it, faced the memory problem, and came up with a PR that fixes it.

    I've exploited the lengths option and given it an additional meaning. This is currently implemented for GMMHMM only; curious users will find a way to extend my implementation to other models as well.

    This also partially addresses the comment left in https://github.com/hmmlearn/hmmlearn/commit/08dee6640483cda232f7d2fcc7935d4008f4d368:

    https://github.com/hmmlearn/hmmlearn/blob/0562ca65756ffb60da836eeeb1845e61767c705b/lib/hmmlearn/hmm.py#L918-L922

    I got rid of the unnecessary 'centered' arrays in the stats dict. If you don't want to store the post_comp_mix matrices in the stats, the logic for computing the intermediate variables (c_n and c_d for the covariance) should be moved from _do_mstep to _accumulate_sufficient_statistics. Since this is my first PR, I've decided not to rummage through your code too much. In either case, this should be handled in a separate PR, if you like.

    Best, Danylo
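
    For readers unfamiliar with the lengths option: fit(X, lengths) takes all training sequences stacked row-wise in a single array X, with lengths giving the number of rows in each sequence, and the per-sequence statistics are accumulated over slices X[i:j]. A minimal sketch of that bookkeeping, with iter_sequences as a hypothetical helper rather than hmmlearn's own code:

```python
import numpy as np


def iter_sequences(X, lengths):
    """Yield (start, end) row bounds of each sequence stacked in X."""
    lengths = np.asarray(lengths)
    ends = np.cumsum(lengths)
    if ends[-1] != len(X):
        raise ValueError("lengths must sum to the number of rows in X")
    starts = ends - lengths
    for i, j in zip(starts, ends):
        yield int(i), int(j)


seqs = [np.zeros((3, 2)), np.ones((5, 2))]  # two toy sequences
X = np.concatenate(seqs)                    # shape (8, 2)
lengths = [len(s) for s in seqs]            # [3, 5]
print(list(iter_sequences(X, lengths)))     # [(0, 3), (3, 8)]
```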

    opened by dizcza 22
  • simple multinomial example

    Hi there!

    Using the latest master of hmmlearn, I tried running a simple MultinomialHMM example (code below) that results in the following error:

    File "build/bdist.macosx-10.5-x86_64/egg/hmmlearn/base.py", line 307, in decode
    ValueError: could not broadcast input array from shape (6) into shape (1)

    Could you please tell me what I am doing wrong? I expect Viterbi decoding to give me the most probable hidden state sequence. However, passing a list of observations doesn't work, whereas passing a single value does.

    Thanks!

    Vlad

    from __future__ import division
    import numpy as np
    from hmmlearn import hmm
    
    states = ["Rainy", "Sunny"]
    n_states = len(states)
    
    observations = ["walk", "shop", "clean"]
    n_observations = len(observations)
    
    start_probability = np.array([0.6, 0.4])
    
    transition_probability = np.array([
      [0.7, 0.3],
      [0.4, 0.6]
    ])
    
    emission_probability = np.array([
      [0.1, 0.4, 0.5],
      [0.6, 0.3, 0.1]
    ])
    
    model = hmm.MultinomialHMM(n_components=n_states)
    model.startprob=start_probability
    model.transmat=transition_probability
    model.emissionprob=emission_probability
    
    # predict a sequence of hidden states based on visible states
    bob_says = [0, 2, 1, 1, 2, 0]
    model = model.fit(bob_says)
    logprob, alice_hears = model.decode(bob_says, algorithm="viterbi")
    print("Bob says:", ", ".join(map(lambda x: observations[x], bob_says)))
    print("Alice hears:", ", ".join(map(lambda x: states[x], alice_hears)))
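
    For reference, the most probable hidden sequence for this toy model can be computed directly. The block below is a minimal log-space Viterbi sketch in NumPy, independent of hmmlearn, using the start, transition, and emission probabilities above (viterbi is a hypothetical helper). When using hmmlearn itself, note that the model parameters live in trailing-underscore attributes (startprob_, transmat_, emissionprob_), that discrete observations are passed as a column of shape (n_samples, 1), and that calling fit would re-estimate any hand-set parameters.

```python
import numpy as np


def viterbi(obs, startprob, transmat, emissionprob):
    """Return (log probability, state path) of the most likely hidden path."""
    log_start = np.log(startprob)
    log_trans = np.log(transmat)
    log_emit = np.log(emissionprob)
    n_steps, n_states = len(obs), len(startprob)
    # delta[t, j]: best log prob of any path that ends in state j at time t
    delta = np.empty((n_steps, n_states))
    backptr = np.zeros((n_steps, n_states), dtype=int)
    delta[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, n_steps):
        # scores[i, j]: best path into state i at t-1, then transition i -> j
        scores = delta[t - 1][:, None] + log_trans
        backptr[t] = np.argmax(scores, axis=0)
        delta[t] = scores.max(axis=0) + log_emit[:, obs[t]]
    # follow the back-pointers from the best final state
    path = [int(np.argmax(delta[-1]))]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    path.reverse()
    return float(delta[-1].max()), path


states = ["Rainy", "Sunny"]
observations = ["walk", "shop", "clean"]
startprob = np.array([0.6, 0.4])
transmat = np.array([[0.7, 0.3], [0.4, 0.6]])
emissionprob = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
bob_says = [0, 2, 1, 1, 2, 0]

logprob, path = viterbi(bob_says, startprob, transmat, emissionprob)
print("Alice hears:", ", ".join(states[i] for i in path))
# Alice hears: Sunny, Rainy, Rainy, Rainy, Rainy, Sunny
```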
    
    opened by ambushed 22
  • ImportError: cannot import name hmm

    Hi,

    I used the hmm module from sklearn and tried to replace it with the hmmlearn module. Unfortunately, I could not import it into my notebook.

    from hmmlearn import hmm
    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-7-8b8c029fb053> in <module>()
    ----> 1 from hmmlearn import hmm

    ImportError: cannot import name hmm

    I first tried pip-3.3 install git+https://github.com/hmmlearn/hmmlearn.git

    As this didn't work, I cloned the project and ran setup.py (with Python 3.3), but I still get an import error.

    If I try to import

    import hmmlearn.hmm

    I get another error

    ImportError                               Traceback (most recent call last)
    <ipython-input-8-8dbb2cfe75b2> in <module>()
    ----> 1 import hmmlearn.hmm

    /home/ipython/python/lib/python3.3/site-packages/hmmlearn/hmm.py in <module>()
         22 from sklearn import cluster
         23
    ---> 24 from .utils.fixes import log_multivariate_normal_density
         25
         26 from . import _hmmc

    ImportError: No module named 'hmmlearn.utils'

    What did I do wrong?

    Cheers, Evelyn

    opened by metterlein 22
  • gcc error when installing with pip install

    I get hmmlearn/_hmmc.c:239:28: fatal error: numpy/npy_math.h: No such file or directory, yet the installation appears to finish successfully.

    requirements.txt file:

    click==6.7
    cython==0.25.2
    joblib==0.11
    numpy==1.12.1
    pandas==0.19.2
    python-speech-features==0.5
    scikit-learn==0.18.1
    scipy==0.19.0
    hmmlearn==0.2.0
    
    Running setup.py bdist_wheel for hmmlearn: started
      Running setup.py bdist_wheel for hmmlearn: finished with status 'error'
      Complete output from command /opt/conda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-8l6nu2n1/hmmlearn/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpi_45qjtvpip-wheel- --python-tag cp36:
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.6
      creating build/lib.linux-x86_64-3.6/hmmlearn
      copying hmmlearn/hmm.py -> build/lib.linux-x86_64-3.6/hmmlearn
      copying hmmlearn/utils.py -> build/lib.linux-x86_64-3.6/hmmlearn
      copying hmmlearn/base.py -> build/lib.linux-x86_64-3.6/hmmlearn
      copying hmmlearn/__init__.py -> build/lib.linux-x86_64-3.6/hmmlearn
      creating build/lib.linux-x86_64-3.6/hmmlearn/tests
      copying hmmlearn/tests/test_utils.py -> build/lib.linux-x86_64-3.6/hmmlearn/tests
      copying hmmlearn/tests/test_gaussian_hmm.py -> build/lib.linux-x86_64-3.6/hmmlearn/tests
      copying hmmlearn/tests/test_gmm_hmm.py -> build/lib.linux-x86_64-3.6/hmmlearn/tests
      copying hmmlearn/tests/test_multinomial_hmm.py -> build/lib.linux-x86_64-3.6/hmmlearn/tests
      copying hmmlearn/tests/test_base.py -> build/lib.linux-x86_64-3.6/hmmlearn/tests
      copying hmmlearn/tests/__init__.py -> build/lib.linux-x86_64-3.6/hmmlearn/tests
      running build_ext
      building 'hmmlearn._hmmc' extension
      creating build/temp.linux-x86_64-3.6
      creating build/temp.linux-x86_64-3.6/hmmlearn
      gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/conda/include/python3.6m -c hmmlearn/_hmmc.c -o build/temp.linux-x86_64-3.6/hmmlearn/_hmmc.o -O3
      hmmlearn/_hmmc.c:239:28: fatal error: numpy/npy_math.h: No such file or directory
       #include "numpy/npy_math.h"
                                  ^
      compilation terminated.
      error: command 'gcc' failed with exit status 1
      
      ----------------------------------------
      Failed building wheel for hmmlearn
      Running setup.py clean for hmmlearn
    Successfully built python-speech-features
    Failed to build hmmlearn
    Installing collected packages: click, cython, joblib, numpy, pytz, python-dateutil, pandas, python-speech-features, scikit-learn, scipy, hmmlearn
      Running setup.py install for hmmlearn: started
        Running setup.py install for hmmlearn: finished with status 'done'
    Successfully installed click-6.7 cython-0.25.2 hmmlearn-0.2.0 joblib-0.11 numpy-1.12.1 pandas-0.19.2 python-dateutil-2.6.0 python-speech-features-0.5 pytz-2017.2 scikit-learn-0.18.1 scipy-0.19.0
    
    needs-info 
    opened by chananshgong 21
  • probability would approach 0 after several EM iterations

    When I use GaussianHMM().fit() to train an HMM, I get a RuntimeWarning: divide by zero encountered in log. I then found that the start probabilities approach 0 after several EM iterations. How can I keep the probabilities from approaching 0?
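
    A sketch of why this happens, under the usual EM update: with a single training sequence, the re-estimated start probabilities are just the posterior state probabilities at t = 0, which are often nearly one-hot, so some entries collapse to 0 and taking their log triggers the warning. Adding Dirichlet pseudocounts, which is what the startprob_prior parameter of hmmlearn's models is for, keeps every entry positive. The NumPy function below illustrates that MAP update; map_startprob is a hypothetical name and this is not hmmlearn's code:

```python
import numpy as np


def map_startprob(start_posteriors, prior=1.0):
    """MAP estimate of start probabilities under a symmetric Dirichlet prior.

    start_posteriors: (n_sequences, n_states) posterior state probabilities
        at t = 0, one row per training sequence.
    prior: Dirichlet concentration. prior = 1.0 reduces to the plain
        maximum-likelihood EM update; prior > 1 adds prior - 1 pseudocounts
        to every state.
    """
    counts = np.sum(start_posteriors, axis=0) + (prior - 1.0)
    counts = np.maximum(counts, 0.0)  # guard against prior < 1
    return counts / counts.sum()


# One sequence whose first frame is almost surely in state 0.
gamma0 = np.array([[1.0, 0.0, 0.0]])
print(map_startprob(gamma0))             # ML update keeps the exact zeros
print(map_startprob(gamma0, prior=2.0))  # pseudocounts keep every state > 0
```

    With hmmlearn itself, the equivalent would presumably be a prior greater than 1, e.g. GaussianHMM(..., startprob_prior=2.0), or fitting several sequences via lengths so that the start probabilities average over more than one first frame.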

    opened by LinZzzzzzzzz 21
  • ImportError: DLL load failed: The specified module could not be found.

    My OS is Windows 7 x64, with Visual Studio 2015, Visual Studio 2013, and Python 3.5 x64 (via Anaconda) installed. hmmlearn installed successfully, which I verified with >>> import hmmlearn >>> hmmlearn.__version__, whose output is '0.2.0', the latest version of hmmlearn. However, when I run >>> from hmmlearn import hmm, I get the following error:

    C:\Anaconda3_64\python.exe E:/pycharm/plot_hmm_stock_analysis/hmm_stock_analysis.py
    Traceback (most recent call last):
      File "E:/pycharm/plot_hmm_stock_analysis/hmm_stock_analysis.py", line 17, in <module>
        from hmmlearn import hmm
      File "C:\Anaconda3_64\lib\site-packages\hmmlearn-0.2.0-py3.5-win-amd64.egg\hmmlearn\hmm.py", line 14, in <module>
        from sklearn import cluster
      File "C:\Anaconda3_64\lib\site-packages\sklearn\__init__.py", line 57, in <module>
        from .base import clone
      File "C:\Anaconda3_64\lib\site-packages\sklearn\base.py", line 11, in <module>
        from .utils.fixes import signature
      File "C:\Anaconda3_64\lib\site-packages\sklearn\utils\__init__.py", line 11, in <module>
        from .validation import (as_float_array,
      File "C:\Anaconda3_64\lib\site-packages\sklearn\utils\validation.py", line 16, in <module>
        from ..utils.fixes import signature
      File "C:\Anaconda3_64\lib\site-packages\sklearn\utils\fixes.py", line 324, in <module>
        from scipy.sparse.linalg import lsqr as sparse_lsqr
      File "C:\Anaconda3_64\lib\site-packages\scipy\sparse\linalg\__init__.py", line 109, in <module>
        from .isolve import *
      File "C:\Anaconda3_64\lib\site-packages\scipy\sparse\linalg\isolve\__init__.py", line 6, in <module>
        from .iterative import *
      File "C:\Anaconda3_64\lib\site-packages\scipy\sparse\linalg\isolve\iterative.py", line 7, in <module>
        from . import _iterative
    ImportError: DLL load failed: The specified module could not be found.

    Why does this happen, and how can I fix it?

    By the way, in cmd, "pip freeze" lists hmmlearn with version 0.2.0, but "conda list" does not show hmmlearn at all.

    opened by genliu777 18
  • GMMHMM models training not converging (?)

    Hi all, I am having a problem fitting multiple GMMHMM models for a classification task: recognizing emotions from speech samples. The models often don't converge: even when the monitor reports True, the history shows that the log-likelihood is not strictly increasing. In fact, it decreases at some point and the training procedure stops.

    Here, I report only the procedure for training one of the models (I should have seven, each one trained with a different training set). The data loaded are attached: data_training.npy.zip

    import numpy as np
    from hmmlearn import hmm

    # one training sequence per element, each of shape (n_frames_i, n_features)
    data = np.load('data_training.npy', allow_pickle=True)

    # named `model` so it does not shadow the hmmlearn.hmm module
    model = hmm.GMMHMM(n_components=2, n_mix=2, n_iter=1000, covariance_type="diag", verbose=True)

    # concatenate all sequences and record each one's length for fit()
    X_sequence_concat = np.concatenate(data)
    lengths = [len(el) for el in data]

    model.fit(X_sequence_concat, lengths)
    print("Has the HMM training converged? " + str(model.monitor_.converged))
    

    In my actual implementation I have to do this for seven different models and sometimes I get this problem and sometimes I don't, as you can see from the results reported below:

    [Screenshot of the training results, 2021-04-23 at 13.14.43]

    Can you please help me? I'm really struggling with this and I can't find a possible cause of the problem.

    Thanks in advance!
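
    Plain maximum-likelihood EM (without priors) never decreases the training log-likelihood, apart from floating-point noise, so a clear drop points to a numerical or implementation problem rather than normal behavior. A small, library-independent sketch for locating such drops in a list of per-iteration log-likelihoods (for example, the values printed with verbose=True); find_decreases and its tolerance are hypothetical choices:

```python
def find_decreases(history, tol=1e-8):
    """Return (iteration, drop) pairs where the log-likelihood went down.

    history: per-iteration log-likelihoods, oldest first.
    tol: drops no larger than this are treated as floating-point noise.
    """
    drops = []
    for i in range(1, len(history)):
        drop = history[i - 1] - history[i]
        if drop > tol:
            drops.append((i, drop))
    return drops


# Hypothetical trace that goes down once, between iterations 2 and 3.
trace = [-5200.0, -5100.5, -5080.2, -5085.0, -5084.9]
print(find_decreases(trace))  # one entry, at iteration 3
```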

    bug 
    opened by giorgiolbt 16
  • [ENH, MRG] Add PoissonHMM

    Adds a PoissonHMM with an example.

    I think this is somewhat close so if you have time, a review would be great @anntzer @blckmaxima.

    I'm not sure whether there is a standard reference we could compare against, like the Wikipedia example for MultinomialHMM, or whether that's necessary.
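
    For context, what distinguishes such a model from the existing ones is mainly the emission term: each state has a vector of Poisson rates, and the per-frame log-likelihood log p(x | λ) = x log λ - λ - log x! feeds the usual forward-backward recursions. A NumPy sketch of that emission term, assuming features are independent given the state (poisson_frame_logprob is hypothetical and not necessarily how the PR implements it):

```python
from math import lgamma

import numpy as np


def poisson_frame_logprob(X, lambdas):
    """Per-state Poisson log-likelihood of each row of count data.

    X: (n_samples, n_features) array of non-negative integer counts.
    lambdas: (n_states, n_features) array of positive rates.
    Returns an (n_samples, n_states) array, summing over features
    (features are assumed independent given the state).
    """
    X = np.asarray(X, dtype=float)
    lambdas = np.asarray(lambdas, dtype=float)
    log_fact = np.vectorize(lambda k: lgamma(k + 1.0))(X)  # log(x!)
    # x * log(lam) - lam - log(x!), summed over features
    return (X @ np.log(lambdas).T
            - lambdas.sum(axis=1)
            - log_fact.sum(axis=1, keepdims=True))


counts = np.array([[0], [2]])     # two frames, one count feature
rates = np.array([[1.0], [3.0]])  # two states with rates 1 and 3
print(poisson_frame_logprob(counts, rates))
```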

    opened by alexrockhill 15
  • Allow to modify kmeans default params at model creation

    I am requesting a new feature.

    Everywhere in the code, the KMeans cluster initialization uses the KMeans default parameters (except n_clusters):

        n_init=10,
        max_iter=300,
        tol=1e-4,
        verbose=0,
        random_state=None,
        copy_x=True,
        algorithm="lloyd"
    

    ...

    main_kmeans = cluster.KMeans(n_clusters=nc, random_state=self.random_state)

    or

    kmeans = cluster.KMeans(n_clusters=self.n_components, random_state=self.random_state)

    I got large improvements in my particular case (many very noisy datasets) by modifying the KMeans cluster initialization:

    kmeans = cluster.KMeans(n_clusters=self.n_components,n_init=100, random_state=self.random_state, tol=1e-6)

    So it would be great to be able to pass KMeans parameters when instantiating the model,

    for instance: hmm = hmm.GaussianHMM(n_components, ..., kmeans_params={'n_init': xxx, 'max_iter': yyy, 'tol': zzz})

    The n_init parameter for k-means++ is quite important in some cases.

    thx
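
    To illustrate why n_init matters: k-means only reaches a local minimum of its objective (the within-cluster sum of squared distances, or inertia), so running it from several random starts and keeping the lowest-inertia result lowers the risk of a poor initialization. A minimal NumPy sketch with plain random initialization (neither scikit-learn's implementation nor k-means++; kmeans_once and kmeans_best_of are hypothetical names, and max_iter/tol only loosely mirror the defaults listed above):

```python
import numpy as np


def _assign(X, centers):
    """Return nearest-center labels and squared distances for each point."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return labels, d2[np.arange(len(X)), labels]


def kmeans_once(X, k, rng, max_iter=300, tol=1e-4):
    """One run of Lloyd's algorithm from k randomly chosen data points."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        labels, _ = _assign(X, centers)
        # move each center to the mean of its points (keep it if empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    _, d2 = _assign(X, centers)
    return centers, d2.sum()  # inertia of the final centers


def kmeans_best_of(X, k, n_init=10, seed=0):
    """Run kmeans_once n_init times and keep the lowest-inertia result."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[1])


data_rng = np.random.default_rng(1)
X = np.concatenate([data_rng.normal(c, 0.3, size=(50, 2)) for c in (0.0, 3.0, 6.0)])
_, single = kmeans_once(X, 3, np.random.default_rng(0))
_, best = kmeans_best_of(X, 3, n_init=20, seed=0)
print(single, best)  # best is never larger than the first run's inertia
```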

    opened by tlunati 1
  • Add Method to get n_params and AIC/BIC for GaussianHMM

    Reference Issues/PRs

    None

    What does this implement/fix? Explain your changes.

    Any other comments?

    I haven't yet implemented the methods for the other classes (e.g. GMMHMM), but that's in the works.
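
    For reference, the parameter count such methods need can be derived by hand for a GaussianHMM with n states and d features: n - 1 free start probabilities, n(n - 1) free transition probabilities, n*d means, plus the covariance parameters (n for spherical, n*d for diag, n*d(d + 1)/2 for full, d(d + 1)/2 for tied). With log-likelihood log L (e.g. from the model's score method) and T samples, AIC = 2k - 2 log L and BIC = k ln T - 2 log L. A sketch under the assumption that all parameters are estimated (the function names are hypothetical):

```python
import math


def gaussian_hmm_n_params(n_components, n_features, covariance_type="diag"):
    """Free parameters of a Gaussian HMM when start, transition, mean and
    covariance parameters are all estimated."""
    n, d = n_components, n_features
    n_cov = {
        "spherical": n,                # one variance per state
        "diag": n * d,                 # one variance per state and feature
        "full": n * d * (d + 1) // 2,  # symmetric d x d matrix per state
        "tied": d * (d + 1) // 2,      # one symmetric matrix shared by all states
    }[covariance_type]
    # start probs sum to 1 (n - 1 free); so does each of the n transition rows
    return (n - 1) + n * (n - 1) + n * d + n_cov


def aic(log_likelihood, n_params):
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    return n_params * math.log(n_samples) - 2 * log_likelihood


k = gaussian_hmm_n_params(3, 2, "full")  # 2 + 6 + 6 + 9 = 23
print(k, aic(-1000.0, k), bic(-1000.0, k, n_samples=500))
```

    If some parameters are held fixed during training, they should be dropped from k.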

    opened by richy1996 3
  • TypeError

    # make our generative model with two components, a fair die and a
    # loaded die
    gen_model = hmm.MultinomialHMM(n_components=2, random_state=99)
    
    # the first state is the fair die so let's start there so no one
    # catches on right away
    gen_model.startprob_ = np.array([1.0, 0.0])
    
    # now let's say that we sneak the loaded die in:
    # here, we have a 95% chance to continue using the fair die and a 5%
    # chance to switch to the loaded die
    # when we enter the loaded die state, we have a 90% chance of staying
    # in that state and a 10% chance of leaving
    gen_model.transmat_ = np.array([[0.95, 0.05],
                                    [0.1, 0.9]])
    
    # now let's set the emission probabilities:
    # the first state is a fair die with equal probabilities and the
    # second is loaded by being biased toward rolling a six
    gen_model.emissionprob_ = \
        np.array([[1 / 6, 1 / 6, 1 / 6, 1 / 6, 1 / 6, 1 / 6],
                  [1 / 10, 1 / 10, 1 / 10, 1 / 10, 1 / 10, 1 / 2]])
    
    # simulate the loaded dice rolls
    rolls, gen_states = gen_model.sample(30000)
    
    # plot states over time, let's just look at the first rolls for clarity
    fig, ax = plt.subplots()
    ax.plot(gen_states[:500])
    ax.set_title('States over time')
    ax.set_xlabel('Time (# of rolls)')
    ax.set_ylabel('State')
    fig.show()
    
    # plot rolls for the fair and loaded states
    fig, ax = plt.subplots()
    ax.hist(rolls[gen_states == 0], label='fair', alpha=0.5,
            bins=np.arange(7) - 0.5, density=True)
    ax.hist(rolls[gen_states == 1], label='loaded', alpha=0.5,
            bins=np.arange(7) - 0.5, density=True)
    ax.set_title('Roll probabilities by state')
    ax.set_xlabel('Roll')
    ax.set_ylabel('Probability')
    ax.legend()
    fig.show()
    

    MultinomialHMM has undergone major changes. The previous version was implementing CategoricalHMM (a special case of MultinomialHMM). This new implementation follows the standard definition for a Multinomial distribution, e.g. as in https://en.wikipedia.org/wiki/Multinomial_distribution. See these issues for details: https://github.com/hmmlearn/hmmlearn/issues/335 https://github.com/hmmlearn/hmmlearn/issues/340

    TypeError                                 Traceback (most recent call last)
    <ipython-input-9-4c6e1e68a7c1> in <module>()
         22 
         23 # simulate the loaded dice rolls
    ---> 24 rolls, gen_states = gen_model.sample(30000)
         25 
         26 # plot states over time, let's just look at the first rolls for clarity
    
    /root/.local/lib/python3.7/site-packages/hmmlearn/base.py in sample(self, n_samples, random_state, currstate)
        461         state_sequence = [currstate]
        462         X = [self._generate_sample_from_state(
    --> 463             currstate, random_state=random_state)]
        464 
        465         for t in range(n_samples - 1):
    
    /root/.local/lib/python3.7/site-packages/hmmlearn/hmm.py in _generate_sample_from_state(self, state, random_state)
        481         sample = multinomial.rvs(
        482             n=self.n_trials, p=self.emissionprob_[state, :],
    --> 483             size=1, random_state=self.random_state)
        484         return sample.squeeze(0)  # shape (1, nf) -> (nf,)
        485 
    
    /usr/local/lib/python3.7/dist-packages/scipy/stats/_multivariate.py in rvs(self, n, p, size, random_state)
       3216         %(_doc_callparams_note)s
       3217         """
    -> 3218         n, p, npcond = self._process_parameters(n, p)
       3219         random_state = self._get_random_state(random_state)
       3220         return random_state.multinomial(n, p, size)
    
    /usr/local/lib/python3.7/dist-packages/scipy/stats/_multivariate.py in _process_parameters(self, n, p)
       3016         pcond |= np.any(p > 1, axis=-1)
       3017 
    -> 3018         n = np.array(n, dtype=np.int, copy=True)
       3019 
       3020         # true for bad n
    
    TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
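
    The warning above points to the cause: the die example emits one symbol per step, which newer hmmlearn releases model with CategoricalHMM, whereas the new MultinomialHMM draws count vectors with n=self.n_trials, left as None here, hence the TypeError inside scipy. Switching to hmm.CategoricalHMM is presumably the intended fix. To show what the categorical generative process does, here is a NumPy sketch independent of hmmlearn (sample_categorical_hmm is a hypothetical helper; symbols 0 to 5 stand for faces 1 to 6):

```python
import numpy as np


def sample_categorical_hmm(n_samples, startprob, transmat, emissionprob, seed=None):
    """Draw (symbols, states) from an HMM with categorical emissions."""
    rng = np.random.default_rng(seed)
    n_states, n_symbols = emissionprob.shape
    states = np.empty(n_samples, dtype=int)
    symbols = np.empty(n_samples, dtype=int)
    state = rng.choice(n_states, p=startprob)
    for t in range(n_samples):
        states[t] = state
        # emit one symbol from the current state, then move to the next state
        symbols[t] = rng.choice(n_symbols, p=emissionprob[state])
        state = rng.choice(n_states, p=transmat[state])
    return symbols, states


startprob = np.array([1.0, 0.0])                   # start with the fair die
transmat = np.array([[0.95, 0.05], [0.1, 0.9]])
emissionprob = np.array([[1 / 6] * 6,              # fair die
                         [1 / 10] * 5 + [1 / 2]])  # loaded toward a six
rolls, gen_states = sample_categorical_hmm(30000, startprob, transmat,
                                           emissionprob, seed=99)
# share of sixes while the loaded die is in use; should be close to 0.5
print(np.mean(rolls[gen_states == 1] == 5))
```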
    
    opened by skr3178 4
  • Floating-point errors to clear during calls to log()

    opened by blckmaxima 0
Releases(0.2.5)