A modular active learning framework for Python

Overview

modAL

Modular Active Learning framework for Python3

Introduction

modAL is an active learning framework for Python3, designed with modularity, flexibility and extensibility in mind. Built on top of scikit-learn, it allows you to rapidly create active learning workflows with nearly complete freedom. What is more, you can easily replace parts with your custom built solutions, allowing you to design novel algorithms with ease.

Active learning from bird's-eye view

With the recent explosion of available data, you can have millions of unlabelled examples with a high cost to obtain labels. For instance, when trying to predict the sentiment of tweets, obtaining a training set can require immense manual labour. But worry not, active learning comes to the rescue! In general, active learning (AL) is a framework that lets you increase classification performance by intelligently asking you to label the most informative instances. To give an example, suppose that you have the following data and classifier, with shaded regions signifying the classification probability.

Suppose that you can query the label of an unlabelled instance, but it costs you a lot. Which one would you choose? By querying an instance in the uncertain region, you will most likely obtain more information than by querying at random. Active learning gives you a set of tools to handle problems like this. In general, an active learning workflow looks like the following.

The key components of any workflow are the model you choose, the uncertainty measure you use and the query strategy you apply to request labels. With modAL, instead of choosing from a small set of built-in components, you have the freedom to seamlessly integrate scikit-learn or Keras models into your algorithm and easily tailor your custom query strategies and uncertainty measures.
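To make the notion of an uncertainty measure concrete, here is a minimal NumPy sketch of three common ones computed from class probabilities. The function names are hypothetical helpers written for this illustration, not modAL's API.

```python
import numpy as np

def classification_uncertainty(proba):
    # 1 minus the probability of the most likely class
    return 1.0 - proba.max(axis=1)

def classification_margin(proba):
    # gap between the two largest class probabilities (small gap = uncertain)
    ordered = np.sort(proba, axis=1)
    return ordered[:, -1] - ordered[:, -2]

def classification_entropy(proba):
    # Shannon entropy, treating 0 * log(0) as 0
    safe = np.where(proba > 0, proba, 1.0)
    return -(proba * np.log(safe)).sum(axis=1)

proba = np.array([
    [0.98, 0.01, 0.01],   # confident prediction
    [0.40, 0.35, 0.25],   # uncertain prediction
])
most_uncertain = int(np.argmax(classification_entropy(proba)))
```

A query strategy then simply picks the pool instances that score highest (or, for the margin, lowest) under the chosen measure.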

modAL in action

Let's see what modAL can do for you!

From zero to one in a few lines of code

Active learning with a scikit-learn classifier, for instance RandomForestClassifier, can be as simple as the following.

from modAL.models import ActiveLearner
from sklearn.ensemble import RandomForestClassifier

# initializing the learner
learner = ActiveLearner(
    estimator=RandomForestClassifier(),
    X_training=X_training, y_training=y_training
)

# query for labels
query_idx, query_inst = learner.query(X_pool)

# ...obtaining new labels from the Oracle...

# supply label for queried instance
learner.teach(X_pool[query_idx], y_new)

Replacing parts quickly

If you would like to use uncertainty measures and query strategies other than the default uncertainty sampling, you can either swap in one of several built-in strategies or design your own by following a few simple design principles. For instance, replacing the default uncertainty measure with classification entropy looks like the following.

from modAL.models import ActiveLearner
from modAL.uncertainty import entropy_sampling
from sklearn.ensemble import RandomForestClassifier

learner = ActiveLearner(
    estimator=RandomForestClassifier(),
    query_strategy=entropy_sampling,
    X_training=X_training, y_training=y_training
)

Replacing parts with your own solutions

modAL was designed to make it easy for you to implement your own query strategy. For example, implementing and using a simple random sampling strategy is as easy as the following.

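import numpy as np
from modAL.models import ActiveLearner
from sklearn.ensemble import RandomForestClassifier

def random_sampling(classifier, X_pool):
    # pick one pool index uniformly at random; learner.query() looks up the instance itself
    n_samples = len(X_pool)
    query_idx = np.random.choice(range(n_samples))
    return query_idx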

learner = ActiveLearner(
    estimator=RandomForestClassifier(),
    query_strategy=random_sampling,
    X_training=X_training, y_training=y_training
)

For more details on how to implement your custom strategies, visit the page Extending modAL!
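As a further illustration, a custom strategy can also rank the pool by an uncertainty score and return several indices at once. The sketch below is a hand-written margin-based strategy that returns indices only, as the 0.4.0 release notes describe. The name margin_sampling_sketch and the StubClassifier stand-in are assumptions made for this example; any object exposing predict_proba would do in place of the stub.

```python
import numpy as np

def margin_sampling_sketch(classifier, X_pool, n_instances=1):
    # margin = gap between the two most probable classes
    proba = classifier.predict_proba(X_pool)
    ordered = np.sort(proba, axis=1)
    margin = ordered[:, -1] - ordered[:, -2]
    # the smallest margins are the most ambiguous instances
    return np.argsort(margin, kind="stable")[:n_instances]

class StubClassifier:
    """Stand-in for a fitted model (assumption for this demo only)."""
    def predict_proba(self, X):
        p = 1.0 / (1.0 + np.exp(-X[:, 0]))
        return np.column_stack([1.0 - p, p])

X_pool = np.array([[-3.0], [0.1], [2.0], [-0.5]])
query_idx = margin_sampling_sketch(StubClassifier(), X_pool, n_instances=2)
```

Passing such a function as query_strategy works the same way as with random_sampling above.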

An example with active regression

To see modAL in real action, let's consider an active regression problem with Gaussian Processes! In this example, we shall try to learn the noisy sine function:

import numpy as np

X = np.random.choice(np.linspace(0, 20, 10000), size=200, replace=False).reshape(-1, 1)
y = np.sin(X) + np.random.normal(scale=0.3, size=X.shape)

For active learning, we shall define a custom query strategy tailored to Gaussian processes. In a nutshell, a query strategy in modAL is a function taking (at least) two arguments (an estimator object and a pool of examples) and returning the index of the queried instance. In our case, the arguments are regressor and X.

def GP_regression_std(regressor, X):
    _, std = regressor.predict(X, return_std=True)
    return np.argmax(std)

After setting up the query strategy and the data, the active learner can be initialized.

from modAL.models import ActiveLearner
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import WhiteKernel, RBF

n_initial = 5
initial_idx = np.random.choice(range(len(X)), size=n_initial, replace=False)
X_training, y_training = X[initial_idx], y[initial_idx]

kernel = RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e3)) \
         + WhiteKernel(noise_level=1, noise_level_bounds=(1e-10, 1e+1))

regressor = ActiveLearner(
    estimator=GaussianProcessRegressor(kernel=kernel),
    query_strategy=GP_regression_std,
    X_training=X_training.reshape(-1, 1), y_training=y_training.reshape(-1, 1)
)

The initial regressor is not very accurate.

The blue band enveloping the regressor represents the standard deviation of the Gaussian process at the given point. Now we are ready to do active learning!

# active learning
n_queries = 10
for idx in range(n_queries):
    query_idx, query_instance = regressor.query(X)
    regressor.teach(X[query_idx].reshape(1, -1), y[query_idx].reshape(1, -1))

After a few queries, we can see that the prediction is much improved.

Additional examples

Many more examples, including this one, are available in the documentation.

Installation

modAL requires

  • Python >= 3.5
  • NumPy >= 1.13
  • SciPy >= 0.18
  • scikit-learn >= 0.18

You can install modAL directly with pip:

pip install modAL

Alternatively, you can install modAL directly from source:

pip install git+https://github.com/modAL-python/modAL.git

Documentation

You can find the documentation of modAL at https://modAL-python.github.io, where several tutorials and working examples are available, along with a complete API reference. For running the examples, Matplotlib >= 2.0 is recommended.

Citing

If you use modAL in your projects, you can cite it as

@article{modAL2018,
    title={mod{AL}: {A} modular active learning framework for {P}ython},
    author={Tivadar Danka and Peter Horvath},
    url={https://github.com/modAL-python/modAL},
    note={available on arXiv at \url{https://arxiv.org/abs/1805.00979}}
}

About the developer

modAL is developed by me, Tivadar Danka (aka cosmic-cortex on GitHub). I have a PhD in pure mathematics, but I fell in love with biology and machine learning right after I finished my PhD. I have changed fields and now I work in the Bioimage Analysis and Machine Learning Group of Peter Horvath, where I am working to develop active learning strategies for intelligent sample analysis in biology. During my work I realized that in Python, creating and prototyping active learning workflows can be made really easy and fast with scikit-learn, so I ended up developing a general framework for this. The result is modAL :) If you have any questions, requests or suggestions, you can contact me at [email protected]! I hope you'll find modAL useful!

Issues
  • Pandas support & support for applying transformations configured in sklearn.pipeline

    Most notable changes

    • query strategies now return only the indices of the selected instances; the query method then adds the instances themselves
      • the old interface is still supported, but using it results in a deprecation warning
    • added an on_transformed parameter to learners; when True and the estimator uses sklearn.pipeline, the transformations configured in that pipeline are applied before calculating metrics on the data set
      • Committees also support this functionality, but as they have no X_training (it could be different for each of their learners), the training data cannot yet be transformed

    Note

    @cosmic-cortex , after playing around with your code, I must say you have created a great library! I am open to discussion to get this functionality merged, but please don't feel any pressure to do so if you are not satisfied with the implementation. I just needed to resolve #104 for my project and my fork is now sufficient for my needs.

    Note2

    Not sure where this functionality should be addressed in the docs.

    opened by BoyanH 15
  • vote_entropy

    I suspect that vote_entropy and KL_Divergence are not being returned properly, as all values are zero. If I am doing something wrong, could you suggest a code snippet showing how to use KL_Divergence or vote_entropy instead of consensus entropy for querying points when using query by committee?
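    For reference, the vote entropy of a committee can be sketched in a few lines of NumPy. This is a standalone illustration with a hypothetical helper name, not modAL's own implementation. It also shows one way an all-zero result arises: when every committee member votes for the same class, the vote entropy is exactly zero.

```python
import numpy as np

def vote_entropy_sketch(votes, n_classes):
    # votes: (n_instances, n_learners) array of predicted class labels
    n_learners = votes.shape[1]
    entropy = np.zeros(votes.shape[0])
    for c in range(n_classes):
        frac = (votes == c).sum(axis=1) / n_learners
        nonzero = frac > 0
        entropy[nonzero] -= frac[nonzero] * np.log(frac[nonzero])
    return entropy

votes = np.array([
    [0, 0, 0],   # unanimous committee: entropy 0
    [0, 1, 0],   # partial disagreement
    [0, 1, 2],   # full disagreement: entropy log(3)
])
scores = vote_entropy_sketch(votes, n_classes=3)
query_idx = int(np.argmax(scores))
```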

    opened by srivastavapravesh14-zz 10
  • cold start handling in ranked batch sampling

    Hi!

    The behavior of cold start handling in ranked batch sampling seems different from that in Cardoso et al.'s "Ranked batch-mode active learning".

    https://github.com/modAL-python/modAL/blob/452898fc181b6d4ae6399dfdcb311ceb952c8486/modAL/batch.py#L133-L139

    In modAL's implementation, in the case of cold start, the instance selected by select_cold_start_instance is not added to the instance list instance_index_ranking. While in "Ranked batch-mode active learning", the instance selected by select_cold_start_instance seems to be the first item in instance_index_ranking.

    https://github.com/modAL-python/modAL/blob/452898fc181b6d4ae6399dfdcb311ceb952c8486/modAL/batch.py#L46

    If my understanding on the algorithm proposed in the paper and modAL's implementation is correct, we can change the return of select_cold_start_instance to return best_coldstart_instance_index, X[best_coldstart_instance_index].reshape(1, -1), store best_coldstart_instance_index in instance_index_ranking, and revise ranked_batch correspondingly.

    opened by zhangyu94 10
  • Support batch-mode queries?

    Hi,

    I've run into a bit of a use-case that I'm not sure is quite supported by modAL – nor the broader libraries for active learning – but would be relatively simple to implement. After reviewing modAL's internals a bit, I don't think it officially supports active learning with batch-mode queries.

    The sampling strategies (for example, uncertainty sampling) do support the n_instances parameter, but from what I can tell, uncertainty sampling may return redundant/sub-optimal queries if we return more than one instance from the unlabeled set. This is a bit prohibitive in settings where we'd like to ask an active learner to return multiple (if not all) examples from the unlabeled set/pool, and the computational cost for re-training an active learning model goes without saying.

    I found requests for batch-mode support in the popular libact library (issues #57 and #89) but, to the best of my knowledge, I'm not sure they were addressed in any of their PRs.

    In that case, does it make sense to implement something like [Ranked batch-mode active learning] by Cardoso et al.? I took a crack at it this weekend for a better personal understanding, but if it's worth integrating and supporting in modAL I'm happy to polish it and talk it through in a PR.

    Thanks!

    opened by dataframing 10
  • Pytorch runnable example

    This is a runnable example of modAL using PyTorch models wrapped with skorch. It is very similar to the example in modAL/examples/keras_integration.py.

    opened by damienlancry 9
  • use different query strategies

    I am using keras/tensorflow models with this framework and the activelearner class. As soon as I try to change the query strategy, different errors occur.

    import numpy as np
    from modAL.models import ActiveLearner
    from modAL.expected_error import expected_error_reduction

    learner = ActiveLearner(
        estimator=classifier,
        query_strategy=expected_error_reduction,
        X_training=x_initial_training,
        y_training=y_initial_training,
    )
    prescore = learner.score(x_test, y_test)
    n_queries = 50
    postscore = np.zeros(shape=(n_queries, 1))
    for idx in range(n_queries):
        print('Query no. %d' % (idx + 1))
        query_idx, query_instance = learner.query(x_pool)
        learner.teach(
            X=x_pool[query_idx],
            y=y_pool[query_idx],
            only_new=True,
            epochs=10,
            validation_data=(x_val, y_val),
        )
        # remove queried instances from pool
        x_pool = np.delete(x_pool, query_idx, axis=0)
        y_pool = np.delete(y_pool, query_idx, axis=0)
        postscore[idx, 0] = learner.score(x_test, y_test)
    

    What do I have to change to use the different strategies? The training input has a 3D shape. So far I have tried all the uncertainty methods, of which only the default selection worked. Now I am trying the expected error reduction strategy, but errors occur there as well.

    I am afraid the 3D shape of the training data breaks all the other algorithms, but an LSTM requires this kind of shape.

    opened by alexv1247 9
  • docs: refactor documentation

    Autoconversion of docstrings with pyment doesn't work well, because the initial format was not following a strict standard. So there are a lot of manual corrections. I have chosen Google style for docstring, however conversion from it to NumPy style with pyment could be easier.

    The first half of modAL.models looks good, but there may be some improvements (further deduplication) in coming days. Review and comments on committed parts could help to finish the whole refactoring (I hope, by the weekend).

    opened by nikolay-bushkov 9
  • DBAL with Image Data implementation using modAL

    I created an example script trying to reproduce the results of Deep Bayesian Active Learning with Image Data using modAL. I used this Keras code from one of the authors. I cannot think of anything I am doing differently, and yet their code works and mine does not. For the acquisition function, instead of using their modified Keras, I used the implementation by Yarin Gal (the first author). Can you spot any mistake in my code?

    EDIT: I actually found a mistake in my code: I was not really computing the entropy but rather the other half of the BALD function. I fixed this mistake and am currently running the code.

    EDIT2: Still not working.

    opened by damienlancry 8
  • Entropy sampling query strategy unstable

    I'm using the entropy sampling strategy to select samples for RandomForest classification of 7 classes. However, when I run my query with entropy sampling (I also tried uncertainty sampling), I get a different result every time. The selected samples are never the same, even though I have not changed my input data.

    Thank you in advance for your help.

    opened by YousraH 8
  • about learner.teach

    It seems that each time we run learner.teach, the model is fitted on the initial data plus the new data from scratch, just like an untrained new model. Can the model instead learn only the new data, starting from the weights already trained on the initial data?

    opened by luxu1220 7
  • Using RandomForestClassifier on vectors for predicting labels gives "Found input variables with inconsistent numbers of samples"

    I am learning from the Active Regression tutorial page, but it does not cover applying learners to vectors with more than one dimension (I was not able to find a specific example in the docs for this, so please point me to one if you know of it).

    In the function named my_stuff

    My learner is

    regressor = ActiveLearner(
        estimator=RandomForestClassifier(),
        query_strategy=entropy_sampling,
        X_training=X_training, y_training=y_training.ravel()
    )

    My dataset X is (13084, 50) ( meaning 13084 vectors each having 50 length ) and y is (13084, 1) ( similar meaning ).

    Here X_training is (5, 50) and y_training is (5, 1). In this section of the code( taken blatantly from the tutorial page mentioned above ):

    for idx in range(n_queries):
        query_idx, query_instance = regressor.query(X)
        print(query_idx, 'query_idx', X_training.shape, y_training.shape)
        regressor.teach(X[query_idx].reshape(-1, 1), y[query_idx].reshape(-1, 1))

    The program ended abruptly, so upon using python debugger I found the error:

    ValueError: Found input variables with inconsistent numbers of samples: [50, 1]
    > /path/to/file/predict.py(286)my_stuff()
    -> regressor.teach(X[query_idx].reshape(-1, 1), y[query_idx].reshape(-1, 1))
    
    

    Here X[query_idx].reshape(-1, 1) has shape (50, 1) and y[query_idx].reshape(-1, 1) has shape (1, 1).

    What would be the correct procedure for the teach procedure?
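    The shape mismatch can be reproduced and avoided with NumPy alone. The sketch below assumes the queried index is a one-element array, uses random stand-in data with the same number of features as described above, and keeps one row per sample so that X and y agree on the number of samples.

```python
import numpy as np

X = np.random.rand(100, 50)                   # stand-in for the (13084, 50) data set
y = np.random.randint(0, 7, size=(100, 1))    # stand-in labels with shape (n_samples, 1)
query_idx = np.array([17])                    # stand-in for the index returned by the query

# reshape(-1, 1) would turn the single 50-feature row into 50 one-feature samples;
# reshape(1, -1) keeps it as one sample with 50 features
X_new = X[query_idx].reshape(1, -1)           # shape (1, 50)
y_new = y[query_idx].ravel()                  # shape (1,), matching y_training.ravel()
```

    With these shapes, a call such as regressor.teach(X_new, y_new) receives one sample and one label.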

    opened by berserker1 6
  • decision_function instead of predict_proba

    Several non-probabilistic estimators, such as SVMs in particular, can be used with uncertainty sampling. Scikit-Learn estimators that support the decision_function method can be used with the closest-to-hyperplane selection algorithm [Bloodgood]. This is actually a very popular strategy in AL research and would be very easy to implement.
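    A minimal sketch of such a closest-to-hyperplane strategy for a binary problem is shown below. The function name and the StubLinearModel stand-in are assumptions made for illustration; a real estimator with a decision_function method (for example a fitted SVM) would take the stub's place.

```python
import numpy as np

def closest_to_hyperplane(classifier, X_pool, n_instances=1):
    # binary case: the absolute decision value measures distance from the boundary
    distance = np.abs(classifier.decision_function(X_pool))
    # the instances nearest the boundary are queried first
    return np.argsort(distance, kind="stable")[:n_instances]

class StubLinearModel:
    """Stand-in for a fitted linear classifier (assumption for this demo only)."""
    def __init__(self, w, b):
        self.w, self.b = np.asarray(w, dtype=float), float(b)

    def decision_function(self, X):
        return X @ self.w + self.b

X_pool = np.array([[2.0, 0.0], [0.1, 0.0], [-1.0, 0.0], [-0.05, 0.0]])
query_idx = closest_to_hyperplane(StubLinearModel([1.0, 0.0], 0.0), X_pool, n_instances=2)
```

    When used as a query strategy, the first argument is the learner object, so the wrapped model would probably need to be reached through its estimator attribute.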

    opened by Luke-Kurlandski-TCNJ 0
  • How to extract the image names and labels in the training set after completing the active learning loop and write them to a CSV file

    I am using the Keras script at https://modal-python.readthedocs.io/en/latest/content/examples/Keras_integration.html for my classification task. After completing the active learning loop, how do we extract the image names and labels in the training set that gives the optimal test performance and write them to a CSV file?

    # read training data
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    X_train = X_train.reshape(60000, 28, 28, 1).astype('float32') / 255
    X_test = X_test.reshape(10000, 28, 28, 1).astype('float32') / 255
    y_train = keras.utils.to_categorical(y_train, 10)
    y_test = keras.utils.to_categorical(y_test, 10)
    
    # assemble initial data: random sampling of 1000 samples
    n_initial = 1000
    initial_idx = np.random.choice(range(len(X_train)), size=n_initial, replace=False)
    X_initial = X_train[initial_idx]
    y_initial = y_train[initial_idx]
    
    # generate the pool
    # remove the initial data from the training dataset
    X_pool = np.delete(X_train, initial_idx, axis=0)
    y_pool = np.delete(y_train, initial_idx, axis=0)
    
    """
    Training the ActiveLearner
    """
    
    # initialize ActiveLearner
    learner = ActiveLearner(
        estimator=classifier,
        query_strategy=entropy_sampling,
        X_training=X_initial, y_training=y_initial,
        verbose=1
    )
    
    # the active learning loop
    n_queries = 100
    for idx in range(n_queries):
        query_idx, query_instance = learner.query(X_pool, n_instances=100, verbose=0)
        print(query_idx)
        learner.teach(
            X=X_pool[query_idx], y=y_pool[query_idx], only_new=True,
            verbose=1
        )
        # remove queried instance from pool
        X_pool = np.delete(X_pool, query_idx, axis=0)
        y_pool = np.delete(y_pool, query_idx, axis=0)
    
    # the final accuracy score
    print(learner.score(X_test, y_test, verbose=1))
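    One possible way to keep track of which original samples end up in the training set is to maintain an index array alongside the pool and delete from it in step with the pool. The sketch below uses hypothetical file names and hard-coded query results in place of learner.query, and writes to an in-memory buffer instead of a file.

```python
import csv
import io
import numpy as np

image_names = np.array([f"img_{i:03d}.png" for i in range(10)])  # hypothetical names
labels = np.arange(10) % 3                                       # hypothetical labels

initial_idx = np.array([0, 5])
train_ids = list(initial_idx)
# pool_ids[k] is the original index of the k-th row currently in the pool
pool_ids = np.delete(np.arange(len(image_names)), initial_idx)

for query_idx in (np.array([1, 2]), np.array([0])):   # stand-ins for learner.query results
    train_ids.extend(pool_ids[query_idx])              # map pool positions to original indices
    pool_ids = np.delete(pool_ids, query_idx)          # mirror the deletion done on X_pool

buffer = io.StringIO()   # swap for open("training_set.csv", "w", newline="") to write a file
writer = csv.writer(buffer)
writer.writerow(["image_name", "label"])
for i in train_ids:
    writer.writerow([image_names[i], int(labels[i])])
```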
    
    opened by sivaramakrishnan-rajaraman 0
  • keras image classification model using AL

    Hi, I have a few questions:

    1. When performing initial training via ActiveLearner(estimator=keras_classifier, X_training=X_train, y_training=y_train) does X_training, y_training support generators?

    2. Working with HD images, I need to keep huge amounts of data in X_pool, which kills the process. Is there a method that takes a subset of X_pool for each query iteration? Maybe learner.teach(X_pool=X_pool) also supports generators?

    3. Does the learner only accept NumPy arrays, or does it also support tf.tensors?

    thank you!

    opened by noy2121 0
  • Fine-tuning a keras model using active learning

    Hi there,

    I would like to apply active learning to a keras model trained with another script.

    The model, with architecture, is stored in a .h5 file. I followed the keras example, but in my case I want to set the initial training set to None since my only aim is to get the selected indices for new samples.

    I get the following error:

    AttributeError: 'KerasClassifier' object has no attribute 'model'

    Any help would be much appreciated.

    import keras
    from keras.wrappers.scikit_learn import KerasClassifier
    from modAL.uncertainty import uncertainty_sampling
    from modAL.models import ActiveLearner

    def get_model():
        model = keras.models.load_model(model_path)
        return model

    classifier = KerasClassifier(get_model)

    # initialize ActiveLearner
    learner = ActiveLearner(
        estimator=classifier,
        verbose=1
    )

    X_pool = get_image_data_to_ndarray(folder_path=folder_path, img_size=img_size)

    query_idx, query_instance = learner.query(X_pool, n_instances=100, verbose=0)
    
    opened by Tchaikovic 2
Releases(0.4.1)
  • 0.4.1(Jan 7, 2021)

  • 0.4.0(Nov 1, 2020)

    Release notes

    modAL 0.4.0 is finally here! This new release is made possible by the contributions of @BoyanH, @damienlancry, and @OskarLiew, many thanks to them!

    New features

    • pandas.DataFrame support, thanks to @BoyanH! This was a frequently requested feature which I was unable to properly implement, but @BoyanH has found a solution for this in #105.
    • Support for scikit-learn pipelines, also by @BoyanH. Now learners support querying on the transformed data by setting on_transformed=True upon initialization.

    Changes

    • Query strategies should no longer return the selected instances, only the indices for the queried objects. (See #104 by @BoyanH.)

    Fixes

    • Committee sets classes when fitting, this solves the error which occurred when no training data was provided during initialization. This fix was contributed in #100 by @OskarLiew, thanks for that!
    • Some typos in the ranked batch mode sampling example, fixed by @damienlancry.
  • 0.3.6(Aug 21, 2020)

  • 0.3.5(Nov 11, 2019)

    Changes

    • ActiveLearner now supports np.nan and np.inf in the data by setting force_all_finite=False upon initialization. #58
    • Bayesian optimization fixed for multidimensional functions.
    • Calls to check_X_y no longer convert between datatypes. #49
    • Expected error reduction implementation error fixed. #45
    • modAL.utils.data_vstack now falls back to numpy.concatenate if possible.
    • Multidimensional data for ranked batch sampling and expected error reduction fixed. #41

    Fixes by @zhangyu94:

    • modAL.selection.shuffled_argmax #32
    • Cold start instance in modAL.batch.ranked_batch fixed. #30
    • Best instance index in modAL.batch.select_instance fixed. #29
  • 0.3.4(Dec 5, 2018)

    New features

    • To handle the case when the maximum utility score is not unique, a random tie break option was introduced. From this version, passing random_tie_break=True to the query strategies first shuffles the pool, then uses a stable sort to find the instances to query. When the maximum utility score is not unique, this is equivalent to randomly sampling from the top-scoring instances.
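
    The shuffle-then-stable-sort idea can be sketched as follows. This is an illustrative reimplementation with a hypothetical function name, not the library's code.

```python
import numpy as np

def shuffled_argmax_sketch(values, n_instances=1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # shuffle first so that ties are ordered randomly
    perm = rng.permutation(len(values))
    shuffled = values[perm]
    # a stable sort preserves the shuffled order among equal scores
    order = np.argsort(-shuffled, kind="stable")[:n_instances]
    # map positions in the shuffled array back to original indices
    return perm[order]

utility = np.array([0.2, 0.9, 0.9, 0.1, 0.9])
picks = {
    int(shuffled_argmax_sketch(utility, rng=np.random.default_rng(seed))[0])
    for seed in range(50)
}
```

    Across different seeds, the selected index varies among the tied top-scoring instances (here indices 1, 2 and 4) and never falls outside them.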

    Changes

    • modAL.expected_error.expected_error_reduction runtime improved by omitting unnecessary cloning of the estimator for every instance in the pool.
  • 0.3.3(Nov 30, 2018)

  • 0.3.2(Nov 26, 2018)

  • 0.3.1(Oct 2, 2018)

    Release notes

    The new release of modAL is here! This is a milestone in its evolution, because it has just received its first contributions from the open source community! :) Thanks to @dataframing and @nikolay-bushkov for their work! Hoping to see many more contributions from the community, because modAL still has a long way to go! :)

    New features

    • Ranked batch mode queries by @dataframing. With this query strategy, several instances can be queried for labeling, which alleviates a lot of problems in uncertainty sampling. For details, see Ranked batch mode learning by Cardoso et al.
    • Sparse matrix support by @nikolay-bushkov. From now, if the estimator can handle sparse matrices, you can use them to fit the active learning models!
    • Cold start support has been added to all the models. This means that now learner.query() can be used without training the model first.
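
    To give a feel for ranked batch mode queries, here is a simplified greedy sketch. It assumes the commonly cited scoring rule, in which each candidate is scored as alpha * (1 - similarity to the labeled set) + (1 - alpha) * uncertainty, with alpha equal to the share of unlabeled instances. The similarity function 1 / (1 + distance) and the function name are choices made for this illustration and may differ from the library's implementation.

```python
import numpy as np

def ranked_batch_sketch(uncertainty, X_pool, X_labeled, n_instances):
    labeled = np.asarray(X_labeled, dtype=float)
    remaining = list(range(len(X_pool)))
    selected = []
    for _ in range(n_instances):
        # alpha shifts weight from diversity to uncertainty as labels accumulate
        alpha = len(remaining) / (len(remaining) + len(labeled))
        candidates = X_pool[remaining]
        # distance to the closest labeled (or already selected) instance
        diff = candidates[:, None, :] - labeled[None, :, :]
        dist = np.linalg.norm(diff, axis=2).min(axis=1)
        similarity = 1.0 / (1.0 + dist)
        score = alpha * (1.0 - similarity) + (1.0 - alpha) * uncertainty[remaining]
        best = remaining[int(np.argmax(score))]
        selected.append(best)
        remaining.remove(best)
        # treat the pick as labeled so the next pick is pushed away from it
        labeled = np.vstack([labeled, X_pool[best]])
    return np.array(selected)

X_pool = np.array([[0.0], [0.1], [5.0], [5.1]])
uncertainty = np.array([0.9, 0.9, 0.8, 0.8])
batch = ranked_batch_sketch(uncertainty, X_pool, X_labeled=[[0.05]], n_instances=2)
```

    In this toy pool the first pick is the point farthest from the labeled instance, even though its uncertainty is slightly lower, which is the redundancy-avoiding behaviour that plain top-k uncertainty sampling lacks.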

    Changes

    • The documentation has undergone a major refactoring thanks to @nikolay-bushkov! Type annotations have been added and the docstrings were refactored to follow Google style docstrings. The website has been changed accordingly. Instead of GitHub pages, ReadTheDocs is used and the old website is merged with the API reference. Regarding the examples, Jupyter notebooks were added by @dataframing. For details, check it out at https://modAL-python.github.io/!
    • .query() methods changed for BaseLearner and BaseCommittee to allow more general arguments for query strategies. Now it can accept any argument as long as the query_strategy function supports it.
    • .score() method was added for Committee. Fixes #6.
    • The modAL.density module was refactored using functions from sklearn.metrics.pairwise. This resulted in a major increase in performance as well as a more sustainable codebase for the module.

    Bugfixes

    • 1D array handling issues fixed, numpy.vstack calls replaced with numpy.concatenate. Fixes #15.
    • np.sum(generator) calls were replaced with np.sum(np.fromiter(generator)) because the original form is deprecated.
  • 0.3.0(Apr 25, 2018)

    Release notes

    New features

    • Bayesian optimization. Bayesian optimization is a method for optimizing black box functions for which evaluation may be expensive and derivatives may not be available. It uses a query loop very similar to active learning, which makes it possible to implement it using an API identical to the ActiveLearner. Values are sampled by strategies that estimate the possible gain at each point. Three such strategies are currently implemented: probability of improvement, expected improvement and upper confidence bounds.
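
    The three acquisition functions can be written down compactly using their textbook forms for a maximization problem. The sketch below is a NumPy-only illustration: the helper names, the tradeoff and beta parameters, and the erf-based normal CDF are assumptions made here, and the predictive standard deviation is assumed to be strictly positive.

```python
import math
import numpy as np

def _norm_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def _norm_pdf(z):
    # standard normal density
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def probability_of_improvement(mean, std, best, tradeoff=0.0):
    return _norm_cdf((mean - best - tradeoff) / std)

def expected_improvement(mean, std, best, tradeoff=0.0):
    z = (mean - best - tradeoff) / std
    return (mean - best - tradeoff) * _norm_cdf(z) + std * _norm_pdf(z)

def upper_confidence_bound(mean, std, beta=1.0):
    return mean + beta * std

# predictive mean and standard deviation at three candidate points
mean = np.array([0.0, 1.0, 1.0])
std = np.array([1.0, 0.5, 2.0])
best = 1.0   # best function value observed so far

pi = probability_of_improvement(mean, std, best)
ei = expected_improvement(mean, std, best)
ucb = upper_confidence_bound(mean, std)
```

    The next point to evaluate is the candidate with the largest acquisition value, which plays the same role as the most informative instance in an active learning query.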

    Changes

    • modAL.models.BaseLearner abstract base class implemented. ActiveLearner and BayesianOptimizer both inherit from it.
    • modAL.models.ActiveLearner.query() now passes the ActiveLearner object to the query function instead of just the estimator.

    Fixes

    • modAL.utils.selection.multi_argmax() now works for arrays with shape (-1, ) as well as (-1, 1).
  • 0.2.1(Apr 18, 2018)

    Release notes

    New features

    • modAL.utils.combination.make_query_strategy function factory to make the implementation of custom query strategies easier.
    • ActiveLearner and Committee models can be fitted using new data only by passing only_new=True to their .teach() methods. This is useful when working with models where the fitting does not occur from scratch, for instance tensorflow or keras models.

    Fixes

    • Checks added to modAL.utils.selection.weighted_random() to avoid division by zero.
    • ABC metaclassing now compatible with earlier Python versions (i.e. Python 2.7). Fixes #3.
    • sklearn.utils.check_array calls removed from modAL.models; performing checks is now up to the estimator. As a consequence, images don't need to be flattened. Fixes #5.
    • BaseCommittee now inherits from sklearn.base.BaseEstimator.
    • modAL.utils.combination.make_linear_combination rewritten using genexps, resulting in performance increase.
  • 0.2.0(Feb 10, 2018)

    Release notes

    New features

    • Information density measures. With the information_density function in modAL.density, density-based information metrics can be employed.
    • Functions for making new utility measures by linear combinations and products. With the function factories in modAL.utils.combination, functions can be transformed into their linear combination and product.
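
    The idea behind these function factories can be sketched as follows. The helper names, the toy measures and their single-argument signature are assumptions for illustration only; real utility measures would typically take a learner and a pool.

```python
import numpy as np

def make_linear_combination_sketch(*functions, weights=None):
    weights = [1.0] * len(functions) if weights is None else weights
    def combined(*args, **kwargs):
        # weighted sum of the individual utility scores
        return sum(w * f(*args, **kwargs) for f, w in zip(functions, weights))
    return combined

def make_product_sketch(*functions, exponents=None):
    exponents = [1.0] * len(functions) if exponents is None else exponents
    def combined(*args, **kwargs):
        # elementwise product of the scores, each raised to its exponent
        factors = [f(*args, **kwargs) ** e for f, e in zip(functions, exponents)]
        return np.prod(factors, axis=0)
    return combined

# two toy utility measures defined on the pool only
def distance_from_origin(X):
    return np.linalg.norm(X, axis=1)

def first_feature(X):
    return np.abs(X[:, 0])

X = np.array([[3.0, 4.0], [1.0, 0.0]])
lin = make_linear_combination_sketch(distance_from_origin, first_feature, weights=[1.0, 2.0])
prod = make_product_sketch(distance_from_origin, first_feature, exponents=[1.0, 2.0])
lin_scores, prod_scores = lin(X), prod(X)
```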

    Changes

    • ActiveLearner constructor arguments renamed: predictor was renamed to estimator, and X_initial and y_initial were renamed to X_training and y_training.
    • ActiveLearner, Committee and CommitteeRegressor now also inherit from sklearn.base.BaseEstimator. Because of this, for instance, the get_params() and set_params() methods can be used.
    • The private attributes of ActiveLearner, Committee and CommitteeRegressor are now exposed as public attributes.
    • As a result of the previous change, the classes can now be cloned with sklearn.base.clone.
  • 0.1.0(Jan 8, 2018)

    modAL 0.1.0

    Modular Active Learning framework for Python3

    Release notes

    modAL is finally released! For its capabilities and documentation, see the page https://cosmic-cortex.github.io/modAL/!

    Installation

    modAL requires

    • Python >= 3.5
    • NumPy >= 1.13
    • SciPy >= 0.18
    • scikit-learn >= 0.18

    You can install modAL directly with pip:

    pip install modAL
    

    Alternatively, you can install modAL directly from source:

    pip install git+https://github.com/cosmic-cortex/modAL.git
Owner
modAL
A modular active learning framework for Python3
modAL
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext

Benedek Rozemberczki 1.6k Apr 15, 2022
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Microsoft 13.7k Apr 20, 2022
A unified framework for machine learning with time series

Welcome to sktime A unified framework for machine learning with time series We provide specialized time series algorithms and scikit-learn compatible

The Alan Turing Institute 5.2k Apr 22, 2022
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

Ray provides a simple, universal API for building distributed applications. Ray is packaged with the following libraries for accelerating machine lear

null 20k Apr 17, 2022
BigDL: Distributed Deep Learning Framework for Apache Spark

BigDL: Distributed Deep Learning on Apache Spark What is BigDL? BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can w

null 3.9k Apr 15, 2022
A Lucid Framework for Transparent and Interpretable Machine Learning Models.

Currently a Beta-Version lucidmode is an open-source, low-code and lightweight Python framework for transparent and interpretable machine learning mod

lucidmode 14 Feb 13, 2022
Automated modeling and machine learning framework FEDOT

This repository contains FEDOT - an open-source framework for automated modeling and machine learning (AutoML). It can build custom modeling pipelines for different real-world processes in an automated way using an evolutionary approach. FEDOT supports classification (binary and multiclass), regression, clustering, and time series prediction tasks.

National Center for Cognitive Research of ITMO University 148 Jul 5, 2021
Xeasy-ml is a packaged machine learning framework.

Xeasy-ml is a packaged machine learning framework. It allows a beginner to quickly build a machine learning model and use

null 9 Mar 14, 2022
machine learning model deployment project of Iris classification model in a minimal UI using flask web framework and deployed it in Azure cloud using Azure app service

This is a machine learning model deployment project of Iris classification model in a minimal UI using flask web framework and deployed it in Azure cloud using Azure app service. We initially made this project as a requirement for an internship at Indian Servers. We are now making it open to contribution.

Krishna Priyatham Potluri 69 Jan 24, 2022
Merlion: A Machine Learning Framework for Time Series Intelligence

Merlion is a Python library for time series intelligence. It provides an end-to-end machine learning framework that includes loading and transforming data, building and training models, post-processing model outputs, and evaluating model performance. I

Salesforce 2.4k Apr 14, 2022
XManager: A framework for managing machine learning experiments 🧑‍🔬

XManager is a platform for packaging, running and keeping track of machine learning experiments. It currently enables one to launch experiments locally or on Google Cloud Platform (GCP). Interaction with experiments is done via XManager's APIs through Python launch scripts.

DeepMind 486 Apr 14, 2022
Hypernets: A General Automated Machine Learning framework to simplify the development of End-to-end AutoML toolkits in specific domains.

A General Automated Machine Learning framework to simplify the development of End-to-end AutoML toolkits in specific domains.

DataCanvas 196 Apr 12, 2022
ZenML 🙏: MLOps framework to create reproducible ML pipelines for production machine learning.

ZenML is an extensible, open-source MLOps framework to create production-ready machine learning pipelines. It has a simple, flexible syntax, is cloud and tool agnostic, and has interfaces/abstractions that are catered towards ML workflows.

ZenML 1.9k Apr 15, 2022
Python Research Framework

Python Research Framework

EleutherAI 97 Mar 10, 2022
A Python implementation of GRAIL, a generic framework to learn compact time series representations.

GRAIL A Python implementation of GRAIL, a generic framework to learn compact time series representations. Requirements Python 3.6+ numpy scipy tslearn

null 3 Nov 24, 2021
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

TPOT stands for Tree-based Pipeline Optimization Tool. Consider TPOT your Data Science Assista

Epistasis Lab at UPenn 8.5k Apr 23, 2022
Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

Python Extreme Learning Machine (ELM) Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

Augusto Almeida 71 Apr 7, 2022
MIT-Machine Learning with Python–From Linear Models to Deep Learning

MIT-Machine Learning with Python–From Linear Models to Deep Learning | One of the 5 courses in MIT MicroMasters in Statistics & Data Science Welcome t

null 1 Oct 19, 2021
Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.

Time series analysis today is an important cornerstone of quantitative science in many disciplines, including natural and life sciences as well as eco

Christoph Mark 99 Mar 28, 2022