mlpack: a scalable C++ machine learning library

Overview

mlpack: a fast, flexible machine learning library

Download: current stable version (3.4.2)

mlpack is an intuitive, fast, and flexible C++ machine learning library with bindings to other languages. It is meant to be a machine learning analog to LAPACK, and aims to implement a wide array of machine learning methods and functions as a "Swiss army knife" for machine learning researchers. In addition to its powerful C++ interface, mlpack also provides command-line programs, Python bindings, Julia bindings, Go bindings and R bindings.

mlpack uses an open governance model and is fiscally sponsored by NumFOCUS. Consider making a tax-deductible donation to help the project pay for developer time, professional services, travel, workshops, and a variety of other needs.


0. Contents

  1. Introduction
  2. Citation details
  3. Dependencies
  4. Building mlpack from source
  5. Running mlpack programs
  6. Using mlpack from Python
  7. Further documentation
  8. Bug reporting

1. Introduction

The mlpack website can be found at https://www.mlpack.org and contains numerous tutorials and extensive documentation. This README serves as a guide for what mlpack is, how to install it, how to run it, and where to find more documentation; the website should be consulted for further information.

2. Citation details

If you use mlpack in your research or software, please cite mlpack using the citation below (given in BibTeX format):

@article{mlpack2018,
    title     = {mlpack 3: a fast, flexible machine learning library},
    author    = {Curtin, Ryan R. and Edel, Marcus and Lozhnikov, Mikhail and
                 Mentekidis, Yannis and Ghaisas, Sumedh and Zhang,
                 Shangtong},
    journal   = {Journal of Open Source Software},
    volume    = {3},
    number    = {26},
    pages     = {726},
    year      = {2018},
    doi       = {10.21105/joss.00726},
    url       = {https://doi.org/10.21105/joss.00726}
}

Citations are beneficial for the growth and improvement of mlpack.

3. Dependencies

mlpack has the following dependencies:

  Armadillo                 >= 8.400.0
  Boost (math_c99, spirit)  >= 1.58.0
  CMake                     >= 3.2.2
  ensmallen                 >= 2.10.0
  cereal                    >= 1.1.2

All of those should be available in your distribution's package manager. If not, you will have to compile each of them by hand. See the documentation for each of those packages for more information.

If you would like to use or build the mlpack Python bindings, make sure that the following Python packages are installed:

  setuptools
  cython >= 0.24
  numpy
  pandas >= 0.15.0

If you would like to build the Julia bindings, make sure that Julia >= 1.3.0 is installed.

If you would like to build the Go bindings, make sure that Go >= 1.11.0 is installed with this package:

 Gonum

If you would like to build the R bindings, make sure that R >= 4.0 is installed with these R packages:

 Rcpp >= 0.12.12
 RcppArmadillo >= 0.8.400.0
 RcppEnsmallen >= 0.2.10.0
 BH >= 1.58
 roxygen2

If the STB library headers are available, image loading support will be compiled.

If you are compiling Armadillo by hand, ensure that LAPACK and BLAS are enabled.

4. Building mlpack from source

This section discusses how to build mlpack from source. These build directions will work for any Linux-like shell environment (for example Ubuntu, macOS, or FreeBSD). However, mlpack is in the repositories of many Linux distributions, so it may be easier to use your system's package manager. For example, on Ubuntu, you can install the mlpack library and command-line executables (e.g. mlpack_pca, mlpack_kmeans) with the following command:

$ sudo apt-get install libmlpack-dev mlpack-bin

On Fedora or Red Hat (EPEL):

$ sudo dnf install mlpack-devel mlpack-bin

Note: Older Ubuntu versions may not have the most recent version of mlpack available. For instance, at the time of this writing, Ubuntu 16.04 only provides an older mlpack release. Options include upgrading your Ubuntu version, finding a PPA or other non-official sources, or installing with a manual build.

The mlpack website has some useful pages to consult in addition to this section.

mlpack uses CMake as a build system and allows several flexible build configuration options. You can consult any of the CMake tutorials for further documentation, but this tutorial should be enough to get mlpack built and installed.

First, unpack the mlpack source and change into the unpacked directory. Here we use mlpack-x.y.z where x.y.z is the version.

$ tar -xzf mlpack-x.y.z.tar.gz
$ cd mlpack-x.y.z

Then, make a build directory. The directory can have any name, but 'build' is sufficient.

$ mkdir build
$ cd build

The next step is to run CMake to configure the project. Running CMake is the equivalent of running ./configure with autotools. If you run CMake with no options, it will configure the project to build with no debugging symbols and no profiling information:

$ cmake ../

Options can be specified to compile with debugging information and profiling information:

$ cmake -D DEBUG=ON -D PROFILE=ON ../

Options are specified with the -D flag. The allowed options include:

DEBUG=(ON/OFF): compile with debugging symbols
PROFILE=(ON/OFF): compile with profiling symbols
ARMA_EXTRA_DEBUG=(ON/OFF): compile with extra Armadillo debugging symbols
BOOST_ROOT=(/path/to/boost/): path to root of boost installation
ARMADILLO_INCLUDE_DIR=(/path/to/armadillo/include/): path to Armadillo headers
ARMADILLO_LIBRARY=(/path/to/armadillo/libarmadillo.so): Armadillo library
BUILD_CLI_EXECUTABLES=(ON/OFF): whether or not to build command-line programs
BUILD_PYTHON_BINDINGS=(ON/OFF): whether or not to build Python bindings
PYTHON_EXECUTABLE=(/path/to/python_version): Path to specific Python executable
PYTHON_INSTALL_PREFIX=(/path/to/python/): Path to root of Python installation
BUILD_JULIA_BINDINGS=(ON/OFF): whether or not to build Julia bindings
JULIA_EXECUTABLE=(/path/to/julia): Path to specific Julia executable
BUILD_GO_BINDINGS=(ON/OFF): whether or not to build Go bindings
GO_EXECUTABLE=(/path/to/go): Path to specific Go executable
BUILD_GO_SHLIB=(ON/OFF): whether or not to build shared libraries required by Go bindings
BUILD_R_BINDINGS=(ON/OFF): whether or not to build R bindings
R_EXECUTABLE=(/path/to/R): Path to specific R executable
BUILD_TESTS=(ON/OFF): whether or not to build tests
BUILD_SHARED_LIBS=(ON/OFF): compile shared libraries as opposed to
   static libraries
DISABLE_DOWNLOADS=(ON/OFF): whether to disable all downloads during build
DOWNLOAD_ENSMALLEN=(ON/OFF): If ensmallen is not found, download it
ENSMALLEN_INCLUDE_DIR=(/path/to/ensmallen/include): path to include directory
   for ensmallen
DOWNLOAD_STB_IMAGE=(ON/OFF): If STB is not found, download it
STB_IMAGE_INCLUDE_DIR=(/path/to/stb/include): path to include directory for
   STB image library
USE_OPENMP=(ON/OFF): whether or not to use OpenMP if available
BUILD_DOCS=(ON/OFF): build Doxygen documentation, if Doxygen is available
   (default ON)

Other tools can also be used to configure CMake, but those are not documented here. See the build guide in the mlpack documentation for more details, including a full list of options and their default values.

By default, command-line programs will be built, and if the Python dependencies (Cython, setuptools, numpy, pandas) are available, then Python bindings will also be built. OpenMP will be used for parallelization when possible by default.

Once CMake is configured, building the library is as simple as typing 'make'. This will build all library components as well as 'mlpack_test'.

$ make

If you do not want to build everything in the library, individual components of the build can be specified:

$ make mlpack_pca mlpack_knn mlpack_kfn

If the build fails and you cannot figure out why, register an account on GitHub and submit an issue at https://github.com/mlpack/mlpack/; the mlpack developers will quickly help you figure it out.

Alternatively, mlpack help can be found in IRC at #mlpack on chat.freenode.net.

If you wish to install mlpack to /usr/local/include/mlpack/, /usr/local/lib/, and /usr/local/bin/, make sure you have root privileges (or write permissions to those three directories), and simply type:

$ make install

You can now run the executables by name, link against mlpack with -lmlpack, and find the mlpack headers in /usr/local/include/mlpack/. If Python bindings were built, you can access them with the mlpack package in Python.

If running the programs (e.g. $ mlpack_knn -h) gives an error of the form

error while loading shared libraries: libmlpack.so.2: cannot open shared object file: No such file or directory

then be sure that the runtime linker is searching the directory where libmlpack.so was installed (probably /usr/local/lib/ unless you set it manually). One way to do this, on Linux, is to ensure that the LD_LIBRARY_PATH environment variable has the directory that contains libmlpack.so. Using bash, this can be set easily:

export LD_LIBRARY_PATH="/usr/local/lib/:$LD_LIBRARY_PATH"

(or whatever directory libmlpack.so is installed in.)

5. Running mlpack programs

After building mlpack, the executables will reside in build/bin/. You can call them from there, or you can install the library and (depending on system settings) they should be added to your PATH and you can call them directly. The documentation below assumes the executables are in your PATH.

Consider the 'mlpack_knn' program, which, for every point in a query set, finds the k nearest neighbors in a reference set. That is, we have a query dataset and a reference dataset, and for each point in the query dataset, we wish to know the k points in the reference dataset that are closest to that query point.

Alternatively, if the query and reference datasets are the same, the problem can be stated more simply: for each point in the dataset, we wish to know the k nearest points to that point.
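To make the problem statement concrete, the following is a naive C++ sketch of the computation itself: compare every query point against every reference point and keep the k closest. It is not mlpack's implementation (mlpack offers much faster tree-based search); it assumes Euclidean distance and assumes that, when query and reference are the same dataset, a point should not be reported as its own neighbor. The names BruteForceKnn, KnnResult, and excludeSelf are hypothetical and used only for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// A point is a vector of coordinates; a dataset is a list of points.
using Point = std::vector<double>;
using Dataset = std::vector<Point>;

// Euclidean distance between two points of equal dimension.
double EuclideanDistance(const Point& a, const Point& b)
{
  double sum = 0.0;
  for (std::size_t d = 0; d < a.size(); ++d)
  {
    const double diff = a[d] - b[d];
    sum += diff * diff;
  }
  return std::sqrt(sum);
}

// Hypothetical result type for one query point: indices into the reference
// set and the matching distances, both ordered from nearest to farthest.
struct KnnResult
{
  std::vector<std::size_t> neighbors;
  std::vector<double> distances;
};

// Brute-force k-nearest-neighbor search: |query| * |reference| distance
// evaluations. When excludeSelf is true (query and reference are the same
// dataset), point q is never reported as its own neighbor.
std::vector<KnnResult> BruteForceKnn(const Dataset& reference,
                                     const Dataset& query,
                                     std::size_t k,
                                     bool excludeSelf)
{
  std::vector<KnnResult> results;
  results.reserve(query.size());
  for (std::size_t q = 0; q < query.size(); ++q)
  {
    // Collect (distance, index) pairs for every candidate reference point.
    std::vector<std::pair<double, std::size_t>> candidates;
    for (std::size_t r = 0; r < reference.size(); ++r)
    {
      if (excludeSelf && r == q)
        continue;
      candidates.emplace_back(EuclideanDistance(query[q], reference[r]), r);
    }

    // Move the k smallest distances to the front, in sorted order.
    const std::size_t kk = std::min(k, candidates.size());
    std::partial_sort(candidates.begin(),
                      candidates.begin() + static_cast<std::ptrdiff_t>(kk),
                      candidates.end());

    KnnResult result;
    for (std::size_t i = 0; i < kk; ++i)
    {
      result.distances.push_back(candidates[i].first);
      result.neighbors.push_back(candidates[i].second);
    }
    results.push_back(result);
  }
  return results;
}
```

For example, BruteForceKnn(data, data, 5, true) on a single dataset is the same problem that the single-dataset mlpack_knn run with -k 5 below solves, with neighbors and distances playing the roles of the two output files.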

Each mlpack program has extensive help documentation which details what the method does, what each of the parameters is, and how to use them:

$ mlpack_knn --help

Running mlpack_knn on one dataset (that is, the query and reference datasets are the same) and finding the 5 nearest neighbors is very simple:

$ mlpack_knn -r dataset.csv -n neighbors_out.csv -d distances_out.csv -k 5 -v

The -v (--verbose) flag is optional; it gives informational output. It is not unique to mlpack_knn but is available in all mlpack programs. Verbose output also gives timing output at the end of the program, which can be very useful.

6. Using mlpack from Python

If mlpack is installed to the system, then the mlpack Python bindings should be automatically in your PYTHONPATH, and importing mlpack functionality into Python should be very simple:

>>> from mlpack import knn

Accessing help is easy:

>>> help(knn)

The API is similar to that of the command-line programs. For example, running knn() (k-nearest-neighbor search) on a numpy matrix dataset and finding the 5 nearest neighbors is very simple:

>>> output = knn(reference=dataset, k=5, verbose=True)

This will store the output neighbors in output['neighbors'] and the output distances in output['distances']. Other mlpack bindings function similarly, and the input/output parameters exactly match those of the command-line programs.

7. Further documentation

The documentation given here is only a fraction of the available documentation for mlpack. If Doxygen is installed, you can type 'make doc' to build the documentation locally. Alternatively, up-to-date documentation, as well as documentation for older versions of mlpack, is available on the mlpack website at https://www.mlpack.org.

8. Bug reporting

(see also mlpack help)

If you find a bug in mlpack or have any problems, numerous routes are available for help.

GitHub is used for bug tracking and can be found at https://github.com/mlpack/mlpack/. It is easy to register an account and file a bug there, and the mlpack development team will try to quickly resolve your issue.

In addition, two mailing lists are available: the mlpack discussion list and the git commit list.

Lastly, the IRC channel #mlpack on Freenode can be used to get help.

Issues
  • Cereal

    This pull request contains all commits cherry-picked from #2415 that are related to the transition from Boost serialization to cereal.

    c: testing c: core 
    opened by shrit 169
  • Neuro CMAES-Algorithm

    I have used and modified Bang's genome and have implemented CMAES in it. Tests for it are still remaining.

    opened by kartik-nighania 136
  • [GSoC] Augmented RNN models - benchmarking framework

    This PR is part of my GSoC project "Augmented RNNs". Implemented:

    • class CopyTask for evaluating models on the sequence copy problem, showcasing the benchmarking framework;
    • a unit test for it (a simple non-ML model that is hardcoded to copy the sequence the required number of times is expected to ace the CopyTask).
    opened by 17minutes 102
  • Adding All Loss Functions

    Hello, I was going through loss functions and managed to get a list of loss functions that aren't implemented yet. I found these by looking at PyTorch and TensorFlow; kindly refer to them for more information. The list goes as follows:

    1. HingeEmbedding Loss (taken by me)
    2. CosineEmbedding Loss (taken up by @kartikdutt18)
    3. MultiLabelMargin Loss
    4. TripletMargin Loss
    5. L1 Loss
    6. BCE Loss

    This might not be a complete list. I will update this list as I find more. I hope this is OK with the community. Kindly feel free to take up any of the idle loss functions here. Thank you. :)

    help wanted good first issue s: stale c: methods 
    opened by ojhalakshya 93
  • ARMADILLO_INCLUDE_DIR-NOTFOUND/armadillo_bits/config.hpp not found

    -- The C compiler identification is GNU 4.8.1
    -- The CXX compiler identification is GNU 4.8.1
    -- Check for working C compiler: /usr/usc/gnu/gcc/4.8.1/bin/gcc
    -- Check for working C compiler: /usr/usc/gnu/gcc/4.8.1/bin/gcc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working CXX compiler: /usr/usc/gnu/gcc/4.8.1/bin/g++
    -- Check for working CXX compiler: /usr/usc/gnu/gcc/4.8.1/bin/g++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Checking for C++11 compiler
    -- Checking for C++11 compiler - available
    -- Looking for backtrace
    -- Looking for backtrace - found
    -- backtrace facility detected in default set of libraries
    -- Found Backtrace: /usr/include
    CMake Error at CMake/FindArmadillo.cmake:327 (message):
      ARMADILLO_INCLUDE_DIR-NOTFOUND/armadillo_bits/config.hpp not found!
      Cannot determine what to link against.
    Call Stack (most recent call first):
      CMakeLists.txt:113 (find_package)

    How can I solve this problem? Thanks a lot.

    t: question s: answered 
    opened by acgtun 83
  • Implementation of SPSA optimizer

    As of now, I have just created the basic files necessary to implement the optimizer for the sake of creating the PR... I'll push the code in the subsequent commits :v:

    opened by Rajiv2605 79
  • Adapting armadillo's parser for mlpack (Removing Boost Dependencies)

    For background knowledge, look at these

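    Sample code to use the feature:

    #include <fstream>
    #include <iostream>
    #include <mlpack/core.hpp>

    int main()
    {
      arma::Mat<double> data;

      // Open the CSV file that the proposed loader will parse.
      std::fstream file("data.csv");
      if (!file.is_open())
      {
        std::cerr << "Could not open data.csv" << std::endl;
        return 1;
      }

      // load_data() is the function proposed in this PR; it parses the
      // stream as CSV into the matrix.
      mlpack::data::load_data<double>(data, arma::csv_ascii, file);
      data.raw_print();

      return 0;
    }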
    
    c: core update dependencies 
    opened by heisenbuug 74
  • [WIP] Proximal Policy Optimization

    Implementing the proximal policy optimization algorithm in RL. Adding the basic skeleton.

    s: stale s: needs review c: methods t: added feature 
    opened by robotcator 72
  • Resolve Comments in Go Bindings(#1492) and Add Markdown Documentation

    Hi @rcurtin, I have tried to resolve some of the comments in PR #1492 and also added Markdown documentation for Go bindings.

    DONE:

    • [x] Build a fully working Go binding using make go.
    • [x] Configure CMake with cmake ../, which would find Go using FindGo.cmake.
    • [x] Add Markdown Documentation for Go Bindings.
    • [x] Resolve underscores to camelcase
    • [x] Tried to avoid unnecessary copies.
    • [x] Resolve output in arma_util.cpp, which was going out of scope.
    • [x] Removing unnecessary inputOptions and outputOptions.
    • [x] Resolve documentation for multiple outputs.
    • [x] Add some getter and setter methods for Umat, Urow and Ucol
    • [x] Add test for Umat, Urow and Ucol
    • [x] Resolve style issues (lines less than 80 characters) in go_binding_test.go
    • [x] Add vector of strings and int parameter type and added their tests.
    • [x] Add matrix with dataset info parameter type.
    s: keep open c: automatic bindings t: added feature 
    opened by Yashwants19 68
  • Algorithm yet to be implemented

    Hi there, I am interested in implementing an algorithm or a feature in mlpack which hasn't been implemented yet. It would be great if you could suggest any :smile:

    opened by Rajiv2605 61
  • getting all class prediction with probability score for classification model like random forest

    Problem location

    Hi, I need urgent help, please help me to solve this, I'm working on random forest implementation for the prediction dataset. I'm following this tutorial. https://www.mlpack.org/doc/stable/doxygen/sample_ml_app.html

    Description of problem

    For inference on a single data sample, it gives one predicted class label. But I want to get the predictions for the other classes for the same sample, sorted by probability score, like the sklearn function model.predict_proba(). I tried to find a way to get these predictions but failed. Could you please help me with this?

    t: bug report s: stale c: documentation 
    opened by aminul-palash 2
  • Added Simple Exponential Smoothing model for time series

    As discussed in issue #2668, mlpack lacks time series forecasting methods, so this PR attempts to add Simple Exponential Smoothing.

    In this PR I'll try to implement the Simple Exponential Smoothing method for forecasting. Initially, I have implemented the header file for this model, detailing the methods I'll use. I'm new to this process of contribution, so it would be really helpful if you could point out changes needed in the code and the style as I improve my code and complete the implementation of the SES method.

    Thanks

    s: needs review c: methods 
    opened by Ris-Bali 1
  • Instance Norm

    I'm creating a new PR for #2900 as I've lost access to that branch.

    @zoq I've rebased this layer against the latest master branch as discussed on the IRC.

    c: methods t: added feature 
    opened by hello-fri-end 2
  • Subset Selection on data

    Methods to select subsets of training data for data efficient learning

    Motivation

    Training models on large data sets consumes a lot of time and energy. Selecting subsets of training data that best model the superset, and training on those subsets, is a new paradigm in ML.

    Implementation ideas

    A C++ implementation of algorithms like GLISTER, gradient matching, etc., as mentioned in DECILE, would better equip the library.

    opened by VedangAsgaonkar 1
  • Removing Remaining Boost

    Related to #3095

    To test erfinv fn https://keisan.casio.com/exec/system/1180573448

    Details to be put later.

    opened by shubham1206agra 5
  • Added checks for relative input shapes in linear regression and k means clustering

    This PR implements a part of the idea discussed in issue #2820. I have added size checks using CheckSameSizes and CheckSameDimensionality utilities in src/mlpack/core/methods/kmeans/kmeans.hpp, src/mlpack/core/methods/linear_regression/linear_regression.cpp and src/mlpack/core/methods/linear_regression/linear_regression_main.cpp. I located all places in which such changes could be made in these methods using grep.

    Kindly review these and inform me about any changes I should make. I am new to open source and your reviews will help me a lot. Thanks.

    s: needs review c: binding c: methods t: added feature 
    opened by VedangAsgaonkar 2
  • input_labels parameter in preprocess_split function can't be empty

    Problem location

    https://www.mlpack.org/doc/mlpack-3.4.2/python_documentation.html#preprocess_split

    Description of problem

    d = preprocess_split(input=np.empty([0, 0]), input_labels=np.empty([0, 0], dtype=np.uint64), no_shuffle=False, seed=0, test_ratio=0.2, verbose=False) shows an error of

    error: Mat::row(): index out of bounds
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "mlpack/preprocess_split.pyx", line 167, in mlpack.preprocess_split.preprocess_split
    RuntimeError: Mat::row(): index out of bounds
    

    As per line 159, an empty array can't have input_labels.

    t: bug report c: binding 
    opened by dnabanita7 1
  • Randomized ReLU Activation Function

    Related to #2181

    References:

    • https://arxiv.org/pdf/1505.00853.pdf
    • https://pytorch.org/docs/stable/generated/torch.nn.RReLU.html#torch.nn.RReLU
    s: needs review c: methods t: added feature 
    opened by shubham1206agra 1
  • LayerNorm copy and move constructor created

    Related to #2625

    s: needs review t: added feature 
    opened by SuvarshaChennareddy 4
  • Implementation of Threshold Activation Fn Done

    Related to #2181

    Reference: https://pytorch.org/docs/stable/generated/torch.nn.Threshold.html#torch.nn.Threshold

    s: needs review c: methods t: added feature 
    opened by shubham1206agra 1
Releases (3.4.2)
  • 3.4.2(Oct 28, 2020)

    Released Oct. 28, 2020.

    • Added Mean Absolute Percentage Error.
    • Added Softmin activation function as layer in ann/layer.
    • Fix spurious ARMA_64BIT_WORD compilation warnings on 32-bit systems (#2665).
    Source code(tar.gz)
    Source code(zip)
  • 3.4.1(Sep 7, 2020)

    Released Sep. 7, 2020.

    • Fix incorrect parsing of required matrix/model parameters for command-line bindings (#2600).

    • Add manual type specification support to data::Load() and data::Save() (#2084, #2135, #2602).

    • Remove use of internal Armadillo functionality (#2596, #2601, #2602).

    Source code(tar.gz)
    Source code(zip)
  • 3.4.0(Sep 1, 2020)

    Released Sept. 1st, 2020.

    • Issue warnings when metrics produce NaNs in KFoldCV (#2595).

    • Added bindings for R during Google Summer of Code (#2556).

    • Added common striptype function for all bindings (#2556).

    • Refactored common utility function of bindings to bindings/util (#2556).

    • Renamed InformationGain to HoeffdingInformationGain in methods/hoeffding_trees/information_gain.hpp (#2556).

    • Added macro for changing stream of printing and warnings/errors (#2556).

    • Added Spatial Dropout layer (#2564).

    • Force CMake to show error when it didn't find Python/modules (#2568).

    • Refactor ProgramInfo() to separate out all the different information (#2558).

    • Add bindings for one-hot encoding (#2325).

    • Added Soft Actor-Critic to RL methods (#2487).

    • Added Categorical DQN to q_networks (#2454).

    • Added N-step DQN to q_networks (#2461).

    • Add Silhouette Score metric and Pairwise Distances (#2406).

    • Add Go bindings for some missed models (#2460).

    • Replace boost program_options dependency with CLI11 (#2459).

    • Additional functionality for the ARFF loader (#2486); use case sensitive categories (#2516).

    • Add bayesian_linear_regression binding for the command-line, Python, Julia, and Go. Also called "Bayesian Ridge", this is equivalent to a version of linear regression where the regularization parameter is automatically tuned (#2030).

    • Fix defeatist search for spill tree traversals (#2566, #1269).

    • Fix incremental training of logistic regression models (#2560).

    • Change default configuration of BUILD_PYTHON_BINDINGS to OFF (#2575).

    Source code(tar.gz)
    Source code(zip)
  • 3.3.2(Jun 18, 2020)

    Released June 18, 2020.

    • Added Noisy DQN to q_networks (#2446).

    • Add [preview release of] Go bindings (#1884).

    • Added Dueling DQN to q_networks, Noisy linear layer to ann/layer and Empty loss to ann/loss_functions (#2414).

    • Storing and adding accessor method for action in q_learning (#2413).

    • Added accessor methods for ANN layers (#2321).

    • Addition of Elliot activation function (#2268).

    • Add adaptive max pooling and adaptive mean pooling layers (#2195).

    • Add parameter to avoid shuffling of data in preprocess_split (#2293).

    • Add MatType parameter to LSHSearch, allowing sparse matrices to be used for search (#2395).

    • Documentation fixes to resolve Doxygen warnings and issues (#2400).

    • Add Load and Save of Sparse Matrix (#2344).

    • Add Intersection over Union (IoU) metric for bounding boxes (#2402).

    • Add Non Maximal Suppression (NMS) metric for bounding boxes (#2410).

    • Fix no_intercept and probability computation for linear SVM bindings (#2419).

    • Fix incorrect neighbors for k > 1 searches in approx_kfn binding, for the QDAFN algorithm (#2448).

    • Add RBF layer in ann module to make RBFN architecture (#2261).

    Source code(tar.gz)
    Source code(zip)
  • 3.3.1(Apr 30, 2020)

    Released April 29th, 2020.

    • Minor Julia and Python documentation fixes (#2373).

    • Updated terminal state and fixed bugs for Pendulum environment (#2354, #2369).

    • Added EliSH activation function (#2323).

    • Add L1 Loss function (#2203).

    • Pass CMAKE_CXX_FLAGS (compilation options) correctly to Python build (#2367).

    • Expose ensmallen Callbacks for sparseautoencoder (#2198).

    • Bugfix for LARS class causing invalid read (#2374).

    • Add serialization support from Julia; use mlpack.serialize() and mlpack.deserialize() to save and load from IOBuffers.

    Source code(tar.gz)
    Source code(zip)
  • 3.3.0(Apr 7, 2020)

    Released April 7th, 2020.

    • Templated return type of Forward function of loss functions (#2339).

    • Added R2 Score regression metric (#2323).

    • Added mean squared logarithmic error loss function for neural networks (#2210).

    • Added mean bias loss function for neural networks (#2210).

    • The DecisionStump class has been marked deprecated; use the DecisionTree class with NoRecursion=true or use ID3DecisionStump instead (#2099).

    • Added probabilities_file parameter to get the probabilities matrix of AdaBoost classifier (#2050).

    • Fix STB header search paths (#2104).

    • Add DISABLE_DOWNLOADS CMake configuration option (#2104).

    • Add padding layer in TransposedConvolutionLayer (#2082).

    • Fix pkgconfig generation on non-Linux systems (#2101).

    • Use log-space to represent HMM initial state and transition probabilities (#2081).

    • Add functions to access parameters of Convolution and AtrousConvolution layers (#1985).

    • Add Compute Error function in LARS regression and change Train function to return the computed error (#2139).

    • Add Julia bindings (#1949). Build settings can be controlled with the BUILD_JULIA_BINDINGS=(ON/OFF) and JULIA_EXECUTABLE=/path/to/julia CMake parameters.

    • CMake fix for finding STB include directory (#2145).

    • Add bindings for loading and saving images (#2019); mlpack_image_converter from the command-line, mlpack.image_converter() from Python.

    • Add normalization support for CF binding (#2136).

    • Add Mish activation function (#2158).

    • Update init_rules in AMF to allow users to merge two initialization rules (#2151).

    • Add GELU activation function (#2183).

    • Better error handling of eigendecompositions and Cholesky decompositions (#2088, #1840).

    • Add LiSHT activation function (#2182).

    • Add Valid and Same Padding for Transposed Convolution layer (#2163).

    • Add CELU activation function (#2191).

    • Add Log-Hyperbolic-Cosine Loss function (#2207).

    • Change neural network types to avoid unnecessary use of rvalue references (#2259).

    • Bump minimum Boost version to 1.58 (#2305).

    • Refactor STB support so HAS_STB macro is not needed when compiling against mlpack (#2312).

    • Add Hard Shrink Activation Function (#2186).

    • Add Soft Shrink Activation Function (#2174).

    • Add Hinge Embedding Loss Function (#2229).

    • Add Cosine Embedding Loss Function (#2209).

    • Add Margin Ranking Loss Function (#2264).

    • Bugfix for incorrect parameter vector sizes in logistic regression and softmax regression (#2359).

    Source code(tar.gz)
    Source code(zip)
  • 3.2.1(Nov 26, 2019)

    Released Oct. 1, 2019. (But I forgot to release it on Github; sorry about that.)

    • Enforce CMake version check for ensmallen #2032.
    • Fix CMake check for Armadillo version #2029.
    • Better handling of when STB is not installed #2033.
    • Fix Naive Bayes classifier computations in high dimensions #2022.
    Source code(tar.gz)
    Source code(zip)
  • 3.2.0(Sep 26, 2019)

    Released Sept. 25, 2019.

    • Fix occasionally-failing RADICAL test (#1924).

    • Fix gcc 9 OpenMP compilation issue (#1970).

    • Added support for loading and saving of images (#1903).

    • Add Multiple Pole Balancing Environment (#1901, #1951).

    • Added functionality for scaling of data (#1876); see the command-line binding mlpack_preprocess_scale or Python binding preprocess_scale().

    • Add new parameter maximum_depth to decision tree and random forest bindings (#1916).

    • Fix prediction output of softmax regression when test set accuracy is calculated (#1922).

    • Pendulum environment now checks for termination. All RL environments now have an option to terminate after a set number of time steps (no limit by default) (#1941).

    • Add support for probabilistic KDE (kernel density estimation) error bounds when using the Gaussian kernel (#1934).

    • Fix negative distances for cover tree computation (#1979).

    • Fix cover tree building when all pairwise distances are 0 (#1986).

    • Improve KDE pruning by reclaiming not used error tolerance (#1954, #1984).

    • Optimizations for sparse matrix accesses in z-score normalization for CF (#1989).

    • Add kmeans_max_iterations option to GMM training binding gmm_train_main.

    • Bump minimum Armadillo version to 8.400.0 due to ensmallen dependency requirement (#2015).

    Source code(tar.gz)
    Source code(zip)
  • mlpack-3.1.1(May 27, 2019)

    Released May 26, 2019.

    • Fix random forest bug for numerical-only data (#1887).
    • Significant speedups for random forest (#1887).
    • Random forest now has minimum_gain_split and subspace_dim parameters (#1887).
    • Decision tree parameter print_training_error deprecated in favor of print_training_accuracy.
    • output option changed to predictions for the adaboost and perceptron bindings. Old options are now deprecated and will be preserved until mlpack 4.0.0 (#1882).
    • Concatenated ReLU layer (#1843).
    • Accelerate NormalizeLabels function using hashing instead of linear search (see src/mlpack/core/data/normalize_labels_impl.hpp) (#1780).
    • Add ConfusionMatrix() function for checking performance of classifiers (#1798).
    • Install ensmallen headers when it is downloaded during build (#1900).
    Source code(tar.gz)
    Source code(zip)
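The NormalizeLabels change in 3.1.1 replaces a linear search with a hash lookup when mapping arbitrary label values to the contiguous indices 0..k-1. The standalone sketch below shows that idea using std::unordered_map. It is not mlpack's implementation. The name NormalizeLabelsSketch and the choice to number labels in order of first appearance are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical sketch (not mlpack's NormalizeLabels): map each distinct label
// value to an index in 0..k-1, numbered in order of first appearance.
// A hash map gives expected O(1) lookups, where a linear scan over the labels
// seen so far would cost O(k) per lookup.
template<typename T>
std::vector<std::size_t> NormalizeLabelsSketch(const std::vector<T>& labelsIn,
                                               std::vector<T>& mapping)
{
  std::unordered_map<T, std::size_t> index;  // label value -> normalized index
  std::vector<std::size_t> labels;
  labels.reserve(labelsIn.size());
  mapping.clear();  // mapping[i] will hold the original value for index i

  for (const T& value : labelsIn)
  {
    auto it = index.find(value);
    if (it == index.end())
    {
      // First occurrence of this label: assign the next unused index.
      it = index.emplace(value, mapping.size()).first;
      mapping.push_back(value);
    }
    labels.push_back(it->second);
  }
  return labels;
}
```

For example, the labels {7, 3, 7, 9, 3} normalize to {0, 1, 0, 2, 1}, and the resulting mapping is {7, 3, 9}.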
  • mlpack-3.1.0(Apr 26, 2019)

    Released April 25, 2019.

    • Add DiagonalGaussianDistribution and DiagonalGMM classes to speed up the diagonal covariance computation and deprecate DiagonalConstraint (#1666).

    • Add kernel density estimation (KDE) implementation with bindings to other languages (#1301).

    • Where relevant, all models with a Train() method now return a double value representing the goodness of fit (i.e. final objective value, error, etc.) (#1678).

    • Add implementation for linear support vector machine (see src/mlpack/methods/linear_svm).

    • Change DBSCAN to use PointSelectionPolicy and add OrderedPointSelection (#1625).

    • Residual block support (#1594).

    • Bidirectional RNN (#1626).

    • Dice loss layer (#1674, #1714) and hard sigmoid layer (#1776).

    • output option changed to predictions and output_probabilities to probabilities for Naive Bayes binding (mlpack_nbc/nbc()). Old options are now deprecated and will be preserved until mlpack 4.0.0 (#1616).

    • Add support for Diagonal GMMs to HMM code (#1658, #1666). This can provide large speedup when a diagonal GMM is acceptable as an emission probability distribution.

    • Python binding improvements: check parameter type (#1717), avoid copying Pandas dataframes (#1711), handle Pandas Series objects (#1700).

    Source code(tar.gz)
    Source code(zip)
  • mlpack-3.0.4(Nov 13, 2018)

    Released November 13, 2018.

    • Bump minimum CMake version to 3.3.2.
    • CMake fixes for Ninja generator by Marc Espie (#1550, #1537, #1523).
    • More efficient linear regression implementation (#1500).
    • Serialization fixes for neural networks (#1508, #1535).
    • Mean shift now allows single-point clusters (#1536).
    Source code(tar.gz)
    Source code(zip)
  • mlpack-3.0.3(Jul 29, 2018)

    Released July 27th, 2018.

    • Fix Visual Studio compilation issue (#1443).
    • Allow running local_coordinate_coding binding with no initial_dictionary parameter when input_model is not specified (#1457).
    • Make use of OpenMP optional via the CMake USE_OPENMP configuration variable (#1474).
    • Accelerate FNN training by 20-30% by avoiding redundant calculations (#1467).
    • Fix math::RandomSeed() usage in tests (#1462, #1440).
    • Generate better Python setup.py with documentation (#1460).
    Source code(tar.gz)
    Source code(zip)
  • mlpack-3.0.2(Jun 9, 2018)

    Released June 8th, 2018.

    • Documentation generation fixes for Python bindings (#1421).
    • Fix build error for man pages if command-line bindings are not being built (#1424).
    • Add shuffle parameter and Shuffle() method to KFoldCV (#1412). This will shuffle the data when the object is constructed, or when Shuffle() is called.
    • Added neural network layers: AtrousConvolution (#1390), Embedding (#1401), and LayerNorm (layer normalization) (#1389).
    • Add Pendulum environment for reinforcement learning (#1388) and update Mountain Car environment (#1394).
    Source code(tar.gz)
    Source code(zip)
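The shuffle option added to KFoldCV in 3.0.2 randomizes which points are assigned to which fold. The sketch below illustrates the general idea on plain index vectors: it permutes the indices with a seeded std::mt19937, then splits them into k nearly equal folds. ShuffledFolds is a hypothetical helper and is not mlpack's KFoldCV implementation. The way leftover points are distributed across folds is also an assumed choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical sketch (not mlpack's KFoldCV): shuffle point indices once,
// then cut them into k contiguous folds. Calling this again with a new seed
// plays the role of a fresh shuffle. Assumes k > 0.
std::vector<std::vector<std::size_t>> ShuffledFolds(std::size_t numPoints,
                                                    std::size_t k,
                                                    unsigned seed)
{
  std::vector<std::size_t> order(numPoints);
  std::iota(order.begin(), order.end(), std::size_t{0});  // 0 .. numPoints-1
  std::mt19937 rng(seed);
  std::shuffle(order.begin(), order.end(), rng);          // random permutation

  // The first (numPoints % k) folds each receive one extra point.
  std::vector<std::vector<std::size_t>> folds(k);
  const std::size_t base = numPoints / k;
  const std::size_t extra = numPoints % k;
  std::size_t start = 0;
  for (std::size_t f = 0; f < k; ++f)
  {
    const std::size_t size = base + (f < extra ? 1 : 0);
    const auto first = order.begin() + static_cast<std::ptrdiff_t>(start);
    folds[f].assign(first, first + static_cast<std::ptrdiff_t>(size));
    start += size;
  }
  return folds;
}
```

With 10 points and k = 3, the folds contain 4, 3, and 3 points. Together they cover every index exactly once.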
  • mlpack-3.0.1(May 11, 2018)

    Released May 10th, 2018.

    • Fix intermittently failing tests (#1387).
    • Add Big-Batch SGD (BBSGD) optimizer in src/mlpack/core/optimizers/bigbatch_sgd (#1131).
    • Fix simple compiler warnings (#1380, #1373).
    • Simplify NeighborSearch constructor and Train() overloads (#1378).
    • Add warning for OpenMP setting differences (#1358/#1382). When mlpack is compiled with OpenMP but another application linking against mlpack is not (or vice versa), a compilation warning will now be issued.
    • Restructured loss functions in src/mlpack/methods/ann/ (#1365).
    • Add environments for reinforcement learning tests (#1368, #1370, #1329).
    • Allow single outputs for multiple timestep inputs for recurrent neural networks (#1348).
    • Neural networks: add He and LeCun normal initializations (#1342), add FReLU and SELU activation functions (#1346, #1341), add alpha-dropout (#1349).
    Source code(tar.gz)
    Source code(zip)
  • mlpack-3.0.0(Mar 31, 2018)

    Released March 30th, 2018.

    • Speed and memory improvements for DBSCAN. --single_mode can now be used for situations where previously RAM usage was too high.
    • Bump minimum required version of Armadillo to 6.500.0.
    • Add automatically generated Python bindings. These have the same interface as the command-line programs.
    • Add deep learning infrastructure in src/mlpack/methods/ann/.
    • Add reinforcement learning infrastructure in src/mlpack/methods/reinforcement_learning/.
    • Add optimizers: AdaGrad, CMAES, CNE, FrankWolfe, GradientDescent, GridSearch, IQN, Katyusha, LineSearch, ParallelSGD, SARAH, SCD, SGDR, SMORMS3, SPALeRA, SVRG.
    • Add hyperparameter tuning infrastructure and cross-validation infrastructure in src/mlpack/core/cv/ and src/mlpack/core/hpt/.
    • Fix bug in mean shift.
    • Add random forests (see src/mlpack/methods/random_forest).
    • Numerous other bugfixes and testing improvements.
    • Add randomized Krylov SVD and Block Krylov SVD.
    Source code(tar.gz)
    Source code(zip)
  • mlpack-2.2.5(Aug 26, 2017)

  • mlpack-2.2.4(Jul 19, 2017)

    Released July 18th, 2017.

    • Speed and memory improvements for DBSCAN. --single_mode can now be used for situations where previously RAM usage was too high.
    • Fix bug in CF causing incorrect recommendations.
    Source code(tar.gz)
    Source code(zip)
  • mlpack-2.2.3(May 24, 2017)

  • mlpack-2.2.2(May 5, 2017)

    Released May 4th, 2017.

    • Install backwards-compatibility mlpack_allknn and mlpack_allkfn programs; note they are deprecated and will be removed in mlpack 3.0.0 (#992).
    • Fix RStarTree bug that surfaced on OS X only (#964).
    • Small fixes for MiniBatchSGD and SGD and tests.
    Source code(tar.gz)
    Source code(zip)
  • mlpack-2.2.1(Apr 13, 2017)

  • mlpack-2.2.0(Mar 21, 2017)

    Released Mar. 21st, 2017.

    • Bugfix for mlpack_knn program (#816).
    • Add decision tree implementation in methods/decision_tree/. This is very similar to a C4.5 tree learner.
    • Add DBSCAN implementation in methods/dbscan/.
    • Add support for multidimensional discrete distributions (#810, #830).
    • Better output for Log::Debug/Log::Info/Log::Warn/Log::Fatal for Armadillo objects (#895, #928).
    • Refactor categorical CSV loading with boost::spirit for faster loading (#681).
    Source code(tar.gz)
    Source code(zip)
  • mlpack-2.1.1(Dec 22, 2016)

    Released Dec. 22nd, 2016.

    • HMMs now use random initialization; this should fix some convergence issues (#828).
    • HMMs now initialize emissions according to the distribution of observations (#833).
    • Minor fix for formatted output (#814).
    • Fix DecisionStump to properly work with any input type.
    Source code(tar.gz)
    Source code(zip)
  • mlpack-2.1.0(Oct 31, 2016)

    Released Oct. 31st, 2016.

    • Fixed CoverTree to properly handle single-point datasets.
    • Fixed a bug in CosineTree (and thus QUIC-SVD) that caused split failures for some datasets (#717).
    • Added mlpack_preprocess_describe program, which can be used to print statistics on a given dataset (#742).
    • Fix prioritized recursion for k-furthest-neighbor search (mlpack_kfn and the KFN class), leading to orders-of-magnitude speedups in some cases.
    • Bump minimum required version of Armadillo to 4.200.0.
    • Added simple Gradient Descent optimizer, found in src/mlpack/core/optimizers/gradient_descent/ (#792).
    • Added approximate furthest neighbor search algorithms QDAFN and DrusillaSelect in src/mlpack/methods/approx_kfn/, with command-line program mlpack_approx_kfn.
    Source code(tar.gz)
    Source code(zip)
  • mlpack-2.0.3(Jul 21, 2016)

    Released July 21st, 2016.

    • Standardize some parameter names for programs (old names are kept for backward compatibility, but warnings will now be issued).
    • RectangleTree optimizations (#721).
    • Fix memory leak in NeighborSearch (#731).
    • Documentation fix for k-means tutorial (#730).
    • Fix TreeTraits for BallTree (#727).
    • Fix incorrect parameter checks for some command-line programs.
    • Fix error in HMM training with probabilities for each point (#636).
    Source code(tar.gz)
    Source code(zip)
  • mlpack-2.0.2(Jun 20, 2016)

    Released June 20th, 2016.

    • Added the function LSHSearch::Projections(), which returns an arma::cube with each projection table in a slice (#663). Instead of Projection(i), you should now use Projections().slice(i).
    • A new constructor has been added to LSHSearch that creates objects using projection tables provided in an arma::cube (#663).
    • LSHSearch projection tables refactored for speed (#675).
    • Handle zero-variance dimensions in DET (#515).
    • Add MiniBatchSGD optimizer (src/mlpack/core/optimizers/minibatch_sgd/) and allow its use in mlpack_logistic_regression and mlpack_nca programs.
    • Add better backtrace support from Grzegorz Krajewski for Log::Fatal messages when compiled with debugging and profiling symbols. This requires libbfd and libdl to be present during compilation.
    • CosineTree test fix from Mikhail Lozhnikov (#358).
    • Fixed HMM initial state estimation (#600).
    • Changed versioning macros __MLPACK_VERSION_MAJOR, __MLPACK_VERSION_MINOR, and __MLPACK_VERSION_PATCH to MLPACK_VERSION_MAJOR, MLPACK_VERSION_MINOR, and MLPACK_VERSION_PATCH. The old names will remain in place until mlpack 3.0.0.
    • Renamed mlpack_allknn, mlpack_allkfn, and mlpack_allkrann to mlpack_knn, mlpack_kfn, and mlpack_krann. The mlpack_allknn, mlpack_allkfn, and mlpack_allkrann programs will remain as copies until mlpack 3.0.0.
    • Add --random_initialization option to mlpack_hmm_train, for use when no labels are provided.
    • Add --kill_empty_clusters option to mlpack_kmeans and KillEmptyClusters policy for the KMeans class (#595, #596).
    Source code(tar.gz)
    Source code(zip)
  • mlpack-2.0.1(Mar 3, 2016)

    Released Feb. 4th, 2016.

    • Fix CMake to properly detect when MKL is being used with Armadillo.
    • Minor parameter handling fixes to mlpack_logistic_regression (#504, #505).
    • Properly install arma_config.hpp.
    • Memory handling fixes for Hoeffding tree code.
    • Add functions that allow changing training-time parameters to HoeffdingTree class.
    • Fix infinite loop in sparse coding test.
    • Documentation spelling fixes (#501).
    • Properly handle covariances for Gaussians with large condition number (#496), preventing GMMs from filling with NaNs during training (and also HMMs that use GMMs).
    • CMake fixes for finding LAPACK and BLAS as Armadillo dependencies when ATLAS is used.
    • CMake fix for projects using mlpack's CMake configuration from elsewhere (#512).
    Source code(tar.gz)
    Source code(zip)
  • mlpack-2.0.0(Dec 24, 2015)

    Released Dec. 23rd, 2015.

    • Removed overclustering support from k-means because it is not well-tested, may be buggy, and is (I think) unused. If this was support you were using, open a bug or get in touch with us; it would not be hard for us to reimplement it.
    • Refactored KMeans to allow different types of Lloyd iterations.
    • Added implementations of k-means: Elkan's algorithm, Hamerly's algorithm, Pelleg-Moore's algorithm, and the DTNN (dual-tree nearest neighbor) algorithm.
    • Significant acceleration of LRSDP via the use of accu(a % b) instead of trace(a * b).
    • Added MatrixCompletion class (matrix_completion), which performs nuclear norm minimization to fill unknown values of an input matrix.
    • No more dependence on Boost.Random; now we use C++11 STL random support.
    • Add softmax regression, contributed by Siddharth Agrawal and QiaoAn Chen.
    • Changed NeighborSearch, RangeSearch, FastMKS, LSH, and RASearch API; these classes now take the query sets in the Search() method, instead of in the constructor.
    • Use OpenMP, if available. For now OpenMP support is only available in the DET training code.
    • Add support for predicting new test point values to LARS and the command-line 'lars' program.
    • Add serialization support for Perceptron and LogisticRegression.
    • Refactor SoftmaxRegression to predict into an arma::Row<size_t> object, and add a softmax_regression program.
    • Refactor LSH to allow loading and saving of models.
    • ToString() is removed entirely (#487).
    • Add --input_model_file and --output_model_file options to appropriate machine learning algorithms.
    • Rename all executables to start with an "mlpack" prefix (#229).
    Source code(tar.gz)
    Source code(zip)
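The LRSDP speedup in 2.0.0 relies on an identity. For square matrices, trace(A * B) = sum over i, j of A(i,j) B(j,i). This equals accu(A % B) = sum over i, j of A(i,j) B(i,j) whenever B is symmetric, as the objective and constraint matrices of a semidefinite program conventionally are. The element-wise form needs O(n^2) operations, whereas forming the full product needs O(n^3). The standalone sketch below uses plain std::vector matrices instead of Armadillo, and its function names are hypothetical. It computes both quantities and shows that they agree for a symmetric B but not in general.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// trace(A * B) computed the expensive way: form the full n x n product
// (O(n^3) multiply-adds), then sum its diagonal.
double TraceOfProduct(const Matrix& a, const Matrix& b)
{
  const std::size_t n = a.size();
  Matrix product(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t k = 0; k < n; ++k)
        product[i][j] += a[i][k] * b[k][j];

  double trace = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    trace += product[i][i];
  return trace;
}

// Sum of the element-wise (Schur) product, the analogue of Armadillo's
// accu(a % b): O(n^2) multiply-adds.
double AccuOfSchurProduct(const Matrix& a, const Matrix& b)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t j = 0; j < a[i].size(); ++j)
      sum += a[i][j] * b[i][j];
  return sum;
}
```

With A = {{1, 2}, {3, 4}} and the symmetric B = {{5, 6}, {6, 7}}, both functions return 63. With the non-symmetric B = {{5, 6}, {8, 7}}, they return 67 and 69 respectively. The shortcut is therefore valid only because of symmetry.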
  • mlpack-1.0.12(Jan 7, 2015)

  • mlpack-1.0.0(Dec 22, 2014)

  • mlpack-1.0.1(Dec 22, 2014)

    Released March 3rd, 2012.

    • Added kernel principal components analysis (kernel PCA), found in src/mlpack/methods/kernel_pca/ (#47).
    • Fix for Lovasz-Theta AugLagrangian tests (#188).
    • Fixes for allknn output (#191, #192).
    • Added range search executable (#198).
    • Adapted citations in documentation to BibTeX; no citations in -h output (#201).
    • Stop use of 'const char*' and prefer 'std::string' (#183).
    • Support seeds for random numbers (#182).
    Source code(tar.gz)
    Source code(zip)
Owner
mlpack
a scalable C++ machine learning library