machine learning with logical rules in Python

Overview

skope-rules

Skope-rules is a Python machine learning module built on top of scikit-learn and distributed under the 3-Clause BSD license.

Skope-rules aims at learning logical, interpretable rules for "scoping" a target class, i.e. detecting with high precision instances of this class.

Skope-rules is a trade-off between the interpretability of a decision tree and the modeling power of a random forest.

See the AUTHORS.rst file for a list of contributors.

Installation

You can install the latest release with pip:

pip install skope-rules

Quick Start

SkopeRules can be used to describe classes with logical rules:

from sklearn.datasets import load_iris
from skrules import SkopeRules

dataset = load_iris()
feature_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
clf = SkopeRules(max_depth_duplication=2,
                 n_estimators=30,
                 precision_min=0.3,
                 recall_min=0.1,
                 feature_names=feature_names)

for idx, species in enumerate(dataset.target_names):
    X, y = dataset.data, dataset.target
    clf.fit(X, y == idx)
    rules = clf.rules_[0:3]
    print("Rules for iris", species)
    for rule in rules:
        print(rule)
    print()
    print(20*'=')
    print()

SkopeRules can also be used as a predictor if you use the "score_top_rules" method:

# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release.
from sklearn.datasets import load_boston
from sklearn.metrics import precision_recall_curve
from matplotlib import pyplot as plt
from skrules import SkopeRules

dataset = load_boston()
clf = SkopeRules(max_depth_duplication=None,
                 n_estimators=30,
                 precision_min=0.2,
                 recall_min=0.01,
                 feature_names=dataset.feature_names)

X, y = dataset.data, dataset.target > 25
X_train, y_train = X[:len(y)//2], y[:len(y)//2]
X_test, y_test = X[len(y)//2:], y[len(y)//2:]
clf.fit(X_train, y_train)
y_score = clf.score_top_rules(X_test) # Get a risk score for each test example
precision, recall, _ = precision_recall_curve(y_test, y_score)
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision Recall curve')
plt.show()

For more examples and use cases please check our documentation. You can also check the demonstration notebooks.

Links with existing literature

The main advantage of decision rules is that they offer interpretable models. The problem of generating such rules has been widely studied in machine learning; see e.g. RuleFit [1], Slipper [2], LRI [3], MLRules [4].

A decision rule is a logical expression of the form "IF conditions THEN response". In a binary classification setting, if an instance satisfies the conditions of the rule, it is assigned to one of the two classes. If it does not satisfy the conditions, it remains unassigned.
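To make the definition concrete, here is a minimal sketch of a single decision rule applied with NumPy. The rule, its thresholds and the toy data are made up for illustration; this is not how the package evaluates rules internally.

```python
import numpy as np

# Toy data: each row is (petal_length, petal_width); values are made up.
X = np.array([[1.4, 0.2],
              [4.7, 1.4],
              [5.9, 2.1],
              [1.3, 0.3]])


def apply_rule(X):
    """IF petal_length <= 2.5 AND petal_width <= 0.8 THEN target class."""
    conditions = (X[:, 0] <= 2.5) & (X[:, 1] <= 0.8)
    # 1 means "assigned to the target class", -1 means "left unassigned".
    return np.where(conditions, 1, -1)


assignments = apply_rule(X)
print(assignments)
```

Instances that do not satisfy the conditions are marked as unassigned rather than pushed into the other class, which is what distinguishes a single rule from a full classifier.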

  1. In [2, 3, 4], rule induction is done by considering each single decision rule as a base classifier in an ensemble, which is built by greedily minimizing some loss function.
  2. In [1], rules are extracted from an ensemble of trees; a weighted combination of these rules is then built by solving an L1-regularized optimization problem over the weights, as described in [5].

In this package, we use the second approach. Rules are extracted from a tree ensemble, which allows us to take advantage of existing fast algorithms (such as bagged decision trees or gradient boosting) to produce the ensemble. Rules that are duplicated or too similar are then removed, based on a similarity threshold on their supports. The main goal of this package is to provide rules that satisfy precision and recall conditions. It still implements a score (decision_function) method, but this method does not solve the L1-regularized optimization problem as in [1]. Instead, weights are simply proportional to the out-of-bag (OOB) precision of each rule.
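The precision-weighted scoring described above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the activation matrix and the precision values are hypothetical, and the weights are taken to be equal to the precisions.

```python
import numpy as np

# Hypothetical rule activations: rows are samples, columns are rules.
# fires[i, j] is True when sample i satisfies rule j.
fires = np.array([[True,  False, True],
                  [False, False, True],
                  [False, False, False]])

# Hypothetical out-of-bag precision of each rule, used directly as its weight.
oob_precision = np.array([0.9, 0.6, 0.5])

# Each sample's score is the sum of the weights of the rules it satisfies.
scores = fires.astype(float) @ oob_precision
print(scores)
```

A sample matched by several precise rules gets a high score, and a sample matched by no rule gets a score of zero, without any optimization over the weights.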

This package also offers convenient methods to compute predictions with the k most precise rules (see the score_top_rules() and predict_top_rules() methods).
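The idea of predicting with only the k most precise rules can be sketched like this. The helper predict_with_top_k and its inputs are hypothetical, and the library's own methods may differ in their details.

```python
import numpy as np

# Hypothetical rule activations (rows: samples, columns: rules).
fires = np.array([[False, True,  False],
                  [False, False, True],
                  [False, False, False]])
# Hypothetical precision of each rule.
precision = np.array([0.5, 0.9, 0.6])


def predict_with_top_k(fires, precision, k):
    """Predict 1 when at least one of the k most precise rules fires."""
    top_k = np.argsort(precision)[::-1][:k]  # indices of the k best rules
    return fires[:, top_k].any(axis=1).astype(int)


pred_k1 = predict_with_top_k(fires, precision, k=1)
pred_k2 = predict_with_top_k(fires, precision, k=2)
print(pred_k1, pred_k2)
```

Increasing k lets less precise rules contribute, which typically trades precision for recall.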

[1] Friedman and Popescu, Predictive learning via rule ensembles, Technical Report, 2005.

[2] Cohen and Singer, A simple, fast, and effective rule learner, National Conference on Artificial Intelligence, 1999.

[3] Weiss and Indurkhya, Lightweight rule induction, ICML, 2000.

[4] Dembczyński, Kotłowski and Słowiński, Maximum Likelihood Rule Ensembles, ICML, 2008.

[5] Friedman and Popescu, Gradient directed regularization, Technical Report, 2004.

Dependencies

skope-rules requires:

  • Python (>= 2.7 or >= 3.3)
  • NumPy (>= 1.10.4)
  • SciPy (>= 0.17.0)
  • Pandas (>= 0.18.1)
  • Scikit-Learn (>= 0.17.1)

For running the examples Matplotlib >= 1.1.1 is required.

Documentation

You can access the full project documentation here

You can also check the notebooks/ folder, which contains some usage examples.

Comments
  • TerminatedWorkerError

    I keep running into a TerminatedWorkerError when running clf.fit with skope rules. I seem to have ample memory so I'm unsure what's going on. Any potential ideas?

    Traceback (most recent call last):
      File "experiment.py", line 171, in <module>
        result = process(topic)
      File "experiment.py", line 95, in process
        clf.fit(features, training_data_labels)
      File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/skrules/skope_rules.py", line 312, in fit
        clf.fit(X, y)
      File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/ensemble/bagging.py", line 244, in fit
        return self._fit(X, y, self.max_samples, sample_weight=sample_weight)
      File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/ensemble/bagging.py", line 378, in _fit
        for i in range(n_jobs))
      File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/externals/joblib/parallel.py", line 930, in __call__
        self.retrieve()
      File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/externals/joblib/parallel.py", line 833, in retrieve
        self._output.extend(job.get(timeout=self.timeout))
      File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 521, in wrap_future_result
        return future.result(timeout=timeout)
      File "/usr/lib/python3.7/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
        raise self._exception
    sklearn.externals.joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGKILL(-9)}
    
    opened by AlJohri 37
  • Remove unnecessary checks for numpy/scipy in setup.py.

    These deps are listed in install_requires, so pip will install them.

    Addresses issue #19.

    Interesting project, thanks for your dev efforts folks :-)

    opened by timstaley 7
  • Skope Rules should accept any kind of feature name

    SkopeRules uses the pandas.eval method to evaluate semantic rules. This leads to an error when feature names contain special characters (e.g. (,)=- ). For example:

    from sklearn.datasets import load_iris
    from skrules import SkopeRules
    dataset = load_iris()
    
    X, y, features_names = dataset.data, dataset.target, dataset.feature_names
    y = (y == 0)  # Predict the first species vs. all others
    clf = SkopeRules(max_depth_duplication=2,
                     n_estimators=30,
                     precision_min=0.3,
                     recall_min=0.1,
                     feature_names=features_names)
    clf.fit(X, y)
    

    This leads to the following error:

    Traceback (most recent call last):
      File "main.py", line 20, in <module>
        clf.fit(X, y)
      File "/usr/local/lib/python3.6/site-packages/skrules/skope_rules.py", line 350, in fit
        for r in set(rules_from_tree)]
      File "/usr/local/lib/python3.6/site-packages/skrules/skope_rules.py", line 350, in <listcomp>
        for r in set(rules_from_tree)]
      File "/usr/local/lib/python3.6/site-packages/skrules/skope_rules.py", line 600, in _eval_rule_perf
        detected_index = list(X.query(rule).index)
      File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 2297, in query
        res = self.eval(expr, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 2366, in eval
        return _eval(expr, inplace=inplace, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/pandas/core/computation/eval.py", line 290, in eval
        truediv=truediv)
      File "/usr/local/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 732, in __init__
        self.terms = self.parse()
      File "/usr/local/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 749, in parse
        return self._visitor.visit(self.expr)
      File "/usr/local/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 310, in visit
        node = ast.fix_missing_locations(ast.parse(clean))
      File "/usr/local/Cellar/python3/3.6.4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ast.py", line 35, in parse
        return compile(source, filename, mode, PyCF_ONLY_AST)
      File "<unknown>", line 1
        petal length (cm )<=2.5999999046325684
    

    Skope Rules should accept any kind of feature name. This means we have to transform the feature names for computation and transform them back at the end.
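The transform-and-restore idea in this report can be sketched as follows. This is a minimal illustration, not the fix that was merged: the placeholder names (feature_0, feature_1) and the rule thresholds are hypothetical.

```python
import pandas as pd

# Feature names containing spaces and parentheses, which DataFrame.query cannot parse.
feature_names = ["petal length (cm)", "petal width (cm)"]
X = pd.DataFrame([[1.4, 0.2], [4.7, 1.4]], columns=feature_names)

# Hypothetical placeholder names that are valid Python identifiers.
safe_names = ["feature_%d" % i for i in range(len(feature_names))]
to_safe = dict(zip(feature_names, safe_names))
to_original = dict(zip(safe_names, feature_names))

# Evaluate the rule against the renamed columns.
safe_rule = "feature_0 <= 2.6 and feature_1 <= 0.8"
matched = X.rename(columns=to_safe).query(safe_rule).index.tolist()

# Translate the rule back for display. Longer placeholders are replaced
# first so that feature_1 never clobbers the start of feature_10.
readable_rule = safe_rule
for safe in sorted(safe_names, key=len, reverse=True):
    readable_rule = readable_rule.replace(safe, to_original[safe])

print(matched)
print(readable_rule)
```

Keeping the internal names separate from the user-facing names means the query engine only ever sees identifiers it can parse.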

    opened by floriangardin 3
  • Fix import error in modern Python

    collections.Iterable alias was removed in Python 3.10 and typing.Iterable alias is marked as deprecated; fallback to explicit import from collections.abc.
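The fallback described in this pull request can be sketched like this. It is a generic compatibility pattern and not necessarily the exact change that was proposed.

```python
from collections import Counter

try:
    # Location of the abstract base classes since Python 3.3.
    from collections.abc import Iterable
except ImportError:
    # Fallback for Python 2.7, where collections.abc does not exist.
    from collections import Iterable

is_list_iterable = isinstance([1, 2, 3], Iterable)
is_int_iterable = isinstance(3, Iterable)
print(is_list_iterable, is_int_iterable)
```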

    opened by patrick-nicholson-czi 1
  • ImportError: cannot import name 'Iterable' from 'collections'

    Python 3.10
    skope-rules==1.0.1
    

    Error

    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    Input In [1], in <module>
         15 from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
         16 from interpret.glassbox import ExplainableBoostingClassifier
    ---> 17 from skrules import SkopeRules
    
    File /t/pyenv/versions/py-default/lib/python3.10/site-packages/skrules/__init__.py:1, in <module>
    ----> 1 from .skope_rules import SkopeRules
          2 from .rule import Rule, replace_feature_name
          4 __all__ = ['SkopeRules', 'Rule']
    
    File /t/pyenv/versions/py-default/lib/python3.10/site-packages/skrules/skope_rules.py:2, in <module>
          1 import numpy as np
    ----> 2 from collections import Counter, Iterable
          3 import pandas
          4 import numbers
    
    ImportError: cannot import name 'Iterable' from 'collections' (/t/pyenv/versions/3.10.2/lib/python3.10/collections/__init__.py)
    
    
    opened by elcolie 1
  • Fix/update tests

    Description

    This PR contains some fixes to get tests working and build passing, primarily around updating tests and imports to handle deprecation of various sklearn testing functions and other imports.

    Instead of pinning a new sklearn version, have tried to maintain compatibility with a bunch of try-except blocks, but would be happy to hear thoughts on this approach. May also be worthwhile to add different sklearn versions in the travis CI build.

    opened by AndrewTanQB 1
  • issue in mask indexing

    Hi, thank you for sharing this great package.

    However, I think I may have found a mistake in the mask indexing.

    mask = ~samples

    samples is a NumPy array, and applying ~ to it gives -(value+1) for each element.

    For example:

    samples = np.array([1,2,3,4])
    ~samples  # [-2, -3, -4, -5]

    Please check this issue.
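The behaviour reported here can be reproduced with a short sketch, together with one possible way to take a complement through a boolean mask. The variable names and the sample count are illustrative assumptions.

```python
import numpy as np

# On an integer array, ~ is a bitwise NOT: each value v becomes -(v + 1).
indices = np.array([1, 2, 3, 4])
inverted = ~indices  # negative indices, which still select rows counted from the end

# On a boolean array, ~ is a logical NOT, which gives the complement.
n_samples = 6
in_sample = np.zeros(n_samples, dtype=bool)
in_sample[indices] = True
out_of_sample = np.flatnonzero(~in_sample)

print(inverted)
print(out_of_sample)
```

Because negative integers are valid NumPy indices, the integer version fails silently: it returns as many rows as were sampled instead of raising an error.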

    Thanks!

    opened by stat17-hb 1
  • Release new version to pypi.org?

    There are a number of useful commits on the master branch, e.g. https://github.com/scikit-learn-contrib/skope-rules/pull/24.

    It's been more than 1.5 years since the last release. Would it be possible for you to upload a new package to pypi.org?

    opened by ecederstrand 1
  • Any variable name can be used in "feature_names"

    Now any variable name can be used in the "feature_names" list parameter of Skope Rules. I decoupled the feature names from the internal queries logic.

    opened by floriangardin 1
  • conda-forge package

    It would be nice to add a skope-rules package to conda-forge https://conda-forge.org/ (in addition to pypi)

    P.S. You can use grayskull https://github.com/conda-incubator/grayskull to generate a boilerplate for the conda recipe.

    opened by candalfigomoro 2
  • cannot import name 'ScopeRules' from 'skrules'

    Hi!

    The package import statement, which is clearly described in the package readme, does not work.

    six is imported. What should I do to make the package work?

    opened by avraam-inside 1
  • Questions about how to use and interpret rules?

    1. Can SkopeRules be used for multiclass classification, or only for binary classification?

    2. How do I interpret the outputted decision rules? Do the top-k rules in the example notebook correspond to the rules that best classify the test data, ordered in descending order by precision? If I want to classify new test data, do I consider the top-1 rule, the majority vote from the top-k rules, or some other approach?

    3. If I want to understand the underlying method and how rules are computed, is Predictive Learning via Rule Ensembles by Friedman and Popescu the closest work?

    opened by preethiseshadri518 0
  • Not compatible with sklearn v1?

    Minimal example:

    >>> import sklearn
    >>> sklearn.__version__
    1.0.1
    >>> import skrules
    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-3-195b491d5645> in <module>
    ----> 1 import skrules
    
    ~/.virtualenvs/risk-modeling/lib/python3.9/site-packages/skrules/__init__.py in <module>
    ----> 1 from .skope_rules import SkopeRules
          2 from .rule import Rule, replace_feature_name
          3 
          4 __all__ = ['SkopeRules', 'Rule']
    
    ~/.virtualenvs/risk-modeling/lib/python3.9/site-packages/skrules/skope_rules.py in <module>
         10 from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
         11 from sklearn.ensemble import BaggingClassifier, BaggingRegressor
    ---> 12 from sklearn.externals import six
         13 from sklearn.tree import _tree
         14 
    
    ImportError: cannot import name 'six' from 'sklearn.externals' (/home/mwintner/.virtualenvs/risk-modeling/lib/python3.9/site-packages/sklearn/externals/__init__.py)
    

    According to some stackoverflow sources like this one, six is not in sklearn.externals beyond sklearn v0.23.

    opened by mwintner-fn 1
  • The oob score

    I think the oob score computed in the fit function is wrong.

    The authors get the OOB sample indices with "mask = ~samples", and then apply X[mask, :] to get the OOB samples. I tested this and found that samples and X[mask, :] share many elements, and that the training samples and the masked samples have the same length. For example, if we have 100 samples in total and 80 are used to train the model, then there should be 100-80=20 OOB samples (ignoring replacement).

    I also looked at how random forest samples the OOB indices, and found the following code:

    random_instance = check_random_state(random_state)
    sample_indices = random_instance.randint(0, samples, max_samples)  # get the indices of training samples
    sample_counts = np.bincount(sample_indices, minlength=len(samples))
    unsampled_mask = sample_counts == 0
    indices_range = np.arange(len(samples))
    unsampled_indices = indices_range[unsampled_mask]  # get the indices of oob samples

    unsampled_indices then contains the true OOB sample indices.
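A self-contained version of the bincount approach quoted above might look like the sketch below. The sample count and the seed are arbitrary assumptions, and this is not code taken from either library.

```python
import numpy as np

rng = np.random.RandomState(0)  # arbitrary seed, for reproducibility only
n_samples = 10

# Bootstrap draw: sample n_samples indices with replacement.
sample_indices = rng.randint(0, n_samples, n_samples)

# Count how often each index was drawn; out-of-bag indices were never drawn.
sample_counts = np.bincount(sample_indices, minlength=n_samples)
oob_indices = np.flatnonzero(sample_counts == 0)

print(sorted(set(sample_indices.tolist())))
print(oob_indices.tolist())
```

The drawn indices and the out-of-bag indices are disjoint and together cover every row, which is the property the mask in the report fails to guarantee.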

    opened by wjj5881005 0
Releases(v1.0.1)
Owner
scikit-learn compatible projects