The implementation of FOLD-R++ algorithm

Overview

FOLD-R-PP

The implementation of FOLD-R++ algorithm. The target of FOLD-R++ algorithm is to learn an answer set program for a classification task.

Installation

Prerequisites

FOLD-R++ is developed with only python3. Numpy is the only dependency:

python3 -m pip install numpy

Instruction

Data preparation

The FOLD-R++ algorithm takes tabular data as input, the first line for the tabular data should be the feature names of each column. The FOLD-R++ does not need encoding for training. It can deal with numeric, categorical, and even mixed type features (one column contains categorical and numeric values) directly. But, the numeric features should be specified before loading data, otherwise they would be dealt like categorical features (only literals with = and != would be generated).

There are many UCI datasets can be found in the data directory, and the code pieces of data preparation should be added to datasets.py.

For example, the UCI breast-w dataset can be loaded with the following code:

columns = ['clump_thickness', 'cell_size_uniformity', 'cell_shape_uniformity', 'marginal_adhesion',
'single_epi_cell_size', 'bare_nuclei', 'bland_chromatin', 'normal_nucleoli', 'mitoses']
nums = columns
data, num_idx, columns = load_data('data/breastw/breastw.csv', attrs=columns, label=['label'], numerics=nums, pos='benign')

columns lists all the features needed, nums lists all the numeric features, label implies the feature name of the label, pos indicates the positive value of the label.

Training

The FOLD-R++ algorithm generates an explainable model that is represented with an answer set program for classification tasks. Here's an training example for breast-w dataset:

X_train, Y_train = split_xy(data_train)
X_pos, X_neg = split_X_by_Y(X_train, Y_train)
rules1 = foldrpp(X_pos, X_neg, [])

We have got a rule set rules1 in a nested intermediate representation. Flatten and decode the nested rules to answer set program:

fr1 = flatten(rules1)
rule_set = decode_rules(fr1, attrs)
for r in rule_set:
    print(r)

The training process can be started with: python3 main.py

An answer set program that is compatible with s(CASP) is generated as below.

% breastw dataset (699, 10).
% the answer set program generated by foldr++:

label(X,'benign'):- bare_nuclei(X,'?').
label(X,'benign'):- bland_chromatin(X,N6), N6=<4.0,
		    clump_thickness(X,N0), N0=<6.0,  
                    bare_nuclei(X,N5), N5=<1.0, not ab7(X).   
label(X,'benign'):- cell_size_uniformity(X,N1), N1=<2.0,
		    not ab3(X), not ab5(X), not ab6(X).  
label(X,'benign'):- cell_size_uniformity(X,N1), N1=<4.0,
		    bare_nuclei(X,N5), N5=<3.0,
		    clump_thickness(X,N0), N0=<3.0, not ab8(X).  
ab2(X):- clump_thickness(X,N0), N0=<1.0.  
ab3(X):- bare_nuclei(X,N5), N5>5.0, not ab2(X).  
ab4(X):- cell_shape_uniformity(X,N2), N2=<1.0.  
ab5(X):- clump_thickness(X,N0), N0>7.0, not ab4(X).  
ab6(X):- bare_nuclei(X,N5), N5>4.0, single_epi_cell_size(X,N4), N4=<1.0.  
ab7(X):- marginal_adhesion(X,N3), N3>4.0.  
ab8(X):- marginal_adhesion(X,N3), N3>6.0.  

% foldr++ costs:  0:00:00.027710  post: 0:00:00.000127
% acc 0.95 p 0.96 r 0.9697 f1 0.9648 

Testing in Python

The testing data X_test, a set of testing data, can be predicted with the predict function in Python.

Y_test_hat = predict(rules1, X_test)

The classify function can also be used to classify a single data.

y_test_hat = classify(rules1, x_test)

Justification by using s(CASP)

Classification and justification can be conducted with s(CASP), but the data also need to be converted into predicate format. The decode_test_data function can be used for generating predicates for testing data.

data_pred = decode_test_data(data_test, attrs)
for p in data_pred:
    print(p)

Here is an example of generated testing data predicates along with the answer set program for acute dataset:

% acute dataset (120, 7) 
% the answer set program generated by foldr++:

ab2(X):- a5(X,'no'), a1(X,N0), N0>37.9.
label(X,'yes'):- not a4(X,'no'), not ab2(X).

% foldr++ costs:  0:00:00.001990  post: 0:00:00.000040
% acc 1.0 p 1.0 r 1.0 f1 1.0 

id(1).
a1(1,37.2).
a2(1,'no').
a3(1,'yes').
a4(1,'no').
a5(1,'no').
a6(1,'no').

id(2).
a1(2,38.1).
a2(2,'no').
a3(2,'yes').
a4(2,'yes').
a5(2,'no').
a6(2,'yes').

id(3).
a1(3,37.5).
a2(3,'no').
a3(3,'no').
a4(3,'yes').
a5(3,'yes').
a6(3,'yes').

s(CASP)

All the resources of s(CASP) can be found at https://gitlab.software.imdea.org/ciao-lang/sCASP.

Citation

@misc{wang2021foldr,
      title={FOLD-R++: A Toolset for Automated Inductive Learning of Default Theories from Mixed Data}, 
      author={Huaduo Wang and Gopal Gupta},
      year={2021},
      eprint={2110.07843},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
Minimal deep learning library written from scratch in Python, using NumPy/CuPy.

SmallPebble Project status: experimental, unstable. SmallPebble is a minimal/toy automatic differentiation/deep learning library written from scratch

Sidney Radcliffe 92 Dec 30, 2022
TensorFlow implementation of Style Transfer Generative Adversarial Networks: Learning to Play Chess Differently.

Adversarial Chess TensorFlow implementation of Style Transfer Generative Adversarial Networks: Learning to Play Chess Differently. Requirements To run

Muthu Chidambaram 30 Sep 07, 2021
KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control

KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control Tomas Jakab, Richard Tucker, Ameesh Makadia, Jiajun Wu, Noah Snavely, Angjoo Ka

Tomas Jakab 87 Nov 30, 2022
Code for the Higgs Boson Machine Learning Challenge organised by CERN & EPFL

A method to solve the Higgs boson challenge using Least Squares - Novae This project is the Project 1 of EPFL CS-433 Machine Learning. The project is

Giacomo Orsi 1 Nov 09, 2021
CARL provides highly configurable contextual extensions to several well-known RL environments.

CARL (context adaptive RL) provides highly configurable contextual extensions to several well-known RL environments.

AutoML-Freiburg-Hannover 51 Dec 28, 2022
Decoding the Protein-ligand Interactions Using Parallel Graph Neural Networks

Decoding the Protein-ligand Interactions Using Parallel Graph Neural Networks Requirements python 0.10+ rdkit 2020.03.3.0 biopython 1.78 openbabel 2.4

Neeraj Kumar 3 Nov 23, 2022
[NeurIPS-2021] Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation

Efficient Graph Similarity Computation - (EGSC) This repo contains the source code and dataset for our paper: Slow Learning and Fast Inference: Effici

23 Nov 11, 2022
A chemical analysis of lipophilicities & molecule drawings including ML

A chemical analysis of lipophilicity & molecule drawings including a bit of ML analysis. This is a simple project that includes two Jupyter files (one

Aurimas A. Nausėdas 7 Nov 22, 2022
Mememoji - A facial expression classification system that recognizes 6 basic emotions: happy, sad, surprise, fear, anger and neutral.

a project built with deep convolutional neural network and ❤️ Table of Contents Motivation The Database The Model 3.1 Input Layer 3.2 Convolutional La

Jostine Ho 761 Dec 05, 2022
Pytorch implementation of paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021)

Pytorch implementation of paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021)

Junxian He 57 Jan 01, 2023
A system for quickly generating training data with weak supervision

Programmatically Build and Manage Training Data Announcement The Snorkel team is now focusing their efforts on Snorkel Flow, an end-to-end AI applicat

Snorkel Team 5.4k Jan 02, 2023
Reliable probability face embeddings

ProbFace, arxiv This is a demo code of training and testing [ProbFace] using Tensorflow. ProbFace is a reliable Probabilistic Face Embeddging (PFE) me

Kaen Chan 34 Dec 31, 2022
DeepVoxels is an object-specific, persistent 3D feature embedding.

DeepVoxels is an object-specific, persistent 3D feature embedding. It is found by globally optimizing over all available 2D observations of

Vincent Sitzmann 196 Dec 25, 2022
Embracing Single Stride 3D Object Detector with Sparse Transformer

SST: Single-stride Sparse Transformer This is the official implementation of paper: Embracing Single Stride 3D Object Detector with Sparse Transformer

TuSimple 385 Dec 28, 2022
The final project of "Applying AI to 3D Medical Imaging Data" from "AI for Healthcare" nanodegree - Udacity.

Quantifying Hippocampus Volume for Alzheimer's Progression Background Alzheimer's disease (AD) is a progressive neurodegenerative disorder that result

Omar Laham 1 Jan 14, 2022
Numbering permanent and deciduous teeth via deep instance segmentation in panoramic X-rays

Numbering permanent and deciduous teeth via deep instance segmentation in panoramic X-rays In this repo, you will find the instructions on how to requ

Intelligent Vision Research Lab 4 Jul 21, 2022
Final project code: Implementing BicycleGAN, for CIS680 FA21 at University of Pennsylvania

680 Final Project: BicycleGAN Haoran Tang Instructions 1. Training To train the network, please run train.py. Change hyper-parameters and folder paths

Haoran Tang 0 Apr 22, 2022
Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 for Vox1_O when train only in Vox2)

Introduction This repository contains my unofficial reimplementation of the standard ECAPA-TDNN, which is the speaker recognition in VoxCeleb2 dataset

Tao Ruijie 277 Dec 31, 2022
Arquitetura e Desenho de Software.

S203 Este é um repositório dedicado às aulas de Arquitetura e Desenho de Software, cuja sigla é "S203". E agora, José? Como não tenho muito a falar aq

Fabio 7 Oct 23, 2021
End-To-End Optimization of LiDAR Beam Configuration

End-To-End Optimization of LiDAR Beam Configuration arXiv | IEEE Xplore This repository is the official implementation of the paper: End-To-End Optimi

Niclas 30 Nov 28, 2022