The easiest tool for extracting radiomics features and training ML models on them.

Last update: Aug 04, 2022

Overview

Simple pipeline for experimenting with radiomics features

Installation

git clone https://github.com/piotrekwoznicki/ClassyRadiomics.git
cd ClassyRadiomics
pip install -e .

Example - Hydronephrosis detection from CT images:

Extract radiomics features and save them to a CSV table

df = pd.read_csv(table_dir / "paths.csv")
extractor = FeatureExtractor(
    df=df,
    out_path=(table_dir / "features.csv"),
    image_col="img_path",
    mask_col="seg_path",
    verbose=True,
)
extractor.extract_features()

Create a dataset from the features table

feature_df = pd.read_csv(table_dir / "features.csv")
data = Dataset(
    dataframe=feature_df,
    features=feature_cols,
    target="Hydronephrosis",
    task_name="Hydronephrosis detection"
)
data.cross_validation_split_test_from_column(
    column_name="cohort", test_value="control"
)
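
The call above appears to hold out every row whose cohort column equals "control" as a test set and to keep the remaining rows for cross-validation. That reading is a guess based on the method and argument names, not a description of the library's implementation. A minimal pandas sketch of the idea, with a hypothetical helper split_test_from_column and made-up toy values:

```python
import pandas as pd


def split_test_from_column(df, column_name, test_value):
    """Return (development, test) frames; test holds rows matching test_value."""
    is_test = df[column_name] == test_value
    development = df[~is_test].reset_index(drop=True)
    test = df[is_test].reset_index(drop=True)
    return development, test


# Toy table: column names mirror the README example, values are made up.
toy = pd.DataFrame(
    {
        "cohort": ["study", "study", "control", "study", "control"],
        "Hydronephrosis": [1, 0, 0, 1, 1],
    }
)
development_df, test_df = split_test_from_column(toy, "cohort", "control")
print(len(development_df), len(test_df))
```

Cross-validation folds would then be drawn from development_df only, so the held-out cohort never influences model selection.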

Select classifiers to compare

classifier_names = [
    "Gaussian Process Classifier",
    "Logistic Regression",
    "SVM",
    "Random Forest",
    "XGBoost",
]
classifiers = [MLClassifier(name) for name in classifier_names]

Create an evaluator to train and evaluate selected classifiers

evaluator = Evaluator(dataset=data, models=classifiers)
evaluator.evaluate_cross_validation()
evaluator.boxplot_by_class()
evaluator.plot_all_cross_validation()
evaluator.plot_test()

Comments

Preprocessing features fails during machine learning

Describe the bug

Running the machine learning step fails both in the self-hosted web app and in example_WORC.ipynb.

Steps/Code to Reproduce

import pandas as pd
from pathlib import Path
from autorad.external.download_WORC import download_WORCDatabase

# Set where we will save our data and results
base_dir = Path.cwd() / "autorad_tutorial"
data_dir = base_dir / "data"
result_dir = base_dir / "results"
data_dir.mkdir(exist_ok=True, parents=True)
result_dir.mkdir(exist_ok=True, parents=True)

%load_ext autoreload
%autoreload 2

# download data (it may take a few minutes)
download_WORCDatabase(
    dataset="Desmoid",
    data_folder=data_dir,
    n_subjects=100,
)

from autorad.utils.preprocessing import get_paths_with_separate_folder_per_case

# create a table with all the paths
paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
paths_df.sample(5)


from autorad.data.dataset import ImageDataset
from autorad.feature_extraction.extractor import FeatureExtractor
import logging

logging.getLogger().setLevel(logging.CRITICAL)

image_dataset = ImageDataset(
    paths_df,
    ID_colname="ID",
    root_dir=data_dir,
)

# Let's take a look at the data, plotting random 10 cases
image_dataset.plot_examples(n=10, window=None)

extractor = FeatureExtractor(image_dataset, extraction_params="MR_default.yaml")
feature_df = extractor.run()

feature_df.head()

label_df = pd.read_csv(data_dir / "labels.csv")
label_df.sample(5)

from autorad.data.dataset import FeatureDataset

merged_feature_df = feature_df.merge(label_df, left_on="ID",
    right_on="patient_ID", how="left")
feature_dataset = FeatureDataset(
    merged_feature_df,
    target="diagnosis",
    ID_colname="ID"
)

splits_path = result_dir / "splits.json"
feature_dataset.split(method="train_val_test", save_path=splits_path)

from autorad.models.classifier import MLClassifier
from autorad.training.trainer import Trainer

models = MLClassifier.initialize_default_sklearn_models()
print(models)

trainer = Trainer(
    dataset=feature_dataset,
    models=models,
    result_dir=result_dir,
    experiment_name="Fibromatosis_vs_sarcoma_classification",
)
trainer.run_auto_preprocessing(
        selection_methods=["boruta"],
        oversampling=False,
        )

Expected Results

Initialising the trainer and running preprocessing on the features should complete without errors.

Actual Results

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [15], in <cell line: 7>()
      1 trainer = Trainer(
      2     dataset=feature_dataset,
      3     models=models,
      4     result_dir=result_dir,
      5     experiment_name="Fibromatosis_vs_sarcoma_classification",
      6 )
----> 7 trainer.run_auto_preprocessing(
      8         selection_methods=["boruta"],
      9         oversampling=False,
     10         )

File ~/AutoRadiomics/autorad/training/trainer.py:78, in Trainer.run_auto_preprocessing(self, oversampling, selection_methods)
     70 preprocessor = Preprocessor(
     71     normalize=True,
     72     feature_selection_method=selection_method,
     73     oversampling_method=oversampling_method,
     74 )
     75 try:
     76     preprocessed[selection_method][
     77         oversampling_method
---> 78     ] = preprocessor.fit_transform(self.dataset.data)
     79 except AssertionError:
     80     log.error(
     81         f"Preprocessing with {selection_method} and {oversampling_method} failed."
     82     )

File ~/AutoRadiomics/autorad/preprocessing/preprocessor.py:66, in Preprocessor.fit_transform(self, data)
     64 result_y = {}
     65 all_features = X.train.columns.tolist()
---> 66 X_train_trans, y_train_trans = self.pipeline.fit_transform(
     67     X.train, y.train
     68 )
     69 self.selected_features = self.pipeline["select"].selected_features(
     70     column_names=all_features
     71 )
     72 result_X["train"] = pd.DataFrame(
     73     X_train_trans, columns=self.selected_features
     74 )

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/pipeline.py:434, in Pipeline.fit_transform(self, X, y, **fit_params)
    432 fit_params_last_step = fit_params_steps[self.steps[-1][0]]
    433 if hasattr(last_step, "fit_transform"):
--> 434     return last_step.fit_transform(Xt, y, **fit_params_last_step)
    435 else:
    436     return last_step.fit(Xt, y, **fit_params_last_step).transform(Xt)

File ~/AutoRadiomics/autorad/feature_selection/selector.py:47, in CoreSelector.fit_transform(self, X, y)
     44 def fit_transform(
     45     self, X: np.ndarray, y: np.ndarray
     46 ) -> tuple[np.ndarray, np.ndarray]:
---> 47     self.fit(X, y)
     48     return X[:, self.selected_columns], y

File ~/AutoRadiomics/autorad/feature_selection/selector.py:124, in BorutaSelector.fit(self, X, y, verbose)
    122 with warnings.catch_warnings():
    123     warnings.simplefilter("ignore")
--> 124     model.fit(X, y)
    125 self.selected_columns = np.where(model.support_)[0].tolist()
    126 if not self.selected_columns:

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:201, in BorutaPy.fit(self, X, y)
    188 def fit(self, X, y):
    189     """
    190     Fits the Boruta feature selection with the provided estimator.
    191 
   (...)
    198         The target values.
    199     """
--> 201     return self._fit(X, y)

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:251, in BorutaPy._fit(self, X, y)
    249 def _fit(self, X, y):
    250     # check input params
--> 251     self._check_params(X, y)
    252     self.random_state = check_random_state(self.random_state)
    253     # setup variables for Boruta

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:517, in BorutaPy._check_params(self, X, y)
    513 """
    514 Check hyperparameters as well as X and y before proceeding with fit.
    515 """
    516 # check X and y are consistent len, X is Array and y is column
--> 517 X, y = check_X_y(X, y)
    518 if self.perc <= 0 or self.perc > 100:
    519     raise ValueError('The percentile should be between 0 and 100.')

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:964, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
    961 if y is None:
    962     raise ValueError("y cannot be None")
--> 964 X = check_array(
    965     X,
    966     accept_sparse=accept_sparse,
    967     accept_large_sparse=accept_large_sparse,
    968     dtype=dtype,
    969     order=order,
    970     copy=copy,
    971     force_all_finite=force_all_finite,
    972     ensure_2d=ensure_2d,
    973     allow_nd=allow_nd,
    974     ensure_min_samples=ensure_min_samples,
    975     ensure_min_features=ensure_min_features,
    976     estimator=estimator,
    977 )
    979 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric)
    981 check_consistent_length(X, y)

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:746, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    744         array = array.astype(dtype, casting="unsafe", copy=False)
    745     else:
--> 746         array = np.asarray(array, order=order, dtype=dtype)
    747 except ComplexWarning as complex_warning:
    748     raise ValueError(
    749         "Complex data not supported\n{}\n".format(array)
    750     ) from complex_warning

ValueError: could not broadcast input array from shape (60,1015) into shape (60,)

opened by wagon-master 3
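
A note on the error above: the two shapes in the message, (60, 1015) and (60,), look like a feature matrix and a label vector. One possible reading, which is an assumption and not a confirmed diagnosis, is that an (X, y) tuple reached check_array where a single array was expected. The sketch below only shows that NumPy refuses to convert such a tuple into one numeric array; the exact wording of the message depends on the NumPy version. The helper conversion_error is hypothetical.

```python
import numpy as np

X = np.zeros((60, 1015))  # same shape as the feature matrix in the traceback
y = np.zeros(60)          # same shape as the label vector in the traceback


def conversion_error(obj):
    """Return the ValueError message from np.asarray, or None if it succeeds."""
    try:
        np.asarray(obj, dtype=float)
    except ValueError as err:
        return str(err)
    return None


print(conversion_error(X))       # a single 2-D array converts without error
print(conversion_error((X, y)))  # the ragged tuple does not
```

If this reading is right, the place to look would be a pipeline step whose fit_transform returns a tuple while the next step expects only X.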

BUG: Time and memory inefficient concating in pandas on every case.

In the feature extraction, we concatenate a pd.DataFrame for every case. AFAIK, constructing a pd.DataFrame this way triggers a new memory allocation (and a copy) every time, which is highly memory-inefficient. Especially when parallelized across many CPUs, and combined with the already memory-intensive forking in joblib, this can lead to OOM events (and is slow, of course). Wouldn't it be more convenient to return only the feature set that is currently being processed? https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L109-L115 These are subsequently collected in results anyway: https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L135-L144

opened by laqua-stack 2
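
The pattern proposed in this issue can be sketched as follows: each case returns only its own feature set as a plain dict, and a single DataFrame is built once at the end. This is a minimal illustration, not the code in extractor.py; extract_case is a hypothetical stand-in for the real per-case extraction and its feature values are made up.

```python
import pandas as pd


def extract_case(case_id):
    """Hypothetical per-case extraction returning one plain dict of features."""
    return {"ID": case_id, "feature_a": len(case_id), "feature_b": 1.0}


def collect_features(case_ids):
    """Gather per-case dicts and build the DataFrame a single time."""
    rows = [extract_case(case_id) for case_id in case_ids]
    return pd.DataFrame(rows)


feature_df = collect_features(["case_001", "case_002", "case_003"])
print(feature_df.shape)
```

Because the workers hand back small dicts, the same function body would also fit a parallel map, with the DataFrame constructed only in the parent process.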

Feature/add inference mlflow

Major changes:

fixed training with autologging of training parameters, preprocessor and classifier in MLFlow

webapp: added Predict subpage for inference on a single case, giving out class probability and Shap explanation

webapp: moved all steps into subpages

webapp: added Getting started in the landing page

Fixes:

webapp: fixed extraction params discarding Feature Names selected from Feature Classes

opened by pwoznicki 1

example_WORC.ipynb not being up to date with the repository

Describe the bug

In example_WORC.ipynb there are function calls that no longer work because the code in the repository changed while the notebook was not updated to reflect those changes.

Steps/Code to Reproduce

import pandas as pd
from pathlib import Path
from autorad.external.download_WORC import download_WORCDatabase

# Set where we will save our data and results
base_dir = Path.cwd() / "autorad_tutorial"
data_dir = base_dir / "data"
result_dir = base_dir / "results"
data_dir.mkdir(exist_ok=True, parents=True)
result_dir.mkdir(exist_ok=True, parents=True)

%load_ext autoreload
%autoreload 2

# download data (it may take a few minutes)
download_WORCDatabase(
    dataset="Desmoid",
    data_folder=data_dir,
    n_subjects=100,
)



from autorad.data.utils import get_paths_with_separate_folder_per_case  # 1

# create a table with all the paths
paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
paths_df.sample(5)


from autorad.data.dataset import ImageDataset
from autorad.feature_extraction.extractor import FeatureExtractor
import logging

logging.getLogger().setLevel(logging.CRITICAL)

image_dataset = ImageDataset(
    paths_df,
    ID_colname="ID",
    root_dir=data_dir,
)

# Let's take a look at the data, plotting random 10 cases
image_dataset.plot_examples(n=10, window=None)

extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml") # 2
feature_df = extractor.run()

Expected Results

1: Importing the function get_paths_with_separate_folder_per_case

2: Using default_MR.yaml as value for extraction_params

Actual Results

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 from autorad.data.utils import get_paths_with_separate_folder_per_case
      3 # create a table with all the paths
      4 paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)

ModuleNotFoundError: No module named 'autorad.data.utils'

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml")
      2 feature_df = extractor.run()

File ~/AutoRadiomics/autorad/feature_extraction/extractor.py:41, in FeatureExtractor.__init__(self, dataset, feature_set, extraction_params, n_jobs)
     39 self.dataset = dataset
     40 self.feature_set = feature_set
---> 41 self.extraction_params = self._get_extraction_param_path(
     42     extraction_params
     43 )
     44 log.info(f"Using extraction params from {self.extraction_params}")
     45 self.n_jobs = set_n_jobs(n_jobs)

File ~/AutoRadiomics/autorad/feature_extraction/extractor.py:55, in FeatureExtractor._get_extraction_param_path(self, extraction_params)
     53     result = default_extraction_param_dir / extraction_params
     54 else:
---> 55     raise ValueError(
     56         f"Extraction parameter file {extraction_params} not found."
     57     )
     58 return result

ValueError: Extraction parameter file default_MR.yaml not found.

Fix

1: Change the import from autorad.data.utils to autorad.utils.preprocessing.

2: Change extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml") to extractor = FeatureExtractor(image_dataset, extraction_params="MR_default.yaml")

opened by wagon-master 1
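
While a notebook and the installed package may be out of step, a cell could try both module locations named in the fix above and use whichever exists. The helper below is a hypothetical illustration built on the standard library only; import_first_available is not part of autorad, and the runnable demonstration uses standard-library modules so that it works without autorad installed.

```python
import importlib


def import_first_available(module_names, attribute):
    """Return `attribute` from the first importable module that defines it."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ModuleNotFoundError:
            continue  # this location does not exist, try the next one
        if hasattr(module, attribute):
            return getattr(module, attribute)
    raise ImportError(f"{attribute} not found in any of {list(module_names)}")


# Demonstration with the standard library: the first name does not exist.
join = import_first_available(["no_such_module_for_demo", "os.path"], "join")
print(join("data", "labels.csv"))

# In the notebook the call might look like this (assumes autorad is installed):
# get_paths = import_first_available(
#     ["autorad.utils.preprocessing", "autorad.data.utils"],
#     "get_paths_with_separate_folder_per_case",
# )
```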

Bugfix/refactor
New features:

log feature dataset and splits in MLFlow

update docs & add getting-started

Fixes:

fix evaluation in the web app

fix docs build on Read the Docs

opened by pwoznicki 0

Support various readers (Nibabel, ITK)

Currently we use Nibabel for loading images. It works only for NIfTI images, but a user may want to load a DICOM image without converting it to NIfTI.

Consider using the MONAI LoadImage() function, which provides a common interface for loading both NIfTI and DICOM images.
enhancement

opened by pwoznicki 0
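
Whichever library ends up doing the loading, supporting several formats means deciding per file which reader applies. A minimal sketch of that decision, assuming the format can be told from the file extension (a simplification, since DICOM files often have no extension at all). The helper guess_image_format is hypothetical and belongs to neither autorad nor MONAI; the actual reading step is left out.

```python
from pathlib import Path

NIFTI_SUFFIXES = (".nii", ".nii.gz")
DICOM_SUFFIXES = (".dcm",)


def guess_image_format(path):
    """Guess the image format from the file name (hypothetical helper)."""
    name = Path(path).name.lower()
    if name.endswith(NIFTI_SUFFIXES):
        return "nifti"
    if name.endswith(DICOM_SUFFIXES):
        return "dicom"
    raise ValueError(f"Unsupported image format: {path}")


print(guess_image_format("case_001/image.nii.gz"))
print(guess_image_format("case_002/slice_001.DCM"))
```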

Releases (v0.2.2)

v0.2.2 (Jul 30, 2022)

Includes fixes for the web application, bug fixes in the spatial utility functions, and a function for voxel-based extraction.

Owner

Piotr Woźnicki

Recently graduated medical doctor, working on medical image analysis.

