Active Learning demo using two small datasets

Last update: Nov 10, 2021


Overview

ActiveLearningDemo

How to run

step one

Add the dataset folder, then run the command below to split the dataset into the required structure:

run utils.py

Each dataset must contain six .mat files: TrainingMatrix.mat, TrainingLabels.mat, TestingMatrix.mat, TestingLabels.mat, UnlabeledMatrix.mat and UnlabeledLabels.mat.
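
As a quick sanity check before training, the folder layout described above can be verified with a few lines of Python. This is a minimal sketch, not part of the repository: the helper name `missing_mat_files` is hypothetical, and the path in the usage line is only the example path used elsewhere in this README.

```python
from pathlib import Path

# The six files listed above; the names are taken from this README.
REQUIRED_FILES = (
    "TrainingMatrix.mat",
    "TrainingLabels.mat",
    "TestingMatrix.mat",
    "TestingLabels.mat",
    "UnlabeledMatrix.mat",
    "UnlabeledLabels.mat",
)


def missing_mat_files(case_dir):
    """Return the required .mat file names that are absent from case_dir.

    Hypothetical helper: it only checks that the files exist,
    it does not open them or validate their contents.
    """
    case_dir = Path(case_dir)
    return [name for name in REQUIRED_FILES if not (case_dir / name).is_file()]


# Usage: an empty list means the folder has the required structure.
print(missing_mat_files("./datasets/MMI"))
```

If the list is not empty, rerun the split step for that dataset before training.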

step two

Train the model. You can set the following arguments:

Active learning

optional arguments:
  -h, --help            show this help message and exit
  --src SRC             dataset path
  --dst DST             destination path
  --type TYPE           sample strategy:random, entropy, combine
  --solver SOLVER       model solver
  --max_iter MAX_ITER   max iteration of each training
  --k K                 samples added for each iteration
  --n N                 number of iterations
  --plot_type PLOT_TYPE
                        plot single for one case(single) or plot average for
                        entire database(average)
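
For orientation, the help text above corresponds roughly to an `argparse` setup like the one below. This is a sketch reconstructed from the help output, not the code in main.py. The "newton-cg" and "combine" defaults come from this README; every other default, the argument types and the `choices` lists are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Sketch of a parser matching the help text; not the repository's code."""
    parser = argparse.ArgumentParser(description="Active learning")
    parser.add_argument("--src", type=str, default="./datasets/MMI",
                        help="dataset path")  # default is an assumption
    parser.add_argument("--dst", type=str, default="./img",
                        help="destination path")  # default is an assumption
    parser.add_argument("--type", type=str, default="combine",
                        choices=["random", "entropy", "combine"],
                        help="sample strategy: random, entropy, combine")
    parser.add_argument("--solver", type=str, default="newton-cg",
                        help="model solver")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="max iteration of each training")  # assumed default
    parser.add_argument("--k", type=int, default=10,
                        help="samples added for each iteration")  # assumed default
    parser.add_argument("--n", type=int, default=50,
                        help="number of iterations")  # assumed default
    parser.add_argument("--plot_type", type=str, default="average",
                        choices=["single", "average"],
                        help="plot one case (single) or the average over "
                             "the entire database (average)")  # assumed default
    return parser


# Usage: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--src", "./datasets/MMI", "--dst", "./img"])
print(args.type, args.solver)
```

Run `python main.py -h` to see the actual options and defaults.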

--src can point either to a dataset folder that contains multiple subsets or to a single case that contains only the six .mat files. The defaults are the "newton-cg" solver and the "combine" type, which trains the model with both sampling strategies (random and entropy) in one run. To get results on a dataset directly, run:

python main.py --src <your dataset path, e.g. ./datasets/MMI> --dst <output path, e.g. ./img>
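
To illustrate the difference between the two sampling strategies, here is a minimal sketch of how the k samples added per iteration could be chosen. It assumes that "entropy" means ranking unlabeled samples by the Shannon entropy of the model's predicted class probabilities, which is the usual definition; the repository's own implementation may differ. The function names and the probability values are hypothetical.

```python
import math
import random


def prediction_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_by_entropy(prob_rows, k):
    """Indices of the k unlabeled samples with the most uncertain predictions."""
    order = sorted(range(len(prob_rows)),
                   key=lambda i: prediction_entropy(prob_rows[i]),
                   reverse=True)
    return order[:k]


def select_at_random(pool_size, k, seed=0):
    """Baseline strategy: k distinct indices drawn uniformly from the pool."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), min(k, pool_size))


# Hypothetical predicted class probabilities for four unlabeled samples.
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # near-uniform prediction, highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
]

picked = select_by_entropy(pool_probs, k=2)
print(picked)  # → [1, 2]
print(select_at_random(len(pool_probs), k=2))
```

In an active learning loop, the selected samples would be moved from the unlabeled set to the training set and the model retrained, repeated for the number of iterations given by --n.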

Result

MMI dataset

use "lbfgs" solver:

use "newton-cg" solver:

MindReading dataset

use "lbfgs" solver:

use "newton-cg" solver:
