Active Learning demo using two small datasets

Last update: Nov 10, 2021

Related tags

Data Analysis ActiveLearningDemo

Overview

ActiveLearningDemo

How to run

Step one

Add the dataset folder, then use the command below to split the dataset into the required structure:

run utils.py

Each dataset should contain six .mat files: TrainingMatrix.mat, TrainingLabels.mat, TestingMatrix.mat, TestingLabels.mat, UnlabeledMatrix.mat and UnlabeledLabels.mat.
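Before training, it can help to confirm that a dataset folder really has all six files. The check below is a minimal sketch and not part of the repository: the function name `missing_mat_files` is hypothetical, and only the file names come from the list above.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = (
    "TrainingMatrix.mat",
    "TrainingLabels.mat",
    "TestingMatrix.mat",
    "TestingLabels.mat",
    "UnlabeledMatrix.mat",
    "UnlabeledLabels.mat",
)


def missing_mat_files(dataset_dir):
    """Return the required .mat file names that are absent from dataset_dir.

    Hypothetical helper for illustration; it only checks that the files
    exist and does not inspect their contents.
    """
    folder = Path(dataset_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

An empty list means the folder has the expected structure; otherwise the returned names show which files the split step did not produce.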

Step two

Train the model. The following arguments can be set:

Active learning

optional arguments:
  -h, --help            show this help message and exit
  --src SRC             dataset path
  --dst DST             destination path
  --type TYPE           sample strategy: random, entropy, combine
  --solver SOLVER       model solver
  --max_iter MAX_ITER   max iteration of each training
  --k K                 samples added for each iteration
  --n N                 number of iterations
  --plot_type PLOT_TYPE
                        plot single for one case(single) or plot average for
                        entire database(average)
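For readers who want to see how such an interface is typically declared, here is a sketch of an `argparse` parser that would produce similar help text. It is an assumption about how main.py might be written, not the repository's actual code. The `--type` and `--solver` defaults follow the defaults this README states; the integer types and the absence of defaults for `--max_iter`, `--k`, `--n` and `--plot_type` are guesses.

```python
import argparse


def build_parser():
    """Build a parser mirroring the help text above (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Active learning")
    parser.add_argument("--src", type=str, help="dataset path")
    parser.add_argument("--dst", type=str, help="destination path")
    # Defaults for --type and --solver follow the README; the rest are unknown.
    parser.add_argument("--type", type=str, default="combine",
                        help="sample strategy: random, entropy, combine")
    parser.add_argument("--solver", type=str, default="newton-cg",
                        help="model solver")
    # Assumed to be integers; the real script may declare them differently.
    parser.add_argument("--max_iter", type=int,
                        help="max iteration of each training")
    parser.add_argument("--k", type=int,
                        help="samples added for each iteration")
    parser.add_argument("--n", type=int, help="number of iterations")
    parser.add_argument("--plot_type", type=str,
                        help="plot single for one case(single) or plot "
                             "average for entire database(average)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```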

The script accepts either a dataset folder containing multiple subsets or a single case containing only the six .mat files. The defaults are the "newton-cg" solver and the "combine" type, which trains the model with both sampling strategies (random and entropy) in one run. To get results on a dataset directly, use:

python main.py --src ./datasets/MMI --dst ./img

Replace ./datasets/MMI with your dataset path and ./img with your output path.
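The "random" and "entropy" strategies differ only in how the next k samples are picked from the unlabeled pool. The sketch below shows one common way to implement each, assuming the model outputs a class-probability list per unlabeled sample. The function names are hypothetical and the repository's own selection code may differ.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_entropy(pool_probs, k):
    """Indices of the k pool samples whose predictions are most uncertain."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:k]


def select_random(pool_size, k, seed=None):
    """Indices of k pool samples drawn uniformly without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), min(k, pool_size))


# Three unlabeled samples with two-class predictions.
pool_probs = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]]
print(select_entropy(pool_probs, 2))  # [1, 2]
```

In each of the n iterations, the selected samples would be moved from the unlabeled pool into the training set before the model is retrained; running both selectors side by side is one plausible reading of the "combine" type.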

Result

MMI dataset

"lbfgs" solver: [plot]

"newton-cg" solver: [plot]

MindReading dataset

"lbfgs" solver: [plot]

"newton-cg" solver: [plot]

