Fuzzy Overclustering (FOC)

Overview

Fuzzy Overclustering (FOC)

In real-world datasets, we need consistent annotations between annotators to give a certain ground-truth label. However, in many applications these consistent annotations can not be given due to issues like intra- and interobserver variability. We call these inconsistent label fuzzy. Our method Fuzzy Overclustering overclusters the data and can therefore handle these fuzzy labels better than Out-of-the-Box Semi-Supervised Methods.

More details are given in the accpeted full paper at https://doi.org/10.3390/s21196661 or in the preprint https://arxiv.org/abs/2012.01768

The main idea is illustrated below. The graphic and caption are taken from the original work.

main idea of paper

Illustration of fuzzy data and overclustering -- The grey dots represent unlabeled data and the colored dots labeled data from different classes. The dashed lines represent decision boundaries. For certain data, a clear separation of the different classes with one decision boundary is possible and both classes contain the same amount of data (top). For fuzzy data determining a decision boundary is difficult because of intermediate datapoints between the classes (middle). These fuzzy datapoints can often not be easily sorted into one consistent class between annotators. If you overcluster the data, you get smaller but more consistent substructures in the fuzzy data (bottom). The images illustrate possible examples for \certain data (cat & dog) and \fuzzy plankton data (trichodesmium puff and tuft). The center plankton image was considered to be trichodesmium puff or tuft by around half of the annotators each. The left and right plankton image were consistently annotated as their respective class.

Installation

We advise to use docker for the experiments. We recommend a python3 container with tesnorflow 1.14 preinstalled. Additionally the following commands need to be executed:

apt-get update
apt-get install -y libsm6 libxext6 libxrender-dev libgl1-mesa-glx

After this ensure that the requirements from requirements.txt are installed. The most important packages are keras, scipy and opencv.

Usage

The parameters are given in arguments.yaml with their description. Most of the parameters can be left at the default value. Especially the dataset, batch size and epoch related parameters are imported.

As a rule of thumb the following should be applied:

  • overcluster_k = 5-6 * the number of classes
  • batch_size = repetition * overcluster_k * 2-3

You need to define three directories for the execution with docker:

  • DATASET_ROOT, this folder contains a folder with the dataset name. This folder contains a trainand val folder. It needs a folder unlabeled if the parameter unlabeled_data is used. Each folder contains subfolder with the given classes.
  • LOG_ROOT, inside a subdiretory logs all experimental results will be stored with regard to the given IDs and a time stamp
  • SRC_ROOT root of the this project source code

The DOCKER_IMAGE is the above defined image.

You can visualize the results with tensorboard --logdir . from inside the log_dir

Example Usages

bash % test pipeline running docker run -it --rm -v :/data-ssd -v :/data1 -v :/src -w="/src" python main.py --IDs foc experiment_name not_use_mi --dataset [email protected] --unlabeled_data [email protected] --frozen_batch_size 130 --batch_size 130 --overcluster_k 60 --num_gpus 1 --normal_epoch 2 --frozen_epoch 1 % training FOC-Light docker run -it --rm -v :/data-ssd -v :/data1 -v :/home -w="/home" python main.py --experiment_identifiers foc experiment_name not_use_mi --dataset stl10 --frozen_batch_size 130 --batch_size 130 --overcluster_k 60 --num_gpus 1 % training FOC (no warmup) % needs multiple GPUs or very large ones (change num gpu to 1 in this case) docker run -it --rm -v :/data-ssd -v :/data1 -v :/home -w="/home" python main.py --experiment_identifiers foc experiment_name not_use_mi --dataset stl10 --frozen_batch_size 390 --batch_size 390 --overcluster_k 60 --num_gpus 3 --lambda_m 1 --sample_repetition 3 ">
% test container
docker run -it --rm -v 
               
                :/data-ssd -v 
                
                 :/data1   -v 
                 
                  :/src -w="/src" 
                  
                    bash


% test pipeline running
docker run -it --rm -v 
                   
                    :/data-ssd -v 
                    
                     :/data1 -v 
                     
                      :/src -w="/src" 
                      
                        python main.py --IDs foc experiment_name not_use_mi --dataset [email protected] --unlabeled_data [email protected] --frozen_batch_size 130 --batch_size 130 --overcluster_k 60 --num_gpus 1 --normal_epoch 2 --frozen_epoch 1 % training FOC-Light docker run -it --rm -v 
                       
                        :/data-ssd -v 
                        
                         :/data1 -v 
                         
                          :/home -w="/home" 
                          
                            python main.py --experiment_identifiers foc experiment_name not_use_mi --dataset stl10 --frozen_batch_size 130 --batch_size 130 --overcluster_k 60 --num_gpus 1 % training FOC (no warmup) % needs multiple GPUs or very large ones (change num gpu to 1 in this case) docker run -it --rm -v 
                           
                            :/data-ssd -v 
                            
                             :/data1 -v 
                             
                              :/home -w="/home" 
                              
                                python main.py --experiment_identifiers foc experiment_name not_use_mi --dataset stl10 --frozen_batch_size 390 --batch_size 390 --overcluster_k 60 --num_gpus 3 --lambda_m 1 --sample_repetition 3 
                              
                             
                            
                           
                          
                         
                        
                       
                      
                     
                    
                   
                  
                 
                
               
Pytorch implementation code for [Neural Architecture Search for Spiking Neural Networks]

Neural Architecture Search for Spiking Neural Networks Pytorch implementation code for [Neural Architecture Search for Spiking Neural Networks] (https

Intelligent Computing Lab at Yale University 28 Nov 18, 2022
This is the official implementation of our proposed SwinMR

SwinMR This is the official implementation of our proposed SwinMR: Swin Transformer for Fast MRI Please cite: @article{huang2022swin, title={Swi

A Yang Lab (led by Dr Guang Yang) 27 Nov 17, 2022
YoloAll is a collection of yolo all versions. you you use YoloAll to test yolov3/yolov5/yolox/yolo_fastest

官方讨论群 QQ群:552703875 微信群:15158106211(先加作者微信,再邀请入群) YoloAll项目简介 YoloAll是一个将当前主流Yolo版本集成到同一个UI界面下的推理预测工具。可以迅速切换不同的yolo版本,并且可以针对图片,视频,摄像头码流进行实时推理,可以很方便,直观

DL-Practise 244 Jan 01, 2023
Official repository for the CVPR 2021 paper "Learning Feature Aggregation for Deep 3D Morphable Models"

Deep3DMM Official repository for the CVPR 2021 paper Learning Feature Aggregation for Deep 3D Morphable Models. Requirements This code is tested on Py

38 Dec 27, 2022
Code repository for "Reducing Underflow in Mixed Precision Training by Gradient Scaling" presented at IJCAI '20

Reducing Underflow in Mixed Precision Training by Gradient Scaling This project implements the gradient scaling method to improve the performance of m

Ruizhe Zhao 5 Apr 14, 2022
Unofficial implementation of Google "CutPaste: Self-Supervised Learning for Anomaly Detection and Localization" in PyTorch

CutPaste CutPaste: image from paper Unofficial implementation of Google's "CutPaste: Self-Supervised Learning for Anomaly Detection and Localization"

Lilit Yolyan 59 Nov 27, 2022
Official pytorch code for SSAT: A Symmetric Semantic-Aware Transformer Network for Makeup Transfer and Removal

SSAT: A Symmetric Semantic-Aware Transformer Network for Makeup Transfer and Removal This is the official pytorch code for SSAT: A Symmetric Semantic-

ForeverPupil 57 Dec 13, 2022
Segment axon and myelin from microscopy data using deep learning

Segment axon and myelin from microscopy data using deep learning. Written in Python. Using the TensorFlow framework. Based on a convolutional neural network architecture. Pixels are classified as eit

NeuroPoly 103 Nov 29, 2022
Automated Attendance Project Using Face Recognition

dependencies for project: cmake 3.22.1 dlib 19.22.1 face-recognition 1.3.0 openc

Rohail Taha 1 Jan 09, 2022
OpenMMLab Detection Toolbox and Benchmark

MMDetection is an open source object detection toolbox based on PyTorch. It is a part of the OpenMMLab project.

OpenMMLab 22.5k Jan 05, 2023
This repository contains FEDOT - an open-source framework for automated modeling and machine learning (AutoML)

package tests docs license stats support This repository contains FEDOT - an open-source framework for automated modeling and machine learning (AutoML

National Center for Cognitive Research of ITMO University 482 Dec 26, 2022
A PyTorch implementation of DenseNet.

A PyTorch Implementation of DenseNet This is a PyTorch implementation of the DenseNet-BC architecture as described in the paper Densely Connected Conv

Brandon Amos 771 Dec 15, 2022
Dictionary Learning with Uniform Sparse Representations for Anomaly Detection

Dictionary Learning with Uniform Sparse Representations for Anomaly Detection Implementation of the Uniform DL Representation for AD algorithm describ

Paul Irofti 1 Nov 23, 2022
A script helps the user to update Linux and Mac systems through the terminal

Description This script helps the user to update Linux and Mac systems through the terminal. All the user has to install some requirements and then ru

Roxcoder 2 Jan 23, 2022
Efficient 3D human pose estimation in video using 2D keypoint trajectories

3D human pose estimation in video with temporal convolutions and semi-supervised training This is the implementation of the approach described in the

Meta Research 3.1k Dec 29, 2022
Ascend your Jupyter Notebook usage

Jupyter Ascending Sync Jupyter Notebooks from any editor About Jupyter Ascending lets you edit Jupyter notebooks from your favorite editor, then insta

Untitled AI 254 Jan 08, 2023
GPU-accelerated Image Processing library using OpenCL

pyclesperanto pyclesperanto is a python package for clEsperanto - a multi-language framework for GPU-accelerated image processing. clEsperanto uses Op

17 Dec 25, 2022
CLDF dataset derived from Robbeets et al.'s "Triangulation Supports Agricultural Spread" from 2021

CLDF dataset derived from Robbeets et al.'s "Triangulation Supports Agricultural Spread" from 2021 How to cite If you use these data please cite the o

Digital Linguistics 2 Dec 20, 2021
ScriptProfilerPy - Module to visualize where your python script is slow

ScriptProfiler helps you track where your code is slow It provides: Code lines t

Lucas BLP 3 Jun 02, 2022
Monk is a low code Deep Learning tool and a unified wrapper for Computer Vision.

Monk - A computer vision toolkit for everyone Why use Monk Issue: Want to begin learning computer vision Solution: Start with Monk's hands-on study ro

Tessellate Imaging 507 Dec 04, 2022