Data Competition: automated systems that can detect whether people are not wearing masks or are wearing masks incorrectly

Overview

Table of contents

  1. Introduction
  2. Dataset
  3. Model & Metrics
  4. How to Run

DATA COMPETITION

The COVID-19 pandemic, which is caused by the SARS-CoV-2 virus, is still continuing strong, infecting hundreds of millions of people and killing millions. Face masks reduce transmission by preventing aerosols and droplets from spreading too far into the atmosphere. As a result, there is a growing demand for automated systems that can detect whether people are not wearing masks or are wearing masks incorrectly. This competition was designed in order to solve the problem mentioned above. This competition is unlike any other that has come before it. With a fixed model, participants will receive model code and configuration code that organizers use to train models. The candidate's task is to use data processing and generation techniques to improve the model's performance, then submit the dataset to the organizing team for training and evaluation on the private test set. The winner is the team with the highest score on the private test set.

Dataset

  • A dataset of 1100 images will be sent to you. This is an object detection dataset consisting of employee images at the office. The dataset has been assigned 3 labels by us which are no mask, mask, and incorrect mask, with the numbers 0,1,2 corresponding to each.

  • The dataset has been divided into three parts for you: train, valid, and public test. We have prepared a private test to be able to evaluate the candidate's model. This private test will be made public after the contest ends. In the public test, you can get a basic idea of the private test. Download the dataset here

  • To improve the model's performance, you can re-label it and employ data augmentation to generate more images (up to 3000).

The number of each label in each part is shown below:

No mask Mask incorrect mask
Train 308 882 51
Val 97 190 9
Public_test 47 95 13

Model & Metrics

  • The challenge is defined as object detection challenge. In the competition, We use YOLOv5s and also use a pre-trained model trained with easy mask dataset to greatly reduce training time.

  • We fix all hyperparameters of the model and do not use any augmentation tips in the source code. Therefore, each participant need to build the best possible dataset by relabeling incorrect labels, splitting train/val, augmentation tips, adding new dataset, etc.

  • In training process, Early Stopping method with patience setten to 100 iterations is used to keep track of validation set's [email protected]. Detail about [email protected] metric:

[email protected] = [email protected] = 0.2 * AP50_w + 0.3 * AP50_nw + 0.5 * AP50_wi

Where,
AP50_w: AP50 on valid mask boxes
AP50_nw: AP50 on non-mask boxes
AP50_wi: AP50 on invalid mask boxes

  • The [email protected] metric is also used as the main metric to evaluate participant's submission on private testing set.

How to Run

QuickStart

Click the image below

Open In Colab

Install requirements

  • All requirements are included in requirements.txt

  • Run the script below to clone and install all requirements

git clone https://github.com/fsoft-ailab/Data-Competition
cd Data-Competition
pip3 install -r requirements.txt

Training

  • Put your dataset into the Data-Competition folder. The structure of dataset folder is followed as folder structure below:
folder-name
├── images
│   ├── train
│   │   ├── train_img1.jpg
│   │   ├── train_img2.jpg
│   │   └── ...
│   │   
│   └── val
│       ├── val_img1.jpg
│       ├── val_img2.jpg
│       └── ...
│   
└── labels
    ├── train
    │   ├── train_img1.txt
    │   ├── train_img2.txt
    │   └── ...
    │   
    └── val
        ├── val_img1.txt
        ├── val_img2.txt
        └── ...
  • Change relative paths to train and val images folder in config/data_cfg.yaml file

  • train_cfg.yaml where we set up the model during training. You should not change such hyperparameters because it will result in incorrect results. The training results are saved in the results/train/ .

  • Run the script below to train the model. Specify particular name to identify your experiment:

python3 train.py --batch-size 64 --device 0 --name 
    

   

Note: If you get out of memory error, you can decrease batch-size to multiple of 2 as 32, 16.

Evaluation

  • Run script below to evaluate on particular dataset.
  • The --task's value is only one of train, val, or test, respectively evaluating on the training set, validation set, or public testing set.
  • Note: Specify relative path to images folder which you evaluate in config/data_cfg.yaml file.
python3 val.py --weights 
   
     --task test --name 
    
      --batch-size 64 --device 0
                                                 val
                                                 train

    
   
  • Results are saved at results/evaluate/ / .

Detection

  • You can use this script to make inferences on particular folder

  • Results are saved at .

python3 detect.py --weights 
   
     --source 
    
      --dir 
     
       --device 0

     
    
   
  • You can find more default arguments at detect.py

References

Owner
Thanh Dat Vu
Thanh Dat Vu
Predictive Modeling & Analytics on Home Equity Line of Credit

Predictive Modeling & Analytics on Home Equity Line of Credit Data (Python) HMEQ Data Set In this assignment we will use Python to examine a data set

Dhaval Patel 1 Jan 09, 2022
PrimaryBid - Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift

Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift This project is composed of two parts: Part1 and Part2

Emmanuel Boateng Sifah 1 Jan 19, 2022
Data collection, enhancement, and metrics calculation.

l3_data_collection Data collection, enhancement, and metrics calculation. Summary Repository containing code for QuantDAO's JDT data collection task.

Ruiwyn 3 Dec 23, 2022
Semi-Automated Data Processing

Perform semi automated exploratory data analysis, feature engineering and feature selection on provided dataset by visualizing every possibilities on each step and assisting the user to make a meanin

Arun Singh Babal 1 Jan 17, 2022
BasstatPL is a package for performing different tabulations and calculations for descriptive statistics.

BasstatPL is a package for performing different tabulations and calculations for descriptive statistics. It provides: Frequency table constr

Angel Chavez 1 Oct 31, 2021
Created covid data pipeline using PySpark and MySQL that collected data stream from API and do some processing and store it into MYSQL database.

Created covid data pipeline using PySpark and MySQL that collected data stream from API and do some processing and store it into MYSQL database.

2 Nov 20, 2021
Automatic earthquake catalog building workflow: EQTransformer + Siamese EQTransformer + PickNet + REAL + HypoInverse

Automatic regional-scale earthquake catalog building workflow: EQTransformer + Siamese EQTransforme

Xiao Zhuowei 9 Nov 27, 2022
WithPipe is a simple utility for functional piping in Python.

A utility for functional piping in Python that allows you to access any function in any scope as a partial.

Michael Milton 1 Oct 26, 2021
An extension to pandas dataframes describe function.

pandas_summary An extension to pandas dataframes describe function. The module contains DataFrameSummary object that extend describe() with: propertie

Mourad 450 Dec 30, 2022
Open source platform for Data Science Management automation

Hydrosphere examples This repo contains demo scenarios and pre-trained models to show Hydrosphere capabilities. Data and artifacts management Some mod

hydrosphere.io 6 Aug 10, 2021
Flenser is a simple, minimal, automated exploratory data analysis tool.

Flenser Have you ever been handed a dataset you've never seen before? Flenser is a simple, minimal, automated exploratory data analysis tool. It runs

John McCambridge 79 Sep 20, 2022
wikirepo is a Python package that provides a framework to easily source and leverage standardized Wikidata information

Python based Wikidata framework for easy dataframe extraction wikirepo is a Python package that provides a framework to easily source and leverage sta

Andrew Tavis McAllister 35 Jan 04, 2023
CINECA molecular dynamics tutorial set

High Performance Molecular Dynamics Logging into CINECA's computer systems To logon to the M100 system use the following command from an SSH client ss

J. W. Dell 0 Mar 13, 2022
Analysiscsv.py for extracting analysis and exporting as CSV

wcc_analysis Lichess page documentation: https://lichess.org/page/world-championships Each WCC has a study, studies are fetched using: https://lichess

32 Apr 25, 2022
Building house price data pipelines with Apache Beam and Spark on GCP

This project contains the process from building a web crawler to extract the raw data of house price to create ETL pipelines using Google Could Platform services.

1 Nov 22, 2021
This mini project showcase how to build and debug Apache Spark application using Python

Spark app can't be debugged using normal procedure. This mini project showcase how to build and debug Apache Spark application using Python programming language. There are also options to run Spark a

Denny Imanuel 1 Dec 29, 2021
AWS Glue ETL Code Samples

AWS Glue ETL Code Samples This repository has samples that demonstrate various aspects of the new AWS Glue service, as well as various AWS Glue utilit

AWS Samples 1.2k Jan 03, 2023
Hydrogen (or other pure gas phase species) depressurization calculations

HydDown Hydrogen (or other pure gas phase species) depressurization calculations This code is published under an MIT license. Install as simple as: pi

Anders Andreasen 13 Nov 26, 2022
General Assembly's 2015 Data Science course in Washington, DC

DAT8 Course Repository Course materials for General Assembly's Data Science course in Washington, DC (8/18/15 - 10/29/15). Instructor: Kevin Markham (

Kevin Markham 1.6k Jan 07, 2023
Desafio proposto pela IGTI em seu bootcamp de Cloud Data Engineer

Desafio Modulo 4 - Cloud Data Engineer Bootcamp - IGTI Objetivos Criar infraestrutura como código Utuilizando um cluster Kubernetes na Azure Ingestão

Otacilio Filho 4 Jan 23, 2022