A transformer which can randomly augment VOC format dataset (both image and bbox) online.

Overview

VocAug

It is difficult to find a script which can augment VOC-format dataset, especially the bbox. Or find a script needs complex requirements so it is hard to use. Or, it is offline but not online so it needs very very large disk volume.

Here, is a simple transformer which can randomly augment VOC format dataset online! It can work with only numpy and cv2 packages!

The highlight is,

  1. it augments both image and b-box!!!
  2. it only use cv2 & numpy, means it could be used simply without any other awful packages!!!
  3. it is an online transformer!!!

It contains methods of:

  1. Random HSV augmentation
  2. Random Cropping augmentation
  3. Random Flipping augmentation
  4. Random Noise augmentation
  5. Random rotation or translation augmentation

All the methods can adjust abundant arguments in the constructed function of class VocAug.voc_aug.

Here are some visualized examples:

(click to enlarge)

e.g. #1 e.g. #2
eg1 eg2

More

This script was created when I was writing YOLOv1 object detectin algorithm for learning and entertainment. See more details at https://github.com/BestAnHongjun/YOLOv1-pytorch

Quick Start

1. Download this repo.

git clone https://github.com/BestAnHongjun/VOC-Augmentation.git

or you can download the zip file directly.

2. Enter project directory

cd VOC-Augmentation

3. Install the requirements

pip install -r requirements.txt

For some machines with mixed environments, you need to use pip3 but not pip.

Or you can install the requirements by hand. The default version is ok.

pip install numpy
pip install opencv-python
pip install opencv-contrib-python
pip install matplotlib

4.Create your own project directory

Create your own project directory, then copy the VocAug directory to yours. Or you can use this directory directly.

5. Create your own demo.py file

Or you can use my demo.py directly.

Thus, you should have a project directory with structure like this:

Project_Dir
  |- VocAug (dir)
  |- demo.py

Open your demo.py.

First, import some system packages.

import os
import matplotlib.pyplot as plt

Second, import my VocAug module in your project directory.

from VocAug.voc_aug import voc_aug
from VocAug.transform.voc2vdict import voc2vdict
from VocAug.utils.viz_bbox import viz_vdict

Third, Create two transformer.

voc2vdict_transformer = voc2vdict()
augmentation_transformer = voc_aug()

For the class voc2vdict, when you call its instance with args of xml_file_path and image_file_path, it can read the xml file and the image file and then convert them to VOC-format-dict, represented by vdict.

What is vdict? It is a python dict, which has a structure like:

vdict = {
    "image": numpy.array([[[....]]]),   # Cv2 image Mat. (Shape:[h, w, 3], RGB format)
    "filename": 000048,                 # filename without suffix
    "objects": [{                       # A list of dicts representing b-boxes
        "class_name": "house",
        "class_id": 2,                  # index of self.class_list
        "bbox": (x_min, y_min, x_max, y_max)
    }, {
        ...
    }]
}

For the class voc_aug, when you call its instance by args of vdict, it can augment both image and bbox of the vdict, then return a vdict augmented.

It will randomly use augmentation methods include:

  1. Random HSV augmentation
  2. Random Cropping augmentation
  3. Random Flipping augmentation
  4. Random Noise augmentation
  5. Random rotation or translation augmentation

Then, let's augment the vdict.

# prepare the xml-file-path and the image-file-path
filename = "000007"
file_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "dataset")
xml_file_path = os.path.join(file_dir, "Annotations", "{}.xml".format(filename))
image_file_path = os.path.join(file_dir, "JPEGImages", "{}.jpg".format(filename))

# Firstly convert the VOC format xml&image path to VOC-dict(vdict), then augment it.
src_vdict = voc2vdict_transformer(xml_file_path, image_file_path)
image_aug_vdict = augmentation_transformer(src_vdict)

The 000007.jpg and 000007.xml is in the dataset directory under Annotations and JPEGImages separately.

Then you can visualize the vdict. I have prepare a tool for you. That is viz_vdict function in VocAug.utils.viz_bbox module. It will return you a cv2 image when you input a vdict into it.

You can use it like:

image_src = src_vdict.get("image")
image_src_with_bbox = viz_vdict(src_vdict)

image_aug = image_aug_vdict.get("image")
image_aug_with_bbox = viz_vdict(image_aug_vdict)

Visualize them by matplotlib.

plt.figure(figsize=(15, 10))
plt.subplot(2, 2, 1)
plt.title("src")
plt.imshow(image_src)
plt.subplot(2, 2, 3)
plt.title("src_bbox")
plt.imshow(image_src_with_bbox)
plt.subplot(2, 2, 2)
plt.title("aug")
plt.imshow(image_aug)
plt.subplot(2, 2, 4)
plt.title("aug_bbox")
plt.imshow(image_aug_with_bbox)
plt.show()

Then you will get a random result like this. eg1

For more detail see demo.py .

Detail of Algorithm

I am writing this part...

Owner
Coder.AN
Researcher, CoTAI Lab, Dalian Maritime University. Focus on Computer Vision, Moblie Vision, and Machine Learning. Contact me at
Coder.AN
Approaches to modeling terrain and maps in python

topography 🌎 Contains different approaches to modeling terrain and topographic-style maps in python Features Inverse Distance Weighting (IDW) A given

John Gutierrez 1 Aug 10, 2022
RoFormer_pytorch

PyTorch RoFormer 原版Tensorflow权重(https://github.com/ZhuiyiTechnology/roformer) chinese_roformer_L-12_H-768_A-12.zip (提取码:xy9x) 已经转化为PyTorch权重 chinese_r

yujun 283 Dec 12, 2022
Code for "My(o) Armband Leaks Passwords: An EMG and IMU Based Keylogging Side-Channel Attack" paper

Myo Keylogging This is the source code for our paper My(o) Armband Leaks Passwords: An EMG and IMU Based Keylogging Side-Channel Attack by Matthias Ga

Secure Mobile Networking Lab 7 Jan 03, 2023
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting Created by Yongming Rao*, Wenliang Zhao*, Guangyi Chen, Yansong Tang, Zheng Z

Yongming Rao 321 Dec 27, 2022
Optimizaciones incrementales al problema N-Body con el fin de evaluar y comparar las prestaciones de los traductores de Python en el ámbito de HPC.

Python HPC Optimizaciones incrementales de N-Body (all-pairs) con el fin de evaluar y comparar las prestaciones de los traductores de Python en el ámb

Andrés Milla 12 Aug 04, 2022
Json2Xml tool will help you convert from json COCO format to VOC xml format in Object Detection Problem.

JSON 2 XML All codes assume running from root directory. Please update the sys path at the beginning of the codes before running. Over View Json2Xml t

Nguyễn Trường Lâu 6 Aug 22, 2022
Self-Guided Contrastive Learning for BERT Sentence Representations

Self-Guided Contrastive Learning for BERT Sentence Representations This repository is dedicated for releasing the implementation of the models utilize

Taeuk Kim 16 Dec 04, 2022
Repo for "TableParser: Automatic Table Parsing with Weak Supervision from Spreadsheets" at [email protected]

TableParser Repo for "TableParser: Automatic Table Parsing with Weak Supervision from Spreadsheets" at DS3 Lab 11 Dec 13, 2022

Code base for NeurIPS 2021 publication titled Kernel Functional Optimisation (KFO)

KernelFunctionalOptimisation Code base for NeurIPS 2021 publication titled Kernel Functional Optimisation (KFO) We have conducted all our experiments

2 Jun 29, 2022
An unofficial personal implementation of UM-Adapt, specifically to tackle joint estimation of panoptic segmentation and depth prediction for autonomous driving datasets.

Semisupervised Multitask Learning This repository is an unofficial and slightly modified implementation of UM-Adapt[1] using PyTorch. This code primar

Abhinav Atrishi 11 Nov 25, 2022
Code repository for the paper: Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild (ICCV 2021)

Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild Akash Sengupta, Ignas Budvytis, Robert

Akash Sengupta 149 Dec 14, 2022
Code for the upcoming CVPR 2021 paper

The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth Jamie Watson, Oisin Mac Aodha, Victor Prisacariu, Gabriel J. Brostow and Michael

Niantic Labs 496 Dec 30, 2022
Code for the paper "Training GANs with Stronger Augmentations via Contrastive Discriminator" (ICLR 2021)

Training GANs with Stronger Augmentations via Contrastive Discriminator (ICLR 2021) This repository contains the code for reproducing the paper: Train

Jongheon Jeong 174 Dec 29, 2022
GPU implementation of $k$-Nearest Neighbors and Shared-Nearest Neighbors

GPU implementation of kNN and SNN GPU implementation of $k$-Nearest Neighbors and Shared-Nearest Neighbors Supported by numba cuda and faiss library E

Hyeon Jeon 7 Nov 23, 2022
MoveNet Single Pose on DepthAI

MoveNet Single Pose tracking on DepthAI Running Google MoveNet Single Pose models on DepthAI hardware (OAK-1, OAK-D,...). A convolutional neural netwo

64 Dec 29, 2022
The code repository for "RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection" (ACM MM'21)

RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection (ACM MM'21) By Zhuofan Zong, Qianggang Cao, Biao Leng Introduction F

TempleX 9 Jul 30, 2022
AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation

AtlasNet [Project Page] [Paper] [Talk] AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation Thibault Groueix, Matthew Fisher, Vladimir

577 Dec 17, 2022
HandFoldingNet ✌️ : A 3D Hand Pose Estimation Network Using Multiscale-Feature Guided Folding of a 2D Hand Skeleton

HandFoldingNet ✌️ : A 3D Hand Pose Estimation Network Using Multiscale-Feature Guided Folding of a 2D Hand Skeleton Wencan Cheng, Jae Hyun Park, Jong

cwc1260 23 Oct 21, 2022
Learning to Initialize Neural Networks for Stable and Efficient Training

GradInit This repository hosts the code for experiments in the paper, GradInit: Learning to Initialize Neural Networks for Stable and Efficient Traini

Chen Zhu 124 Dec 30, 2022
Vanilla and Prototypical Networks with Random Weights for image classification on Omniglot and mini-ImageNet. Made with Python3.

vanilla-rw-protonets-project Vanilla Prototypical Networks and PNs with Random Weights for image classification on Omniglot and mini-ImageNet. Made wi

Giovani Candido 8 Aug 31, 2022