Fast image augmentation library and an easy-to-use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about the library: https://www.mdpi.com/2078-2489/11/2/125

Overview

Albumentations

Albumentations is a Python library for image augmentation. Image augmentation is used in deep learning and computer vision tasks to increase the quality of trained models. The purpose of image augmentation is to create new training samples from the existing data.

Here is an example of how you can apply some augmentations from Albumentations to create new images from the original one: [image: parrot]

Why Albumentations

  • Albumentations supports all common computer vision tasks such as classification, semantic segmentation, instance segmentation, object detection, and pose estimation.
  • The library provides a simple unified API to work with all data types: images (RGB images, grayscale images, multispectral images), segmentation masks, bounding boxes, and keypoints.
  • The library contains more than 70 different augmentations to generate new training samples from the existing data.
  • Albumentations is fast. We benchmark each new release to ensure that augmentations provide maximum speed.
  • It works with popular deep learning frameworks such as PyTorch and TensorFlow. By the way, Albumentations is a part of the PyTorch ecosystem.
  • Written by experts. The authors have experience both working on production computer vision systems and participating in competitive machine learning. Many core team members are Kaggle Masters and Grandmasters.
  • The library is widely used in industry, deep learning research, machine learning competitions, and open source projects.

Authors

Alexander Buslaev — Computer Vision Engineer at Mapbox | Kaggle Master

Alex Parinov — Computer Vision Architect at X5 Retail Group | Kaggle Master

Vladimir I. Iglovikov — Senior Computer Vision Engineer at Lyft Level5 | Kaggle Grandmaster

Eugene Khvedchenya — AI/ML Advisor and Independent researcher | Kaggle Master

Mikhail Druzhinin — Computer Vision Engineer at Simicon | Kaggle Expert

Installation

Albumentations requires Python 3.6 or higher. To install the latest version from PyPI:

pip install -U albumentations

Other installation options are described in the documentation.

Documentation

The full documentation is available at https://albumentations.ai/docs/.

A simple example

import albumentations as A
import cv2

# Declare an augmentation pipeline
transform = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])

# Read an image with OpenCV and convert it to the RGB colorspace
image = cv2.imread("image.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Augment an image
transformed = transform(image=image)
transformed_image = transformed["image"]

Getting started

I am new to image augmentation

Please start with the introduction articles about why image augmentation is important and how it helps to build better models.

I want to use Albumentations for a specific task such as classification or segmentation

If you want to use Albumentations for a specific task such as classification, segmentation, or object detection, refer to the set of articles that has an in-depth description of this task. We also have a list of examples on applying Albumentations for different use cases.

I want to know how to use Albumentations with deep learning frameworks

We have examples of using Albumentations along with PyTorch and TensorFlow.

I want to explore augmentations and see Albumentations in action

Check the online demo of the library. With it, you can apply augmentations to different images and see the result. Also, we have a list of all available augmentations and their targets.

Who is using Albumentations

List of augmentations

Pixel-level transforms

Pixel-level transforms will change just an input image and will leave any additional targets such as masks, bounding boxes, and keypoints unchanged. The list of pixel-level transforms:

Spatial-level transforms

Spatial-level transforms will simultaneously change both an input image as well as additional targets such as masks, bounding boxes, and keypoints. The following table shows which additional targets are supported by each transform.

Transform Image Masks BBoxes Keypoints
CenterCrop
CoarseDropout
Crop
CropAndPad
CropNonEmptyMaskIfExists
ElasticTransform
Flip
GridDistortion
GridDropout
HorizontalFlip
IAAAffine
IAAPiecewiseAffine
Lambda
LongestMaxSize
MaskDropout
NoOp
OpticalDistortion
PadIfNeeded
Perspective
RandomCrop
RandomCropNearBBox
RandomGridShuffle
RandomResizedCrop
RandomRotate90
RandomScale
RandomSizedBBoxSafeCrop
RandomSizedCrop
Resize
Rotate
ShiftScaleRotate
SmallestMaxSize
Transpose
VerticalFlip
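
The difference between the two groups can be illustrated without the library itself. The sketch below is a pure-Python illustration, not Albumentations code: a hypothetical brightness shift touches only pixel values, while a horizontal flip also has to remap a bounding box and a keypoint. The function names, the normalized (x_min, y_min, x_max, y_max) box layout, and the pixel-space keypoint are assumptions made for this example.

```python
# Illustrative only: these helpers are hypothetical and are not part of Albumentations.

def shift_brightness(image, delta):
    """Pixel-level: change pixel values, leave geometry (and all targets) alone."""
    return [[min(255, max(0, px + delta)) for px in row] for row in image]

def hflip_image(image):
    """Spatial-level: mirror every row of a 2D grayscale image."""
    return [row[::-1] for row in image]

def hflip_bbox(bbox):
    """Mirror a normalized (x_min, y_min, x_max, y_max) box around the vertical axis."""
    x_min, y_min, x_max, y_max = bbox
    return (1.0 - x_max, y_min, 1.0 - x_min, y_max)

def hflip_keypoint(keypoint, cols):
    """Mirror a pixel-space (x, y) keypoint in an image that is `cols` pixels wide."""
    x, y = keypoint
    return (cols - 1 - x, y)

image = [[10, 20, 30, 40],
         [50, 60, 70, 80]]
bbox = (0.0, 0.0, 0.25, 0.5)   # box in the top-left corner
keypoint = (0, 1)              # leftmost pixel of the second row

brighter = shift_brightness(image, 200)   # bbox and keypoint stay valid as they are
flipped = hflip_image(image)              # bbox and keypoint must be flipped as well
flipped_bbox = hflip_bbox(bbox)
flipped_keypoint = hflip_keypoint(keypoint, cols=4)
```

After the flip, the box sits in the top-right corner and the keypoint on the rightmost column, which is why spatial-level transforms need to know about every target they are given.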

A few more examples of augmentations

Semantic segmentation on the Inria dataset

inria

Medical imaging

medical

Object detection and semantic segmentation on the Mapillary Vistas dataset

vistas

Keypoints augmentation

Benchmarking results

To run the benchmark yourself, follow the instructions in benchmark/README.md

Results for running the benchmark on the first 2000 images from the ImageNet validation set using an Intel Xeon Gold 6140 CPU. All outputs are converted to a contiguous NumPy array with the np.uint8 data type. The table shows how many images per second can be processed on a single core; higher is better.

Transform               albumentations  imgaug  torchvision (Pillow-SIMD backend)  keras  augmentor  solt
                        0.5.0           0.4.0   0.7.0                              2.4.3  0.2.8      0.1.9
HorizontalFlip          9909            2821    2267                               873    2301       6223
VerticalFlip            4374            2218    1952                               4339   1968       3562
Rotate                  371             296     163                                27     60         345
ShiftScaleRotate        635             437     147                                28     -          -
Brightness              2751            1178    419                                229    418        2300
Contrast                2756            1213    352                                -      348        2305
BrightnessContrast      2738            699     195                                -      193        1179
ShiftRGB                2757            1176    -                                  348    -          -
ShiftHSV                597             284     58                                 -      -          137
Gamma                   2844            -       382                                -      -          946
Grayscale               5159            428     709                                -      1064       1273
RandomCrop64            175886          3018    52103                              -      41774      20732
PadToSize512            3418            -       574                                -      -          2874
Resize512               1003            634     1036                               -      1016       977
RandomSizedCrop_64_512  3191            939     1594                               -      1529       2563
Posterize               2778            -       -                                  -      -          -
Solarize                2762            -       -                                  -      -          -
Equalize                644             413     -                                  -      735        -
Multiply                2727            1248    -                                  -      -          -
MultiplyElementwise     118             209     -                                  -      -          -
ColorJitter             368             78      57                                 -      -          -

Python and library versions: Python 3.8.6 (default, Oct 13 2020, 20:37:26) [GCC 8.3.0], numpy 1.19.2, pillow-simd 7.0.0.post3, opencv-python 4.4.0.44, scikit-image 0.17.2, scipy 1.5.2.
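
For readers who want a rough feel for how an images-per-second figure can be obtained, here is a minimal timing sketch. It is not the script described in benchmark/README.md; the images_per_second helper and the nested-list hflip stand-in are hypothetical, and a real measurement would use actual images and library transforms.

```python
import time

def images_per_second(transform, images, repeats=3):
    """Return the best single-core throughput over `repeats` timed passes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for image in images:
            transform(image)
        best = min(best, time.perf_counter() - start)
    return len(images) / best

# Hypothetical stand-in for an augmentation: mirror each row of a nested-list image.
def hflip(image):
    return [row[::-1] for row in image]

images = [[[0] * 64 for _ in range(64)] for _ in range(200)]
throughput = images_per_second(hflip, images)
print(f"hflip: {throughput:.0f} images/s")
```

Taking the best of several passes is one common way to reduce noise from other processes; averaging is an equally reasonable choice.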

Contributing

To create a pull request to the repository, follow the documentation at https://albumentations.ai/docs/contributing/

Comments

In some systems, in the multiple GPU regime, PyTorch may deadlock the DataLoader if OpenCV was compiled with OpenCL optimizations. Adding the following two lines before the library import may help. For more details, see https://github.com/pytorch/pytorch/issues/1355

cv2.setNumThreads(0)
cv2.ocl.setUseOpenCL(False)

Citing

If you find this library useful for your research, please consider citing Albumentations: Fast and Flexible Image Augmentations:

@Article{info11020125,
    AUTHOR = {Buslaev, Alexander and Iglovikov, Vladimir I. and Khvedchenya, Eugene and Parinov, Alex and Druzhinin, Mikhail and Kalinin, Alexandr A.},
    TITLE = {Albumentations: Fast and Flexible Image Augmentations},
    JOURNAL = {Information},
    VOLUME = {11},
    YEAR = {2020},
    NUMBER = {2},
    ARTICLE-NUMBER = {125},
    URL = {https://www.mdpi.com/2078-2489/11/2/125},
    ISSN = {2078-2489},
    DOI = {10.3390/info11020125}
}
Comments
  • [TensorFlow] Failed to get reproducible trainings with albumentations included to the data pipeline

    🐛 Bug

    I could not get my training to work in a reproducible way when albumentations is added to the data pipeline. I followed this thread https://github.com/albumentations-team/albumentations/issues/93 and fixed all possible seeds, so overall my snippet that should have enabled reproducible experiments looks like this:

    import os
    import random
    
    import numpy as np
    import tensorflow as tf
    
    def set_random_seed(seed: int = 42):
        """
        Globally fix all possible sources of randomness to keep experiment reproducible 
        """
        random.seed(seed)
        np.random.seed(seed)
        tf.random.set_seed(seed)
        os.environ['PYTHONHASHSEED'] = str(seed)
        os.environ['TF_DETERMINISTIC_OPS'] = '1'
        os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
    

    Unfortunately, this doesn't help me get reproducible results. I executed the training process 6 times and got a different result every time. You can also see the whole picture in W&B:

    • https://wandb.ai/roma-glushko/rock-paper-scissors/runs/2bdgnbwx (best_val_acc: 0.7104, best_epoch: 3)
    • https://wandb.ai/roma-glushko/rock-paper-scissors/runs/2qo9pbls (best_val_acc: 0.7875, best_epoch: 8)
    • https://wandb.ai/roma-glushko/rock-paper-scissors/runs/uf6cknge (best_val_acc: 0.6771, best_epoch: 8)
    • https://wandb.ai/roma-glushko/rock-paper-scissors/runs/tem3umbx (best_val_acc: 0.7729, best_epoch: 6)
    • https://wandb.ai/roma-glushko/rock-paper-scissors/runs/czsjm7px (best_val_acc: 0.7208, best_epochs: 0 and 8)
    • https://wandb.ai/roma-glushko/rock-paper-scissors/runs/29dif98z (best_val_acc: 0.8, best_epoch: 9)
    • Mean: 0.74478
    • Std: 0.044726

    Also, I tried to set random.seed() right before passing my batch into a.Compose() pipeline. That did not really help.

    However, when I comment out albumentations from my data pipeline or replace it with some pure TF augmentations, I can get my training reproducible.

    Any clues what's wrong here?

    To Reproduce

    Steps to reproduce the behavior:

    1. Clone the project state at the 0.1.0-bugrep tag:
    git clone --depth 1 --branch 0.1.0-bugrep https://github.com/roma-glushko/rock-paper-scissor
    
    2. Pull the dataset:
    cd data
    kaggle datasets download --unzip frtgnn/rock-paper-scissor
    
    3. Install the project dependencies:
    poetry install
    
    4. Uncomment any of the reported augmentations in the config file (they are all commented out in git): https://github.com/roma-glushko/rock-paper-scissor/blob/master/configs/basic_config.py

    5. Run training a couple of times and you get results that differ by a lot:

    python train.py
    

    Expected behavior

    In order to run experiments that analyze the impact of different ideas and changes, I would like my training process to be reproducible.

    Environment

    • Albumentations version (e.g., 0.1.8): 0.5.2
    • Python version (e.g., 3.7): 3.8.6
    • OS (e.g., Linux): Ubuntu 20.10
    • How you installed albumentations (conda, pip, source): poetry (pip-like)
    • tensorflow-gpu: 2.5.0 (for the sake of compatibility with RTX3070 (ampere arch.))

    Additional context

    This report is reproduced in a project that is also mentioned in https://github.com/albumentations-team/albumentations/issues/905

    The data pipeline is the same for both issues:

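    # Imports inferred from the names used below; the original project may import them differently.
    from functools import partial
    from typing import Tuple
    
    import albumentations as a
    import tensorflow as tf
    from tensorflow.keras.preprocessing import image_dataset_from_directory
    
    AUTOTUNE = tf.data.AUTOTUNE
    # class_names is defined elsewhere in the project.
    
    def augment_image(inputs, labels, augmentation_pipeline: a.Compose):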
        def apply_augmentation(images):
            aug_data = augmentation_pipeline(image=images.astype('uint8'))
            return aug_data['image']
    
        inputs = tf.numpy_function(func=apply_augmentation, inp=[inputs], Tout=tf.uint8)
    
        return inputs, labels
    
    
    def get_dataset(
            dataset_path: str,
            subset_type: str,
            augmentation_pipeline: a.Compose,
            validation_fraction: float = 0.2,
            batch_size: int = 32,
            image_size: Tuple[int, int] = (300, 300),
            seed: int = 42
    ) -> tf.data.Dataset:
        augmentation_func = partial(
            augment_image,
            augmentation_pipeline=augmentation_pipeline,
        )
    
        dataset = image_dataset_from_directory(
            dataset_path,
            subset=subset_type,
            class_names=class_names,
            validation_split=validation_fraction,
            image_size=image_size,
            batch_size=batch_size,
            seed=seed,
        )
    
        return dataset \
            .map(augmentation_func, num_parallel_calls=AUTOTUNE) \
            .prefetch(AUTOTUNE)
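
One possible contributing factor, offered here as an assumption rather than a confirmed diagnosis: with num_parallel_calls=AUTOTUNE the mapped Python function may be called in a different order on each run, so a single globally seeded random generator can hand different random values to the same batch. The pure-Python sketch below illustrates the idea with hypothetical augment functions (no TensorFlow or Albumentations involved); deriving a generator from the seed and the sample index makes the result independent of processing order.

```python
import random

SEED = 42
samples = [10.0, 20.0, 30.0]

def augment_global(value):
    """Draws from the shared global generator: the result depends on call order."""
    return value + random.random()

def augment_per_sample(index, value, seed=SEED):
    """Draws from a generator derived from (seed, index): independent of call order."""
    rng = random.Random(seed * 1_000_003 + index)
    return value + rng.random()

def run_global(order):
    random.seed(SEED)  # seeding once does not fix the order in which samples arrive
    return {i: augment_global(samples[i]) for i in order}

def run_per_sample(order):
    return {i: augment_per_sample(i, samples[i]) for i in order}

in_order = [0, 1, 2]
shuffled = [2, 0, 1]   # stands in for a different scheduling of parallel calls

global_matches = run_global(in_order) == run_global(shuffled)
per_sample_matches = run_per_sample(in_order) == run_per_sample(shuffled)
print(global_matches, per_sample_matches)
```

Whether this applies to a given pipeline depends on how the augmentation library draws its random numbers; running the map step without parallel calls is a simple way to test the hypothesis.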
    
    Tensorflow Reproducibility 
    opened by roma-glushko 19
  • ValueError: Expected x_max for bbox (0.94375, 0.5775173611111111, 1.003125, 0.6372395833333333, 0) to be in the range [0.0, 1.0], got 1.003125.

    🐛 Bug

    I tried to use transforms such as VerticalFlip, RandomSizedBBoxSafeCrop, and other bounding box coordinate transformations, but I always got the error "Expected x_max for bbox (0.9515625, 0.5316840277777778, 1.003125, 0.6955729166666667, 0) to be in the range [0.0, 1.0], got 1.003125". If I replace the lines x_min, x_max = x_min / cols, x_max / cols and y_min, y_max = y_min / rows, y_max / rows in the normalize_bbox method in bbox_utils.py with x_min, x_max = min(x_min / cols, 1.0), min(x_max / cols, 1.0) and y_min, y_max = min(y_min / rows, 1.0), min(y_max / rows, 1.0), it works correctly.
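
The workaround described above can be sketched as a standalone function. This is an illustration of the clamping idea, not the code in bbox_utils.py; the function name and the example box are hypothetical, and the sketch also clamps at 0.0, which goes slightly beyond the change described in the report.

```python
def normalize_bbox_clamped(bbox, rows, cols):
    """Scale a pixel-space (x_min, y_min, x_max, y_max) box to [0, 1], clamping each value."""
    def clamp(value):
        return min(max(value, 0.0), 1.0)

    x_min, y_min, x_max, y_max = bbox
    return (clamp(x_min / cols), clamp(y_min / rows),
            clamp(x_max / cols), clamp(y_max / rows))

# Hypothetical box that overshoots the right edge of a 1280x720 image by 4 pixels.
box = (1208, 416, 1284, 459)
unclamped_x_max = box[2] / 1280   # slightly above 1.0, which fails the range check
normalized = normalize_bbox_clamped(box, rows=720, cols=1280)
```

Clamping hides the overshoot rather than explaining it, so it is worth checking where the out-of-range coordinate comes from before relying on it.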

    To Reproduce

    Steps to reproduce the behavior:

    1. Create the augmentation pipeline:
    transforms = [VerticalFlip(), RandomBrightnessContrast(), RandomShadow(p=0.5), RandomSnow(p=0.5), RandomFog(), JpegCompression()]
    augmentor = Compose(transforms, bbox_params=BboxParams(format='yolo', label_fields=['category_id']))
    2. Input bboxes [[0.492578125, 0.5118055555555555, 0.01328125, 0.02638888888888889], [0.501171875, 0.5013888888888889, 0.01171875, 0.019444444444444445], [0.509765625, 0.5020833333333333, 0.01328125, 0.020833333333333332], [0.51640625, 0.51875, 0.0265625, 0.034722222222222224], [0.581640625, 0.5131944444444444, 0.02265625, 0.029166666666666667], [0.613671875, 0.5145833333333333, 0.02734375, 0.034722222222222224], [0.7546875, 0.5319444444444444, 0.0859375, 0.08055555555555556], [0.46796875, 0.5423611111111111, 0.065625, 0.10138888888888889], [0.9734375, 0.6097222222222223, 0.0515625, 0.1638888888888889]]

    Traceback (most recent call last):
      File "/home/robo/Code/Python/ONNX/mobilenetv2.py", line 655, in <module>
        for batch_data, boxes in det_dataset.get_batchGPU(batch_size):
      File "/home/robo/Code/Python/ONNX/mobilenetv2.py", line 609, in get_batchGPU
        max_length_box = self.get_image(start_index, batch_size, batch, labels)
      File "/home/robo/Code/Python/ONNX/mobilenetv2.py", line 579, in get_image
        sample = self.getItemGPURandomGreed(start_index)
      File "/home/robo/Code/Python/ONNX/mobilenetv2.py", line 569, in getItemGPURandomGreed
        return self.getItemGPUVariableGreed(indx, np.random.randint(1, 3), np.random.randint(1, 3))
      File "/home/robo/Code/Python/ONNX/mobilenetv2.py", line 564, in getItemGPUVariableGreed
        aug = augmentor(**annotation)
      File "/home/robo/.local/lib/python3.6/site-packages/albumentations/core/composition.py", line 174, in __call__
        p.preprocess(data)
      File "/home/robo/.local/lib/python3.6/site-packages/albumentations/core/utils.py", line 63, in preprocess
        data[data_name] = self.check_and_convert(data[data_name], rows, cols, direction="to")
      File "/home/robo/.local/lib/python3.6/site-packages/albumentations/core/utils.py", line 71, in check_and_convert
        return self.convert_to_albumentations(data, rows, cols)
      File "/home/robo/.local/lib/python3.6/site-packages/albumentations/augmentations/bbox_utils.py", line 51, in convert_to_albumentations
        return convert_bboxes_to_albumentations(data, self.params.format, rows, cols, check_validity=True)
      File "/home/robo/.local/lib/python3.6/site-packages/albumentations/augmentations/bbox_utils.py", line 305, in convert_bboxes_to_albumentations
        return [convert_bbox_to_albumentations(bbox, source_format, rows, cols, check_validity) for bbox in bboxes]
      File "/home/robo/.local/lib/python3.6/site-packages/albumentations/augmentations/bbox_utils.py", line 305, in <listcomp>
        return [convert_bbox_to_albumentations(bbox, source_format, rows, cols, check_validity) for bbox in bboxes]
      File "/home/robo/.local/lib/python3.6/site-packages/albumentations/augmentations/bbox_utils.py", line 253, in convert_bbox_to_albumentations
        check_bbox(bbox)
      File "/home/robo/.local/lib/python3.6/site-packages/albumentations/augmentations/bbox_utils.py", line 332, in check_bbox
        "to be in the range [0.0, 1.0], got {value}.".format(bbox=bbox, name=name, value=value)
    ValueError: Expected x_max for bbox (0.9515625, 0.5316840277777778, 1.003125, 0.6955729166666667, 0) to be in the range [0.0, 1.0], got 1.003125.

    Environment

    • Albumentations version: 0.4.2
    • Python version: 3.6.8
    • OS: Ubuntu 18.04
    • Installed with: pip
    opened by adelkaiarullin 19
  • RandomShadow input type wrong

    🐛 Bug

    Weather transformation. For RandomRain, RandomSnow, RandomSunFlare the inputs are just numpy uint8. However, RandomShadow does not allow the same input format.

    To Reproduce

    albu_shadow = albu.RandomShadow(p=1, num_shadows_lower=1, num_shadows_upper=1, shadow_dimension=5, shadow_roi=(0, 0.5, 1, 1))
    
    x_np = albu_shadow(image=x_np)['image']
    

    Error:

    TypeError: Expected Ptr<cv::UMat> for argument img
    

    Expected behavior

    It should take the same uint8 numpy array as input.

    Environment

    • Albumentations version: 0.4.5
    • Python version: 3.7.6
    • OS (e.g., Linux): Linux
    • How you installed albumentations: pip
    bug 
    opened by shamangary 15
  • Random Crop yields incorrect value for bounding box

    🐛 Bug

    The bbox_random_crop function does not produce a reasonable result. Consider the following snippet of code.

    To Reproduce

    from albumentations import functional as F
    bbox = (0.129, 0.7846, 0.1626, 0.818)
    cropped_bbox = F.bbox_random_crop(bbox=bbox, crop_height=100, crop_width=100, h_start=0.7846, w_start=0.12933, rows=1500, cols=1500)
    print(cropped_bbox)  # (0.125, 0.788999, 0.629, 1.29)
    # Notice that y2 is outside of the crop range,
    # but the following assert passes:
    assert bbox[3] < (100/1500) + 0.7846
    # This one fails:
    assert all(0 <= y <= 1 for y in cropped_bbox)
    

    Expected behavior

    An augmented bbox that is fully within the image crop. The crop_height plus the start of the crop is larger than the y2 of the bounding box, but the 1.29 coordinate in the cropped box suggests it is outside of the crop area.
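
The reported numbers are consistent with h_start and w_start being interpreted as fractions of the free space (rows - crop_height and cols - crop_width) rather than as normalized image coordinates. This reading is an assumption inferred from the output above, not taken from the library source. The standalone sketch below, with a hypothetical crop_bbox helper, reproduces the reported result under that assumption and shows how the fractions could be chosen so that the crop starts at the box corner.

```python
def crop_bbox(bbox, crop_height, crop_width, h_start, w_start, rows, cols):
    """Crop a normalized (x_min, y_min, x_max, y_max) box, returning crop-relative coordinates."""
    # Assumption: the crop's top-left corner is placed within the free space left by the crop.
    y1 = int((rows - crop_height) * h_start)
    x1 = int((cols - crop_width) * w_start)
    x_min, y_min, x_max, y_max = bbox
    return (
        (x_min * cols - x1) / crop_width,
        (y_min * rows - y1) / crop_height,
        (x_max * cols - x1) / crop_width,
        (y_max * rows - y1) / crop_height,
    )

bbox = (0.129, 0.7846, 0.1626, 0.818)
cropped = crop_bbox(bbox, 100, 100, h_start=0.7846, w_start=0.12933, rows=1500, cols=1500)
# cropped is approximately (0.125, 0.789, 0.629, 1.29), matching the report

# To start the crop at the box corner, convert the pixel offset into a free-space fraction.
h_start_for_box = bbox[1] * 1500 / (1500 - 100)
w_start_for_box = bbox[0] * 1500 / (1500 - 100)
aligned = crop_bbox(bbox, 100, 100, h_start_for_box, w_start_for_box, rows=1500, cols=1500)
```

Under this reading, h_start=0.7846 places the crop at row 1098 rather than row 1176, which is why the bottom of the box lands 29 pixels below the 100-pixel crop.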

    Environment

    • Albumentations version (e.g., 0.1.8): 0.5.2
    • Python version (e.g., 3.7): 3.7
    • OS (e.g., Linux): OSX
    • How you installed albumentations (conda, pip, source): pip
    • Any other relevant information:

    Additional context

    I am making a custom augmentation to zoom in on a given bounding box: CropSafe (but not all boxes). Is there syntax that I'm misunderstanding? It doesn't feel like this could be the case. A dtype issue?

    opened by bw4sz 14
  • Changed downscale interpolation to avoid aliasing

    Hi!

    I recently used the Albumentations library for a Kaggle competition, and more particularly the Downscale transform.

    After looking at the result the transform gave me I was a little bit surprised:

    [Images: result using bilinear interpolation; result using bicubic interpolation]

    We can see a lot of artifacts and aliasing happening here.

    After checking the source code, I noticed that the same interpolation method was used both for the downscaling part and for the upscaling to the original size part. However, as the OpenCV doc mentions:

    cv2.INTER_AREA: resampling using pixel area relation. It may be a preferred method for image decimation, as it gives moiré-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method

    This was indeed the case, here are the results of the same image but with cv2.INTER_AREA applied for the downscaling part:

    [Images: result using bilinear interpolation; result using bicubic interpolation]

    So we can see that the images are now of much better quality and better recreate what actual image resizing might look like, which is the main goal of the data transformation.

    Awaiting merge 
    opened by nathanhubens 14
  • RandomSunFlare dump

    🐛 Bug

    To Reproduce

    add function A.RandomSunFlare(p=0.2)

    Steps to reproduce the behavior:

    /Pytorch/lib/python3.6/site-packages/albumentations/augmentations/functional.py", line 863, in add_sun_flare
        cv2.circle(overlay, (x, y), rad3, (r_color, g_color, b_color), -1)
    cv2.error: OpenCV(4.5.4) :-1: error: (-5:Bad argument) in function 'circle'

    Overload resolution failed:

    • Can't parse 'center'. Sequence item with index 1 has a wrong type
    • Can't parse 'center'. Sequence item with index 1 has a wrong type

    Expected behavior

    Environment

    • Albumentations version (e.g., 0.1.8): albumentations 1.1.0
    • Python version (e.g., 3.6): 3.6
    • OS (e.g., Linux): linux
    • How you installed albumentations (conda, pip, source): pip
    • Any other relevant information:

    Additional context

    bug Need more info 
    opened by Tim5Tang 13
  • YOLO format without normalization and denormalization

    Since yolo and albumentations are normalized formats, we don't need to normalize and denormalize the values in the conversion step. The previous approach gave round-off errors.
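
A minimal sketch of the idea, assuming YOLO boxes are (x_center, y_center, width, height) and the internal format is (x_min, y_min, x_max, y_max), both normalized to [0, 1]. The function names are hypothetical and this is not the code from the pull request; the pixel round-trip variant is only a simplified illustration of how truncation can introduce round-off.

```python
def yolo_to_corners(bbox):
    """Convert normalized (x_center, y_center, width, height) to normalized corners."""
    x_center, y_center, width, height = bbox
    half_w, half_h = width / 2, height / 2
    return (x_center - half_w, y_center - half_h, x_center + half_w, y_center + half_h)

def corners_to_yolo(bbox):
    """Convert normalized (x_min, y_min, x_max, y_max) back to the YOLO layout."""
    x_min, y_min, x_max, y_max = bbox
    return ((x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min)

def yolo_to_corners_via_pixels(bbox, rows, cols):
    """Same conversion, but truncated to whole pixels and normalized again."""
    x_min, y_min, x_max, y_max = yolo_to_corners(bbox)
    return (int(x_min * cols) / cols, int(y_min * rows) / rows,
            int(x_max * cols) / cols, int(y_max * rows) / rows)

box = (0.5, 0.5, 0.25, 0.5)
direct = yolo_to_corners(box)
round_trip = corners_to_yolo(direct)
via_pixels = yolo_to_corners_via_pixels(box, rows=100, cols=100)
```

With this box the direct conversion round-trips exactly, while the pixel variant moves x_min from 0.375 to 0.37 on a 100-pixel-wide image.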

    These changes should fix the following issues:

    • #922
    • #903
    • #862
    • #883
    • #848
    • #679
    opened by Dipet 12
  • Implementation of #617. `check_validity` BBox parameter

    Fixes #617. A check_validity parameter is added to BboxParams. Setting it to False makes it possible to handle bounding boxes that extend beyond the image. See the motivation in #617.

    Need more info Branch conflict 
    opened by IlyaOvodov 12
  • ToTransform before Normalize causes Tensor no attribute astype Error

    This is my albumentations transform. Before, this was Normalize --> ToTensor. Changing the order (which I think is the right order) produces an error.

    def get_transforms(phase, mean, std):
        list_transforms = []
        if phase == "train":
            list_transforms.extend(
                [
                    HorizontalFlip(p=0.2),
                    ShiftScaleRotate(
                        shift_limit=0,  # no shifting
                        scale_limit=0.1,  # scale by up to 10%
                        rotate_limit=10,  # rotate by up to 10 degrees
                        p=0.5,
                        border_mode=cv2.BORDER_CONSTANT
                    )
                    # albu.RandomRotate90(),
                    # albu.Cutout(),
                    # albu.RandomBrightnessContrast(
                    #     brightness_limit=0.2, contrast_limit=0.2, p=0.3
                    # ),
                    # # albu.GridDistortion(p=0.3),
                    # albu.HueSaturationValue(p=0.2),
                    # albu.RandomContrast(p=0.2),
                    # albu.MedianBlur(p=0.2)
                    # Resize(320, 480),
                ]
            )
        list_transforms.extend(
            [
                ToTensor(),
                Normalize(mean=mean, std=std, p=1),
                
            ]
        )
    
        list_trfms = Compose(list_transforms)
        return list_trfms
    

    When loading using DataLoader, it generates an error

         90     denominator = np.reciprocal(std, dtype=np.float32)
         91 
    ---> 92     img = img.astype(np.float32)
         93     img -= mean
         94     img *= denominator
    
    AttributeError: 'Tensor' object has no attribute 'astype'
    
    opened by sarmientoj24 11
  • HorizontalFlip and VerticalFlip inconsistent with multilabel masks

    🐛 Bug

    Given an image and its corresponding mask, the augmented mask is not consistent with the augmented image when HorizontalFlip() and VerticalFlip() are included in the augmentations.

    To Reproduce

    The following snippet is a small dataloader that I usually use. I can't share the full code.

        self.aug = Compose([
                            RandomBrightnessContrast(),
                            HueSaturationValue(),
                            RandomGamma(),
                            GaussNoise(),
                            GaussianBlur(),
                            # HorizontalFlip(),
                            # VerticalFlip(),
                            ])
    def __getitem__(self, patient_id):
        image_path = os.path.join(self.df.iloc[patient_id, 0])
        z = np.load(image_path)
        image = z['patch']
        gt_data = z['patch_gt']
        # print("Pre : ", image.shape, gt_data.shape)
        gt_data = gt_data.swapaxes(0, 2)
        # print("Pre swapped: ", image.shape, gt_data.shape)
        # gt_data = gt_data[:4, :, :]
        if not self.valid:
            augmented = self.aug(image=image, mask=gt_data)
            image = augmented['image']
            gt_data = augmented['mask']
        image = (image/255).astype(np.float32)
        # print("Post Augment :", image.shape, gt_data.shape)
        image = image.swapaxes(0, 2)
        gt_data = gt_data.swapaxes(0, 2)
        # print("Post Augment Swapped: ", image.shape, gt_data.shape)
        image = torch.FloatTensor(image)
        gt_data = torch.FloatTensor(gt_data)
        #mask.shape = (5, 256, 256)
        #image.shape = (256, 256, 3)
        return image, gt_data
    

    Expected behavior

    The augmentation over the images for horizontal and vertical flip should be working fine for both the mask and the image, but for some reason, there are errors in mask augmentations during horizontal and vertical flips.

    Image shape : 256, 256, 3 Mask Shape : 5, 256, 256
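
A possible cause, given as an assumption rather than a confirmed diagnosis: the mask is stored as (5, 256, 256), and swapaxes(0, 2) turns a (channels, height, width) array into (width, height, channels) rather than (height, width, channels). A horizontal flip applied to that array then acts on the original height axis. Because the patches are square, the shapes still match and no error is raised. The sketch below reproduces the effect with nested lists standing in for NumPy arrays; all helper names are hypothetical.

```python
def swapaxes_0_2(vol):
    """(A, B, C) nested lists -> (C, B, A), mirroring ndarray.swapaxes(0, 2)."""
    a, b, c = len(vol), len(vol[0]), len(vol[0][0])
    return [[[vol[i][j][k] for i in range(a)] for j in range(b)] for k in range(c)]

def channels_last(vol):
    """(C, H, W) nested lists -> (H, W, C), mirroring a move of axis 0 to the end."""
    c, h, w = len(vol), len(vol[0]), len(vol[0][0])
    return [[[vol[k][y][x] for k in range(c)] for x in range(w)] for y in range(h)]

def channels_first(img):
    """(H, W, C) nested lists -> (C, H, W)."""
    h, w, c = len(img), len(img[0]), len(img[0][0])
    return [[[img[y][x][k] for x in range(w)] for y in range(h)] for k in range(c)]

def hflip(img):
    """Horizontal flip of an (H, W, C) array: reverse the second axis."""
    return [row[::-1] for row in img]

# One-channel 2x2 mask with a single marked pixel in the top-left corner.
mask = [[[1, 0],
         [0, 0]]]

via_swapaxes = swapaxes_0_2(hflip(swapaxes_0_2(mask)))
via_channels_last = channels_first(hflip(channels_last(mask)))
```

With swapaxes the marked pixel ends up in the bottom-left corner (a vertical flip), whereas moving the channel axis to the end puts it in the top-right corner, as a horizontal flip of the image would.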

    Environment

    • Albumentations version : 0.43.
    • Python version : 3.6
    • OS (e.g., Linux): Linux
    • How you installed albumentations (conda, pip, source): pip
    • Any other relevant information:

    Additional context

    opened by Geeks-Sid 11
  • Shift augmentation in `ShiftScaleRotate` works incorrect for keypoints and bboxes

    Version: 1.12. Shift augmentation in ShiftScaleRotate works incorrectly for keypoints and bboxes. Please compare how it's applied to the image: https://github.com/albu/albumentations/blob/c26383ecd9eeb51d57185bfd699179a8a41f7b6d/albumentations/augmentations/functional.py#L143

    BBoxes: https://github.com/albu/albumentations/blob/c26383ecd9eeb51d57185bfd699179a8a41f7b6d/albumentations/augmentations/functional.py#L635

    and keypoints: https://github.com/albu/albumentations/blob/c26383ecd9eeb51d57185bfd699179a8a41f7b6d/albumentations/augmentations/functional.py#L861

    'dx' and 'dy' are percentage values of the image width and height. As we don't have access to the image shape during these transforms, it may be better to set the shift range in pixels rather than in percent.
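
To make the units concrete, here is a small sketch of what a shift by fractions of the image size means for each target. It assumes keypoints are stored in pixel coordinates and bounding boxes in normalized coordinates; the helper names are hypothetical and this is not the code at the links above.

```python
def shift_keypoint(keypoint, dx, dy, rows, cols):
    """Shift a pixel-space (x, y) keypoint by fractions dx, dy of the image size."""
    x, y = keypoint
    return (x + dx * cols, y + dy * rows)

def shift_bbox(bbox, dx, dy):
    """Shift a normalized (x_min, y_min, x_max, y_max) box; no image shape is needed."""
    x_min, y_min, x_max, y_max = bbox
    return (x_min + dx, y_min + dy, x_max + dx, y_max + dy)

# A shift of a quarter of the width and half of the height on a 400x200 image.
moved_keypoint = shift_keypoint((100, 50), dx=0.25, dy=0.5, rows=200, cols=400)
moved_bbox = shift_bbox((0.25, 0.25, 0.5, 0.5), dx=0.25, dy=0.5)
```

A pixel-space target needs the image shape to apply a fractional shift, while a normalized one does not, which is the asymmetry the report points at.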

    bug 
    opened by mortido 11
  • Rotate & SafeRotate doesn't properly rotate the label in YOLO format

    🐛 Bug

    I'm using the following code to rotate the image and its label -

    def bboxes2TxtFile(bboxes, category_ids, output_path):
        with open(output_path, 'w') as f:
    
            for i, bbox in enumerate(bboxes):
                f.write(f"{category_ids[i]} {bbox[0]} {bbox[1]} {bbox[2]} {bbox[3]}\n")
    
    
    #======MAIN======
    transform = A.Compose(
                [
                    A.SafeRotate(always_apply=True, p=1.0, limit=(-360, 360), interpolation=0, border_mode=0, value=(0, 0, 0), mask_value=None)
                ],
                bbox_params=A.BboxParams(format='yolo', label_fields=['category_ids']),
            )
    
            transformed = transform(image=image, bboxes=bboxes, category_ids=category_ids)
            
            cv2.imwrite(outputImageDir+"SR_"+imageFile, transformed['image'])
    

    But I'm getting incorrect labels for the same. I used A.Rotate too but still, the same error persists. I'm attaching screenshots of the visualization.

    Rotated Label's visualization (Red lines drawn manually represent the expected output) - image

    Image without augmentation - image

    Test image and label for testing - test.zip

    opened by pillai-karthik 0
  • The `add_targets` method sets targets instead of adding them

    Suppose you do

    import albumentations as A
    t = A.ToGray()  # Works with all transformations
    t.add_targets({"my_image1": "image"})
    t.add_targets({"my_image2": "image"})
    print(t._additional_targets)
    

    You get {'my_image2': 'image'}. That seems to be what you want since you (albu) wrote

    class BasicTransform(Serializable):
        ...
        def add_targets(self, additional_targets):
            ...
            self._additional_targets = additional_targets
    

    But, given the name of the function, and the docstring "Add targets to transform them the same way as one of existing targets", I expected {'my_image1': "image", 'my_image2': "image"}.

    My own use case is very uncommon, but I could see other people adding some targets in 2 different places for some other reason. I would suggest either:

    • Replace self._additional_targets = additional_targets by self._additional_targets.update(additional_targets)
    • Rename add_targets to set_targets / set_additional_targets
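
The difference between the current behavior and the first suggestion can be shown with two toy classes. They are hypothetical stand-ins written for this illustration, not BasicTransform itself.

```python
class ReplacingTransform:
    """Mimics the reported behavior: each call overwrites the previous targets."""

    def __init__(self):
        self._additional_targets = {}

    def add_targets(self, additional_targets):
        self._additional_targets = additional_targets


class MergingTransform(ReplacingTransform):
    """Mimics the suggested behavior: each call merges into the existing targets."""

    def add_targets(self, additional_targets):
        self._additional_targets.update(additional_targets)


replacing = ReplacingTransform()
merging = MergingTransform()
for transform in (replacing, merging):
    transform.add_targets({"my_image1": "image"})
    transform.add_targets({"my_image2": "image"})

print(replacing._additional_targets)
print(merging._additional_targets)
```

The first print shows only my_image2, the second shows both keys.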

    For completeness purpose, my own use case is that I have a child class (named Tee) of BasicTransform which outputs 2 keys from one key. So my pipeline looks like:

    before_tee1 = A.SomeTransform(...)
    before_tee2 = A.SomeTransform(...)
    tee = Tee(...)
    after_tee1 = A.SomeTransform(...)
    after_tee2 = A.SomeTransform(...)
    for transfo in [after_tee1, after_tee2]:
        transfo.add_targets({"image_copy": "image"})
    
    ...
    
    dynamic_composed_transfo = A.Compose(
        [before_tee1, before_tee2, tee, after_tee1, after_tee2], additional_targets=dynamic_targets)
    
    opened by ernest-tg 2
  • [Company] Voxel

    Name

    Voxel

    Website

    https://voxelai.com/

    Link to a logo

    https://uploads-ssl.webflow.com/62cf4f5eff50c678585c2a90/62e762b95d9964ef16f76281_Color.svg

    Confirmation

    • [X] I hereby confirm that I have permission from the company specified above to use its name, link to a website, and logo to mention it as a user of Albumentations.
    opened by bilalsal 0
  • Motion blur ValueError when blur_limit is even

    🐛 Bug

    I'm not sure that this line https://github.com/albumentations-team/albumentations/blob/master/albumentations/augmentations/blur/transforms.py#L77 works as expected. It doesn't take the allow_shifted parameter into account at all, since there is an or operator.

    To Reproduce

    Steps to reproduce the behavior:

    1. A.Compose([A.MotionBlur(blur_limit=20)])

    ValueError: Blur limit must be odd when centered=True. Got: (3, 20)

    Expected behavior

    I think the check for blur_limit should use an expression like this:

    if not allow_shifted and (self.blur_limit[0] % 2 != 1 or self.blur_limit[1] % 2 != 1):
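    As a standalone illustration of the proposed condition, here is a minimal sketch. The function validate_blur_limit is hypothetical and is not part of the library; it only reproduces the suggested check so its behaviour can be tried in isolation.

```python
def validate_blur_limit(blur_limit, allow_shifted):
    """Raise ValueError only when shifted kernels are disallowed and a limit is even."""
    low, high = blur_limit
    if not allow_shifted and (low % 2 != 1 or high % 2 != 1):
        raise ValueError(f"Blur limit must be odd when centered=True. Got: {blur_limit}")
    return blur_limit


# An even upper limit is accepted when shifted kernels are allowed.
validate_blur_limit((3, 20), allow_shifted=True)

# The same limits are rejected when only centered kernels are allowed.
try:
    validate_blur_limit((3, 20), allow_shifted=False)
except ValueError as err:
    print(err)
```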
    

    Environment

    • Albumentations version (e.g., 0.1.8): 1.3.0
    • Python version (e.g., 3.7): 3.10.8
    • OS (e.g., Linux): macOS
    • How you installed albumentations (conda, pip, source): pip
    • Any other relevant information:
    bug good first issue documentation 
    opened by Andredance 5
Releases(1.3.0)
  • 1.3.0(Sep 20, 2022)

    Breaking changes

    • Renamed the method parameter of Rotate to rotate_method to keep parameter naming consistent. (#1258 by @Dipet, thanks to @MichaelMonashev)

    New augmentations

    • RandomCropFromBorders - Crops image based on indents from image borders. (#1240 by @Dipet based on #476 by @ZFTurbo)
    • BBoxSafeRandomCrop - Crops image without loss of bboxes. Unlike RandomSizedBBoxSafeCrop, this implementation does not resize the result to a target size. (#579 by @SunQpark)
    • Spatter - Simulates corruption which can occlude a lens in the form of rain or mud. (#573 by @akarsakov)
    • Defocus - Imitates lens defocusing. (#551 by @akarsakov)
    • ZoomBlur - Imitates lens blur on zooming. (#551 by @akarsakov)

    Bugfixes

    • Fixed wrong result in RandomBrightnessContrast when brightness_by_max=False. (#487 by @Dipet)
    • Fixed wrong bbox clipping inside Perspective and Affine. (#1231 by @Dipet)
    • Fixed incorrect removal of bboxes when min_visibility=0 or min_visibility=1. (#616 by @IlyaOvodov)
    • Fixed incorrect keypoint cropping inside Rotate when crop_border=True. (#1250 by @Dipet, thanks to @jonkoi)
    • Fixed wrong propagation of always_apply Compose children. (#561 by @albu)
    • RandomSunFlare now correctly works with src_color and uses all three color values. (#1285 by @hoel-bagard)
    • RandomGamma now correctly works with float gamma_limit. (#1286 by @zahragolpa)

    Minor changes:

    • Sped up Normalize by up to 2 times in some cases. (#563 by @Dipet)
    • GridDistortion, ElasticTransform and OpticalDistortion now support bbox targets. (#476, #1262 by @ZFTurbo and @Dipet)
    • MotionBlur now supports the allow_shifted flag. When its value is False, only non-shifted kernels are generated. (#1239 by @Dipet)
    • Updated versions of type formatters. (#1245 by @ternaus)
    • GridDistortion now supports the normalized flag. When it is set to True, the distortion is applied inside the image borders. (#722 by @poke1024)
    • You can now specify the downscale and upscale interpolation methods for Downscale. This is needed to avoid interpolation artifacts. (#584 by @nathanhubens)
    • Refactoring. Spatial transforms moved to geometric files. (#1241 by @ternaus)
    • Refactoring. Common functions moved into albumentations.augmentations.utils.py. (#1260 by @Dipet)
    • Refactoring. Blur transforms moved into albumentations.augmentations.blur. (#1259 by @Dipet)
  • 1.2.1(Jul 12, 2022)

    Minor changes

    • A.Rotate and A.ShiftScaleRotate now support new rotation method for bounding boxes, ellipse. (#1203 by @victor1cea)
    • A.Rotate now supports new argument crop_border. If set to True, the rotated image will be cropped as much as possible to eliminate pixel values at the edges that were not well defined after rotation. (#1214 by @bonlime)
    • Tests that use multiprocessing now run much faster (#1218 by @Dipet)
    • Improved type hints (#1219 by @Dipet )
    • Fixed a deprecation warning in match_histograms. (#1121 by @BloodAxe)

    Bugfixes

    • A.CropNonEmptyMaskIfExists modified the first element of masks in-place. Now, this behavior is fixed and A.CropNonEmptyMaskIfExists doesn't do in-place modification of input masks. (#1193 by @ORippler).
    • Albumentations now correctly serializes and deserializes the fill_value and mask_fill_value parameters for A.GridDropout. (#1191 by @victor1cea)
    • A.ColorJitter now correctly works with A.ReplayCompose. (#1199 by @zakajd)
    • Fixed incorrect behavior of A.ColorJitter for np.float32 input images when contrast is set to 0 (previously, all values were set to 0.5 instead of using the average value). (#1207 by @Dipet)
    • A.Rotate, A.Affine and A.ShiftScaleRotate now do rotation in the same way. Fixed incorrect rotation angle for A.Affine. A.Rotate and A.ShiftScaleRotate now correctly rotate the keypoints 90 degrees and don't leave black lines around the edges of the image. (#1091 by @Dipet )
  • 1.2.0(Jun 15, 2022)

    New augmentations

    • A.UnsharpMask. This transform sharpens the input image using Unsharp Masking processing and overlays the result with the original image. (#1063 by @zakajd)
    • A.RingingOvershoot. This transform creates ringing or overshoot artifacts by convolving the image with a 2D sinc filter. (#1064 by @zakajd)
    • A.AdvancedBlur. This transform blurs the input image using a Generalized Normal filter with randomly selected parameters. It also adds multiplicative noise to generated kernel before convolution. (#1066 by @zakajd)
    • A.PixelDropout. This transformation randomly replaces pixels with the passed value. (#1082 by @Dipet)

    Bugfixes

    • Fixed a problem that prevented A.RandomShadow from working with non-contiguous input. (#1117 by @i-aki-y)
    • A.PadIfNeeded now works with an arbitrary number of channels. (#1069 by @BloodAxe)
    • Fixed all np.random use cases to prevent identical values when using multiprocessing. (#1070 by @Dipet)
    • The slant param now has an effect in A.RandomRain. (#1179 by @victor1cea)
    • translate_percent now uses 0 as a default value in the A.Affine transform. (#1183 by @victor1cea)
    • A.SafeRotate no longer loses bboxes and keypoints. (#1109 by @Dipet)
    • A.CropAndPad now correctly handles bboxes when keep_size=True. (#1059 by @cannon)
    • A.RandomCrop, A.RandomSizedCrop, and A.RandomSizedBBoxSafeCrop now sample the last pixel. (#1080 by @Multihuntr)

    Minor changes:

    • Old code is refactored, and more type hints are added (#1052 by @Dipet).
    • A.Compose now warns the user if it receives a single augmentation instead of a sequence of augmentations. (#1055 by @Dipet)
    • A.CoarseDropout and A.RandomGridShuffle now support keypoints. (#1084 by @BloodAxe)
    • A.ToTensorV2 now supports the masks target. (#1097 by @alessiobonfiglio)
    • A.PadIfNeeded now supports random padding. (#1160 by @mys007 )
    • Improved and corrected documentation: #1047 by @shyn, #1164 by @notplus, #1105 by @i-aki-y
    • Speeded up tests by removing unnecessary tests. (#1188 by @creafz)
    • A.Affine now has keep_ratio flag. (#1104 by @i-aki-y)
  • 1.1.0(Oct 4, 2021)

    New augmentations

    • TemplateTransform. This transform allows the blending of an input image with specified templates. (#572 by @akarsakov )
    • PixelDistributionAdaptation. A new domain adaptation augmentation. It fits a simple transform on both the original and reference image, transforms the original image with transform trained on this image, and performs inverse transformation using transform fitted on the reference image. See the examples of this transform in the qudida repository. (#959 by @arsenyinfo)

    Minor changes:

    • LongestMaxSize and SmallestMaxSize now can also accept a list of sizes as their max_size argument and the actual max_size value will be sampled randomly from this list. (#930 by @kmistry-wx )
    • A.Affine now accepts mask_interpolation as a parameter. (#975 by @dskkato )
    • A.RandomRain now alters brightness in HSV space instead of HLS space to prevent image corruption. (#990 by @ErlingLie)
    • Albumentations now raises ValueError if bbox_params is not specified and bbox transformation is called (#1013 by @VirajBagal)
    • CoarseDropout can now set the height and width of holes based on the fraction of original image height and width (#1014 by @VirajBagal )
    • ElasticTransform got performance optimizations. (#1004 by @b0nce)

    Bugfixes

    • Fixed a bug where CropNonEmptyMaskIfExists threw an error when it was used with keypoints, even though keypoints were listed as a supported target. (#986 by @GalDude33 )
    • Fixed KeyError with RandomCropNearBBox when it received values with x_min <= 0 or y_min <= 0 (#993 by @Dipet )
  • 1.0.3(Jul 15, 2021)

    • Fixed problem with incorrect shape at keypoints and bboxes processors after ToTensorV2 #963
    • Fixed problems with float values in YOLO format in edge cases #958
  • 1.0.2(Jul 9, 2021)

    1. Fixed a YOLO format conversion problem that occurred when a bbox was larger than the image by 1 pixel. Now a YOLO bbox is converted to the Albumentations format without bbox denormalization. More info in PR: #924
    2. Removed redundant search of first & last dual transform #946
  • 1.0.1(Jul 6, 2021)

    Added position argument to PadIfNeeded (#933 by @yisaienkov)

    Possible values: center, top_left, top_right, bottom_left, bottom_right, with center being the default value.

    One possible use case for this feature is object detection, where you need to pad an image to a square but want the predicted bounding boxes to match the bounding boxes of the unpadded image.
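    To make the effect of the position argument concrete, here is a minimal sketch that computes how much padding goes on each side. The function padding_for_position is hypothetical and is not the library's implementation; it assumes that a position such as top_left names the corner where the original image stays, so all padding is added on the opposite sides.

```python
def padding_for_position(height, width, min_height, min_width, position="center"):
    """Return (top, bottom, left, right) padding in pixels for a given position."""
    pad_h = max(min_height - height, 0)
    pad_w = max(min_width - width, 0)
    if position == "center":
        top, left = pad_h // 2, pad_w // 2
    elif position == "top_left":
        # The image stays in the top-left corner; padding goes to the bottom and right.
        top, left = 0, 0
    elif position == "top_right":
        top, left = 0, pad_w
    elif position == "bottom_left":
        top, left = pad_h, 0
    elif position == "bottom_right":
        top, left = pad_h, pad_w
    else:
        raise ValueError(f"Unknown position: {position}")
    return top, pad_h - top, left, pad_w - left


# Padding a 300x400 (height x width) image to a 512x512 square.
print(padding_for_position(300, 400, 512, 512, "center"))
print(padding_for_position(300, 400, 512, 512, "top_left"))
```

    Under this assumption, top_left leaves the top and left padding at zero, so pixel coordinates of boxes in the padded image are the same as in the unpadded one.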

  • 1.0.0(Jun 1, 2021)

    Breaking changes

    • imgaug dependency is now optional, and by default, Albumentations won't install it. This change was necessary to prevent simultaneous install of both opencv-python-headless and opencv-python (you can read more about the problem in this issue). If you still need imgaug as a dependency, you can use the pip install -U albumentations[imgaug] command to install Albumentations with imgaug.
    • Deprecated augmentation ToTensor that converts NumPy arrays to PyTorch tensors is completely removed from Albumentations. You will get a RuntimeError exception if you try to use it. Please switch to ToTensorV2 in your pipelines.

    New augmentations

    • A.RandomToneCurve. See a notebook for examples of this augmentation (#839 by @aaroswings)
    • SafeRotate. Safely Rotate Images Without Cropping (#888 by @deleomike)
    • SomeOf transform that applies N augmentations from a list. A generalization of OneOf (#889 by @henrique)
    • We are deprecating imgaug transforms and providing Albumentations' implementations for them. (#786 by @KiriLev, #787 by @KiriLev, #790, #843, #844, #849, #885, #892)

    By default, Albumentations doesn't require imgaug as a dependency. But if you need imgaug, you can install it along with Albumentations by running pip install -U albumentations[imgaug].

    Here is a table of deprecated imgaug augmentations and respective augmentations from Albumentations that you should use instead:

    | Old deprecated augmentation | New augmentation |
    |-----------------------------|------------------|
    | IAACropAndPad               | CropAndPad       |
    | IAAFliplr                   | HorizontalFlip   |
    | IAAFlipud                   | VerticalFlip     |
    | IAAEmboss                   | Emboss           |
    | IAASharpen                  | Sharpen          |
    | IAAAdditiveGaussianNoise    | GaussNoise       |
    | IAAPerspective              | Perspective      |
    | IAASuperpixels              | Superpixels      |
    | IAAAffine                   | Affine           |
    | IAAPiecewiseAffine          | PiecewiseAffine  |

    Major changes

    • Serialization logic is updated. Previously, Albumentations used the full classpath to identify an augmentation (e.g. albumentations.augmentations.transforms.RandomCrop). With the updated logic, Albumentations will use only the class name for augmentations defined in the library (e.g., RandomCrop). For custom augmentations created by users and not distributed with Albumentations, the library will continue to use the full classpath to avoid name collisions (e.g., when a user creates a custom augmentation named RandomCrop and uses it in a pipeline).

      This new logic will allow us to refactor the code without breaking serialized augmentation pipelines created using previous versions of Albumentations. This change will also reduce the size of YAML and JSON files with serialized data.

      The new serialization logic is backward compatible. You can load serialized augmentation pipelines created in previous versions of Albumentations because Albumentations supports the old format.
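    The naming rule described above can be sketched in a few lines. Both helper functions and the LIBRARY_PREFIX constant are hypothetical and only illustrate the idea; they are not the library's serialization code.

```python
LIBRARY_PREFIX = "albumentations."  # assumption used only for this sketch


def serialized_name(module_name, class_name):
    """Short name for classes shipped with the library, full classpath otherwise."""
    if module_name.startswith(LIBRARY_PREFIX):
        return class_name
    return f"{module_name}.{class_name}"


def lookup_key(stored_name):
    """Accept both the old full-classpath format and the new short format."""
    if stored_name.startswith(LIBRARY_PREFIX):
        return stored_name.rsplit(".", 1)[-1]
    return stored_name


# A library transform is stored by class name only.
print(serialized_name("albumentations.augmentations.transforms", "RandomCrop"))
# A user-defined transform with the same class name keeps its full classpath.
print(serialized_name("my_project.augs", "RandomCrop"))
# An old-format entry still resolves to the same key as a new-format one.
print(lookup_key("albumentations.augmentations.transforms.RandomCrop"))
```

    Keeping the full classpath for user code is what prevents a custom RandomCrop from colliding with the built-in one.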

    Bugfixes

    • Fixed a bug that prevented A.ReplayCompose from working correctly with bounding boxes and keypoints. (#748)
    • A.GlassBlur now correctly works with float32 inputs (#826)
    • MultiplicativeNoise now correctly works with gray images with shape [h, w, 1]. (#793)

    Minor changes

    • Code for geometric transforms moved to a standalone module albumentations.augmentations.geometric. (#784)
    • Code for crop transforms moved to a standalone module albumentations.augmentations.crops. (#791)
    • CI now runs tests under Python 3.9 as well (#830)
    • Linters and code formatters for CI and pre-commit hooks are updated to the latest versions (#831)
    • Logic in setup.py that detects existing installations of OpenCV now also looks for opencv-contrib-python and opencv-contrib-python-headless (#837 by @agchang-cgl)
  • 0.5.2(Nov 29, 2020)

    Minor changes

    • ToTensorV2 now automatically expands grayscale images with the shape [H, W] to the shape [H, W, 1]. PR #604 by @Ingwar.
    • CropNonEmptyMaskIfExists now also works with multiple masks that are provided by the masks argument to the transform function. Previously this augmentation worked only with a single mask provided by the mask argument. PR #761
  • 0.5.1(Nov 2, 2020)

    Breaking changes

    • The API for A.FDA is changed to resemble the API of A.HistogramMatching. Now, both transformations expect to receive a list of reference images, a function to read those images, and additional augmentation parameters. (#734)
    • A.HistogramMatching now uses read_rgb_image as a default read_fn. This function reads an image from the disk as an RGB NumPy array. Previously, the default read_fn was cv2.imread, which reads an image as a BGR NumPy array. (#734)

    New transformations

    • A.Sequential transform that can apply augmentations in a sequence. This transform is not intended to be a replacement for A.Compose. Instead, it should be used inside A.Compose the same way as A.OneOf or A.OneOrOther. For instance, you can combine A.OneOf with A.Sequential to create an augmentation pipeline containing multiple sequences of augmentations and apply one randomly chosen sequence to input data. (#735)
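    The combination of "one of several sequences" can be illustrated without the library. The helpers sequential and one_of below are hypothetical plain-Python stand-ins, not A.Sequential and A.OneOf; they only show the control flow on a number instead of an image.

```python
import random


def sequential(*steps):
    """Apply every step in order."""
    def apply(value):
        for step in steps:
            value = step(value)
        return value
    return apply


def one_of(*choices, rng=random):
    """Apply exactly one randomly chosen callable."""
    def apply(value):
        return rng.choice(choices)(value)
    return apply


def add_one(value):
    return value + 1


def double(value):
    return value * 2


# Two alternative sequences; exactly one of them is applied per call.
pipeline = one_of(sequential(add_one, double), sequential(double, add_one))
print(pipeline(3))  # (3 + 1) * 2 or 3 * 2 + 1, depending on the random choice
```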

    Minor changes

    • A.ShiftScaleRotate now has two additional optional parameters: shift_limit_x and shift_limit_y. If either of those parameters (or both of them) is set A.ShiftScaleRotate will use the set values to shift images on the respective axis. (#735)
    • A.ToTensorV2 now supports an additional argument transpose_mask (False by default). If the argument is set to True and an input mask has 3 dimensions, A.ToTensorV2 will transpose dimensions of a mask tensor in addition to transposing dimensions of an image tensor. (#735)

    Bugfixes

    • A.FDA now correctly uses coordinates of the center of an image. (#730)
    • Fixed problems with grayscale images for A.HistogramMatching. (#734)
    • Fixed a bug that led to an exception when A.load() was called to deserialize a pipeline that contained A.ToTensor or A.ToTensorV2, but those transforms were not imported in the code before the call. (#735)
  • 0.5.0(Oct 19, 2020)

    Breaking changes

    • Albumentations now explicitly checks that all inputs to augmentations are named arguments and raises an exception otherwise. So if an augmentation receives input like aug(image) instead of aug(image=image), Albumentations will raise an exception. (#560)
    • Dropped support of Python 3.5 (#709)
    • Keypoints and bboxes are checked for visibility after each transform (#566)
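    The named-argument check from the first item above can be sketched as follows. NamedOnlyTransform is a hypothetical class, and the exception type and message are assumptions made for this sketch rather than what the library raises.

```python
class NamedOnlyTransform:
    """Hypothetical transform that accepts its inputs only as keyword arguments."""

    def __call__(self, *args, **data):
        if args:
            # The exception type is an assumption made for this sketch.
            raise TypeError("Pass data as named arguments, for example aug(image=image)")
        return data


aug = NamedOnlyTransform()
result = aug(image=[[0, 1], [2, 3]])  # accepted

try:
    aug([[0, 1], [2, 3]])  # positional input is rejected
except TypeError as err:
    print(err)
```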

    New transformations

    • A.FDA transform for Fourier-based domain adaptation. (#685)
    • A.HistogramMatching transform that applies histogram matching. (#708)
    • A.ColorJitter transform that behaves similarly to ColorJitter from torchvision (though there are some minor differences due to different internal logic for working with HSV colorspace in Pillow, which is used in torchvision and OpenCV, which is used in Albumentations). (#705)

    Minor changes

    • A.PadIfNeeded now accepts additional pad_width_divisor and pad_height_divisor arguments (None by default) to ensure that the image width and height are divisible by the given values. (#700)
    • Added support to apply A.CoarseDropout to masks via mask_fill_value. (#699)
    • A.GaussianBlur now supports the sigma parameter that sets standard deviation for Gaussian kernel. (#674, #673) .
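    For the divisor-based padding mentioned in the first item above, the target size is the smallest multiple of the divisor that is not smaller than the current size. The helper padded_size is hypothetical and only shows that arithmetic.

```python
def padded_size(size, divisor):
    """Smallest multiple of divisor that is greater than or equal to size."""
    remainder = size % divisor
    return size if remainder == 0 else size + divisor - remainder


# A 333x500 image padded so that both sides are divisible by 32.
print(padded_size(333, 32), padded_size(500, 32))
```

    This kind of padding is commonly needed for networks that downsample the input several times and therefore expect sides divisible by a power of two.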

    Bugfixes

    • Fixed bugs in A.HueSaturationValue for float dtype. (#696, #710)
    • Fixed incorrect rounding error on bboxes in YOLO format. (#688)
  • 0.4.6(Jul 19, 2020)

    Improvements

    • Change the ImgAug dependency version from "imgaug>=0.2.5,<0.2.7" to "imgaug>=0.4.0". Now Albumentations won't downgrade your existing ImgAug installation to the old version. PR #658.
    • Do not try to resize an image if it already has the required height and width. That eliminates the redundant call to the OpenCV function that requires additional copying of the input data. PR #639.
    • ReplayCompose is now serializable. PR #623 by IlyaOvodov
    • Documentation fixes and updates.

    Bug Fixes

    • Fix a bug that causes some keypoints and bounding boxes to lie outside the visible part of the augmented image if an augmentation pipeline contained augmentations that increase the height and width of an image (such as PadIfNeeded). That happened because Albumentations checked which bounding boxes and keypoints lie outside the image only after applying all augmentations. Now Albumentations will check and remove keypoints and bounding boxes that lie outside the image after each augmentation. If, for some reason, you need the old behavior, pass check_each_transform=False in your KeypointParams or BboxParams. Issue #565 and PR #566.
    • Fix a bug that causes an exception when Albumentations received images with the number of color channels that are even but are not multiples of 4 (such as 6, 10, etc.). PR #638.
    • Fix the off-by-one error in applying steps for GridDistortion. Commit 9c225a99a379594098dbea2a077fd22da684ade9
    • Fix bugs that prevent serialization of ImageCompression and GaussNoise. PR #569
    • Fix a bug that causes errors with some values for label_fields in BboxParams. PR #504 by IlyaOvodov
    • Fix a bug that prevents HueSaturationValue from working with grayscale images. PR #500.
  • 0.4.0(Oct 14, 2019)

    New transforms

    ISONoise

    https://github.com/albu/albumentations/commit/2e25667f8c39eba3e6be0e85719e5156422ee9a9 Target: image

    This transform mimics the noise that images will have if the ISO parameter of the camera is high. Wiki

    Solarize

    https://github.com/albu/albumentations/commit/e365b52df6c6535a1bf06733b607915231f2f9d4 Targets: image

    Solarize inverts all pixels above some threshold. It is an essential part of the work AutoAugment: Learning Augmentation Policies from Data.
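    A minimal sketch of the operation on an 8-bit grayscale image stored as nested lists. The function below is an illustration, not the library's code, and it assumes that values equal to the threshold are inverted as well.

```python
def solarize(pixels, threshold=128, max_value=255):
    """Invert every pixel value that is at or above the threshold."""
    return [
        [value if value < threshold else max_value - value for value in row]
        for row in pixels
    ]


print(solarize([[0, 100, 128, 255]]))
```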

    Equalize

    https://github.com/albu/albumentations/commit/9f71038c95c4124bdaf3ee13a9823225bb8d85da Target: image

    Equalizes image histogram. It is an essential part of the work AutoAugment: Learning Augmentation Policies from Data.

    Posterize

    https://github.com/albu/albumentations/commit/ad95fa005fd5325deb73461bfb6e543fca342f45 Target: image

    Reduce the number of bits for each pixel. It is an essential part of the work AutoAugment: Learning Augmentation Policies from Data.
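    Bit reduction can be sketched by masking out the low-order bits of each 8-bit value. This is an illustration on nested lists, not the library's implementation.

```python
def posterize(pixels, bits):
    """Keep only the `bits` most significant bits of each 8-bit pixel value."""
    if not 0 <= bits <= 8:
        raise ValueError("bits must be between 0 and 8")
    mask = (0xFF << (8 - bits)) & 0xFF
    return [[value & mask for value in row] for row in pixels]


print(posterize([[255, 130, 37]], bits=3))
```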

    ImageCompression

    https://github.com/albu/albumentations/commit/b6127864d45cfa5b5299578d309680baa0ce7aa3 Target: image

    Decreases image quality by applying JPEG or WebP compression.

    Downscale

    https://github.com/albu/albumentations/commit/df831d6605140e7aa013deab6012d85af9854be3 Target: image

    Decreases image quality by downscaling and upscaling back.

    RandomResizedCrop

    https://github.com/albu/albumentations/commit/4dbe41e8795c7b7d48e0cc4501efe8046e21765b Targets: image, mask, bboxes, keypoints

    Crop the given Image to the random size and aspect ratio. This transform is an essential part of many image classification pipelines. Very popular for ImageNet classification.

    It has the same API as RandomResizedCrop in torchvision.

    RandomGridShuffle

    https://github.com/albu/albumentations/commit/4cf6c36bc2332729d91e44f58f18f44b66db3c6f Targets: image, mask

    Partition an image into tiles. Shuffle them and merge back.

    CropNonEmptyMaskIfExists

    Targets: image, mask, bboxes, keypoints

    Crop area with a mask if the mask is non-empty, else make a random crop.

    ToTensorV2

    https://github.com/albu/albumentations/commit/a5026800d84c6c1998f224b86dedbf3f005ae994 Targets: image, mask

    Convert image and mask to torch.Tensor

    New features

    Added YOLO format to bounding boxes.

    https://github.com/albu/albumentations/commit/d05db9e9aae6b7607c33c4cdce69be011c2f8802

    The YOLO format of a bounding box is [x_center, y_center, width, height], where the values are normalized to the size of the image. Example: [0.3, 0.1, 0.05, 0.07]
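    The normalization can be sketched with a small conversion from pixel corner coordinates. The function below is a hypothetical helper, and it assumes the usual YOLO convention in which x and y denote the center of the box.

```python
def pascal_voc_to_yolo(bbox, image_width, image_height):
    """Convert [x_min, y_min, x_max, y_max] in pixels to normalized YOLO values."""
    x_min, y_min, x_max, y_max = bbox
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return [x_center, y_center, width, height]


# A 200x100 box centered in a 400x200 image.
print(pascal_voc_to_yolo([100, 50, 300, 150], image_width=400, image_height=200))
```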

    Added Deterministic / Replay mode

    https://github.com/albu/albumentations/commit/9942689f9846c59006c80718ee8db38e02ee2104

    An augmentation pipeline involves a lot of randomness, which makes it hard to debug. We added a Deterministic / Replay mode in which you can track which parameters were applied to an input and apply precisely the same transform to another input if necessary.

    Jupyter notebook with an example.
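    The idea behind replaying can be sketched with a toy transform that returns the parameters it sampled. RecordingBrightness is hypothetical and works on a single number; it is not the library's replay API, only an illustration of recording and reusing random parameters.

```python
import random


class RecordingBrightness:
    """Toy transform that records the random parameter it sampled."""

    def __init__(self, limit=0.2):
        self.limit = limit

    def __call__(self, value, params=None):
        if params is None:
            # Normal mode: sample fresh random parameters.
            params = {"shift": random.uniform(-self.limit, self.limit)}
        # Replay mode reuses the given parameters instead of sampling.
        return value + params["shift"], params


aug = RecordingBrightness()
first, replay = aug(0.5)              # random shift, recorded in `replay`
second, _ = aug(0.8, params=replay)   # exactly the same shift applied again
print(replay, first, second)
```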

    Added fill_value to the Cutout transform.

    https://github.com/albu/albumentations/commit/d85bab59eb8ccb0a2fec86750f94173e18e86395

    Separated fill_value for images and masks

    https://github.com/albu/albumentations/commit/2c1a1485f690b4e8ead50f5bb29d3838fbbc177d

    One use case is to set mask_value equal to the ignore_index of your loss. This will decrease the level of noise and may improve convergence.

    Speedup in the RGBShift

    https://github.com/albu/albumentations/commit/c3cc277f37b172bebf7177c779a7cf3cdf7120d3

    3.2 times faster for uint8 images.

    Speedup in HueSaturationValue

    https://github.com/albu/albumentations/commit/448761df9a008384cf914343f25e3cfb7c4d7551

    2 times faster for uint8 images.

    Speedup in RandomBrightnessContrast

    https://github.com/albu/albumentations/commit/4e12c6ec3e55cf79cf242a09c5cdc813bcfc6401

    2.7 times faster for uint8 images.

    Speedup in RandomGamma

    https://github.com/albu/albumentations/commit/ac499d0365bfb2494cb535e82591fc3460d4595a

    4 times faster for uint8 images.

    Added support for images and masks with more than 3 channels

    https://github.com/albu/albumentations/commit/c028a9557cc960da11720a0a505a19cdd4fe0b24

    Added key points support

    https://github.com/albu/albumentations/commit/30a3f3024dc34597307c466a6307e2e6d27e9d3e

    Not all spatial transforms have keypoint support yet. In this release we added Crop, CropNonEmptyMaskIfExists, LongestMaxSize, RandomCropNearBBox, Resize, SmallestMaxSize, and Transpose.

    Add per channel transform composition https://github.com/albu/albumentations/commit/7fb635c66acd5e6c3e9ca50a37a9496956644f36

    Bug Fixes

    • Bugfix in the GaussNoise https://github.com/albu/albumentations/commit/1bc367f54be07fed0fc0ef39d718dc040b7927d4
    • Bugfix in the RandomGamma https://github.com/albu/albumentations/commit/389d31ab333a9681413ab3eddef8c2a41dfe73df
    • Bugfix in the RandomSizedBBoxSafeCrop https://github.com/albu/albumentations/commit/9db2a74bfcd1ed38a7b5430ff4f43c1a30346f6f

    Documentation Updated

    Added a page that lists pre-prints and papers that cite albumentations

    We are delighted that albumentations is helpful to the academic community. We extended the documentation with a page that lists all papers and preprints that cite albumentations in their work. This page is automatically generated by parsing Google Scholar. At this moment, this number is 24.

    Added a page that lists competitions in which top teams used albumentations.

    We are delighted that albumentations helps people get top results in machine learning competitions at Kaggle and other platforms. We added a "Hall of Fame" where people can share their achievements. This page is manually created. We encourage people to add more information about their results with pull requests, following the contributing guide.

    People that made this release happen

    @albu @Dipet @creafz @BloodAxe @ternaus @vfdev-5 @arsenyinfo @qubvel @toshiks @Jae-Hyuck @BelBES @alekseynp @timeous @jveitchmichaelis @bfialkoff

  • 0.3.0(Jun 26, 2019)

    Added serialization / deserialization

    • Now we can define transformations in a python dictionary, json, yaml files and they will be deserialized and used in the code.
    • Now we can define transformations in the code and serialize them in python dictionary, json and yaml files.

    Jupyter notebook with an example

    Special thanks to @creafz

    Added new transformations

    Special thanks to @vfdev-5 @ternaus @BloodAxe @kirillbobyrev

    Bugfixes and improvements

    Special thanks to @qubvel @ternaus @albu @BloodAxe

  • 0.2.0(Mar 4, 2019)

    Added support for the keypoint transformations to

    Notebook with an example

    Special thanks to Evegene Khvedchenya (@BloodAxe) for the work.

    Added an option to apply the same transformation to more than one target of the same type.

    Possible use cases are image2image or stereo-image pipelines.

    Notebook with an example

    Special thanks to Alexander Buslaev (@albu) for the work.

    Added new transformations

    Speed up in

    Bug fixes

    And many others.

    Additional

    • Performance benchmark was extended to the Augmentor and Solt libraries.
    • Added table to Readme that shows all implemented transformations with the set of possible targets: images, bounding boxes, masks, key points. (Special thanks to Alex Parinov @creafz )
    • The library can be installed in anaconda.

    Contributors

    @BloodAxe @albu @creafz @ternaus @erikgaas @marcocaccin @libfun @DBusAI @alexobednikov @StrikerRUS @IlyaOvodov @ZFTurbo @Vcv85 @georgymironov @LinaShiryaeva @vfdev-5 @daisukelab @cdicle

  • v0.1.1(Sep 26, 2018)

    Bounding boxes support

    Transformations that support bounding boxes

    The main change in this release is the addition of the operations on bounding boxes to the

    Supported formats

    The following bounding box formats are currently supported:

    1. COCO: [x_min, y_min, width, height], ex [97, 12, 150, 200]
    2. Pascal VOC: [x_min, y_min, x_max, y_max], ex [97, 12, 247, 212]
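    The two examples above describe the same box, which a short conversion makes explicit. The helper names are hypothetical and are not part of the library.

```python
def coco_to_pascal_voc(bbox):
    """[x_min, y_min, width, height] -> [x_min, y_min, x_max, y_max]."""
    x_min, y_min, width, height = bbox
    return [x_min, y_min, x_min + width, y_min + height]


def pascal_voc_to_coco(bbox):
    """[x_min, y_min, x_max, y_max] -> [x_min, y_min, width, height]."""
    x_min, y_min, x_max, y_max = bbox
    return [x_min, y_min, x_max - x_min, y_max - y_min]


print(coco_to_pascal_voc([97, 12, 150, 200]))
print(pascal_voc_to_coco([97, 12, 247, 212]))
```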

    Bounding box filtering

    After a transformation, a large part of a bounding box may be cropped away, and such boxes need to be excluded.

    We support bounding box filtering based on:

    • Bounding box area, measured in pixels.
    • Visible box area, measured in percent.
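    Both criteria can be sketched for the case of a crop. The functions below are hypothetical and only illustrate the idea; boxes and the crop window are [x_min, y_min, x_max, y_max] in pixels, and visibility is expressed as a fraction between 0 and 1 rather than in percent.

```python
def area(bbox):
    """Area of a box in pixels; degenerate boxes have zero area."""
    return max(bbox[2] - bbox[0], 0) * max(bbox[3] - bbox[1], 0)


def clip_to_crop(bbox, crop):
    """Intersect a box with the crop window."""
    return [
        max(bbox[0], crop[0]),
        max(bbox[1], crop[1]),
        min(bbox[2], crop[2]),
        min(bbox[3], crop[3]),
    ]


def filter_bboxes(bboxes, crop, min_area=0, min_visibility=0.0):
    """Keep clipped boxes that are large enough and visible enough."""
    kept = []
    for bbox in bboxes:
        original_area = area(bbox)
        clipped = clip_to_crop(bbox, crop)
        visible_area = area(clipped)
        if original_area == 0 or visible_area == 0:
            continue  # nothing of the box is left inside the crop
        if visible_area < min_area:
            continue  # too small in absolute terms
        if visible_area / original_area < min_visibility:
            continue  # too much of the box was cropped away
        kept.append(clipped)
    return kept


boxes = [[0, 0, 100, 100], [80, 80, 120, 120]]
# The second box keeps only a quarter of its area inside the crop and is dropped.
print(filter_bboxes(boxes, crop=[0, 0, 100, 100], min_visibility=0.5))
```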

    Smaller changes

    • Added support for 8-bit images.
    • We changed all np.random occurrences to random due to the numpy behavior reported in https://github.com/pytorch/pytorch/issues/5059
    • Multiple bugfixes.

    Added notebooks with examples

Run tesseract with the tesserocr bindings with @OCR-D's interfaces

ocrd_tesserocr Crop, deskew, segment into regions / tables / lines / words, or recognize with tesserocr Introduction This package offers OCR-D complia

OCR-D 38 Oct 14, 2022
(CVPR 2021) ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection

ST3D Code release for the paper ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection, CVPR 2021 Authors: Jihan Yang*, Shaoshu

CVMI Lab 224 Dec 28, 2022
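Lightweight formula OCR tool: recognize all kinds of formula images in one click and convert them to LaTeX

QC-Formula | 青尘公式 OCR Introduction: a lightweight open-source formula OCR tool that recognizes formula images in one click and converts them to LaTeX. Supports importing formula images from the local computer (later versions will support importing images directly from web pages); formula images may be .png / .jpg / .bmp, up to 4 MB in size; supports both printed and handwritten formulas,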

青尘工作室 26 Jan 07, 2023
question‘s area recognition using image processing and regular expression

======================================== Paper-Question-recognition ======================================== question‘s area recognition using image p

Yuta Mizuki 7 Dec 27, 2021
This tool will help you convert your text to handwriting xD

So your teacher asked you to upload written assignments? Hate writing assigments? This tool will help you convert your text to handwriting xD

Saurabh Daware 4.2k Jan 07, 2023
Convert scans of handwritten notes to beautiful, compact PDFs

Convert scans of handwritten notes to beautiful, compact PDFs

Matt Zucker 4.8k Jan 01, 2023
A pure pytorch implemented ocr project including text detection and recognition

ocr.pytorch A pure pytorch implemented ocr project. Text detection is based CTPN and text recognition is based CRNN. More detection and recognition me

coura 444 Dec 30, 2022
Multi-choice answer sheet correction system using computer vision with opencv & python.

Multi choice answer correction 🔴 5 answer sheet samples with a specific solution for detecting answers and sheet correction. 🔴 By running the soluti

Reza Firouzi 7 Mar 07, 2022
Recognizing cropped text in natural images.

ASTER: Attentional Scene Text Recognizer with Flexible Rectification ASTER is an accurate scene text recognizer with flexible rectification mechanism.

Baoguang Shi 681 Jan 02, 2023
OpenCV-Erlang/Elixir bindings

evision [WIP] : OS : arch Build Status Ubuntu 20.04 arm64 Ubuntu 20.04 armv7 Ubuntu 20.04 s390x Ubuntu 20.04 ppc64le Ubuntu 20.04 x86_64 macOS 11 Big

Cocoa 194 Jan 05, 2023
a Deep Learning Framework for Text

DeLFT DeLFT (Deep Learning Framework for Text) is a Keras and TensorFlow framework for text processing, focusing on sequence labelling (e.g. named ent

Patrice Lopez 350 Dec 19, 2022
A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.

LAREX LAREX is a semi-automatic open-source tool for layout analysis on early printed books. It uses a rule based connected components approach which

162 Jan 05, 2023
A version of nrsc5-gui that merges the interface developed by cmnybo with the architecture developed by zefie in order to start a new baseline that is not heavily dependent upon Python processing.

NRSC5-DUI is a graphical interface for nrsc5. It makes it easy to play your favorite FM HD radio stations using an RTL-SDR dongle. It will also displa

61 Dec 22, 2022
Deep LearningImage Captcha 2

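Deep-learning recognition of slider CAPTCHAs. This project uses the deep-learning YOLOv3 model to locate the gap in slider CAPTCHAs and is a modified version of https://github.com/eriklindernoren/PyTorch-YOLOv3. A few hundred gap-annotated images are enough to train a high-accuracy recognition model. Sample recognition results: Clone the project. Run the command: git cl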

Python3WebSpider 117 Dec 28, 2022
A fastai/PyTorch package for unpaired image-to-image translation.

Unpaired image-to-image translation A fastai/PyTorch package for unpaired image-to-image translation currently with CycleGAN implementation. This is a

Tanishq Abraham 120 Dec 02, 2022
huoyijie 1.2k Dec 29, 2022
Implement 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight'

SSTDNet Implement 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight' using pytorch. This code is work for general object detecti

HotaekHan 84 Jan 05, 2022
The first open-source library that detects the font of a text in a image.

Typefont Typefont is an experimental library that detects the font of a text in a image. Usage Import the main function and invoke it like in the foll

Vasile Pește 1.6k Feb 24, 2022
caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection Abstract This is a caffe re-implementation of R2CNN: Rotational Region CNN fo

candler 80 Dec 28, 2021
It is a image ocr tool using the Tesseract-OCR engine with the pytesseract package and has a GUI.

OCR-Tool It is a image ocr tool made in Python using the Tesseract-OCR engine with the pytesseract package and has a GUI. This is my second ever pytho

Khant Htet Aung 4 Jul 11, 2022