Build fully-functioning computer vision models with PyTorch

Overview

Detecto is a Python package that allows you to build fully-functioning computer vision and object detection models with just 5 lines of code. Inference on still images and videos, transfer learning on custom datasets, and serialization of models to files are just a few of Detecto's features. Detecto is also built on top of PyTorch, allowing an easy transfer of models between the two libraries.

The table below shows a few examples of Detecto's performance:

[Table of images: "Detecto still image" (Still Image) and "Video demo of Detecto" (Video)]

Installation

To install Detecto using pip, run the following command:

pip3 install detecto

Installing with pip should download all of Detecto's dependencies automatically. However, if an issue arises, you can manually download the dependencies listed in the requirements.txt file.

Usage

The power of Detecto comes from its simplicity and ease of use. Creating and running a pre-trained Faster R-CNN ResNet-50 FPN from PyTorch's model zoo takes 4 lines of code:

from detecto.core import Model
from detecto.visualize import detect_video

model = Model()  # Initialize a pre-trained model
detect_video(model, 'input_video.mp4', 'output.avi')  # Run inference on a video

Below are several more examples of things you can do with Detecto:

Transfer Learning on Custom Datasets

Most of the time, you want a computer vision model that can detect custom objects. With Detecto, you can train a model on a custom dataset with 5 lines of code:

from detecto.core import Model, Dataset

dataset = Dataset('custom_dataset/')  # Load images and label data from the custom_dataset/ folder

model = Model(['dog', 'cat', 'rabbit'])  # Train to predict dogs, cats, and rabbits
model.fit(dataset)

model.predict(...)  # Start using your trained model!

Inference and Visualization

When using a model for inference, Detecto returns predictions in an easy-to-use format and provides several visualization tools:

from detecto.core import Model
from detecto import utils, visualize

model = Model()

image = utils.read_image('image.jpg')  # Helper function to read in images

labels, boxes, scores = model.predict(image)  # Get all predictions on an image
predictions = model.predict_top(image)  # Same as above, but returns only the top predictions

print(labels, boxes, scores)
print(predictions)

visualize.show_labeled_image(image, boxes, labels)  # Plot predictions on a single image

images = [...]
visualize.plot_prediction_grid(model, images)  # Plot predictions on a list of images

visualize.detect_video(model, 'input_video.mp4', 'output.avi')  # Run inference on a video
visualize.detect_live(model)  # Run inference on a live webcam
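
To make the difference between the full predictions and the "top predictions" concrete, here is a minimal sketch that keeps only the highest-scoring detection for each label. It assumes that "top" means one detection per label, which is how the v1.1.1 release notes describe the older behavior; the helper name top_per_label and the sample detections are made up for illustration, and plain lists stand in for the tensors that model.predict returns. It is not Detecto's own code.

```python
def top_per_label(labels, boxes, scores):
    """Keep only the highest-scoring detection for each distinct label."""
    best = {}  # label -> index of its best detection seen so far
    for i, (label, score) in enumerate(zip(labels, scores)):
        if label not in best or score > scores[best[label]]:
            best[label] = i
    keep = sorted(best.values())  # preserve the original ordering
    return ([labels[i] for i in keep],
            [boxes[i] for i in keep],
            [scores[i] for i in keep])


# Made-up detections standing in for the output of model.predict(image)
labels = ['dog', 'cat', 'dog']
boxes = [[10, 10, 50, 50], [60, 20, 90, 80], [12, 8, 48, 52]]
scores = [0.70, 0.95, 0.88]

top_labels, top_boxes, top_scores = top_per_label(labels, boxes, scores)
print(top_labels, top_scores)
```

With the sample data above, the lower-scoring of the two 'dog' detections is dropped and the 'cat' detection is kept unchanged.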

Advanced Usage

If you want more control over how you train your model, Detecto lets you do just that:

from detecto import core, utils
from torchvision import transforms
import matplotlib.pyplot as plt

# Convert XML files to CSV format
utils.xml_to_csv('training_labels/', 'train_labels.csv')
utils.xml_to_csv('validation_labels/', 'val_labels.csv')

# Define custom transforms to apply to your dataset
custom_transforms = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(800),
    transforms.ColorJitter(saturation=0.3),
    transforms.ToTensor(),
    utils.normalize_transform(),
])

# Pass in a CSV file instead of XML files for faster Dataset initialization speeds
dataset = core.Dataset('train_labels.csv', 'images/', transform=custom_transforms)
val_dataset = core.Dataset('val_labels.csv', 'val_images')  # Validation dataset for training

# Create your own DataLoader with custom options
loader = core.DataLoader(dataset, batch_size=2, shuffle=True) 

model = core.Model(['car', 'truck', 'boat', 'plane'])
losses = model.fit(loader, val_dataset, epochs=15, learning_rate=0.001, verbose=True)

plt.plot(losses)  # Visualize loss throughout training
plt.show()

model.save('model_weights.pth')  # Save model to a file

# Directly access underlying torchvision model for even more control
torch_model = model.get_internal_model()
print(type(torch_model))

For more examples, visit the docs, which include a quickstart tutorial.

Alternatively, check out the demo on Colab.

API Documentation

The full API documentation can be found at detecto.readthedocs.io. The docs are split into three sections, each corresponding to one of Detecto's modules:

Core

The detecto.core module contains the central classes of the package: Dataset, DataLoader, and Model. These are used to read in a labeled dataset and train a functioning object detection model.

Utils

The detecto.utils module contains a variety of useful helper functions. With it, you can read in images, convert XML files into CSV files, apply standard transforms to images, and more.
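
As a rough illustration of what converting XML labels to CSV involves, the sketch below flattens one Pascal VOC-style annotation into one CSV row per labeled object, using only the standard library. It is not Detecto's implementation: the function name voc_to_rows, the sample annotation, and the column names are assumptions chosen for this example, and utils.xml_to_csv may lay out its columns differently.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC style (one file per image)
VOC_XML = """<annotation>
  <filename>image0.jpg</filename>
  <object>
    <name>dog</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>cat</name>
    <bndbox><xmin>300</xmin><ymin>40</ymin><xmax>400</xmax><ymax>140</ymax></bndbox>
  </object>
</annotation>"""


def voc_to_rows(xml_text):
    """Return one dict per labeled object in a single annotation."""
    root = ET.fromstring(xml_text)
    filename = root.findtext('filename')
    rows = []
    for obj in root.iter('object'):
        box = obj.find('bndbox')
        rows.append({
            'filename': filename,
            'class': obj.findtext('name'),
            'xmin': int(box.findtext('xmin')),
            'ymin': int(box.findtext('ymin')),
            'xmax': int(box.findtext('xmax')),
            'ymax': int(box.findtext('ymax')),
        })
    return rows


rows = voc_to_rows(VOC_XML)

# Write the rows as CSV text (an in-memory buffer stands in for a file)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Parsing every XML file once and reusing the resulting CSV is what makes the CSV route faster for repeated Dataset initialization.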

Visualize

The detecto.visualize module is used to display labeled images, plot predictions, and run object detection on videos.

Contributing

All issues and pull requests are welcome! To run the code locally, first fork the repository and then run the following commands on your computer:

git clone https://github.com/<your-username>/detecto.git
cd detecto
# Recommended to create a virtual environment before the next step
pip3 install -r requirements.txt

When adding code, be sure to write unit tests and docstrings where necessary.

Tests are located in detecto/tests and can be run using pytest:

python3 -m pytest

To generate the documentation locally, run the following commands:

cd docs
make html

The documentation can then be viewed at docs/_build/html/index.html.

Contact

Detecto was created by Alan Bi. Feel free to reach out on Twitter or through email!

Comments
  • Can't do model.fit(dataset)

    Actually I got the following errors when I was trying this code on Google Colab.

    from detecto import core, utils, visualize
    
    dataset = core.Dataset('images/')
    model = core.Model(['dog', 'cat'])
    
    model.fit(dataset)
    

    This is the error I got in the Google Colab terminal. Please help me, anyone

    /usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:2854: UserWarning: The default behavior for interpolate/upsample with float scale_factor will change in 1.6.0 to align with other frameworks/libraries, and use scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. 
      warnings.warn("The default behavior for interpolate/upsample with float scale_factor will change "
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    <ipython-input-9-11e9a74e8844> in <module>()
          4 model = core.Model(['dog', 'cat'])
          5 
    ----> 6 model.fit(dataset)
    
    /usr/local/lib/python3.6/dist-packages/detecto/core.py in fit(self, dataset, val_dataset, epochs, learning_rate, momentum, weight_decay, gamma, lr_step_size, verbose)
        467             # Training step
        468             self._model.train()
    --> 469             for images, targets in dataset:
        470                 self._convert_to_int_labels(targets)
        471                 images, targets = self._to_device(images, targets)
    
    /usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in __next__(self)
        343 
        344     def __next__(self):
    --> 345         data = self._next_data()
        346         self._num_yielded += 1
        347         if self._dataset_kind == _DatasetKind.Iterable and \
    
    /usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
        383     def _next_data(self):
        384         index = self._next_index()  # may raise StopIteration
    --> 385         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
        386         if self._pin_memory:
        387             data = _utils.pin_memory.pin_memory(data)
    
    /usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
         42     def fetch(self, possibly_batched_index):
         43         if self.auto_collation:
    ---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
         45         else:
         46             data = self.dataset[possibly_batched_index]
    
    /usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py in <listcomp>(.0)
         42     def fetch(self, possibly_batched_index):
         43         if self.auto_collation:
    ---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
         45         else:
         46             data = self.dataset[possibly_batched_index]
    
    /usr/local/lib/python3.6/dist-packages/detecto/core.py in __getitem__(self, idx)
        196                         box[0, 0], box[0, 2] = box[0, (2, 0)]
        197                 else:
    --> 198                     image = t(image)
        199 
        200             # Scale down box if necessary
    
    /usr/local/lib/python3.6/dist-packages/torchvision/transforms/transforms.py in __call__(self, tensor)
        164             Tensor: Normalized Tensor image.
        165         """
    --> 166         return F.normalize(tensor, self.mean, self.std, self.inplace)
        167 
        168     def __repr__(self):
    
    /usr/local/lib/python3.6/dist-packages/torchvision/transforms/functional.py in normalize(tensor, mean, std, inplace)
        206     if std.ndim == 1:
        207         std = std[:, None, None]
    --> 208     tensor.sub_(mean).div_(std)
        209     return tensor
        210 
    
    RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0
    
    opened by karthick965938 16
  • Runtime error during model.predict() call

    Hi all, I have trained a detecto model to detect rectangles and squares in image. After training, when I call model.predict(image), the following runtime error occurs. Any guidance to get rid of this. Thanks I am using: python 3.6 torch: 1.5.0+cu101

    here is the call traceback

    3 image = utils.read_image('/color_matching/sample_photo/img.jpg')
    ----> 4 predictions = model.predict_top([image])
    

    /usr/local/lib/python3.6/dist-packages/torchvision/models/detection/_utils.py in decode(self, rel_codes, boxes)

    183         box_sum += val
    184         pred_boxes = self.decode_single(
    -->185             rel_codes.reshape(box_sum, -1), concat_boxes)
    186 
    187         return pred_boxes.reshape(box_sum, -1, 4)
    

    RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous.

        *:  The image has size(500, 500)
    
    opened by MrAhmadRaza 14
  • !_src.empty() in function 'cv::cvtColor'

    Describe the bug During training at model.fit(dataset) OpenCV gives an error cv2.error: OpenCV(4.3.0) C:\projects\opencv-python\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'

    Code and Data

    from detecto import core, utils, visualize
    import matplotlib.pyplot as plt
    import os
    
    IMAGE_PATH = os.path.abspath(os.path.join(os.path.dirname( __file__ ), 'images'))
    LABEL_PATH = os.path.abspath(os.path.join(os.path.dirname( __file__ ), 'labels'))
    
    
    '''image = utils.read_image(os.path.join(IMAGE_PATH, 'image0.jpg'))
    plt.imshow(image)
    plt.show()'''
    
    
    
    # Images and XML files in separate folders
    dataset = core.Dataset(LABEL_PATH, IMAGE_PATH)
    
    image, target = dataset[0]
    print(image, target)
    
    model = core.Model(['bat', 'batter', 'pitch', 'field', 'player', 'scoreboard', 'stumps'])
    
    model.fit(dataset)
    
    
    # Specify the path to your image
    image = utils.read_image(os.path.join(IMAGE_PATH, 'image-3361.jpg'))
    predictions = model.predict(image)
    
    # predictions format: (labels, boxes, scores)
    labels, boxes, scores = predictions
    
    print(labels) 
    print(boxes)
    print(scores)
    

    It prints the image and target so we know it's reading in the dataset:

    tensor([[[-0.9705, -0.6794, -0.5596,  ..., -0.3541, -0.9534, -2.1008],
             [-0.9705, -0.6794, -0.5596,  ..., -0.3712, -0.9534, -2.1008],
             [-0.9705, -0.6794, -0.5596,  ..., -0.3541, -0.9363, -2.1008],
             ...,
             [-0.9192, -0.3369, -0.5253,  ..., -0.0801, -0.7650, -2.1008],
             [-0.9192, -0.3369, -0.5253,  ..., -0.0801, -0.7650, -2.1008],
             [-0.9192, -0.3369, -0.5253,  ..., -0.0801, -0.7650, -2.1008]],
    
            [[-0.3725, -0.0749,  0.0476,  ...,  0.1877, -0.4601, -1.6506],
             [-0.3725, -0.0749,  0.0476,  ...,  0.1702, -0.4601, -1.6506],
             [-0.3725, -0.0749,  0.0476,  ...,  0.1877, -0.4426, -1.6506],
             ...,
             [-0.3375,  0.2577,  0.0651,  ...,  0.3627, -0.3725, -1.7731],
             [-0.3375,  0.2577,  0.0651,  ...,  0.3627, -0.3725, -1.7731],
             [-0.3375,  0.2577,  0.0651,  ...,  0.3627, -0.3725, -1.7731]],
    
            [[-1.2119, -0.9156, -0.7936,  ..., -0.4275, -0.8633, -1.8044],
             [-1.2119, -0.9156, -0.7936,  ..., -0.4450, -0.8633, -1.8044],
             [-1.2119, -0.9156, -0.7936,  ..., -0.4275, -0.8458, -1.8044],
             ...,
             [-1.3513, -0.7587, -0.9504,  ..., -0.3230, -0.7761, -1.8044],
             [-1.3513, -0.7587, -0.9504,  ..., -0.3230, -0.7761, -1.8044],
             [-1.3513, -0.7587, -0.9504,  ..., -0.3230, -0.7761, -1.8044]]]) {'boxes': tensor([[795, 385, 835, 462]]), 'labels': 'bat'}
    

    Stacktrace

    Traceback (most recent call last):
      File "potato.py", line 23, in <module>
        model.fit(dataset)
      File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\detecto\core.py", line 468, in fit
        for images, targets in dataset:
      File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 363, in __next__
        data = self._next_data()
      File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 403, in _next_data
        data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
      File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\detecto\core.py", line 149, in __getitem__
        image = read_image(img_name)
      File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\detecto\utils.py", line 134, in read_image
        return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    cv2.error: OpenCV(4.3.0) C:\projects\opencv-python\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'
    

    After some research this is when OpenCV can't find the image but I don't understand how this is possible since it prints the first item in the dataset list

    Environment:

    • OS: Windows 10
    • Python version: 3.8
    • Detecto version: latest
    • torch version: 1.5.0
    • torchvision version: 0.6.0

    Additional context Image name is : 'image-3361.jpg' Label name is: 'image-3361.xml'

    bug 
    opened by simeon9696 13
  • AttributeError: 'Model' object has no attribute 'state_dict'

    While saving the best weights, I am facing this error.

    torch.save(model.state_dict, 'checkpoint.pth')

    Environment: Google Colab

    • torch version: 1.7
    • torchvision version: 0.6.1

    Could you please suggest a corrective measure for this error!
    bug 
    opened by kiraans 9
  • Not detecting the custom object

    I have trained the model on about 126 images; 30% of the images contain two objects.

    from detecto import core, utils, visualize
    
    
    dataset = core.Dataset('../custom_dataset/')
    model = core.Model(['stamp'])
    
    model.fit(dataset)
    model.save('model_weights_3.pth')
    print(f'[INFO] Model Saved successfully!')
    image = utils.read_image('../custom_dataset/page-15.jpg')
    predictions = model.predict(image)
    print(predictions)
    

    Output:

    /usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:2854: UserWarning: The default behavior for interpolate/upsample with float scale_factor will change in 1.6.0 to align with other frameworks/libraries, and use scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. 
      warnings.warn("The default behavior for interpolate/upsample with float scale_factor will change "
    [INFO] Model Saved successfully!
    ([], tensor([], size=(0, 4)), tensor([]))
    
    opened by prasad01dalavi 9
  • ValueError: some of the strides of a given numpy array are negative. This is currently not supported, but will be added in future releases.

    Hi,

    I'm trying to train a custom model on my machine and I get this error: ValueError: some of the strides of a given numpy array are negative. This is currently not supported, but will be added in future releases. I researched a little and saw that this was an issue in PyTorch, but do you have any clue how to fix it in Detecto?

    My environment : Windows python 3.7 GPU: GeForce GTX 1050

    nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 441.45       Driver Version: 441.45       CUDA Version: 10.2
    

    Thank you

    opened by PLPote 8
  • RuntimeError: Error(s) in loading state_dict for FasterRCNN

    I've saved a model using model.save('model_weights.pth') after a successful run.

    When now attempting to load the model using the code;

    from detecto.core import Model
    
    model = Model.load('model_weights.pth', ['aboriginalflags'])
    

    I get the error

    Traceback (most recent call last):
      File "show.py", line 5, in <module>
        model = Model.load('model_weights.pth', ['aboriginalflags'])
      File "/usr/local/lib/python3.8/dist-packages/detecto/core.py", line 566, in load
        model._model.load_state_dict(torch.load(file, map_location=model._device))
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1044, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for FasterRCNN:
    	size mismatch for roi_heads.box_predictor.cls_score.weight: copying a param with shape torch.Size([3, 1024]) from checkpoint, the shape in current model is torch.Size([2, 1024]).
    	size mismatch for roi_heads.box_predictor.cls_score.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([2]).
    	size mismatch for roi_heads.box_predictor.bbox_pred.weight: copying a param with shape torch.Size([12, 1024]) from checkpoint, the shape in current model is torch.Size([8, 1024]).
    	size mismatch for roi_heads.box_predictor.bbox_pred.bias: copying a param with shape torch.Size([12]) from checkpoint, the shape in current model is torch.Size([8]).
    

    Ubuntu 20

    (loving the lib, great work)

    bug 
    opened by thomasdavis 7
  • Bug in bounding box display?

    Describe the bug It looks like bounding boxes are not displayed correctly in all circumstances. In my case, a sheared image generated by Roboflow, with adjusted bounding boxes in VOC XML format, appear incorrectly when displayed with Detecto; the same bounding boxes do appear correctly when displayed with imgaug.

    Code and Data I've created a notebook demonstrating the issue here; the repo has the source image and VOC XML file.

    Detecto's version: image

    The colours make it a bit hard to see what's going on, but:

    • There's a mixing bowl centered around x=200, y=100, but the mixing bowl box appears to be off; the lower right-hand corner of the box and its label are at approximately x=150, y=250.

    • There's a plate oriented horizontally at approximately x=200 to 300, y=200; but its bounding box is oriented vertically, with its lower right-hand corner at approximately x=200, y=325.

    • There are a couple bowls at approximately x=100, y=150 to 250. The bounding boxes are a little unclear to me, so I'll leave them out of this discussion.

    Imgaug's version:

    image

    • The mixing bowl box is oriented correctly and centered on the bowl itself.
    • Same for the plate box.
    • The bowl boxes are also correctly oriented and centered.

    Environment:

    • OS: Linux, Fedora 33
    • Python version: 3.9.1
    • Detecto version: 1.2.0
    • torch version: 1.7.1
    • torchvision version: 0.8.2

    Additional context The notebook and image files can be found at https://github.com/saintaardvark/detecto_bug/.

    Thanks for a great library, and please let me know if you need any further info.

    bug 
    opened by saintaardvark 6
  • jit.trace() not working

    I am using detecto model which is trained on custom data set, but the problem is when i try to use that model with jit.trace it throws an error.

    Code:

    from detecto.core import Model
    import torch

    model = Model.load(model_path, classes)

    example = torch.rand(1, 3, 224, 224)
    model_trace = torch.jit.trace(model, example)

    Error:

    Traceback (most recent call last):
      File "c:/Users/shehr/Desktop/Android Project/Android_Model.py", line 21, in <module>
        model_trace = torch.jit.script(model, example)
      File "C:\Users\shehr\Anaconda3\envs\tf_gpu\lib\site-packages\torch\jit\__init__.py", line 1257, in script
        qualified_name = _qualified_name(obj)
      File "C:\Users\shehr\Anaconda3\envs\tf_gpu\lib\site-packages\torch\_jit_internal.py", line 682, in _qualified_name
        name = obj.__name__
    AttributeError: 'Model' object has no attribute '__name__'

    Any idea why I'm getting this? Any help will be appreciated.

    opened by shehroz010 6
  • IndexError: single positional indexer is out-of-bounds

    Describe the bug Trying to train a custom object detector when I get the error listed in the title. I think it's because it's not reading in my folders with the images and labels? I've tried 'labels', '/labels' and '/labels/'

    Code and Data

    from detecto import core, utils, visualize
    
    # Images and XML files in separate folders
    dataset = core.Dataset('labels/', 'images/')
    
    image, target = dataset[0]
    print(image, target)
    
    model = core.Model(['bat', 'batter', 'pitch', 'field', 'player', 'scoreboard'])
    
    model.fit(dataset)
    
    
    # Specify the path to your image
    image = utils.read_image('images/image0.jpg')
    predictions = model.predict(image)
    
    # predictions format: (labels, boxes, scores)
    labels, boxes, scores = predictions
    
    print(labels) 
    print(boxes)
    print(scores)
    

    Stacktrace

    Traceback (most recent call last):
      File "c:/Users/julis/Documents/ap-cricket/functions/train.py", line 9, in <module>
        image, target = dataset[0]
      File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\detecto\core.py", line 148, in __getitem__
        img_name = os.path.join(self._root_dir, self._csv.iloc[idx, 0])
      File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexing.py", line 873, in __getitem__
        return self._getitem_tuple(key)
      File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexing.py", line 1443, in _getitem_tuple
        self._has_valid_tuple(tup)
      File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexing.py", line 702, in _has_valid_tuple
        self._validate_key(k, i)
      File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexing.py", line 1352, in _validate_key
        self._validate_integer(key, axis)
      File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexing.py", line 1437, in _validate_integer
        raise IndexError("single positional indexer is out-of-bounds")
    IndexError: single positional indexer is out-of-bounds
    

    Environment:

    • OS: Windows 10
    • Python version: 3.8
    • Detecto version:
    • torch version: 1.5.0
    • torchvision version : 0.6.0

    Additional context Image name is : 'image0.jpg' Label name is: 'image0.xml'

    bug 
    opened by simeon9696 6
  • Support for multiple objects in single image

    FasterRCNN supports training with multiple objects per image (according to following)

    From: https://pytorch.org/docs/stable/torchvision/models.html#object-detection-instance-segmentation-and-person-keypoint-detection

    During training, the model expects both the input tensors, as well as a targets (list of dictionary), containing:

    • boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with values of x between 0 and W and values of y between 0 and H
    • labels (Int64Tensor[N]): the class label for each ground-truth box

    Is this supported by detecto? I have only been able to pass in {"boxes": Tensor(1, 4), "labels": str}

    Would you be open to a pull request fixing this? It looks like the only place where there is an issue is in core.Model

    class Model:
    ...
        # Converts all string labels in a list of target dicts to
        # their corresponding int mappings
        def _convert_to_int_labels(self, targets):
            for target in targets:
                # Convert string labels to integer mapping
                if _is_iterable(target["labels"]):
                    target["labels"] = torch.tensor([self._int_mapping[label] for label in target["labels"]])
                else:
                    target["labels"] = torch.tensor(self._int_mapping[target["labels"]]).view(1)
    

    This would now accept target["labels"] = "one_object" or target["labels"] = ["obj1_class", "obj2_class"] and would still return a tensor of length equal to the number of objects.

    opened by dereklukacs 6
  • Please include RandomVerticalFlip transformation in Dataset.__getitem__ function

    Hi, There is already a piece of code that intercepts the RandomHorizontalFlip and adjusts the bounding "boxes" accordingly. Please may you implement the same for RandomVerticalFlip?

    Like so:

    elif isinstance(t, transforms.RandomVerticalFlip):
        if random.random() < random_vertical_flip:
            image = transforms.RandomVerticalFlip(1)(image)
            for idx, box in enumerate(targets['boxes']):
                # Flip box's y-coordinates
                box[1] = height - box[1]
                box[3] = height - box[3]
                box[[1,3]] = box[[3,1]]
                targets['boxes'][idx] = box

    enhancement 
    opened by iancoetzer 1
  • The `load()` Method Doesn't Honor the Type of Model (`model_name`)

    The line which loads the weights (https://github.com/alankbi/detecto/blob/c57b5fe3037bf8bc077a41d74c769dd5b164b994/detecto/core.py#L616) doesn't honor the current model.

    It creates a new model, which means the user cannot use a non-default net to load data.

    bug 
    opened by RoyiAvital 0
  • Train on empty annotation (without objects)

    I noticed that detecto does not train on images, where there are no object/s in the annotation.

    My problem is that my trained model detects objects with high confidence in completely different testing images, although there is nothing that it should detect.

    I need to train on images with no objects in the annotation, so that it can learn not to detect something, when there is nothing.

    enhancement 
    opened by trueToastedCode 0
  • Any constraints on Input Image Size for Custom Object Training

    I am able to train on the given sample dog dataset, but not on my own data: balance sheets and income statements (with the classes table and cell) of size (1125, 2000, 3). I can't share the images here because of confidentiality. Is there any restriction on image shape that prevents the model from training?

    When I run the training I do not get any error. But output is empty array and when checked for loss it is nan

    I have tried with both way i.e. with csv file and without csv file where I was able to get output for dog but not for my document image

    bug 
    opened by prasad01dalavi 2
  • Read folder

    I added another helper function that just reads the given folder as a generator. I needed this type of functionality for something I'm doing so I figured I'd write a PR for it here.

    opened by HallowedDust5 0
Releases(v1.2.2)
  • v1.2.2(Feb 2, 2022)

    List of changes:

    • Add a new model_name parameter to core.Model (#94)
      • The default model remains the same ("fasterrcnn_resnet50_fpn", or Model.DEFAULT)
      • New options include the following:
        • "fasterrcnn_mobilenet_v3_large_fpn" (Model.MOBILENET)
        • "fasterrcnn_mobilenet_v3_large_320_fpn" (Model.MOBILENET_320)
    • Fix occasional bug with drawing bounding boxes for the detect_live function (#101)
  • v1.2.1(Mar 8, 2021)

    List of changes:

    • Allow support for bounding box edge values to be provided in any order (#66)
      • Before, only xmin, ymin, xmax, ymax was supported in the XML files
    • Add pretrained flag to core.Model to specify whether to use PyTorch's pretrained weights (#71)
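
One way to read the bounding box change in this release is that the edge values inside an XML bndbox element no longer have to appear in a fixed order. The sketch below shows the general technique of looking the values up by tag name instead of by position. It is an assumption about the kind of fix involved, not Detecto's actual code; the read_box helper and both XML snippets are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_box(bndbox_xml):
    """Return (xmin, ymin, xmax, ymax), whatever order the tags appear in."""
    box = ET.fromstring(bndbox_xml)
    # Index the children by tag name so their position no longer matters
    values = {child.tag: int(child.text) for child in box}
    return (values['xmin'], values['ymin'], values['xmax'], values['ymax'])


# Same box, written with two different tag orders
standard = '<bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>'
shuffled = '<bndbox><xmin>10</xmin><xmax>110</xmax><ymin>20</ymin><ymax>220</ymax></bndbox>'

print(read_box(standard) == read_box(shuffled))
```

A parser that read the children by index would silently swap xmax and ymin for the second snippet, which is the kind of mistake a lookup by name avoids.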
  • v1.2.0(Oct 24, 2020)

    A few features and bug fixes are included in this release that update the behavior of the package. Most users should remain unaffected and not experience any noticeable performance differences or bugs; however, please take a look at the following change list to note everything that has been changed:

    • Change Dataset to group objects by image when indexing (#40, #60, #62)
      • Previously, if an image had multiple labeled objects in it, the Dataset would treat each object as a separate row/index
      • The dimensions of boxes and labels in the targets dict are now (N, 4) and N, respectively, where N is the number of labeled objects in each image (labels has changed from a string to a list of strings)
      • The xml_to_csv function now has an additional column named image_id
    • Add more detailed verbose messages on calls to model.fit (#53)
      • The verbose flag is now set to True by default
      • A warning is given if the user is trying to fit the model on a CPU
      • Training and validation progress is shown for each epoch
    • Pin torch and torchvision dependency versions to 1.6.0 and 0.7.0, respectively (#52)

    To prevent these changes from affecting existing codebases, users can lock their Detecto version to anything below 1.2.0 to ensure no breaking changes.
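The per-image grouping described in the change list can be sketched in plain Python as follows. The row layout (filename, class, box) and the helper group_by_image are hypothetical; only the image_id column and the (N, 4) boxes / N labels shapes come from the notes above.

```python
from collections import defaultdict

# One row per labeled object, similar in spirit to the CSV produced from XML labels.
rows = [
    {"filename": "a.jpg", "image_id": 0, "class": "dog", "box": [10, 20, 110, 220]},
    {"filename": "a.jpg", "image_id": 0, "class": "cat", "box": [150, 30, 300, 200]},
    {"filename": "b.jpg", "image_id": 1, "class": "rabbit", "box": [5, 5, 50, 60]},
]


def group_by_image(rows):
    """Collect all boxes and labels that share an image_id into one target."""
    grouped = defaultdict(lambda: {"boxes": [], "labels": []})
    for row in rows:
        target = grouped[row["image_id"]]
        target["boxes"].append(row["box"])     # boxes ends up with shape (N, 4)
        target["labels"].append(row["class"])  # labels ends up with length N
    return dict(grouped)


targets = group_by_image(rows)
print(len(targets))          # 2 images instead of 3 per-object rows
print(targets[0]["labels"])  # ['dog', 'cat']
```

Indexing by image rather than by object means one dataset entry now carries every labeled object for that image.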

  • v1.1.6(Aug 20, 2020)

  • v1.1.5(Aug 15, 2020)

  • v1.1.4(May 29, 2020)

  • v1.1.3(Mar 26, 2020)

  • v1.1.2(Feb 14, 2020)

    Given that custom-trained Detecto models often output predictions with scores of 0.6 or 0.7, the default score_filter parameter for visualize.detect_video and visualize.plot_prediction_grid has been lowered to 0.6.

    To prevent this change in behavior, you can explicitly set score_filter=0.8 as follows:

    from detecto import visualize
    
    visualize.detect_video(model, 'input_vid.mp4', 'output_vid.avi', score_filter=0.8)
    visualize.plot_prediction_grid(model, images, score_filter=0.8)
    
  • v1.1.1(Jan 29, 2020)

    Summary of changes (coming from pull request #19):

    • Changes behavior of detect_video and plot_prediction_grid to show all predictions above a certain threshold (optional parameter) rather than the top prediction per class
    • Allows show_labeled_image to accept labels to display along with boxes
    • Fixes bug when running detect_video on a default model

    All changes are non-breaking.
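The difference between the old and new selection behavior can be sketched without Detecto at all. Both helpers below are hypothetical, and the assumption that a score equal to the threshold is kept is mine, not stated in the notes.

```python
# (label, score) pairs standing in for a model's raw predictions on one image.
predictions = [("dog", 0.95), ("dog", 0.72), ("cat", 0.65), ("rabbit", 0.40)]


def top_per_class(preds):
    """Old behavior: keep only the highest-scoring prediction for each class."""
    best = {}
    for label, score in preds:
        if label not in best or score > best[label]:
            best[label] = score
    return sorted(best.items())


def above_threshold(preds, score_filter=0.6):
    """New behavior: keep every prediction whose score meets the threshold."""
    return [(label, score) for label, score in preds if score >= score_filter]


print(top_per_class(predictions))    # [('cat', 0.65), ('dog', 0.95), ('rabbit', 0.4)]
print(above_threshold(predictions))  # [('dog', 0.95), ('dog', 0.72), ('cat', 0.65)]
```

With a threshold, the second dog is shown while the low-confidence rabbit is dropped, which is usually closer to what a viewer expects in a crowded frame.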

  • v1.1.0(Jan 29, 2020)

    Major updates to the core module make it even easier to use Detecto. In addition, a new utils.split_video function helps users generate image data from video footage.

    • Split video function #12
    • Increased flexibility of classes in core module #15
    • Support for default pre-trained model #18

    Note: if you plan to use the default model, it's recommended to wait until v1.1.1 is released, which will fix a few minor bugs and improve existing visualization methods to better support the default model's predictions.

  • v1.0.2(Jan 20, 2020)

    • Removes Pillow version dependency #8
    • Adds automatic reverse normalization to visualize.show_labeled_image #7
    • Fixes excessive memory usage build error for Read the Docs
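Reverse normalization simply inverts the per-channel (x - mean) / std transform so an image can be displayed with its original colors. The sketch below works on a single RGB pixel and assumes the ImageNet mean and standard deviation commonly used with torchvision models; the function names are hypothetical and this is not Detecto's own code.

```python
# Assumed per-channel statistics (ImageNet values widely used with torchvision).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize(pixel):
    """Apply (x - mean) / std to each channel of an RGB pixel in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(pixel, MEAN, STD))


def undo_normalize(pixel):
    """Invert normalize(): x * std + mean for each channel."""
    return tuple(v * s + m for v, m, s in zip(pixel, MEAN, STD))


original = (0.2, 0.5, 0.9)
restored = undo_normalize(normalize(original))
# restored matches original up to floating-point rounding.
```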
Owner
Alan Bi
Co-founder of @teamscode. Past intern at Fidelity and Expedia. Making Detecto. Duke University '23
Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation

Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation The code of: Cross-Image Region Mining with Region Proto

LiuWeide 16 Nov 26, 2022
HiFT: Hierarchical Feature Transformer for Aerial Tracking (ICCV2021)

HiFT: Hierarchical Feature Transformer for Aerial Tracking Ziang Cao, Changhong Fu, Junjie Ye, Bowen Li, and Yiming Li Our paper is accepted by ICCV 2

Intelligent Vision for Robotics in Complex Environment 55 Nov 23, 2022
For visualizing the dair-v2x-i dataset

3D Detection & Tracking Viewer The project is based on hailanyi/3D-Detection-Tracking-Viewer and is modified, you can find the original version of the

34 Dec 29, 2022
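A demo of pose estimation in Python using MoveNet

MoveNet-Python-Example A sample of running MoveNet in Python. Models converted to ONNX are also included. If you want to try the conversion itself, use MoveNet_tf2onnx.ipynb. The following models provided on TensorFlow Hub as of 2021/08/24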

KazuhitoTakahashi 38 Dec 17, 2022
Laplace Redux -- Effortless Bayesian Deep Learning

Laplace Redux - Effortless Bayesian Deep Learning This repository contains the code to run the experiments for the paper Laplace Redux - Effortless Ba

Runa Eschenhagen 28 Dec 07, 2022
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond

GCNet for Object Detection By Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu. This repo is an official implementation of "GCNet: Non-local Networ

Jerry Jiarui XU 1.1k Dec 29, 2022
my graduation project is about live human face augmentation by projection mapping by using CNN

Live-human-face-expression-augmentation-by-projection my graduation project is about live human face augmentation by projection mapping by using CNN o

1 Mar 08, 2022
Existing Literature about Machine Unlearning

Machine Unlearning Papers 2021 Brophy and Lowd. Machine Unlearning for Random Forests. In ICML 2021. Bourtoule et al. Machine Unlearning. In IEEE Symp

Jonathan Brophy 213 Jan 08, 2023
Spline is a tool that is capable of running locally as well as part of well known pipelines like Jenkins (Jenkinsfile), Travis CI (.travis.yml) or similar ones.

Welcome to spline - the pipeline tool Important note: Since changing jobs I haven't had the chance to continue this project. My main new project

Thomas Lehmann 29 Aug 22, 2022
PyTorch implementation for Score-Based Generative Modeling through Stochastic Differential Equations (ICLR 2021, Oral)

Score-Based Generative Modeling through Stochastic Differential Equations This repo contains a PyTorch implementation for the paper Score-Based Genera

Yang Song 757 Jan 04, 2023
Warning: This project does not have any current developer. See below.

Pylearn2: A machine learning research library Warning : This project does not have any current developer. We will continue to review pull requests and

Laboratoire d’Informatique des Systèmes Adaptatifs 2.7k Dec 26, 2022
Python Implementation of algorithms in Graph Mining, e.g., Recommendation, Collaborative Filtering, Community Detection, Spectral Clustering, Modularity Maximization, co-authorship networks.

Graph Mining Author: Jiayi Chen Time: April 2021 Implemented Algorithms: Network: Scraping Data, Network Construction and Network Measurement (e.g., P

Jiayi Chen 3 Mar 03, 2022
A Repository of Community-Driven Natural Instructions

A Repository of Community-Driven Natural Instructions TLDR; this repository maintains a community effort to create a large collection of tasks and the

AI2 244 Jan 04, 2023
GANfolk: Using AI to create portraits of fictional people to sell as NFTs

GANfolk are AI-generated renderings of fictional people. Each image in the collection was created by a pair of Generative Adversarial Networks (GANs) with names and backstories also created with AI.

Robert A. Gonsalves 32 Dec 02, 2022
Fast sparse deep learning on CPUs

SPARSEDNN **If you want to use this repo, please send me an email: [email pro

Ziheng Wang 44 Nov 30, 2022
FaceAnon - Anonymize people in images and videos using yolov5-crowdhuman

Face Anonymizer Blur faces from image and video files in /input/ folder. Require

22 Nov 03, 2022
Deep ViT Features as Dense Visual Descriptors

dino-vit-features [paper] [project page] Official implementation of the paper "Deep ViT Features as Dense Visual Descriptors". We demonstrate the effe

Shir Amir 113 Dec 24, 2022
HyperLib: Deep learning in the Hyperbolic space

HyperLib: Deep learning in the Hyperbolic space Background This library implements common Neural Network components in the hyperbolic space (using th

105 Dec 25, 2022
This repository contains several image-to-image translation models, which were tested for RGB to NIR image generation. The models are Pix2Pix, Pix2PixHD, CycleGAN and PointWise.

RGB2NIR_Experimental This repository contains several image-to-image translation models, which were tested for RGB to NIR image generation. The models

5 Jan 04, 2023
Angle data is a simple data type.

angledat Angle data is a simple data type. Installing + using Put angledat.py in the main dir of your project. Import it and use. Comments Comments st

1 Jan 05, 2022