Detectron2 is FAIR's next-generation platform for object detection and segmentation.

Overview

Detectron2 is Facebook AI Research's next-generation software system that implements state-of-the-art object detection algorithms. It is a ground-up rewrite of the previous version, Detectron, and it originates from maskrcnn-benchmark.

What's New

  • It is powered by the PyTorch deep learning framework.
  • Includes more features such as panoptic segmentation, DensePose, Cascade R-CNN, rotated bounding boxes, PointRend, DeepLab, etc.
  • Can be used as a library to support different projects on top of it. We'll open source more research projects in this way.
  • It trains much faster.
  • Models can be exported to TorchScript format or Caffe2 format for deployment.

See our blog post for more demos and to learn more about detectron2.

Installation

See INSTALL.md.

Getting Started

Follow the installation instructions to install detectron2.

See Getting Started with Detectron2, and the Colab Notebook to learn about basic usage.

Learn more at our documentation. And see projects/ for some projects that are built on top of detectron2.

Model Zoo and Baselines

We provide a large set of baseline results and trained models available for download in the Detectron2 Model Zoo.
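
For example, loading one of these trained baselines takes a couple of lines (a minimal sketch; the config path below is one entry from the Model Zoo table):

    from detectron2 import model_zoo

    # builds the architecture described by the config and loads its trained COCO weights
    model = model_zoo.get("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml", trained=True)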

License

Detectron2 is released under the Apache 2.0 license.

Citing Detectron2

If you use Detectron2 in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@misc{wu2019detectron2,
  author =       {Yuxin Wu and Alexander Kirillov and Francisco Massa and
                  Wan-Yen Lo and Ross Girshick},
  title =        {Detectron2},
  howpublished = {\url{https://github.com/facebookresearch/detectron2}},
  year =         {2019}
}
Comments
  • Add support for ONNX-only and Caffe2 ONNX export

    Summary of changes

    This PR fixes both ONNX-only and Caffe2 ONNX exporters for the latest versions of this repo and PyTorch.

    For ONNX-only, the main issue is that add_export_config(cfg) is not exposed when Caffe2 is not compiled along with PyTorch; for ONNX-only scenarios, such a dependency is not needed. Therefore, add_export_config is moved from detectron2/export/api.py to detectron2/export/__init__.py.
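
    In other words (my reading of the change), the import below should work even on a PyTorch build that ships without Caffe2:

    from detectron2.export import add_export_config  # ONNX-only use, no Caffe2 required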

    A second contribution is a new test_export_onnx.py test file that exports almost the same models as the test_export_tracing.py tests.

    For Caffe2-ONNX, the main issue was a dependency on the ONNX optimizer pass, which is deprecated in newer ONNX versions. This PR removes that dependency because the fuse_bn_into_conv optimization pass is already performed by torch.onnx.export anyway.

    Fixes https://github.com/facebookresearch/detectron2/issues/3488 Fixes https://github.com/pytorch/pytorch/issues/69674 (PyTorch repo)

    CLA Signed 
    opened by thiagocrepaldi 75
  • How do I compute validation loss during training?

    I'm trying to compute the loss on a validation dataset for each iteration during training. To do so, I've created my own hook:

    class ValidationLoss(detectron2.engine.HookBase):
        def __init__(self, config, dataset_name):
            super(ValidationLoss, self).__init__()
            self._loader = detectron2.data.build_detection_test_loader(config, dataset_name)
            
        def after_step(self):
            for batch in self._loader:
                loss = self.trainer.model(batch)
                log.debug(f"validation loss: {loss}")
    

    ... which I register with a DefaultTrainer. The hook code is called during training, but fails with the following:

    INFO:detectron2.engine.train_loop:Starting training from iteration 0
    ERROR:detectron2.engine.train_loop:Exception during training:
    Traceback (most recent call last):
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 133, in train
        self.after_step()
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 153, in after_step
        h.after_step()
      File "<ipython-input-6-63b308743b7d>", line 8, in after_step
        loss = self.trainer.model(batch)
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 123, in forward
        proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/modeling/proposal_generator/rpn.py", line 164, in forward
        losses = {k: v * self.loss_weight for k, v in outputs.losses().items()}
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/modeling/proposal_generator/rpn_outputs.py", line 322, in losses
        gt_objectness_logits, gt_anchor_deltas = self._get_ground_truth()
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/modeling/proposal_generator/rpn_outputs.py", line 262, in _get_ground_truth
        for image_size_i, anchors_i, gt_boxes_i in zip(self.image_sizes, anchors, self.gt_boxes):
    TypeError: zip argument #3 must support iteration
    INFO:detectron2.engine.hooks:Total training time: 0:00:00 (0:00:00 on hooks)
    

    The traceback seems to imply that ground truth data is missing, which made me think that the data loader was the problem. However, switching to a training loader produces a different error:

    class ValidationLoss(detectron2.engine.HookBase):
        def __init__(self, config, dataset_name):
            super(ValidationLoss, self).__init__()
            self._loader = detectron2.data.build_detection_train_loader(config, dataset_name)
            
        def after_step(self):
            for batch in self._loader:
                loss = self.trainer.model(batch)
                log.debug(f"validation loss: {loss}")
    
    INFO:detectron2.engine.train_loop:Starting training from iteration 0
    ERROR:detectron2.engine.train_loop:Exception during training:
    Traceback (most recent call last):
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 133, in train
        self.after_step()
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 153, in after_step
        h.after_step()
      File "<ipython-input-6-e0d2c509cc72>", line 7, in after_step
        for batch in self._loader:
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/data/common.py", line 109, in __iter__
        for d in self.dataset:
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
        data = self._next_data()
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
        return self._process_data(data)
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
        data.reraise()
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
        raise self.exc_type(msg)
    TypeError: Caught TypeError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
        data = fetcher.fetch(index)
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/data/common.py", line 39, in __getitem__
        data = self._map_func(self._dataset[cur_idx])
      File "/ascldap/users/tshead/miniconda3/lib/python3.7/site-packages/detectron2/utils/serialize.py", line 23, in __call__
        return self._obj(*args, **kwargs)
    TypeError: 'str' object is not callable
    
    INFO:detectron2.engine.hooks:Total training time: 0:00:00 (0:00:00 on hooks)
    

    As a sanity check, inference works just fine:

    class ValidationLoss(detectron2.engine.HookBase):
        def __init__(self, config, dataset_name):
            super(ValidationLoss, self).__init__()
            self._loader = detectron2.data.build_detection_test_loader(config, dataset_name)
            
        def after_step(self):
            for batch in self._loader:
                with detectron2.evaluation.inference_context(self.trainer.model):
                    loss = self.trainer.model(batch)
                    log.debug(f"validation loss: {loss}")
    
    INFO:detectron2.engine.train_loop:Starting training from iteration 0
    DEBUG:root:validation loss: [{'instances': Instances(num_instances=100, image_height=720, image_width=720, fields=[pred_boxes = Boxes(tensor([[4.4867e+02, 1.9488e+02, 5.1496e+02, 3.9878e+02],
            [4.2163e+02, 1.1204e+02, 6.1118e+02, 5.5378e+02],
            [8.7323e-01, 3.0374e+02, 9.2917e+01, 3.8698e+02],
            [4.3202e+02, 2.0296e+02, 5.7938e+02, 3.6817e+02],
            ...
    

    ... but that isn't what I want, of course. Any thoughts?

    Thanks in advance, Tim
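
    For reference, a commonly suggested workaround (an assumption on my part, not an answer from this thread) is to build the validation loader with the training pipeline, so that ground-truth annotations survive, and to query the model while it is still in training mode:

    import torch
    from detectron2.data import build_detection_train_loader
    from detectron2.engine import HookBase

    class ValidationLoss(HookBase):
        def __init__(self, cfg, dataset_name):
            super().__init__()
            cfg = cfg.clone()
            # a train-style loader keeps the annotations; passing dataset_name
            # positionally would be treated as a mapper (hence the
            # "'str' object is not callable" error above)
            cfg.DATASETS.TRAIN = (dataset_name,)
            self._loader = iter(build_detection_train_loader(cfg))

        def after_step(self):
            batch = next(self._loader)
            with torch.no_grad():
                loss_dict = self.trainer.model(batch)  # training mode returns a dict of losses
                print(f"validation loss: {sum(loss_dict.values())}")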

    opened by tshead2 35
  • How to detect only one class (person) from the COCO pre-trained model

    Can anyone tell me how to select only one class, which is 'person' in my case, from COCO data for instance segmentation with the pre-trained model (mask_rcnn_R_50_FPN_3x.yaml)?

    I want to detect 'person' only in a given image.
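
    For reference, one common approach (an assumption, not an official answer from this thread) is to run the standard COCO predictor and keep only the instances whose predicted class id is 0, which is "person" in the COCO metadata:

    import cv2
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
    predictor = DefaultPredictor(cfg)

    im = cv2.imread("input.jpg")  # hypothetical input image
    instances = predictor(im)["instances"]
    person_only = instances[instances.pred_classes == 0]  # class 0 is "person" in COCO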

    opened by anki92 34
  • ONNX model export support

    Since ONNX provides almost all the ops needed by Mask R-CNN, it would be great if the model could be exported to ONNX; these large models would then benefit from TensorRT acceleration.

    enhancement 
    opened by jinfagang 34
  • Improve documentation concerning the new config files

    📚 Documentation Improvements

    In short

    Concerning: https://detectron2.readthedocs.io/en/latest/tutorials/configs.html
    Problem: the documentation does not seem to have been updated to reflect the new config files (.py rather than .yaml).
    Solution: update the documentation.

    Problem description

    FAIR recently published new Mask R-CNN baselines, and this was my first introduction to the new config files, which no longer rely on YAML but on 'raw' .py files. I am trying to load the new baselines using the config files mentioned in the MODEL_ZOO (see this table). For example:

    from detectron2 import model_zoo
    model = model_zoo.get("new_baselines/mask_rcnn_regnetx_4gf_dds_FPN_400ep_LSJ.py", trained=True)
    

    This gives

    RuntimeError: new_baselines/mask_rcnn_regnetx_4gf_dds_FPN_400ep_LSJ not available in Model Zoo!
    

    I have installed Detectron2 using the installation instructions. When looking up the documentation on configs, it seems that this has not been updated to reflect the new configs and still solely mentions YAML files.

    Proposed solution

    It could be that the CONFIG_PATH_TO_URL_SUFFIX dictionary in the _ModelZooUrls class still has to be updated and that this is actually a bug (see here), but I find it hard to estimate whether this is intended behavior (i.e. the new config files should be loaded differently) or a bug, due to my limited understanding of the new config files. Either way, I feel the documentation on readthedocs should be updated to reflect the change from .yaml to .py.
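
    For reference, a hedged sketch of loading a .py config directly (the path assumes a local clone; LazyConfig and instantiate are the lazy-config entry points):

    from detectron2.config import LazyConfig, instantiate
    from detectron2.checkpoint import DetectionCheckpointer

    # .py configs are loaded with LazyConfig, not cfg.merge_from_file()
    cfg = LazyConfig.load("configs/new_baselines/mask_rcnn_regnetx_4gf_dds_FPN_400ep_LSJ.py")
    model = instantiate(cfg.model)
    DetectionCheckpointer(model).load(cfg.train.init_checkpoint)  # whatever checkpoint the config names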

    documentation 
    opened by orbiskcw 33
  • Add support for ONNX-only

    This PR is composed of different fixes to enable end-to-end ONNX export functionality for detectron2 models.

    • The add_export_config API is publicly exposed even when Caffe2 is not compiled along with PyTorch (the new default behavior on the latest PyTorch). A warning message informing users about its deprecation in future versions is also added.

    • tensor.shape[0] replaces len(tensor) and for idx, img in enumerate(tensors) replaces for tmp_var1, tmp_var2 in zip(tensors, batched_imgs) so that the tracer does not lose the reference to the user input in the graph.

      • Before the changes above, the graph did not have an actual input; instead, the input was exported as a model weight.
      • After the fix, the user images are properly acknowledged as the model's input during ONNX export.
    • Added unit tests (tests/torch_export_onnx.py) for detectron2 models

    • ONNX is added as dependency for the CI to be able to run the aforementioned tests

    • Added custom symbolic functions to allow CI pipelines to succeed. The symbolics are needed because PyTorch 1.8, 1.9 and 1.10 adopted by detectron2 have several bugs. They can be removed when 1.11+ is adopted by detectron2's CI infra

    Fixes https://github.com/facebookresearch/detectron2/issues/3488 Fixes https://github.com/pytorch/pytorch/issues/69674 (PyTorch repo)

    CLA Signed 
    opened by thiagocrepaldi 32
  • Added docker compose file with useful tweaks.

    From my perspective, a docker-compose file has several benefits: on the one hand it increases convenience, and on the other it is a way to supply users with useful tweaks. The committed docker-compose file addresses several issues and tweaks:

    1. It fixes potential problems with dataloaders (see #384).
    2. It includes Multi-GPU and performance tweaks as suggested by NVIDIA (see https://docs.nvidia.com/deeplearning/frameworks/user-guide/index.html#caffeovr).
    3. It adds GUI support (see #379).
    4. It enables caching of downloaded models (see #382).
    5. It makes docker run with the UID of the host user.

    Of course, all this can be accomplished with a long docker command. However, a docker-compose file gives a central place to gather all recommendations for running detectron2 with docker, without bloating the Dockerfile with comments.

    CLA Signed 
    opened by maxfrei750 32
  • Properly convert a Detectron2 model to ONNX for Deployment

    Hello,

    I am trying to convert a Detectron2 model to the ONNX format and run inference without the detectron2 dependency at the inference stage.

    It is possible to find some information about that here: https://detectron2.readthedocs.io/en/latest/tutorials/deployment.html. However, the implementation of this task is constantly being updated, and the information found in this documentation is not clear enough to carry it out.

    Can someone help me with a demo/tutorial of how to do it?

    @thiagocrepaldi

    Some information:

    My model was trained using pre-trained weight from:

    'faster_rcnn_50': { 'model_path': 'COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml', 'weights_path': 'model_final_280758.pkl' },

    I have 4 classes.

    Of course, now I have my own weights. My model was saved in .pth format.

    I used my own dataset, with .png images.

    Code in Python
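
    For reference, a hedged sketch of one way such an export might look (my assumption, not the asker's code; TracingAdapter flattens detectron2's dict-based inputs so torch.onnx.export can trace the model):

    import torch
    from detectron2 import model_zoo
    from detectron2.checkpoint import DetectionCheckpointer
    from detectron2.config import get_cfg
    from detectron2.export import TracingAdapter
    from detectron2.modeling import build_model

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4     # the asker's 4 classes
    cfg.MODEL.DEVICE = "cpu"                # trace on CPU for simplicity
    model = build_model(cfg)
    DetectionCheckpointer(model).load("model_final.pth")  # hypothetical path to the trained .pth
    model.eval()

    inputs = [{"image": torch.zeros(3, 480, 640)}]  # dummy CHW image for tracing
    adapter = TracingAdapter(model, inputs)
    torch.onnx.export(adapter, adapter.flattened_inputs, "model.onnx", opset_version=11)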

    documentation 
    opened by vitorbds 28
  • How to apply mask_rcnn segmentation on a balloon video?

    Hi, I am going through the google colab example tutorial.

    I am trying to apply mask_rcnn segmentation on a random YouTube balloon video instead of a balloon image, to detect balloons only (one class).

    How can I assign .yaml and .pkl files that were generated using images earlier in the tutorial to a random video? thanks

    I tried the following but it didn't work. I think I am having trouble assigning the trained config and model files.

    !cd detectron2_repo && python demo/demo.py --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --video-input ../video-clip_b.mp4 --confidence-threshold 0.6 --output ../video-clip_b_testing1.mkv \
      --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
    
    opened by gireeshkbogu 28
  • Add support for Caffe2 ONNX export

    Currently all Caffe2 export tests (under tests/test_export_caffe2.py) fail because the latest onnx releases no longer have the onnx.optimizer submodule (instead, a new module, onnxoptimizer, was created from it).

    However, the fuse_bn_into_conv optimization previously implemented within onnx.optimizer is already performed by torch.onnx.export during ONNX export. Therefore the onnx.optimizer dependency can be safely removed from the detectron2 code.

    Depends on pytorch/pytorch#75718 Fixes #3488 Fixes pytorch/pytorch#69674 (PyTorch repo)

    ps: Although Caffe2 support is/will be deprecated, this PR relies on the fact that contributions are welcome as stated at docs/tutorials/deployment.md

    CLA Signed 
    opened by thiagocrepaldi 27
  • AssertionError: Attribute 'thing_classes' in the metadata of 'coco_2017_train' cannot be set to a different value!

    ❓ Questions and Help

    Hi, I am trying to train on my dataset with just 4 classes. When I run it, I get the AssertionError quoted in the title (the error was posted as a screenshot). The scripts are interlinked a lot and therefore a bit difficult to debug. How do I resolve this?

    Thanks.

    opened by akshaygadipatil 24
  • Loading pre-trained model configuration from Python file

    📚 Documentation Issue

    I'm struggling to load the pre-trained model defined by new_baselines/mask_rcnn_R_101_FPN_400ep_LSJ.py. I've found relevant documentation here, here and issue #3225. However, none of these clearly elucidates my error.

    I'm trying to load the configuration with:

    cfg = LazyConfig.load("detectron2/configs/new_baselines/mask_rcnn_R_101_FPN_400ep_LSJ.py")
    cfg = setup_cfg(args)
    

    This produces the following traceback:

    Traceback (most recent call last):
      File "quality_test.py", line 97, in <module>
        results_ls = get_person_seg_masks(img_path, model_family, model)
      File "detectron2_wrapper.py", line 107, in get_person_seg_masks
        cfg = setup_cfg(args)
      File "detectron2/demo/demo.py", line 29, in setup_cfg
        cfg.merge_from_file(args.config_file)
      File "/home/appuser/detectron2_repo/detectron2/config/config.py", line 46, in merge_from_file
        loaded_cfg = self.load_yaml_with_base(cfg_filename, allow_unsafe=allow_unsafe)
      File "/home/appuser/.local/lib/python3.8/site-packages/fvcore/common/config.py", line 61, in load_yaml_with_base
        cfg = yaml.safe_load(f)
      File "/home/appuser/.local/lib/python3.8/site-packages/yaml/__init__.py", line 125, in safe_load
        return load(stream, SafeLoader)
      File "/home/appuser/.local/lib/python3.8/site-packages/yaml/__init__.py", line 81, in load
        return loader.get_single_data()
      File "/home/appuser/.local/lib/python3.8/site-packages/yaml/constructor.py", line 49, in get_single_data
        node = self.get_single_node()
      File "/home/appuser/.local/lib/python3.8/site-packages/yaml/composer.py", line 39, in get_single_node
        if not self.check_event(StreamEndEvent):
      File "/home/appuser/.local/lib/python3.8/site-packages/yaml/parser.py", line 98, in check_event
        self.current_event = self.state()
      File "/home/appuser/.local/lib/python3.8/site-packages/yaml/parser.py", line 171, in parse_document_start
        raise ParserError(None, None,
    yaml.parser.ParserError: expected '<document start>', but found '<scalar>'
      in "detectron2/configs/new_baselines/mask_rcnn_R_101_FPN_400ep_LSJ.py", line 11, column 1
    
    documentation 
    opened by buckeye17 2
  • A problem with DeepLab for visualizing semantic segmentation

    I am trying to implement semantic segmentation on Google Colab following the instructions of the DeepLab project of Detectron2, but when I want to visualize the segments on an image, I face a problem that I cannot solve.

    ** "Instructions To Reproduce the Issue and Full Logs":** `!pip install pyyaml==5.1 !pip install exif==1.3.5 !pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html !git clone --branch v0.6 https://github.com/facebookresearch/detectron2.git detectron2_repo !pip install -e detectron2_repo import detectron2 from detectron2.utils.logger import setup_logger setup_logger() import numpy as np import cv2 import torch from google.colab.patches import cv2_imshow from detectron2 import model_zoo from detectron2.engine import DefaultPredictor from detectron2.config import get_cfg from detectron2.utils.visualizer import Visualizer, ColorMode from detectron2.data import MetadataCatalog coco_metadata = MetadataCatalog.get("coco_2017_val") from detectron2.projects import point_rend from detectron2.projects import deeplab from detectron2.projects.deeplab import add_deeplab_config !pip install 'git+https://github.com/facebookresearch/[email protected]' im=cv2.imread("./aachen_000005_000019_leftImg8bit.png") cv2_imshow(im) from detectron2.projects.deeplab.build_solver import build_lr_scheduler from detectron2 import checkpoint from fvcore.common.checkpoint import Checkpointer cfg = get_cfg() deeplab.add_deeplab_config(cfg) cfg.load_yaml_with_base("detectron2_repo/projects/DeepLab/configs/Cityscapes-SemanticSegmentation/deeplab_v3_plus_R_103_os16_mg124_poly_90k_bs16.yaml") cfg.merge_from_file("detectron2_repo/projects/DeepLab/configs/Cityscapes-SemanticSegmentation/deeplab_v3_plus_R_103_os16_mg124_poly_90k_bs16.yaml") cfg.MODEL.WEIGHTS = "https://dl.fbaipublicfiles.com/detectron2/DeepLab/Cityscapes-SemanticSegmentation/deeplab_v3_plus_R_103_os16_mg124_poly_90k_bs16/28054032/model_final_a8a355.pkl" predictor = DefaultPredictor(cfg) outputs = predictor(im)

    viz1=Visualizer(im[:,:,::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.0, instance_mode=ColorMode.SEGMENTATION)

    output=viz1.draw_sem_seg(outputs["sem_seg"].to("cpu"))

    image2 = output.get_image()[:,:,::-1] cv2_imshow(image2)`

    The error that I faced is:


    TypeError                                 Traceback (most recent call last)
    in
          1 viz1 = Visualizer(im[:,:,::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.0, instance_mode=ColorMode.SEGMENTATION)
    ----> 2 output = viz1.draw_sem_seg(outputs["sem_seg"].to("cpu"))

    /content/detectron2_repo/detectron2/utils/visualizer.py in draw_sem_seg(self, sem_seg, area_threshold, alpha)
        449         if isinstance(sem_seg, torch.Tensor):
        450             sem_seg = sem_seg.numpy()
    --> 451         labels, areas = np.unique(sem_seg, return_counts=True)
        452         sorted_idxs = np.argsort(-areas).tolist()
        453         labels = labels[sorted_idxs]

    TypeError: list indices must be integers or slices, not numpy.float32

    Expected behavior: I expected that I could draw semantic segmentation on the image.

    Environment

    2023-01-03 21:56:28 URL:https://raw.githubusercontent.com/facebookresearch/detectron2/main/detectron2/utils/collect_env.py [8391/8391] -> "collect_env.py" [1]


    sys.platform            linux
    Python                  3.8.16 (default, Dec 7 2022, 01:12:13) [GCC 7.5.0]
    numpy                   1.21.6
    detectron2              failed to import
    detectron2._C           not built correctly: No module named 'detectron2'
    Compiler ($CXX)         c++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    CUDA compiler           Build cuda_11.2.r11.2/compiler.29618528_0
    DETECTRON2_ENV_MODULE   <not set>
    PyTorch                 1.13.0+cu116 @/usr/local/lib/python3.8/dist-packages/torch
    PyTorch debug build     False
    GPU available           Yes
    GPU 0                   Tesla T4 (arch=7.5)
    Driver version          460.32.03
    CUDA_HOME               /usr/local/cuda
    Pillow                  7.1.2
    torchvision             0.14.0+cu116 @/usr/local/lib/python3.8/dist-packages/torchvision
    torchvision arch flags  3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
    cv2                     4.6.0


    PyTorch built with:

    • GCC 9.3
    • C++ Version: 201402
    • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
    • OpenMP 201511 (a.k.a. OpenMP 4.5)
    • LAPACK is enabled (usually provided by MKL)
    • NNPACK is enabled
    • CPU capability usage: AVX2
    • CUDA Runtime 11.6
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
    • CuDNN 8.3.2 (built against CUDA 11.5)
    • Magma 2.6.1
    • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
    opened by SAAZI4 1
  • I'm trying to train a custom keypoint detection model and running into some errors

    This is my program so far:

    register_coco_instances("mesh_train", {}, "../mesh_coco_Train.json", "../Data/Train/Images")
    register_coco_instances("mesh_test", {}, "../mesh_coco_Test.json", "../Data/Test/Images")

    MetadataCatalog.get("mesh_train").keypoint_names = ["joints"]
    MetadataCatalog.get("mesh_train").keypoint_flip_map = []
    train_dicts = DatasetCatalog.get("mesh_train")
    test_dicts = DatasetCatalog.get("mesh_test")
    mesh_metadata = MetadataCatalog.get("mesh_train")

    def cv2_imshow(im):
        im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
        plt.figure()
        plt.imshow(im)
        plt.axis('off')
        plt.show()

    for d in random.sample(train_dicts, 5):
        print(d["file_name"])
        img = cv2.imread(d["file_name"])
        visualizer = Visualizer(img[:, :, ::-1], metadata=mesh_metadata, scale=0.5)
        vis = visualizer.draw_dataset_dict(d)
        cv2_imshow(vis.get_image()[:, :, ::-1])

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"))
    cfg.DATASETS.TRAIN = ("mesh_train",)
    cfg.DATASETS.TEST = ("mesh_test",)
    cfg.DATALOADER.NUM_WORKERS = 4
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml")
    cfg.SOLVER.IMS_PER_BATCH = 2
    cfg.SOLVER.BASE_LR = 0.001
    cfg.SOLVER.MAX_ITER = 300
    cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1

    os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)

    print("training....")
    trainer.train()

    But when I run it, I'm getting the following error:

    [01/02 13:47:51 d2.engine.train_loop]: Starting training from iteration 0
    ERROR [01/02 13:47:51 d2.engine.train_loop]: Exception during training:
    Traceback (most recent call last):
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\site-packages\detectron2\engine\train_loop.py", line 149, in train
        self.run_step()
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\site-packages\detectron2\engine\defaults.py", line 494, in run_step
        self._trainer.run_step()
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\site-packages\detectron2\engine\train_loop.py", line 268, in run_step
        data = next(self._data_loader_iter)
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\site-packages\detectron2\data\common.py", line 283, in __iter__
        for d in self.dataset:
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 435, in __iter__
        return self._get_iterator()
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 381, in _get_iterator
        return _MultiProcessingDataLoaderIter(self)
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 1034, in __init__
        w.start()
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 121, in start
        self._popen = self._Popen(self)
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 224, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 336, in _Popen
        return Popen(process_obj)
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
        _check_not_importing_main()
      File "C:\Users\Singh Automation\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
        raise RuntimeError('''
    RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:
    
            if __name__ == '__main__':
                freeze_support()
                ...
    
        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
    

    [01/02 13:47:51 d2.engine.hooks]: Total training time: 0:00:00 (0:00:00 on hooks)
    [01/02 13:47:51 d2.utils.events]: iter: 0 lr: N/A max_mem: 228M
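
    For reference, the usual fix on Windows (an assumption, not from this thread): DataLoader workers are started with "spawn", which re-imports the main module, so the training entry point must be guarded (alternatively, set cfg.DATALOADER.NUM_WORKERS = 0):

    from detectron2.engine import DefaultTrainer

    def main():
        cfg = build_cfg()  # hypothetical helper wrapping the cfg setup from the snippet above
        trainer = DefaultTrainer(cfg)
        trainer.resume_or_load(resume=False)
        trainer.train()

    if __name__ == "__main__":  # required on Windows: worker processes re-import this module
        main()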

    opened by Codeveen 1
  • [Multi-node training] Training time is much longer than on a single node

    Hello

    The training time is very slow when training a model with detectron2 using two machines.

    I use A6000 RTX with 4 GPUs per node and train my models on the two nodes. Both nodes run Ubuntu 20.04. Training works normally and the log.txt file is also generated.

    I set the environment variables as follows:

    Node 1 setting (189):

    export NCCL_DEBUG="INFO"
    export NCCL_SOCKET_IFNAME="enp36s0f1"
    export GLOO_SOCKET_IFNAME="enp36s0f1"

    Node 2 setting:

    export NCCL_DEBUG="INFO"
    export NCCL_SOCKET_IFNAME="enp4s0"
    export GLOO_SOCKET_IFNAME="enp4s0"

    First, when I only set the NCCL environment variables (without setting GLOO), I got these errors:

    -- Process 0 terminated with the following error:
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
        fn(i, *args)
      File "/root/xgy/experiments/detectron2/detectron2/engine/launch.py", line 125, in _distributed_worker
        main_func(*args)
      File "/root/xgy/experiments/distributed-pytorch/MaskRCNN/train/train_net.py", line 141, in main
        trainer = Trainer(cfg)
      File "/root/xgy/experiments/detectron2/detectron2/engine/defaults.py", line 383, in __init__
        data_loader = self.build_train_loader(cfg)
      File "/root/xgy/experiments/detectron2/detectron2/engine/defaults.py", line 543, in build_train_loader
        return build_detection_train_loader(cfg)
      File "/root/xgy/experiments/detectron2/detectron2/config/config.py", line 192, in wrapped
        explicit_args = _get_args_from_config(from_config, *args, **kwargs)
      File "/root/xgy/experiments/detectron2/detectron2/config/config.py", line 229, in _get_args_from_config
        ret = from_config_func(*args, **kwargs)
      File "/root/xgy/experiments/detectron2/detectron2/data/build.py", line 328, in _train_loader_from_config
        sampler = TrainingSampler(len(dataset))
      File "/root/xgy/experiments/detectron2/detectron2/data/samplers/distributed_sampler.py", line 37, in __init__
        seed = comm.shared_random_seed()
      File "/root/xgy/experiments/detectron2/detectron2/utils/comm.py", line 230, in shared_random_seed
        all_ints = all_gather(ints)
      File "/root/xgy/experiments/detectron2/detectron2/utils/comm.py", line 154, in all_gather
        group = _get_global_gloo_group()
      File "/root/xgy/experiments/detectron2/detectron2/utils/comm.py", line 89, in _get_global_gloo_group
        return dist.new_group(backend="gloo")
      File "/opt/conda/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2019, in new_group
        pg = _new_process_group_helper(group_world_size,
      File "/opt/conda/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 504, in _new_process_group_helper
        pg = ProcessGroupGloo(
    RuntimeError: [/pytorch/third_party/gloo/gloo/transport/tcp/pair.cc:769] connect [127.0.0.1]:7602: Connection refused
    

    After I set export GLOO_SOCKET_IFNAME="enp4s0" and export GLOO_SOCKET_IFNAME="enp36s0f1" respectively, the training worked, but it is too slow. This is my NCCL debug report:

    cvlab189-System-Product-Name:1379562:1379562 [0] NCCL INFO Bootstrap : Using enp36s0f1:168.188.129.189<0>
    cvlab189-System-Product-Name:1379562:1379562 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
    
    cvlab189-System-Product-Name:1379562:1379562 [0] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
    cvlab189-System-Product-Name:1379562:1379562 [0] NCCL INFO NET/Socket : Using [0]enp36s0f1:168.188.129.189<0>
    cvlab189-System-Product-Name:1379562:1379562 [0] NCCL INFO Using network Socket
    NCCL version 2.10.3+cuda11.3
    cvlab189-System-Product-Name:1379564:1379564 [2] NCCL INFO Bootstrap : Using enp36s0f1:168.188.129.189<0>
    cvlab189-System-Product-Name:1379564:1379564 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
    
    cvlab189-System-Product-Name:1379564:1379564 [2] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
    cvlab189-System-Product-Name:1379564:1379564 [2] NCCL INFO NET/Socket : Using [0]enp36s0f1:168.188.129.189<0>
    cvlab189-System-Product-Name:1379564:1379564 [2] NCCL INFO Using network Socket
    cvlab189-System-Product-Name:1379565:1379565 [3] NCCL INFO Bootstrap : Using enp36s0f1:168.188.129.189<0>
    cvlab189-System-Product-Name:1379565:1379565 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
    
    cvlab189-System-Product-Name:1379565:1379565 [3] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
    cvlab189-System-Product-Name:1379563:1379563 [1] NCCL INFO Bootstrap : Using enp36s0f1:168.188.129.189<0>
    cvlab189-System-Product-Name:1379565:1379565 [3] NCCL INFO NET/Socket : Using [0]enp36s0f1:168.188.129.189<0>
    cvlab189-System-Product-Name:1379565:1379565 [3] NCCL INFO Using network Socket
    cvlab189-System-Product-Name:1379563:1379563 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
    
    cvlab189-System-Product-Name:1379563:1379563 [1] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
    cvlab189-System-Product-Name:1379563:1379563 [1] NCCL INFO NET/Socket : Using [0]enp36s0f1:168.188.129.189<0>
    cvlab189-System-Product-Name:1379563:1379563 [1] NCCL INFO Using network Socket
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] -1/-1/-1->3->2
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 00/02 :    0   1   2   3   4   5   6   7
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 01/02 :    0   1   2   3   4   5   6   7
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Trees [0] 1/4/-1->0->-1 [1] 1/-1/-1->0->4
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO Channel 00 : 2[41000] -> 3[61000] via P2P/IPC
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO Channel 00 : 1[2c000] -> 2[41000] via P2P/IPC
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO Channel 01 : 2[41000] -> 3[61000] via P2P/IPC
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO Channel 01 : 1[2c000] -> 2[41000] via P2P/IPC
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 00 : 7[68000] -> 0[1000] [receive] via NET/Socket/0
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO Channel 00 : 3[61000] -> 4[19000] [send] via NET/Socket/0
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 01 : 7[68000] -> 0[1000] [receive] via NET/Socket/0
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 00 : 0[1000] -> 1[2c000] via P2P/IPC
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 01 : 0[1000] -> 1[2c000] via P2P/IPC
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO Channel 01 : 3[61000] -> 4[19000] [send] via NET/Socket/0
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO Connected all rings
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO Channel 00 : 3[61000] -> 2[41000] via P2P/IPC
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO Channel 01 : 3[61000] -> 2[41000] via P2P/IPC
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO Connected all rings
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Connected all rings
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO Connected all rings
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO Channel 00 : 2[41000] -> 1[2c000] via P2P/IPC
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO Channel 00 : 1[2c000] -> 0[1000] via P2P/IPC
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO Channel 01 : 2[41000] -> 1[2c000] via P2P/IPC
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO Channel 01 : 1[2c000] -> 0[1000] via P2P/IPC
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO Connected all trees
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 00 : 4[19000] -> 0[1000] [receive] via NET/Socket/0
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO Connected all trees
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 01 : 4[19000] -> 0[1000] [receive] via NET/Socket/0
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 00 : 0[1000] -> 4[19000] [send] via NET/Socket/0
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Channel 01 : 0[1000] -> 4[19000] [send] via NET/Socket/0
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO Connected all trees
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO Connected all trees
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
    cvlab189-System-Product-Name:1379562:1379735 [0] NCCL INFO comm 0x7f7590002fb0 rank 0 nranks 8 cudaDev 0 busId 1000 - Init COMPLETE
    cvlab189-System-Product-Name:1379563:1379738 [1] NCCL INFO comm 0x7f3928002fb0 rank 1 nranks 8 cudaDev 1 busId 2c000 - Init COMPLETE
    cvlab189-System-Product-Name:1379565:1379737 [3] NCCL INFO comm 0x7f9f00002fb0 rank 3 nranks 8 cudaDev 3 busId 61000 - Init COMPLETE
    cvlab189-System-Product-Name:1379564:1379736 [2] NCCL INFO comm 0x7fe414002fb0 rank 2 nranks 8 cudaDev 2 busId 41000 - Init COMPLETE
    cvlab189-System-Product-Name:1379562:1379562 [0] NCCL INFO Launch mode Parallel
    

    For the record, according to this guide, https://pytorch.org/docs/stable/distributed.html: "If you encounter any problem with NCCL, use Gloo as the fallback option. (Note that Gloo currently runs slower than NCCL for GPUs.)" The distributed sampler in detectron2 uses the gloo backend.

    When I check the NCCL version inside the conda virtual environment with python -c "import torch; print(torch.cuda.nccl.version())", I get (2, 10, 3) on both machines. Additionally, I didn't install NCCL myself (I only installed PyTorch). What should I do?

    opened by daebakk 1
  • A simple trick for a fully deterministic ROIAlign, and thus MaskRCNN training and inference

    Non-determinism of MaskRCNN

    There have been a lot of discussions and inquiries in this repo about a fully deterministic MaskRCNN, e.g. #4260, #3203, #2615, #2480, and also on other detection repositories (e.g. MMDetection here and here, and also torchvision here). Unfortunately, even after seeding everything and setting PyTorch's deterministic flags, results are still non-repeatable.

    It boils down to the fact that some of the PyTorch / torchvision ops used don't have a deterministic GPU implementation (most notably due to using atomicAdd in the backward pass). So the only solution is to train for as long as possible to reduce variance in the results. It is worth noting that not only training but also evaluation (see #2480) of MaskRCNN (and actually most detectron2 models) is non-deterministic.

    Based on the minimal example in #4260, I made an analysis on the ops used for MaskRCNN and found that the main reason of non-determinism is the backward pass of ROIAlign (see here).

    Proposed solution

    I am here proposing a simple trick that makes ROIAlign practically fully reproducible, without touching the CUDA kernel! It introduces trivial additional memory and computation. It can be summarized as:

    • Truncate the input to a smaller datatype; this gives a starting point with a very small number of significand bits in use
    • Then, cast to a larger data-type just before doing the computations that involve atomicAdd

    In terms of code, this is translated to simply modifying this function call to

    return roi_align(
        input.half().double(),
        rois.half().double(),
        self.output_size,
        self.spatial_scale,
        self.sampling_ratio,
        self.aligned,
    ).to(dtype=input.dtype)
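
    Equivalently (a hedged sketch of my own, mirroring the snippet above, not part of the original proposal), the trick can be applied without editing detectron2's source by monkey-patching the ROIAlign layer:

    from torchvision.ops import roi_align
    from detectron2.layers import ROIAlign

    def _deterministic_forward(self, input, rois):
        # truncate to half, then widen to double before the atomicAdd-heavy backward kernel
        return roi_align(
            input.half().double(),
            rois.half().double(),
            self.output_size,
            self.spatial_scale,
            self.sampling_ratio,
            self.aligned,
        ).to(dtype=input.dtype)

    ROIAlign.forward = _deterministic_forward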
    

    Test

    The conversion to double results in a trivial increase in memory & computation, but performing it after the truncation significantly increases reproducibility.

    This solution was tested and found fully deterministic (loss values, and evaluation results on COCO) up to tens of thousands of steps (using the same code as in #4260) for:

    • MaskRCNN based on a ResNet-50 backbone
    • MaskRCNN based on a ResNeXt-101 backbone
    • Wide range of batch sizes
    • Mixed-precision training
    • Single and Multi-GPU training
    • A100's & V100's

    Note on A100

    Ampere by default uses the TF32 format for tensor-core computations, which means that the above truncation is done implicitly! So on Ampere-based devices it is enough to just cast to double, i.e.

    return roi_align(
        input.double(),
        rois.double(),
        self.output_size,
        self.spatial_scale,
        self.sampling_ratio,
        self.aligned,
    ).to(dtype=input.dtype)
    

    Note: This is the default mode for PyTorch, but if TF32 is disabled for some reason (i.e. torch.backends.cudnn.allow_tf32 = False) then the above truncation with .half() is still necessary

    Note

    • This solution was tested and found to work well for other non-deterministic PyTorch ops, including F.interpolate and F.grid_sample
    • This is not a general solution to the problem of random-order reproducible floating point summation, but a practical mitigation that works well for this setup / scenario
    • At least in theory, this should work even better if applied inside the kernel right before atomicAdd
    • The only alternative currently is training each experiment for very long, which isn't practical in many setups, and still isn't fully reproducible

    Would love to hear what people think about this! @ppwwyyxx @fmassa

    enhancement 
    opened by ASDen 0
  • How to save all logs in a file?

    Hi,

    During training, I see that only loss- and epoch-related info is stored in the log file. But if any error happens during training, such errors are not saved in the log file.

    Can you please share whether this feature already exists, or is there any way to achieve it?
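
    For reference, one way to capture detectron2's logging in a file (an assumption, not an official answer): pass an output path to setup_logger, which detectron2 already uses internally, so errors logged by the trainer land in the same file:

    from detectron2.utils.logger import setup_logger

    # writes everything the "detectron2" logger emits to ./output/log.txt, in addition to stdout
    logger = setup_logger(output="./output")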

    enhancement 
    opened by dsbyprateekg 0
Releases (v0.6)
  • v0.6(Nov 15, 2021)

    Pre-built Linux binaries are available for the following environment:

    CUDA 11.3
      • torch 1.10:
        python -m pip install detectron2==0.6 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html

    CUDA 11.1
      • torch 1.10:
        python -m pip install detectron2==0.6 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.10/index.html
      • torch 1.9:
        python -m pip install detectron2==0.6 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.9/index.html
      • torch 1.8:
        python -m pip install detectron2==0.6 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.8/index.html

    CUDA 10.2
      • torch 1.10:
        python -m pip install detectron2==0.6 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.10/index.html
      • torch 1.9:
        python -m pip install detectron2==0.6 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.9/index.html
      • torch 1.8:
        python -m pip install detectron2==0.6 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.8/index.html

    CUDA 10.1
      • torch 1.8:
        python -m pip install detectron2==0.6 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html

    CPU only
      • torch 1.10:
        python -m pip install detectron2==0.6 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.10/index.html
      • torch 1.9:
        python -m pip install detectron2==0.6 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.9/index.html
      • torch 1.8:
        python -m pip install detectron2==0.6 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.8/index.html
    
    Source code(tar.gz)
    Source code(zip)
  • v0.5(Jul 23, 2021)

    Pre-built Linux binaries are available for the following environment:

    CUDA 11.1
      • torch 1.9:
        python -m pip install detectron2==0.5 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.9/index.html
      • torch 1.8:
        python -m pip install detectron2==0.5 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.8/index.html

    CUDA 11.0
      • torch 1.7:
        python -m pip install detectron2==0.5 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu110/torch1.7/index.html

    CUDA 10.2
      • torch 1.9:
        python -m pip install detectron2==0.5 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.9/index.html
      • torch 1.8:
        python -m pip install detectron2==0.5 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.8/index.html
      • torch 1.7:
        python -m pip install detectron2==0.5 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.7/index.html

    CUDA 10.1
      • torch 1.8:
        python -m pip install detectron2==0.5 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
      • torch 1.7:
        python -m pip install detectron2==0.5 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html

    CUDA 9.2
      • torch 1.7:
        python -m pip install detectron2==0.5 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.7/index.html

    CPU only
      • torch 1.9:
        python -m pip install detectron2==0.5 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.9/index.html
      • torch 1.8:
        python -m pip install detectron2==0.5 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.8/index.html
      • torch 1.7:
        python -m pip install detectron2==0.5 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.7/index.html
    
    Source code(tar.gz)
    Source code(zip)
  • v0.4(Mar 13, 2021)

    New Features

    • All common models can be converted to TorchScript format by tracing or scripting (tutorial). Requires pytorch≥1.8.
    • Support fvcore parameter schedulers (originally from ClassyVision) that are composable, scale-invariant, and can be used on parameters other than learning rate.
    • Refactor PointRend as a mask head (instead of an ROIHead).
    • New export and C++ deployment examples.
    • Release d2go which provides end-to-end production pipeline.

    New Features in DensePose:

    Release DensePose CSE (a framework to extend DensePose to various categories using 3D models) and DensePose Evolution (a framework to bootstrap DensePose on unlabeled data). See here for more details.

    Deprecations:

    • Deprecate cfg argument from COCO/LVIS evaluator; Deprecate num_classes and ignore_label argument from SemSegEvaluator
    • Deprecate WarmupMultiStepLR, WarmupCosineLR in favor of fvcore schedulers
    • Deprecated features will be removed in future releases

    Pre-built Linux binaries are available for the following environment:

    CUDA 11.1
      • torch 1.8:
        python -m pip install detectron2==0.4 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.8/index.html

    CUDA 11.0
      • torch 1.7:
        python -m pip install detectron2==0.4 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu110/torch1.7/index.html

    CUDA 10.2
      • torch 1.8:
        python -m pip install detectron2==0.4 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.8/index.html
      • torch 1.7:
        python -m pip install detectron2==0.4 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.7/index.html
      • torch 1.6:
        python -m pip install detectron2==0.4 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.6/index.html

    CUDA 10.1
      • torch 1.8:
        python -m pip install detectron2==0.4 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
      • torch 1.7:
        python -m pip install detectron2==0.4 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html
      • torch 1.6:
        python -m pip install detectron2==0.4 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.6/index.html

    CUDA 9.2
      • torch 1.7:
        python -m pip install detectron2==0.4 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.7/index.html
      • torch 1.6:
        python -m pip install detectron2==0.4 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.6/index.html

    CPU only
      • torch 1.8:
        python -m pip install detectron2==0.4 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.8/index.html
      • torch 1.7:
        python -m pip install detectron2==0.4 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.7/index.html
      • torch 1.6:
        python -m pip install detectron2==0.4 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.6/index.html
    
    Source code(tar.gz)
    Source code(zip)
  • v0.3(Nov 6, 2020)

    Features & Improvements:

    • Support constructing RetinaNet, data loader, optimizer, COCOEvaluator without configs, in addition to Mask R-CNN.
    • Add DeepLab & PanopticDeepLab in projects/.
    • Support importing 3 projects (point_rend, deeplab, panoptic_deeplab) directly with import detectron2.projects.xxx.
    • Support mixed precision in training (using cfg.SOLVER.AMP.ENABLED) and inference.
    • Support ADE20k semantic segmentation dataset (named ade20k_sem_seg_train, ade20k_sem_seg_val).
    • Continuous build on Windows.

    Pre-built Linux binaries are provided for the following environment:

    CUDA 11.0
      • torch 1.7:
        python -m pip install detectron2==0.3 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu110/torch1.7/index.html

    CUDA 10.2
      • torch 1.7:
        python -m pip install detectron2==0.3 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.7/index.html
      • torch 1.6:
        python -m pip install detectron2==0.3 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.6/index.html
      • torch 1.5:
        python -m pip install detectron2==0.3 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.5/index.html

    CUDA 10.1
      • torch 1.7:
        python -m pip install detectron2==0.3 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html
      • torch 1.6:
        python -m pip install detectron2==0.3 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.6/index.html
      • torch 1.5:
        python -m pip install detectron2==0.3 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html

    CUDA 9.2
      • torch 1.7:
        python -m pip install detectron2==0.3 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.7/index.html
      • torch 1.6:
        python -m pip install detectron2==0.3 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.6/index.html
      • torch 1.5:
        python -m pip install detectron2==0.3 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.5/index.html

    CPU only
      • torch 1.7:
        python -m pip install detectron2==0.3 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.7/index.html
      • torch 1.6:
        python -m pip install detectron2==0.3 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.6/index.html
      • torch 1.5:
        python -m pip install detectron2==0.3 -f \
          https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.5/index.html
    
    Source code(tar.gz)
    Source code(zip)
  • v0.2.1 (Aug 4, 2020)

    • Added pre-built binaries for PyTorch 1.6.
    CUDA 10.2 + torch 1.6:
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.6/index.html

    CUDA 10.2 + torch 1.5:
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.5/index.html

    CUDA 10.1 + torch 1.6:
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.6/index.html

    CUDA 10.1 + torch 1.5:
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html

    CUDA 10.1 + torch 1.4:
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.4/index.html

    CUDA 10.0 + torch 1.4:
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu100/torch1.4/index.html

    CUDA 9.2 + torch 1.6:
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.6/index.html

    CUDA 9.2 + torch 1.5:
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.5/index.html

    CUDA 9.2 + torch 1.4:
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.4/index.html

    CPU only + torch 1.6:
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.6/index.html

    CPU only + torch 1.5:
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.5/index.html

    CPU only + torch 1.4:
    python -m pip install detectron2==0.2.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.4/index.html

  • v0.2 (Jul 8, 2020)

    Features & Improvements:

    • Support constructing objects with either configs or explicit arguments. As an example, an entire Mask R-CNN model can now be built without using configs.
    • Rename TransformGen to Augmentation, keeping TransformGen as an alias. The Augmentation interface is redesigned so that it can access arbitrary custom data types; see the augmentation tutorial for details.
    • Improve the speed of COCOEvaluator by about 3x.
    • Support the LVIS v1 dataset.
    • Support GIoU loss in RPN and R-CNN.
    • Support auto-scaling of batch size and learning rate in DefaultTrainer (see cfg.SOLVER.REFERENCE_WORLD_SIZE and the sketch after this list).
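
    A minimal sketch of the auto-scaling mechanism; the solver values below are hypothetical settings tuned for 8 GPUs:

    from detectron2.config import get_cfg

    cfg = get_cfg()
    cfg.SOLVER.REFERENCE_WORLD_SIZE = 8  # the solver settings below are tuned for 8 GPUs
    cfg.SOLVER.IMS_PER_BATCH = 16        # hypothetical total batch size for 8 GPUs
    cfg.SOLVER.BASE_LR = 0.02            # hypothetical learning rate for 8 GPUs
    # When training actually runs on a different number of GPUs, DefaultTrainer
    # rescales IMS_PER_BATCH, BASE_LR, and the schedule by the world-size ratio.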

    Pre-built Linux binaries are provided for the following environments:

    CUDA 10.2 + torch 1.5:
    python -m pip install detectron2==0.2 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.5/index.html

    CUDA 10.1 + torch 1.5:
    python -m pip install detectron2==0.2 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html

    CUDA 10.1 + torch 1.4:
    python -m pip install detectron2==0.2 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.4/index.html

    CUDA 10.0 + torch 1.4:
    python -m pip install detectron2==0.2 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu100/torch1.4/index.html

    CUDA 9.2 + torch 1.5:
    python -m pip install detectron2==0.2 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.5/index.html

    CUDA 9.2 + torch 1.4:
    python -m pip install detectron2==0.2 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.4/index.html

    CPU only + torch 1.5:
    python -m pip install detectron2==0.2 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.5/index.html

    CPU only + torch 1.4:
    python -m pip install detectron2==0.2 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.4/index.html

  • v0.1.3 (May 17, 2020)

    Bugfix version.

    We started to release pre-built wheels for multiple PyTorch versions:

    CUDA 10.2 + torch 1.5:
    python -m pip install detectron2==0.1.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.5/index.html

    CUDA 10.1 + torch 1.5:
    python -m pip install detectron2==0.1.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html

    CUDA 10.1 + torch 1.4:
    python -m pip install detectron2==0.1.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.4/index.html

    CUDA 10.0 + torch 1.4:
    python -m pip install detectron2==0.1.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu100/torch1.4/index.html

    CUDA 9.2 + torch 1.5:
    python -m pip install detectron2==0.1.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.5/index.html

    CUDA 9.2 + torch 1.4:
    python -m pip install detectron2==0.1.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cu92/torch1.4/index.html

    CPU only + torch 1.5:
    python -m pip install detectron2==0.1.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.5/index.html

    CPU only + torch 1.4:
    python -m pip install detectron2==0.1.3 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.4/index.html

    Incompatible changes to internal interfaces:

    • _init_{box,mask,keypoint}_head of StandardROIHeads changed from an instance method to a classmethod, as sketched below.
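
    A sketch of what the change means for subclasses; the customization hook in the body is illustrative, not part of the release:

    from detectron2.modeling.roi_heads import StandardROIHeads

    class MyROIHeads(StandardROIHeads):
        @classmethod  # now a classmethod: it builds components from cfg, without an instance
        def _init_box_head(cls, cfg, input_shape):
            ret = super()._init_box_head(cfg, input_shape)
            # ...customize the returned box-head components here...
            return ret
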
  • v0.1.2 (May 4, 2020)

    The pre-built wheels for this version have to be used with an official binary release of PyTorch 1.5.

    Incompatible changes:

    • When loading a checkpoint with resume_or_load(), training states such as the optimizer and start_iter are only loaded when resume is True and a last checkpoint is found. This better matches users' expectations (see the sketch after this list).
    • .output_size in custom box heads is renamed to .output_shape.
    • anchor_generator no longer duplicates the anchors for each image.
    • The feature_strides and feature_channels attributes are removed from ROIHeads. Use the input argument input_shape instead.
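
    A sketch of the new resume semantics (cfg is assumed to be configured elsewhere):

    from detectron2.engine import DefaultTrainer

    trainer = DefaultTrainer(cfg)  # cfg: an assumed, fully configured config
    # With resume=True, optimizer state and start_iter are restored only when a
    # last checkpoint exists in cfg.OUTPUT_DIR; otherwise training starts from
    # the weights in cfg.MODEL.WEIGHTS with a fresh training state.
    trainer.resume_or_load(resume=True)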

  • v0.1.1 (Mar 5, 2020)

    Incompatible changes to head design:

    • The mask head and keypoint head now include the logic for losses & inference. Custom heads should instead override the feature computation in the layers() method (see the sketch below).
    • The _forward_{box,mask,keypoint} methods of StandardROIHeads now accept a dict of features.

    This release is made compatible with the corresponding changes in projects (Mesh R-CNN, PointRend, etc.).
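
    A sketch of a custom mask head under this design, written against the BaseMaskRCNNHead API of later detectron2 versions; the layer sizes and structure are hypothetical:

    import torch.nn as nn
    import torch.nn.functional as F
    from detectron2.modeling.roi_heads.mask_head import BaseMaskRCNNHead

    class MyMaskHead(BaseMaskRCNNHead):
        def __init__(self, in_channels=256, num_classes=80, **kwargs):
            super().__init__(**kwargs)
            # hypothetical feature layers; a real head would build these from cfg
            self.conv = nn.Conv2d(in_channels, 256, kernel_size=3, padding=1)
            self.predictor = nn.Conv2d(256, num_classes, kernel_size=1)

        def layers(self, x):
            # only the feature computation is overridden; the base class now
            # handles loss computation during training and mask inference
            return self.predictor(F.relu(self.conv(x)))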

    The pre-built wheels for this version have to be used with an official binary release of PyTorch 1.4.

  • v0.1 (Feb 7, 2020)

    Some major additional features since open source:

    We have started to provide pre-built binary wheels at https://dl.fbaipublicfiles.com/detectron2/wheels/index.html. The pre-built wheels for this version have to be used with an official binary release of PyTorch 1.4.
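
    An install from that index might look like the following; this is a sketch assuming the same pip usage as the later releases above:

    python -m pip install detectron2==0.1 -f \
      https://dl.fbaipublicfiles.com/detectron2/wheels/index.html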
