quantize aware training package for NCNN on pytorch

Related tags

Deep Learningncnnqat
Overview

ncnnqat

ncnnqat is a quantize aware training package for NCNN on pytorch.

Table of Contents

Installation

  • Supported Platforms: Linux

  • Accelerators and GPUs: NVIDIA GPUs via CUDA driver 10.1.

  • Dependencies:

    • python >= 3.5, < 4
    • pytorch >= 1.6
    • numpy >= 1.18.1
    • onnx >= 1.7.0
    • onnx-simplifier >= 0.3.6
  • Install ncnnqat via pypi:

    $ pip install ncnnqat (to do....)

    It is recommended to install from the source code

  • or Install ncnnqat via repo:

    $ git clone https://github.com/ChenShisen/ncnnqat
    $ cd ncnnqat
    $ make install

Usage

  • register_quantization_hook and merge_freeze_bn

    (suggest finetuning from a well-trained model, do it after a few epochs of training otherwise.)

    from ncnnqat import unquant_weight, merge_freeze_bn, register_quantization_hook
    ...
    ...
        for epoch in range(epoch_train):
            model.train()
        if epoch==well_epoch:
            register_quantization_hook(model)
        if epoch>=well_epoch:
            model = merge_freeze_bn(model)  #it will change bn to eval() mode during training
    ...
  • Unquantize weight before update it

    ...
    ... 
        if epoch>=well_epoch:
            model.apply(unquant_weight)  # using original weight while updating
        optimizer.step()
    ...
  • Save weight and save ncnn quantize table after train

    ...
    ...
        onnx_path = "./xxx/model.onnx"
        table_path="./xxx/model.table"
        dummy_input = torch.randn(1, 3, img_size, img_size, device='cuda')
        input_names = [ "input" ]
        output_names = [ "fc" ]
        torch.onnx.export(model, dummy_input, onnx_path, verbose=False, input_names=input_names, output_names=output_names)
        save_table(model,onnx_path=onnx_path,table=table_path)
    
    ...

    if use "model = nn.DataParallel(model)",pytorch unsupport torch.onnx.export,you should save state_dict first and prepare a new model with one gpu,then you will export onnx model.

    ...
    ...
        model_s = new_net() #
        model_s.cuda()
        register_quantization_hook(model_s)
        #model_s = merge_freeze_bn(model_s)
        onnx_path = "./xxx/model.onnx"
        table_path="./xxx/model.table"
        dummy_input = torch.randn(1, 3, img_size, img_size, device='cuda')
        input_names = [ "input" ]
        output_names = [ "fc" ]
        model_s.load_state_dict({k.replace('module.',''):v for k,v in model.state_dict().items()}) #model_s = model     model = nn.DataParallel(model)
              
        torch.onnx.export(model_s, dummy_input, onnx_path, verbose=False, input_names=input_names, output_names=output_names)
        save_table(model_s,onnx_path=onnx_path,table=table_path)
        
    
    ...

Code Examples

Cifar10 quantization aware training example.

python test/test_cifar10.py

SSD300 quantization aware training example.

   ln -s /your_coco_path/coco ./tests/ssd300/data
   python -m torch.distributed.launch \
    --nproc_per_node=4 \
    --nnodes=1 \
    --node_rank=0 \
    ./tests/ssd300/main.py \
    -d ./tests/ssd300/data/coco
    python ./tests/ssd300/main.py --onnx_save  #load model dict, export onnx and ncnn table

Results

  • Cifar10

    result:

    net fp32(onnx) ncnnqat ncnn aciq ncnn kl
    mobilenet_v2 0.91 0.9066 0.9033 0.9066
    resnet18 0.94 0.93333 0.9367 0.937
  • SSD300(resnet18|coco)

    fp32:
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.193
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.344
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.191
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.042
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.195
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.328
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.199
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.293
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.309
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.084
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.326
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.501
    Current AP: 0.19269
    
    ncnnqat:
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.192
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.342
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.194
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.041
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.194
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.327
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.197
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.291
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.307
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.082
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.325
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.497
    Current AP: 0.19202
    

Todo

....

PyTorch implementation of some learning rate schedulers for deep learning researcher.

pytorch-lr-scheduler PyTorch implementation of some learning rate schedulers for deep learning researcher. Usage WarmupReduceLROnPlateauScheduler Visu

Soohwan Kim 59 Dec 08, 2022
Official Repsoitory for "Activate or Not: Learning Customized Activation." [CVPR 2021]

CVPR 2021 | Activate or Not: Learning Customized Activation. This repository contains the official Pytorch implementation of the paper Activate or Not

184 Dec 27, 2022
DeceFL: A Principled Decentralized Federated Learning Framework

DeceFL: A Principled Decentralized Federated Learning Framework This repository comprises codes that reproduce experiments in Ye, et al (2021), which

Huazhong Artificial Intelligence Lab (HAIL) 10 May 31, 2022
Open Source Differentiable Computer Vision Library for PyTorch

Kornia is a differentiable computer vision library for PyTorch. It consists of a set of routines and differentiable modules to solve generic computer

kornia 7.6k Jan 04, 2023
The openspoor package is intended to allow easy transformation between different geographical and topological systems commonly used in Dutch Railway

Openspoor The openspoor package is intended to allow easy transformation between different geographical and topological systems commonly used in Dutch

7 Aug 22, 2022
Large-Scale Unsupervised Object Discovery

Large-Scale Unsupervised Object Discovery Huy V. Vo, Elena Sizikova, Cordelia Schmid, Patrick Pérez, Jean Ponce [PDF] We propose a novel ranking-based

17 Sep 19, 2022
Roadmap to becoming a machine learning engineer in 2020

Roadmap to becoming a machine learning engineer in 2020, inspired by web-developer-roadmap.

Chris Hoyean Song 1.7k Dec 29, 2022
Python scripts for performing object detection with the 1000 labels of the ImageNet dataset in ONNX.

Python scripts for performing object detection with the 1000 labels of the ImageNet dataset in ONNX. The repository combines a class agnostic object localizer to first detect the objects in the image

Ibai Gorordo 24 Nov 14, 2022
Artificial Intelligence search algorithm base on Pacman

Pacman Search Artificial Intelligence search algorithm base on Pacman Source The Pacman Projects by the University of California, Berkeley. Layouts Di

Day Fundora 6 Nov 17, 2022
3D Human Pose Machines with Self-supervised Learning

3D Human Pose Machines with Self-supervised Learning Keze Wang, Liang Lin, Chenhan Jiang, Chen Qian, and Pengxu Wei, “3D Human Pose Machines with Self

Chenhan Jiang 398 Dec 20, 2022
PyTorch implementation of the ideas presented in the paper Interaction Grounded Learning (IGL)

Interaction Grounded Learning This repository contains a simple PyTorch implementation of the ideas presented in the paper Interaction Grounded Learni

Arthur Juliani 4 Aug 31, 2022
Using OpenAI's CLIP to upscale and enhance images

CLIP Upscaler and Enhancer Using OpenAI's CLIP to upscale and enhance images Based on nshepperd's JAX CLIP Guided Diffusion v2.4 Sample Results Viewpo

Tripp Lyons 5 Jun 14, 2022
Official implementation of the paper 'High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network' in CVPR 2021

LPTN Paper | Supplementary Material | Poster High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network Ji

372 Dec 26, 2022
AI grand challenge 2020 Repo (Speech Recognition Track)

KorBERT를 활용한 한국어 텍스트 기반 위협 상황인지(2020 인공지능 그랜드 챌린지) 본 프로젝트는 ETRI에서 제공된 한국어 korBERT 모델을 활용하여 폭력 기반 한국어 텍스트를 분류하는 다양한 분류 모델들을 제공합니다. 본 개발자들이 참여한 2020 인공지

Young-Seok Choi 23 Jan 25, 2022
DiscoNet: Learning Distilled Collaboration Graph for Multi-Agent Perception [NeurIPS 2021]

DiscoNet: Learning Distilled Collaboration Graph for Multi-Agent Perception [NeurIPS 2021] Yiming Li, Shunli Ren, Pengxiang Wu, Siheng Chen, Chen Feng

Automation and Intelligence for Civil Engineering (AI4CE) Lab @ NYU 98 Dec 21, 2022
Simple STAC Catalogs discovery tool.

STAC Catalog Discovery Simple STAC discovery tool. Just paste the STAC Catalog link and press Enter. Details STAC Discovery tool enables discovering d

Mykola Kozyr 21 Oct 19, 2022
Xview3 solution - XView3 challenge, 2nd place solution

Xview3, 2nd place solution https://iuu.xview.us/ test split aggregate score publ

Selim Seferbekov 24 Nov 23, 2022
Code of Puregaze: Purifying gaze feature for generalizable gaze estimation, AAAI 2022.

PureGaze: Purifying Gaze Feature for Generalizable Gaze Estimation Description Our work is accpeted by AAAI 2022. Picture: We propose a domain-general

39 Dec 05, 2022
Implementation of Vaswani, Ashish, et al. "Attention is all you need."

Attention Is All You Need Paper Implementation This is my from-scratch implementation of the original transformer architecture from the following pape

Brando Koch 195 Dec 30, 2022
Adaptive, interpretable wavelets across domains (NeurIPS 2021)

Adaptive wavelets Wavelets which adapt given data (and optionally a pre-trained model). This yields models which are faster, more compressible, and mo

Yu Group 50 Dec 16, 2022