BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search

BossNAS

This repository contains the PyTorch evaluation code, retraining code, and pretrained models for our paper: BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search. [pdf link]

Illustration of the Siamese supernets training with ensemble bootstrapping.

Illustration of the fabric-like Hybrid CNN-transformer Search Space with flexible down-sampling positions.

Our Results and Trained Models

  • Here is a summary of our searched models:

    Model                MAdds    Steptime   Top-1 (%)   Top-5 (%)   URL
    BossNet-T0 w/o SE    3.4B     101ms      80.5        95.0        checkpoint
    BossNet-T0           3.4B     115ms      80.8        95.2        checkpoint
    BossNet-T0^          5.7B     147ms      81.6        95.6        same as above
    BossNet-T1           7.9B     156ms      81.9        95.6        checkpoint
    BossNet-T1^          10.5B    165ms      82.2        95.7        same as above
  • Here is a summary of architecture rating accuracy of our method:

    Search space     Dataset     Kendall tau   Spearman rho   Pearson R
    MBConv           ImageNet    0.65          0.78           0.85
    NATS-Bench Ss    CIFAR-10    0.53          0.73           0.72
    NATS-Bench Ss    CIFAR-100   0.59          0.76           0.79

Usage

1. Requirements

  • Install PyTorch 1.7.0+, torchvision 0.8.1+ and timm 0.3.2, for example:

    conda install -c pytorch pytorch torchvision
    pip install timm==0.3.2
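
To confirm that an environment satisfies these requirements before launching a run, a check along the following lines may help. This is a minimal sketch, not part of the repository: `version_tuple`, `meets_minimum` and `check_environment` are hypothetical helper names, and treating timm as an exact pin simply mirrors the `pip install timm==0.3.2` command above.

```python
import re
from importlib import metadata

# Minimum versions from the requirements above; timm is pinned exactly.
MINIMUMS = {"torch": "1.7.0", "torchvision": "0.8.1"}
PINNED = {"timm": "0.3.2"}


def version_tuple(version):
    """Turn '1.7.0+cu110' into (1, 7, 0), ignoring local build suffixes."""
    numbers = re.findall(r"\d+", version.split("+")[0])
    numbers = (numbers + ["0", "0", "0"])[:3]  # pad short versions such as '1.7'
    return tuple(int(n) for n in numbers)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Build a {package: status} report without importing the packages."""
    report = {}
    for name, minimum in MINIMUMS.items():
        found = installed_version(name)
        if found is None:
            report[name] = "missing"
        elif meets_minimum(found, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({found} < {minimum})"
    for name, pinned in PINNED.items():
        found = installed_version(name)
        if found is None:
            report[name] = "missing"
        elif version_tuple(found) == version_tuple(pinned):
            report[name] = "ok"
        else:
            report[name] = f"expected {pinned}, found {found}"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```

The check reads package metadata only, so it runs quickly and does not need a GPU.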

2. Retrain or Evaluate our BossNet-T models

  • First, move to the retraining code directory to perform retraining or evaluation.

    cd HyTra_retraining

    Our retraining code for BossNet-T is based on the DeiT repository.

  • You can evaluate our BossNet-T models with the following command:

    python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model bossnet_T0 --input-size 224 --batch-size 128 --data-path /PATH/TO/ImageNet --num_workers 8 --eval --resume PATH/TO/BossNet-T0-80_8.pth
    python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model bossnet_T1 --input-size 224 --batch-size 128 --data-path /PATH/TO/ImageNet --num_workers 8 --eval --resume PATH/TO/BossNet-T1-81_9.pth

    Please download our checkpoint files from the result table. Change the --nproc_per_node option to match the number of GPUs you have, and change --data-path, --resume and --input-size accordingly.

  • You can retrain our BossNet-T models with the following commands:

    Please change the --nproc_per_node and --data-path options accordingly. Note that the learning rate is scaled automatically according to the number of GPUs and the batch size. We recommend training with a batch size of 128 on 8 GPUs.

    python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model bossnet_T0 --input-size 224 --batch-size 128 --data-path /PATH/TO/ImageNet --num_workers 8
    python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model bossnet_T1 --input-size 224 --batch-size 128 --data-path /PATH/TO/ImageNet --num_workers 8
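
To get a feel for the automatic learning-rate scaling mentioned above, the sketch below applies the linear scaling rule used by the DeiT training script, on which this retraining code is based: the base rate is multiplied by the total batch size and divided by 512. It assumes the rule is unchanged in this repository and that --batch-size is the per-GPU batch size, neither of which is confirmed here. The base rate of 0.0005 is the `lr` value visible in the argument dump of the evaluation log further down, and `scaled_learning_rate` is a hypothetical helper name.

```python
def scaled_learning_rate(base_lr, batch_size_per_gpu, num_gpus, reference_batch=512):
    """Linear scaling rule: lr grows in proportion to the total batch size.

    Assumption: this mirrors DeiT's rule; the repository may differ.
    """
    total_batch = batch_size_per_gpu * num_gpus
    return base_lr * total_batch / reference_batch


if __name__ == "__main__":
    # Recommended setting from the text: batch size 128 on 8 GPUs.
    print(scaled_learning_rate(5e-4, 128, 8))
    # Same per-GPU batch size on 4 GPUs gives half the effective rate.
    print(scaled_learning_rate(5e-4, 128, 4))
```

Under these assumptions, halving the number of GPUs halves the effective learning rate, which is one reason results may differ from the reported ones when the recommended setting is not used.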

Architecture of our BossNet-T0

3. Evaluate architecture rating accuracy of BossNAS

  • You can get the ranking correlations of BossNAS on MBConv search space with the following commands:

    cd MBConv_ranking
    python get_model_score_mbconv.py

  • You can get the ranking correlations of BossNAS on NATS-Bench Ss with the following commands:

    cd NATS_SS_ranking
    python get_model_score_nats.py
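
The three numbers these scripts report (Kendall tau, Spearman rho, Pearson R) are standard correlation coefficients between a predicted score per architecture and its ground-truth accuracy. The sketch below shows what each one measures using only the standard library. It is not the repository's script: the function names are hypothetical, the Kendall variant is the simple tau-a without tie correction, and the sample numbers are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Pearson correlation computed on ranks instead of raw values."""
    return pearson_r(ranks(xs), ranks(ys))


def kendall_tau(xs, ys):
    """tau-a: (concordant pairs - discordant pairs) / all pairs."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Made-up example: five architectures, higher score should mean higher accuracy.
    predicted_score = [0.9, 0.7, 0.8, 0.3, 0.5]
    true_accuracy = [76.1, 74.0, 73.2, 70.5, 72.8]
    print("Kendall tau:", kendall_tau(predicted_score, true_accuracy))
    print("Spearman rho:", spearman_rho(predicted_score, true_accuracy))
    print("Pearson R:", pearson_r(predicted_score, true_accuracy))
```

All three coefficients change sign if the score is oriented so that lower is better (a loss, for example), so it is the magnitudes that compare to the table above.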

Citation

@article{li2021bossnas,
  author = {Li, Changlin and
            Tang, Tao and
            Wang, Guangrun and
            Peng, Jiefeng and
            Wang, Bing and
            Liang, Xiaodan and
            Chang, Xiaojun},
  title = {BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search},
  journal = {arXiv:2103.12424},
  year = 2021,
}

TODO

Searching code will be released later.

Comments
  • How to select architectures from the trained supernet?

    Hi, thanks for your great work!

    I tried using your searching code to train the supernet, but I could not figure out how to search for candidate architectures from such a supernet.

    I guess the validation hook serves this purpose, but I did not find the saved path information after training for one epoch. Are there other files I need to explore, or do I just need to wait for more epochs to be trained?

    Could you advise me on this? Thanks in advance for your time and help!

    Best, Haoran

    opened by ranery 7
  • ImageNet acc1 is low (49.6%) when evaluating BossNet-T0-80_8.pth

    Hello! I am trying to reproduce your model, but when I evaluate the pretrained model (BossNet-T0-80_8.pth), the acc1 is too low! Did I miss something? Can you help me?

    The run command is as follows:

    [email protected]:/data/juicefs_hz_cv_v3/11135821/bak/BossNAS/retraining_hytra# python main.py --model bossnet_T0 --input-size 224 --batch-size 128 --eval --resume /data/juicefs_hz_cv_v3/11135821/bak/model/BossNet-T0-80_8.pth
    Not using distributed mode
    Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=128, clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='/data/glusterfs_cv_04/public_data/imagenet/CLS-LOC/', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_url='env://', distributed=False, drop=0.0, drop_block=None, drop_path=0.1, epochs=300, eval=True, inat_category='name', input_size=224, local_rank=0, lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='bossnet_T0', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='output/bossnet_T0-20210804-163815', patience_epochs=10, pin_mem=True, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='/data/juicefs_hz_cv_v3/11135821/bak/model/BossNet-T0-80_8.pth', sched='cosine', seed=0, smoothing=0.1, start_epoch=0, train_interpolation='bicubic', warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.05, world_size=1)
    Creating model: bossnet_T0
    number of params: 38415960
    Test: [ 0/261] eta: 0:39:36 loss: 1.9650 (1.9650) acc1: 68.2292 (68.2292) acc5: 90.1042 (90.1042) time: 9.1044 data: 4.8654 max mem: 5605
    Test: [ 50/261] eta: 0:01:38 loss: 3.2916 (3.0472) acc1: 41.6667 (47.3039) acc5: 65.1042 (70.5372) time: 0.2928 data: 0.0004 max mem: 5605
    Test: [100/261] eta: 0:01:01 loss: 2.9675 (3.1048) acc1: 46.8750 (45.7921) acc5: 69.2708 (69.9722) time: 0.2953 data: 0.0003 max mem: 5605
    Test: [150/261] eta: 0:00:39 loss: 2.4230 (2.9457) acc1: 55.2083 (47.8960) acc5: 75.5208 (71.5370) time: 0.2989 data: 0.0003 max mem: 5605
    Test: [200/261] eta: 0:00:20 loss: 2.6540 (2.9105) acc1: 48.4375 (48.1913) acc5: 68.7500 (71.3490) time: 0.3023 data: 0.0002 max mem: 5605
    Test: [250/261] eta: 0:00:03 loss: 1.6506 (2.8344) acc1: 61.9792 (49.0310) acc5: 83.8542 (72.0431) time: 0.3036 data: 0.0003 max mem: 5605
    Test: [260/261] eta: 0:00:00 loss: 1.6135 (2.8001) acc1: 65.6250 (49.5740) acc5: 86.4583 (72.5500) time: 0.3888 data: 0.0001 max mem: 5605
    Test: Total time: 0:01:28 (0.3397 s / it)
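
When reading a log like the one above, it helps to separate the two numbers printed for each metric. The sketch below pulls them out of the last progress line. It assumes the logging format of the DeiT code base, where the first number is a smoothed recent value and the number in parentheses is the running average over everything evaluated so far. That reading is consistent with the 49.6% quoted in the title, but it is an assumption rather than something stated in this thread. `parse_metrics` is a hypothetical helper name.

```python
import re

# Last progress line copied from the log above.
LOG_LINE = (
    "Test: [260/261] eta: 0:00:00 loss: 1.6135 (2.8001) "
    "acc1: 65.6250 (49.5740) acc5: 86.4583 (72.5500) "
    "time: 0.3888 data: 0.0001 max mem: 5605"
)

# Matches 'name: recent (average)' for the three metrics of interest.
METRIC = re.compile(r"(loss|acc1|acc5): ([\d.]+) \(([\d.]+)\)")


def parse_metrics(line):
    """Return {metric: (recent_value, running_average)} for one log line."""
    return {
        name: (float(recent), float(average))
        for name, recent, average in METRIC.findall(line)
    }


if __name__ == "__main__":
    for name, (recent, average) in parse_metrics(LOG_LINE).items():
        print(f"{name}: recent={recent} running_average={average}")
```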

    opened by fanliaveline 4
  • Some questions about the code and paper.

    Hi, great work!

    I have some questions about the code and paper:

    1. In Section 3.3 of the paper, which covers the searching phase, the probability ensemble of the architecture population used to calculate the evaluation loss in equations (5) and (6) comes from the online network, but in the code it comes from the target network, which confuses me.

    2. Also in Section 3.3, the paper mentions that the search uses an evolutionary algorithm. I read references [12] and [54] but still have no clue how the evolutionary algorithm is implemented in the code. Specifically, how is the architecture population evolved?

    3. In hytra_supernet.py, the stage depths are set to [4,3,2,2]. Is there a particular reason for this? Why not use [4,4,4,4] so that all possible paths can be chosen?

    Thanks a lot for your time and I'm looking forward to your reply!

    opened by zhy0860 2
  • About formulation (1) and (6)

    Hi, thank you very much for sharing your nice work. Formulations (1) and (6) in the paper both contain λ_k, but there seems to be no explanation of it. Could you please explain it here?

    opened by NickChang97 2
  • Some questions about BossNAS

    Hi, thanks for your excellent work~

    It is inspiring and practical for improving sub-net ranking correlations, but I have a few questions.

    1. Although progressive searching improves the ranking correlation within each small stage, will it lead to an accumulation of error? The best path of a previous stage may not be suitable for the following ones. How would you explain this?
    2. Why is the ResAttn operator only searched at depth=1/2?
    3. On the hybrid search space, ensembling outputs of different resolutions seems odd, since adaptive pooling discards structural information, so I don't see why it is suitable.
    4. As shown in Table 4, Unsupv. EB is better than Supv. class. Do you have a theoretical explanation for this?
    opened by huizhang0110 2
  • Question about Ranking nats

    After running the code:

    cd ranking_nats
    python get_model_score_nats.py
    

    I got:

    kendall tau begin
    BossNAS: KendalltauResult(correlation=-0.534180602248828, pvalue=0.0)
    (-0.7180607093955225, 0.0)
    SpearmanrResult(correlation=-0.7341493538551311, pvalue=0.0)
    
    opened by pprp 2
  • How to obtain the searched model

    I ran the searching code for a small number of epochs. Can you share where exactly the best model architecture is stored when a custom NAS run is performed? The pth files are saved in work_dir, but I'm not sure where the corresponding architecture is stored, so that I can use a custom generated model together with these weights.

    opened by hamdjalil 3
  • Question about the code of searching second Hytra block.

    Hi, appreciate it for your time.

    I found an issue in the code of the HyTra search phase. When searching for the second block, after the first evaluation the chosen best path of the second block is appended after the best path of the first block, and training then proceeds with a three-block structure.

    Detailed code is as follows:

    (val_hook.py)

        if self.every_n_epochs(runner, block_inteval):
            best_path = results[0][0]
            best_path = [int(i) for i in list(best_path)]

            if len(model.best_paths) == model.start_block + 1:
                model.best_paths.pop()
            model.best_paths.append(best_path)

    (siamese_supernets_hytra.py)

        if self.start_block > 0:
            for i, best_path in enumerate(self.best_paths):
                img_v1 = self.online_backbone(img_v1, start_block=i, forward_op=best_path, block_op=True)[0]
                img_v2 = self.online_backbone(img_v2, start_block=i, forward_op=best_path, block_op=True)[0]

    In other words, the search does not continue after one frozen best path (that of the previous block) but after two: the best path of the current block chosen at each evaluation stage is also frozen and appended, which means the path of the second block appears twice during searching. I can't understand why this is done.

    This causes a problem when downsampling is used in the frozen best path of the second block. For instance, suppose the spatial resolution has already reached the smallest scale, 1/32, in the frozen best path of the second block. If downsampling occurs again in the current path of the second block as the search continues, there is a shape mismatch error. This confuses me, and we did encounter this problem in our implementation.

    I'm sorry if I haven't described the issue clearly. Thanks again for your time and I'm looking forward to your reply.
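
    The list bookkeeping quoted from val_hook.py can be replayed in isolation to see what the report describes. The sketch below copies only the pop/append logic onto a plain Python list. The function name and the path values are hypothetical, and nothing here touches the actual supernet.

```python
def update_best_paths(best_paths, start_block, best_path):
    """Replay of the quoted val_hook.py logic on a plain list."""
    if len(best_paths) == start_block + 1:
        # A pick for the current block already exists: replace it.
        best_paths.pop()
    best_paths.append(best_path)
    return best_paths


if __name__ == "__main__":
    # Hypothetical state: block 0 is frozen, block 1 is being searched.
    best_paths = [[0, 1, 0, 1]]
    start_block = 1

    candidates = [[1, 1, 0, 0], [0, 0, 1, 1]]  # made-up evaluation winners
    for evaluation, candidate in enumerate(candidates, start=1):
        update_best_paths(best_paths, start_block, candidate)
        print(f"after evaluation {evaluation}: {len(best_paths)} entries -> {best_paths}")
```

    After the first evaluation the list holds two entries, the frozen block-0 path and the provisional block-1 path, and later evaluations replace only the second entry. Since the quoted forward code loops over every entry of best_paths, the provisional block-1 path would be run as a prefix from then on, which matches the behaviour described above. Whether this is intended is for the maintainers to say.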

    bug 
    opened by zhy0860 2
  • Running HyTra search on CIFAR10 ( or potentially on other datasets...)

    I have some doubts about how to run the HyTra search on datasets other than ImageNet. Is it possible? I tried to run the search on CIFAR10 with a HyTracifar10 config file, but it gives me this error: RuntimeError: Given groups=1, weight of size [256, 1024, 1, 1], expected input[256, 512, 4, 4] to have 1024 channels, but got 512 channels instead. Below is the config file I created for this purpose. Since the config files were not so clear to me (my fault), I simply tried to mix the NATScifar10 config file and the HytraImagenet config file to obtain a HyTracifar10 version. The model and the dataset seem to be created/loaded correctly, so I think there is some kind of size mismatch. For now I'm only trying to run on CIFAR10, but my intention is to generalize the process to different datasets (not only the most famous ones). I would like to know whether this generalization is already possible with your code (or with slight modifications), or whether the work is meant to run only on the main datasets.

    HytraCifar10 config

    import copy
    _base_ = 'base.py'

    model = dict(
        type='SiameseSupernetsHyTra',
        pretrained=None,
        base_momentum=0.99,
        pre_conv=True,
        backbone=dict(
            type='SupernetHyTra',
        ),
        start_block=0,
        num_block=4,
        neck=dict(
            type='NonLinearNeckSimCLRProject',
            in_channels=2048,
            hid_channels=4096,
            out_channels=256,
            num_layers=2,
            sync_bn=False,
            with_bias=True,
            with_last_bn=False,
            with_avg_pool=True),
        head=dict(
            type='LatentPredictHead',
            size_average=True,
            predictor=dict(
                type='NonLinearNeckSimCLR',
                in_channels=256,
                hid_channels=4096,
                out_channels=256,
                num_layers=2,
                sync_bn=False,
                with_bias=True,
                with_last_bn=False,
                with_avg_pool=False)))

    # dataset settings
    data_source_cfg = dict(type='NATSCifar10', root='../data/cifar/', return_label=False)
    train_dataset_type = 'BYOLDataset'
    test_dataset_type = 'StoragedBYOLDataset'
    img_norm_cfg = dict(mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.201])
    train_pipeline = [
        dict(type='RandomCrop', size=32, padding=4),
        dict(type='RandomHorizontalFlip'),
    ]

    # prefetch
    prefetch = False
    if not prefetch:
        train_pipeline.extend([dict(type='ToTensor'), dict(type='Normalize', **img_norm_cfg)])
    train_pipeline1 = copy.deepcopy(train_pipeline)
    train_pipeline2 = copy.deepcopy(train_pipeline)

    test_pipeline1 = copy.deepcopy(train_pipeline1)
    test_pipeline2 = copy.deepcopy(train_pipeline2)
    data = dict(
        imgs_per_gpu=256,  # total 256*4(gpu)*4(interval)=4096
        workers_per_gpu=2,
        train=dict(
            type=train_dataset_type,
            data_source=dict(split='train', **data_source_cfg),
            pipeline1=train_pipeline1,
            pipeline2=train_pipeline2),
        val=dict(
            type=test_dataset_type,
            data_source=dict(split='test', **data_source_cfg),
            pipeline1=test_pipeline1,
            pipeline2=test_pipeline2,),
        test=dict(
            type=test_dataset_type,
            data_source=dict(split='test', **data_source_cfg),
            pipeline1=test_pipeline1,
            pipeline2=test_pipeline2,))

    # optimizer
    optimizer = dict(type='LARS', lr=4.8, weight_decay=0.000001, momentum=0.9,
                     paramwise_options={
                         '(bn|gn)(\d+)?.(weight|bias)': dict(weight_decay=0., lars_exclude=True),
                         'bias': dict(weight_decay=0., lars_exclude=True)})

    # apex
    use_fp16 = True

    # interval for accumulate gradient
    update_interval = 8
    optimizer_config = dict(update_interval=update_interval, use_fp16=use_fp16)

    # learning policy
    lr_config = dict(
        policy='CosineAnnealing',
        min_lr=0.,
        warmup='linear',
        warmup_iters=1,
        warmup_ratio=0.0001,  # cannot be 0
        warmup_by_epoch=True)
    checkpoint_config = dict(interval=1)

    # runtime settings
    total_epochs = 24

    # additional hooks
    custom_hooks = [
        dict(type='BYOLHook', end_momentum=1., update_interval=update_interval),
        dict(type='RandomPathHook'),
        dict(
            type='ValBestPathHook',
            dataset=data['val'],
            bn_dataset=data['train'],
            initial=True,
            interval=2,
            optimizer_cfg=optimizer,
            lr_cfg=lr_config,
            imgs_per_gpu=256,
            workers_per_gpu=4,
            epoch_per_stage=6,
            resume_best_path='')  # e.g. 'path_rank/bestpath_2.yml'
    ]

    resume_from = 'checkpoints/stage3_epoch3.pth'

    resume_optimizer = False

    cudnn_benchmark = True

    opened by matteogambella 0
  • Pre-trained supernet weights release

    Hi,

    Your approach is very impressive; I was wondering if you're planning to release the weights of the supernets you trained? (I'm specifically interested in the HyTra supernet)

    opened by AwesomeLemon 2
Owner
Changlin Li