Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training.

Updates

  • (2020/06/21) Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training.

Pyramid Vision Transformer

The image is from Transformers: Revenge of the Fallen.

This repository contains the official implementation of PVTv1 & PVTv2 in image classification, object detection, and semantic segmentation tasks.

Model Zoo

Image Classification

Classification configs & weights see >>>here<<<.

  • PVTv2 on ImageNet-1K
| Method | Size | Acc@1 | #Params (M) |
|---|---|---|---|
| PVTv2-B0 | 224 | 70.5 | 3.7 |
| PVTv2-B1 | 224 | 78.7 | 14.0 |
| PVTv2-B2-Linear | 224 | 82.1 | 22.6 |
| PVTv2-B2 | 224 | 82.0 | 25.4 |
| PVTv2-B3 | 224 | 83.1 | 45.2 |
| PVTv2-B4 | 224 | 83.6 | 62.6 |
| PVTv2-B5 | 224 | 83.8 | 82.0 |
  • PVTv1 on ImageNet-1K
| Method | Size | Acc@1 | #Params (M) |
|---|---|---|---|
| PVT-Tiny | 224 | 75.1 | 13.2 |
| PVT-Small | 224 | 79.8 | 24.5 |
| PVT-Medium | 224 | 81.2 | 44.2 |
| PVT-Large | 224 | 81.7 | 61.4 |

Object Detection

Detection configs & weights see >>>here<<<.

  • PVTv2 on COCO

Baseline Detectors

| Method | Backbone | Pretrain | Lr schd | Aug | box AP | mask AP |
|---|---|---|---|---|---|---|
| RetinaNet | PVTv2-b0 | ImageNet-1K | 1x | No | 37.2 | - |
| RetinaNet | PVTv2-b1 | ImageNet-1K | 1x | No | 41.2 | - |
| RetinaNet | PVTv2-b2 | ImageNet-1K | 1x | No | 44.6 | - |
| RetinaNet | PVTv2-b3 | ImageNet-1K | 1x | No | 45.9 | - |
| RetinaNet | PVTv2-b4 | ImageNet-1K | 1x | No | 46.1 | - |
| RetinaNet | PVTv2-b5 | ImageNet-1K | 1x | No | 46.2 | - |
| Mask R-CNN | PVTv2-b0 | ImageNet-1K | 1x | No | 38.2 | 36.2 |
| Mask R-CNN | PVTv2-b1 | ImageNet-1K | 1x | No | 41.8 | 38.8 |
| Mask R-CNN | PVTv2-b2 | ImageNet-1K | 1x | No | 45.3 | 41.2 |
| Mask R-CNN | PVTv2-b3 | ImageNet-1K | 1x | No | 47.0 | 42.5 |
| Mask R-CNN | PVTv2-b4 | ImageNet-1K | 1x | No | 47.5 | 42.7 |
| Mask R-CNN | PVTv2-b5 | ImageNet-1K | 1x | No | 47.4 | 42.5 |

Advanced Detectors

| Method | Backbone | Pretrain | Lr schd | Aug | box AP | mask AP |
|---|---|---|---|---|---|---|
| Cascade Mask R-CNN | PVTv2-b2-Linear | ImageNet-1K | 3x | Yes | 50.9 | 44.0 |
| Cascade Mask R-CNN | PVTv2-b2 | ImageNet-1K | 3x | Yes | 51.1 | 44.4 |
| ATSS | PVTv2-b2-Linear | ImageNet-1K | 3x | Yes | 48.9 | - |
| ATSS | PVTv2-b2 | ImageNet-1K | 3x | Yes | 49.9 | - |
| GFL | PVTv2-b2-Linear | ImageNet-1K | 3x | Yes | 49.2 | - |
| GFL | PVTv2-b2 | ImageNet-1K | 3x | Yes | 50.2 | - |
| Sparse R-CNN | PVTv2-b2-Linear | ImageNet-1K | 3x | Yes | 48.9 | - |
| Sparse R-CNN | PVTv2-b2 | ImageNet-1K | 3x | Yes | 50.1 | - |
  • PVTv1 on COCO
| Detector | Backbone | Pretrain | Lr schd | box AP | mask AP |
|---|---|---|---|---|---|
| RetinaNet | PVT-Tiny | ImageNet-1K | 1x | 36.7 | - |
| RetinaNet | PVT-Small | ImageNet-1K | 1x | 40.4 | - |
| Mask R-CNN | PVT-Tiny | ImageNet-1K | 1x | 36.7 | 35.1 |
| Mask R-CNN | PVT-Small | ImageNet-1K | 1x | 40.4 | 37.8 |
| DETR | PVT-Small | ImageNet-1K | 50ep | 34.7 | - |

Semantic Segmentation

Segmentation configs & weights see >>>here<<<.

  • PVTv1 on ADE20K
| Method | Backbone | Pretrain | Iters | mIoU |
|---|---|---|---|---|
| Semantic FPN | PVT-Tiny | ImageNet-1K | 40K | 35.7 |
| Semantic FPN | PVT-Small | ImageNet-1K | 40K | 39.8 |
| Semantic FPN | PVT-Medium | ImageNet-1K | 40K | 41.6 |
| Semantic FPN | PVT-Large | ImageNet-1K | 40K | 42.1 |

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Citation

If you use this code for a paper, please cite:

PVTv1

@misc{wang2021pyramid,
      title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions}, 
      author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
      year={2021},
      eprint={2102.12122},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

PVTv2

@misc{wang2021pvtv2,
      title={PVTv2: Improved Baselines with Pyramid Vision Transformer}, 
      author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
      year={2021},
      eprint={2106.13797},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

This repo is currently maintained by Wenhai Wang (@whai362), Enze Xie (@xieenze), and Zhe Chen (@czczup).

Comments
  • Mask R-CNN configs

    Hi, thank you for your great work! Recently we would like to compare your model with ours on the Mask R-CNN results. I wonder if you can provide some configs for Mask R-CNN settings? Thanks!

    opened by xwjabc 10
  • semantic segmentation code

    Hi, thanks for your excellent work! I have read your paper 'Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions' and want to apply it to my semantic segmentation work. When will you make the semantic segmentation code and models public?

    opened by hgmlu 8
  • About FLOPs calculation in Table 2

    Hi Wenhai, thanks for this great work.

    I have a few questions about the FLOPs calculation in this paper. Previously I tested the DeiT models with ptflops and got 2.51G, 9.20G, and 35.13G FLOPs for DeiT-Tiny, DeiT-Small, and DeiT-Base, respectively.

    BTW, I also included the matrix multiplications in the self-attention layer, namely q @ k and attn @ v. I assume there is something wrong with my calculation. May I know how you calculate FLOPs for your experiments?

    Thanks.

    opened by HubHop 6
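
For readers trying to reproduce this kind of count by hand, here is a minimal sketch of a per-block multiply-accumulate (MAC) tally that includes the two attention matmuls mentioned above. The settings (197 tokens, width 192, depth 12, MLP ratio 4) are assumed DeiT-Tiny-like values, not taken from this repository, and patch embedding, norms, and the classifier head are ignored. It is not the authors' counting script.

```python
def block_macs(num_tokens, dim, mlp_ratio=4):
    """Multiply-accumulates for one standard Transformer encoder block."""
    n, c = num_tokens, dim
    qkv_and_proj = 4 * n * c * c       # q, k, v projections plus the output projection
    attn_matmuls = 2 * n * n * c       # q @ k^T and attn @ v, summed over all heads
    mlp = 2 * mlp_ratio * n * c * c    # two linear layers of the MLP
    return {"linear": qkv_and_proj + mlp, "attention": attn_matmuls}


# Assumed DeiT-Tiny-like settings (hypothetical inputs for illustration).
n_tokens, width, depth = 197, 192, 12
per_block = block_macs(n_tokens, width)
attention_macs = depth * per_block["attention"]
total_macs = depth * (per_block["linear"] + per_block["attention"])
total_flops_two_per_mac = 2 * total_macs  # convention that counts a MAC as 2 FLOPs
```

Under these assumptions the total is about 1.2 GMACs, and about 2.4G if every multiply-accumulate is counted as two operations. A factor-of-two gap between tools is therefore often just a MACs-versus-FLOPs convention difference, which may or may not be what happened in this report.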
  • tkinter.tclerror

    Thanks for your work. I tested demo.py and ran into the problem below. If I comment out model.show_result, I can obtain the result normally.

    Traceback (most recent call last):
      File "demo.py", line 62, in <module>
        main(args)
      File "demo.py", line 35, in main
        model.show_result(
      File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 327, in show_result
        img = imshow_det_bboxes(
      File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/mmdet/core/visualization/image.py", line 113, in imshow_det_bboxes
        fig = plt.figure(win_name, frameon=False)
      File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/matplotlib/pyplot.py", line 687, in figure
        figManager = new_figure_manager(num, figsize=figsize,
      File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/matplotlib/pyplot.py", line 315, in new_figure_manager
        return _backend_mod.new_figure_manager(*args, **kwargs)
      File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 3494, in new_figure_manager
        return cls.new_figure_manager_given_figure(num, fig)
      File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/matplotlib/backends/_backend_tk.py", line 885, in new_figure_manager_given_figure
        window = tk.Tk(className="matplotlib")
      File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/tkinter/__init__.py", line 2261, in __init__
        self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
    _tkinter.TclError: couldn't connect to display "localhost:10.0"

    opened by shengyuan-tang 4
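
The traceback ends in the Tk backend trying to open a window on a display that is not reachable, which is typical of a remote or headless session. One common workaround, sketched here under the assumption that saving the figure to a file is acceptable, is to select matplotlib's non-interactive Agg backend before pyplot is imported. The output file name is hypothetical, and the snippet only demonstrates the backend switch; it does not call mmdet.

```python
import os
import tempfile

# matplotlib reads MPLBACKEND at import time, so set it before importing pyplot.
os.environ["MPLBACKEND"] = "Agg"

try:
    import matplotlib.pyplot as plt
except ImportError:  # matplotlib is not installed; nothing to demonstrate
    plt = None

saved_path = None
if plt is not None:
    fig = plt.figure(frameon=False)  # no Tk window is created under Agg
    ax = fig.add_subplot(111)
    ax.plot([0, 1], [0, 1])
    # Hypothetical output location for the rendered figure.
    saved_path = os.path.join(tempfile.mkdtemp(), "result_preview.png")
    fig.savefig(saved_path)
    plt.close(fig)
```

If the installed mmdet version's show_result accepts arguments for writing to a file instead of showing a window (worth checking against that version's signature), using them together with the backend switch avoids the display entirely.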
  • how can i load pickle file?

    Thanks for sharing the code. I'm trying to load a pickle file and read it using these commands:

    import pickle
    infile = open('data.pkl','rb')
    new_dict = pickle.load(infile)
    infile.close()
    print(type(new_dict))

    but the error is:

    _pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.

    I searched for a solution but only found that the pickle file appears to be using advanced features that suggest it was never supposed to be directly loaded this way. Can you help, please?

    opened by mathshangw 4
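
The error message above comes from pickle's persistent-id mechanism: the writer stored some objects by reference, and a plain pickle.load has no rule for resolving those references. The standard-library sketch below reproduces the same failure with a toy writer. The class names and the "blob" tag are made up for illustration. If data.pkl was extracted from a PyTorch checkpoint (an assumption; the report does not say), loading the original checkpoint file with torch.load is the usual route rather than unpickling data.pkl directly.

```python
import io
import pickle


class TaggingPickler(pickle.Pickler):
    """Toy writer that stores bytes objects by reference (hypothetical scheme)."""

    def persistent_id(self, obj):
        if isinstance(obj, bytes):
            return ("blob", len(obj))  # reference instead of the data itself
        return None  # everything else is pickled normally


class TaggingUnpickler(pickle.Unpickler):
    """Matching reader that knows how to resolve the references."""

    def persistent_load(self, pid):
        kind, size = pid
        return bytes(size)  # placeholder payload of the recorded size


buffer = io.BytesIO()
TaggingPickler(buffer).dump({"weights": b"\x01\x02\x03"})
payload = buffer.getvalue()

# A plain load fails because nothing resolves the persistent id.
try:
    pickle.loads(payload)
    plain_load_error = None
except pickle.UnpicklingError as exc:
    plain_load_error = str(exc)

# A reader that defines persistent_load succeeds.
restored = TaggingUnpickler(io.BytesIO(payload)).load()
```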
  • question for PVTv2: in the paper the reduction ratio is 7 in Linear SRA, but in the code  is sr_ratios=[8, 4, 2, 1]

    question for PVTv2: in the paper the reduction ratio is 7 in Linear SRA, but in the code is sr_ratios=[8, 4, 2, 1],

    Is there something wrong with my understanding?

    opened by StormArcher 3
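
One reading that would reconcile the two numbers, offered as an assumption rather than a statement about this repository's code: in the linear variant the keys and values are average-pooled to a fixed 7x7 grid, so "7" is a pooled size and not a stride, while sr_ratios applies to the strided (non-linear) SRA. The sketch below compares the resulting key/value token counts. The function names and the stage sizes (strides 4, 8, 16, 32 on a square input) are illustrative assumptions.

```python
def sra_kv_tokens(height, width, sr_ratio):
    """Strided SRA: each spatial side shrinks by sr_ratio."""
    return (height // sr_ratio) * (width // sr_ratio)


def linear_sra_kv_tokens(pool_size=7):
    """Pooling-based variant: fixed pool_size x pool_size grid for any input size."""
    return pool_size * pool_size


def stage_sides(image_side, strides=(4, 8, 16, 32)):
    """Feature-map side length at each stage for a square input (assumed strides)."""
    return [image_side // s for s in strides]


sr_ratios = [8, 4, 2, 1]

# 224x224 input: both schemes end up with the same number of key/value tokens.
sra_224 = [sra_kv_tokens(s, s, r) for s, r in zip(stage_sides(224), sr_ratios)]
linear_224 = [linear_sra_kv_tokens(7) for _ in sr_ratios]

# 800x800 input: strided SRA grows with resolution, the pooled variant does not.
sra_800 = [sra_kv_tokens(s, s, r) for s, r in zip(stage_sides(800), sr_ratios)]
linear_800 = [linear_sra_kv_tokens(7) for _ in sr_ratios]
```

Under these assumptions the two settings coincide at 224x224 (49 key/value tokens per stage) and diverge at detection-scale inputs, where the strided version keeps 625 tokens per stage.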
  • Low mAP on coco val

    Hello, thanks for your work. I tried to train the RetinaNet-FPN-PVTv2-B2-1x model on COCO2017. The reported mAP on the val set is 44.6, but I only got 33.5 after training. Is anything wrong?

    I trained on 8 V100 GPUs using your provided pre-trained model pvt_v2_b2.pth. Training script was: ./dist_train.sh configs/retinanet_pvt_v2_b2_fpn_1x_coco.py 8

    The config file was:

    model = dict(
        type='RetinaNet',
        pretrained='/opt/tiger/wanxingyu_tfm/pvt/pretrained/pvt_v2_b2.pth',
        backbone=dict(
            type='pvt_v2_b2',
            depth=50,
            num_stages=4,
            out_indices=(0, 1, 2, 3),
            frozen_stages=1,
            norm_cfg=dict(type='BN', requires_grad=True),
            norm_eval=True,
            style='pytorch'),
        neck=dict(
            type='FPN',
            in_channels=[64, 128, 320, 512],
            out_channels=256,
            start_level=1,
            add_extra_convs='on_input',
            num_outs=5),
        bbox_head=dict(
            type='RetinaHead',
            num_classes=80,
            in_channels=256,
            stacked_convs=4,
            feat_channels=256,
            anchor_generator=dict(
                type='AnchorGenerator',
                octave_base_scale=4,
                scales_per_octave=3,
                ratios=[0.5, 1.0, 2.0],
                strides=[8, 16, 32, 64, 128]),
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[1.0, 1.0, 1.0, 1.0]),
            loss_cls=dict(
                type='FocalLoss',
                use_sigmoid=True,
                gamma=2.0,
                alpha=0.25,
                loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
        train_cfg=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.4,
                min_pos_iou=0,
                ignore_iof_thr=-1),
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        test_cfg=dict(
            nms_pre=1000,
            min_bbox_size=0,
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100))
    dataset_type = 'CocoDataset'
    data_root = '/opt/tiger/wanxingyu_tfm/datasets/coco/'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True),
        dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
        dict(type='RandomFlip', flip_ratio=0.5),
        dict(
            type='Normalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_rgb=True),
        dict(type='Pad', size_divisor=32),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
    ]
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=(1333, 800),
            flip=False,
            transforms=[
                dict(type='Resize', keep_ratio=True),
                dict(type='RandomFlip'),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img'])
            ])
    ]
    data = dict(
        samples_per_gpu=2,
        workers_per_gpu=2,
        train=dict(
            type='CocoDataset',
            ann_file=
            '/opt/tiger/wanxingyu_tfm/datasets/coco/annotations/instances_train2017.json',
            img_prefix='/opt/tiger/wanxingyu_tfm/datasets/coco/train2017/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations', with_bbox=True),
                dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
                dict(type='RandomFlip', flip_ratio=0.5),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='DefaultFormatBundle'),
                dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
            ]),
        val=dict(
            type='CocoDataset',
            ann_file=
            '/opt/tiger/wanxingyu_tfm/datasets/coco/annotations/instances_val2017.json',
            img_prefix='/opt/tiger/wanxingyu_tfm/datasets/coco/val2017/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='MultiScaleFlipAug',
                    img_scale=(1333, 800),
                    flip=False,
                    transforms=[
                        dict(type='Resize', keep_ratio=True),
                        dict(type='RandomFlip'),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='Pad', size_divisor=32),
                        dict(type='ImageToTensor', keys=['img']),
                        dict(type='Collect', keys=['img'])
                    ])
            ]),
        test=dict(
            type='CocoDataset',
            ann_file=
            '/opt/tiger/wanxingyu_tfm/datasets/coco/annotations/instances_val2017.json',
            img_prefix='/opt/tiger/wanxingyu_tfm/datasets/coco/val2017/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='MultiScaleFlipAug',
                    img_scale=(1333, 800),
                    flip=False,
                    transforms=[
                        dict(type='Resize', keep_ratio=True),
                        dict(type='RandomFlip'),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='Pad', size_divisor=32),
                        dict(type='ImageToTensor', keys=['img']),
                        dict(type='Collect', keys=['img'])
                    ])
            ]))
    evaluation = dict(interval=1, metric='bbox')
    optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001)
    optimizer_config = dict(grad_clip=None)
    lr_config = dict(
        policy='step',
        warmup='linear',
        warmup_iters=500,
        warmup_ratio=0.001,
        step=[8, 11])
    runner = dict(type='EpochBasedRunner', max_epochs=12)
    checkpoint_config = dict(interval=1)
    log_config = dict(interval=5, hooks=[dict(type='TextLoggerHook')])
    custom_hooks = [dict(type='NumClassCheckHook')]
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    load_from = None
    resume_from = None
    workflow = [('train', 1)]
    work_dir = './work_dirs/retinanet_pvt_v2_b2_fpn_1x_coco'
    gpu_ids = range(0, 8)

    Test script was: ./dist_test.sh configs/retinanet_pvt_v2_b2_fpn_1x_coco.py work_dirs/retinanet_pvt_v2_b2_fpn_1x_coco/epoch_12.pth 8 --eval bbox

    The result I got:

    Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.335
    Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.514
    Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.352
    Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.190
    Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.356
    Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.450
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.525
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.525
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.525
    Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.325
    Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.561
    Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.683
    OrderedDict([('bbox_mAP', 0.335), ('bbox_mAP_50', 0.514), ('bbox_mAP_75', 0.352), ('bbox_mAP_s', 0.19), ('bbox_mAP_m', 0.356), ('bbox_mAP_l', 0.45), ('bbox_mAP_copypaste', '0.335 0.514 0.352 0.190 0.356 0.450')])

    opened by memorywxy 3
  • How can I get small_pvt.pth?

    I ran your main.py, but I'm confused about what this class does. It gave me the accuracy and loss for 500 epochs, right? When I tried to train on my images with the command 'dist_train.sh configs/retinanet_pvt_s_fpn_1x_coco_640.py 1',

    I got an error that small_pvt.pth was not found. Is that file the weights or a checkpoint?

    Is the small_pvt.pth at https://drive.google.com/file/d/1vtcyoU8KUqNzktlMGXZrYcMRsNNiVZFQ/view?usp=sharing trained on ImageNet? And how can I get a .pth file if my dataset is different? I appreciate your reply. Thanks.

    opened by SamMohel 3
  • problems about loading pretrained model with pytorch version below 1.6

    problems about loading pretrained model with pytorch version below 1.6

    PyTorch 1.6 switched torch.save to a zip-file-based format by default, replacing the old pickle-based format. As a result, PyTorch versions below 1.6 cannot load the pretrained models AT ALL.

    Could you pass "_use_new_zipfile_serialization=False" when calling torch.save(), for example torch.save(m.state_dict(), 'mymodel.pt', _use_new_zipfile_serialization=False), and provide another version of the pretrained models?

    Thanks a lot!!!!

    opened by WxWstranger 3
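
Until legacy-format weights are available, a user with access to a newer PyTorch can convert a checkpoint locally. The sketch below is one way to do that, not something this repository ships: the helper names are hypothetical, the format check relies only on the fact that the newer format is a zip archive, and the files it creates are stand-ins rather than real checkpoints. The re-save step uses the same torch.save keyword quoted in the request above and needs PyTorch 1.6 or later.

```python
import os
import pickle
import tempfile
import zipfile


def uses_zip_serialization(path):
    """True if the file is a zip archive, the container torch.save uses by default since 1.6."""
    return zipfile.is_zipfile(path)


def resave_in_legacy_format(src, dst):
    """Re-save a checkpoint in the pre-1.6 format (run this with PyTorch >= 1.6)."""
    import torch  # imported lazily so the format check works without PyTorch
    state = torch.load(src, map_location="cpu")
    torch.save(state, dst, _use_new_zipfile_serialization=False)


# Stand-in files (not real checkpoints) to exercise the format check.
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "new_style.pth")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("archive/data.pkl", b"placeholder")
legacy_path = os.path.join(tmp_dir, "old_style.pth")
with open(legacy_path, "wb") as handle:
    pickle.dump({"weight": [0.0, 1.0]}, handle)

zip_detected = uses_zip_serialization(zip_path)
legacy_detected = uses_zip_serialization(legacy_path)
```

Calling resave_in_legacy_format on a downloaded checkpoint, in an environment with a recent PyTorch, should produce a file that older versions can read.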
  • PVT Large doesn't converge

    Thanks for your great work. But when I trained PVT Large (pvt_large) with your default settings, the model didn't converge. The loss declined correctly for the first 37 epochs and the accuracy reached 57%, but training went wrong in the 38th epoch. I used your code without any change. What's the problem? Thank you!
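
A quick sanity check helps when reading the log that follows. Late in epoch 38 the loss settles around 6.92 and top-1 accuracy around 0.1%. Those values are what a classifier gives when it predicts a uniform distribution over the 1000 ImageNet-1K classes, which suggests the output collapsed abruptly rather than drifted. This is only an interpretation of the numbers, not a diagnosis of the cause.

```python
import math

num_classes = 1000  # ImageNet-1K

# Cross-entropy of a uniform prediction: -log(1 / num_classes).
uniform_cross_entropy = -math.log(1.0 / num_classes)

# Top-1 accuracy (in percent) of guessing at random.
chance_top1_percent = 100.0 / num_classes
```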

    Below is a part of my training log.

    Test: Total time: 0:01:55 (0.4429 s / it)

    * Acc@1 57.009 Acc@5 81.174 loss 1.948
    Accuracy of the network on the 50000 test images: 57.0%
    Max accuracy: 57.01%
    Epoch: [38] [ 0/1251] eta: 2:06:33 lr: 0.000963 loss: 4.9324 (4.9324) time: 6.0701 data: 3.6057 max mem: 25529 Epoch: [38] [ 10/1251] eta: 0:31:59 lr: 0.000963 loss: 4.5930 (4.5768) time: 1.5465 data: 0.3281 max mem: 25529 Epoch: [38] [ 20/1251] eta: 0:27:07 lr: 0.000963 loss: 4.6624 (4.6160) time: 1.0843 data: 0.0003 max mem: 25529 Epoch: [38] [ 30/1251] eta: 0:25:15 lr: 0.000963 loss: 4.7355 (4.5806) time: 1.0737 data: 0.0003 max mem: 25529 Epoch: [38] [ 40/1251] eta: 0:24:16 lr: 0.000963 loss: 4.6986 (4.5811) time: 1.0784 data: 0.0003 max mem: 25529 Epoch: [38] [ 50/1251] eta: 0:23:33 lr: 0.000963 loss: 4.6986 (4.5609) time: 1.0766 data: 0.0003 max mem: 25529 Epoch: [38] [ 60/1251] eta: 0:23:07 lr: 0.000963 loss: 4.7104 (4.5901) time: 1.0864 data: 0.0003 max mem: 25529 Epoch: [38] [ 70/1251] eta: 0:22:39 lr: 0.000963 loss: 4.8095 (4.6143) time: 1.0854 data: 0.0003 max mem: 25529 Epoch: [38] [ 80/1251] eta: 0:22:17 lr: 0.000963 loss: 4.7373 (4.5898) time: 1.0721 data: 0.0003 max mem: 25529 Epoch: [38] [ 90/1251] eta: 0:21:55 lr: 0.000963 loss: 4.4603 (4.5742) time: 1.0696 data: 0.0003 max mem: 25529 Epoch: [38] [ 100/1251] eta: 0:21:37 lr: 0.000963 loss: 4.5539 (4.5777) time: 1.0682 data: 0.0003 max mem: 25529 Epoch: [38] [ 110/1251] eta: 0:21:21 lr: 0.000963 loss: 4.9701 (4.5993) time: 1.0787 data: 0.0003 max mem: 25529 Epoch: [38] [ 120/1251] eta: 0:21:06 lr: 0.000963 loss: 4.9029 (4.5914) time: 1.0811 data: 0.0003 max mem: 25529 Epoch: [38] [ 130/1251] eta: 0:20:50 lr: 0.000963 loss: 4.7300 (4.5999) time: 1.0711 data: 0.0003 max mem: 25529 Epoch: [38] [ 140/1251] eta: 0:20:35 lr: 0.000963 loss: 4.7998 (4.5936) time: 1.0630 data: 0.0003 max mem: 25529 Epoch: [38] [ 150/1251] eta: 0:20:23 lr: 0.000963 loss: 4.8562 (4.5969) time: 1.0850 data: 0.0003 max mem: 25529 Epoch: [38] [ 160/1251] eta: 0:20:09 lr: 0.000963 loss: 
4.8583 (4.5961) time: 1.0852 data: 0.0003 max mem: 25529 Epoch: [38] [ 170/1251] eta: 0:19:55 lr: 0.000963 loss: 4.8583 (4.6029) time: 1.0677 data: 0.0003 max mem: 25529 Epoch: [38] [ 180/1251] eta: 0:19:42 lr: 0.000963 loss: 5.0298 (4.6202) time: 1.0675 data: 0.0003 max mem: 25529 Epoch: [38] [ 190/1251] eta: 0:19:28 lr: 0.000963 loss: 4.8480 (4.6175) time: 1.0634 data: 0.0003 max mem: 25529 Epoch: [38] [ 200/1251] eta: 0:19:15 lr: 0.000963 loss: 4.6446 (4.6124) time: 1.0629 data: 0.0003 max mem: 25529 Epoch: [38] [ 210/1251] eta: 0:19:04 lr: 0.000963 loss: 4.8329 (4.6245) time: 1.0741 data: 0.0003 max mem: 25529 Epoch: [38] [ 220/1251] eta: 0:18:52 lr: 0.000963 loss: 4.9058 (4.6362) time: 1.0833 data: 0.0003 max mem: 25529 Epoch: [38] [ 230/1251] eta: 0:18:40 lr: 0.000963 loss: 4.7250 (4.6332) time: 1.0764 data: 0.0003 max mem: 25529 Epoch: [38] [ 240/1251] eta: 0:18:28 lr: 0.000963 loss: 4.6894 (4.6391) time: 1.0808 data: 0.0003 max mem: 25529 Epoch: [38] [ 250/1251] eta: 0:18:16 lr: 0.000963 loss: 4.8600 (4.6438) time: 1.0789 data: 0.0003 max mem: 25529 Epoch: [38] [ 260/1251] eta: 0:18:04 lr: 0.000963 loss: 4.9939 (4.6550) time: 1.0710 data: 0.0003 max mem: 25529 Epoch: [38] [ 270/1251] eta: 0:17:53 lr: 0.000963 loss: 4.7281 (4.6478) time: 1.0717 data: 0.0003 max mem: 25529 Epoch: [38] [ 280/1251] eta: 0:17:41 lr: 0.000963 loss: 4.3858 (4.6383) time: 1.0664 data: 0.0003 max mem: 25529 Epoch: [38] [ 290/1251] eta: 0:17:29 lr: 0.000963 loss: 4.5126 (4.6390) time: 1.0627 data: 0.0003 max mem: 25529 Epoch: [38] [ 300/1251] eta: 0:17:17 lr: 0.000963 loss: 4.3964 (4.6302) time: 1.0638 data: 0.0003 max mem: 25529 Epoch: [38] [ 310/1251] eta: 0:17:05 lr: 0.000963 loss: 4.3964 (4.6284) time: 1.0683 data: 0.0003 max mem: 25529 Epoch: [38] [ 320/1251] eta: 0:16:54 lr: 0.000963 loss: 4.4917 (4.6220) time: 1.0689 data: 0.0003 max mem: 25529 Epoch: [38] [ 330/1251] eta: 0:16:42 lr: 0.000963 loss: 4.7606 (4.6335) time: 1.0695 data: 0.0003 max mem: 25529 Epoch: [38] [ 
340/1251] eta: 0:16:31 lr: 0.000963 loss: 5.0333 (4.6346) time: 1.0699 data: 0.0003 max mem: 25529 Epoch: [38] [ 350/1251] eta: 0:16:20 lr: 0.000963 loss: 4.6795 (4.6276) time: 1.0700 data: 0.0003 max mem: 25529 Epoch: [38] [ 360/1251] eta: 0:16:08 lr: 0.000963 loss: 4.7723 (4.6305) time: 1.0728 data: 0.0003 max mem: 25529 Epoch: [38] [ 370/1251] eta: 0:15:57 lr: 0.000963 loss: 4.8322 (4.6305) time: 1.0767 data: 0.0003 max mem: 25529 Epoch: [38] [ 380/1251] eta: 0:15:46 lr: 0.000963 loss: 4.7535 (4.6310) time: 1.0725 data: 0.0003 max mem: 25529 Epoch: [38] [ 390/1251] eta: 0:15:35 lr: 0.000963 loss: 4.5236 (4.6247) time: 1.0746 data: 0.0003 max mem: 25529 Epoch: [38] [ 400/1251] eta: 0:15:24 lr: 0.000963 loss: 4.5129 (4.6280) time: 1.0783 data: 0.0003 max mem: 25529 Epoch: [38] [ 410/1251] eta: 0:15:13 lr: 0.000963 loss: 4.6520 (4.6250) time: 1.0803 data: 0.0003 max mem: 25529 Epoch: [38] [ 420/1251] eta: 0:15:02 lr: 0.000963 loss: 4.6115 (4.6235) time: 1.0841 data: 0.0003 max mem: 25529 Epoch: [38] [ 430/1251] eta: 0:14:51 lr: 0.000963 loss: 4.5550 (4.6176) time: 1.0788 data: 0.0003 max mem: 25529 Epoch: [38] [ 440/1251] eta: 0:14:40 lr: 0.000963 loss: 4.3985 (4.6097) time: 1.0745 data: 0.0003 max mem: 25529 Epoch: [38] [ 450/1251] eta: 0:14:29 lr: 0.000963 loss: 4.5041 (4.6144) time: 1.0711 data: 0.0004 max mem: 25529 Epoch: [38] [ 460/1251] eta: 0:14:18 lr: 0.000963 loss: 4.7949 (4.6127) time: 1.0769 data: 0.0003 max mem: 25529 Epoch: [38] [ 470/1251] eta: 0:14:07 lr: 0.000963 loss: 4.7556 (4.6148) time: 1.0773 data: 0.0003 max mem: 25529 Epoch: [38] [ 480/1251] eta: 0:13:56 lr: 0.000963 loss: 5.0523 (4.6200) time: 1.0845 data: 0.0003 max mem: 25529 Epoch: [38] [ 490/1251] eta: 0:13:45 lr: 0.000963 loss: 4.5865 (4.6152) time: 1.0781 data: 0.0003 max mem: 25529 Epoch: [38] [ 500/1251] eta: 0:13:34 lr: 0.000963 loss: 4.6311 (4.6210) time: 1.0776 data: 0.0003 max mem: 25529 Epoch: [38] [ 510/1251] eta: 0:13:23 lr: 0.000963 loss: 4.8767 (4.6208) time: 1.0855 data: 
0.0003 max mem: 25529 Epoch: [38] [ 520/1251] eta: 0:13:13 lr: 0.000963 loss: 4.7439 (4.6204) time: 1.0891 data: 0.0003 max mem: 25529 Epoch: [38] [ 530/1251] eta: 0:13:02 lr: 0.000963 loss: 4.7974 (4.6190) time: 1.0813 data: 0.0003 max mem: 25529 Epoch: [38] [ 540/1251] eta: 0:12:51 lr: 0.000963 loss: 4.6865 (4.6171) time: 1.0676 data: 0.0003 max mem: 25529 Epoch: [38] [ 550/1251] eta: 0:12:40 lr: 0.000963 loss: 4.4560 (4.6144) time: 1.0727 data: 0.0003 max mem: 25529 Epoch: [38] [ 560/1251] eta: 0:12:29 lr: 0.000963 loss: 4.2302 (4.6069) time: 1.0761 data: 0.0003 max mem: 25529 Epoch: [38] [ 570/1251] eta: 0:12:18 lr: 0.000963 loss: 4.3246 (4.6080) time: 1.0741 data: 0.0003 max mem: 25529 Epoch: [38] [ 580/1251] eta: 0:12:07 lr: 0.000963 loss: 4.5513 (4.6052) time: 1.0661 data: 0.0003 max mem: 25529 Epoch: [38] [ 590/1251] eta: 0:11:56 lr: 0.000963 loss: 4.4924 (4.6075) time: 1.0740 data: 0.0003 max mem: 25529 Epoch: [38] [ 600/1251] eta: 0:11:45 lr: 0.000963 loss: 4.5949 (4.6052) time: 1.0817 data: 0.0003 max mem: 25529 Epoch: [38] [ 610/1251] eta: 0:11:34 lr: 0.000963 loss: 4.5321 (4.6035) time: 1.0638 data: 0.0003 max mem: 25529 Epoch: [38] [ 620/1251] eta: 0:11:23 lr: 0.000963 loss: 4.7689 (4.6075) time: 1.0604 data: 0.0003 max mem: 25529 Epoch: [38] [ 630/1251] eta: 0:11:12 lr: 0.000963 loss: 4.7689 (4.6088) time: 1.0649 data: 0.0003 max mem: 25529 Epoch: [38] [ 640/1251] eta: 0:11:01 lr: 0.000963 loss: 4.4721 (4.6039) time: 1.0580 data: 0.0003 max mem: 25529 Epoch: [38] [ 650/1251] eta: 0:10:50 lr: 0.000963 loss: 4.5410 (4.6067) time: 1.0654 data: 0.0003 max mem: 25529 Epoch: [38] [ 660/1251] eta: 0:10:39 lr: 0.000963 loss: 4.5659 (4.5996) time: 1.0689 data: 0.0003 max mem: 25529 Epoch: [38] [ 670/1251] eta: 0:10:28 lr: 0.000963 loss: 4.4456 (4.5999) time: 1.0727 data: 0.0003 max mem: 25529 Epoch: [38] [ 680/1251] eta: 0:10:17 lr: 0.000963 loss: 4.8766 (4.6035) time: 1.0818 data: 0.0003 max mem: 25529 Epoch: [38] [ 690/1251] eta: 0:10:06 lr: 0.000963 loss: 
4.8766 (4.6041) time: 1.0854 data: 0.0003 max mem: 25529 Epoch: [38] [ 700/1251] eta: 0:09:55 lr: 0.000963 loss: 4.9327 (4.6104) time: 1.0805 data: 0.0003 max mem: 25529 Epoch: [38] [ 710/1251] eta: 0:09:44 lr: 0.000963 loss: 5.0049 (4.6129) time: 1.0702 data: 0.0003 max mem: 25529 Epoch: [38] [ 720/1251] eta: 0:09:34 lr: 0.000963 loss: 4.6922 (4.6117) time: 1.0673 data: 0.0003 max mem: 25529 Epoch: [38] [ 730/1251] eta: 0:09:23 lr: 0.000963 loss: 4.6331 (4.6107) time: 1.0810 data: 0.0003 max mem: 25529 Epoch: [38] [ 740/1251] eta: 0:09:12 lr: 0.000963 loss: 4.5547 (4.6111) time: 1.0795 data: 0.0003 max mem: 25529 Epoch: [38] [ 750/1251] eta: 0:09:01 lr: 0.000963 loss: 4.8843 (4.6181) time: 1.0719 data: 0.0003 max mem: 25529 Epoch: [38] [ 760/1251] eta: 0:08:50 lr: 0.000963 loss: 4.8843 (4.6160) time: 1.0851 data: 0.0003 max mem: 25529 Epoch: [38] [ 770/1251] eta: 0:08:40 lr: 0.000963 loss: 4.2934 (4.6119) time: 1.0840 data: 0.0003 max mem: 25529 Epoch: [38] [ 780/1251] eta: 0:08:29 lr: 0.000963 loss: 4.1930 (4.6087) time: 1.0784 data: 0.0003 max mem: 25529 Epoch: [38] [ 790/1251] eta: 0:08:18 lr: 0.000963 loss: 4.4176 (4.6073) time: 1.0748 data: 0.0003 max mem: 25529 Epoch: [38] [ 800/1251] eta: 0:08:07 lr: 0.000963 loss: 4.7402 (4.6115) time: 1.0681 data: 0.0003 max mem: 25529 Epoch: [38] [ 810/1251] eta: 0:07:56 lr: 0.000963 loss: 4.7749 (4.6094) time: 1.0713 data: 0.0003 max mem: 25529 Epoch: [38] [ 820/1251] eta: 0:07:45 lr: 0.000963 loss: 4.6709 (4.6079) time: 1.0732 data: 0.0003 max mem: 25529 Epoch: [38] [ 830/1251] eta: 0:07:34 lr: 0.000963 loss: 4.7506 (4.6088) time: 1.0641 data: 0.0003 max mem: 25529 Epoch: [38] [ 840/1251] eta: 0:07:23 lr: 0.000963 loss: 4.8636 (4.6112) time: 1.0592 data: 0.0003 max mem: 25529 Epoch: [38] [ 850/1251] eta: 0:07:13 lr: 0.000963 loss: 4.9930 (4.6116) time: 1.0767 data: 0.0003 max mem: 25529 Epoch: [38] [ 860/1251] eta: 0:07:02 lr: 0.000963 loss: 5.0639 (4.6155) time: 1.0766 data: 0.0003 max mem: 25529 Epoch: [38] [ 
870/1251] eta: 0:06:51 lr: 0.000963 loss: 5.0486 (4.6160) time: 1.0683 data: 0.0003 max mem: 25529 Epoch: [38] [ 880/1251] eta: 0:06:40 lr: 0.000963 loss: 4.6785 (4.6145) time: 1.0654 data: 0.0003 max mem: 25529 Epoch: [38] [ 890/1251] eta: 0:06:29 lr: 0.000963 loss: 4.6382 (4.6126) time: 1.0603 data: 0.0003 max mem: 25529 Epoch: [38] [ 900/1251] eta: 0:06:18 lr: 0.000963 loss: 4.9989 (4.6179) time: 1.0642 data: 0.0003 max mem: 25529 Epoch: [38] [ 910/1251] eta: 0:06:08 lr: 0.000963 loss: 5.0227 (4.6205) time: 1.0740 data: 0.0003 max mem: 25529 Epoch: [38] [ 920/1251] eta: 0:05:57 lr: 0.000963 loss: 4.7505 (4.6198) time: 1.0733 data: 0.0003 max mem: 25529 Epoch: [38] [ 930/1251] eta: 0:05:46 lr: 0.000963 loss: 4.6593 (4.6196) time: 1.0636 data: 0.0003 max mem: 25529 Epoch: [38] [ 940/1251] eta: 0:05:35 lr: 0.000963 loss: 4.7349 (4.6184) time: 1.0697 data: 0.0003 max mem: 25529 Epoch: [38] [ 950/1251] eta: 0:05:24 lr: 0.000963 loss: 4.8424 (4.6185) time: 1.0741 data: 0.0003 max mem: 25529 Epoch: [38] [ 960/1251] eta: 0:05:13 lr: 0.000963 loss: 4.5308 (4.6170) time: 1.0704 data: 0.0003 max mem: 25529 Epoch: [38] [ 970/1251] eta: 0:05:03 lr: 0.000963 loss: 4.6764 (4.6186) time: 1.0749 data: 0.0003 max mem: 25529 Epoch: [38] [ 980/1251] eta: 0:04:52 lr: 0.000963 loss: 4.6764 (4.6176) time: 1.0768 data: 0.0004 max mem: 25529 Epoch: [38] [ 990/1251] eta: 0:04:41 lr: 0.000963 loss: 4.5145 (4.6176) time: 1.0677 data: 0.0004 max mem: 25529 Epoch: [38] [1000/1251] eta: 0:04:30 lr: 0.000963 loss: 4.5645 (4.6202) time: 1.0686 data: 0.0003 max mem: 25529 Epoch: [38] [1010/1251] eta: 0:04:19 lr: 0.000963 loss: 5.3548 (4.6373) time: 1.0613 data: 0.0003 max mem: 25529 Epoch: [38] [1020/1251] eta: 0:04:09 lr: 0.000963 loss: 6.9353 (4.6599) time: 1.0595 data: 0.0003 max mem: 25529 Epoch: [38] [1030/1251] eta: 0:03:58 lr: 0.000963 loss: 6.9423 (4.6820) time: 1.0729 data: 0.0003 max mem: 25529 Epoch: [38] [1040/1251] eta: 0:03:47 lr: 0.000963 loss: 6.9381 (4.7036) time: 1.0715 data: 
0.0003 max mem: 25529
Epoch: [38] [1050/1251] eta: 0:03:36 lr: 0.000963 loss: 6.9351 (4.7248) time: 1.0717 data: 0.0003 max mem: 25529
Epoch: [38] [1060/1251] eta: 0:03:25 lr: 0.000963 loss: 6.9315 (4.7456) time: 1.0655 data: 0.0003 max mem: 25529
Epoch: [38] [1070/1251] eta: 0:03:15 lr: 0.000963 loss: 6.9319 (4.7660) time: 1.0609 data: 0.0003 max mem: 25529
Epoch: [38] [1080/1251] eta: 0:03:04 lr: 0.000963 loss: 6.9287 (4.7860) time: 1.0717 data: 0.0003 max mem: 25529
Epoch: [38] [1090/1251] eta: 0:02:53 lr: 0.000963 loss: 6.9198 (4.8055) time: 1.0834 data: 0.0003 max mem: 25529
Epoch: [38] [1100/1251] eta: 0:02:42 lr: 0.000963 loss: 6.9219 (4.8248) time: 1.0835 data: 0.0003 max mem: 25529
Epoch: [38] [1110/1251] eta: 0:02:32 lr: 0.000963 loss: 6.9286 (4.8437) time: 1.1036 data: 0.0003 max mem: 25529
Epoch: [38] [1120/1251] eta: 0:02:21 lr: 0.000963 loss: 6.9209 (4.8622) time: 1.0965 data: 0.0003 max mem: 25529
Epoch: [38] [1130/1251] eta: 0:02:10 lr: 0.000963 loss: 6.9212 (4.8804) time: 1.0701 data: 0.0003 max mem: 25529
Epoch: [38] [1140/1251] eta: 0:01:59 lr: 0.000963 loss: 6.9192 (4.8983) time: 1.0686 data: 0.0003 max mem: 25529
Epoch: [38] [1150/1251] eta: 0:01:48 lr: 0.000963 loss: 6.9192 (4.9159) time: 1.0640 data: 0.0003 max mem: 25529
Epoch: [38] [1160/1251] eta: 0:01:38 lr: 0.000963 loss: 6.9231 (4.9332) time: 1.0687 data: 0.0003 max mem: 25529
Epoch: [38] [1170/1251] eta: 0:01:27 lr: 0.000963 loss: 6.9241 (4.9502) time: 1.0702 data: 0.0003 max mem: 25529
Epoch: [38] [1180/1251] eta: 0:01:16 lr: 0.000963 loss: 6.9240 (4.9669) time: 1.0687 data: 0.0003 max mem: 25529
Epoch: [38] [1190/1251] eta: 0:01:05 lr: 0.000963 loss: 6.9198 (4.9833) time: 1.0668 data: 0.0003 max mem: 25529
Epoch: [38] [1200/1251] eta: 0:00:54 lr: 0.000963 loss: 6.9150 (4.9993) time: 1.0864 data: 0.0003 max mem: 25529
Epoch: [38] [1210/1251] eta: 0:00:44 lr: 0.000963 loss: 6.9144 (5.0152) time: 1.0855 data: 0.0003 max mem: 25529
Epoch: [38] [1220/1251] eta: 0:00:33 lr: 0.000963 loss: 6.9167 (5.0308) time: 1.0714 data: 0.0003 max mem: 25529
Epoch: [38] [1230/1251] eta: 0:00:22 lr: 0.000963 loss: 6.9167 (5.0461) time: 1.0702 data: 0.0003 max mem: 25529
Epoch: [38] [1240/1251] eta: 0:00:11 lr: 0.000963 loss: 6.9135 (5.0612) time: 1.0574 data: 0.0005 max mem: 25529
Epoch: [38] [1250/1251] eta: 0:00:01 lr: 0.000963 loss: 6.9179 (5.0760) time: 1.0532 data: 0.0004 max mem: 25529
Epoch: [38] Total time: 0:22:28 (1.0781 s / it)
Averaged stats: lr: 0.000963 loss: 6.9179 (5.0558)
Test: [ 0/261] eta: 0:31:19 loss: 6.8103 (6.8103) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.0000) time: 7.2018 data: 6.7932 max mem: 25529
Test: [ 10/261] eta: 0:04:17 loss: 6.9766 (6.9290) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.0000) time: 1.0263 data: 0.6262 max mem: 25529
Test: [ 20/261] eta: 0:02:56 loss: 6.9750 (6.9375) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.0000) time: 0.4103 data: 0.0066 max mem: 25529
Test: [ 30/261] eta: 0:02:25 loss: 6.9495 (6.9457) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.0000) time: 0.4091 data: 0.0024 max mem: 25529
Test: [ 40/261] eta: 0:02:06 loss: 6.9158 (6.9258) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.6352) time: 0.4017 data: 0.0010 max mem: 25529
Test: [ 50/261] eta: 0:01:53 loss: 6.8871 (6.9364) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.5106) time: 0.3975 data: 0.0007 max mem: 25529
Test: [ 60/261] eta: 0:01:43 loss: 6.9326 (6.9323) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.4269) time: 0.3969 data: 0.0007 max mem: 25529
Test: [ 70/261] eta: 0:01:35 loss: 6.8942 (6.9268) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.3668) time: 0.3951 data: 0.0016 max mem: 25529
Test: [ 80/261] eta: 0:01:27 loss: 6.8974 (6.9259) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.3215) time: 0.3954 data: 0.0025 max mem: 25529
Test: [ 90/261] eta: 0:01:21 loss: 6.9066 (6.9268) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.2862) time: 0.3983 data: 0.0017 max mem: 25529
Test: [100/261] eta: 0:01:15 loss: 6.9556 (6.9323) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.2578) time: 0.3960 data: 0.0009 max mem: 25529
Test: [110/261] eta: 0:01:09 loss: 6.9268 (6.9298) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.2346) time: 0.3962 data: 0.0010 max mem: 25529
Test: [120/261] eta: 0:01:04 loss: 6.8970 (6.9270) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.2152) time: 0.4211 data: 0.0242 max mem: 25529
Test: [130/261] eta: 0:00:59 loss: 6.8970 (6.9251) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.1988) time: 0.4183 data: 0.0242 max mem: 25529
Test: [140/261] eta: 0:00:54 loss: 6.9251 (6.9268) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.3694) time: 0.3986 data: 0.0018 max mem: 25529
Test: [150/261] eta: 0:00:49 loss: 6.9534 (6.9264) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.3449) time: 0.4021 data: 0.0045 max mem: 25529
Test: [160/261] eta: 0:00:45 loss: 6.8927 (6.9243) acc1: 0.0000 (0.1617) acc5: 0.0000 (0.4852) time: 0.4124 data: 0.0182 max mem: 25529
Test: [170/261] eta: 0:00:40 loss: 6.8886 (6.9231) acc1: 0.0000 (0.1523) acc5: 0.0000 (0.4569) time: 0.4112 data: 0.0157 max mem: 25529
Test: [180/261] eta: 0:00:35 loss: 6.9188 (6.9233) acc1: 0.0000 (0.1439) acc5: 0.0000 (0.4316) time: 0.3997 data: 0.0016 max mem: 25529
Test: [190/261] eta: 0:00:31 loss: 6.9170 (6.9216) acc1: 0.0000 (0.1363) acc5: 0.0000 (0.4090) time: 0.4233 data: 0.0265 max mem: 25529
Test: [200/261] eta: 0:00:26 loss: 6.9137 (6.9224) acc1: 0.0000 (0.1296) acc5: 0.0000 (0.3887) time: 0.4463 data: 0.0536 max mem: 25529
Test: [210/261] eta: 0:00:22 loss: 6.9097 (6.9210) acc1: 0.0000 (0.1234) acc5: 0.0000 (0.3703) time: 0.5000 data: 0.1046 max mem: 25529
Test: [220/261] eta: 0:00:18 loss: 6.8762 (6.9184) acc1: 0.0000 (0.1178) acc5: 0.0000 (0.3535) time: 0.4731 data: 0.0773 max mem: 25529
Test: [230/261] eta: 0:00:13 loss: 6.8775 (6.9185) acc1: 0.0000 (0.1127) acc5: 0.0000 (0.4509) time: 0.3974 data: 0.0048 max mem: 25529
Test: [240/261] eta: 0:00:09 loss: 6.9246 (6.9183) acc1: 0.0000 (0.1081) acc5: 0.0000 (0.4322) time: 0.4009 data: 0.0050 max mem: 25529
Test: [250/261] eta: 0:00:04 loss: 6.9132 (6.9190) acc1: 0.0000 (0.1038) acc5: 0.0000 (0.5188) time: 0.3949 data: 0.0010 max mem: 25529
Test: [260/261] eta: 0:00:00 loss: 6.9128 (6.9180) acc1: 0.0000 (0.1000) acc5: 0.0000 (0.5000) time: 0.3788 data: 0.0001 max mem: 25529
Test: Total time: 0:01:54 (0.4370 s / it)
    • Acc@1 0.100 Acc@5 0.500 loss 6.918
    Accuracy of the network on the 50000 test images: 0.1%
    Max accuracy: 57.01%
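For context when reading the log above, the final test loss (6.918) and accuracies (0.1% top-1, 0.5% top-5) can be compared with what uniform guessing would score. The sketch below only does that arithmetic; it assumes a 1000-class label space (ImageNet-1K), and the function name `chance_level` is made up for illustration. It does not diagnose why the run reached those values.

```python
import math

NUM_CLASSES = 1000  # assumption: ImageNet-1K label space


def chance_level(num_classes):
    """Return (cross-entropy, top-1 %, top-5 %) for a uniform guess over num_classes."""
    loss = math.log(num_classes)          # -log(1 / num_classes)
    top1 = 100.0 / num_classes            # one correct label out of num_classes
    top5 = 100.0 * 5 / num_classes        # five guesses out of num_classes
    return loss, top1, top5


loss, top1, top5 = chance_level(NUM_CLASSES)
print(f"chance loss={loss:.4f} top1={top1:.3f}% top5={top5:.3f}%")
```

Under that assumption the chance-level loss is about 6.908, so the logged values sit very close to those of a model that no longer discriminates between classes.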
    opened by VictorLlu 3
  • pretrained model load

    Hello, I am very interested in your work. I ran into a problem when loading the pretrained model: checkpoint = torch.load(args.finetune, map_location='cpu')

    Debug: pos_embed_checkpoint = checkpoint_model['pos_embed'] fails because the checkpoint has "pos_embed1", "pos_embed2", "pos_embed3" and "pos_embed4", but no "pos_embed".
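A quick way to confirm this kind of mismatch is to list the positional-embedding keys the checkpoint actually contains before indexing into it. The sketch below is a minimal illustration using a plain dictionary as a hypothetical stand-in for the loaded state dict; the helper name `positional_embedding_keys` and the placeholder values are assumptions, not part of the repository.

```python
def positional_embedding_keys(state_dict):
    """Return the keys whose names start with 'pos_embed', sorted."""
    return sorted(k for k in state_dict if k.startswith("pos_embed"))


# Hypothetical stand-in for the loaded checkpoint: values are placeholders,
# not real tensors, and 'some_other.weight' is an invented extra key.
checkpoint_model = {
    "pos_embed1": None,
    "pos_embed2": None,
    "pos_embed3": None,
    "pos_embed4": None,
    "some_other.weight": None,
}

keys = positional_embedding_keys(checkpoint_model)
if "pos_embed" not in checkpoint_model:
    print("no single 'pos_embed' key; found:", keys)
```

With the keys described in the issue, any fine-tuning code that looks up a single 'pos_embed' entry would need to handle each per-stage key separately.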

    opened by surelyee 3
  • Why is there no DETR+PVTv2 in object detection?

    I noticed that there is DETR+PVTv1, although its AP value is not satisfactory. Why is there no implementation of DETR+PVTv2? Is it ineffective, or just not provided yet?

    opened by yuhua666 0
  • Did you train PVT on ImageNet22k?

    Thank you for your great work! As the title says, I would like to know about your ImageNet22k results. I saw a checkpoint of PVT_v2_b5 on imagenet_22k in your release. Is that useful?

    opened by Roger-Liang 0
  • Question about cls token

    Hi author! Thanks for your nice work.

    I have a question about the cls token in PVT.

    In ViT and DeiT, the cls token is appended during the input embedding step, but PVT appends the cls token at the input of the last stage.

    Why doesn't PVT append the cls token during the input embedding step?

    Thanks.

    opened by eremo2002 0
  • without Convolutions?

    The paper proposes a convolution-free architecture, but the implementation contains convolutions. In the PVTv2 paper the authors say that spatial reduction is done with a convolution, but I could not see that in PVTv1. Is there any other way to do it?
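One point that may help frame this question: a convolution whose kernel size equals its stride, with no padding, is arithmetically the same as cutting the input into non-overlapping patches and applying one linear projection to each flattened patch. The sketch below illustrates only that equivalence on a toy single-channel input in plain Python; the function names are invented for the example, and it makes no claim about how either PVT version implements spatial reduction.

```python
def strided_conv(image, kernel, k):
    """Single-channel 2D convolution with kernel size == stride == k, no padding."""
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(k) for b in range(k))
            for j in range(0, cols - k + 1, k)
        ]
        for i in range(0, rows - k + 1, k)
    ]


def patch_linear(image, kernel, k):
    """Cut non-overlapping k x k patches, flatten each, apply one weight vector."""
    weights = [kernel[a][b] for a in range(k) for b in range(k)]
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(0, rows - k + 1, k):
        row = []
        for j in range(0, cols - k + 1, k):
            patch = [image[i + a][j + b] for a in range(k) for b in range(k)]
            row.append(sum(p * w for p, w in zip(patch, weights)))
        out.append(row)
    return out


image = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 feature map
kernel = [[1, 2], [3, 4]]                                  # toy 2x2 weights

conv_out = strided_conv(image, kernel, 2)
linear_out = patch_linear(image, kernel, 2)
print(conv_out == linear_out, conv_out)
```

Both functions reduce the 4x4 input to 2x2 and produce identical values, which is why such a layer can be described either as a strided convolution or as a reshape followed by a linear projection.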

    opened by Oguzhanercan 0
  • Question about pooling size

    Hi @whai362

    I was wondering why the pooling size is set to 7 for all stages? Have you tried a higher pooling size (e.g. more keys and values) for the initial stages while decreasing in later stages?
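To make the trade-off in this question concrete, the sketch below counts query tokens and pooled key/value tokens per stage when the pooled size is fixed at 7. It assumes a 224x224 input and pyramid strides of 4, 8, 16 and 32; those values and the helper name `attention_pairs` are illustrative assumptions rather than something taken from the repository code.

```python
IMAGE_SIZE = 224          # assumption: input resolution
STRIDES = (4, 8, 16, 32)  # assumption: per-stage downsampling strides
POOL_SIZE = 7             # pooling size mentioned in the question


def attention_pairs(image_size, stride, pool_size):
    """Return (queries, keys, query-key pairs) for one stage."""
    side = image_size // stride
    queries = side * side              # one query per feature-map location
    keys = pool_size * pool_size       # keys/values after pooling to a fixed grid
    return queries, keys, queries * keys


stats = [attention_pairs(IMAGE_SIZE, s, POOL_SIZE) for s in STRIDES]
for stage, (q, k, pairs) in enumerate(stats, start=1):
    print(f"stage {stage}: queries={q} keys={k} pairs={pairs}")
```

Under these assumptions every stage attends to the same 49 keys, so the earliest stage compresses its 3136 locations far more aggressively than the last stage, where queries and keys are both 49. A larger pooling size in early stages would raise the pair count there proportionally.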

    opened by magehrig 0
Finding Biological Plausibility for Adversarially Robust Features via Metameric Tasks

Adversarially-Robust-Periphery Code + Data from the paper "Finding Biological Plausibility for Adversarially Robust Features via Metameric Tasks" by A

Anne Harrington 2 Feb 07, 2022
Codes accompanying the paper "Learning Nearly Decomposable Value Functions with Communication Minimization" (ICLR 2020)

NDQ: Learning Nearly Decomposable Value Functions with Communication Minimization Note This codebase accompanies paper Learning Nearly Decomposable Va

Tonghan Wang 69 Nov 26, 2022
Repo for flood prediction using LSTMs and HAND

Abstract Every year, floods cause billions of dollars’ worth of damages to life, crops, and property. With a proper early flood warning system in plac

1 Oct 27, 2021
[NeurIPS-2020] Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID.

Self-paced Contrastive Learning (SpCL) The official repository for Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID

Yixiao Ge 286 Dec 21, 2022
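Get free HTTP proxies and generate a Proxifier configuration file

freeproxy Get free HTTP proxies and generate a Proxifier configuration file. WeChat official account: 台下言书. Tool description: https://mp.weixin.qq.com/s?__biz=MzIyNDkwNjQ5Ng==&mid=2247484425&idx=1&sn=56ccbe130822aa35038095317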

说书人 32 Mar 25, 2022
The King is Naked: on the Notion of Robustness for Natural Language Processing

the-king-is-naked: on the notion of robustness for natural language processing AAAI2022 DISCLAIMER:This repo will be updated soon with instructions on

Iperboreo_ 1 Nov 24, 2022
PHOTONAI is a high level python API for designing and optimizing machine learning pipelines.

PHOTONAI is a high level python API for designing and optimizing machine learning pipelines. We've created a system in which you can easily select and

Medical Machine Learning Lab - University of Münster 57 Nov 12, 2022
Implementation of light baking system for ray tracing based on Activision's UberBake

Vulkan Light Bakary MSU Graphics Group Student's Diploma Project Treefonov Andrey [GitHub] [LinkedIn] Project Goal The goal of the project is to imple

Andrey Treefonov 7 Dec 27, 2022
The code from the paper Character Transformations for Non-Autoregressive GEC Tagging

Character Transformations for Non-Autoregressive GEC Tagging Milan Straka, Jakub Náplava, Jana Straková Charles University Faculty of Mathematics and

ÚFAL 5 Dec 10, 2022
GAN Image Generator and Characterwise Image Recognizer with python

MODEL SUMMARY The model's structure is broadly divided into 6 stages. STEP 0: Input Image The image to predict is fed into the model. STEP 1: Make Black and White Image STEP 1 turns the text of the input image black and the background

Juwan HAN 1 Feb 09, 2022
A Dynamic Residual Self-Attention Network for Lightweight Single Image Super-Resolution

DRSAN A Dynamic Residual Self-Attention Network for Lightweight Single Image Super-Resolution Karam Park, Jae Woong Soh, and Nam Ik Cho Environments U

4 May 10, 2022
This is code to fit per-pixel environment map with spherical Gaussian lobes, using LBFGS optimization

Spherical Gaussian Optimization This is code to fit per-pixel environment map with spherical Gaussian lobes, using LBFGS optimization. This code has b

41 Dec 14, 2022
Self-supervised Multi-modal Hybrid Fusion Network for Brain Tumor Segmentation

JBHI-Pytorch This repository contains a reference implementation of the algorithms described in our paper "Self-supervised Multi-modal Hybrid Fusion N

FeiyiFANG 5 Dec 13, 2021
Multi-agent reinforcement learning algorithm and environment

Multi-agent reinforcement learning algorithm and environment [en/cn] Pytorch implements multi-agent reinforcement learning algorithms including IQL, Q

万鲲鹏 7 Sep 20, 2022
Trainable PyTorch reproduction of AlphaFold 2

OpenFold A faithful PyTorch reproduction of DeepMind's AlphaFold 2. Features OpenFold carefully reproduces (almost) all of the features of the origina

AQ Laboratory 1.7k Dec 29, 2022
Implementation of "JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting"

JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting Pytorch implementation for the paper "JOKR: Joint Keypoint Repres

45 Dec 25, 2022
Make your AirPlay devices as TTS speakers

Apple AirPlayer Home Assistant integration component, make your AirPlay devices as TTS speakers. Before Use 2021.6.X or earlier Apple Airplayer compon

George Zhao 117 Dec 15, 2022
The official implementation of NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021]. https://arxiv.org/pdf/2101.12378.pdf

NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021] Release Notes The official PyTorch implementation of NeMo, p

Angtian Wang 76 Nov 23, 2022
Implementation for paper "STAR: A Structure-aware Lightweight Transformer for Real-time Image Enhancement" (ICCV 2021).

STAR-pytorch Implementation for paper "STAR: A Structure-aware Lightweight Transformer for Real-time Image Enhancement" (ICCV 2021). CVF (pdf) STAR-DC

43 Dec 21, 2022
catch-22: CAnonical Time-series CHaracteristics

catch22 - CAnonical Time-series CHaracteristics About catch22 is a collection of 22 time-series features coded in C that can be run from Python, R, Ma

Carl H Lubba 229 Oct 21, 2022