3D ResNets for Action Recognition (CVPR 2018)

Overview

Update (2020/4/13)

We published a paper on arXiv.

Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh,
"Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs",
arXiv preprint, arXiv:2004.04968, 2020.

We uploaded the pretrained models described in this paper, including a ResNet-50 pretrained on the combined Kinetics-700 and Moments in Time dataset.

Update (2020/4/10)

We significantly updated our scripts. If you want to use older versions to reproduce our CVPR2018 paper, you should use the scripts in the CVPR2018 branch.

This update includes the following:

  • Refactoring the whole project
  • Supporting newer PyTorch versions
  • Supporting distributed training
  • Supporting training and testing on the Moments in Time dataset
  • Adding R(2+1)D models
  • Uploading 3D ResNet models trained on the Kinetics-700, Moments in Time, and STAIR-Actions datasets

Summary

This is the PyTorch code for the following papers:

Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh,
"Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs",
arXiv preprint, arXiv:2004.04968, 2020.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Towards Good Practice for Action Recognition with Spatiotemporal 3D Convolutions",
Proceedings of the International Conference on Pattern Recognition, pp. 2516-2521, 2018.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?",
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6546-6555, 2018.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition",
Proceedings of the ICCV Workshop on Action, Gesture, and Emotion Recognition, 2017.

This code includes training, fine-tuning and testing on Kinetics, Moments in Time, ActivityNet, UCF-101, and HMDB-51.

Citation

If you use this code or pre-trained models, please cite the following:

@inproceedings{hara3dcnns,
  author={Kensho Hara and Hirokatsu Kataoka and Yutaka Satoh},
  title={Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={6546--6555},
  year={2018},
}

Pre-trained models

Pre-trained models are available here.
All models are trained on Kinetics-700 (K), Moments in Time (M), STAIR-Actions (S), or merged combinations of them (KM, KS, MS, KMS).
If you want to fine-tune the models on your dataset, you should specify the following options.

r3d18_K_200ep.pth: --model resnet --model_depth 18 --n_pretrain_classes 700
r3d18_KM_200ep.pth: --model resnet --model_depth 18 --n_pretrain_classes 1039
r3d34_K_200ep.pth: --model resnet --model_depth 34 --n_pretrain_classes 700
r3d34_KM_200ep.pth: --model resnet --model_depth 34 --n_pretrain_classes 1039
r3d50_K_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 700
r3d50_KM_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 1039
r3d50_KMS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 1139
r3d50_KS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 800
r3d50_M_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 339
r3d50_MS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 439
r3d50_S_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 100
r3d101_K_200ep.pth: --model resnet --model_depth 101 --n_pretrain_classes 700
r3d101_KM_200ep.pth: --model resnet --model_depth 101 --n_pretrain_classes 1039
r3d152_K_200ep.pth: --model resnet --model_depth 152 --n_pretrain_classes 700
r3d152_KM_200ep.pth: --model resnet --model_depth 152 --n_pretrain_classes 1039
r3d200_K_200ep.pth: --model resnet --model_depth 200 --n_pretrain_classes 700
r3d200_KM_200ep.pth: --model resnet --model_depth 200 --n_pretrain_classes 1039
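
For example, fine-tuning r3d50_KM_200ep.pth on UCF-101 would combine these options with the fine-tuning command shown later in this README (all paths here are placeholders):

python main.py --root_path ~/data --video_path ucf101_videos/jpg --annotation_path ucf101_01.json \
--result_path results --dataset ucf101 --n_classes 101 --n_pretrain_classes 1039 \
--pretrain_path models/r3d50_KM_200ep.pth --ft_begin_module fc \
--model resnet --model_depth 50 --batch_size 128 --n_threads 4 --checkpoint 5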

Old pretrained models are still available here.
However, some modifications are required to use the old pretrained models in the current scripts.
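
As one example of such a modification, here is a minimal conversion sketch. It assumes (an assumption, not documented behavior) that the old checkpoints were saved from an nn.DataParallel-wrapped model, so their parameter names carry a "module." prefix that the current single-module scripts do not expect:

import torch

# Hypothetical conversion sketch (not a script shipped with this repo).
# Assumption: the old checkpoint stores DataParallel parameter names such as
# "module.conv1.weight"; stripping the prefix makes them loadable by a plain module.
checkpoint = torch.load('resnet-50-kinetics.pth', map_location='cpu')
checkpoint['state_dict'] = {
    key.replace('module.', '', 1): value
    for key, value in checkpoint['state_dict'].items()
}
torch.save(checkpoint, 'resnet-50-kinetics-converted.pth')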

Requirements

  • PyTorch (with torchvision)

conda install pytorch torchvision cudatoolkit=10.1 -c soumith

  • FFmpeg, FFprobe

  • Python 3

Preparation

ActivityNet

  • Download videos using the official crawler.
  • Convert from mp4 to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs mp4_video_dir_path jpg_video_dir_path activitynet
  • Add fps information into the json file using util_scripts/add_fps_into_activitynet_json.py
python -m util_scripts.add_fps_into_activitynet_json mp4_video_dir_path json_file_path

Kinetics

  • Download videos using the official crawler.
    • Locate test set in video_directory/test.
  • Convert from mp4 to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs mp4_video_dir_path jpg_video_dir_path kinetics
  • Generate annotation file in json format similar to ActivityNet using util_scripts/kinetics_json.py
    • The CSV files (kinetics_{train, val, test}.csv) are included in the crawler.
python -m util_scripts.kinetics_json csv_dir_path 700 jpg_video_dir_path jpg dst_json_path
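
The generated json follows the ActivityNet-style layout; roughly, it has the shape sketched below (the field names here are illustrative, so check the output of util_scripts/kinetics_json.py for the exact schema):

# Approximate structure of the generated annotation file (illustrative sketch).
annotation = {
    'labels': ['abseiling', 'air drumming'],   # list of all class names
    'database': {
        'some_video_id': {
            'subset': 'training',              # or 'validation' / 'testing'
            'annotations': {'label': 'abseiling'},
        },
    },
}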

UCF-101

  • Download videos and train/test splits here.
  • Convert from avi to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs avi_video_dir_path jpg_video_dir_path ucf101
  • Generate annotation file in json format similar to ActivityNet using util_scripts/ucf101_json.py
    • annotation_dir_path includes classInd.txt, trainlist0{1, 2, 3}.txt, testlist0{1, 2, 3}.txt
python -m util_scripts.ucf101_json annotation_dir_path jpg_video_dir_path dst_json_path

HMDB-51

  • Download videos and train/test splits here.
  • Convert from avi to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs avi_video_dir_path jpg_video_dir_path hmdb51
  • Generate annotation file in json format similar to ActivityNet using util_scripts/hmdb51_json.py
    • annotation_dir_path includes brush_hair_test_split1.txt, ...
python -m util_scripts.hmdb51_json annotation_dir_path jpg_video_dir_path dst_json_path

Running the code

Assume the structure of data directories is the following:

~/
  data/
    kinetics_videos/
      jpg/
        .../ (directories of class names)
          .../ (directories of video names)
            ... (jpg files)
    results/
      save_100.pth
    kinetics.json

Confirm all options.

python main.py -h

Train ResNet-50 on the Kinetics-700 dataset (700 classes) with 4 CPU threads (for data loading).
The batch size is 128.
Models are saved every 5 epochs. All GPUs are used for training. If you want to use only some of the GPUs, set CUDA_VISIBLE_DEVICES=....

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --model resnet \
--model_depth 50 --n_classes 700 --batch_size 128 --n_threads 4 --checkpoint 5

Continue training from epoch 101 (~/data/results/save_100.pth is loaded).

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --resume_path results/save_100.pth \
--model_depth 50 --n_classes 700 --batch_size 128 --n_threads 4 --checkpoint 5

Calculate the top-5 class probabilities of each video using a trained model (~/data/results/save_200.pth).
Note that inference_batch_size should be small, because the actual batch size is calculated as inference_batch_size * (n_video_frames / inference_stride).

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --resume_path results/save_200.pth \
--model_depth 50 --n_classes 700 --n_threads 4 --no_train --no_val --inference --output_topk 5 --inference_batch_size 1
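
As a rough worked example of that note (the numbers below are hypothetical):

# Hypothetical numbers illustrating the effective inference batch size.
inference_batch_size = 1
n_video_frames = 160       # total frames in one video
inference_stride = 16      # default --inference_stride

# The video is split into n_video_frames / inference_stride clips,
# and this many clips are fed through the model at once:
effective_batch_size = inference_batch_size * (n_video_frames // inference_stride)
print(effective_batch_size)  # -> 10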

Evaluate top-1 video accuracy of a recognition result (~/data/results/val.json).

python -m util_scripts.eval_accuracy ~/data/kinetics.json ~/data/results/val.json --subset val -k 1 --ignore

Fine-tune fc layers of a pretrained model (~/data/models/resnet-50-kinetics.pth) on UCF-101.

python main.py --root_path ~/data --video_path ucf101_videos/jpg --annotation_path ucf101_01.json \
--result_path results --dataset ucf101 --n_classes 101 --n_pretrain_classes 700 \
--pretrain_path models/resnet-50-kinetics.pth --ft_begin_module fc \
--model resnet --model_depth 50 --batch_size 128 --n_threads 4 --checkpoint 5
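
For intuition, --ft_begin_module controls which part of the network is trained. Below is a minimal sketch of the idea as a generic PyTorch illustration (an assumption about the behavior, not this repo's exact helper):

import torch.nn as nn

def fine_tuning_parameters(model: nn.Module, ft_begin_module: str):
    # Illustrative sketch: collect parameters from ft_begin_module onward,
    # relying on named_parameters() returning parameters in module order.
    if not ft_begin_module:        # empty string: fine-tune the whole network
        return list(model.parameters())
    params, begun = [], False
    for name, param in model.named_parameters():
        if name.startswith(ft_begin_module):
            begun = True           # reached the first module to fine-tune
        if begun:
            params.append(param)   # for 'fc', only fc.weight and fc.bias remain
    return params
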
Comments
  • question about the 'Temporal duration of inputs'

    In opts.py, can I change the temporal duration of inputs in parser.add_argument('--sample_duration', default=16, type=int, help='Temporal duration of inputs'), e.g. to 32 or 64 frames? Have you run similar experiments? I would really appreciate your reply. Thanks.

    opened by sophiazy 25
  • Performance of pretrained weights on UCF101

    Hi, nice work! I have a question about your results on UCF101 split 1. I evaluated your pretrained weights "resnext-101-kinetics-ucf101_split1.pth" on UCF101 split 1 and got an accuracy of ~85.99%. I'm wondering whether that is the correct accuracy. Would you please provide the accuracies of the pretrained models?

    opened by MohsenFayyaz89 21
  • Train from scratch on UCF101 using ResNet18 and get 10% gain without doing anything

    I trained and evaluated the code and got a 10% gain without changing anything. Here is my process:

    1. Parse the video data using the code from the README.
    2. Train the model using python3 main.py --root_path ./datasets/ --video_path UCF101/jpg --annotation_path ucf101_01.json --result_path results --dataset ucf101 --model resnet --model_depth 18 --n_classes 101 --batch_size 16 and get the resulting model datasets/results/save_200.pth.
    3. Test the dataset using python3 main.py --root_path ./datasets/ --video_path UCF101/jpg --annotation_path ucf101_01.json --result_path results --dataset ucf101 --resume_path results/save_200.pth --model resnet --model_depth 18 --n_classes 101 --batch_size 16 --no_train --test and get the result in val.json; the command window says the clip accuracy is 0.346.
    4. Get the accuracy using eval_ucf101.py to compare ucf101_01.json with val.json; the top-1 video accuracy is 52.66%, which is about 10% over the 42.4% reported in the paper.

    I only use UCF101 split_01, so there are no overlapping videos between the train and test data. It is a little strange; is it because the training procedure in the paper did not last for 200 epochs? My platform is PyTorch 0.4 and I only modified one place to avoid an error, which is reported in another issue.

    opened by BestJuly 16
  • RuntimeError: invalid argument 1: must be strictly positive at /opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/TH/generic/THTensorMath.c:2247

    Hi, I need help. When running main.py, everything goes well up to dataset loading, as shown below:

    model generated
    dataset loading [0/9537] dataset loading [1000/9537] dataset loading [2000/9537] dataset loading [3000/9537] dataset loading [4000/9537] dataset loading [5000/9537] dataset loading [6000/9537] dataset loading [7000/9537] dataset loading [8000/9537] dataset loading [9000/9537]
    dataset loading [0/3783] dataset loading [1000/3783] dataset loading [2000/3783] dataset loading [3000/3783]
    run

    The error occurred here:

    train at epoch 1
    Traceback (most recent call last):
      File "/media/psrana/New Volume/chandni/HAR_3D_TU/main.py", line 139, in <module>
        train_logger, train_batch_logger)
      File "/media/psrana/New Volume/chandni/HAR_3D_TU/train.py", line 22, in train_epoch
        for i, (inputs, targets) in enumerate(data_loader):
      File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 417, in __iter__
        return DataLoaderIter(self)
      File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 242, in __init__
        self._put_indices()
      File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 290, in _put_indices
        indices = next(self.sample_iter, None)
      File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 119, in __iter__
        for idx in self.sampler:
      File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 50, in __iter__
        return iter(torch.randperm(len(self.data_source)).long())
    RuntimeError: invalid argument 1: must be strictly positive at /opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/TH/generic/THTensorMath.c:2247

    What could be the reason?

    opened by chandnikathuria1992 12
  • Very Slow Training

    I am training a ResNet with depth 34 on the Kinetics dataset, but the training procedure is not improving anything. How long does it take until the model starts improving? I have attached a screenshot; currently I am at epoch 34 but the loss is still 5.99 and not decreasing, and the accuracy is very volatile.

    opened by cryptedp 11
  • RuntimeError: expected a non-empty list of Tensors

    Traceback (most recent call last):
      File "main.py", line 129, in <module>
        train_logger, train_batch_logger)
      File "/home/hareesh/Downloads/3D-ResNets-PyTorch-master/train.py", line 22, in train_epoch
        for i, (inputs, targets) in enumerate(data_loader):
      File "/home/hareesh/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 286, in __next__
        return self._process_next_batch(batch)
      File "/home/hareesh/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
        raise batch.exc_type(batch.exc_msg)
    RuntimeError: Traceback (most recent call last):
      File "/home/hareesh/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/home/hareesh/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 57, in <listcomp>
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/home/hareesh/Downloads/3D-ResNets-PyTorch-master/datasets/ucf101.py", line 193, in __getitem__
        clip = torch.stack(clip, 0).permute(1, 0, 2, 3)
    RuntimeError: expected a non-empty list of Tensors

    Please let me know the cause of this error.

    opened by hareeshdevarakonda 10
  • Input of Densenet

    Thank you for your wonderful work. I just read the paper; it notes that each clip contains 16 frames. I read two other papers in which the authors claim that a 32-frame input would be better. Have you tried 32-frame inputs? If you have trained such models, could you please release the pretrained weights?

    opened by Tord-Zhang 10
  • Asking about the using of 3D ResNet on video sequence

    Hello,

    I'm new to this kind of 3D convolution, so I'm trying to understand how it works. My dataset (UNBC-McMaster) includes videos containing sequences of frames. For each frame, we have one pain intensity level. Now, I want to use 3D ResNet to predict the pain level as a regression problem. So, let's say we have a sequence of 32 frames, which means I have 32 labels for this sequence. Normally, with CNN + LSTM, I would use the CNN to extract features, put them through the LSTM, and take the output and the label of the last frame to compute the loss. So, for 3D ResNet, should I take the output of the model and the label of the last frame to calculate the loss?

    opened by glmanhtu 9
  • i have some problem do a test

    I would like to test the network on UCF101 after fine-tuning it on UCF101 using the pretrained Kinetics model you provided.

    I use resnet18, and I want to get the accuracy per video, not per clip.

    I use the command below:

    python main.py --root_path UCF101 --video_path jpg --annotation_path ucf101_01.json --result_path test --dataset ucf101 --model resnet --model_depth 18 --n_classes 101 --batch_size 64 --n_threads 4 --pretrain_path 18result1s/save_200.pth --no_train --no_val --test --test_subset val --n_finetune_classes 101

    jpg is the same as your data directory. 18results1s/save_200.pth is the network pretrained on Kinetics and fine-tuned on UCF101.

    It gives an error:

    run dataset loading [0/3783] dataset loading [1000/3783] dataset loading [2000/3783] dataset loading [3000/3783] test
    test.py:42: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead. inputs = Variable(inputs, volatile=True)
    test.py:45: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument. outputs = F.softmax(outputs)
    Traceback (most recent call last):
      File "main.py", line 162, in <module>
        test.test(test_loader, model, opt, test_data.class_names)
      File "test.py", line 50, in test
        test_results, class_names)
      File "test.py", line 20, in calculate_video_results
        'label': class_names[locs[i]],
    KeyError: tensor(12)

    KeyError: tensor(12). When I change my command, the number in parentheses changes.

    Can you help me?

    opened by lee2h 9
  • Performance of fine-tuning on UCF101

    I downloaded the ResNet-101 network pretrained on Kinetics and fine-tuned it on UCF101 following the example script. However, I can only get 82.5% by averaging over the three splits. In the paper, the authors report 88.9%. Any suggestions?

    opened by zhihuilics 9
  • Pretrain models cannot download

    Hello, I am making a demo of a 3D convolutional network. After reading your CVPR paper, I was happy to use your pretrained models. However, when I try to open your link, I get "The folder has been put in the recycle bin.". These pretrained models are extremely important for my program, because I don't have many GPUs to train the network from scratch. Please give me a chance to use the 3D ResNet...... @kenshohara

    opened by KeCh96 8
  • main.py: error: unrecognized arguments: hmdb51_3.json

    I'm trying to train a ResNet on HMDB-51, so I have 3 json files. But whichever one I pass with the --annotation_path argument, it keeps giving this error. Please help!

    opened by soumyadbanik 2
  • ABOUT ucf101_json

    I used the Python script to generate the three json files of UCF101: ucf101_01.json, ucf101_02.json, and ucf101_03.json. I ran the main function to train a ResNet:

    python main.py --root_path ./data --video_path ucf101_jpg/ --annotation_path ucf101_json/ucf101_01.json --result_path results --dataset ucf101 --n_classes 101 --model resnet --model_depth 50 --batch_size 64 --n_threads 4 --checkpoint 5

    I want to know when ucf101_02.json and ucf101_03.json should be used. Thank you very much!

    opened by theones-g 0
  • Image Resolution 112*112

    I want to be able to input larger image resolutions. However, when I input an image size of 480*480, it takes almost 10 minutes to process a tiny 10-second clip.

    It seems that when I increase the image size, the model's inference run-time becomes exponentially greater.

    Crucial motion information is lost when I downscale my images to 112*112, and it is affecting the precision of the model on my test sets.

    Is there any alternative model or method that would allow me to proceed with larger image resolutions using the 3D-ResNet model?

    Is it practical to use a 3D CNN with 480*480 input images for video classification tasks?

    opened by darshvirbelandis 1
  • Why is opt.n_val_samples 3

    I found that when fine-tuning on UCF101 with the split-1 partition, the number of validation samples was 11349 instead of 3783 (each of the 3783 validation videos is sampled n_val_samples = 3 times, and 3783 × 3 = 11349). Also, why is the validation batch size opt.batch_size // opt.n_val_samples?

    opened by YTHmamba 0
  • AssertionError when I inference

    I used r2p1d18_K_200ep.pth and fine-tuned it on the hmdb51 dataset, and when I want to use it for inference there is an AssertionError:

    CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --root_path /home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data --video_path hmdb51-videos/jpg --annotation_path hmdb51_1.json --result_path results --dataset hmdb51 --resume_path results/save_200.pth --model_depth 18 --n_classes 51 --n_threads 4 --no_train --no_val --inference --output_topk 5 --inference_batch_size 1

    Namespace(accimage=False, annotation_path=PosixPath('/home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data/hmdb51_1.json'), arch='resnet-18', batch_size=128, batchnorm_sync=False, begin_epoch=1, checkpoint=10, colorjitter=False, conv1_t_size=7, conv1_t_stride=1, dampening=0.0, dataset='hmdb51', dist_url='tcp://127.0.0.1:23456', distributed=False, file_type='jpg', ft_begin_module='', inference=True, inference_batch_size=1, inference_crop='center', inference_no_average=False, inference_stride=16, inference_subset='val', input_type='rgb', learning_rate=0.1, lr_scheduler='multistep', manual_seed=1, mean=[0.4345, 0.4051, 0.3775], mean_dataset='kinetics', model='resnet', model_depth=18, momentum=0.9, multistep_milestones=[50, 100, 150], n_classes=51, n_epochs=200, n_input_channels=3, n_pretrain_classes=0, n_threads=4, n_val_samples=3, nesterov=False, no_cuda=False, no_hflip=False, no_max_pool=False, no_mean_norm=False, no_std_norm=False, no_train=True, no_val=True, optimizer='sgd', output_topk=5, overwrite_milestones=False, plateau_patience=10, pretrain_path=None, resnet_shortcut='B', resnet_widen_factor=1.0, resnext_cardinality=32, result_path=PosixPath('/home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data/results'), resume_path=PosixPath('/home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data/results/save_200.pth'), root_path=PosixPath('/home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data'), sample_duration=16, sample_size=112, sample_t_stride=1, std=[0.2768, 0.2713, 0.2737], tensorboard=False, train_crop='random', train_crop_min_ratio=0.75, train_crop_min_scale=0.25, train_t_crop='random', value_scale=1, video_path=PosixPath('/home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data/hmdb51-videos/jpg'), weight_decay=0.001, wide_resnet_k=2, world_size=-1)

    loading checkpoint /home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data/results/save_200.pth model
    Traceback (most recent call last):
      File "main.py", line 428, in <module>
        main_worker(-1, opt)
      File "main.py", line 345, in main_worker
        model = resume_model(opt.resume_path, opt.arch, model)
      File "main.py", line 89, in resume_model
        assert arch == checkpoint['arch']
    AssertionError

    opened by z369437558 0
Owner
Kensho Hara