Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, among other papers.

Overview

An Image Captioning codebase

This is a codebase for image captioning research.

It supports a range of captioning models (FC, attention-based, Transformer) and training objectives (cross-entropy, self-critical, structure loss); see the sections below and MODEL_ZOO.md.

A simple demo Colab notebook is available here.

Requirements

  • Python 3
  • PyTorch 1.3+ (along with torchvision)
  • cider (included as a submodule)
  • coco-caption (included as a submodule; remember to follow the initialization steps in coco-caption/README.md)
  • yacs
  • lmdbdict

Install

If you have difficulty running the training scripts in tools, you can try installing this repo as a Python package:

python -m pip install -e .
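
After installing, you can quickly check that the package is importable (the package name captioning is assumed from the repository layout):

$ python -c "import captioning"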

Pretrained models

Check out MODEL_ZOO.md.

If you only want to run evaluation, you can follow this section after downloading the pretrained models (plus either the pretrained resnet101 or the precomputed bottom-up features; see data/README.md).

Train your own network on COCO/Flickr30k

Prepare data.

We now support both flickr30k and COCO. See details in data/README.md. (Note: the later sections assume the COCO dataset; it should be trivial to use flickr30k instead.)

Start training

$ python tools/train.py --id fc --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_fc --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 30

or

$ python tools/train.py --cfg configs/fc.yml --id fc

The train script will dump checkpoints into the folder specified by --checkpoint_path (default = log_$id/). By default, only the best-performing checkpoint on validation and the latest checkpoint are saved, to conserve disk space. You can also set --save_history_ckpt to 1 to save every checkpoint.

To resume training, set the --start_from option to the path containing infos.pkl and model.pth (usually you can just set --start_from and --checkpoint_path to the same directory).
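
For example, to resume the fc run from above (a sketch assuming the default log_fc directory produced by the earlier command):

$ python tools/train.py --cfg configs/fc.yml --id fc --start_from log_fc --checkpoint_path log_fc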

To inspect the training and validation curves, you can use TensorBoard. The loss histories are automatically dumped into --checkpoint_path.
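
For example, assuming checkpoints were written to log_fc:

$ tensorboard --logdir log_fc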

The command above uses scheduled sampling; you can set --scheduled_sampling_start to -1 to turn scheduled sampling off.

If you'd like to evaluate BLEU/METEOR/CIDEr scores during training in addition to the validation cross-entropy loss, use the --language_eval 1 option, but don't forget to pull the coco-caption submodule (see the command below).
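
If you haven't pulled the submodules yet, the standard git command should fetch them (remember to also follow the setup steps in coco-caption/README.md):

$ git submodule update --init --recursive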

For all the arguments, you can specify them in a yaml file and pass it via --cfg. Command-line arguments override the cfg file when they conflict.
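
For example, the following sketch keeps the settings from configs/fc.yml but overrides the batch size on the command line (the value here is only for illustration):

$ python tools/train.py --cfg configs/fc.yml --id fc --batch_size 16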

For more options, see opts.py.

Train using self critical

First you should preprocess the dataset and build the n-gram cache used for computing the CIDEr score:

$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train

Then copy the model pretrained with cross entropy. (Copying is not mandatory; it just keeps a backup.)

$ bash scripts/copy_model.sh fc fc_rl

Then run:

$ python tools/train.py --id fc_rl --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-5 --start_from log_fc_rl --checkpoint_path log_fc_rl --save_checkpoint_every 6000 --language_eval 1 --val_images_use 5000 --self_critical_after 30 --cached_tokens coco-train-idxs --max_epochs 50 --train_sample_n 5

or

$ python tools/train.py --cfg configs/fc_rl.yml --id fc_rl

You should see a large boost in CIDEr score. :)

A few notes on training: starting self-critical training after 30 epochs, the CIDEr score goes up to 1.05 after 600k iterations (including the 30 epochs of pretraining).

Generate image captions

Evaluate on raw images

Note: this doesn't work for models trained with bottom-up features. Place all your images of interest into a folder, e.g. blah, and run the eval script:

$ python tools/eval.py --model model.pth --infos_path infos.pkl --image_folder blah --num_images 10

This tells the eval script to run on up to 10 images from the given folder. If you have a big GPU you can speed up the evaluation by increasing batch_size. Use --num_images -1 to process all images. The eval script will create a vis.json file inside the vis folder, which can then be visualized with the provided HTML interface:

$ cd vis
$ python -m http.server  # on Python 2, use: python -m SimpleHTTPServer

Now visit localhost:8000 in your browser and you should see your predicted captions.

Evaluate on Karpathy's test split

$ python tools/eval.py --dump_images 0 --num_images 5000 --model model.pth --infos_path infos.pkl --language_eval 1 

The default split to evaluate is test. The default inference method is greedy decoding (--sample_method greedy); to sample from the posterior, set --sample_method sample.

Beam Search. Beam search can improve performance over greedy decoding by roughly 5%, at some extra computational cost. To turn on beam search, use --beam_size N, where N is greater than 1.
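
For example, a sketch of evaluating the Karpathy test split with a beam of 5, based on the command above:

$ python tools/eval.py --dump_images 0 --num_images 5000 --model model.pth --infos_path infos.pkl --language_eval 1 --beam_size 5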

Evaluate on COCO test set

$ python tools/eval.py --input_json cocotest.json --input_fc_dir data/cocotest_bu_fc --input_att_dir data/cocotest_bu_att --input_label_h5 none --num_images -1 --model model.pth --infos_path infos.pkl --language_eval 0

You can download the preprocessed files cocotest.json, cocotest_bu_att, and cocotest_bu_fc from link.

Miscellanea

Using CPU. The code currently uses GPU by default; there is no option for switching. If someone really needs a CPU model, please open an issue; I can potentially create a CPU checkpoint and modify eval.py to run the model on CPU. However, there is no point in using CPUs to train the model.

Train on other datasets. It should be trivial to port if you can create a file like dataset_coco.json for your own dataset.

Live demo. Not supported yet; pull requests are welcome.

For more advanced features:

Check out ADVANCED.md.

Reference

If you find this repo useful, please consider citing (no obligation at all):

@article{luo2018discriminability,
  title={Discriminability objective for training descriptive captions},
  author={Luo, Ruotian and Price, Brian and Cohen, Scott and Shakhnarovich, Gregory},
  journal={arXiv preprint arXiv:1803.04376},
  year={2018}
}

Of course, please cite the original papers of the models you are using (you can find references in the model files).

Acknowledgements

Thanks to the original neuraltalk2 and the awesome PyTorch team.

Comments
  • A question about the bottom-up and top-down model

    Hi, the bottom-up and top-down paper reports reaching CIDEr 120.1 after 60k iterations (about 9 hours of training). I changed your model's hyperparameters to the ones in the paper and changed the attention LSTM's input to the mean of the per-bounding-box image features, but I still cannot reach the paper's results. In your experience, what could be the reason?

    opened by JimLee4530 26
  • The CIDEr score gets smaller than the pretrained model

    Hi, I am trying to reproduce your code in TensorFlow. I wrote my code almost the same as yours, but I find the CIDEr score keeps getting smaller. At the same time, the sampled sentences also get shorter. Can you give me some advice?

    opened by wjb123 23
  • Structure loss

    Hi,

    I have a question regarding the structure loss in su+bu+structure. Is the structure loss weight similar to the lambda in your discriminability paper, i.e. does it scale the reward against the cross-entropy term? And if so, can each of the rewards have a different weight?

    Thank you.

    opened by mememimis 20
  • About multi-GPU training

    Hi, when I tried to train the fc model on 2 GPUs by setting the environment variable CUDA_VISIBLE_DEVICES=0,1, I got an error at train.py#125, but there were no errors when running your code on a single GPU.

    Does this repository support multi-GPU training?

    opened by xuyan1115 16
  • Evaluation error using pre-trained model

    Hi Ruotian,

    I tried to run the following script: python eval.py --model resnet50.pth --infos_path infos.pkl --image_folder ./image/val2014_coco/ --num_images 1

    The model was resnet50 (and resnet101) downloaded from your Google Drive. But I got the error:

        Traceback (most recent call last):
          File "eval.py", line 102, in <module>
            model.load_state_dict(torch.load(opt.model))
          File "/home/wentong/miniconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 522, in load_state_dict
            .format(name))
        KeyError: 'unexpected key "conv1.weight" in state_dict'

    I have searched online but there was little information about that. I guess you have used multiple gpus.

    Any advice? Thank you for your implementation.

    opened by Wentong-DST 15
  • CUDA out of memory in self-critical training but not in xe

    Hello. I am using GTX2080 Ti with 11GB memory. I trained the model using xe and it works fine. But then when I run self-critical, I get CUDA out of memory. How come the model can fit in xe training but not in rl training? It is the same model with the same number of parameters. Any advice would help

    opened by homelifes 12
  • Issue when training

    I am trying to train the model as per the instructions given in the repo. I am getting the error below:

    [[email protected] self-critical.pytorch]$ python tools/train.py --id fc --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_fc --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 30
    Hugginface transformers not installed; please visit https://github.com/huggingface/transformers
    meshed-memory-transformer not installed; please run pip install git+https://github.com/ruotianluo/meshed-memory-transformer.git
    DataLoader loading json file: data/cocotalk.json
    vocab size is 9487
    DataLoader loading h5 file: data/cocotalk_fc data/cocotalk_att data/cocotalk_box data/cocotalk_label.h5
    max sequence length in data is 16
    read 123287 image features
    assigned 113287 images to split train
    assigned 5000 images to split val
    assigned 5000 images to split test
    /home/default/ephemeral_drive/work/image_captioning/self-critical.pytorch/captioning/data/dataloader.py:291: RuntimeWarning: Mean of empty slice.
      fc_feat = att_feat.mean(0)
    (the warning above is repeated many times)
    2020-07-27 22:49:15.536672: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
    Read data: 0.0002117156982421875
    Traceback (most recent call last):
      File "tools/train.py", line 289, in <module>
        train(opt)
      File "tools/train.py", line 182, in train
        model_out = dp_lw_model(fc_feats, att_feats, labels, masks, att_masks, data['gts'], torch.arange(0, len(data['gts'])), sc_flag, struc_flag)
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
        outputs = self.parallel_apply(replicas, inputs, kwargs)
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
        output.reraise()
      File "/usr/local/lib64/python3.6/site-packages/torch/_utils.py", line 395, in reraise
        raise self.exc_type(msg)
    StopIteration: Caught StopIteration in replica 0 on device 0.
    Original Traceback (most recent call last):
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
        output = module(*input, **kwargs)
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/default/ephemeral_drive/work/image_captioning/self-critical.pytorch/captioning/modules/loss_wrapper.py", line 45, in forward
        loss = self.crit(self.model(fc_feats, att_feats, labels[..., :-1], att_masks), labels[..., 1:], masks[..., 1:])
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/default/ephemeral_drive/work/image_captioning/self-critical.pytorch/captioning/models/CaptionModel.py", line 33, in forward
        return getattr(self, '_'+mode)(*args, **kwargs)
      File "/home/default/ephemeral_drive/work/image_captioning/self-critical.pytorch/captioning/models/AttModel.py", line 128, in _forward
        state = self.init_hidden(batch_size*seq_per_img)
      File "/home/default/ephemeral_drive/work/image_captioning/self-critical.pytorch/captioning/models/AttModel.py", line 99, in init_hidden
        weight = next(self.parameters())
    StopIteration

    PyTorch version: '1.5.0+cu101'

    I saw a related bug report, https://github.com/huggingface/transformers/issues/3936, which describes a similar issue. Not sure if it is related.

    opened by gsrivas4 11
  • Transformer performance

    Hello, training with transformer.yml I can only reach Bleu_1: 0.749 Bleu_2: 0.584 Bleu_3: 0.443 Bleu_4: 0.336 METEOR: 0.271 ROUGE_L: 0.553 CIDEr: 1.092, which is far below the scores you mention. How did you set your training parameters?

    opened by hasky123 10
  • Type error while training

    @ruotianluo Sorry again, after fixing the file problem, I got an error: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)

    run tools/train.py --cfg configs/fc_rl.yml --id fc_rl
    DataLoader loading json file: data/cocotalk.json
    vocab size is 9487
    DataLoader loading h5 file: data/cocotalk_fc data/cocotalk_att data/cocotalk_box data/cocotalk_label.h5
    max sequence length in data is 16
    read 123287 image features
    assigned 113287 images to split train
    assigned 5000 images to split val
    assigned 5000 images to split test
    Read data: 0.003994464874267578
    Save ckpt on exception ...
    model saved to ./log_fc_rl\model.pth
    Save ckpt done.
    Traceback (most recent call last):
      File "D:\Stephan\Final project\self-critical.pytorch-master\tools\train.py", line 183, in train
        model_out = dp_lw_model(fc_feats, att_feats, labels, masks, att_masks, data['gts'], torch.arange(0, len(data['gts'])), sc_flag, struc_flag)
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 150, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\modules\loss_wrapper.py", line 45, in forward
        loss = self.crit(self.model(fc_feats, att_feats, labels[..., :-1], att_masks), labels[..., 1:], masks[..., 1:])
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\models\CaptionModel.py", line 33, in forward
        return getattr(self, '_'+mode)(*args, **kwargs)
      File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\models\AttModel.py", line 160, in _forward
        output, state = self.get_logprobs_state(it, p_fc_feats, p_att_feats, pp_att_feats, p_att_masks, state)
      File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\models\AttModel.py", line 167, in get_logprobs_state
        xt = self.embed(it)
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\sparse.py", line 114, in forward
        self.norm_type, self.scale_grad_by_freq, self.sparse)
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\functional.py", line 1484, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)

    opened by stephancheng 10
  • Question regarding self-critical reward

    Hi Ruotian, I'd like to ask about the nature of the reward when training with self-critical. Is it normal for it to start at negative or zero for the first few epochs? I am getting the following for the first epoch with self-critical (after training with XE for 13 epochs):

    [screenshot of the reward log omitted]

    Is this normal? And what is the maximum reward you achieved after training with self-critical?

    Also, I'm using the py3 branch. I saw there are a lot of differences between the py3 and py2 branches. So is the py3 branch reliable to use?

    Looking forward to your answer!

    opened by fawazsammani 9
  • ZeroDivisionError: division by zero

    Hi Dr. Luo, I am trying to evaluate a model, but I get an error.

    error info:

    python eval.py --model data/pretrain/fc/model-best.pth  --infos_path data/pretrain/fc/infos_fc-best.pkl --image_folder blah --num_images -1
    loading annotations into memory...
    0:00:00.498650
    creating index...
    index created!
    Traceback (most recent call last):
      File "eval.py", line 71, in <module>
        lang_stats = eval_utils.language_eval(opt.input_json, predictions, n_predictions, vars(opt), opt.split)
      File "/home/andrewcao95/workspace/pycharm_ws/self-critical.pytorch/eval_utils.py", line 83, in language_eval
        mean_perplexity = sum([_['perplexity'] for _ in preds_filt]) / len(preds_filt)
    ZeroDivisionError: division by zero
    
    

    source code:

    # filter results to only those in MSCOCO validation set
        preds_filt = [p for p in preds if p['image_id'] in valids]
        mean_perplexity = sum([_['perplexity'] for _ in preds_filt]) / len(preds_filt)
        mean_entropy = sum([_['entropy'] for _ in preds_filt]) / len(preds_filt)
        print('using %d/%d predictions' % (len(preds_filt), len(preds)))
        json.dump(preds_filt, open(cache_path, 'w')) # serialize to temporary json file. Sigh, COCO API...
    
        cocoRes = coco.loadRes(cache_path)
        cocoEval = COCOEvalCap(coco, cocoRes)
        cocoEval.params['image_id'] = cocoRes.getImgIds()
        cocoEval.evaluate()
    
        for metric, score in cocoEval.eval.items():
            out[metric] = score
        # Add mean perplexity
        out['perplexity'] = mean_perplexity
        out['entropy'] = mean_entropy
    
        imgToEval = cocoEval.imgToEval
        for k in list(imgToEval.values())[0]['SPICE'].keys():
            if k != 'All':
                out['SPICE_'+k] = np.array([v['SPICE'][k]['f'] for v in imgToEval.values()])
                out['SPICE_'+k] = (out['SPICE_'+k][out['SPICE_'+k]==out['SPICE_'+k]]).mean()
        for p in preds_filt:
            image_id, caption = p['image_id'], p['caption']
            imgToEval[image_id]['caption'] = caption
    
    opened by andrewcao95 8
  • How can I get my dataset's image features for scripts/prepro_feats.py

    Thanks for your work! When preparing the data, I found that I have not pre-extracted the image features for my dataset. How can I get these image features? Could you point me to some code?

    opened by ShanZard 0
  • How to set up the model ensemble?

    Hello, I see the model ensembling method in the paper, which helps improve model performance. I want to know how to run model ensembling with this code. Thanks very much.

    opened by fjqfjqfjqfjqfjqfjqfjq 0
  • transformer_nsc

    Hi, when running transformer_nsc.yml I ran into the following problem; how can I solve it?

    Hugginface transformers not installed; please visit https://github.com/huggingface/transformers
    meshed-memory-transformer not installed; please run pip install git+https://github.com/ruotianluo/meshed-memory-transformer.git
    Warning: coco-caption not available

    opened by bai-24 11
  • Running inference using CPU only

    I am not able to run inference on the model provided in the demo using CPU only. I am getting a "CUDA device not available" error. I tried removing all GPU and CUDA references in the code and replacing them with CPU, but it still does not work. It would be really helpful if you could explain how to run the demo code without a GPU.

    Thanks.

    opened by harindercnvrg 0
Releases (3.2)
  • 3.2(May 29, 2020)

    1. Faster beam search.
    2. Support h5 feature files.
    3. Allow beam search + SCST (doesn't work as well though).
    4. Add a few models: BertCapModel and m2transformer (usefulness still in question).
    5. Add projects.
  • v3.1(Jan 10, 2020)

    1. Since it's 2020, py3 is officially supported. Open an issue if there is still something wrong.
    2. Finally, there is a model zoo which is relatively complete. Feel free to try the provided models.
  • 3(Dec 31, 2019)

    1. Add structure loss inspired by Classical Structured Prediction Losses for Sequence to Sequence Learning.
    2. Add a function to sample n captions, supporting the methods described in https://www.dropbox.com/s/tdqr9efrjdkeicz/iccv.pdf?dl=0.
    3. More PyTorch-like dataloader design. The dataloader no longer repeats image features according to seq_per_img; the repeating is now done in the model's forward function.
    4. Add multi-sentence sampling evaluation metrics such as mBleu and Self-CIDEr (those described in https://www.dropbox.com/s/tdqr9efrjdkeicz/iccv.pdf?dl=0).
    5. Use detectron-style configs to set up experiments.
    6. A better self-critical objective (now named new_self_critical). Use config ymls ending with nsc to test its performance. A technical report will be out soon. Basically, it performs better than the original SCST on all metrics (by a small margin) and is also slightly faster.
  • 2.2(Jun 25, 2019)

    1. Refactor the code a little bit.
    2. Add BPE (didn't seem to make much difference).
    3. Add nucleus sampling, top-k and Gumbel-softmax sampling.
    4. Make AttEnsemble compatible with the transformer.
    5. Add "remove bad ending" from Improving Reinforcement Learning Based Image Captioning with Natural Language Prior.

  • 2.1(Jun 25, 2019)

  • 2.0.0(Apr 29, 2018)

    1. Add support for bleu4 optimization or combination of bleu4 and cider
    2. Add bottom-up feature support
    3. Add ensemble during evaluation.
    4. Add multi-gpu support.
    5. Add miscellaneous things. (box features; experimental models etc.)
  • 1.0(Apr 28, 2018)

Owner
Ruotian(RT) Luo
PhD student at TTIC