An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Overview

The implementation of paper CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval.

CLIP4Clip is a video-text retrieval model based on CLIP (ViT-B/32). We investigate three similarity calculation approaches: parameter-free type, sequential type, and tight type, in this work. The model achieve SOTA results on MSR-VTT, MSVC, and LSMDC.

CLIP4Clip

Requirement

# From CLIP 
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
pip install ftfy regex tqdm
pip install opencv-python boto3 requests pandas

Data Preparing

For MSRVTT

The official data and video links can be found in link.

For the convenience, you can also download the splits and captions by,

wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msrvtt_data.zip

For MSVD

Raw videos can be download from link.

The splits and raw_captions can be found in the wonderful job collaborative-experts. For the convenience, you can also download them by,

wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msvd_data.zip

For LSMDC

You must obtain permission from MPII to download and use the data. The download link is here. The 1000 test clips data is link. Read our paper and the dataloader for more information.

How to Run

--features_path is the video root path

--linear_patch can be set with 2d or 3d

--sim_header can be set with meanP, seqLSTM, seqTransf, or tightTransf

read our paper for more details on --linear_patch and --sim_header. Test more hyperparameters for better performance.

Download CLIP (ViT-B/32) weight,

 wget -P ./modules https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt

Then, run

MSRVTT

DATA_PATH=[Your MSRVTT data and videos path]
python -m torch.distributed.launch --nproc_per_node=4 \
main_task_retrieval.py --do_train --num_thread_reader=0 \
--epochs=5 --batch_size=128 --n_display=50 \
--train_csv ${DATA_PATH}/MSRVTT_train.9k.csv \
--val_csv ${DATA_PATH}/MSRVTT_JSFUSION_test.csv \
--data_path ${DATA_PATH}/MSRVTT_data.json \
--features_path ${DATA_PATH}/MSRVTT_Videos \
--output_dir ckpts/ckpt_msrvtt_retrieval_looseType \
--lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
--datatype msrvtt --expand_msrvtt_sentences  \
--feature_framerate 1 --coef_lr 1e-3 \
--freeze_layer_num 0  --slice_framepos 2 \
--loose_type --linear_patch 2d --sim_header meanP

MSVD

DATA_PATH=[Your MSVD data and videos path]
python -m torch.distributed.launch --nproc_per_node=4 \
main_task_retrieval.py --do_train --num_thread_reader=2 \
--epochs=5 --batch_size=128 --n_display=50 \
--data_path ${DATA_PATH} \
--features_path ${DATA_PATH}/MSVD_Videos \
--output_dir ckpts/ckpt_msvd_retrieval_looseType \
--lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
--datatype msvd \
--feature_framerate 1 --coef_lr 1e-3 \
--freeze_layer_num 0 --slice_framepos 2 \
--loose_type --linear_patch 2d --sim_header meanP

LSMDC

DATA_PATH=[Your LSMDC data and videos path]
python -m torch.distributed.launch --nproc_per_node=4 \
main_task_retrieval.py --do_train --num_thread_reader=2 \
--epochs=5 --batch_size=128 --n_display=50 \
--data_path ${DATA_PATH} \
--features_path ${DATA_PATH}/LSMDC_Videos \
--output_dir ckpts/ckpt_lsmdc_retrieval_looseType \
--lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
--datatype lsmdc --feature_framerate 1 --coef_lr 1e-3 \
--freeze_layer_num 0  --slice_framepos 2 \
--loose_type --linear_patch 2d --sim_header meanP

Citation

If you find CLIP4Clip useful in your work, you can cite the following paper:

@Article{Luo2021CLIP4Clip,
  author  = {Huaishao Luo and Lei Ji and Ming Zhong and Yang Chen and Wen Lei and Nan Duan and Tianrui Li},
  title   = {CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval},
  journal = {arXiv preprint arXiv:2104.08860},
  year    = {2021},
}

Acknowledgments

Our code is based on CLIP (ViT-B/32) and UniVL.

Comments
  • Poor performance when reproduce CLIP4clip(meanP) on ActivityNet

    Poor performance when reproduce CLIP4clip(meanP) on ActivityNet

    @ArrowLuo Hi, I directly train the CLIP4clip(meanP) on ActivityNet and get [email protected]=37.9 which is much worse than 40.5 reported in Table 4.

    I extracted images from the original videos with FPS=1, and trained the CLIP4clip(meanP) on 8 RTX3090. Due to the GPU memory constrain, I set the gradient_accumulation_steps=2.

    The caption is downloaded from https://cs.stanford.edu/people/ranjaykrishna/densevid/.

    opened by jianghaojun 14
  • How to download

    How to download "cross_pytorch_model.bin" as pretrained weights used in the project? & problem in DDP

    1. 我按照readme提供的参数,无法收敛,观察到日志中:
    2. 05/20/2022 00:18:31 - INFO - Weight doesn't exsits. xxx/modules/cross-base/cross_pytorch_model.bin
    3. 05/20/2022 00:18:42 - INFO - Weights from pretrained model not used in : clip.input_resolution clip.context_length clip.vocab_size 以上,请问如何下载modules/cross-base/cross_pytorch_model.bin文件?谢谢

    同时,我在2机8卡上进行分布式训练,按照正常的理解是每台机器8个进程,共16个(和GPU数一致),但跑这个工程的时候,每台机器先是启动了8个进程,接着又各派生了8个子进程,每台机器上存在16个进程,并且其中8个进程休眠,这个问题一直没有解决,想交流一下其他人是否遇到了这个问题?

    opened by AAUfoa 10
  • Some questions about the results of the MARVTT with `sim_header seqTransf`.

    Some questions about the results of the MARVTT with `sim_header seqTransf`.

    When I use the following configuration to train the model on MSRVTT Training-9K, the best result I got is 07/27/2021 13:11:01 - INFO - sim matrix size: 1000, 1000 07/27/2021 13:11:01 - INFO - Length-T: 1000, Length-V:1000 07/27/2021 13:11:01 - INFO - Text-to-Video: 07/27/2021 13:11:01 - INFO - >>> [email protected]: 43.2 - [email protected]: 71.0 - [email protected]: 79.4 - Median R: 2.0 - Mean R: 15.4 07/27/2021 13:11:01 - INFO - Video-to-Text: 07/27/2021 13:11:01 - INFO - >>> [email protected]: 43.1 - V2T$R@5: 71.2 - [email protected]: 80.7 - V2T$Median R: 2.0 - V2T$Mean R: 11.9. It's worse than the results [email protected]: 44.5 listed in the paper. Did i miss some details? Here is the configuration. CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_addr=127.0.0.2 --master_port 29552 main_ta sk_retrieval.py --num_thread_reader=4 --epochs=5 --batch_size=128 --n_display=20 --train_csv /home/hadoop-vacv/cephfs/data/caoshuqia ng/data/jobs/MSRVTT/csv/msrvtt_data/MSRVTT_train.9k.csv --val_csv /home/hadoop-vacv/cephfs/data/caoshuqiang/data/jobs/MSRVTT/csv/msr vtt_data/MSRVTT_JSFUSION_test.csv --data_path /home/hadoop-vacv/cephfs/data/caoshuqiang/data/jobs/MSRVTT/csv/msrvtt_data/MSRVTT_data .json --features_path /home/hadoop-vacv/cephfs/data/caoshuqiang/data/jobs/MSRVTT/MSRVTT_Videos --output_dir /home/hadoop-vacv/cephfs /data/caoshuqiang/code/vicab/newexp/hope/clip_raw --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 12 --datatype msrvtt -- expand_msrvtt_sentences --feature_framerate 1 --coef_lr 1e-3 --freeze_layer_num 0 --slice_framepos 2 --loose_type --linear_patch 2d --sim_header seqTransf --do_train.

    opened by sqiangcao99 10
  • subprocess.CalledProcessError

    subprocess.CalledProcessError

    Hi, I wanna train CLIP4Clip on MSRVTT. But I got this issue. Can you Help me?

    subprocess.CalledProcessError: Command '['/home/nccu/anaconda3/envs/Clip4Video-alvin/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=3', '--do_train', '--num_thread_reader=0', '--epochs=5', '--batch_size=128', '--n_display=50', '--train_csv', './MSRVTT/videos/MSRVTT_train.9k.csv', '--val_csv', './MSRVTT/videos/MSRVTT_JSFUSION_test.csv', '--data_path', './MSRVTT/videos/MSRVTT_data.json', '--features_path', './MSRVTT/videos/MSRVTT_Videos', '--output_dir', 'ckpts/ckpt_msrvtt_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msrvtt', '--expand_msrvtt_sentences', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1.

    I will appreciate your help with this situation. Thank you in advance.

    opened by alvinlin1271320 9
  • The reproduction results of `meanP` and `seqTransf` on  `DiDeMo` dataset are much worse than those in the paper

    The reproduction results of `meanP` and `seqTransf` on `DiDeMo` dataset are much worse than those in the paper

    I trained clip4clip on didemo dataset, and the [email protected] of text-to-video is much worse than that shown in paper.

    The metric reported in paper is 43.4 on DiDeMo when similarity calculator is meanP and is 42.8 when the head is seqTransf.

    But according to my reproduction result, based on meanP, max t2v [email protected] is only up to 40.5, and it only up to 40.2 based on seqTransf.

    All settings remain the same as in the paper.

    image
    opened by HanielF 8
  • Query search

    Query search

    As I understood correctly, after training and evaluation, videos are set on the basis of ranks and similarity scores. Is there any script available to make a search query as text and get the video with some freedom of search based on confidence or similarity index with some closest ranked video ?

    opened by Tortoise17 7
  • What is the meaning of --num_thread_reader=0 in MSR-VTT training configuration?

    What is the meaning of --num_thread_reader=0 in MSR-VTT training configuration?

    Can you explain the number of thread reader in the training configuration? I can adjust this value to decrease my training time? (Why --num_thread_reader=0 in MSR-VTT while --num_thread_reader=2 in other dataset.) Thank you so much!

    opened by thinh276 6
  • loss becomes nan

    loss becomes nan

    Hi, I ran the code on MSRVTT dataset with 2 A100s, and its loss becomes nan after some iterations, like this issue. However, I found that the RawVideoExtractorCV2 function succeeded in reading the video when testing only one video input (Directly test the video_to_tensor function in RawVideoExtractorCV2 ), but failed to read the video with multiple num_workers when running the given scripts (the print log is printed in line 63 by myself, but no thing will be printed in line 211 ). Is there something wrong with the multiple threads setting?

    opened by fake-warrior8 5
  • Data split issue

    Data split issue

    Hi @ArrowLuo, I have a question about the data split during the fine-tuning stage. The paper claims that you refer to the data split of the ECCV'20 paper "Multi-modal transformer for video retrieval". In the ECCV'20 paper's code, they sample a subset from the training data to cross-validate and select model. And in your implementation, you directly validate on the 1k testing set. Is it ok to use the testing set to select the best model?

    opened by zhixinma 5
  • torch.distributed.init_process_group(backend=

    torch.distributed.init_process_group(backend="nccl") error and some other errors

    Dear author,

    Thank you for helping me about 4 months before.

    I'd tried to leave you recomment to your helping words, but my issue was already canceled. I really really feel sorry for that.

    Now, I have another issues on my running process with MSVD datasets.

    I successed to run your framework several time, but I got another problem on my environment these days.

    If you don't mind, Can I burrow your hand one more time?

    the following is my error log, and I can manage the GPU sever of two.

    each separation means that the error log raised on each server.

    If you need another request required for, leave me your comments.

    Sincerely,

    Server 16

    (CLIP4Clip) [email protected]:~/CLIP4Clip$ main_task_retrieval.py DATA_PATH=/home/key2317/CLIP4Clip/msvd_data/ VISIBLE_DEVICES=3,4,0,5 python -m torch.distributed.launch --nproc_per_node=4
    main_task_retrieval.py --do_train --num_thread_reader=2
    --epochs=5 --batch_size=128 --n_display=50
    --data_path ${DATA_PATH}
    --features_path ${DATA_PATH}/MSVD_Videos
    --output_dir ckpts/ckpt_msvd_retrieval_looseType
    --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16
    --datatype msvd
    --feature_framerate 1 --coef_lr 1e-3
    --freeze_layer_num 0 --slice_framepos 2
    --loose_type --linear_patch 2d --sim_header meanP
    --pretrained_clip_name ViT-B/32main_task_retrieval.py: command not found (CLIP4Clip) [email protected]:~/CLIP4Clip$ CUDA_VISIBLE_DEVICES=3,4,0,5 python -m torch.distributed.launch --nproc_per_node=4 \

    main_task_retrieval.py --do_train --num_thread_reader=2
    --epochs=5 --batch_size=128 --n_display=50
    --data_path ${DATA_PATH}
    --features_path ${DATA_PATH}/MSVD_Videos
    --output_dir ckpts/ckpt_msvd_retrieval_looseType
    --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16
    --datatype msvd
    --feature_framerate 1 --coef_lr 1e-3
    --freeze_layer_num 0 --slice_framepos 2
    --loose_type --linear_patch 2d --sim_header meanP
    --pretrained_clip_name ViT-B/32 Traceback (most recent call last): File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/runpy.py", line 185, in _run_module_as_main mod_name, mod_spec, code = _get_module_details(mod_name, _Error) File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/runpy.py", line 111, in _get_module_details import(pkg_name) File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/init.py", line 189, in _load_global_deps() File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/init.py", line 142, in _load_global_deps ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL) File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/ctypes/init.py", line 373, in init self._handle = _dlopen(self._name, mode) OSError: /home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/lib/../../../../libcublas.so.11: undefined symbol: free_gemm_select, version libcublasLt.so.11 (CLIP4Clip) [email protected]:~/CLIP4Clip$

    Server 19

    (CLIP4Clip) [email protected]:~/video-multimodal/CLIP4Clip$ python -m torch.distributed.launch --nproc_per_node=4 \

    main_task_retrieval.py --do_train --num_thread_reader=2
    --epochs=5 --batch_size=128 --n_display=50
    --data_path ${DATA_PATH}
    --features_path ${DATA_PATH}/MSVD_Videos
    --output_dir ckpts/ckpt_msvd_retrieval_looseType
    --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16
    --datatype msvd
    --feature_framerate 1 --coef_lr 1e-3
    --freeze_layer_num 0 --slice_framepos 2
    --loose_type --linear_patch 2d --sim_header meanP
    --pretrained_clip_name ViT-B/32

    Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

    Traceback (most recent call last): File "main_task_retrieval.py", line 29, in torch.distributed.init_process_group(backend="nccl") File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 436, in init_process_group store, rank, world_size = next(rendezvous_iterator) File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 179, in _env_rendezvous_handler store = TCPStore(master_addr, master_port, world_size, start_daemon, timeout) RuntimeError: Address already in use Traceback (most recent call last): File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in main() File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main raise subprocess.CalledProcessError(returncode=process.returncode, subprocess.CalledProcessError: Command '['/home/key2317/anaconda3/envs/CLIP4Clip/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=3', '--do_train', '--num_thread_reader=2', '--epochs=5', '--batch_size=128', '--n_display=50', '--data_path', '--features_path', '/MSVD_Videos', '--output_dir', 'ckpts/ckpt_msvd_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msvd', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1. (CLIP4Clip) [email protected]:~/video-multimodal/CLIP4Clip$ Traceback (most recent call last): File "main_task_retrieval.py", line 29, in torch.distributed.init_process_group(backend="nccl") File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group barrier() File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier work = _default_pg.barrier() RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370172916/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled system error, NCCL version 2.7.8 Traceback (most recent call last): File "main_task_retrieval.py", line 29, in torch.distributed.init_process_group(backend="nccl") File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group barrier() File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier work = _default_pg.barrier() RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370172916/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled system error, NCCL version 2.7.8 Traceback (most recent call last): File "main_task_retrieval.py", line 29, in torch.distributed.init_process_group(backend="nccl") File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group barrier() File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier work = _default_pg.barrier() RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370172916/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled system error, NCCL version 2.7.8

    opened by celestialxevermore 5
  • Some errors : subprocess.CalledProcessError

    Some errors : subprocess.CalledProcessError

    Dear Author, really thanks you all for opening this open source.

    I'm a novice and do not have much techniques in dealing with and running all kinds of framework, so today, I've been suffered from error :

    subprocess.CalledProcessError: Command '['/home/key2317/anaconda3/envs/CLIP4Clip/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=3', '--do_train', '--num_thread_reader=0', '--epochs=5', '--batch_size=128', '--n_display=50', '--train_csv', '/home/key2317/CLIP4Clip/msrvtt_data/MSRVTT_train.9k.csv', '--val_csv', '/home/key2317/CLIP4Clip/msrvtt_data/MSRVTT_JSFUSION_test.csv', '--data_path', '/home/key2317/CLIP4Clip/msrvtt_data/MSRVTT_data.json', '--features_path', '/home/key2317/CLIP4Clip/msrvtt_data/MSRVTT_Videos', '--output_dir', 'ckpts/ckpt_msrvtt_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msrvtt', '--expand_msrvtt_sentences', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1.

    From entering my starting command, It seemed to run well for about 5 seconds showing inner state like vison_layers: 12, vision_width : 768, blah blah blah. But unfortunately, It was all ended up with aforementioned messages.

    One of my colleague guessed that It would be problem in our unmatching issue in GPU environments, in short, GPU problems, but I'm not sure how can I diagnose my problem.

    I am really interested in your papers, also codes, but, because of this initial step, I cannot go for next. Would you please help me?

    Thanks.

    opened by celestialxevermore 5
  • About mean_pooling on text sequence

    About mean_pooling on text sequence

    Dear author, I hope you enjoy your new year.

    I have a quetion about, the reason why you do not use the function, _mean_pooling_for_similarity_sequence.

    Is there any special reason for that?

    I also looked out your previous model, UniVL, but I cannot find any reason about that.

    I hope you reply soon.

    Thx.

    Sincerly,

    opened by celestialxevermore 1
  • run simple inference

    run simple inference

    1.Is there a simple example to run an inference Eg: python infer.py --model --input 'red car', and return the video(s) or url(s) or something like that. 2. Is it possible to get the embeddings?

    opened by jdso1988 1
  • CVE-2007-4559 Patch

    CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have began a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15 year old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsantized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this projects lead researcher Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • train on DiDeMo

    train on DiDeMo

    Thanks for sharing your wonderful work!

    When I train on DiDeMo dataset, the following information appears on the terminal more than once. 139ae0410327e9b89ba27ca5d1d6cfe

    But the model training was not interrupted. The model could be trained and tested normally. I ran according to the training parameters on DiDeMo dataset you provided, but changed the batch size. I want to know whether you have encountered the same situation, and whether it will affect the model results? I searched the Internet for solutions, but I could hardly find them.

    opened by qjyyyy 1
  • MSVD Weights

    MSVD Weights

    Hello, thanks for providing the code!

    In regards to the models used for generating the results in the paper, is this uploaded anywhere that can be shared? I would be interested in the inference without running the training from scratch.

    opened by ntseng450 1
Owner
ArrowLuo
Each place has water. The problem is that the depth we dig is not enough. So be more patient, be more try.
ArrowLuo
Machine translation models released by the Gourmet project

Gourmet Models Overview The Gourmet project has released several machine translation models to translate low-resource languages. This repository conta

Edinburgh NLP 5 Dec 08, 2021
NumPy String-Indexed is a NumPy extension that allows arrays to be indexed using descriptive string labels

NumPy String-Indexed NumPy String-Indexed is a NumPy extension that allows arrays to be indexed using descriptive string labels, rather than conventio

Aitan Grossman 1 Jan 08, 2022
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch

Parallel WaveGAN implementation with Pytorch This repository provides UNOFFICIAL pytorch implementations of the following models: Parallel WaveGAN Mel

Tomoki Hayashi 1.2k Dec 23, 2022
Smart discord chatbot integrated with Dialogflow to manage different classrooms and assist in teaching!

smart-school-chatbot Smart discord chatbot integrated with Dialogflow to interact with students naturally and manage different classes in a school. De

Tom Huynh 5 Oct 24, 2022
Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.

English|简体中文 ERNIE是百度开创性提出的基于知识增强的持续学习语义理解框架,该框架将大数据预训练与多源丰富知识相结合,通过持续学习技术,不断吸收海量文本数据中词汇、结构、语义等方面的知识,实现模型效果不断进化。ERNIE在累积 40 余个典型 NLP 任务取得 SOTA 效果,并在 G

5.4k Jan 03, 2023
TaCL: Improve BERT Pre-training with Token-aware Contrastive Learning

TaCL: Improve BERT Pre-training with Token-aware Contrastive Learning

Yixuan Su 26 Oct 17, 2022
Code for lyric-section-to-comment generation based on huggingface transformers.

CommentGeneration Code for lyric-section-to-comment generation based on huggingface transformers. Migrate Guyu model and code (both 12-layers and 24-l

Yawei Sun 8 Sep 04, 2021
Augmenty is an augmentation library based on spaCy for augmenting texts.

Augmenty: The cherry on top of your NLP pipeline Augmenty is an augmentation library based on spaCy for augmenting texts. Besides a wide array of high

Kenneth Enevoldsen 124 Dec 29, 2022
Python interface for converting Penn Treebank trees to Stanford Dependencies and Universal Depenencies

PyStanfordDependencies Python interface for converting Penn Treebank trees to Universal Dependencies and Stanford Dependencies. Example usage Start by

David McClosky 64 May 08, 2022
Long text token classification using LongFormer

Long text token classification using LongFormer

abhishek thakur 161 Aug 07, 2022
Idea is to build a model which will take keywords as inputs and generate sentences as outputs.

keytotext Idea is to build a model which will take keywords as inputs and generate sentences as outputs. Potential use case can include: Marketing Sea

Gagan Bhatia 364 Jan 03, 2023
A Persian Image Captioning model based on Vision Encoder Decoder Models of the transformers🤗.

Persian-Image-Captioning We fine-tuning the Vision Encoder Decoder Model for the task of image captioning on the coco-flickr-farsi dataset. The implem

Hamtech-ai 15 Aug 25, 2022
:id: A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

Dedupe Python Library dedupe is a python library that uses machine learning to perform fuzzy matching, deduplication and entity resolution quickly on

Dedupe.io 3.6k Jan 02, 2023
🎐 a python library for doing approximate and phonetic matching of strings.

jellyfish Jellyfish is a python library for doing approximate and phonetic matching of strings. Written by James Turk James Turk 1.8k Dec 21, 2022

Uses Google's gTTS module to easily create robo text readin' on command.

Tool to convert text to speech, creating files for later use. TTRS uses Google's gTTS module to easily create robo text readin' on command.

0 Jun 20, 2021
fastai ulmfit - Pretraining the Language Model, Fine-Tuning and training a Classifier

fast.ai ULMFiT with SentencePiece from pretraining to deployment Motivation: Why even bother with a non-BERT / Transformer language model? Short answe

Florian Leuerer 26 May 27, 2022
Repositório da disciplina no semestre 2021-2

Avisos! Nenhum aviso! Compiladores 1 Este é o Git da disciplina Compiladores 1. Aqui ficará o material produzido em sala de aula assim como tarefas, w

6 May 13, 2022
A minimal Conformer ASR implementation adapted from ESPnet.

Conformer ASR A minimal Conformer ASR implementation adapted from ESPnet. Introduction I want to use the pre-trained English ASR model provided by ESP

Niu Zhe 3 Jan 24, 2022
Deal or No Deal? End-to-End Learning for Negotiation Dialogues

Introduction This is a PyTorch implementation of the following research papers: (1) Hierarchical Text Generation and Planning for Strategic Dialogue (

Facebook Research 1.4k Dec 29, 2022