This repository focuses on Image Captioning, Video Captioning, Sequence-to-Sequence Learning, and NLP.

Overview

Awesome-Visual-Captioning

Table of Contents

Paper Roadmap

ACL-2021

Image Captioning

  • Control Image Captioning Spatially and Temporally
  • SMURF: SeMantic and linguistic UndeRstanding Fusion for Caption Evaluation via Typicality Analysis [paper] [code]
  • Enhancing Descriptive Image Captioning with Natural Language Inference
  • UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning [paper]
  • Semantic Relation-aware Difference Representation Learning for Change Captioning

Video Captioning

  • Hierarchical Context-aware Network for Dense Video Event Captioning
  • Video Paragraph Captioning as a Text Summarization Task
  • O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning

CVPR-2021

Image Captioning

  • Connecting What to Say With Where to Look by Modeling Human Attention Traces. [paper] [code]
  • Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles. [paper]
  • Improving OCR-Based Image Captioning by Incorporating Geometrical Relationship. [paper]
  • Image Change Captioning by Learning From an Auxiliary Task. [paper]
  • Scan2Cap: Context-aware Dense Captioning in RGB-D Scans. [paper] [code]
  • Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning. [paper]
  • TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption. [paper]
  • Towards Accurate Text-Based Image Captioning With Content Diversity Exploration. [paper]
  • FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation. [paper]
  • RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words. [paper]
  • Human-Like Controllable Image Captioning With Verb-Specific Semantic Roles. [paper]

Video Captioning

  • Open-Book Video Captioning With Retrieve-Copy-Generate Network. [paper]
  • Towards Diverse Paragraph Captioning for Untrimmed Videos. [paper]

AAAI-2021

Image Captioning

  • Partially Non-Autoregressive Image Captioning. [code]
  • Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network. [paper]
  • Object Relation Attention for Image Paragraph Captioning [paper]
  • Dual-Level Collaborative Transformer for Image Captioning. [paper] [code]
  • Memory-Augmented Image Captioning [paper]
  • Image Captioning with Context-Aware Auxiliary Guidance. [paper]
  • Consensus Graph Representation Learning for Better Grounded Image Captioning. [paper]
  • FixMyPose: Pose Correctional Captioning and Retrieval. [paper] [code] [website]
  • VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning [paper]

Video Captioning

  • Non-Autoregressive Coarse-to-Fine Video Captioning. [paper]
  • Semantic Grouping Network for Video Captioning. [paper] [code]
  • Augmented Partial Mutual Learning with Frame Masking for Video Captioning. [paper]

ACMMM-2020

Image Captioning

  • Structural Semantic Adversarial Active Learning for Image Captioning. oral [paper]
  • Iterative Back Modification for Faster Image Captioning. [paper]
  • Bridging the Gap between Vision and Language Domains for Improved Image Captioning. [paper]
  • Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning. [paper]
  • Improving Intra- and Inter-Modality Visual Relation for Image Captioning. [paper]
  • ICECAP: Information Concentrated Entity-aware Image Captioning. [paper]
  • Attacking Image Captioning Towards Accuracy-Preserving Target Words Removal. [paper]
  • Multimodal Attention with Image Text Spatial Relationship for OCR-Based Image Captioning. [paper]

Video Captioning

  • Controllable Video Captioning with an Exemplar Sentence. oral [paper]
  • Poet: Product-oriented Video Captioner for E-commerce. oral [paper]
  • Learning Semantic Concepts and Temporal Alignment for Narrated Video Procedural Captioning. [paper]
  • Relational Graph Learning for Grounded Video Description Generation. [paper]

NeurIPS-2020

  • Prophet Attention: Predicting Attention with Future Attention for Improved Image Captioning. [paper]
  • RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning. [paper]
  • Diverse Image Captioning with Context-Object Split Latent Spaces. [paper]

ECCV-2020

Image Captioning

  • Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets. oral [paper]
  • In-Home Daily-Life Captioning Using Radio Signals. oral [paper] [website]
  • TextCaps: a Dataset for Image Captioning with Reading Comprehension. oral [paper] [website] [code]
  • SODA: Story Oriented Dense Video Captioning Evaluation Framework. [paper]
  • Towards Unique and Informative Captioning of Images. [paper]
  • Learning Visual Representations with Caption Annotations. [paper] [website]
  • Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards. [paper]
  • Length Controllable Image Captioning. [paper] [code]
  • Comprehensive Image Captioning via Scene Graph Decomposition. [paper] [website]
  • Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning. [paper]
  • Captioning Images Taken by People Who Are Blind. [paper]
  • Learning to Generate Grounded Visual Captions without Localization Supervision. [paper] [code]

Video Captioning

  • Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos. Spotlight [paper] [code]
  • Character Grounding and Re-Identification in Story of Videos and Text Descriptions. Spotlight [paper] [code]
  • Identity-Aware Multi-Sentence Video Description. [paper]

CVPR-2020

Image Captioning

  • Context-Aware Group Captioning via Self-Attention and Contrastive Features [paper]
    Zhuowan Li, Quan Tran, Long Mai, Zhe Lin, Alan L. Yuille
  • More Grounded Image Captioning by Distilling Image-Text Matching Model [paper] [code]
    Yuanen Zhou, Meng Wang, Daqing Liu, Zhenzhen Hu, Hanwang Zhang
  • Show, Edit and Tell: A Framework for Editing Image Captions [paper] [code]
    Fawaz Sammani, Luke Melas-Kyriazi
  • Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs [paper] [code]
    Shizhe Chen, Qin Jin, Peng Wang, Qi Wu
  • Normalized and Geometry-Aware Self-Attention Network for Image Captioning [paper]
    Longteng Guo, Jing Liu, Xinxin Zhu, Peng Yao, Shichen Lu, Hanqing Lu
  • Meshed-Memory Transformer for Image Captioning [paper] [code]
    Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara
  • X-Linear Attention Networks for Image Captioning [paper] [code]
    Yingwei Pan, Ting Yao, Yehao Li, Tao Mei
  • Transform and Tell: Entity-Aware News Image Captioning [paper] [code] [website]
    Alasdair Tran, Alexander Mathews, Lexing Xie

Video Captioning

  • Object Relational Graph With Teacher-Recommended Learning for Video Captioning [paper]
    Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang, Weiming Hu, Zheng-Jun Zha

  • Spatio-Temporal Graph for Video Captioning With Knowledge Distillation [paper] [code]
    Boxiao Pan, Haoye Cai, De-An Huang, Kuan-Hui Lee, Adrien Gaidon, Ehsan Adeli, Juan Carlos Niebles

  • Better Captioning With Sequence-Level Exploration [paper]
    Jia Chen, Qin Jin

  • Syntax-Aware Action Targeting for Video Captioning [code]
    Qi Zheng, Chaoyue Wang, Dacheng Tao

ACL-2020

Image Captioning

  • Clue: Cross-modal Coherence Modeling for Caption Generation [paper]
    Malihe Alikhani, Piyush Sharma, Shengjie Li, Radu Soricut and Matthew Stone

  • Improving Image Captioning Evaluation by Considering Inter References Variance [paper]
    Yanzhi Yi, Hangyu Deng and Jinglu Hu

  • Improving Image Captioning with Better Use of Caption [paper] [code]
    Zhan Shi, Xu Zhou, Xipeng Qiu and Xiaodan Zhu

Video Captioning

  • MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning [paper] [code]
    Jie Lei, Liwei Wang, Yelong Shen, Dong Yu, Tamara Berg and Mohit Bansal

AAAI-2020

Image Captioning

  • Unified VLP: Unified Vision-Language Pre-Training for Image Captioning and VQA [paper]
    Luowei Zhou (University of Michigan); Hamid Palangi (Microsoft Research); Lei Zhang (Microsoft); Houdong Hu (Microsoft AI and Research); Jason Corso (University of Michigan); Jianfeng Gao (Microsoft Research)

  • OffPG: Reinforcing an Image Caption Generator using Off-line Human Feedback [paper]
    Paul Hongsuck Seo (POSTECH); Piyush Sharma (Google Research); Tomer Levinboim (Google); Bohyung Han (Seoul National University); Radu Soricut (Google)

  • MemCap: Memorizing Style Knowledge for Image Captioning [paper]
    Wentian Zhao (Beijing Institute of Technology); Xinxiao Wu (Beijing Institute of Technology); Xiaoxun Zhang (Alibaba Group)

  • C-R Reasoning: Joint Commonsense and Relation Reasoning for Image and Video Captioning [paper]
    Jingyi Hou (Beijing Institute of Technology); Xinxiao Wu (Beijing Institute of Technology); Xiaoxun Zhang (Alibaba Group); Yayun Qi (Beijing Institute of Technology); Yunde Jia (Beijing Institute of Technology); Jiebo Luo (University of Rochester)

  • MHTN: Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption [paper]
    Wei Zhang (East China Normal University); Yue Ying (East China Normal University); Pan Lu (University of California, Los Angeles); Hongyuan Zha (Georgia Tech)

  • Show, Recall, and Tell: Image Captioning with Recall Mechanism [paper]
    Li Wang (MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China); Zechen Bai (Institute of Software, Chinese Academy of Sciences, China); Yonghua Zhang (Bytedance); Hongtao Lu (Shanghai Jiao Tong University)

  • Interactive Dual Generative Adversarial Networks for Image Captioning
    Junhao Liu (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences); Kai Wang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences); Chunpu Xu (Huazhong University of Science and Technology); Zhou Zhao (Zhejiang University); Ruifeng Xu (Harbin Institute of Technology (Shenzhen)); Ying Shen (Peking University Shenzhen Graduate School); Min Yang (Chinese Academy of Sciences)

  • FDM-net: Feature Deformation Meta-Networks in Image Captioning of Novel Objects [paper]
    Tingjia Cao (Fudan University); Ke Han (Fudan University); Xiaomei Wang (Fudan University); Lin Ma (Tencent AI Lab); Yanwei Fu (Fudan University); Yu-Gang Jiang (Fudan University); Xiangyang Xue (Fudan University)

Video Captioning

  • An Efficient Framework for Dense Video Captioning
    Maitreya Suin (Indian Institute of Technology Madras); Rajagopalan Ambasamudram (Indian Institute of Technology Madras)

ACL-2019

  • Informative Image Captioning with External Sources of Information [paper]
    Sanqiang Zhao, Piyush Sharma, Tomer Levinboim and Radu Soricut

  • Dense Procedure Captioning in Narrated Instructional Videos [paper]
    Botian Shi, Lei Ji, Yaobo Liang, Nan Duan, Peng Chen, Zhendong Niu and Ming Zhou

  • Bridging by Word: Image Grounded Vocabulary Construction for Visual Captioning [paper]
    Zhihao Fan, Zhongyu Wei, Siyuan Wang and Xuanjing Huang

  • Generating Question Relevant Captions to Aid Visual Question Answering [paper]
    Jialin Wu, Zeyuan Hu and Raymond Mooney

NeurIPS-2019

Image Captioning

  • AAT: Adaptively Aligned Image Captioning via Adaptive Attention Time [paper] [code]
    Lun Huang, Wenmin Wang, Yaxian Xia, Jie Chen
  • ObjRel Transf: Image Captioning: Transforming Objects into Words [paper] [code]
    Simao Herdade, Armin Kappeler, Kofi Boakye, Joao Soares
  • VSSI-cap: Variational Structured Semantic Inference for Diverse Image Captioning [paper]
    Fuhai Chen, Rongrong Ji, Jiayi Ji, Xiaoshuai Sun, Baochang Zhang, Xuri Ge, Yongjian Wu, Feiyue Huang

ICCV-2019

Video Captioning

  • VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research [paper] [challenge]
    Xin Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-Fang Wang, William Yang Wang
    ICCV 2019 Oral

  • POS+CG: Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network [paper]
    Bairui Wang, Lin Ma, Wei Zhang, Wenhao Jiang, Jingwen Wang, Wei Liu

  • POS: Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning [paper]
    Jingyi Hou, Xinxiao Wu, Wentian Zhao, Jiebo Luo, Yunde Jia

Image Captioning

  • DUDA: Robust Change Captioning [paper]
    Dong Huk Park, Trevor Darrell, Anna Rohrbach
    ICCV 2019 Oral

  • AoANet: Attention on Attention for Image Captioning [paper]
    Lun Huang, Wenmin Wang, Jie Chen, Xiao-Yong Wei
    ICCV 2019 Oral

  • MaBi-LSTMs: Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style [paper]
    Hongwei Ge, Zehang Yan, Kai Zhang, Mingde Zhao, Liang Sun

  • Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment [paper]
    Samyak Datta, Karan Sikka, Anirban Roy, Karuna Ahuja, Devi Parikh, Ajay Divakaran

  • GCN-LSTM+HIP: Hierarchy Parsing for Image Captioning [paper]
    Ting Yao, Yingwei Pan, Yehao Li, Tao Mei

  • IR+Tdiv: Generating Diverse and Descriptive Image Captions Using Visual Paraphrases [paper]
    Lixin Liu, Jiajun Tang, Xiaojun Wan, Zongming Guo

  • CNM+SGAE: Learning to Collocate Neural Modules for Image Captioning [paper]
    Xu Yang, Hanwang Zhang, Jianfei Cai

  • Seq-CVAE: Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning [paper]
    Jyoti Aneja, Harsh Agrawal, Dhruv Batra, Alexander Schwing

  • Towards Unsupervised Image Captioning With Shared Multimodal Embeddings [paper]
    Iro Laina, Christian Rupprecht, Nassir Navab

  • Human Attention in Image Captioning: Dataset and Analysis [paper]
    Sen He, Hamed R. Tavakoli, Ali Borji, Nicolas Pugeault

  • RDN: Reflective Decoding Network for Image Captioning [paper]
    Lei Ke, Wenjie Pei, Ruiyu Li, Xiaoyong Shen, Yu-Wing Tai

  • PSST: Joint Optimization for Cooperative Image Captioning [paper]
    Gilad Vered, Gal Oren, Yuval Atzmon, Gal Chechik

  • MUTAN: Watch, Listen and Tell: Multi-Modal Weakly Supervised Dense Event Captioning [paper]
    Tanzila Rahman, Bicheng Xu, Leonid Sigal

  • ETA: Entangled Transformer for Image Captioning [paper]
    Guang Li, Linchao Zhu, Ping Liu, Yi Yang

  • nocaps: novel object captioning at scale [paper]
    Harsh Agrawal, Karan Desai, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson

  • Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection [paper]
    Keren Ye, Mingda Zhang, Adriana Kovashka, Wei Li, Danfeng Qin, Jesse Berent

  • Graph-Align: Unpaired Image Captioning via Scene Graph Alignments [paper]
    Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Handong Zhao, Xu Yang, Gang Wang

  • Learning to Caption Images Through a Lifetime by Asking Questions [paper]
    Tingke Shen, Amlan Kar, Sanja Fidler

CVPR-2019

Image Captioning

  • SGAE: Auto-Encoding Scene Graphs for Image Captioning [paper] [code]
    Xu Yang (Nanyang Technological University); Kaihua Tang (Nanyang Technological University); Hanwang Zhang (Nanyang Technological University); Jianfei Cai (Nanyang Technological University)
    CVPR 2019 Oral

  • POS: Fast, Diverse and Accurate Image Captioning Guided by Part-Of-Speech [paper]
    Aditya Deshpande (University of Illinois at Urbana-Champaign); Jyoti Aneja (University of Illinois, Urbana-Champaign); Liwei Wang (Tencent AI Lab); Alexander Schwing (UIUC); David Forsyth (University of Illinois at Urbana-Champaign)
    CVPR 2019 Oral

  • Unsupervised Image Captioning [paper] [code]
    Yang Feng (University of Rochester); Lin Ma (Tencent AI Lab); Wei Liu (Tencent); Jiebo Luo (U. Rochester)

  • Exact Adversarial Attack to Image Captioning via Structured Output Learning With Latent Variables [paper]
    Yan Xu (UESTC); Baoyuan Wu (Tencent AI Lab); Fumin Shen (UESTC); Yanbo Fan (Tencent AI Lab); Yong Zhang (Tencent AI Lab); Heng Tao Shen (University of Electronic Science and Technology of China (UESTC)); Wei Liu (Tencent)

  • Describing like Humans: On Diversity in Image Captioning [paper]
    Qingzhong Wang (Department of Computer Science, City University of Hong Kong); Antoni Chan (City University of Hong Kong, Hong Kong)

  • MSCap: Multi-Style Image Captioning With Unpaired Stylized Text [paper]
    Longteng Guo (Institute of Automation, Chinese Academy of Sciences); Jing Liu (National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences); Peng Yao (University of Science and Technology Beijing); Jiangwei Li (Huawei); Hanqing Lu (NLPR, Institute of Automation, CAS)

  • CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection [paper] [code]
    Lu Zhang (Dalian University of Technology); Huchuan Lu (Dalian University of Technology); Zhe Lin (Adobe Research); Jianming Zhang (Adobe Research); You He (Naval Aviation University)

  • Context and Attribute Grounded Dense Captioning [paper]
    Guojun Yin (University of Science and Technology of China); Lu Sheng (The Chinese University of Hong Kong); Bin Liu (University of Science and Technology of China); Nenghai Yu (University of Science and Technology of China); Xiaogang Wang (Chinese University of Hong Kong, Hong Kong); Jing Shao (Sensetime)

  • Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning [paper]
    Dong-Jin Kim (KAIST); Jinsoo Choi (KAIST); Tae-Hyun Oh (MIT CSAIL); In So Kweon (KAIST)

  • Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions [paper]
    Marcella Cornia (University of Modena and Reggio Emilia); Lorenzo Baraldi (University of Modena and Reggio Emilia); Rita Cucchiara (Universita Di Modena E Reggio Emilia)

  • Self-Critical N-step Training for Image Captioning [paper]
    Junlong Gao (Peking University Shenzhen Graduate School); Shiqi Wang (CityU); Shanshe Wang (Peking University); Siwei Ma (Peking University, China); Wen Gao (PKU)

  • Look Back and Predict Forward in Image Captioning [paper]
    Yu Qin (Shanghai Jiao Tong University); Jiajun Du (Shanghai Jiao Tong University); Hongtao Lu (Shanghai Jiao Tong University); Yonghua Zhang (Bytedance)

  • Intention Oriented Image Captions with Guiding Objects [paper]
    Yue Zheng (Tsinghua University); Ya-Li Li (THU); Shengjin Wang (Tsinghua University)

  • Adversarial Semantic Alignment for Improved Image Captions [paper]
    Pierre Dognin (IBM); Igor Melnyk (IBM); Youssef Mroueh (IBM Research); Jarret Ross (IBM); Tom Sercu (IBM Research AI)

  • Good News, Everyone! Context driven entity-aware captioning for news images [paper] [code]
    Ali Furkan Biten (Computer Vision Center); Lluis Gomez (Universitat Autónoma de Barcelona); Marçal Rusiñol (Computer Vision Center, UAB); Dimosthenis Karatzas (Computer Vision Centre)

  • Pointing Novel Objects in Image Captioning [paper]
    Yehao Li (Sun Yat-Sen University); Ting Yao (JD AI Research); Yingwei Pan (JD AI Research); Hongyang Chao (Sun Yat-sen University); Tao Mei (AI Research of JD.com)

  • Engaging Image Captioning via Personality [paper]
    Kurt Shuster (Facebook); Samuel Humeau (Facebook); Hexiang Hu (USC); Antoine Bordes (Facebook); Jason Weston (FAIR)

Video Captioning

  • SDVC: Streamlined Dense Video Captioning [paper]
    Jonghwan Mun (POSTECH); Linjie Yang (ByteDance AI Lab); Zhou Ren (Snap Inc.); Ning Xu (Snap); Bohyung Han (Seoul National University)
    CVPR 2019 Oral

  • GVD: Grounded Video Description [paper]
    Luowei Zhou (University of Michigan); Yannis Kalantidis (Facebook Research); Xinlei Chen (Facebook AI Research); Jason J Corso (University of Michigan); Marcus Rohrbach (Facebook AI Research)
    CVPR 2019 Oral

  • HybridDis: Adversarial Inference for Multi-Sentence Video Description [paper]
    Jae Sung Park (UC Berkeley); Marcus Rohrbach (Facebook AI Research); Trevor Darrell (UC Berkeley); Anna Rohrbach (UC Berkeley)
    CVPR 2019 Oral

  • OA-BTG: Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning [paper]
    Junchao Zhang (Peking University); Yuxin Peng (Peking University)

  • MARN: Memory-Attended Recurrent Network for Video Captioning [paper]
    Wenjie Pei (Tencent); Jiyuan Zhang (Tencent YouTu); Xiangrong Wang (Delft University of Technology); Lei Ke (Tencent); Xiaoyong Shen (Tencent); Yu-Wing Tai (Tencent)

  • GRU-EVE: Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning [paper]
    Nayyer Aafaq (The University of Western Australia); Naveed Akhtar (The University of Western Australia); Wei Liu (University of Western Australia); Syed Zulqarnain Gilani (The University of Western Australia); Ajmal Mian (University of Western Australia)

AAAI-2019

Image Captioning

  • Improving Image Captioning with Conditional Generative Adversarial Nets [paper]
    Chen Chen (Tencent); Shuai Mu (Tencent); Wanpeng Xiao (Tencent); Zexiong Ye (Tencent); Liesi Wu (Tencent); Qi Ju (Tencent)
    AAAI 2019 Oral
  • PAGNet: Connecting Language to Images: A Progressive Attention-Guided Network for Simultaneous Image Captioning and Language Grounding [paper]
    Lingyun Song (Xi'an JiaoTong University); Jun Liu (Xi'an Jiaotong Univerisity); Buyue Qian (Xi'an Jiaotong University); Yihe Chen (University of Toronto)
    AAAI 2019 Oral
  • Meta Learning for Image Captioning [paper]
    Nannan Li (Wuhan University); Zhenzhong Chen (WHU); Shan Liu (Tencent America)
  • DA: Deliberate Residual based Attention Network for Image Captioning [paper]
    Lianli Gao (The University of Electronic Science and Technology of China); Kaixuan Fan (University of Electronic Science and Technology of China); Jingkuan Song (UESTC); Xianglong Liu (Beihang University); Xing Xu (University of Electronic Science and Technology of China); Heng Tao Shen (University of Electronic Science and Technology of China (UESTC))
  • HAN: Hierarchical Attention Network for Image Captioning [paper]
    Weixuan Wang (School of Electronic and Information Engineering, Sun Yat-sen University); Zhihong Chen (School of Electronic and Information Engineering, Sun Yat-sen University); Haifeng Hu (School of Electronic and Information Engineering, Sun Yat-sen University)
  • COCG: Learning Object Context for Dense Captioning [paper]
    Xiangyang Li (Institute of Computing Technology, Chinese Academy of Sciences); Shuqiang Jiang (ICT, China Academy of Science); Jungong Han (Lancaster University)

Video Captioning

  • TAMoE: Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning [code] [paper]
    Xin Wang (University of California, Santa Barbara); Jiawei Wu (University of California, Santa Barbara); Da Zhang (UC Santa Barbara); Yu Su (OSU); William Wang (UC Santa Barbara)
    AAAI 2019 Oral

  • TDConvED: Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning [paper]
    Jingwen Chen (Sun Yat-sen University); Yingwei Pan (JD AI Research); Yehao Li (Sun Yat-Sen University); Ting Yao (JD AI Research); Hongyang Chao (Sun Yat-sen University); Tao Mei (AI Research of JD.com)
    AAAI 2019 Oral

  • FCVC-CF&IA: Fully Convolutional Video Captioning with Coarse-to-Fine and Inherited Attention [paper]
    Kuncheng Fang (Fudan University); Lian Zhou (Fudan University); Cheng Jin (Fudan University); Yuejie Zhang (Fudan University); Kangnian Weng (Shanghai University of Finance and Economics); Tao Zhang (Shanghai University of Finance and Economics); Weiguo Fan (University of Iowa)

  • MGSA: Motion Guided Spatial Attention for Video Captioning [paper]
    Shaoxiang Chen (Fudan University); Yu-Gang Jiang (Fudan University)

Owner
Ziqi Zhang