Exploring Visual Engagement Signals for Representation Learning


Menglin Jia, Zuxuan Wu, Austin Reiter, Claire Cardie, Serge Belongie and Ser-Nam Lim
Cornell University, Facebook AI


arXiv: https://arxiv.org/abs/2104.07767

[Teaser figure: common supervisory signals vs. VisE as supervisory signals.]

VisE is a pretraining approach that leverages visual engagement clues as supervisory signals. Given the same image, visual engagement signals provide semantically and contextually richer information than conventional recognition and captioning tasks. VisE transfers well to subjective downstream computer vision tasks such as emotion recognition and political bias classification.

💬 Loading pretrained models

NOTE: Each provided model is a torchvision-like backbone (all the layers before the last global average-pooling layer). Given a batch of image tensors of size (B, 3, 224, 224), the models produce spatial image features of shape (B, 2048, 7, 7), where B is the batch size.
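As a quick sanity check of these shapes, the snippet below (a minimal sketch using the resnet50_1m torch.hub entry point described in the next section) runs a random batch through the backbone and prints the resulting feature shape.

import torch

# Load the VisE-1.2M backbone via torch.hub (see the loading instructions below).
model = torch.hub.load("KMnP/vise", "resnet50_1m", pretrained=True)
model.eval()

# A batch of 2 random 224 x 224 images -> spatial features of shape (2, 2048, 7, 7).
with torch.no_grad():
    dummy = torch.randn(2, 3, 224, 224)
    features = model(dummy)

print(features.shape)  # torch.Size([2, 2048, 7, 7])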

Loading models with torch.hub

Get the pretrained ResNet-50 models from VisE in one line!

VisE-250M (ResNet-50): This model is pretrained with 250 million public image posts.

import torch
model = torch.hub.load("KMnP/vise", "resnet50_250m", pretrained=True)

VisE-1.2M (ResNet-50): This model is pretrained with 1.23 million public image posts.

import torch
model = torch.hub.load("KMnP/vise", "resnet50_1m", pretrained=True)

Loading models manually

Name        Arch       Size     Model
VisE-250M   ResNet-50  94.3 MB  download
VisE-1.2M   ResNet-50  94.3 MB  download

If you encounter any issues with torch.hub, you can alternatively download the model checkpoints manually and then follow the script below.

import torch
import torchvision

# Create a torchvision resnet50 with randomly initialized weights.
model = torchvision.models.resnet50(pretrained=False)

# Keep all the layers before the global average-pooling layer.
model = torch.nn.Sequential(*list(model.children())[:-2])

# Load the pretrained weights from a local path <CHECKPOINT_PATH>:
model.load_state_dict(torch.load(CHECKPOINT_PATH))
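
For subjective downstream tasks such as emotion recognition, a common transfer recipe is to pool the spatial features and train a classification head on top of the pretrained backbone. The sketch below only illustrates that recipe and is not the exact protocol from the paper; NUM_CLASSES is a hypothetical placeholder for the number of downstream labels.

import torch
import torch.nn as nn

# Hypothetical downstream classifier: pooled VisE features followed by a linear head.
# `model` is the pretrained backbone loaded above.
classifier = nn.Sequential(
    model,                          # (B, 3, 224, 224) -> (B, 2048, 7, 7)
    nn.AdaptiveAvgPool2d(1),        # (B, 2048, 1, 1)
    nn.Flatten(),                   # (B, 2048)
    nn.Linear(2048, NUM_CLASSES),   # (B, NUM_CLASSES)
)

logits = classifier(torch.randn(2, 3, 224, 224))  # example forward pass: (2, NUM_CLASSES)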

💬 Citing VisE

If you find VisE useful in your research, please cite the following publication.

@misc{jia2021vise,
      title={Exploring Visual Engagement Signals for Representation Learning}, 
      author={Menglin Jia and Zuxuan Wu and Austin Reiter and Claire Cardie and Serge Belongie and Ser-Nam Lim},
      year={2021},
      eprint={2104.07767},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

💬 Acknowledgments

We thank Marseille, who was featured in the teaser photo.

💬 License

VisE models are released under the CC-BY-NC 4.0 license. See LICENSE for additional details.

Owner
Menglin Jia (K-Mn-P: "jia meng lin", the Mandarin pronunciation of those chemical elements)