PyTorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

Last update: Jan 03, 2023

Related tags

Deep Learning zero-shot-image-to-text

Overview

PyTorch Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

[Paper] [Colab is coming soon]

Approach

Example

Usage

To run captioning on a single image:

$ python run.py \
    --reset_context_delta \
    --caption_img_path "example_images/captions/COCO_val2014_000000097017.jpg"

To run the model on visual arithmetic:

$ python run.py \
    --reset_context_delta \
    --end_factor 1.06 \
    --fusion_factor 0.95 \
    --grad_norm_factor 0.95 \
    --run_type arithmetics \
    --arithmetics_imgs "example_images/arithmetics/woman2.jpg" "example_images/arithmetics/king2.jpg" "example_images/arithmetics/man2.jpg" \
    --arithmetics_weights 1 1 -1
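
The weights 1 1 -1 pair up with the three images in order (woman, king, man). A minimal sketch of what such a weighted combination could look like, assuming each image is first mapped to an embedding vector and the weights are applied as a normalized linear combination. This is an illustration only, not the repository's code: `normalize` and `combine_embeddings` are hypothetical helpers, and the 3-dimensional vectors are toy stand-ins for real image embeddings.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def combine_embeddings(embeddings, weights):
    """Weighted sum of unit-normalized embeddings, re-normalized.

    Hypothetical helper: assumes arithmetic is a plain linear
    combination in embedding space.
    """
    if len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per embedding")
    combined = [0.0] * len(embeddings[0])
    for emb, weight in zip(embeddings, weights):
        for i, x in enumerate(normalize(emb)):
            combined[i] += weight * x
    return normalize(combined)


# Toy stand-ins for the embeddings of woman2.jpg, king2.jpg and man2.jpg.
woman = [0.0, 1.0, 1.0]
king = [1.0, 0.0, 1.0]
man = [0.0, 0.0, 1.0]

# Same order and weights as --arithmetics_imgs / --arithmetics_weights.
target = combine_embeddings([woman, king, man], [1, 1, -1])
print(target)
```

In this toy setup the component shared by all three vectors is reduced by the negative weight, while the components unique to the first two images are kept.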

To run the model on real-world knowledge:

$ python run.py \
    --reset_context_delta --cond_text "Image of" \
    --end_factor 1.04 \
    --caption_img_path "example_images/real_world/simpsons.jpg"

To run the model on OCR:

$ python run.py \
    --reset_context_delta --cond_text "Image of text that says" \
    --end_factor 1.04 \
    --caption_img_path "example_images/OCR/welcome_sign.jpg"
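
The captioning, real-world knowledge, and OCR commands above differ only in the conditioning text and the end factor. A small sketch of how these invocations could be assembled from Python, for example to script several images. `build_caption_command` is a hypothetical helper, not part of the repository; it only uses the flags shown above and assumes `run.py` is in the current working directory.

```python
import shlex


def build_caption_command(image_path, cond_text=None, end_factor=None):
    """Build the argument list for one captioning run.

    Hypothetical helper: only the flags shown in the usage
    commands above are emitted.
    """
    cmd = ["python", "run.py", "--reset_context_delta"]
    if cond_text is not None:
        cmd += ["--cond_text", cond_text]
    if end_factor is not None:
        cmd += ["--end_factor", str(end_factor)]
    cmd += ["--caption_img_path", image_path]
    return cmd


# Equivalent of the OCR example.
ocr_cmd = build_caption_command(
    "example_images/OCR/welcome_sign.jpg",
    cond_text="Image of text that says",
    end_factor=1.04,
)

# Print a shell-quoted version for inspection.
print(shlex.join(ocr_cmd))

# To actually run it from the repository root, one could use:
#   import subprocess
#   subprocess.run(ocr_cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with conditioning text that contains spaces.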


