CLIP (Contrastive Language–Image Pre-training) trained on Indonesian data

Last update: Mar 10, 2022

Overview

CLIP-Indonesian

CLIP (Radford et al., 2021) is a multimodal model that connects images and text. It trains a vision encoder and a text encoder jointly to project images and their corresponding text into the same embedding space, so that the embedding of an image ends up close to the embedding of its matching text.
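To make the "same embedding space" idea concrete, here is a minimal sketch of the symmetric contrastive objective described by Radford et al. (2021), written in plain Python with no dependencies. It is an illustration only, not the training code in this repository: the function names are hypothetical, and the fixed temperature of 0.07 is an assumption (in CLIP the temperature is a learned parameter).

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_loss(image_embeds, text_embeds, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_embeds[i] and text_embeds[i] are assumed to describe the same item.
    The temperature value is an assumed constant for this sketch.
    """
    images = [l2_normalize(v) for v in image_embeds]
    texts = [l2_normalize(v) for v in text_embeds]

    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    # Transpose to get the text-to-image direction
    logits_t = [list(col) for col in zip(*logits)]

    image_to_text = cross_entropy_diagonal(logits)
    text_to_image = cross_entropy_diagonal(logits_t)
    return 0.5 * (image_to_text + text_to_image)


if __name__ == "__main__":
    matched_images = [[1.0, 0.0], [0.0, 1.0]]
    matched_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print("matched pairs:", clip_loss(matched_images, matched_texts))
    print("swapped pairs:", clip_loss(matched_images, swapped_texts))
```

The loss is small when each image is most similar to its own caption and large when captions are swapped, which is what pulls matching image and text embeddings together during training.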

This repository hosts the code for CLIP-Indonesian, which is a CLIP multimodal model trained on Indonesian data.

For the image encoder, we use ViT, more specifically openai/clip-vit-base-patch32. For the text encoder, we experimented with two models: IndoBERT Base (indobenchmark/indobert-base-p2) and Indonesian RoBERTa Base (flax-community/indonesian-roberta-base).

Most of the CLIP script is based on HybridCLIP and clip-italian.

This is still a work in progress, so it may not give the best results (yet) :)

clip-indonesian was presented at PyCon ID 2021. You can view the slide deck here.

Dataset

More details about the dataset used can be found here.

Results

The results of the training can be accessed here.

Demo

References

Bianchi, F., Attanasio, G., Pisoni, R., Terragni, S., Sarti, G., & Lakshmi, S. (2021). Contrastive Language-Image Pre-training for the Italian Language. arXiv preprint arXiv:2108.08688.

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision. ICML.

Wilie, B., Vincentio, K., Winata, G. I., Cahyawijaya, S., Li, X., Lim, Z. Y., ... & Purwarianti, A. (2020). IndoNLU: Benchmark and resources for evaluating Indonesian natural language understanding. arXiv preprint arXiv:2009.05387.

Hybrid CLIP by the HuggingFace team

Indonesian Roberta Base by Wilson Wongso, Steven Limcorn, Samsul Rahmadani, and Chew Kok Wah

Indonesian Translated Datasets by Samsul Rahmadani

Acknowledgment

All training was done on a TPUv3-8 VM sponsored by TPU Research Cloud.

Owner

Galuh
