CLIP (Contrastive Language–Image Pre-training) trained on Indonesian data

Last update: Mar 10, 2022

Overview

CLIP-Indonesian

CLIP (Radford et al., 2021) is a multimodal model that connects images and text. It trains a vision encoder and a text encoder jointly to project images and their corresponding text into the same embedding space, so that the embedding of an image and the embedding of its matching text end up close to each other.
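To make the training objective concrete, here is a minimal sketch of a symmetric contrastive loss in plain Python. It is an illustration only, not this repository's training code: the function names are hypothetical, and the temperature of 0.07 is an assumed fixed value (CLIP itself learns this scale during training). Embeddings are assumed to be non-zero vectors.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_contrastive_loss(image_embeds, text_embeds, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched image-text pairs.

    image_embeds[i] and text_embeds[i] are assumed to describe the same pair.
    The temperature value is an assumption made for this sketch.
    """
    images = [l2_normalize(v) for v in image_embeds]
    texts = [l2_normalize(v) for v in text_embeds]
    # logits[i][j] is the scaled cosine similarity between image i and text j.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    # Transposing gives the text-to-image direction.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))
```

The loss is small when each image is most similar to its own text and large when the pairs are mismatched, which is what pulls matching embeddings together during training.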

This repository hosts the code for CLIP-Indonesian, which is a CLIP multimodal model trained on Indonesian data.

For the image encoder, we use ViT, specifically openai/clip-vit-base-patch32. For the text encoder, we experimented with two models: IndoBERT Base (indobenchmark/indobert-base-p2) and Indonesian RoBERTa Base (flax-community/indonesian-roberta-base).
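Because the image and text encoders are separate models, their outputs generally need to be mapped into one shared space before they can be compared. The sketch below illustrates that idea with toy numbers: each encoder output passes through its own linear projection, and captions are then ranked by cosine similarity to the image. All names, dimensions, and weights here are hypothetical and chosen for illustration; they are not taken from this repository, and vectors are assumed to be non-zero.

```python
import math


def project(features, weight):
    """Apply a bias-free linear projection.

    features has length d_in; weight is a list of d_in rows, each of length d_out.
    """
    return [sum(f * w for f, w in zip(features, column)) for column in zip(*weight)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(image_features, caption_features, image_proj, text_proj):
    """Return caption indices sorted from most to least similar to the image."""
    image_embed = project(image_features, image_proj)
    scores = [
        cosine_similarity(image_embed, project(caption, text_proj))
        for caption in caption_features
    ]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy example: a 3-dimensional image feature and 2-dimensional text features,
# both projected into a shared 2-dimensional space.
image_proj = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
text_proj = [[1.0, 0.0], [0.0, 1.0]]
image_features = [0.9, 0.1, 0.5]
caption_features = [[0.0, 1.0], [1.0, 0.0]]
ranking = rank_captions(image_features, caption_features, image_proj, text_proj)
```

In this toy setup the second caption ranks first, since its projected embedding points in nearly the same direction as the projected image embedding.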

Most of the CLIP script is based on HybridCLIP and clip-italian.

This is still a work in progress, so it may not give the best results (yet) :)

CLIP-Indonesian was presented at PyCon ID 2021. You can view the slide deck here.

Dataset

More details about the dataset used can be found here.

Results

The results of the training can be accessed here.

Demo

References

Bianchi, F., Attanasio, G., Pisoni, R., Terragni, S., Sarti, G., & Lakshmi, S. (2021). Contrastive Language-Image Pre-training for the Italian Language. arXiv preprint arXiv:2108.08688.

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision. ICML.

Wilie, B., Vincentio, K., Winata, G. I., Cahyawijaya, S., Li, X., Lim, Z. Y., ... & Purwarianti, A. (2020). IndoNLU: Benchmark and resources for evaluating Indonesian natural language understanding. arXiv preprint arXiv:2009.05387.

Hybrid CLIP by the HuggingFace team

Indonesian Roberta Base by Wilson Wongso, Steven Limcorn, Samsul Rahmadani, and Chew Kok Wah

Indonesian Translated Datasets by Samsul Rahmadani

Acknowledgment

All training was done on a TPUv3-8 VM sponsored by TPU Research Cloud.

Owner

Galuh
