Label Studio is a multi-type data labeling and annotation tool with a standardized output format

Last update: Jan 09, 2023

Overview


What is Label Studio?

Label Studio is an open source data labeling tool. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats. It can be used to prepare raw data or improve existing training data to get more accurate ML models.

Try out Label Studio
What you get from Label Studio
Included templates for labeling data in Label Studio
Set up machine learning models with Label Studio
Integrate Label Studio with your existing tools

Have a custom dataset? You can customize Label Studio to fit your needs. Read an introductory blog post to learn more.

Try out Label Studio

Install Label Studio locally, or deploy it in a cloud instance. You can also try Label Studio Teams.

Install locally with Docker
Run with Docker Compose (Label Studio + Nginx + PostgreSQL)
Install locally with pip
Install locally with Anaconda
Install for local development
Deploy in a cloud instance

Install locally with Docker

The official Label Studio Docker image is available on Docker Hub and can be downloaded with docker pull. Run Label Studio in a Docker container and access it at http://localhost:8080.

docker pull heartexlabs/label-studio:latest
docker run -it -p 8080:8080 -v `pwd`/mydata:/label-studio/data heartexlabs/label-studio:latest

You can find all the generated assets, including the SQLite3 database file label_studio.sqlite3 and uploaded files, in the ./mydata directory.

Override default Docker install

You can override the default launch command by appending the new arguments:

docker run -it -p 8080:8080 -v `pwd`/mydata:/label-studio/data heartexlabs/label-studio:latest label-studio --log-level DEBUG

Build a local image with Docker

If you want to build a local image, run:

docker build -t heartexlabs/label-studio:latest .

Run with Docker Compose

The Docker Compose script provides a production-ready stack consisting of the following components:

Label Studio
Nginx - proxy web server used to load various static data, including uploaded audio, images, etc.
PostgreSQL - production-ready database that replaces less performant SQLite3.

To start using the app at http://localhost, run this command:

docker-compose up

Install locally with pip

# Requires Python >=3.6, <3.9
pip install label-studio

# Start the server at http://localhost:8080
label-studio
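If pip fails or the server does not start, it can help to confirm that the interpreter falls inside the range quoted in the install comment above. The following is a minimal sketch of such a check; the helper name `is_supported` is hypothetical and not part of Label Studio.

```python
import sys

# Supported range quoted in the install comment above: >=3.6, <3.9.
MIN_VERSION = (3, 6)
MAX_VERSION_EXCLUSIVE = (3, 9)


def is_supported(version=None):
    """Return True if the (major, minor) version is inside the supported range.

    `version` is any sequence starting with (major, minor); when omitted,
    the running interpreter's version is used.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported():
        print(f"Python {current} is within the supported range.")
    else:
        print(f"Python {current} is outside the supported range (>=3.6, <3.9).")
```

Run the script with the same interpreter you plan to install Label Studio into, for example the one inside your virtual environment.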

Install locally with Anaconda

conda create --name label-studio python=3.8
conda activate label-studio
pip install label-studio

Install for local development

You can run the latest Label Studio version locally without installing the package with pip.

# Install all package dependencies
pip install -e .
# Run database migrations
python label_studio/manage.py migrate
# Start the server in development mode at http://localhost:8080
python label_studio/manage.py runserver

Deploy in a cloud instance

You can deploy Label Studio with one click in Heroku, Microsoft Azure, or Google Cloud Platform.

Apply frontend changes

The frontend of the Label Studio app lives in the label_studio/frontend/ folder and is written in React JSX. If you change anything there, run the following commands before building or starting the instance:

cd label_studio/frontend/
npm ci
npx webpack
cd ../..
python label_studio/manage.py collectstatic --no-input

Troubleshoot installation

If you see any errors during installation, try rerunning the installation:

pip install --ignore-installed label-studio

Install dependencies on Windows

To run Label Studio on Windows, download and install the following wheel packages from Gohlke builds, choosing the builds that match your version of Python:

lxml

# Upgrade pip
pip install -U pip

# If you're running Win64 with Python 3.8, install the packages downloaded from Gohlke:
pip install lxml-4.5.0-cp38-cp38-win_amd64.whl

# Install Label Studio
pip install label-studio

What you get from Label Studio

Multi-user labeling: sign up and log in, and every annotation you create is tied to your account.
Multiple projects to work on all your datasets in one instance.
Streamlined design helps you focus on your task, not how to use the software.
Configurable label formats let you customize the visual interface to meet your specific labeling needs.
Support for multiple data types including images, audio, text, HTML, time-series, and video.
Import from files (JSON, CSV, TSV, or RAR and ZIP archives) or from cloud storage in Amazon S3 and Google Cloud Storage.
Integration with machine learning models so that you can visualize and compare predictions from different models and perform pre-labeling.
Embed it in your data pipeline: the REST API makes it easy to make Label Studio part of your pipeline.
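To give a feel for the REST API mentioned in the last item, here is a minimal sketch that lists projects using only the Python standard library. It assumes the `/api/projects` endpoint and the `Authorization: Token ...` header scheme; check both against the API documentation for your installed version. The function names are hypothetical.

```python
import json
import urllib.request


def build_projects_request(base_url, api_token):
    """Build a GET request for the project list.

    Assumes the `/api/projects` path and token-based authentication;
    verify both for your Label Studio version.
    """
    url = base_url.rstrip("/") + "/api/projects"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {api_token}"},
        method="GET",
    )


def list_projects(base_url, api_token):
    """Send the request and return the decoded JSON response as-is."""
    request = build_projects_request(base_url, api_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For a local install, `list_projects("http://localhost:8080", "<your token>")` would query the server started in the sections above; the token comes from your account settings in the UI.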

Included templates for labeling data in Label Studio

Label Studio includes a variety of templates to help you label your data, or you can create your own using a specially designed configuration language. The included templates cover the most common labeling use cases.
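The configuration language is XML-based. As an illustration, the sketch below generates a small image classification config with the `View`, `Image`, `Choices`, and `Choice` tags. The builder function, the `name` values, and the `$image` data key are assumptions chosen for this example, not a built-in template.

```python
import xml.etree.ElementTree as ET


def build_image_classification_config(labels):
    """Return a labeling config string for single-image classification.

    The tag names follow the Label Studio configuration language; the
    `name` values and the `$image` data key are illustrative choices.
    """
    view = ET.Element("View")
    # The object to label: reads the image URL from the task's "image" field.
    ET.SubElement(view, "Image", name="image", value="$image")
    # The control tag: one Choice per label, attached to the image above.
    choices = ET.SubElement(view, "Choices", name="choice", toName="image")
    for label in labels:
        ET.SubElement(choices, "Choice", value=label)
    return ET.tostring(view, encoding="unicode")


if __name__ == "__main__":
    print(build_image_classification_config(["Cat", "Dog"]))
```

The resulting string can be pasted into the labeling interface settings of a project.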

Set up machine learning models with Label Studio

Connect your favorite machine learning model using the Label Studio Machine Learning SDK. Follow these steps:

Start your own machine learning backend server. See more detailed instructions.
Connect Label Studio to the server on the model page found in project settings.

This lets you:

Pre-label your data using model predictions.
Do online learning and retrain your model while new annotations are being created.
Do active learning by labeling only the most complex examples in your data.
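As a rough illustration of pre-labeling, the sketch below shows the kind of prediction payload a backend might return for a classification task. It does not use the Machine Learning SDK classes. The field names (`result`, `from_name`, `to_name`, `type`, `value`, `score`) reflect our understanding of the prediction format and should be checked against the SDK documentation, and `classify` stands in for your own model.

```python
def make_choice_prediction(label, score, from_name="choice", to_name="image"):
    """Wrap one predicted label in a prediction dictionary.

    `from_name` and `to_name` must match the tag names used in the
    project's labeling config; the defaults here are illustrative.
    """
    return {
        "result": [
            {
                "from_name": from_name,
                "to_name": to_name,
                "type": "choices",
                "value": {"choices": [label]},
            }
        ],
        "score": float(score),
    }


def predict(tasks, classify):
    """Return one prediction per task.

    `classify` is a hypothetical callable that takes a task's data
    dictionary and returns a (label, score) pair.
    """
    predictions = []
    for task in tasks:
        label, score = classify(task["data"])
        predictions.append(make_choice_prediction(label, score))
    return predictions
```

Sorting tasks by ascending `score` is one simple way to surface the least confident examples first when doing active learning.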

Integrate Label Studio with your existing tools

You can use Label Studio as an independent part of your machine learning workflow or integrate the frontend or backend into your existing tools.

Use the Label Studio Frontend as a separate React library. See more in the Frontend Library documentation.

Ecosystem

Project	Description
label-studio	Server, distributed as a pip package
label-studio-frontend	React and JavaScript frontend that can run standalone in a web browser or be embedded into your application.
data-manager	React and JavaScript frontend for managing data. Includes the Label Studio Frontend. Relies on the label-studio server or a custom backend with the expected API methods.
label-studio-converter	Encode labels in the format of your favorite machine learning library
label-studio-transformers	Transformers library connected and configured for use with Label Studio

Roadmap

Want to use The Coolest Feature X but Label Studio doesn't support it? Check out our public roadmap!

Citation

@misc{Label Studio,
  title={{Label Studio}: Data labeling software},
  url={https://github.com/heartexlabs/label-studio},
  note={Open source software available from https://github.com/heartexlabs/label-studio},
  author={
    Maxim Tkachenko and
    Mikhail Malyuk and
    Nikita Shevchenko and
    Andrey Holmanyuk and
    Nikolai Liubimov},
  year={2020-2021},
}


Owner

Heartex
