LIVECell - A large-scale dataset for label-free live cell segmentation


LIVECell dataset

This document contains instructions on how to access the data associated with the submitted manuscript "LIVECell - A large-scale dataset for label-free live cell segmentation" by Edlund et al. 2021.

Background

Light microscopy is a cheap, accessible, non-invasive modality that, when combined with well-established protocols of two-dimensional cell culture, facilitates high-throughput quantitative imaging to study biological phenomena. Accurate segmentation of individual cells enables exploration of complex biological questions, but this requires sophisticated image processing pipelines due to the low contrast and high object density. Deep learning-based methods are considered state-of-the-art for most computer vision problems but require vast amounts of annotated data, for which there is no suitable resource available in the field of label-free cellular imaging. To address this gap, we present LIVECell, a high-quality, manually annotated and expert-validated dataset that is the largest of its kind to date, consisting of over 1.6 million cells from a diverse set of cell morphologies and culture densities. To further demonstrate its utility, we provide convolutional neural network-based models trained and evaluated on LIVECell.

How to access LIVECell

All images in LIVECell are available following this link (requires 1.3 GB). Annotations for the different experiments are linked below. For more details regarding benchmarks and how to use our models, see this link.

LIVECell-wide train and evaluate

Annotation set URL
Training set link
Validation set link
Test set link

Single cell-type experiments

Cell Type Training set Validation set Test set
A172 link link link
BT474 link link link
BV-2 link link link
Huh7 link link link
MCF7 link link link
SH-SY5Y link link link
SkBr3 link link link
SK-OV-3 link link link

Dataset size experiments

Split URL
2 % link
4 % link
5 % link
25 % link
50 % link

Comparison to fluorescence-based object counts

The images and a corresponding JSON file with the object count per image are available, together with the raw fluorescent images the counts are based on.

Cell Type Images Counts Fluorescent images
A549 link link link
A172 link link link

Download all of LIVECell

The LIVECell-dataset and trained models are stored in an Amazon Web Services (AWS) S3-bucket. If you have an AWS IAM-user, the easiest way to download the dataset is with the AWS-CLI; run the following in the folder you would like to download the dataset to:

aws s3 sync s3://livecell-dataset .

If you do not have an AWS IAM-user, the procedure is a little more involved. We can use curl to make an HTTP request for the S3 XML response and save it to files.xml:

curl -H "GET /?list-type=2 HTTP/1.1" \
     -H "Host: livecell-dataset.s3.eu-central-1.amazonaws.com" \
     -H "Date: 20161025T124500Z" \
     -H "Content-Type: text/plain" http://livecell-dataset.s3.eu-central-1.amazonaws.com/ > files.xml

We then extract the URLs from files.xml using grep and sed:

grep -oPm1 "(?<=<Key>)[^<]+" files.xml | sed -e 's/^/http:\/\/livecell-dataset.s3.eu-central-1.amazonaws.com\//' > urls.txt

Then download the files you like using wget, for example wget -i urls.txt to fetch every file listed in urls.txt.

File structure

The top-level structure of the files is arranged like:

/livecell-dataset/
    ├── LIVECell_dataset_2021  
    |       ├── annotations/
    |       ├── models/
    |       ├── nuclear_count_benchmark/	
    |       └── images.zip  
    ├── README.md  
    └── LICENSE

LIVECell_dataset_2021/images

The images of the LIVECell-dataset are stored in /livecell-dataset/LIVECell_dataset_2021/images.zip along with their annotations in /livecell-dataset/LIVECell_dataset_2021/annotations/.

Within images.zip, the training/validation-set and test-set images are kept completely separate to facilitate fair comparison between studies. The images require 1.3 GB of disk space unzipped and are arranged like:

images/
    ├── livecell_test_images
    |       └── <cell type>
    |               └── <cell type>_Phase_<well>_<location>_<timestamp>_<crop index>.tif
    └── livecell_train_val_images
            └── <cell type>

Where <cell type> is each of the eight cell types in LIVECell (A172, BT474, BV2, Huh7, MCF7, SHSY5Y, SkBr3, SKOV3). <well> is the location in the 96-well plate used to culture the cells, <location> indicates the position in the well where the image was acquired, <timestamp> is the time passed from the beginning of the experiment to image acquisition, and <crop index> is the index of the crop of the original larger image. An example image name is A172_Phase_C7_1_02d16h00m_2.tif, which is an image of A172 cells grown in well C7, acquired at position 1 two days and 16 hours after experiment start (crop position 2).
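
For illustration, the naming convention can be parsed with a small Python sketch (the regex and group names below are our own, derived only from the convention just described):

import re

# Parse the LIVECell naming convention:
# <cell type>_Phase_<well>_<location>_<timestamp>_<crop index>.tif
NAME_PATTERN = re.compile(
    r"(?P<cell_type>[^_]+)_Phase_(?P<well>[^_]+)_(?P<location>\d+)_"
    r"(?P<timestamp>\d+d\d+h\d+m)_(?P<crop>\d+)\.tif"
)

parts = NAME_PATTERN.match("A172_Phase_C7_1_02d16h00m_2.tif")
print(parts.groupdict())
# {'cell_type': 'A172', 'well': 'C7', 'location': '1',
#  'timestamp': '02d16h00m', 'crop': '2'}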

LIVECell_dataset_2021/annotations/

The annotations of LIVECell are prepared for all tasks along with the training/validation/test splits used for all experiments in the paper. The annotations require 2.1 GB of disk space and are arranged like:

annotations/
    ├── LIVECell
    |       └── livecell_coco_<train/val/test>.json
    ├── LIVECell_single_cells
    |       └── <cell type>
    |               └── <train/val/test>.json
    └── LIVECell_dataset_size_split
            └── <split>_train<percentage>percent.json

  • annotations/LIVECell contains the annotations used for the LIVECell-wide train and evaluate task.
  • annotations/LIVECell_single_cells contains the annotations used for Single cell type train and evaluate as well as the Single cell type transferability tasks.
  • annotations/LIVECell_dataset_size_split contains the annotations used to investigate the impact of training set scale.

All annotations are in the Microsoft COCO Object Detection format and can, for instance, be parsed using the Python package pycocotools.
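
As a minimal sketch of such parsing (the annotation path follows the listing above; adjust it to wherever you unpacked the files):

from pycocotools.coco import COCO

# Load the LIVECell-wide training annotations.
coco = COCO("annotations/LIVECell/livecell_coco_train.json")

# Fetch all instance annotations for the first image.
image_ids = coco.getImgIds()
first_image = coco.loadImgs(image_ids[0])[0]
annotations = coco.loadAnns(coco.getAnnIds(imgIds=first_image["id"]))
print(first_image["file_name"], "has", len(annotations), "annotated cells")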

models/

All models trained and evaluated for tasks associated with LIVECell are made available for wider use. The models were trained using detectron2, Facebook's framework for object detection and instance segmentation. The models require 15 GB of disk space and are arranged like:

models/
    └── Anchor_<free/based>
            ├── ALL/
            |    └── <model>.pth
            └── <cell type>/
                 └── <model>.pth

Where each .pth is a binary file containing the model weights.

configs/

The config files for each model can be found in the LIVECell GitHub repo:

LIVECell
    └── Anchor_<free/based>
            ├── livecell_config.yaml
            ├── a172_config.yaml
            ├── bt474_config.yaml
            ├── bv2_config.yaml
            ├── huh7_config.yaml
            ├── mcf7_config.yaml
            ├── shsy5y_config.yaml
            ├── skbr3_config.yaml
            └── skov3_config.yaml


Where each config file can be used to reproduce the corresponding training run, or in combination with our model weights for inference; for more info, see the usage section.
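
For illustration, loading a model for inference might look roughly like the sketch below. This is a sketch under assumptions, not the project's documented usage: it assumes detectron2's standard config machinery, uses a hypothetical weights filename, and the anchor-free models may require project-specific config extensions beyond stock detectron2:

import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Placeholder paths: point these at a downloaded config and weights file.
cfg = get_cfg()
cfg.merge_from_file("livecell_config.yaml")
cfg.MODEL.WEIGHTS = "model.pth"  # hypothetical filename

# Run instance segmentation on a single phase-contrast image.
predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("A172_Phase_C7_1_02d16h00m_2.tif"))
print(len(outputs["instances"]), "cells detected")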

nuclear_count_benchmark/

The label-free images are stored in zip-archives, with the corresponding fluorescence-based object counts in JSON files, arranged as below:

nuclear_count_benchmark/
    ├── A172.zip
    ├── A172_counts.json
    ├── A172_fluorescent_images.zip
    ├── A549.zip
    ├── A549_counts.json 
    └── A549_fluorescent_images.zip

The JSON files have the following format:

{
    "<image filename>": "<count>"
}

Where <image filename> points to one of the images in the zip-archive, and <count> refers to the object count according to the fluorescent nuclear labels.
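
For example, the counts could be read in Python as below (a minimal sketch; the file name comes from the listing above and the key/value layout from the format just described):

import json

# Read the fluorescence-based object counts for the A172 benchmark set.
with open("nuclear_count_benchmark/A172_counts.json") as f:
    counts = json.load(f)

# Each key is an image filename, each value its nuclear object count.
for image_name, count in counts.items():
    print(image_name, "->", count)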

LICENSE

All images, annotations and models associated with LIVECell are published under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

All software source code associated with LIVECell is published under the MIT License.

Owner
Sartorius Corporate Research