Using fully convolutional networks for semantic segmentation with Caffe for the Cityscapes dataset

Last update: Jun 06, 2022

Overview

This project applies fully convolutional networks for semantic segmentation (Shelhamer et al.) to the Cityscapes dataset, using the Caffe framework.

How to get started

Download the Cityscapes dataset and the 16-layer VGG net.
Modify the dataset images with cut_images.py or downscale_images.py to make training and evaluation less resource-demanding.
Create the 32-pixel-stride net with net_32.py.
Edit the paths in train.txt and val.txt (first line: path to the training or validation images; second line: path to the annotations).
Start training with solve_start.py.
Run evaluate_models.py to evaluate your model, or create_eval_images.py to create images with pixel label IDs.
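
The downscaling step can be illustrated with a minimal sketch. This is not the code in downscale_images.py; the function names downscale_image and downscale_labels are hypothetical, and the sketch assumes images are already loaded as NumPy arrays. The point it shows is that color images can be averaged, while annotation maps should be subsampled without interpolation so that no new, invalid label IDs appear.

```python
import numpy as np


def downscale_image(image: np.ndarray, factor: int) -> np.ndarray:
    """Downscale an H x W x C image by averaging factor x factor blocks.

    Hypothetical helper for illustration only.
    """
    # Crop so that height and width are multiples of the factor.
    h = image.shape[0] // factor * factor
    w = image.shape[1] // factor * factor
    cropped = image[:h, :w].astype(np.float32)
    # Split each spatial axis into (blocks, factor) and average within blocks.
    blocks = cropped.reshape(h // factor, factor, w // factor, factor, -1)
    return blocks.mean(axis=(1, 3)).round().astype(image.dtype)


def downscale_labels(labels: np.ndarray, factor: int) -> np.ndarray:
    """Downscale an H x W label map by keeping every factor-th pixel.

    Nearest-neighbor subsampling keeps only label IDs that already exist;
    averaging would blend IDs into meaningless values.
    """
    h = labels.shape[0] // factor * factor
    w = labels.shape[1] // factor * factor
    return labels[:h:factor, :w:factor].copy()


if __name__ == "__main__":
    # Dummy arrays at the Cityscapes resolution of 2048 x 1024 pixels.
    image = np.zeros((1024, 2048, 3), dtype=np.uint8)
    labels = np.zeros((1024, 2048), dtype=np.uint8)
    print(downscale_image(image, 2).shape)
    print(downscale_labels(labels, 2).shape)
```

With a factor of 2, each image holds a quarter of the original pixels, which reduces memory use during training and evaluation accordingly. Whatever factor is used for the images must also be used for the annotations so that the two stay aligned.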

Sources

Fully Convolutional Models for Semantic Segmentation:

Shelhamer, Evan, Jonathan Long, and Trevor Darrell. "Fully Convolutional Networks for Semantic Segmentation." PAMI, 2016. URL: http://fcn.berkeleyvision.org

Cityscapes Dataset (Semantic Understanding of Urban Street Scenes):

Cordts, Marius, et al. "The Cityscapes Dataset." CVPR Workshop on The Future of Datasets in Vision, 2015. URL: https://www.cityscapes-dataset.com

Caffe Deep Learning Framework:

Jia, Yangqing, et al. "Caffe: Convolutional Architecture for Fast Feature Embedding." Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 2014. URL: http://caffe.berkeleyvision.org

Owner

Simon Guist
