The best solution of the Weather Prediction track in the Yandex Shifts challenge

Overview

License Apache 2.0 Python 3.7

yandex-shifts-weather

The repository contains information about my solution for the Weather Prediction track in the Yandex Shifts challenge https://research.yandex.com/shifts/weather

This information includes two my Jupyter-notebooks (deep_ensemble_with_uncertainty_and_spec_fe_eval.ipynb for evaluating only, and deep_ensemble_with_uncertainty_and_spec_fe.ipynb for all stages of my experiment), several auxiliary Python modules (from https://github.com/yandex-research/shifts/tree/main/weather), and my pre-trained models (see the models/yandex-shifts/weather subdirectory).

The proposed solution is the best on this track (see SNN Ens U MT Np SpecFE method in the corresponding leaderboard).

You can read a detailed description of all the algorithmic techniques implemented in my solution and their motivation in this paper: More layers! End-to-end regression and uncertainty on tabular data with deep learning.

If you use my solution in your work, please cite my paper using the following Bibtex:

@article{morelayers2021,
  author    = {Bondarenko, Ivan},
  title     = {More layers! End-to-end regression and uncertainty on tabular data with deep learning},
  journal   = {arXiv preprint arXiv:2112.03566},
  year      = {2021},
}

Reproducibility

For reproducibility you need to Python 3.7 or later (I checked all with Python 3.7). You have to clone this repository and install all dependencies (I recommend do it in a special Python environment):

git clone https://github.com/bond005/yandex-shifts-weather.git
pip install -r requirements.txt

My solution is based on deep learning and uses Tensorflow. So, you need a CUDA-compatible GPU for more efficient reproducibility of my code (I used two variants: nVidia V100 and nVidia GeForce 2080Ti).

After installing the dependencies, you need to do the following steps:

  1. Download all data of the Weather Prediction track in the Yandex Shifts challenge and unarchive it into the data/yandex-shifts/weather subdirectory. The training and development data can be downloaded here, and the data for final evaluation stage is available below this link. As a result, there will be four CSV-files in the abovementioned subdirectory:
  • train.csv
  • dev_in.csv
  • dev_out.csv
  • eval.csv
  1. Download the baseline models developed by the challenge organizers at the link, and unarchive their into the models/yandex-shifts/weather-baseline-catboost subdirectory. This step is optional, and it is only needed to compare my solution with the baseline on the development set (my pre-trained models are part of this repository, and they are available in the models/yandex-shifts/weather subdirectory).

  2. Run Jupyter notebook deep_ensemble_with_uncertainty_and_spec_fe_eval.ipynb for inference process reproducing. As a result, you will see the quality evaluation of my pre-trained models on the development set in comparison with the baseline. You will also get the predictions on the development and evaluation sets in the corresponding CSV files models/yandex-shifts/weather/df_submission_dev.csv and models/yandex-shifts/weather/df_submission.csv.

  3. If you want to reproduce the united process of training and inference (and not just inference, as in step 3), then you have to run another Jupyter notebook deep_ensemble_with_uncertainty_and_spec_fe.ipynb. All pre-trained models in the models/yandex-shifts/weather subdirectory will be rewritten. After that predictions will be generated (similar to step 3, but using new models). This step is optional.

Key ideas of the proposed method

A special deep ensemble (i.e. ensemble of deep neural networks) with uncertainty, hierarchicalmultitask learning and some other features is proposed. It is built on the basis of the following key techniques:

  1. A simple preprocessing is applied to the input data:
  • imputing: the missing values are replaced in all input columns following a simple constant strategy (fill value is −1);
  • quantization: each input column is discretized into quantile bins, and the number of these bins is detected automatically; after that the bin identifier can be considered as a quantized numerical value of the original feature;
  • standardization: each quantized column is standardized by removing the mean and scaling to unit variance;
  • decorrelation: all possible linear correlations are removed from the feature vectors discretized by above-mentioned way; the decorrelation is implemented using principal component analysis (PCA).
  1. An even simpler preprocessing is applied to targets: it is based only on removing the mean and scaling to unit variance.

  2. A deep neural network is built for regression with uncertainty:

  • self-normalization: this is a self-normalized neural network, also known as SNN [Klambaueret al. 2017];
  • inductive bias: neural network weights are tuned using a hierarchical multitask learning [Caruana 1997; Søgaardet al. 2016] with the temperature prediction as a high-level regression task and the coarsened temperature class recognition as a low-level classification task;
  • uncertainty: a special loss function similar to RMSEWithUncertainty is applied to training;
  • robustness: a supervised contrastive learning based on N-pairs loss [Sohn2016] is applied instead of the crossentropy-based classification as a low-level task in the hierarchical multitask learning; it provides more robustness of the trainable neural network [Khosla et al. 2020]
  1. A special technique of deep ensemble creation is implemented: it uses a model average approach like bagging, but new training and validation sub-sets for corresponding ensemble items are generated using stratification based on coarsened temperature classes.

The deep ensemble size is 20. Hyper-parameters of each neural network in the ensemble (hidden layer size, number of hidden layers and alpha-dropout as special version of dropout in SNN are the same. They are selected using a hybrid approach: first automatically, and then manually. The automatic approach is based on a Bayesian optimization with Gaussian Process regression, and it discovered the following hyper-parameter values for training the neural network in a single-task mode:

  • hidden layer size is 512;
  • number of hidden layers is 12;
  • alpha-dropout is 0.0003.

After the implementation of the hierarchical multitask learning, the depth was manually increased up to 18 hidden layers: the low-level task (classification or supervised constrastive learning) was added after the 12th layer, and the high-level task (regression with uncertainty) was added after the 18th layer. General architecture of single neural network is shown on the following figure. All "dense" components are feed-forward layers with self-normalization. Alpha-dropout is not shown to oversimplify the figure. The weather_snn_1_output layer estimates the mean and the standard deviation of normal distribution, which is implemented using theweather_snn_1_distribution layer. Another output layer named as weather_snn_1_projection calculates low-dimensional projections for the supervised contrastive learning.

Rectified ADAM [L. Liu et al. 2020] with Lookahead [M. R. Zhang et al. 2019] is used for training with the following parameters: learning rate is 0.0003, synchronization period is 6, and slow weights step size is 0.5. You can see the more detailed description of other training hyper-parameters in the Jupyter notebook deep_ensemble_with_uncertainty_and_spec_fe.ipynb.

Experiments

Experiments were conducted according to data partitioning in [Malinin et al. 2021, section 3.1]. Development and evaluation sets were not used for training and for hyper-parameter search. The quality of each modification of the method was first estimated on the development set, because all participants had access to the development set targets. After the preliminary estimation, the selected modification of the method was submitted to estimate R-AUC MSE on the evaluation set with targets concealed from participants.

In comparison with the baseline, the quality of the deep learning method is better (i.e. R-AUC MSE is less) on both datasets for testing:

  • the development set (for preliminary testing):
  1. proposed deep ensemble = 1.015;
  2. baseline (CatBoost ensemble) = 1.227;
  • the evaluation set (for final testing):
  1. proposed deep ensemble = 1.141;
  2. baseline (CatBoost ensemble) = 1.335.

Also, the results of the deep learning method are better with all possible values of the uncertainty threshold for retention.

The total number of all submitted methods at the evaluation phase is 73. Six selected results (top-5 results of all participants and the baseline result) are presented in the following table. The first three places are occupied by the following modifications of the proposed deep learning method:

  • SNN Ens U MT Np SpecFE is the final solution with "all-inclusive";
  • SNN Ens U MT Np v2 excludes the feature quantization;
  • SNN Ens U MT excludes the feature quantization too, and classification is used instead of supervised contrastive learning as the low-level task in the hierarchical multitask learning.
Rank Team Method R-AUC MSE
1 bond005 SNN Ens U MT Np SpecFE 1.1406288012
2 bond005 SNN Ens U MT Np v2 1.1415403291
3 bond005 SNN Ens U MT 1.1533415892
4 CabbeanWeather Steel box v2 1.1575201873
5 KDDI Research more seed ens 1.1593224114
55 Shifts Team Shifts Challenge 1.3353865316

The time characteristics of the solution with GPU using (Nvidia V100 or GeForce 2080 Ti) are:

  • the average inference time per sample is 0.063 milliseconds;
  • the total training time is about a day.
Owner
Ivan Yu. Bondarenko
Ivan Yu. Bondarenko
Federated Deep Reinforcement Learning for the Distributed Control of NextG Wireless Networks.

FDRL-PC-Dyspan Federated Deep Reinforcement Learning for the Distributed Control of NextG Wireless Networks. This repository contains the entire code

Peyman Tehrani 17 Nov 18, 2022
利用yolov5和TensorRT从0到1实现目标检测的模型训练到模型部署全过程

写在前面 利用TensorRT加速推理速度是以时间换取精度的做法,意味着在推理速度上升的同时将会有精度的下降,不过不用太担心,精度下降微乎其微。此外,要有NVIDIA显卡,经测试,CUDA10.2可以支持20系列显卡及以下,30系列显卡需要CUDA11.x的支持,并且目前有bug。 默认你已经完成了

Helium 6 Jul 28, 2022
The PyTorch implementation for paper "Neural Texture Extraction and Distribution for Controllable Person Image Synthesis" (CVPR2022 Oral)

ArXiv | Get Start Neural-Texture-Extraction-Distribution The PyTorch implementation for our paper "Neural Texture Extraction and Distribution for Cont

Ren Yurui 111 Dec 10, 2022
Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

RIIT Our open-source code for RIIT: Rethinking the Importance of Implementation Tricks in Multi-AgentReinforcement Learning. We implement and standard

405 Jan 06, 2023
3D2Unet: 3D Deformable Unet for Low-Light Video Enhancement (PRCV2021)

3DDUNET This is the code for 3D2Unet: 3D Deformable Unet for Low-Light Video Enhancement (PRCV2021) Conference Paper Link Dataset We use SMOID dataset

1 Jan 07, 2022
The official PyTorch implementation for the paper "sMGC: A Complex-Valued Graph Convolutional Network via Magnetic Laplacian for Directed Graphs".

Magnetic Graph Convolutional Networks About The official PyTorch implementation for the paper sMGC: A Complex-Valued Graph Convolutional Network via M

3 Feb 25, 2022
Experiment about Deep Person Re-identification with EfficientNet-v2

We evaluated the baseline with Resnet50 and Efficienet-v2 without using pretrained models. Also Resnet50-IBN-A and Efficientnet-v2 using pretrained on ImageNet. We used two datasets: Market-1501 and

lan.nguyen2k 77 Jan 03, 2023
Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination

Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination Pratul P. Srinivasan, Ben Mildenhall, Matthew Tancik, Jonathan T. Barron,

Pratul Srinivasan 65 Dec 14, 2022
Implementing Graph Convolutional Networks and Information Retrieval Mechanisms using pure Python and NumPy

Implementing Graph Convolutional Networks and Information Retrieval Mechanisms using pure Python and NumPy

Noah Getz 3 Jun 22, 2022
Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks

SSTNet Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks(ICCV2021) by Zhihao Liang, Zhihao Li, Songcen Xu, Mingkui Tan, Kui J

83 Nov 29, 2022
All course materials for the Zero to Mastery Machine Learning and Data Science course.

Zero to Mastery Machine Learning Welcome! This repository contains all of the code, notebooks, images and other materials related to the Zero to Maste

Daniel Bourke 1.6k Jan 08, 2023
In this tutorial, you will perform inference across 10 well-known pre-trained object detectors and fine-tune on a custom dataset. Design and train your own object detector.

Object Detection Object detection is a computer vision task for locating instances of predefined objects in images or videos. In this tutorial, you wi

Ibrahim Sobh 62 Dec 25, 2022
Airbus Ship Detection Challenge

Airbus Ship Detection Challenge This is an open solution to the Airbus Ship Detection Challenge. Our goals We are building entirely open solution to t

minerva.ml 55 Nov 29, 2022
(ICCV'21) Official PyTorch implementation of Relational Embedding for Few-Shot Classification

Relational Embedding for Few-Shot Classification (ICCV 2021) Dahyun Kang, Heeseung Kwon, Juhong Min, Minsu Cho [paper], [project hompage] We propose t

Dahyun Kang 82 Dec 24, 2022
PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io

PyStan NOTE: This documentation describes a BETA release of PyStan 3. PyStan is a Python interface to Stan, a package for Bayesian inference. Stan® is

Stan 229 Dec 29, 2022
An interactive DNN Model deployed on web that predicts the chance of heart failure for a patient with an accuracy of 98%

Heart Failure Predictor About A Web UI deployed Dense Neural Network Model Made using Tensorflow that predicts whether the patient is healthy or has c

Adit Ahmedabadi 0 Jan 09, 2022
A `Neural = Symbolic` framework for sound and complete weighted real-value logic

Logical Neural Networks LNNs are a novel Neuro = symbolic framework designed to seamlessly provide key properties of both neural nets (learning) and s

International Business Machines 138 Dec 19, 2022
MultiLexNorm 2021 competition system from ÚFAL

ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5 David Samuel & Milan Straka Charles University Faculty of

ÚFAL 13 Jun 28, 2022
FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation.

FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation [Project] [Paper] [arXiv] [Home] Official implementation of FastFCN:

Wu Huikai 815 Dec 29, 2022
A Python-based development platform for automated trading systems - from backtesting to optimisation to livetrading.

AutoTrader AutoTrader is Python-based platform intended to help in the development, optimisation and deployment of automated trading systems. From sim

Kieran Mackle 485 Jan 09, 2023