Towards Representation Learning for Atmospheric Dynamics (AtmoDist)

Overview

The prediction of future climate scenarios under anthropogenic forcing is critical to understand climate change and to assess the impact of potentially counteracting technologies. Machine learning and hybrid techniques for this prediction rely on informative metrics that are sensitive to pertinent but often subtle influences. For atmospheric dynamics, a critical part of the climate system, no well-established metric exists and visual inspection is currently still often used in practice. However, this "eyeball metric" cannot be used for machine learning where an algorithmic description is required. Motivated by the success of intermediate neural network activations as a basis for learned metrics, e.g. in computer vision, we present a novel, self-supervised representation learning approach specifically designed for atmospheric dynamics. Our approach, called AtmoDist, trains a neural network on a simple, auxiliary task: predicting the temporal distance between elements of a randomly shuffled sequence of atmospheric fields (e.g. the components of the wind field from reanalysis or simulation). The task forces the network to learn important intrinsic aspects of the data as activations in its layers, from which a discriminative metric can then be obtained. We demonstrate this by using AtmoDist to define a metric for GAN-based super resolution of vorticity and divergence. Our upscaled data closely matches a high-resolution reference both visually and in terms of its statistics, and it significantly outperforms the state of the art based on mean squared error. Since AtmoDist is unsupervised, only requires a temporal sequence of fields, and uses a simple auxiliary task, it has the potential to be of utility in a wide range of applications.

Original implementation of

Hoffmann, Sebastian, and Christian Lessig. "Towards Representation Learning for Atmospheric Dynamics." arXiv preprint arXiv:2109.09076 (2021). https://arxiv.org/abs/2109.09076

presented as part of the NeurIPS 2021 Workshop on Tackling Climate Change with Machine Learning

We would like to thank Stengel et al. for making their implementation (https://github.com/NREL/PhIRE) of "Adversarial super-resolution of climatological wind and solar data" openly available; the super-resolution part of this work is directly based on it.


Requirements

  • tensorflow 1.15.5
  • pyshtools (for SR evaluation)
  • pyspharm (for SR evaluation)
  • h5py
  • hdf5plugin
  • dask.array

Installation

pip install -e ./

This also makes available multiple command line tools that provide easy access to preprocessing, training, and evaluation routines. It is recommended to install the project in a virtual environment so as not to pollute the global PATH.
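
For example, a minimal setup in a fresh virtual environment might look as follows (the environment name venv-atmodist is purely illustrative):

python -m venv venv-atmodist        # create an isolated environment
source venv-atmodist/bin/activate   # activate it (Linux/macOS)
pip install -e ./                   # editable install of AtmoDist and its CLI tools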


CLI Tools

The provided CLI tools don't accept parameters but rather act as shortcuts that execute the corresponding script files. All parameters controlling the behaviour of training etc. should thus be adjusted directly in the script files. We list both the command-line command and the script file it executes; a typical invocation order is sketched after the list.

  • rplearn-data (python/phire/data_tool.py)
    • Samples patches and generates .tfrecords files from HDF5 data for the self-supervised representation-learning task.
  • rplearn-train (python/phire/rplearn/train.py)
    • Trains the representation network. By toggling comments, the same script is also used for evaluation of the trained network.
  • phire-data (python/phire/data_tool.py)
    • Samples patches and generates .tfrecords files from HDF5 data for the super-resolution task.
  • phire-train (python/phire/main.py)
    • Trains the SRGAN model using either MSE or a content-loss based on AtmoDist.
  • phire-eval (python/phire/evaluation/cli.py)
    • Evaluates trained SRGAN models using various metrics (energy spectrum, semivariogram, etc.). Image generation is also part of this step.
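
Assuming the script files have been configured as described in the sections below, a full run invokes the tools in roughly the following order (each command is called without arguments):

rplearn-data    # sample patches for the representation-learning task
rplearn-train   # train (and later evaluate) the AtmoDist network
phire-data      # sample patches for the super-resolution task
phire-train     # train the SRGAN model, e.g. with the AtmoDist-based content loss
phire-eval      # evaluate the trained SRGAN models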

Project Structure

  • python/phire
    • Mostly preserved from the Stengel et al. implementation, this directory contains the code for the SR training. sr_network.py contains the actual GAN model, whereas PhIREGANs.py contains the main training loop, data pipeline, as well as the inference procedure.
  • python/phire/rplearn
    • Contains everything related to the representation learning task, i.e. AtmoDist. The actual ResNet models are defined in resnet.py, while the training procedure can be found in train.py.
  • python/phire/evaluation
    • Dedicated to the evaluation of the super-resolved fields. The main configuration of the evaluation is done in cli.py, while the other files mostly correspond to specific evaluation metrics.
  • python/phire/data
    • Static data shipped with the python package.
  • python/phire/jetstream
    • WiP: Prediction of jetstream latitude as downstream task.
  • scripts/
    • Various utility scripts, e.g. used to generate some of the figures seen in the paper.

Preparing the data

AtmoDist is trained on vorticity and divergence fields from ERA5 reanalysis data. The data was obtained directly as spherical harmonic coefficients at model level 120 and then converted to regular lat-lon grids (1280 x 2560) using pyshtools (this conversion step is currently not included in this repository).

We assume this gridded data to be stored in one HDF5 file for training and one for evaluation, each containing a single dataset /data with dimensions C x T x H x W. These dimensions correspond to channel (variable), time, height, and width, respectively. Patches are then sampled from this HDF5 data and stored in .tfrecords files for training.
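
As a quick sanity check, the expected layout can be inspected with h5py; the file name era5_train.h5 is only a placeholder:

import h5py

# Placeholder file name; substitute your own training or evaluation file.
with h5py.File("era5_train.h5", "r") as f:
    data = f["/data"]          # the single dataset expected by the data tools
    C, T, H, W = data.shape    # channel (variable), time, height, width
    print(C, T, H, W)          # e.g. 2 x T x 1280 x 2560 for vorticity and divergence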

In practice, these "master" files actually contained virtual datasets, while the actual data was stored as one HDF5 file per year. This is, however, not a hard requirement. The script to create these virtual datasets is currently not included in the repository but might be added at a later point in time.
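
Since that script is not included, here is a minimal, hypothetical sketch of how such a master file could be assembled from per-year files with h5py virtual datasets; the file names, time steps, and dtype are assumptions:

import h5py

# Assumed per-year files and their number of time steps (illustrative only).
year_files = ["era5_2008.h5", "era5_2009.h5"]
steps = [8784, 8760]
C, H, W = 2, 1280, 2560

layout = h5py.VirtualLayout(shape=(C, sum(steps), H, W), dtype="f4")
offset = 0
for path, t in zip(year_files, steps):
    # Each yearly file is assumed to contain its own /data dataset.
    layout[:, offset:offset + t] = h5py.VirtualSource(path, "data", shape=(C, t, H, W))
    offset += t

with h5py.File("era5_train.h5", "w") as f:
    f.create_virtual_dataset("data", layout, fillvalue=0)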

To sample patches for training or evaluation run rplearn-data and phire-data.

Normalization

Normalization is done by the phire/data_tool.py script. This procedure is opaque to the models and data is only de-normalized during evaluation. The mean and standard deviations used for normalization can be specified using DataSampler.mean, DataSampler.std, DataSampler.mean_log1p, DataSampler.std_log1p. If specified as None, then these statistics will be calculated from the dataset using dask (this will take some time).
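
If these statistics need to be precomputed manually, a rough sketch with dask could look like the following; the exact log1p transform used by DataSampler is an assumption here, so check phire/data_tool.py before relying on it:

import dask.array as da
import h5py

f = h5py.File("era5_train.h5", "r")   # placeholder file name
x = da.from_array(f["/data"], chunks=(1, 128, 1280, 2560))

# Per-channel statistics over time, height, and width.
mean = x.mean(axis=(1, 2, 3)).compute()
std = x.std(axis=(1, 2, 3)).compute()

# Sign-preserving log1p transform (an assumption, not necessarily what
# DataSampler.mean_log1p / DataSampler.std_log1p refer to).
xl = da.sign(x) * da.log1p(da.absolute(x))
mean_log1p = xl.mean(axis=(1, 2, 3)).compute()
std_log1p = xl.std(axis=(1, 2, 3)).compute()

print(mean, std, mean_log1p, std_log1p)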


Training the AtmoDist model

  1. Specify dataset location, model name, output location, and number of classes (i.e. max delta T) in phire/rplearn/train.py
  2. Run training using rplearn-train
  3. Switch to evaluation by calling evaluate_all() and compute metrics on the evaluation set
  4. Find the optimal epoch and calculate the normalization factors (for a specific layer) using calculate_loss()
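
One plausible reading of step 4 is that the factor rescales the feature-space (content) loss so that its magnitude is comparable to a plain MSE loss; the numpy sketch below illustrates that general idea only and does not mirror calculate_loss() itself:

import numpy as np

def normalization_factor(feats_x, feats_y):
    """Average squared activation distance at the chosen layer over an
    evaluation set of field pairs (arrays of shape (num_pairs, h, w, c))."""
    return np.mean((feats_x - feats_y) ** 2)

def content_loss(feat_a, feat_b, norm_factor):
    """Feature-space MSE rescaled by the precomputed factor (illustrative)."""
    return np.mean((feat_a - feat_b) ** 2) / norm_factor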

Training the SRGAN model

  1. Specify dataset location, model name, AtmoDist model to use, and training regimen in phire/main.py
  2. Run training using phire-train

Evaluating the SRGAN models

  1. Specify dataset location, models to evaluate, output location, and metrics to calculate in phire/evaluation/cli.py
  2. Evaluate using phire-eval
  3. Toggle the if-statement to generate comparison plots and data between the different models, then rerun phire-eval