An SE(3)-invariant autoencoder for generating the periodic structure of materials

Related tags

Deep Learningcdvae
Overview

Crystal Diffusion Variational AutoEncoder

This software implementes Crystal Diffusion Variational AutoEncoder (CDVAE), which generates the periodic structure of materials.

It has several main functionalities:

  • Generate novel, stable materials by learning from a dataset containing existing material structures.
  • Generate materials by optimizing a specific property in the latent space, i.e. inverse design.

[Paper] [Datasets]

Table of Contents

Installation

The easiest way to install prerequisites is via conda.

Pre-install step

Install conda-merge:

pip install conda-merge

Check that you can invoke conda-merge by running conda-merge -h.

GPU machines

Run the following command to install the environment:

conda-merge env.common.yml env.gpu.yml > env.yml
conda env create -f env.yml

Activate the conda environment with conda activate cdvae.

Install this package with pip install -e ..

CPU-only machines

conda-merge env.common.yml env.cpu.yml > env.yml
conda env create -f env.yml
conda activate cdvae
pip install -e .

Setting up environment variables

Make a copy of the .env.template file and rename it to .env. Modify the following environment variables in .env.

  • PROJECT_ROOT: path to the folder that contains this repo
  • HYDRA_JOBS: path to a folder to store hydra outputs
  • WABDB: path to a folder to store wabdb outputs

Datasets

All datasets are directly available on data/ with train/valication/test splits. You don't need to download them again. If you use these datasets, please consider to cite the original papers from which we curate these datasets.

Find more about these datasets by going to our Datasets page.

Training CDVAE

Training without a property predictor

To train a CDVAE, run the following command:

python cdvae/run.py data=perov expname=perov

To use other datasets, use data=carbon and data=mp_20 instead. CDVAE uses hydra to configure hyperparameters, and users can modify them with the command line or configure files in conf/ folder.

After training, model checkpoints can be found in $HYDRA_JOBS/singlerun/YYYY-MM-DD/expname.

Training with a property predictor

Users can also additionally train an MLP property predictor on the latent space, which is needed for the property optimization task:

python cdvae/run.py data=perov expname=perov model.predict_property=True

The name of the predicted propery is defined in data.prop, as in conf/data/perov.yaml for Perov-5.

Generating materials

To generate materials, run the following command:

python scripts/evaluate.py --model_path MODEL_PATH --tasks recon gen opt

MODEL_PATH will be the path to the trained model. Users can choose one or several of the 3 tasks:

  • recon: reconstruction, reconstructs all materials in the test data. Outputs can be found in eval_recon.ptl
  • gen: generate new material structures by sampling from the latent space. Outputs can be found in eval_gen.pt.
  • opt: generate new material strucutre by minimizing the trained property in the latent space (requires model.predict_property=True). Outputs can be found in eval_opt.pt.

eval_recon.pt, eval_gen.pt, eval_opt.pt are pytorch pickles files containing multiple tensors that describes the structures of M materials batched together. Each material can have different number of atoms, and we assume there are in total N atoms. num_evals denote the number of Langevin dynamics we perform for each material.

  • frac_coords: fractional coordinates of each atom, shape (num_evals, N, 3)
  • atom_types: atomic number of each atom, shape (num_evals, N)
  • lengths: the lengths of the lattice, shape (num_evals, M, 3)
  • angles: the angles of the lattice, shape (num_evals, M, 3)
  • num_atoms: the number of atoms in each material, shape (num_evals, M)

Evaluating model

To compute evaluation metrics, run the following command:

python scripts/compute_metrics.py --root_path MODEL_PATH --tasks recon gen opt

MODEL_PATH will be the path to the trained model. All evaluation metrics will be saved in eval_metrics.json.

Authors and acknowledgements

The software is primary written by Tian Xie, with signficant contributions from Xiang Fu.

The GNN codebase and many utility functions are adapted from the ocp-models by the Open Catalyst Project. Especially, the GNN implementations of DimeNet++ and GemNet are used.

The main structure of the codebase is built from NN Template.

For the datasets, Perov-5 is curated from Perovksite water-splitting, Carbon-24 is curated from AIRSS data for carbon at 10GPa, MP-20 is curated from Materials Project.

Citation

Please consider citing the following paper if you find our code & data useful.

@article{xie2021crystal,
  title={Crystal Diffusion Variational Autoencoder for Periodic Material Generation},
  author={Xie, Tian and Fu, Xiang and Ganea, Octavian-Eugen and Barzilay, Regina and Jaakkola, Tommi},
  journal={arXiv preprint arXiv:2110.06197},
  year={2021}
}

Contact

Please leave an issue or reach out to Tian Xie (txie AT csail DOT mit DOT edu) if you have any questions.

Owner
Tian Xie
Postdoc at MIT CSAIL. Machine learning algorithms for materials, drugs, and beyond.
Tian Xie
Object-Centric Learning with Slot Attention

Slot Attention This is a re-implementation of "Object-Centric Learning with Slot Attention" in PyTorch (https://arxiv.org/abs/2006.15055). Requirement

Untitled AI 72 Jan 02, 2023
PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

DiffGAN-TTS - PyTorch Implementation PyTorch implementation of DiffGAN-TTS: High

Keon Lee 157 Jan 01, 2023
Face recognition system using MTCNN, FACENET, SVM and FAST API to track participants of Big Brother Brasil in real time.

BBB Face Recognizer Face recognition system using MTCNN, FACENET, SVM and FAST API to track participants of Big Brother Brasil in real time. Instalati

Rafael Azevedo 232 Dec 24, 2022
Official implementation of the paper "Topographic VAEs learn Equivariant Capsules"

Topographic Variational Autoencoder Paper: https://arxiv.org/abs/2109.01394 Getting Started Install requirements with Anaconda: conda env create -f en

T. Andy Keller 69 Dec 12, 2022
GPU implementation of $k$-Nearest Neighbors and Shared-Nearest Neighbors

GPU implementation of kNN and SNN GPU implementation of $k$-Nearest Neighbors and Shared-Nearest Neighbors Supported by numba cuda and faiss library E

Hyeon Jeon 7 Nov 23, 2022
The official code of Anisotropic Stroke Control for Multiple Artists Style Transfer

ASMA-GAN Anisotropic Stroke Control for Multiple Artists Style Transfer Proceedings of the 28th ACM International Conference on Multimedia The officia

Six_God 146 Nov 21, 2022
This is an official implementation of our CVPR 2021 paper "Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression" (https://arxiv.org/abs/2104.02300)

Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression Introduction In this paper, we are interested in the bottom-up paradigm of estima

HRNet 367 Dec 27, 2022
Codes for TIM2021 paper "Anchor-Based Spatio-Temporal Attention 3-D Convolutional Networks for Dynamic 3-D Point Cloud Sequences"

Codes for TIM2021 paper "Anchor-Based Spatio-Temporal Attention 3-D Convolutional Networks for Dynamic 3-D Point Cloud Sequences"

Intelligent Robotics and Machine Vision Lab 4 Jul 19, 2022
Self-Supervised Learning with Kernel Dependence Maximization

Self-Supervised Learning with Kernel Dependence Maximization This is the code for SSL-HSIC, a self-supervised learning loss proposed in the paper Self

DeepMind 29 Dec 29, 2022
Code Repo for the ACL21 paper "Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning"

Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning This is the Github repository of our paper, "Common S

INK Lab @ USC 19 Nov 30, 2022
Source code for "UniRE: A Unified Label Space for Entity Relation Extraction.", ACL2021.

UniRE Source code for "UniRE: A Unified Label Space for Entity Relation Extraction.", ACL2021. Requirements python: 3.7.6 pytorch: 1.8.1 transformers:

Wang Yijun 109 Nov 29, 2022
Official Implementation for the "An Empirical Investigation of 3D Anomaly Detection and Segmentation" paper.

An Empirical Investigation of 3D Anomaly Detection and Segmentation Project | Paper Official PyTorch Implementation for the "An Empirical Investigatio

Eliahu Horwitz 55 Dec 14, 2022
Pytorch implementation of the paper Time-series Generative Adversarial Networks

TimeGAN-pytorch Pytorch implementation of the paper Time-series Generative Adversarial Networks presented at NeurIPS'19. Jinsung Yoon, Daniel Jarrett

Zhiwei ZHANG 21 Nov 24, 2022
Topic Modelling for Humans

gensim – Topic Modelling in Python Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Targ

RARE Technologies 13.8k Jan 03, 2023
Text to Image Generation with Semantic-Spatial Aware GAN

text2image This repository includes the implementation for Text to Image Generation with Semantic-Spatial Aware GAN This repo is not completely. Netwo

CVDDL 124 Dec 30, 2022
Style transfer between images was performed using the VGG19 model

Style transfer between images was performed using the VGG19 model. The necessary codes, libraries and all other information of this project are available below

Onur yılmaz 2 May 09, 2022
CBKH: The Cornell Biomedical Knowledge Hub

Cornell Biomedical Knowledge Hub (CBKH) CBKG integrates data from 18 publicly available biomedical databases. The current version of CBKG contains a t

44 Dec 21, 2022
A Comprehensive Analysis of Weakly-Supervised Semantic Segmentation in Different Image Domains (IJCV submission)

wsss-analysis The code of: A Comprehensive Analysis of Weakly-Supervised Semantic Segmentation in Different Image Domains, arXiv pre-print 2019 paper.

Lyndon Chan 48 Dec 18, 2022
ElegantRL is featured with lightweight, efficient and stable, for researchers and practitioners.

Lightweight, efficient and stable implementations of deep reinforcement learning algorithms using PyTorch. 🔥

AI4Finance 2.5k Jan 08, 2023
FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes

FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes This repository contains the source code accompanying the paper: FlexConv: C

Robert-Jan Bruintjes 96 Dec 12, 2022