Twin-deep neural network for semi-supervised learning of materials properties

Related tags

Deep Learningtsdnn
Overview

Deep Semi-Supervised Teacher-Student Material Synthesizability Prediction

Citation:

Semi-supervised teacher-student deep neural network for materials discovery” by Daniel Gleaves, Edirisuriya M. Dilanga Siriwardane,Yong Zhao, and JianjunHu.

Machine learning and evolution laboratory

Department of Computer Science and Engineering

University of South Carolina


This software package implements the Meta Pseudo Labels (MPL) semi-supervised learning method with Crystal Graph Convolutional Neural Networks (CGCNN) with that takes an arbitary crystal structure to predict material synthesizability and whether it has positive or negative formation energy

The package provides two major functions:

  • Train a semi-supervised TSDNN classification model with a customized dataset.
  • Predict material synthesizability and formation energy of new crystals with a pre-trained TSDNN model.

The following paper describes the details of the CGCNN architecture, a graph neural network model for materials property prediction: CGCNN paper

The following paper describes the details of the semi-supervised learning framework that we used in our model: Meta Pseudo Labels

Table of Contents

Prerequisites

This package requires:

If you are new to Python, the easiest way of installing the prerequisites is via conda. After installing conda, run the following command to create a new environment named cgcnn and install all prerequisites:

conda upgrade conda
conda create -n tsdnn python=3 scikit-learn pytorch torchvision pymatgen -c pytorch -c conda-forge

*Note: this code is tested for PyTorch v1.0.0+ and is not compatible with versions below v0.4.0 due to some breaking changes.

This creates a conda environment for running TSDNN. Before using TSDNN, activate the environment by:

conda activate tsdnn

Usage

Define a customized dataset

To input crystal structures to TSDNN, you will need to define a customized dataset. Note that this is required for both training and predicting.

Before defining a customized dataset, you will need:

  • CIF files recording the structure of the crystals that you are interested in
  • The target label for each crystal (not needed for predicting, but you need to put some random numbers in data_test.csv)

You can create a customized dataset by creating a directory root_dir with the following files:

  1. data_labeled.csv: a CSV file with two columns. The first column recodes a unique ID for each crystal, and the second column recodes the known value of the target label.

  2. data_unlabeled.csv: a CSV file with two columns. The first column recodes a unique ID for each crystal, and the second column can be filled with alternating 1 and 0 (the second column is still needed).

  3. atom_init.json: a JSON file that stores the initialization vector for each element. An example of atom_init.json is data/sample-regression/atom_init.json, which should be good for most applications.

  4. ID.cif: a CIF file that recodes the crystal structure, where ID is the unique ID for the crystal.

(4.) data_predict: a CSV file with two columns. The first column recodes a unique ID for each crystal, and the second column can be filled with alternating 1 and 0 (the second column is still needed). This is the file that will be used if you want to classify materials with predict.py.

The structure of the root_dir should be:

root_dir
├── data_labeled.csv
├── data_unlabeled.csv
├── data_test.csv
├── data_positive.csv (optional- for positive and unlabeled dataset generation)
├── data_unlabeled_full.csv (optional- for positive and unlabeled dataset generation, data_unlabeled.csv will be overwritten)
├── atom_init.json
├── id0.cif
├── id1.cif
├── ...

There is an example of customized a dataset in: data/example.

Train a TSDNN model

Before training a new TSDNN model, you will need to:

Then, in directory synth-tsdnn, you can train a TSDNN model for your customized dataset by:

python main.py root_dir

If you want to use the PU learning dataset generation, you can train a model using the --uds flag with the number of PU iterations to perform.

python main.py --uds 5 root_dir

You can set the number of training, validation, and test data with labels --train-size, --val-size, and --test-size. Alternatively, you may use the flags --train-ratio, --val-ratio, --test-ratio instead. Note that the ratio flags cannot be used with the size flags simultaneously. For instance, data/example has 10 data points in total. You can train a model by:

python main.py --train-size 6 --val-size 2 --test-size 2 data/example

or alternatively

python main.py --train-ratio 0.6 --val-ratio 0.2 --test-ratio 0.2 data/example

After training, you will get 5 files in synth-tsdnn directory.

  • checkpoints/teacher_best.pth.tar: stores the TSDNN teacher model with the best validation accuracy.
  • checkpoints/student_best.pth.tar" stores the TSDNN student model with the best validation accuracy.
  • checkpoints/t_checkpoint.pth.tar: stores the TSDNN teacher model at the last epoch.
  • checkpoints/s_checkpoint.pth.tar: stores the TSDNN student model at the last epoch.
  • results/validation/test_results.csv: stores the ID and predicted value for each crystal in training set.

Predict material properties with a pre-trained TSDNN model

Before predicting the material properties, you will need to:

  • Define a customized dataset at root_dir for all the crystal structures that you want to predict.
  • Obtain a pre-trained TSDNN model (example found in checkpoints/pre-trained/pre-train.pth.tar).

Then, in directory synth-tsdnn, you can predict the properties of the crystals in root_dir:

python predict.py checkpoints/pre-trained/pre-trained.pth.tar data/root_dir

After predicting, you will get one file in synth-tsdnn directory:

  • predictions.csv: stores the ID and predicted value for each crystal in test set.

Data

To reproduce our paper, you can download the corresponding datasets following the instruction. Each dataset discussed can be found in data/datasets/

Authors

This software was primarily written by Daniel Gleaves who was advised by Prof. Jianjun Hu. This software builds upon work by Tian Xie, Hieu Pham, and Jungdae Kim.

Acknowledgements

Research reported in this work was supported in part by NSF under grants 1940099 and 1905775. The views, perspective,and content do not necessarily represent the official views of NSF. This work was supported in part by the South Carolina Honors College Research Program. This work is partially supported by a grant from the University of South Carolina Magellan Scholar Program.

License

TSDNN is released under the MIT License.

Owner
MLEG
MLEG
A PyTorch implementation for PyramidNets (Deep Pyramidal Residual Networks)

A PyTorch implementation for PyramidNets (Deep Pyramidal Residual Networks) This repository contains a PyTorch implementation for the paper: Deep Pyra

Greg Dongyoon Han 262 Jan 03, 2023
This toolkit provides codes to download and pre-process the SLUE datasets, train the baseline models, and evaluate SLUE tasks.

slue-toolkit We introduce Spoken Language Understanding Evaluation (SLUE) benchmark. This toolkit provides codes to download and pre-process the SLUE

ASAPP Research 39 Sep 21, 2022
This repo. is an implementation of ACFFNet, which is accepted for in Image and Vision Computing.

Attention-Guided-Contextual-Feature-Fusion-Network-for-Salient-Object-Detection This repo. is an implementation of ACFFNet, which is accepted for in I

5 Nov 21, 2022
Tool for live presentations using manim

manim-presentation Tool for live presentations using manim Install pip install manim-presentation opencv-python Usage Use the class Slide as your sce

Federico Galatolo 146 Jan 06, 2023
A faster pytorch implementation of faster r-cnn

A Faster Pytorch Implementation of Faster R-CNN Write at the beginning [05/29/2020] This repo was initaited about two years ago, developed as the firs

Jianwei Yang 7.1k Jan 01, 2023
Melanoma Skin Cancer Detection using Convolutional Neural Networks and Transfer Learning🕵🏻‍♂️

This is a Kaggle competition in which we have to identify if the given lesion image is malignant or not for Melanoma which is a type of skin cancer.

Vipul Shinde 1 Jan 27, 2022
Load What You Need: Smaller Multilingual Transformers for Pytorch and TensorFlow 2.0.

Smaller Multilingual Transformers This repository shares smaller versions of multilingual transformers that keep the same representations offered by t

Geotrend 79 Dec 28, 2022
Research on Tabular Deep Learning (Python package & papers)

Research on Tabular Deep Learning For paper implementations, see the section "Papers and projects". rtdl is a PyTorch-based package providing a user-f

Yura Gorishniy 510 Dec 30, 2022
PAMI stands for PAttern MIning. It constitutes several pattern mining algorithms to discover interesting patterns in transactional/temporal/spatiotemporal databases

Introduction PAMI stands for PAttern MIning. It constitutes several pattern mining algorithms to discover interesting patterns in transactional/tempor

RAGE UDAY KIRAN 43 Jan 08, 2023
Processed, version controlled history of Minecraft's generated data and assets

mcmeta Processed, version controlled history of Minecraft's generated data and assets Repository structure Each of the following branches has a commit

Misode 75 Dec 28, 2022
Deep Reinforcement Learning for Keras.

Deep Reinforcement Learning for Keras What is it? keras-rl implements some state-of-the art deep reinforcement learning algorithms in Python and seaml

Keras-RL 0 Dec 15, 2022
[ICLR 2021, Spotlight] Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks, ICLR 2021 (Spotlight) Demo | Paper [NEW!] Time to play with our interac

Shengyu Zhao 373 Jan 02, 2023
Alignment Attention Fusion framework for Few-Shot Object Detection

AAF framework Framework generalities This repository contains the code of the AAF framework proposed in this paper. The main idea behind this work is

Pierre Le Jeune 20 Dec 16, 2022
My implementation of Fully Convolutional Neural Networks in Keras

Keras-FCN This repository contains my implementation of Fully Convolutional Networks in Keras (Tensorflow backend). Currently, semantic segmentation c

The Duy Nguyen 15 Jan 13, 2020
Research Artifact of USENIX Security 2022 Paper: Automated Side Channel Analysis of Media Software with Manifold Learning

Automated Side Channel Analysis of Media Software with Manifold Learning Official implementation of USENIX Security 2022 paper: Automated Side Channel

Yuanyuan Yuan 175 Jan 07, 2023
Reproduced Code for Image Forgery Detection papers.

Image Forgery Detection With over 4.5 billion active internet users, the amount of multimedia content being shared every day has surpassed everyone’s

Umar Masud 15 Dec 06, 2022
Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization

This project is now archived. It's been fun working on it, but it's time for me to move on. Thank you for all the support and feedback over the last c

Max Pumperla 2.1k Jan 03, 2023
Pytorch implementation of ProjectedGAN

ProjectedGAN-pytorch Pytorch implementation of ProjectedGAN (https://arxiv.org/abs/2111.01007) Note: this repository is still under developement. @InP

Dominic Rampas 17 Dec 14, 2022
This repository contains the code for EMNLP-2021 paper "Word-Level Coreference Resolution"

Word-Level Coreference Resolution This is a repository with the code to reproduce the experiments described in the paper of the same name, which was a

79 Dec 27, 2022
Individual Tree Crown classification on WorldView-2 Images using Autoencoder -- Group 9 Weak learners - Final Project (Machine Learning 2020 Course)

Created by Olga Sutyrina, Sarah Elemili, Abduragim Shtanchaev and Artur Bille Individual Tree Crown classification on WorldView-2 Images using Autoenc

2 Dec 08, 2022