Dual Adaptive Sampling for Machine Learning Interatomic potential.

Related tags

Machine Learningdas
Overview

DAS

Dual Adaptive Sampling for Machine Learning Interatomic potential.

How to cite

If you use this code in your research, please cite this using: Hongliang Yang, Yifan Zhu, Erting Dong, Yabei Wu, Jiong Yang, and Wenqing Zhang. Dual adaptive sampling and machine learning interatomic potentials for modeling materials with chemical bond hierarchy. Phys. Rev. B 104, 094310 (2021).

Install

Install pymtp

You should first install the python interface for mtp: https://github.com/hlyang1992/pymtp

Install das

You can download the code by

git clone https://github.com/hlyang1992/das
cd das
cp -r <path-to-mlip-2>/untrained_mtps/*.mtp das/utils/untrained_mtps

Then remove the redundant settings from each mtp file. Only the following settings can be retained for each mtp file:

radial_funcs_count = 
alpha_moments_count = 
alpha_index_basic_count = 
alpha_index_basic = 
alpha_index_times_count = 
alpha_index_times = 
alpha_scalar_moments = 
alpha_moment_mapping =

Install das by

cd <path-to-das>
pip install -r requirements.txt
pip install .

Usage

das  config_dir  job_name

Configuration

The configuration directory config_dir must contain the configuration file conf.yaml, which controls all sampling processes. The conf.yaml file should look like the following:

"global_settings":

"machine_settings":

"selector_settings": {} 

"labeler_settings":

"trainer_settings":

"sampler_settings":

"init_conf_setting":

"iter_params_template":

"iter_params":
  • global_settings:
"global_settings":
  # The elements in the system, the order of the elements does not matter, the program automatically numbers the 
  # atomic types according to their atomic number from smallest to largest.
  "unique_elements": [ "Co", "Sb" ]
  # path to VASP Pseudopotential Database, see detail at https://wiki.fysik.dtu.dk/ase/ase/calculators/vasp.html#vasp
  "vasp_pp_path": "path_to_directory" 
  • machine_settings:

All time-consuming computational tasks such as sampling, labeling, and training can be dispatched to designated machines via ssh. Currently only LSF is supported and migration to other job management systems is very easy.

"machine_settings":
  "machine_1":
    # The supported machine types are now: `machine_lsf`, `machine_shell`
    "machine_type": "machine_lsf"
    "host": "ip address"
    "user": "username"
    "password": "password"
    # Exclude these nodes when submitting tasks.
    "bad_nodes": [ ] # #BSUB -R "hname!={{node}}"
    "port": 22
    # number of cores for each task
    "n_cores": 40 # #BSUB -n {{ncores}}
    "n_tasks": 40 # The maximum number of tasks to run simultaneously.
    "q_name": "short" # #BSUB -q {{q_name}}
    "env_source_file": "env.sh" # env.sh is in the config_dir
    "run_dir": "path-to-run-directory-in-target"
    "extra_params":
      "vasp_cmd": "mpiexec.hydra -machinefile $LSB_DJOB_HOSTFILE -np $NP vasp"
      "lmp_cmd": "mpiexec.hydra -machinefile $LSB_DJOB_HOSTFILE -np $NP lmp_mlp"
      "mlip_cmd": "mpiexec.hydra -machinefile $LSB_DJOB_HOSTFILE -np $NP mlp train"
      "python_cmd": "absolute path to python path"
  "machine_2":
    # setting for machchine_2
    "machine_type": "machine_lsf"
    # ...

You should prepare a file to set the environment variables. The program will source this file to set the environment variables after connecting to the machine via ssh. For technical reasons please see: The remote shell environment doesn’t match interactive shells

  • sampler_settings
"scale_1":
  "kind": "scale_box"
  "scale_factors": [0.998, 0.9985, 0.999]
"scale_2":
  "kind": "scale_box"
  "scale_factors": [[0.998, 0.9985, 0.999, 0.997], # a
                    [1.002, 1.003, 1.004, 1.005],  # b
                    [0.997, 0.995, 0.999, 0.996]] # c
"nvt_0": 
  "kind": "lmp_model_sampler"
  "max_number_confs": 5
  "min_number_confs": 0
  "machine": "machine_1"
  "lmp_vars":
    "temp": [ 100, 150 ]
    "steps": [ 10000 ]
    "nevery": [ 20 ]
    "prev_steps": [ 0 ]
 
"npt_0": 
  "kind": "lmp_model_sampler"
  "max_number_confs": 5
  "min_number_confs": 0
  "machine": "machine_2"
  "lmp_vars":
    "temp": [ 100, 150 ]
    "steps": [ 10000 ]
    "nevery": [ 20 ]
    "press": [100, 200] # bar
    "prev_steps": [ 0 ]
  • "labeler_settings"

We use ase to generate input files (INCAR, POTCAR, KPOINTS) for VASP calculation. Please see detail at Ase vasp calculator

"labeler_settings":
  "vasp":
    "kind": "vasp"
    "machine": "ty_label"
    "vasp_parms":
      "xc": "pbe"
      "prec": "A"
      # other setting for vasp calculations
  • "trainer_settings"
"trainer_settings":
  "train_5_model":
    "kind": "mtp_trainer"
    "machine": "ty_train" 
    "model_index": 18 
    "min_dist": 1.39 
    "max_dist": 5.0
    "n_models": 5 
    "train_from_prev_model": true 
  • init_conf_setting:
"init_conf_setting":
  "-1": [ "init_MD.cfg" ]
  "-2": [ "init_1.vasp" ]
  "-3": [ "init_2.vasp" ]
  • iter_params_template:
"iter_params_template":
  "0":
    "init_conf": [ -1 ]
    "sampler": [ ]
    "selector": [ ]
    "labeler": [ ]
    "trainer": [ "train_5_model" ]
  "10":
    "init_conf": [ -2 ]
    "sampler": [ "scale_0", "nvt_0" ]
    "selector": [ ]
    "labeler": [ "vasp" ]
    "trainer": [ "train_5_model" ]
  "20":
    "init_conf": [ -3 ]
    "sampler": [ "npt_0"]
    "selector": [ ]
    "labeler": [ "vasp" ]
    "trainer": [ "train_5_model" ]
  "30":
    "init_conf": [ -2,-3 ]
    "sampler": [ "npt_0"]
    "selector": [ ]
    "labeler": [ "vasp" ]
    "trainer": [ "train_5_model" ]
  • iter_params:
"iter_params":
  [
    [ "0" ],
    # If the last one is LOOP, repeat all the previous ones until convergence.
    ["10", "LOOP"], 
    ["30", "LOOP"],
    ["10", "10"]  
    ["20"],
  ]
A library of sklearn compatible categorical variable encoders

Categorical Encoding Methods A set of scikit-learn-style transformers for encoding categorical variables into numeric by means of different techniques

2.1k Jan 07, 2023
Machine Learning for Time-Series with Python.Published by Packt

Machine-Learning-for-Time-Series-with-Python Become proficient in deriving insights from time-series data and analyzing a model’s performance Links Am

Packt 124 Dec 28, 2022
Distributed deep learning on Hadoop and Spark clusters.

Note: we're lovingly marking this project as Archived since we're no longer supporting it. You are welcome to read the code and fork your own version

Yahoo 1.3k Dec 28, 2022
A flexible CTF contest platform for coming PKU GeekGame events

Project Guiding Star: the Backend A flexible CTF contest platform for coming PKU GeekGame events Still in early development Highlights Not configurabl

PKU GeekGame 14 Dec 15, 2022
Lightning ⚡️ fast forecasting with statistical and econometric models.

Nixtla Statistical ⚡️ Forecast Lightning fast forecasting with statistical and econometric models StatsForecast offers a collection of widely used uni

Nixtla 2.1k Dec 29, 2022
It is a forest of random projection trees

rpforest rpforest is a Python library for approximate nearest neighbours search: finding points in a high-dimensional space that are close to a given

Lyst 211 Dec 29, 2022
Combines MLflow with a database (PostgreSQL) and a reverse proxy (NGINX) into a multi-container Docker application

Combines MLflow with a database (PostgreSQL) and a reverse proxy (NGINX) into a multi-container Docker application (with docker-compose).

Philip May 2 Dec 03, 2021
Simulate & classify transient absorption spectroscopy (TAS) spectral features for bulk semiconducting materials (Post-DFT)

PyTASER PyTASER is a Python (3.9+) library and set of command-line tools for classifying spectral features in bulk materials, post-DFT. The goal of th

Materials Design Group 4 Dec 27, 2022
A toolbox to iNNvestigate neural networks' predictions!

iNNvestigate neural networks! Table of contents Introduction Installation Usage and Examples More documentation Contributing Releases Introduction In

Maximilian Alber 1.1k Jan 05, 2023
A Multipurpose Library for Synthetic Time Series Generation in Python

TimeSynth Multipurpose Library for Synthetic Time Series Please cite as: J. R. Maat, A. Malali, and P. Protopapas, “TimeSynth: A Multipurpose Library

278 Dec 26, 2022
A data preprocessing and feature engineering script for a machine learning pipeline is prepared.

FEATURE ENGINEERING Business Problem: A data preprocessing and feature engineering script for a machine learning pipeline needs to be prepared. It is

Pinar Oner 7 Dec 18, 2021
A collection of Scikit-Learn compatible time series transformers and tools.

tsfeast A collection of Scikit-Learn compatible time series transformers and tools. Installation Create a virtual environment and install: From PyPi p

Chris Santiago 0 Mar 30, 2022
Covid-polygraph - a set of Machine Learning-driven fact-checking tools

Covid-polygraph, a set of Machine Learning-driven fact-checking tools that aim to address the issue of misleading information related to COVID-19.

1 Apr 22, 2022
AP1 Transcription Factor Binding Site Prediction

A machine learning project that predicted binding sites of AP1 transcription factor, using ChIP-Seq data and local DNA shape information.

1 Jan 21, 2022
Data Efficient Decision Making

Data Efficient Decision Making

Microsoft 197 Jan 06, 2023
scikit-learn is a python module for machine learning built on top of numpy / scipy

About scikit-learn is a python module for machine learning built on top of numpy / scipy. The purpose of the scikit-learn-tutorial subproject is to le

Gael Varoquaux 122 Dec 12, 2022
Turns your machine learning code into microservices with web API, interactive GUI, and more.

Turns your machine learning code into microservices with web API, interactive GUI, and more.

Machine Learning Tooling 2.8k Jan 02, 2023
MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine Learning work with thousands of other users.

The collaboration platform for Machine Learning MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine

MLReef 1.4k Dec 27, 2022
Distributed Deep learning with Keras & Spark

Elephas: Distributed Deep Learning with Keras & Spark Elephas is an extension of Keras, which allows you to run distributed deep learning models at sc

Max Pumperla 1.6k Dec 29, 2022
Dragonfly is an open source python library for scalable Bayesian optimisation.

Dragonfly is an open source python library for scalable Bayesian optimisation. Bayesian optimisation is used for optimising black-box functions whose

744 Jan 02, 2023