Dual Adaptive Sampling for Machine Learning Interatomic potential.

Related tags

Machine Learningdas
Overview

DAS

Dual Adaptive Sampling for Machine Learning Interatomic potential.

How to cite

If you use this code in your research, please cite this using: Hongliang Yang, Yifan Zhu, Erting Dong, Yabei Wu, Jiong Yang, and Wenqing Zhang. Dual adaptive sampling and machine learning interatomic potentials for modeling materials with chemical bond hierarchy. Phys. Rev. B 104, 094310 (2021).

Install

Install pymtp

You should first install the python interface for mtp: https://github.com/hlyang1992/pymtp

Install das

You can download the code by

git clone https://github.com/hlyang1992/das
cd das
cp -r <path-to-mlip-2>/untrained_mtps/*.mtp das/utils/untrained_mtps

Then remove the redundant settings from each mtp file. Only the following settings can be retained for each mtp file:

radial_funcs_count = 
alpha_moments_count = 
alpha_index_basic_count = 
alpha_index_basic = 
alpha_index_times_count = 
alpha_index_times = 
alpha_scalar_moments = 
alpha_moment_mapping =

Install das by

cd <path-to-das>
pip install -r requirements.txt
pip install .

Usage

das  config_dir  job_name

Configuration

The configuration directory config_dir must contain the configuration file conf.yaml, which controls all sampling processes. The conf.yaml file should look like the following:

"global_settings":

"machine_settings":

"selector_settings": {} 

"labeler_settings":

"trainer_settings":

"sampler_settings":

"init_conf_setting":

"iter_params_template":

"iter_params":
  • global_settings:
"global_settings":
  # The elements in the system, the order of the elements does not matter, the program automatically numbers the 
  # atomic types according to their atomic number from smallest to largest.
  "unique_elements": [ "Co", "Sb" ]
  # path to VASP Pseudopotential Database, see detail at https://wiki.fysik.dtu.dk/ase/ase/calculators/vasp.html#vasp
  "vasp_pp_path": "path_to_directory" 
  • machine_settings:

All time-consuming computational tasks such as sampling, labeling, and training can be dispatched to designated machines via ssh. Currently only LSF is supported and migration to other job management systems is very easy.

"machine_settings":
  "machine_1":
    # The supported machine types are now: `machine_lsf`, `machine_shell`
    "machine_type": "machine_lsf"
    "host": "ip address"
    "user": "username"
    "password": "password"
    # Exclude these nodes when submitting tasks.
    "bad_nodes": [ ] # #BSUB -R "hname!={{node}}"
    "port": 22
    # number of cores for each task
    "n_cores": 40 # #BSUB -n {{ncores}}
    "n_tasks": 40 # The maximum number of tasks to run simultaneously.
    "q_name": "short" # #BSUB -q {{q_name}}
    "env_source_file": "env.sh" # env.sh is in the config_dir
    "run_dir": "path-to-run-directory-in-target"
    "extra_params":
      "vasp_cmd": "mpiexec.hydra -machinefile $LSB_DJOB_HOSTFILE -np $NP vasp"
      "lmp_cmd": "mpiexec.hydra -machinefile $LSB_DJOB_HOSTFILE -np $NP lmp_mlp"
      "mlip_cmd": "mpiexec.hydra -machinefile $LSB_DJOB_HOSTFILE -np $NP mlp train"
      "python_cmd": "absolute path to python path"
  "machine_2":
    # setting for machchine_2
    "machine_type": "machine_lsf"
    # ...

You should prepare a file to set the environment variables. The program will source this file to set the environment variables after connecting to the machine via ssh. For technical reasons please see: The remote shell environment doesn’t match interactive shells

  • sampler_settings
"scale_1":
  "kind": "scale_box"
  "scale_factors": [0.998, 0.9985, 0.999]
"scale_2":
  "kind": "scale_box"
  "scale_factors": [[0.998, 0.9985, 0.999, 0.997], # a
                    [1.002, 1.003, 1.004, 1.005],  # b
                    [0.997, 0.995, 0.999, 0.996]] # c
"nvt_0": 
  "kind": "lmp_model_sampler"
  "max_number_confs": 5
  "min_number_confs": 0
  "machine": "machine_1"
  "lmp_vars":
    "temp": [ 100, 150 ]
    "steps": [ 10000 ]
    "nevery": [ 20 ]
    "prev_steps": [ 0 ]
 
"npt_0": 
  "kind": "lmp_model_sampler"
  "max_number_confs": 5
  "min_number_confs": 0
  "machine": "machine_2"
  "lmp_vars":
    "temp": [ 100, 150 ]
    "steps": [ 10000 ]
    "nevery": [ 20 ]
    "press": [100, 200] # bar
    "prev_steps": [ 0 ]
  • "labeler_settings"

We use ase to generate input files (INCAR, POTCAR, KPOINTS) for VASP calculation. Please see detail at Ase vasp calculator

"labeler_settings":
  "vasp":
    "kind": "vasp"
    "machine": "ty_label"
    "vasp_parms":
      "xc": "pbe"
      "prec": "A"
      # other setting for vasp calculations
  • "trainer_settings"
"trainer_settings":
  "train_5_model":
    "kind": "mtp_trainer"
    "machine": "ty_train" 
    "model_index": 18 
    "min_dist": 1.39 
    "max_dist": 5.0
    "n_models": 5 
    "train_from_prev_model": true 
  • init_conf_setting:
"init_conf_setting":
  "-1": [ "init_MD.cfg" ]
  "-2": [ "init_1.vasp" ]
  "-3": [ "init_2.vasp" ]
  • iter_params_template:
"iter_params_template":
  "0":
    "init_conf": [ -1 ]
    "sampler": [ ]
    "selector": [ ]
    "labeler": [ ]
    "trainer": [ "train_5_model" ]
  "10":
    "init_conf": [ -2 ]
    "sampler": [ "scale_0", "nvt_0" ]
    "selector": [ ]
    "labeler": [ "vasp" ]
    "trainer": [ "train_5_model" ]
  "20":
    "init_conf": [ -3 ]
    "sampler": [ "npt_0"]
    "selector": [ ]
    "labeler": [ "vasp" ]
    "trainer": [ "train_5_model" ]
  "30":
    "init_conf": [ -2,-3 ]
    "sampler": [ "npt_0"]
    "selector": [ ]
    "labeler": [ "vasp" ]
    "trainer": [ "train_5_model" ]
  • iter_params:
"iter_params":
  [
    [ "0" ],
    # If the last one is LOOP, repeat all the previous ones until convergence.
    ["10", "LOOP"], 
    ["30", "LOOP"],
    ["10", "10"]  
    ["20"],
  ]
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective. 10x Larger Models 10x Faster Trainin

Microsoft 8.4k Dec 30, 2022
Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

sklearn-porter Transpile trained scikit-learn estimators to C, Java, JavaScript and others. It's recommended for limited embedded systems and critical

Darius Morawiec 1.2k Jan 05, 2023
Arquivos do curso online sobre a estatística voltada para ciência de dados e aprendizado de máquina.

Estatistica para Ciência de Dados e Machine Learning Arquivos do curso online sobre a estatística voltada para ciência de dados e aprendizado de máqui

Renan Barbosa 1 Jan 10, 2022
Implementations of Machine Learning models, Regularizers, Optimizers and different Cost functions.

Linear Models Implementations of LinearRegression, LassoRegression and RidgeRegression with appropriate Regularizers and Optimizers. Linear Regression

Keivan Ipchi Hagh 1 Nov 22, 2021
Add built-in support for quaternions to numpy

Quaternions in numpy This Python module adds a quaternion dtype to NumPy. The code was originally based on code by Martin Ling (which he wrote with he

Mike Boyle 531 Dec 28, 2022
A collection of interactive machine-learning experiments: 🏋️models training + 🎨models demo

🤖 Interactive Machine Learning experiments: 🏋️models training + 🎨models demo

Oleksii Trekhleb 1.4k Jan 06, 2023
Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

Python Extreme Learning Machine (ELM) Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

Augusto Almeida 84 Nov 25, 2022
Python 3.6+ toolbox for submitting jobs to Slurm

Submit it! What is submitit? Submitit is a lightweight tool for submitting Python functions for computation within a Slurm cluster. It basically wraps

Facebook Incubator 768 Jan 03, 2023
Combines MLflow with a database (PostgreSQL) and a reverse proxy (NGINX) into a multi-container Docker application

Combines MLflow with a database (PostgreSQL) and a reverse proxy (NGINX) into a multi-container Docker application (with docker-compose).

Philip May 2 Dec 03, 2021
Simulation of early COVID-19 using SIR model and variants (SEIR ...).

COVID-19-simulation Simulation of early COVID-19 using SIR model and variants (SEIR ...). Made by the Laboratory of Sustainable Life Assessment (GYRO)

José Paulo Pereira das Dores Savioli 1 Nov 17, 2021
K-Means clusternig example with Python and Scikit-learn

Unsupervised-Machine-Learning Flat Clustering K-Means clusternig example with Python and Scikit-learn Flat clustering Clustering algorithms group a se

Emin 1 Dec 13, 2021
ML Kaggle Titanic Problem using LogisticRegrission

-ML-Kaggle-Titanic-Problem-using-LogisticRegrission here you will find the solution for the titanic problem on kaggle with comments and step by step c

Mahmoud Nasser Abdulhamed 3 Oct 23, 2022
This repo includes some graph-based CTR prediction models and other representative baselines.

Graph-based CTR prediction This is a repository designed for graph-based CTR prediction methods, it includes our graph-based CTR prediction methods: F

Big Data and Multi-modal Computing Group, CRIPAC 47 Dec 30, 2022
Send rockets to Mars with artificial intelligence(Genetic algorithm) in python.

Send Rockets To Mars With AI Send rockets to Mars with artificial intelligence(Genetic algorithm) in python. Tools Python 3 EasyDraw How to Play Insta

Mohammad Dori 3 Jul 15, 2022
ETNA – time series forecasting framework

ETNA Time Series Library Predict your time series the easiest way Homepage | Documentation | Tutorials | Contribution Guide | Release Notes ETNA is an

Tinkoff.AI 675 Jan 08, 2023
Mesh TensorFlow: Model Parallelism Made Easier

Mesh TensorFlow - Model Parallelism Made Easier Introduction Mesh TensorFlow (mtf) is a language for distributed deep learning, capable of specifying

1.3k Dec 26, 2022
Can a machine learning project be implemented to estimate the salaries of baseball players whose salary information and career statistics for 1986 are shared?

END TO END MACHINE LEARNING PROJECT ON HITTERS DATASET Can a machine learning project be implemented to estimate the salaries of baseball players whos

Pinar Oner 7 Dec 18, 2021
BASTA: The BAyesian STellar Algorithm

BASTA: BAyesian STellar Algorithm Current stable version: v1.0 Important note: BASTA is developed for Python 3.8, but Python 3.7 should work as well.

BASTA team 16 Nov 15, 2022
dirty_cat is a Python module for machine-learning on dirty categorical variables.

dirty_cat dirty_cat is a Python module for machine-learning on dirty categorical variables.

637 Dec 29, 2022
Steganography is the art of hiding the fact that communication is taking place, by hiding information in other information.

Steganography is the art of hiding the fact that communication is taking place, by hiding information in other information.

Priyansh Sharma 7 Nov 09, 2022