Fitting thermodynamic models with pycalphad

Overview

ESPEI

ESPEI, or Extensible Self-optimizing Phase Equilibria Infrastructure, is a tool for thermodynamic database development within the CALPHAD method. It uses pycalphad for calculating Gibbs free energies of thermodynamic models.

Read the documentation at espei.org.

Installation

Anaconda (recommended)

ESPEI does not require any special compiler, but several dependencies do. Therefore it is suggested to install ESPEI from conda-forge.

conda install -c conda-forge espei

What is ESPEI?

  1. ESPEI parameterizes CALPHAD models with enthalpy, entropy, and heat capacity data using the corrected Akaike Information Criterion (AICc). This parameter generation step augments the CALPHAD modeler by providing tools for data-driven model selection, rather than relying on a modeler's intuition alone.
  2. ESPEI optimizes CALPHAD model parameters to thermochemical and phase boundary data and quantifies the uncertainty of the model parameters using Markov Chain Monte Carlo (MCMC). This is similar to the PARROT module of Thermo-Calc, but goes beyond by adjusting all parameters simultaneously and evaluating parameter uncertainty.
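
The model selection idea in step 1 can be sketched with the AICc for least-squares fits. This is a minimal illustration, not ESPEI's implementation: the function name `aicc`, the Gaussian-error form of the likelihood, and the residual values are all assumptions made for the example.

```python
import math

def aicc(rss, n_samples, n_params):
    """Corrected Akaike Information Criterion for a least-squares fit.

    Assumes independent Gaussian errors, so -2 ln(L) reduces to
    n * ln(RSS / n) up to an additive constant.
    """
    if n_samples - n_params - 1 <= 0:
        raise ValueError("AICc needs n_samples > n_params + 1")
    aic = n_samples * math.log(rss / n_samples) + 2 * n_params
    # Small-sample correction that penalizes extra parameters more strongly
    correction = 2 * n_params * (n_params + 1) / (n_samples - n_params - 1)
    return aic + correction

# Hypothetical residual sums of squares for candidate models with
# 1, 2 and 3 parameters, each fit to the same 10 data points.
candidate_rss = {1: 4.0, 2: 1.0, 3: 0.95}
scores = {k: aicc(rss, n_samples=10, n_params=k) for k, rss in candidate_rss.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

The candidate with the lowest score is preferred. In this made-up data, the third parameter barely reduces the residual, so its penalty outweighs the gain.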

Details on the implementation of ESPEI can be found in the publication: B. Bocklund et al., MRS Communications 9(2) (2019) 1–10. doi:10.1557/mrc.2019.59.

What can ESPEI do?

ESPEI can be used to generate model parameters for CALPHAD models of the Gibbs energy that follow the temperature-dependent polynomial by Dinsdale (CALPHAD 15(4) 1991 317-425) within the compound energy formalism (CEF) for endmembers and Redlich-Kister-Muggianu excess mixing parameters in unary, binary and ternary systems.

All thermodynamic quantities are computed by pycalphad. The MCMC-based Bayesian parameter estimation can optimize data for any model that is supported by pycalphad, including models beyond the endmember Gibbs energies and Redlich-Kister-Muggianu excess terms, such as parameters in the ionic liquid model, magnetic, or two-state models. Performing Bayesian parameter estimation for arbitrary multicomponent thermodynamic data is supported.
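
To make the model form concrete, here is a minimal sketch of a temperature-dependent Gibbs energy polynomial and a binary Redlich-Kister excess term. The function names, the coefficient names `a` to `f`, and the numerical values are hypothetical; in practice pycalphad evaluates these expressions from a database, and this sketch covers only a common subset of the polynomial terms.

```python
import math

def gibbs_polynomial(T, a=0.0, b=0.0, c=0.0, d=0.0, e=0.0, f=0.0):
    """G(T) = a + b*T + c*T*ln(T) + d*T**2 + e*T**3 + f/T, with T in kelvin."""
    return a + b * T + c * T * math.log(T) + d * T**2 + e * T**3 + f / T

def redlich_kister_excess(x_a, interaction_params):
    """Binary excess Gibbs energy: x_A * x_B * sum_k L_k * (x_A - x_B)**k."""
    x_b = 1.0 - x_a
    series = sum(L_k * (x_a - x_b) ** k for k, L_k in enumerate(interaction_params))
    return x_a * x_b * series

# Hypothetical coefficients, chosen only to exercise the functions
g_endmember = gibbs_polynomial(1000.0, a=100.0, b=-2.0)
g_excess = redlich_kister_excess(0.5, [-10000.0, 5000.0])
print(g_endmember, g_excess)
```

At the equiatomic composition only the zeroth-order interaction parameter contributes, because every higher-order term carries a factor of (x_A - x_B).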

Goals

  1. Offer a free and open-source tool for users to develop multicomponent databases with quantified uncertainty
  2. Enable development of CALPHAD-type models for Gibbs energy, thermodynamic or kinetic properties
  3. Provide a platform to build and apply novel model selection, optimization, and uncertainty quantification methods

The implementation for ESPEI involves first performing parameter generation by calculating parameters in thermodynamic models that are linearly described by non-equilibrium thermochemical data. Then Markov Chain Monte Carlo (MCMC) is used to optimize the candidate models from the parameter generation to phase boundary data.
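
The MCMC step can be illustrated with a deliberately simplified sampler. The tracebacks further down show that ESPEI drives emcee's ensemble sampler; the sketch below swaps in a plain single-chain Metropolis algorithm with a flat prior and a one-parameter Gaussian likelihood. All names and data values are made up for illustration.

```python
import math
import random

def log_likelihood(theta, data, sigma):
    """Gaussian log-likelihood (up to a constant) for a single parameter."""
    return -0.5 * sum(((d - theta) / sigma) ** 2 for d in data)

def metropolis(data, sigma, theta0, n_steps, step_size, seed=0):
    """Single-chain Metropolis sampler with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    theta = theta0
    logp = log_likelihood(theta, data, sigma)
    chain = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step_size)
        logp_new = log_likelihood(proposal, data, sigma)
        # Accept with probability min(1, p_new / p_old); a flat prior is assumed
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            theta, logp = proposal, logp_new
        chain.append(theta)
    return chain

# Hypothetical observations of one quantity with known measurement error
data = [1.8, 2.1, 2.0, 2.2, 1.9]
chain = metropolis(data, sigma=0.2, theta0=0.0, n_steps=5000, step_size=0.1)
samples = chain[1000:]  # discard burn-in
posterior_mean = sum(samples) / len(samples)
posterior_std = math.sqrt(sum((s - posterior_mean) ** 2 for s in samples) / len(samples))
print(posterior_mean, posterior_std)
```

The spread of the retained samples is what provides the uncertainty estimate for the parameter, in addition to the optimized value.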

Cu-Mg phase diagram from a database created with and optimized by ESPEI. See the Cu-Mg Example.

History

The ESPEI package is based on a fork of pycalphad-fitting. The name and idea of ESPEI are originally based on Shang, Wang, and Liu, "ESPEI: Extensible, Self-optimizing Phase Equilibrium Infrastructure for Magnesium Alloys," Magnes. Technol. 2010, 617-622 (2010).

Implementation details for ESPEI have been described in the following publications:

Getting Help

For help on installing and using ESPEI, please join the PhasesResearchLab/ESPEI Gitter room.

Bugs and software issues should be reported on GitHub.

License

ESPEI is MIT licensed.

The MIT License (MIT)

Copyright (c) 2015-2018 Richard Otis
Copyright (c) 2017-2018 Brandon Bocklund
Copyright (c) 2018-2019 Materials Genome Foundation

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Citing ESPEI

If you use ESPEI for work presented in a publication, we ask that you cite the following publication:

  B. Bocklund, R. Otis, A. Egorov, A. Obaied, I. Roslyakova, Z.-K. Liu, ESPEI for efficient thermodynamic database development, modification, and uncertainty quantification: application to Cu–Mg, MRS Commun. (2019) 1–10. doi:10.1557/mrc.2019.59.
@article{Bocklund2019ESPEI,
         archivePrefix = {arXiv},
         arxivId = {1902.01269},
         author = {Bocklund, Brandon and Otis, Richard and Egorov, Aleksei and Obaied, Abdulmonem and Roslyakova, Irina and Liu, Zi-Kui},
         doi = {10.1557/mrc.2019.59},
         eprint = {1902.01269},
         issn = {2159-6859},
         journal = {MRS Communications},
         month = {jun},
         pages = {1--10},
         title = {{ESPEI for efficient thermodynamic database development, modification, and uncertainty quantification: application to Cu–Mg}},
         year = {2019}
}
Comments
  • Compute metastable/unstable single phase driving forces in ZPF error

    Thanks to Tobias Spitaler for suggesting this and to @richardotis for brainstorming this solution concept.

    This PR introduces two new functions in ZPF error, _solve_sitefracs_composition and _sample_solution_constitution. Their purpose is to facilitate computing metastable or unstable single phase driving forces when a phase has a miscibility gap. This should improve the convergence for any phase that has a stable or metastable miscibility gap.

    Rationale

    ESPEI currently computes the "single-phase hyperplane" at a vertex by performing an equilibrium calculation at a black point and then subtracting that from the target hyperplane energy at that composition. As illustrated in the figure Tobias constructed (below), this is problematic for phases with a miscibility gap because a "single-phase" equilibrium calculation in pycalphad will always compute the global minimum energy and give two composition sets.

    [Figure: driving-force-Spitaler]

    What ESPEI should do is what Tobias illustrates by the orange x and the green driving force line. This solution ensures that minimizing the driving force will force the Gibbs energy curve to match the energy of the black points on the multi-phase target hyperplane.

    Historically, we didn't implement this because one would like to use equilibrium to minimize the internal degrees of freedom, but pycalphad always computes the global minimum energy, so it was not possible to do via equilibrium. More recently, ESPEI introduced the idea of approximate_equilibrium, which uses starting_point to more quickly determine a minimum energy solution from a discrete point sampling grid. The approximate_equilibrium method we use still has the same problem as pycalphad's equilibrium because starting_point will still give the global minimum solution for the discrete sampling.

    Solution

    In an ideal world, pycalphad should be able to turn off global minimization (automatically introducing new composition sets) and enable a condition to be set for the composition of a phase, i.e. X(BCC,B). In practice, being able to turn off global minimization and provide a valid starting point for only one composition set that has a global composition condition would simulate a phase composition condition. Unfortunately, neither turning off global minimization nor phase composition conditions are currently implemented, so we need a workaround.

    The two functions introduced here consider each single phase composition at a tie-vertex and construct a point grid that only contains points which satisfy the prescribed overall composition (and the internal phase constraints). This can be used in either approximate or exact equilibrium modes to find the lowest energy starting point and then pass it to equilibrium with the constrained point grid, so the global minimization step has no new composition sets to introduce (i.e. it cannot detect a miscibility gap).

    For performance, we pre-compute the grid of points for every phase composition in the ZPF datasets and re-use them to compute the grid, starting point and equilibrium at every parameter iteration (note that this would be invalid if a parameter changes the number of moles, like varying coordination number in the MQMQA).

    To summarize the impact:

    1. This method will be entirely backwards compatible for phases without a miscibility gap.
    2. For cases where a miscibility gap is present in the parameters, but a single phase is prescribed, there will be a driving force to eliminate the miscibility gap, so the single phase compositions are more meaningful too. This is significant because you can prescribe single phase regions in ZPF datasets and it will enforce that no miscibility gap occurs, which is not true today.
    3. For phase compositions inside a miscibility gap, the Gibbs energy curve will match the multi-phase global minimum hyperplane at the phase compositions (at convergence).
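
The difference between the global-minimum result and the constrained single-phase energy can be sketched with a toy model. This is not the code from the PR: it assumes a symmetric binary regular solution (so the common tangent inside the gap is horizontal by symmetry), a made-up interaction parameter, and a sign convention chosen for the example.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)

def gibbs_regular(x, T, omega):
    """Molar Gibbs energy of mixing of a symmetric regular solution A-B."""
    ideal = R * T * (x * math.log(x) + (1.0 - x) * math.log(1.0 - x))
    return ideal + omega * x * (1.0 - x)

def single_phase_driving_force(x_target, T, omega, n_grid=2000):
    """Compare the constrained single-phase energy with the global-minimum hyperplane.

    Returns (x_binodal, g_single, g_hyperplane, g_single - g_hyperplane).
    """
    # Scan the left half of the composition axis; the curve is symmetric about x = 0.5
    left_half = [i / n_grid for i in range(1, n_grid // 2 + 1)]
    x_binodal = min(left_half, key=lambda x: gibbs_regular(x, T, omega))
    g_single = gibbs_regular(x_target, T, omega)
    if x_binodal < x_target < 1.0 - x_binodal:
        # Inside the gap the global minimum is the horizontal common tangent
        g_hyperplane = gibbs_regular(x_binodal, T, omega)
    else:
        # Outside the gap (or with no gap) the single phase is the global minimum
        g_hyperplane = g_single
    return x_binodal, g_single, g_hyperplane, g_single - g_hyperplane

# omega > 2*R*T opens a miscibility gap; omega = 0 is an ideal solution
x_b, g_single, g_hyper, gap = single_phase_driving_force(0.5, T=1000.0, omega=25000.0)
_, _, _, gap_ideal = single_phase_driving_force(0.5, T=1000.0, omega=0.0)
print(x_b, gap, gap_ideal)
```

A global-minimum calculation at x = 0.5 would report the hyperplane energy and two composition sets, hiding the positive gap that the constrained single-phase evaluation exposes.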
    opened by bocklund 20
  • ERROR occurred using the new development version

    Dear Administrator, some tests failed when I ran pytest after installing the new development version (2021/4/21, Beijing time). In addition, errors occurred when I ran some example cases that ran successfully with earlier versions. errorlog.txt pytestfail.txt condalist.txt

    opened by duxiaoxian 12
  • Error releasing un-acquired lock in dask

    I was using distributed (1.18.0) when this error occurred. Changed to distributed (1.16.3).

      File "/Applications/anaconda/envs/my_pycalphad/bin/espei", line 11, in <module>
        sys.exit(main())
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/espei/run_espei.py", line 135, in main
        mcmc_steps=args.mcmc_steps, save_interval=args.save_interval)
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/espei/paramselect.py", line 754, in fit
        for i, result in enumerate(sampler.sample(walkers, iterations=mcmc_steps)):
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/emcee/ensemble.py", line 259, in sample
        lnprob[S0])
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/emcee/ensemble.py", line 332, in _propose_stretch
        newlnprob, blob = self._get_lnprob(q)
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/emcee/ensemble.py", line 382, in _get_lnprob
        results = list(M(self.lnprobfn, [p[i] for i in range(len(p))]))
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/espei/utils.py", line 39, in map
        result = [x.result() for x in result]
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/espei/utils.py", line 39, in <listcomp>
        result = [x.result() for x in result]
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/distributed/client.py", line 155, in result
        six.reraise(*result)
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/six.py", line 685, in reraise
        raise value.with_traceback(tb)
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 59, in loads
        return pickle.loads(x)
    RuntimeError: cannot release un-acquired lock
    bug 
    opened by ghost 10
  • dask workers can sometimes die without warning

    I haven't been able to reproduce it consistently, but dask workers sometimes die with the dask scheduler.

    To debug this, I turned on debugging output by scheduler = LocalCluster(n_workers=cores, threads_per_worker=1, processes=True, silence_logs=verbosity[output_settings['verbosity']]).

    I am still waiting for that job to have workers die to see the output, but for now as iterations in emcee complete the results are processed in Python (it is known that this is happening because of the progress bar output). During this time, the LocalCluster debugging gives output

    distributed.core - WARNING - Event loop was unresponsive for 1.69s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
    

    Usually I get two similar messages in a row.

    As another possibility, the most recent time I was able to reproduce this was when I had two instances of ESPEI running at the same time. I wouldn't think that the different client instances would interact, but maybe it should be investigated.

    opened by bocklund 6
  • Issues reproducing Cu-Mg example

    I had several issues running the Cu-Mg example from the ESPEI website. I installed ESPEI using the conda command, and took the Cu-Mg data directory from the ESPEI-datasets repository.

    I first tried reproducing the diagram from the section titled First-principles phase diagram. The code ran successfully, but the returned phase diagram didn't match the example well: diagram_dft

    I then tried reproducing the results in the MCMC optimization section. I wasn't able to successfully perform the MCMC optimization. The code returned numerous errors over the course of several minutes and eventually hung with no further output.

    This file contains the full python output when I ran the optimization: espei_mcmc_error.txt

    Here is my python version and installed packages/versions: python_info.txt

    opened by npaulson 6
  • The latest version of espei (0.7.2) raises an error when plotting

    I recently started using the latest version of espei (0.7.2) and I always get an error, whereas espei 0.6 worked fine. [image]

    My current computer can no longer run espei 0.6, so I don't know which version to use or what went wrong. I always get MPI errors when I use espei 0.6. [image]

    AG_CU_1214.zip

    opened by duxiaoxian 5
  • Run ESPEI via input files, rather than command line arguments

    A first draft and feedback were written in this gist

    The current iteration is:

    Header area.
    Include any metadata above the `---`.
    ---
    # core run settings
    run_type: full # choose full | dft | mcmc
    phase_models: input.json
    datasets: input-datasets # path to datasets. Defaults to current directory.
    scheduler: dask # can be dask | MPIPool
    
    # control output
    verbosity: 0 # integer verbosity level 0 | 1 | 2, where 2 is most verbose.
    output_tdb: out.tdb
    tracefile: chain.npy # name of the file containing the mcmc chain array
    probfile: lnprob.npy # name of the file containing the mcmc ln probability array
    
    # the following only take effect for full or mcmc runs
    mcmc:
      mcmc_steps: 2000
      mcmc_save_interval: 100
    
      # the following take effect for only mcmc runs
      input_tdb: null # TDB file used to start the mcmc run
      restart_chain: null # restart the mcmc fitting from a previous calculation
    

    This issue will focus on the development of a first generation input file structure and spec, and also as a place to brainstorm options that should be user-facing.
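
As one way to think about the spec, the choices listed in the draft can be checked once the file has been parsed into a dictionary. The sketch below is hypothetical and not part of ESPEI: the function name `check_settings`, the treatment of `phase_models` as required, and every default except the datasets directory are assumptions.

```python
ALLOWED_RUN_TYPES = ("full", "dft", "mcmc")
ALLOWED_SCHEDULERS = ("dask", "MPIPool")

# Only the datasets default is stated in the draft; the others are assumptions
DEFAULTS = {"datasets": ".", "scheduler": "dask", "verbosity": 0}

def check_settings(settings):
    """Return a copy of settings with defaults applied, or raise ValueError."""
    merged = {**DEFAULTS, **settings}
    if merged.get("run_type") not in ALLOWED_RUN_TYPES:
        raise ValueError(f"run_type must be one of {ALLOWED_RUN_TYPES}")
    if "phase_models" not in merged:
        raise ValueError("phase_models is required (assumed here)")
    if merged["scheduler"] not in ALLOWED_SCHEDULERS:
        raise ValueError(f"scheduler must be one of {ALLOWED_SCHEDULERS}")
    if merged["verbosity"] not in (0, 1, 2):
        raise ValueError("verbosity must be 0, 1 or 2")
    return merged

# A parsed input file is represented here by a plain dictionary
settings = check_settings({"run_type": "full", "phase_models": "input.json"})
print(settings)
```

Validating the parsed dictionary separately from reading the file keeps the spec testable without touching the filesystem.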

    opened by bocklund 5
  • Limit the degrees of freedom for non-active phases in MCMC to prevent them from diverging?

    Phases that do not have phase equilibria data should have their parameters fixed before the MCMC run.

    A particular phase in an ESPEI run can have single phase DFT data and no phase equilibria. This means that the parameters that were calculated in the single phase fitting have no effect on the error function that is used in the MCMC run.

    When parameters have no effect on the error function, they diverge when used in emcee because the ensemble sampler scales them up to infinity in an attempt to force that parameter to affect the error function.

    bug enhancement 
    opened by bocklund 5
  • Error when running Cu-Mg example

    Hello, I am trying to run ESPEI for the first time.

    I created a conda env and installed ESPEI using conda. I downloaded the json and yaml files as well as the contents of the Cu-Mg folder in ESPEI-datasets, and renamed it to input-data. After running espei --input espei-in.yaml, I get the errors below. Could you please let me know if I am doing anything wrong?

    Thanks!

    Traceback (most recent call last):
      File "/Users/latmarat/miniforge3/envs/espenv/bin/espei", line 10, in <module>
        sys.exit(main())
      File "/Users/latmarat/miniforge3/envs/espenv/lib/python3.10/site-packages/espei/espei_script.py", line 307, in main
        run_espei(input_settings)
      File "/Users/latmarat/miniforge3/envs/espenv/lib/python3.10/site-packages/espei/espei_script.py", line 177, in run_espei
        dbf = generate_parameters(phase_models, datasets, refdata, excess_model,
      File "/Users/latmarat/miniforge3/envs/espenv/lib/python3.10/site-packages/espei/paramselect.py", line 517, in generate_parameters
        aliases = extract_aliases(phase_models)
      File "/Users/latmarat/miniforge3/envs/espenv/lib/python3.10/site-packages/espei/utils.py", line 370, in extract_aliases
        aliases = {phase_name: phase_name for phase_name in phase_models["phases"].keys()}
    AttributeError: 'list' object has no attribute 'keys'
    
    opened by latmarat 4
  • AttributeError: 'NoneType' object has no attribute 'values'

    Dear Administrator, an 'AttributeError' occurred when I ran 'espei --input espei-in-2.yaml' using the latest development version of ESPEI. Would you mind helping me check my dataset? Thanks. errorprint-log.txt verbosity-log.txt CO-CU-20201104.zip

    f:\users\zhang\pycalphad\pycalphad\codegen\callables.py:97: UserWarning: State variables in build_callables are not {N, P, T}, but {T, P}. This can lead to incorrectly calculated values if the state variables used to call the generated functions do not match the state variables used to create them. State variables can be added with the additional_statevars argument.
      "additional_statevars argument.".format(state_variables))
    Traceback (most recent call last):
      File "F:\Users\zhang\Anaconda32020\envs\espei2020test\Scripts\espei-script.py", line 33, in
        sys.exit(load_entry_point('espei', 'console_scripts', 'espei')())
      File "f:\users\zhang\espei\espei\espei_script.py", line 311, in main
        run_espei(input_settings)
      File "f:\users\zhang\espei\espei\espei_script.py", line 260, in run_espei
        approximate_equilibrium=approximate_equilibrium,
      File "f:\users\zhang\espei\espei\optimizers\opt_base.py", line 36, in fit
        node = self.fit(symbols, datasets, *args, **kwargs)
      File "f:\users\zhang\espei\espei\optimizers\opt_mcmc.py", line 238, in fit
        self.predict(initial_guess, **ctx)
      File "f:\users\zhang\espei\espei\optimizers\opt_mcmc.py", line 289, in predict
        multi_phase_error = calculate_zpf_error(parameters=np.array(params), **zpf_kwargs)
      File "f:\users\zhang\espei\espei\error_functions\zpf_error.py", line 315, in calculate_zpf_error
        target_hyperplane = estimate_hyperplane(phase_region, parameters, approximate_equilibrium=approximate_equilibrium)
      File "f:\users\zhang\espei\espei\error_functions\zpf_error.py", line 186, in estimate_hyperplane
        grid = calculate(dbf, species, phases, str_statevar_dict, models, phase_records, pdens=500, fake_points=True)
      File "f:\users\zhang\espei\espei\shadow_functions.py", line 55, in calculate
        largest_energy=float(1e10), fake_points=fp)
      File "f:\users\zhang\pycalphad\pycalphad\core\calculate.py", line 190, in _compute_phase_values
        param_symbols, parameter_array = extract_parameters(parameters)
      File "f:\users\zhang\pycalphad\pycalphad\core\utils.py", line 361, in extract_parameters
        parameter_array_lengths = set(np.atleast_1d(val).size for val in parameters.values())
    AttributeError: 'NoneType' object has no attribute 'values'

    opened by duxiaoxian 4
  • Migrate pycalphad refdata to ESPEI

    Tracking from https://github.com/pycalphad/pycalphad/issues/120

    Assume that SGTE91Stable is correct per https://github.com/pycalphad/pycalphad/issues/120. Then we must

    • [x] Remove the metastable phases not present in the SGTE91 original paper
    • [ ] Check that remaining phases have correct descriptions
    opened by bocklund 4
  • MCMC Initialized chains should include initial point

    During the initialization of the chains for the MCMC optimizer, a Gaussian distribution about an initial point is taken. https://github.com/PhasesResearchLab/ESPEI/blob/7c797191d4c3178fe4a22275bbaee9c2977786ad/espei/optimizers/opt_mcmc.py#L98

    I would suggest including the initial point in that set of initial chains. If everything is set up correctly, this won't matter, but for cases where the standard deviation is too high while the initial guess is quite good, the current behavior will lead to a lot of bad starting points. Modifying the initial set to include the initial guess point should ensure that at least this state (or acceptable permutations of it) will survive the MCMC run. What do you think?
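
The suggestion can be sketched as follows. This is not ESPEI's initialization code: the function name `initialize_walkers`, the `std_fraction` parameter, and the choice to scale the standard deviation by each parameter's magnitude are assumptions for illustration.

```python
import random

def initialize_walkers(initial_point, n_walkers, std_fraction=0.1, seed=None):
    """Build starting positions for an ensemble of MCMC walkers.

    The first walker is the unperturbed initial guess; the rest are drawn
    from a Gaussian centered on it, one standard deviation per parameter.
    """
    rng = random.Random(seed)
    walkers = [list(initial_point)]
    for _ in range(n_walkers - 1):
        walkers.append([rng.gauss(p, abs(p) * std_fraction) for p in initial_point])
    return walkers

# Hypothetical two-parameter initial guess
initial_guess = [-12000.0, 3.5]
walkers = initialize_walkers(initial_guess, n_walkers=8, std_fraction=0.5, seed=1)
print(walkers[0])
```

Even with a large `std_fraction`, at least one walker then starts exactly at the initial guess.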

    opened by toastedcrumpets 0
  • formatted_parameter broken by SymEngine

    Switching the symbolic backend to SymEngine broke espei.utils.formatted_parameter. Here's a test to validate (run from the tests directory for the testing_data module to be importable).

    # espei/tests/test_utils.py
    
    from pycalphad import Database
    from espei.utils import formatted_parameter, database_symbols_to_fit
    from .testing_data import CU_MG_TDB
    def test_cu_mg_parameters_can_be_formatted_to_strings():
        """Formatting parameters should work for common variables parameters"""
        dbf = Database(CU_MG_TDB)
        for sym in database_symbols_to_fit(dbf):
            assert isinstance(formatted_parameter(dbf, sym), str), f"Formatted parameter for symbol {sym} (value = {dbf.symbols[sym]}) in database not a string"
    

    Running this gives an error:

    Traceback (most recent call last):
      File "/Users/bocklund1/src/calphad/espei/tests/dummy.py", line 11, in <module>
        test_cu_mg_parameters_can_be_formatted_to_strings()
      File "/Users/bocklund1/src/calphad/espei/tests/dummy.py", line 9, in test_cu_mg_parameters_can_be_formatted_to_strings
        assert isinstance(formatted_parameter(dbf, sym), str), f"Formatted parameter for symbol {sym} (value = {dbf.symbols[sym]}) in database not a string"
      File "/Users/bocklund1/src/calphad/espei/espei/utils.py", line 295, in formatted_parameter
        term = parameter_term(result['parameter'], symbol)
      File "/Users/bocklund1/src/calphad/espei/espei/utils.py", line 218, in parameter_term
        coeff, root = term_coeff.as_coeff_mul(symbol)
    AttributeError: 'symengine.lib.symengine_wrapper.Symbol' object has no attribute 'as_coeff_mul'
    

    I think the breakage might be because espei.utils.parameter_term isn't correctly picking up the first condition, since for the case of symbol being a symengine.lib.symengine_wrapper.Symbol, I think expression == symbol should evaluate to true, but evidently (via the traceback) it is evaluating to false.

    opened by bocklund 0
  • Memory leak when running MCMC in parallel

    Due to a known memory leak when instantiating subclasses of Symbol objects from SymEngine, one of our upstream dependencies (see https://github.com/symengine/symengine.py/issues/379), running ESPEI with parallelization will cause memory to grow in each worker.

    Only running in parallel will trigger significant memory growth, because running in parallel uses the pickle library to serialize and deserialize symbol objects and create new objects that can't be freed. When running without parallelization (mcmc.scheduler: null), new symbols are not created.

    Until https://github.com/symengine/symengine.py/issues/379 is fixed, some mitigation strategies to avoid running out of memory are:

    • Run ESPEI without parallelization by setting scheduler: null
    • (Under consideration to implement): when parallelization is active, use an option to restart the workers every N iterations.
    • (Under consideration to implement): remove Model objects from the keyword arguments of ESPEI's likelihood functions. Model objects contribute a lot of symbol instances in the form of v.SiteFraction objects. We should be able to get away with only using PhaseRecord objects, but there are a few places that use Model.constituents to infer the sublattice model and internal degrees of freedom, and these would need to be rewritten.
    opened by bocklund 1
  • Unable to use activity data in binary Fe-C with Graphite as reference state

    Hi,

    We are currently trying to use activity data for Fe-C. Lobo1976 measured the activity of C in alpha-iron relative to Graphite as the standard state, but we get erroneous results. (Lobo, Joseph A., and Gordon H. Geiger. "Thermodynamics and solubility of carbon in ferrite and ferritic Fe-Mo alloys." Metallurgical Transactions A 7.8 (1976): 1347-1357.)

    I have added the input file below. With this input file, we get chemical potential difference: [nan] (verbosity 3 output). Is the input file correct or are we missing something? I have had a look at the value of ref_result within activity_error.py and this gives only nan results for the specified reference state. Graphite only has C as a component. An equilibrium calculation of Graphite specifying v.X('C') gives the error "Number of dependent components different from one". Can this cause an error here as well? Used versions: espei 0.8.6 and pycalphad 0.9.2. I have added a zip file with the TDB file and espei input files which reproduces this behaviour.

    Thank you for your help, Tobias

    {
            "components": ["FE", "C", "VA"],
            "phases": ["BCC_A2", "GRAPHITE"],
            "weight": 1000,
            "reference_state": {
                    "phases": ["GRAPHITE"],
                    "conditions": {
                            "P": 101325,
                            "T": 1056.15,
                            "X_C": 1
    
                    }
            },
            "conditions": {
                    "P": 101325,
                    "T": 1056.15,
                    "X_C": [0.00013017]
            },
            "output": "ACR_C",
            "values": [[[0.087]]
                    ],
            "reference": "Lobo1976_1056K",
            "meta_data": {
                    "DOI": "10.1007/BF02658820",
                    "literature reference": "Thermodynamics and Solubility of Carbon in Ferrite and Ferritic Fe-Mo Alloys",
                    "table/figure": "table 1",
                    "measured data": "C-activity in Alpha-Iron",
                    "experimental details": "not available",
                    "weight": "default"
            }
    }
    

    minimal_example.zip
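
For context, the relation between an activity and the chemical potential difference to the reference state can be written out directly. This is a generic thermodynamic sketch, not ESPEI's activity_error code; the function names are made up, and only the temperature and activity value come from the dataset above.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)

def activity(mu, mu_ref, T):
    """Activity from chemical potentials: a = exp((mu - mu_ref) / (R * T))."""
    return math.exp((mu - mu_ref) / (R * T))

def chemical_potential_difference(acr, T):
    """Inverse relation: mu - mu_ref = R * T * ln(a)."""
    return R * T * math.log(acr)

T = 1056.15
delta_mu = chemical_potential_difference(0.087, T)  # measured ACR_C from the dataset
round_trip = activity(delta_mu, 0.0, T)
# A nan reference chemical potential propagates straight into the activity
broken = activity(0.0, float("nan"), T)
print(delta_mu, round_trip, broken)
```

This is consistent with the symptom described: if the reference-state calculation for GRAPHITE returns nan, every quantity derived from it is nan as well.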

    opened by tobiasspt 1
  • ENH: Allow multiple datasets directories to be specified in YAML input

    Sometimes it is useful to load datasets from different filesystem locations, for example if one folder contains hand-curated data and another contains automatically generated data.

    In code, it would be pretty simple to handle this. Instead of

    from espei.datasets import load_datasets, recursive_glob
    directory = '/path/to/directory/'
    load_datasets(recursive_glob(directory))
    

    we could do

    from itertools import chain
    from espei.datasets import load_datasets, recursive_glob
    directories = ['/path/to/directory_1/', '/path/to/directory_2/']
    load_datasets(chain(*map(recursive_glob, directories)))
    
    opened by bocklund 1
Releases(0.8.9)
Owner
Phases Research Lab
Research group led by Dr. Zi-Kui Liu at The Pennsylvania State University.