Maximum Covariance Analysis in Python

Last update: Jan 03, 2023

Overview

xMCA | Maximum Covariance Analysis in Python

The aim of this package is to provide a flexible tool for the climate science community to perform Maximum Covariance Analysis (MCA) in a simple and consistent way. Given the huge popularity of xarray in the climate science community, xmca supports xarray.DataArray as well as numpy.ndarray as input formats.

Figure: Mode 2 of complex rotated Maximum Covariance Analysis showing the shared dynamics of SST and continental precipitation associated with ENSO between 1980 and 2020.

🔰 What is MCA?

MCA maximises the temporal covariance between two different data fields and is closely related to Principal Component Analysis (PCA) / Empirical Orthogonal Function analysis (EOF analysis). While EOF analysis maximises the variance within a single data field, MCA extracts the dominant co-varying patterns between two different data fields. When the two input fields are the same, MCA reduces to standard EOF analysis.

For the mathematical background, see e.g. Bretherton et al. or the lecture material written by C. Bretherton.
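As a rough illustration of the idea (a minimal NumPy sketch, not the implementation used by xmca), MCA can be written as a singular value decomposition of the cross-covariance matrix between two centred fields. The function name `mca_svd` and the random test data are assumptions made for this example. Passing the same field twice reproduces the eigenvalues of its covariance matrix, which is standard EOF analysis.

```python
import numpy as np


def mca_svd(left, right):
    """Sketch of MCA: SVD of the cross-covariance matrix of two fields.

    Both inputs are 2D arrays with shape (time, space). This helper is
    hypothetical and only illustrates the principle.
    """
    # remove the temporal mean at every grid point
    left = left - left.mean(axis=0)
    right = right - right.mean(axis=0)
    n_time = left.shape[0]
    # cross-covariance matrix (space_left x space_right)
    cov = left.T @ right / (n_time - 1)
    u, s, vt = np.linalg.svd(cov, full_matrices=False)
    # singular values, left patterns, right patterns
    return s, u, vt.T


rng = np.random.default_rng(0)
field = rng.standard_normal((50, 6))  # 50 time steps, 6 grid points

# identical input fields: MCA reduces to EOF analysis
svals, _, _ = mca_svd(field, field)
eigvals = np.linalg.eigvalsh(np.cov(field, rowvar=False))[::-1]
print(np.allclose(svals, eigvals))
```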

⭐ New in release 1.4.x

- Much faster and more memory-efficient algorithm
- Significance testing of individual modes via
  - Rule N (Overland & Preisendorfer 1982)
  - Bootstrapping/permutation schemes + block-wise approach for autocorrelated data
  - Iterative permutation (Winkler et al. 2020)
- Period parameter of solve method provides more flexibility to exponential extension, making complex MCA more stable
- Fixed missing coslat weighting when saving a model (Issue 25)

📌 Core Features

	Standard	Rotated	Complex	Complex Rotated
EOF analysis	✔️	✔️	✔️	✔️
MCA	✔️	✔️	✔️	✔️

* click on check marks for reference
** Complex rotated MCA is also available as a pre-print on arXiv.

🔧 Installation

Installation is simply done via

pip install xmca

If you have problems during the installation, please consult the documentation or raise an issue on GitHub.

📰 Documentation

A tutorial to get you started as well as the full API can be found in the documentation.

⚡ Quickstart

Import the package

    from xmca.array import MCA  # use with np.ndarray
    from xmca.xarray import xMCA  # use with xr.DataArray

As an example, we take North American surface temperatures shipped with xarray. Note: only works with xr.DataArray, not xr.Dataset.

    import xarray as xr  # only needed to obtain test data

    # split data arbitrarily into west and east coast
    data = xr.tutorial.open_dataset('air_temperature').air
    west = data.sel(lon=slice(200, 260))
    east = data.sel(lon=slice(260, 360))

PCA / EOF analysis

Construct a model with only one field and solve it to perform standard PCA / EOF analysis.

    pca = xMCA(west)                        # PCA of west coast
    pca.solve(complexify=False)             # True for complex PCA

    svals  = pca.singular_values()          # singular values = eigenvalues for PCA
    expvar = pca.explained_variance()       # explained variance
    pcs    = pca.pcs()                      # principal component scores (PCs)
    eofs   = pca.eofs()                     # spatial patterns (EOFs)

A Varimax/Promax-rotated solution is obtained by rotating the model, choosing the number of EOFs to rotate (n_rot) and the Promax parameter (power). Here, power=1 yields a Varimax-rotated solution.

    pca.rotate(n_rot=10, power=1)

    expvar_rot  = pca.explained_variance()  # explained variance
    pcs_rot     = pca.pcs()                 # Principal component scores (PCs)
    eofs_rot    = pca.eofs()                # spatial patterns (EOFs)
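To make the rotation step more concrete, here is a minimal NumPy sketch of a plain Varimax rotation using the common SVD-based iteration. It is an illustration only and not xmca's rotate implementation; the function `varimax`, its parameters and the random loadings are assumptions made for this example, and Promax is not covered.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal Varimax rotation of a (space x modes) loading matrix.

    Hypothetical helper: returns the rotated loadings and the rotation matrix.
    """
    n_rows, n_cols = loadings.shape
    rotation = np.eye(n_cols)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # matrix whose SVD yields the next orthogonal rotation
        column_power = np.diag((rotated ** 2).sum(axis=0))
        target = loadings.T @ (rotated ** 3 - rotated @ column_power / n_rows)
        u, s, vt = np.linalg.svd(target)
        rotation = u @ vt
        new_criterion = s.sum()
        # stop once the criterion no longer increases noticeably
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


rng = np.random.default_rng(0)
loadings = rng.standard_normal((20, 3))  # 20 grid points, 3 modes
rotated, rot = varimax(loadings)

# an orthogonal rotation leaves the communality of every grid point unchanged
print(np.allclose((rotated ** 2).sum(axis=1), (loadings ** 2).sum(axis=1)))
```

The rotation only redistributes variance among the rotated modes, which is why the explained variance has to be queried again after rotating.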

MCA

Same as for PCA / EOF analysis, but with two input fields instead of one.

    mca = xMCA(west, east)                  # MCA of field A and B
    mca.solve(complexify=False)             # True for complex MCA

    svals = mca.singular_values()           # singular values
    pcs = mca.pcs()                         # expansion coefficients (PCs)
    eofs = mca.eofs()                       # spatial patterns (EOFs)

Significance analysis

A simple way to estimate the significance of the obtained modes is to run Monte Carlo simulations based on uncorrelated Gaussian white noise, known as Rule N (Overland and Preisendorfer 1982). Here we create 200 such synthetic data sets and compare the synthetic singular spectrum with the real one to assess significance.

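    import numpy as np
    import matplotlib.pyplot as plt

    svals = mca.singular_values()           # singular values of the MCA model
    surr = mca.rule_n(200)                  # 200 synthetic runs
    median = surr.median('run')
    q99 = surr.quantile(.99, dim='run')
    q01 = surr.quantile(.01, dim='run')

    cutoff = np.sum((svals - q99 > 0)).values  # first 8 modes significant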

    fig = plt.figure(figsize=(10, 4))
    ax = fig.add_subplot(111)
    svals.plot(ax=ax, yscale='log', label='true')
    median.plot(ax=ax, yscale='log', color='.5', label='rule N')
    q99.plot(ax=ax, yscale='log', color='.5', ls=':')
    q01.plot(ax=ax, yscale='log', color='.5', ls=':')
    ax.axvline(cutoff + 0.5, ls=':')
    ax.set_xlim(-2, 200)
    ax.set_ylim(1e-1, 2.5e4)
    ax.set_title('Significance based on Rule N')
    ax.legend()

The first 8 modes are significant according to Rule N using 200 synthetic runs.
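The logic behind Rule N can be sketched in plain NumPy. This is a simplified illustration under stated assumptions, not the code behind mca.rule_n: the helper names, the standardization of each field and the toy data with one shared signal are all choices made for this example.

```python
import numpy as np


def standardize(field):
    """Centre each column and scale it to unit variance."""
    return (field - field.mean(axis=0)) / field.std(axis=0, ddof=1)


def cross_singular_values(left, right):
    """Singular values of the cross-covariance matrix of two standardized fields."""
    left, right = standardize(left), standardize(right)
    cov = left.T @ right / (left.shape[0] - 1)
    return np.linalg.svd(cov, compute_uv=False)


def rule_n_sketch(shape_left, shape_right, n_runs=200, seed=0):
    """Singular spectra of n_runs pairs of Gaussian white-noise fields."""
    rng = np.random.default_rng(seed)
    return np.stack([
        cross_singular_values(rng.standard_normal(shape_left),
                              rng.standard_normal(shape_right))
        for _ in range(n_runs)
    ])


# toy data: two noisy fields that share one common time series
rng = np.random.default_rng(1)
signal = rng.standard_normal((200, 1))
left = 3 * signal @ rng.standard_normal((1, 8)) + rng.standard_normal((200, 8))
right = 3 * signal @ rng.standard_normal((1, 5)) + rng.standard_normal((200, 5))

svals = cross_singular_values(left, right)
surr = rule_n_sketch(left.shape, right.shape)   # shape (run, mode)
q99 = np.quantile(surr, 0.99, axis=0)           # 99th percentile per mode
n_significant = int(np.sum(svals > q99))        # modes above the noise level
print(n_significant)
```

A mode counts as significant here when its singular value exceeds the 99th percentile of the corresponding white-noise singular values, which mirrors the comparison against q99 in the plot.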

Saving/loading an analysis

    # save the data together with an info file; the files are stored
    # in a dedicated directory
    mca.save_analysis('my_analysis')

    # create a new, empty instance and load the analysis by specifying
    # the path to the info file created earlier
    mca2 = xMCA()
    mca2.load_analysis('my_analysis/info.xmca')

Quickly inspect your results visually

The package provides a method to plot individual modes.

    mca2.set_field_names('West', 'East')
    pkwargs = {'orientation' : 'vertical'}
    mca2.plot(mode=1, **pkwargs)

Result of default plot method after performing MCA on T2m of North American west and east coast showing mode 1.

You may want to modify the plot to improve its appearance:

    from cartopy.crs import EqualEarth  # for different map projections

    # map projections for "left" and "right" field
    projections = {
        'left': EqualEarth(),
        'right': EqualEarth()
    }

    pkwargs = {
        "figsize"     : (8, 5),
        "orientation" : 'vertical',
        'cmap_eof'    : 'BrBG',  # colormap amplitude
        "projection"  : projections,
    }
    mca2.plot(mode=3, **pkwargs)

You can save the plot to your local disk as a .png file via

    skwargs={'dpi':200}
    mca2.save_plot(mode=3, plot_kwargs=pkwargs, save_kwargs=skwargs)

🔖 Please cite

I am just starting my career as a scientist. Feedback on my scientific work is therefore important to me in order to assess which of my work advances the scientific community. As such, if you use the package for your own research and find it helpful, I would appreciate feedback here on GitHub, via email, or as a citation:

Niclas Rieger, 2021: nicrie/xmca: version x.y.z. doi:10.5281/zenodo.4749830.

💪 Credits

Kudos to the developers and contributors of the following GitHub projects, which I initially used myself and which served as inspiration:

And of course credits to the developers of the extremely useful packages

Comments

SVD did not converge

Hi Niclas,

xMCA worked fine when I used it directly on my raw data and produced the results I was expecting. However, when I tried to use processed data (e.g. anomalies or detrended data), it gave the following error: SVD did not converge

Does xMCA only accept raw data, or is something wrong with my input? This is how my data looks:

Even the data that worked previously was in a similar format. What could be the issue?

opened by GIRIJA-KALYANI 3

cartopy dependency is too restrictive

The very restrictive cartopy dependency makes it tricky to install into an existing conda environment

cartopy==0.18.0

I can see it was changed at https://github.com/coecms/xmca/commit/896e0b5977c4f4a36ed01363141f3ab7dd24c6d5

When I changed it back to >=18.0 it installed fine using pip install --user and imported fine with cartopy-0.19.0.post1.

I ran the tests like so

python -m unittest discover -v -s tests/

but three of the tests didn't pass

test_save_load_cplx (integration.test_integration_xarray.TestIntegration) ... ERROR    
test_save_load_rot (integration.test_integration_xarray.TestIntegration) ... ERROR                                       
test_save_load_std (integration.test_integration_xarray.TestIntegration) ... ERROR

Some other error messages:

Error: Rotation process did not converge!

======================================================================                                                   
ERROR: test_save_load_cplx (integration.test_integration_xarray.TestIntegration)                                         
----------------------------------------------------------------------                                                   
Traceback (most recent call last):
  File "/home/502/aph502/.local/lib/python3.9/site-packages/parameterized/parameterized.py", line 533, in standalone_func
    return func(*(a + p.args), **p.kwargs)
  File "/home/502/aph502/code/python/xmca/tests/integration/test_integration_xarray.py", line 148, in test_save_load     
    rmtree(join(getcwd(), 'tests/integration/temp/'))
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 731, in rmtree           
    onerror(os.rmdir, path, sys.exc_info())
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 729, in rmtree           
    os.rmdir(path)
OSError: [Errno 39] Directory not empty: '/home/502/aph502/code/python/xmca/tests/integration/temp/'                     

======================================================================                                                   
ERROR: test_save_load_rot (integration.test_integration_xarray.TestIntegration)                                          
----------------------------------------------------------------------                                                   
Traceback (most recent call last):
  File "/home/502/aph502/.local/lib/python3.9/site-packages/parameterized/parameterized.py", line 533, in standalone_func
    return func(*(a + p.args), **p.kwargs)
  File "/home/502/aph502/code/python/xmca/tests/integration/test_integration_xarray.py", line 148, in test_save_load     
    rmtree(join(getcwd(), 'tests/integration/temp/'))
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 731, in rmtree           
    onerror(os.rmdir, path, sys.exc_info())
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 729, in rmtree           
    os.rmdir(path)
OSError: [Errno 39] Directory not empty: '/home/502/aph502/code/python/xmca/tests/integration/temp/'                     

======================================================================                                                   
ERROR: test_save_load_std (integration.test_integration_xarray.TestIntegration)                                          
----------------------------------------------------------------------                                                   
Traceback (most recent call last):
  File "/home/502/aph502/.local/lib/python3.9/site-packages/parameterized/parameterized.py", line 533, in standalone_func
    return func(*(a + p.args), **p.kwargs)
  File "/home/502/aph502/code/python/xmca/tests/integration/test_integration_xarray.py", line 148, in test_save_load     
    rmtree(join(getcwd(), 'tests/integration/temp/'))
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 731, in rmtree           
    onerror(os.rmdir, path, sys.exc_info())
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 729, in rmtree           
    os.rmdir(path)
OSError: [Errno 39] Directory not empty: '/home/502/aph502/code/python/xmca/tests/integration/temp/'                     

----------------------------------------------------------------------

but it wasn't obvious that this was an issue with the version of cartopy.

opened by aidanheerdegen 3

installation error

Hello Niclas,

I installed xmca but when I import the library I get this error: libmkl_rt.so: cannot open shared object file: No such file or directory

any ideas or suggestions would be greatly appreciated.

Cheers,

Charles.

opened by chjones2 3

May cause errors after using mca.normalize()

Hello, first of all, thank you for your contribution. However, I found a bug that may cause the SVD to fail (due to the presence of NaN values). The details are as follows:

When the MCA class is initialized, it calls get_nan_cols and remove_nan_cols to remove NaN values. However, if you call mca.normalize() afterwards, new NaN values can appear and enter the SVD calculation. This happens when a grid point has the same value at every time step in the input array (for example, a place where it never rains): its standard deviation is zero, so standardization yields NaN and the SVD cannot be solved.
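The effect described above can be reproduced with a few lines of NumPy. This is a sketch of the mechanism only and does not use xmca; the toy array and the masking workaround are assumptions made for illustration.

```python
import numpy as np

# 3 time steps, 2 grid points; the second grid point is constant in time
# (for example, a place where it never rains)
data = np.array([[1.0, 0.0],
                 [2.0, 0.0],
                 [3.0, 0.0]])

# standardizing divides by a standard deviation of zero for the second
# column, which gives 0 / 0 = NaN
with np.errstate(invalid='ignore', divide='ignore'):
    standardized = (data - data.mean(axis=0)) / data.std(axis=0)
print(np.isnan(standardized).any(axis=0))

# possible workaround: drop constant grid points before standardizing
valid = data.std(axis=0) > 0
subset = data[:, valid]
cleaned = (subset - subset.mean(axis=0)) / subset.std(axis=0)
print(np.isnan(cleaned).any())
```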

opened by L1angY 2

Feedback
I'm posting my feedback on behalf of Alex vM, who asked me to have a look at this package. The feedback is subjective and you may disagree with some if not all suggestions below.

In general, the readme is well written. Clear and concise. I'd add a figure(s) though. For instance, after you call mca.plot() in your example.

Currently, this package is for developers or at least those who have python experience. Well, maybe covariance estimation requires knowledge of python and programming skills, but I recommend making sphinx documentation (and publishing it on readthedocs) to bring users. You've already written docs for each if not all functions and classes. The only step left is to wrap it in sphinx with default parameters and paths.

To give credits to your package and show that it's well maintained, I recommend adding badges (readthedocs, travis build and test coverage). Use CircleCI. Here is a config example (you need only build-pip job since it's the easiest).

tools folder must be in the xmca folder.

setup.py install requires must be read from requirements.txt file (example) and not hard-coded.

GPL-3 license is too restrictive. Consider BSD-3 or even MIT.

Each test that involves randomness must start with a numpy.random.seed. Currently, you're setting the seed globally. It's not a good idea because the test results depend on the test order, which, of course, should not happen.
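One way to follow the seeding advice is to create a seeded random generator in setUp, so that every test gets its own reproducible stream regardless of test order. The sketch below uses numpy.random.default_rng as a local alternative to calling numpy.random.seed; the test class and test names are hypothetical and not taken from the xmca test suite.

```python
import unittest

import numpy as np


class TestWithLocalSeed(unittest.TestCase):
    def setUp(self):
        # a fresh, seeded generator per test: no shared global state
        self.rng = np.random.default_rng(seed=0)

    def test_first(self):
        sample = self.rng.standard_normal(5)
        self.assertEqual(sample.shape, (5,))

    def test_second(self):
        # same numbers whether or not test_first ran before this test
        expected = np.random.default_rng(seed=0).standard_normal(5)
        np.testing.assert_array_equal(self.rng.standard_normal(5), expected)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestWithLocalSeed)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```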

Good luck!

Best, Danylo
bug documentation
opened by dizcza 1

save_analysis currently does not save cos lat weighting

Just stumbled over this:

When saving a model via xmca.xarray.save_analysis after cosine latitude weighting was applied (apply_coslat), the current implementation does not invoke xmca.xarray.apply_coslat when the model is loaded via xmca.xarray.load_analysis, thus producing incorrect PCs.

I hope to provide a fix to this soon.
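Why a missing weighting step leads to wrong PCs can be illustrated with a small NumPy sketch. It assumes the common convention of weighting each grid point by the square root of the cosine of its latitude; whether xmca uses exactly this factor is an assumption here, and the toy field and variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
lat = np.array([0.0, 30.0, 60.0, 80.0])       # latitudes in degrees
field = rng.standard_normal((100, lat.size))  # time x latitude
field -= field.mean(axis=0)                   # centre in time

# assumed weighting: square root of cos(latitude)
weights = np.sqrt(np.cos(np.deg2rad(lat)))
weighted = field * weights

# EOFs are computed from the weighted field
_, _, vt = np.linalg.svd(weighted, full_matrices=False)
eofs = vt.T

pcs_correct = weighted @ eofs  # projection of the weighted field
pcs_wrong = field @ eofs       # weighting forgotten, e.g. after loading

print(np.allclose(pcs_correct, pcs_wrong))
```

Only the PCs obtained from the weighted field are mutually uncorrelated; projecting the unweighted field onto the same EOFs gives different time series.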
bug

opened by nicrie 0

Release 1.0.0
New in release 1.0.0

the predict method projects new, unseen data to obtain the corresponding PCs (works for standard, rotated and complex solutions)

more efficient storing/loading of files. Unfortunately, this and the point above made it necessary to change the code considerably. As a consequence, loading models that were computed and saved with an older package version (0.x.y) is not supported.

add method to summarize performed analysis (summary)

add method to return input fields

improve docs (resolves #7)

correct and consistent use of definition of loadings

some bugfixes (e.g. resolves #12 )
opened by nicrie 0

MCA errors

Hello, I am trying to run MCA with two variables taken from the output of a climate model (WRF).

I get this error right after the bit:

mca.plot(mode=1, **pkwargs) :

ValueError: coordinate lon has dimensions ('south_north', 'west_east'), but these are not a subset of the DataArray dimensions ['lat', 'lon', 'mode']

Would really appreciate any help with this error. Many thanks.

# Load packages and data:
import xarray as xr
import matplotlib.pyplot as plt
import cartopy

var=xr.open_dataset("F:\\era5_2000_2020_vars_salem.nc")

t2=var.T2C
snow=var.SNOWH

#The variables, e.g., t2 is  structured as follows: 
t2:
<xarray.DataArray 'T2C' (time: 3512, south_north: 111, west_east: 114)>
Coordinates:
    lat          (south_north, west_east) float32 ...
    lon          (south_north, west_east) float32 ...
    xtime        (time) datetime64[ns] ...
  * time         (time) datetime64[ns] 2000-11-01 2000-11-02 ... 2020-04-29
  * west_east    (west_east) float64 -2.766e+05 -2.666e+05 ... 8.534e+05
  * south_north  (south_north) float64 -1.353e+05 -1.253e+05 ... 9.647e+05
Attributes:
    FieldType:    104
    MemoryOrder:  XY 
    description:  2m Temperature
    units:        C
    stagger:      
    pyproj_srs:   +proj=lcc +lat_0=64 +lon_0=10 +lat_1=64 +lat_2=68 +x_0=0 +y...
    coordinates:  XLONG XLAT XTIME

mca = xMCA(t2, snow)                  # MCA of field A and B
mca.solve(complexify=False)            # True for complex MCA


eigenvalues = mca.singular_values()     
pcs = mca.pcs()                           
eofs = mca.eofs()   

mca.set_field_names('t2','snow')
pkwargs = {'orientation' : 'vertical'}
mca.plot(mode=1, **pkwargs)

opened by Murk89 23

Sourcery Starbot ⭐ refactored nicrie/xmca
Thanks for starring sourcery-ai/sourcery ✨ 🌟 ✨

Here's your pull request refactoring your most popular Python repo.

If you want Sourcery to refactor all your Python repos and incoming pull requests install our bot.

Review changes via command line

To manually merge these changes, make sure you're on the master branch, then run:

git fetch https://github.com/sourcery-ai-bot/xmca master
git merge --ff-only FETCH_HEAD
git reset HEAD^
opened by sourcery-ai-bot 0

multivariate EOF analysis / MCA

add this feature in next release

note: this will probably be a major change since it requires rewriting the internal structure of the package and will therefore break backward compatibility
enhancement

opened by nicrie 0

Releases(1.4.2)

1.4.2(Feb 24, 2022)
remove overly restrictive cartopy version constraint

rotation now raises an error if it does not converge

1.4.1(Oct 26, 2021)
fix #25

some minor updates in Readme

1.4.0(Sep 13, 2021)

more significance testing! added an iterative permutation approach based on Winkler et al. 2020
1.3.0(Aug 22, 2021)
Release dedicated to help assessing mode significance:

North's Rule of thumb

Rule N

Bootstrap

1.2.0(Aug 20, 2021)
New in this release

much more efficient algorithm in terms of both time and memory

significance testing via Rule N

increased stability of complex PCA/MCA by providing flexible period parameter

1.1.0(Jul 29, 2021)
add variance method

truncate method now works correctly

bugfix of incorrect pcs/eofs 'eigen' scaling

1.0.4(Jul 22, 2021)
bug fix of wrong results when predicting coslat weighted PCs

1.0.3(Jul 20, 2021)
resolve bugfix in predict

resolve bugfix in hom/het patterns which were introduced after the upgrade from 0.x to 1.x

add test cases, respectively, to avoid similar issues in the future

1.0.2(Jul 8, 2021)
resolve missing API in docs

1.0.1(Jul 7, 2021)
consistent storing location of plots

add CI workflow

1.0.0(Jul 6, 2021)
New in release 1.0.0

the predict method projects new, unseen data to obtain the corresponding PCs (works for standard, rotated and complex solutions)

more efficient storing/loading of files. Unfortunately, this and the point above made it necessary to change the code considerably. As a consequence, loading models that were computed and saved with an older package version (0.x.y) is not supported.

add method to summarize performed analysis (summary)

add method to return input fields

improve docs (resolves #7)

correct and consistent use of definition of loadings

some bugfixes (e.g. resolves #12 )

0.3.3(May 12, 2021)
Release 0.3.3

add arXiv pre-print for complex rotated MCA

0.3.2(May 12, 2021)
Release 0.3.2

hotfix wrong file to version path which caused a direct import xmca to fail

0.3.1(May 12, 2021)

fixed an issue with wrong versioning
0.3.0(May 12, 2021)
Release 0.3.0

exponential time series extension is now vectorized -> much faster!

API reference section in the documentation is neater

0.2.2(May 11, 2021)

mostly documentation update & release to connect with zenodo
0.2.1(May 12, 2021)


Owner

Niclas Rieger

GitHub Repository

