Maximum Covariance Analysis in Python

Overview

xMCA | Maximum Covariance Analysis in Python


The aim of this package is to provide a flexible tool for the climate science community to perform Maximum Covariance Analysis (MCA) in a simple and consistent way. Given the huge popularity of xarray in the climate science community, xmca supports xarray.DataArray as well as numpy.ndarray as input formats.

Example Figure: Mode 2 of complex rotated Maximum Covariance Analysis, showing the shared dynamics of SST and continental precipitation associated with ENSO between 1980 and 2020.

🔰 What is MCA?

MCA maximises the temporal covariance between two different data fields and is closely related to Principal Component Analysis (PCA) / Empirical Orthogonal Function (EOF) analysis. While EOF analysis maximises the variance within a single data field, MCA extracts the dominant co-varying patterns between two different data fields. When the two input fields are the same, MCA reduces to standard EOF analysis.
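The core of the method can be sketched in a few lines of numpy (a generic illustration of the idea, not xmca's internal code): the singular vectors of the cross-covariance matrix between two anomaly fields give the co-varying spatial patterns, and the singular values measure the squared covariance each mode explains.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, n_x, n_y = 100, 5, 4
X = rng.standard_normal((n_time, n_x))  # field A: time x space
Y = rng.standard_normal((n_time, n_y))  # field B: time x space

# remove the temporal mean (work with anomalies)
X = X - X.mean(axis=0)
Y = Y - Y.mean(axis=0)

# cross-covariance matrix and its SVD
C = X.T @ Y / (n_time - 1)
U, s, Vt = np.linalg.svd(C, full_matrices=False)

# expansion coefficients (PCs) are the data projected onto the patterns
pcs_x = X @ U
pcs_y = Y @ Vt.T
```

With X == Y, C becomes the ordinary covariance matrix and the singular vectors reduce to EOFs, which is the reduction to PCA mentioned above.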

For the mathematical background, see e.g. Bretherton et al. or the lecture material by C. Bretherton.

New in release 1.4.x

  • Much faster and more memory-efficient algorithm
  • Significance testing of individual modes via Rule N (see the rule_n method)
  • The period parameter of the solve method provides more flexibility for the exponential extension, making complex MCA more stable
  • Fixed missing coslat weighting when saving a model (Issue 25)

📌 Core Features

                Standard   Rotated   Complex   Complex Rotated
EOF analysis       ✔️         ✔️        ✔️            ✔️
MCA                ✔️         ✔️        ✔️            ✔️

** Complex rotated MCA is also available as a pre-print on arXiv.

🔧 Installation

Installation is simply done via

pip install xmca

If you have problems during the installation, please consult the documentation or raise an issue on GitHub.

📰 Documentation

A tutorial to get you started as well as the full API can be found in the documentation.

Quickstart

Import the package

    from xmca.array import MCA  # use with np.ndarray
    from xmca.xarray import xMCA  # use with xr.DataArray

As an example, we take North American surface temperatures shipped with xarray. Note: xmca accepts xr.DataArray only, not xr.Dataset.

    import xarray as xr  # only needed to obtain test data

    # split data arbitrarily into west and east coast
    data = xr.tutorial.open_dataset('air_temperature').air
    west = data.sel(lon=slice(200, 260))
    east = data.sel(lon=slice(260, 360))

PCA / EOF analysis

Construct a model with only one field and solve it to perform standard PCA / EOF analysis.

    pca = xMCA(west)                        # PCA of west coast
    pca.solve(complexify=False)            # True for complex PCA

    svals  = pca.singular_values()          # singular values (= eigenvalues for PCA)
    expvar = pca.explained_variance()       # explained variance
    pcs    = pca.pcs()                      # principal component scores (PCs)
    eofs   = pca.eofs()                     # spatial patterns (EOFs)

A Varimax/Promax-rotated solution is obtained by rotating the model, choosing the number of EOFs to rotate (n_rot) as well as the Promax parameter (power). Here, power=1 yields a Varimax-rotated solution.

    pca.rotate(n_rot=10, power=1)

    expvar_rot  = pca.explained_variance()  # explained variance
    pcs_rot     = pca.pcs()                 # Principal component scores (PCs)
    eofs_rot    = pca.eofs()                # spatial patterns (EOFs)
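For intuition, plain Varimax can be written in a few lines of numpy (a textbook sketch following Kaiser's iterative algorithm, independent of xmca's actual implementation):

```python
import numpy as np

def varimax(A, gamma=1.0, max_iter=100, tol=1e-8):
    """Varimax rotation of a loading matrix A (features x modes).

    Iteratively finds an orthogonal rotation R maximising the variance
    of the squared loadings. gamma=1 gives Varimax.
    """
    p, k = A.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        L = A @ R
        # gradient of the varimax criterion
        G = A.T @ (L**3 - (gamma / p) * L @ np.diag(np.sum(L**2, axis=0)))
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt                 # closest orthogonal matrix to the gradient
        crit_new = s.sum()
        if crit_new - crit_old < tol:
            break
        crit_old = crit_new
    return A @ R, R

loadings = np.random.default_rng(1).standard_normal((10, 3))
rotated, R = varimax(loadings)
```

Because R is orthogonal, the total variance (the Frobenius norm of the loadings) is preserved; rotation only redistributes it among the modes.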

MCA

Same as for PCA / EOF analysis, but with two input fields instead of one.

    mca = xMCA(west, east)                  # MCA of field A and B
    mca.solve(complexify=False)            # True for complex MCA

    eigenvalues = mca.singular_values()     # singular values
    pcs = mca.pcs()                         # expansion coefficient (PCs)
    eofs = mca.eofs()                       # spatial patterns (EOFs)

Significance analysis

A simple way to estimate the significance of the obtained modes is to run Monte Carlo simulations based on uncorrelated Gaussian white noise, a procedure known as Rule N (Overland and Preisendorfer 1982). Here we create 200 such synthetic data sets and compare the synthetic singular spectra with the real one to assess significance.

    import numpy as np
    import matplotlib.pyplot as plt

    svals = mca.singular_values()  # singular values of the real data
    surr = mca.rule_n(200)
    median = surr.median('run')
    q99 = surr.quantile(.99, dim='run')
    q01 = surr.quantile(.01, dim='run')

    cutoff = np.sum((svals - q99 > 0)).values  # first 8 modes significant

    fig = plt.figure(figsize=(10, 4))
    ax = fig.add_subplot(111)
    svals.plot(ax=ax, yscale='log', label='true')
    median.plot(ax=ax, yscale='log', color='.5', label='rule N')
    q99.plot(ax=ax, yscale='log', color='.5', ls=':')
    q01.plot(ax=ax, yscale='log', color='.5', ls=':')
    ax.axvline(cutoff + 0.5, ls=':')
    ax.set_xlim(-2, 200)
    ax.set_ylim(1e-1, 2.5e4)
    ax.set_title('Significance based on Rule N')
    ax.legend()
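The Rule N idea itself can be sketched with plain numpy (an illustration of the procedure, not the package's rule_n implementation): generate noise surrogates with the same shape as the data, run the same decomposition, and collect the resulting singular spectra.

```python
import numpy as np

rng = np.random.default_rng(42)
n_time, n_x, n_y, n_runs = 50, 6, 6, 20

def singular_spectrum(X, Y):
    """Singular values of the cross-covariance matrix of two fields."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    C = X.T @ Y / (X.shape[0] - 1)
    return np.linalg.svd(C, compute_uv=False)

# spectra of uncorrelated Gaussian white-noise surrogates
surrogate = np.array([
    singular_spectrum(rng.standard_normal((n_time, n_x)),
                      rng.standard_normal((n_time, n_y)))
    for _ in range(n_runs)
])
q99 = np.quantile(surrogate, 0.99, axis=0)  # 99th percentile per mode
```

A real mode is then deemed significant when its singular value exceeds the corresponding noise quantile, which is exactly the comparison made against q99 in the snippet above.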

Example Figure: The first 8 modes are significant according to Rule N using 200 synthetic runs.

Saving/loading an analysis

    mca.save_analysis('my_analysis')    # this will save the data and a respective
                                        # info file. The files will be stored in a
                                        # special directory
    mca2 = xMCA()                       # create a new, empty instance
    mca2.load_analysis('my_analysis/info.xmca') # analysis can be
                                        # loaded via specifying the path to the
                                        # info file created earlier

Quickly inspect your results visually

The package provides a method to plot individual modes.

    mca2.set_field_names('West', 'East')
    pkwargs = {'orientation' : 'vertical'}
    mca2.plot(mode=1, **pkwargs)

Example Figure: Result of the default plot method after performing MCA on T2m of the North American west and east coast, showing mode 1.

You may want to modify the plot for better aesthetics:

    from cartopy.crs import EqualEarth  # for different map projections

    # map projections for "left" and "right" field
    projections = {
        'left': EqualEarth(),
        'right': EqualEarth()
    }

    pkwargs = {
        'figsize'     : (8, 5),
        'orientation' : 'vertical',
        'cmap_eof'    : 'BrBG',             # colormap for the EOF amplitudes
        'projection'  : projections,
    }
    mca2.plot(mode=3, **pkwargs)

Example Figure: Mode 3 with the modified plot settings.

You can save the plot to your local disk as a .png file via

    skwargs = {'dpi': 200}
    mca2.save_plot(mode=3, plot_kwargs=pkwargs, save_kwargs=skwargs)

🔖 Please cite

I am just starting my career as a scientist, so feedback on my scientific work is important to me in order to assess which of my projects advance the scientific community. If you use the package for your own research and find it helpful, I would appreciate feedback here on GitHub, via email, or as a citation:

Niclas Rieger, 2021: nicrie/xmca: version x.y.z. doi:10.5281/zenodo.4749830.

💪 Credits

Kudos to the developers and contributors of the following GitHub projects, which I initially used myself and which served as inspiration:

And of course, credit to the developers of these extremely useful packages:

Comments
  • SVD did not converge

    SVD did not converge

    Hi Niclas,

    xmca worked fine when I used it directly on my raw data and produced the results I was expecting. However, when I try to use processed data (anomalies, detrended), it gives the following error: SVD did not converge.


    Does xmca only accept raw data, or is something wrong with my input? This is how my data looks:

    Even the previously worked data was in a similar format. What could be the issue?
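A common cause of "SVD did not converge" is non-finite values introduced by preprocessing (e.g. detrending, or dividing by a zero standard deviation when computing anomalies). A quick, generic sanity check on the input array (a general suggestion, not a confirmed diagnosis of this particular report):

```python
import numpy as np

# Stand-in for the real input field; the NaN mimics a value introduced
# by preprocessing.
field = np.array([[1.0, 2.0],
                  [np.nan, 4.0]])

# count NaN/inf entries that would break LAPACK's SVD
n_bad = np.count_nonzero(~np.isfinite(field))
if n_bad > 0:
    print(f'{n_bad} non-finite value(s) in the input field')
```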

    opened by GIRIJA-KALYANI 3
  • cartopy dependency is too restrictive

    cartopy dependency is too restrictive

    The very restrictive cartopy dependency makes it tricky to install into an existing conda environment

    cartopy==0.18.0
    

    I can see it was changed at https://github.com/coecms/xmca/commit/896e0b5977c4f4a36ed01363141f3ab7dd24c6d5

    When I changed it back to >=0.18.0 it installed fine using pip install --user and imported fine with cartopy-0.19.0.post1.

    I ran the tests like so

    python -m unittest discover -v -s tests/
    

    but three of the tests didn't pass

    test_save_load_cplx (integration.test_integration_xarray.TestIntegration) ... ERROR    
    test_save_load_rot (integration.test_integration_xarray.TestIntegration) ... ERROR                                       
    test_save_load_std (integration.test_integration_xarray.TestIntegration) ... ERROR    
    

    Some other error messages:

    Error: Rotation process did not converge!
    
    ======================================================================                                                   
    ERROR: test_save_load_cplx (integration.test_integration_xarray.TestIntegration)                                         
    ----------------------------------------------------------------------                                                   
    Traceback (most recent call last):
      File "/home/502/aph502/.local/lib/python3.9/site-packages/parameterized/parameterized.py", line 533, in standalone_func
        return func(*(a + p.args), **p.kwargs)
      File "/home/502/aph502/code/python/xmca/tests/integration/test_integration_xarray.py", line 148, in test_save_load     
        rmtree(join(getcwd(), 'tests/integration/temp/'))
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 731, in rmtree           
        onerror(os.rmdir, path, sys.exc_info())
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 729, in rmtree           
        os.rmdir(path)
    OSError: [Errno 39] Directory not empty: '/home/502/aph502/code/python/xmca/tests/integration/temp/'                     
    
    (test_save_load_rot and test_save_load_std fail with the identical traceback.)
    

    but it wasn't obvious that this was an issue with the version of cartopy.

    opened by aidanheerdegen 3
  • installation error

    installation error

    Hello Niclas,

    I installed xmca but when I import the library I get this error: libmkl_rt.so: cannot open shared object file: No such file or directory

    any ideas or suggestions would be greatly appreciated.

    Cheers,

    Charles.

    opened by chjones2 3
  • May cause errors after using mca.normalize()

    May cause errors after using mca.normalize()

    Hello, first of all, thank you for your contribution. However, I found a bug that can cause the SVD computation to fail (due to NaN values). The details are as follows:

    When the MCA class is initialized, get_nan_cols and remove_nan_cols are called to remove NaN values. But if you then call mca.normalize(), new NaN values can appear and be carried into the SVD calculation. This happens because if a grid point has the same value at every time step (for example, a place where it never rains), standardization yields NaN, which makes the SVD unable to solve the problem.
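The described failure mode is easy to reproduce with plain numpy: a grid point whose value never changes has zero standard deviation, so standardising it produces 0/0 = NaN.

```python
import numpy as np

series = np.full(10, 3.0)            # e.g. a place where it never rains
anomaly = series - series.mean()     # all zeros
with np.errstate(invalid='ignore'):  # silence the 0/0 warning
    standardised = anomaly / series.std()  # 0 / 0 -> NaN
```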

    opened by L1angY 2
  • Feedback

    Feedback

    I'm posting my feedback on behalf of Alex vM, who asked me to have a look at this package. The feedback is subjective and you may disagree with some if not all suggestions below.

    1. In general, the readme is well written. Clear and concise. I'd add a figure(s) though. For instance, after you call mca.plot() in your example.
    2. Currently, this package is for developers or at least those who have python experience. Well, maybe covariance estimation requires knowledge of python and programming skills, but I recommend making sphinx documentation (and publishing it on readthedocs) to bring users. You've already written docs for each if not all functions and classes. The only step left is to wrap it in sphinx with default parameters and paths.
    3. To give credit to your package and to show that it's well maintained, I recommend adding badges (readthedocs, CI build status, and test coverage). Use CircleCI. Here is a config example (you only need the build-pip job since it's the easiest).
    4. tools folder must be in the xmca folder.
    5. setup.py install requires must be read from requirements.txt file (example) and not hard-coded.
    6. GPL-3 license is too restrictive. Consider BSD-3 or even MIT.
    7. Each test that involves randomness must start with a numpy.random.seed call. Currently, you're setting the seed globally. That's not a good idea, because the test results then depend on the test order, which, of course, should not happen.
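The suggestion in point 7 could look like the following (a sketch using numpy's Generator API; the class and test names are hypothetical):

```python
import unittest
import numpy as np

class TestWithLocalSeed(unittest.TestCase):
    def setUp(self):
        # a fresh, independent generator per test, so the result
        # does not depend on which tests ran before
        self.rng = np.random.default_rng(seed=12345)

    def test_reproducible_draw(self):
        first = self.rng.standard_normal(3)
        # re-creating the generator with the same seed reproduces the draw
        again = np.random.default_rng(seed=12345).standard_normal(3)
        self.assertTrue(np.allclose(first, again))
```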

    Good luck!

    Best, Danylo

    bug documentation 
    opened by dizcza 1
  • save_analysis currently does not save cos lat weighting

    save_analysis currently does not save cos lat weighting

    Just stumbled over this:

    When a model is saved via xmca.xarray.save_analysis with cosine latitude weighting applied (apply_coslat), the current implementation does not invoke xmca.xarray.apply_coslat when the model is loaded via xmca.xarray.load_analysis, thus producing false PCs.

    I hope to provide a fix to this soon.

    bug 
    opened by nicrie 0
  • Release 1.0.0

    Release 1.0.0

    New in release 1.0.0

    • new method predict projects new, unseen data onto the model to obtain the corresponding PCs (works for standard, rotated and complex solutions)
    • more efficient storing/loading of files; unfortunately, this and the point above required considerable changes to the code, so loading models saved with an older package version (0.x.y) is not supported
    • new method summary summarizes the performed analysis
    • new method to return the input fields
    • improved docs (resolves #7)
    • correct and consistent use of the definition of loadings
    • some bug fixes (e.g. resolves #12)
    opened by nicrie 0
  • MCA errors

    MCA errors

    Hello, I am trying to run MCA with two variables, which are output from the WRF climate model.

    I get this error right after calling:

    mca.plot(mode=1, **pkwargs)
    

    ValueError: coordinate lon has dimensions ('south_north', 'west_east'), but these are not a subset of the DataArray dimensions ['lat', 'lon', 'mode']

    Would really appreciate any help with this error. Many thanks.

    # Load packages and data:
    import xarray as xr
    import matplotlib.pyplot as plt
    import cartopy
    
    var=xr.open_dataset("F:\\era5_2000_2020_vars_salem.nc")
    
    t2=var.T2C
    snow=var.SNOWH
    
    # The variable t2, for example, is structured as follows:
    t2:
    <xarray.DataArray 'T2C' (time: 3512, south_north: 111, west_east: 114)>
    Coordinates:
        lat          (south_north, west_east) float32 ...
        lon          (south_north, west_east) float32 ...
        xtime        (time) datetime64[ns] ...
      * time         (time) datetime64[ns] 2000-11-01 2000-11-02 ... 2020-04-29
      * west_east    (west_east) float64 -2.766e+05 -2.666e+05 ... 8.534e+05
      * south_north  (south_north) float64 -1.353e+05 -1.253e+05 ... 9.647e+05
    Attributes:
        FieldType:    104
        MemoryOrder:  XY 
        description:  2m Temperature
        units:        C
        stagger:      
        pyproj_srs:   +proj=lcc +lat_0=64 +lon_0=10 +lat_1=64 +lat_2=68 +x_0=0 +y...
        coordinates:  XLONG XLAT XTIME
    
    mca = xMCA(t2, snow)                  # MCA of field A and B
    mca.solve(complexify=False)            # True for complex MCA
    
    
    eigenvalues = mca.singular_values()     
    pcs = mca.pcs()                           
    eofs = mca.eofs()   
    
    mca.set_field_names('t2','snow')
    pkwargs = {'orientation' : 'vertical'}
    mca.plot(mode=1, **pkwargs)
    
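One possible workaround (an assumption on my part, not a confirmed fix from this thread): the error suggests the plot method expects 1D coordinates, while WRF output carries 2D lat/lon coordinate variables on top of its 1D projected dimensions. Dropping the 2D coordinates before constructing the model avoids the mismatch; sketched here with a toy DataArray:

```python
import numpy as np
import xarray as xr

# toy DataArray mimicking WRF output: 1D projected dims plus 2D lat/lon coords
da = xr.DataArray(
    np.zeros((3, 2, 2)),
    dims=('time', 'south_north', 'west_east'),
    coords={
        'time': np.arange(3),
        'lat': (('south_north', 'west_east'), np.ones((2, 2))),
        'lon': (('south_north', 'west_east'), np.ones((2, 2))),
    },
)

# keep only the 1D dimension coordinates
da_clean = da.drop_vars(['lat', 'lon'])
```

The cleaned array could then be passed to xMCA in place of t2 and snow; note that plots would then be in projected coordinates rather than lat/lon.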
    opened by Murk89 23
  • multivariate EOF analysis / MCA

    multivariate EOF analysis / MCA

    Add this feature in the next release.

    Note: this will probably be a major change, since it requires rewriting the internal structure of the package and will therefore break backwards compatibility.

    enhancement 
    opened by nicrie 0