Maximum Covariance Analysis in Python

Overview

xMCA | Maximum Covariance Analysis in Python


The aim of this package is to provide a flexible tool for the climate science community to perform Maximum Covariance Analysis (MCA) in a simple and consistent way. Given the huge popularity of xarray in the climate science community, xmca supports xarray.DataArray as well as numpy.ndarray as input formats.

Figure: Mode 2 of a complex rotated Maximum Covariance Analysis, showing the shared dynamics of SST and continental precipitation associated with ENSO between 1980 and 2020.

🔰 What is MCA?

MCA maximises the temporal covariance between two different data fields and is closely related to Principal Component Analysis (PCA), also known as Empirical Orthogonal Function (EOF) analysis. While EOF analysis maximises the variance within a single data field, MCA extracts the dominant co-varying patterns between two different data fields. When the two input fields are the same, MCA reduces to standard EOF analysis.

For the mathematical background, see e.g. Bretherton et al. or the lecture material by C. Bretherton.
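
To make the idea concrete, here is a minimal numpy sketch of the core computation, namely an SVD of the temporal cross-covariance matrix between two centered fields. This is an illustration only, not the package's implementation; all names and array sizes are made up.

    import numpy as np

    # toy data: two fields sharing n_time samples, with n_x and n_y grid points
    rng = np.random.default_rng(42)
    n_time, n_x, n_y = 100, 20, 30
    X = rng.standard_normal((n_time, n_x))
    Y = rng.standard_normal((n_time, n_y))

    # remove the temporal mean at each grid point
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)

    # cross-covariance matrix between the two fields (n_x by n_y)
    C = X.T @ Y / (n_time - 1)

    # the singular vectors are the spatial patterns of each mode;
    # the singular values measure the covariance each mode explains
    U, s, Vt = np.linalg.svd(C, full_matrices=False)

    # expansion coefficients (PCs): project the fields onto their patterns
    pcs_x = X @ U
    pcs_y = Y @ Vt.T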

New in release 1.4.x

  • Much faster and more memory-efficient algorithm
  • Significance testing of individual modes via Rule N
  • New period parameter of the solve method provides more flexibility for the exponential extension, making complex MCA more stable (see the sketch below)
  • Fixed missing coslat weighting when saving a model (Issue 25)
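
As a hedged illustration of the new parameter: the call below is an assumption based on this release note, and the value 12 is made up, so consult the documentation for the exact semantics.

    # hypothetical usage of the period parameter of solve; the value is
    # illustrative only, e.g. a 12-step periodicity in monthly data
    mca.solve(complexify=True, period=12)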

📌 Core Features

                 Standard   Rotated   Complex   Complex Rotated
EOF analysis        ✔️         ✔️        ✔️           ✔️
MCA                 ✔️         ✔️        ✔️           ✔️

* Click on the check marks for the corresponding references.
** Complex rotated MCA is also available as a pre-print on arXiv.

🔧 Installation

Installation is simply done via

    pip install xmca

If you have problems during the installation, please consult the documentation or raise an issue on GitHub.

📰 Documentation

A tutorial to get you started as well as the full API can be found in the documentation.

Quickstart

Import the package

    from xmca.array import MCA  # use with np.ndarray
    from xmca.xarray import xMCA  # use with xr.DataArray

As an example, we take North American surface temperatures shipped with xarray. Note: the package works with xr.DataArray only, not xr.Dataset.

    import xarray as xr  # only needed to obtain test data

    # split data arbitrarily into west and east coast
    data = xr.tutorial.open_dataset('air_temperature').air
    west = data.sel(lon=slice(200, 260))
    east = data.sel(lon=slice(260, 360))

PCA / EOF analysis

Construct a model with only one field and solve it to perform standard PCA / EOF analysis.

    pca = xMCA(west)                        # PCA of west coast
    pca.solve(complexify=False)             # set True for complex PCA

    svals  = pca.singular_values()          # singular values = eigenvalues for PCA
    expvar = pca.explained_variance()       # explained variance
    pcs    = pca.pcs()                      # principal component scores (PCs)
    eofs   = pca.eofs()                     # spatial patterns (EOFs)

A Varimax/Promax-rotated solution is obtained by rotating the model, choosing the number of EOFs to be rotated (n_rot) as well as the Promax parameter (power). Setting power=1 yields a Varimax-rotated solution.

    pca.rotate(n_rot=10, power=1)

    expvar_rot  = pca.explained_variance()  # explained variance
    pcs_rot     = pca.pcs()                 # Principal component scores (PCs)
    eofs_rot    = pca.eofs()                # spatial patterns (EOFs)

MCA

Same as for PCA / EOF analysis, but with two input fields instead of one.

    mca = xMCA(west, east)                  # MCA of field A and B
    mca.solve(complexify=False)             # set True for complex MCA

    svals = mca.singular_values()           # singular values
    pcs   = mca.pcs()                       # expansion coefficients (PCs)
    eofs  = mca.eofs()                      # spatial patterns (EOFs)

Significance analysis

A simple way of estimating the significance of the obtained modes is to run Monte Carlo simulations based on uncorrelated Gaussian white noise, a procedure known as Rule N (Overland and Preisendorfer 1982). Here we create 200 such synthetic data sets and compare the synthetic singular spectrum with the real one to assess significance.

    import numpy as np               # needed below for the cutoff
    import matplotlib.pyplot as plt  # needed for the figure

    surr = mca.rule_n(200)
    median = surr.median('run')
    q99 = surr.quantile(.99, dim='run')
    q01 = surr.quantile(.01, dim='run')

    cutoff = np.sum((svals - q99 > 0)).values  # first 8 modes significant

    fig = plt.figure(figsize=(10, 4))
    ax = fig.add_subplot(111)
    svals.plot(ax=ax, yscale='log', label='true')
    median.plot(ax=ax, yscale='log', color='.5', label='rule N')
    q99.plot(ax=ax, yscale='log', color='.5', ls=':')
    q01.plot(ax=ax, yscale='log', color='.5', ls=':')
    ax.axvline(cutoff + 0.5, ls=':')
    ax.set_xlim(-2, 200)
    ax.set_ylim(1e-1, 2.5e4)
    ax.set_title('Significance based on Rule N')
    ax.legend()

Figure: The first 8 modes are significant according to Rule N using 200 synthetic runs.

Saving/loading an analysis

    mca.save_analysis('my_analysis')    # saves the data and a corresponding
                                        # info file into a dedicated directory
    mca2 = xMCA()                       # create a new, empty instance
    mca2.load_analysis('my_analysis/info.xmca')  # load the analysis by pointing
                                        # to the info file created above

Quickly inspect your results visually

The package provides a method to plot individual modes.

    mca2.set_field_names('West', 'East')
    pkwargs = {'orientation' : 'vertical'}
    mca2.plot(mode=1, **pkwargs)

Figure: Result of the default plot method after performing MCA on 2m temperature of the North American west and east coasts, showing mode 1.

You may want to modify the plot for a nicer appearance:

    from cartopy.crs import EqualEarth  # for different map projections

    # map projections for "left" and "right" field
    projections = {
        'left': EqualEarth(),
        'right': EqualEarth()
    }

    pkwargs = {
        'figsize'     : (8, 5),
        'orientation' : 'vertical',
        'cmap_eof'    : 'BrBG',  # colormap for the EOFs
        'projection'  : projections,
    }
    mca2.plot(mode=3, **pkwargs)

Figure: Mode 3 with customised plot settings.

You can save the plot to your local disk as a .png file via

    skwargs = {'dpi': 200}
    mca2.save_plot(mode=3, plot_kwargs=pkwargs, save_kwargs=skwargs)

🔖 Please cite

I am just starting my career as a scientist, so feedback on my work is important for me to assess which of my efforts advance the scientific community. If you use the package for your own research and find it helpful, I would appreciate feedback on GitHub, via email, or as a citation:

Niclas Rieger, 2021: nicrie/xmca: version x.y.z. doi:10.5281/zenodo.4749830.

💪 Credits

Kudos to the developers and contributors of the following GitHub projects, which I initially used myself and which served as inspiration:

And of course, credits to the developers of these extremely useful packages:

Comments
  • SVD did not converge


    Hi Niclas,

    xmca worked fine when I used it directly on my raw data, producing the results I expected. However, when I tried to use processed data (anomalies, detrended), it gives the following error: SVD did not converge.


    Does xmca only accept raw data, or is something wrong with my input?

    Even the data that previously worked was in a similar format. What could be the issue?

    opened by GIRIJA-KALYANI 3
  • cartopy dependency is too restrictive


    The very restrictive cartopy pin makes it tricky to install into an existing conda environment:

    cartopy==0.18.0
    

    I can see it was changed at https://github.com/coecms/xmca/commit/896e0b5977c4f4a36ed01363141f3ab7dd24c6d5

    When I changed it back to >=0.18.0 it installed fine using pip install --user and imported fine with cartopy-0.19.0.post1.

    I ran the tests like so

    python -m unittest discover -v -s tests/
    

    but three of the tests didn't pass

    test_save_load_cplx (integration.test_integration_xarray.TestIntegration) ... ERROR    
    test_save_load_rot (integration.test_integration_xarray.TestIntegration) ... ERROR                                       
    test_save_load_std (integration.test_integration_xarray.TestIntegration) ... ERROR    
    

    Some other error messages:

    Error: Rotation process did not converge!
    
    ======================================================================                                                   
    ERROR: test_save_load_cplx (integration.test_integration_xarray.TestIntegration)                                         
    ----------------------------------------------------------------------                                                   
    Traceback (most recent call last):
      File "/home/502/aph502/.local/lib/python3.9/site-packages/parameterized/parameterized.py", line 533, in standalone_func
        return func(*(a + p.args), **p.kwargs)
      File "/home/502/aph502/code/python/xmca/tests/integration/test_integration_xarray.py", line 148, in test_save_load     
        rmtree(join(getcwd(), 'tests/integration/temp/'))
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 731, in rmtree           
        onerror(os.rmdir, path, sys.exc_info())
      File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-22.01/lib/python3.9/shutil.py", line 729, in rmtree           
        os.rmdir(path)
    OSError: [Errno 39] Directory not empty: '/home/502/aph502/code/python/xmca/tests/integration/temp/'                     
    
    ----------------------------------------------------------------------

    (The tracebacks for test_save_load_rot and test_save_load_std are identical, apart from the test name.)


    but it wasn't obvious that this was an issue with the version of cartopy.

    opened by aidanheerdegen 3
  • installation error


    Hello Niclas,

    I installed xmca but when I import the library I get this error: libmkl_rt.so: cannot open shared object file: No such file or directory

    any ideas or suggestions would be greatly appreciated.

    Cheers,

    Charles.

    opened by chjones2 3
  • May cause errors after using mca.normalize()


    Hello, first of all, thank you for your contribution. However, I found a bug that may cause the SVD to fail (due to the presence of NaN values). The details are as follows:

    When the MCA class is initialized, it calls get_nan_cols and remove_nan_cols to remove NaN values. But if you then call mca.normalize(), new NaN values can appear and be carried into the SVD calculation. This happens when a grid point has the same value at every time step (for example, a place where it never rains): standardization then yields NaN, which makes the SVD unsolvable.
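
    A minimal sketch of the failure mode described above (illustrative only; assuming normalize standardizes by the temporal standard deviation):

        import numpy as np

        # a grid point whose value never changes in time, e.g. zero rainfall
        series = np.zeros(10)

        # standardizing divides by the temporal standard deviation, which is 0
        normalized = (series - series.mean()) / series.std()

        print(normalized)  # array of NaN, which later breaks the SVD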

    opened by L1angY 2
  • Feedback


    I'm posting my feedback on behalf of Alex vM, who asked me to have a look at this package. The feedback is subjective and you may disagree with some, if not all, suggestions below.

    1. In general, the readme is well written: clear and concise. I'd add a figure or two, though. For instance, after you call mca.plot() in your example.
    2. Currently, this package is for developers, or at least those who have Python experience. Maybe covariance estimation requires knowledge of Python and programming skills, but I recommend making Sphinx documentation (and publishing it on Read the Docs) to bring in users. You've already written docs for each, if not all, functions and classes. The only step left is to wrap it in Sphinx with default parameters and paths.
    3. To give credit to your package and show that it's well maintained, I recommend adding badges (readthedocs, travis build and test coverage). Use CircleCI. Here is a config example (you need only the build-pip job since it's the easiest).
    4. The tools folder must be in the xmca folder.
    5. setup.py install_requires must be read from the requirements.txt file (example) and not hard-coded.
    6. The GPL-3 license is too restrictive. Consider BSD-3 or even MIT.
    7. Each test that involves randomness must start with a numpy.random.seed call. Currently, you're setting the seed globally. That's not a good idea because the test results then depend on the test order, which should not happen (see the sketch after this list).
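
    A hedged illustration of the per-test seeding suggested in point 7 (class and test names are made up):

        import unittest
        import numpy as np

        class TestSolver(unittest.TestCase):
            def setUp(self):
                # seed inside setUp (or each test) rather than globally,
                # so results do not depend on test execution order
                np.random.seed(7)

            def test_random_input(self):
                data = np.random.randn(100, 20)
                self.assertEqual(data.shape, (100, 20))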

    Good luck!

    Best, Danylo

    bug documentation 
    opened by dizcza 1
  • save_analysis currently does not save cos lat weighting


    Just stumbled over this:

    When a model is saved via xmca.xarray.save_analysis after cosine latitude weighting was applied (apply_coslat), the current implementation does not invoke xmca.xarray.apply_coslat when the model is loaded via xmca.xarray.load_analysis, thus producing incorrect PCs.
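
    For context, a minimal sketch of what cosine latitude weighting typically does (illustrative only, not the package's implementation):

        import numpy as np
        import xarray as xr

        data = xr.tutorial.open_dataset('air_temperature').air

        # weight each grid point by sqrt(cos(latitude)) so that the shrinking
        # grid-cell area towards the poles does not inflate the variance
        weights = np.cos(np.deg2rad(data.lat)) ** 0.5
        weighted = data * weights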

    I hope to provide a fix to this soon.

    bug 
    opened by nicrie 0
  • Release 1.0.0


    New in release 1.0.0

    • New method predict projects new, unseen data onto the solution to obtain the corresponding PCs (works for standard, rotated and complex solutions); see the sketch after this list
    • More efficient storing/loading of files. Unfortunately, this and the point above made it necessary to change the code considerably; as a consequence, loading models that were created and saved with an older package version (0.x.y) is not supported.
    • Added a method to summarize the performed analysis (summary)
    • Added a method to return the input fields
    • Improved docs (resolves #7)
    • Correct and consistent use of the definition of loadings
    • Some bugfixes (e.g. resolves #12)
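
    A hedged sketch of the new predict method; the signature and argument layout below are assumptions, not taken from the documentation:

        # hypothetical usage: project unseen fields, defined on the same spatial
        # grids as the training data, onto the fitted modes to obtain their PCs
        new_west = west.isel(time=slice(0, 100))
        new_east = east.isel(time=slice(0, 100))
        pcs_new = mca.predict(new_west, new_east)
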
    opened by nicrie 0
  • MCA errors


    Hello, I am trying to run the MCA with two variables, which are outputs of the WRF climate model.

    I get this error right after this call:

    mca.plot(mode=1, **pkwargs)

    ValueError: coordinate lon has dimensions ('south_north', 'west_east'), but these are not a subset of the DataArray dimensions ['lat', 'lon', 'mode']

    Would really appreciate any help with this error. Many thanks.

    # Load packages and data:
    import xarray as xr
    import matplotlib.pyplot as plt
    import cartopy
    
    var=xr.open_dataset("F:\\era5_2000_2020_vars_salem.nc")
    
    t2=var.T2C
    snow=var.SNOWH
    
    # The variable t2 is structured as follows:
    <xarray.DataArray 'T2C' (time: 3512, south_north: 111, west_east: 114)>
    Coordinates:
        lat          (south_north, west_east) float32 ...
        lon          (south_north, west_east) float32 ...
        xtime        (time) datetime64[ns] ...
      * time         (time) datetime64[ns] 2000-11-01 2000-11-02 ... 2020-04-29
      * west_east    (west_east) float64 -2.766e+05 -2.666e+05 ... 8.534e+05
      * south_north  (south_north) float64 -1.353e+05 -1.253e+05 ... 9.647e+05
    Attributes:
        FieldType:    104
        MemoryOrder:  XY 
        description:  2m Temperature
        units:        C
        stagger:      
        pyproj_srs:   +proj=lcc +lat_0=64 +lon_0=10 +lat_1=64 +lat_2=68 +x_0=0 +y...
        coordinates:  XLONG XLAT XTIME
    
    mca = xMCA(t2, snow)                  # MCA of field A and B
    mca.solve(complexify=False)            # True for complex MCA
    
    
    eigenvalues = mca.singular_values()     
    pcs = mca.pcs()                           
    eofs = mca.eofs()   
    
    mca.set_field_names('t2','snow')
    pkwargs = {'orientation' : 'vertical'}
    mca.plot(mode=1, **pkwargs)
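
    The error suggests the plot method expects one-dimensional lat/lon dimensions, while WRF output carries two-dimensional (curvilinear) lat/lon coordinates. As a purely illustrative, untested workaround, one could drop the 2D coordinate variables before constructing the model:

        # hypothetical and untested: drop the two-dimensional lat/lon coordinate
        # variables so they are not carried into the EOF output, and rename the
        # projected dimensions; whether this suits the analysis depends on the grid
        t2_clean = t2.drop_vars(['lat', 'lon', 'xtime']).rename(
            {'south_north': 'lat', 'west_east': 'lon'})
        snow_clean = snow.drop_vars(['lat', 'lon', 'xtime']).rename(
            {'south_north': 'lat', 'west_east': 'lon'})
        mca = xMCA(t2_clean, snow_clean)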
    
    opened by Murk89 23
  • multivariate EOF analysis / MCA


    Add this feature in the next release.

    Note: this will probably be a major change since it requires rewriting the internal structure of the package, and will therefore break backwards compatibility.

    enhancement 
    opened by nicrie 0