Create HTML profiling reports from pandas DataFrame objects

Overview

Pandas Profiling


Documentation | Slack | Stack Overflow

Generates profile reports from a pandas DataFrame.

The pandas df.describe() function is great but a little basic for serious exploratory data analysis. pandas_profiling extends the pandas DataFrame with df.profile_report() for quick data analysis.

For each column the following statistics - if relevant for the column type - are presented in an interactive HTML report:

  • Type inference: detect the types of columns in a dataframe.
  • Essentials: type, unique values, missing values
  • Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range
  • Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
  • Most frequent values
  • Histogram
  • Correlations: highlighting of highly correlated variables; Spearman, Pearson and Kendall matrices
  • Missing values: matrix, count, heatmap and dendrogram of missing values
  • Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data
  • File and Image analysis: extract file sizes, creation dates and image dimensions, and scan for truncated images or those containing EXIF information

Announcements

Version v2.11.0 was released, featuring an exciting integration with Great Expectations that many of you requested (see details below).

Spark backend in progress: We can happily announce that we're nearing v1 for the Spark backend for generating profile reports. Stay tuned.

Support pandas-profiling

The development of pandas-profiling relies completely on contributions. If you find value in the package, we welcome you to support the project directly through GitHub Sponsors! Please help us continue to support this package. It's extra exciting that GitHub matches your contribution for the first year.

Find more information here:

February 20, 2021 πŸ’˜


Contents: Examples | Installation | Documentation | Large datasets | Command line usage | Advanced usage | Integrations | Support | Types | How to contribute | Editor Integration | Dependencies


Examples

The following examples can give you an impression of what the package can do; they cover specific features as well as step-by-step tutorials.

Installation

Using pip


You can install using the pip package manager by running:

pip install pandas-profiling[notebook]

Alternatively, you could install the latest version directly from GitHub:

pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

Using conda


You can install using the conda package manager by running:

conda install -c conda-forge pandas-profiling

From source

Download the source code by cloning the repository or by pressing 'Download ZIP' on this page.

Install by navigating to the proper directory and running:

python setup.py install

Documentation

The documentation for pandas_profiling can be found here. Previous documentation is still available here.

Getting started

Start by loading in your pandas DataFrame, e.g. by using:

import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.DataFrame(
    np.random.rand(100, 5),
    columns=["a", "b", "c", "d", "e"]
)

To generate the report, run:

profile = ProfileReport(df, title="Pandas Profiling Report")

Explore deeper

You can configure the profile report in any way you like. The example code below loads the explorative configuration, which includes many features for text (length distribution, Unicode information), files (file size, creation time) and images (dimensions, EXIF information). If you are interested in the exact settings that were used, you can compare them with the default configuration file.

profile = ProfileReport(df, title="Pandas Profiling Report", explorative=True)

Learn more about configuring pandas-profiling on the Advanced usage page.

Jupyter Notebook

We recommend generating reports interactively by using the Jupyter notebook. There are two interfaces (see animations below): through widgets and through an HTML report.

Notebook Widgets

This is achieved by simply displaying the report. In the Jupyter Notebook, run:

profile.to_widgets()

HTML

The HTML report can be included in a Jupyter notebook. Run the following code:

profile.to_notebook_iframe()

Saving the report

If you want to generate an HTML report file, save the ProfileReport to an object and use the to_file() function:

profile.to_file("your_report.html")

Alternatively, you can obtain the data as JSON:

# As a string
json_data = profile.to_json()

# As a file
profile.to_file("your_report.json")

Large datasets

Version 2.4 introduces minimal mode, a pre-configured profile that disables expensive computations (such as correlations and dynamic binning).

Use the following syntax:

profile = ProfileReport(large_dataset, minimal=True)
profile.to_file("output.html")

Command line usage

For standard formatted CSV files that can be read immediately by pandas, you can use the pandas_profiling executable.

Run the following for information about options and arguments.

pandas_profiling -h
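
For example, to profile a CSV and write an HTML report in a single call, an invocation along the following lines should work (a sketch: the --minimal flag and the positional input/output arguments are assumptions to be verified against the -h output, and data.csv / report.html are placeholder paths):

pandas_profiling --minimal data.csv report.html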

Advanced usage

A set of options is available to adapt the generated report.

  • title (str): Title for the report ('Pandas Profiling Report' by default).
  • pool_size (int): Number of workers in thread pool. When set to zero, it is set to the number of CPUs available (0 by default).
  • progress_bar (bool): If True, pandas-profiling will display a progress bar.
  • infer_dtypes (bool): When True (default), the dtypes of variables are inferred using visions' typeset logic (for instance, a column that has integers stored as strings will be analyzed as if it were numeric).
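
The options above can be combined when constructing a report, as in the following sketch (the parameter names are those listed above; the values are purely illustrative):

profile = ProfileReport(
    df,
    title="Pandas Profiling Report",
    pool_size=4,         # four workers; 0 (the default) means one per available CPU
    progress_bar=False,  # suppress the progress bar
    infer_dtypes=True,   # infer variable types with visions (the default)
)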

More settings can be found in the default configuration file, minimal configuration file and dark themed configuration file.

You can find the configuration docs on the Advanced usage page.

Example

profile = df.profile_report(title="Pandas Profiling Report", plot={"histogram": {"bins": 8}})
profile.to_file("output.html")

Integrations

Great Expectations

Great Expectations

Profiling your data is closely related to data validation: often validation rules are defined in terms of well-known statistics. For that purpose, pandas-profiling integrates with Great Expectations. This is a world-class open-source library that helps you maintain data quality and improve communication about data between teams. Great Expectations allows you to create Expectations (which are basically unit tests for your data) and Data Docs (conveniently shareable HTML data reports). pandas-profiling features a method to create a suite of Expectations based on the results of your ProfileReport, which you can store and use to validate another (or future) dataset.

You can find more details on the Great Expectations integration here
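
As a minimal sketch, the integration can be used along these lines (this assumes the great_expectations package is installed; the method name to_expectation_suite ships with the v2.11 integration, while the suite_name keyword is an assumption, so check the integration docs for the exact signature):

# Build an Expectation Suite from the statistics in the profiling report
# (hypothetical suite name; requires great_expectations to be installed).
suite = profile.to_expectation_suite(suite_name="my_dataset_expectations")

The resulting suite can then be stored and used to validate another (or a future) dataset.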

Supporting open source

Maintaining and developing the open-source code for pandas-profiling, with millions of downloads and thousands of users, would not be possible without the support of our gracious sponsors.

Lambda Labs

Lambda workstations, servers, laptops, and cloud services power engineers and researchers at Fortune 500 companies and 94% of the top 50 universities. Lambda Cloud offers 4 & 8 GPU instances starting at $1.50 / hr. Pre-installed with TensorFlow, PyTorch, Ubuntu, CUDA, and cuDNN.

We would like to thank our generous GitHub Sponsors supporters who make pandas-profiling possible:

Martin Sotir, Brian Lee, Stephanie Rivera, abdulAziz, gramster

More info if you would like to appear here: GitHub Sponsors page

Types

Types are a powerful abstraction for effective data analysis that goes beyond logical data types (integer, float, etc.). pandas-profiling currently recognizes the following types: Boolean, Numerical, Date, Categorical, URL, Path, File and Image.

We have developed a type system for Python tailored for data analysis: visions. Selecting the right typeset drastically reduces the complexity of your analysis code. Future versions of pandas-profiling will have extended type support through visions!
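
For example, with dtype inference enabled (infer_dtypes=True, the default), a column of integers stored as strings is recognized and profiled as a numeric variable; a small sketch:

import pandas as pd
from pandas_profiling import ProfileReport

# "n" holds integers stored as strings; with infer_dtypes=True (the default)
# the column is analyzed as Numerical rather than Categorical.
df = pd.DataFrame({"n": ["1", "2", "3", "4"]})
profile = ProfileReport(df, infer_dtypes=True)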

Contributing

Read more about getting involved in the Contribution Guide.

A low-threshold place to ask questions or start contributing is the pandas-profiling Slack. Join the Slack community.

Editor integration

PyCharm integration

  1. Install pandas-profiling via the instructions above
  2. Locate your pandas-profiling executable.
    • On macOS / Linux / BSD:
      $ which pandas_profiling
      (example) /usr/local/bin/pandas_profiling
    • On Windows:
      $ where pandas_profiling
      (example) C:\ProgramData\Anaconda3\Scripts\pandas_profiling.exe
  3. In PyCharm, go to Settings (or Preferences on macOS) > Tools > External tools
  4. Click the + icon to add a new external tool
  5. Insert the following values
    • Name: Pandas Profiling
    • Program: The location obtained in step 2
    • Arguments: "$FilePath$" "$FileDir$/$FileNameWithoutAllExtensions$_report.html"
    • Working Directory: $ProjectFileDir$

PyCharm Integration

To use the PyCharm Integration, right click on any dataset file:

External Tools > Pandas Profiling.

Other integrations

Other editor integrations may be contributed via pull requests.

Dependencies

The profile report is written in HTML and CSS, which means pandas-profiling requires a modern browser.

You need Python 3 to run this package. Other dependencies can be found in the requirements files:

Filename               Requirements
requirements.txt       Package requirements
requirements-dev.txt   Requirements for development
requirements-test.txt  Requirements for testing
setup.py               Requirements for widgets etc.
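
For development or testing setups, these requirement files can be installed with pip in the usual way, for example:

pip install -r requirements.txt -r requirements-dev.txt
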
Comments
  • pandas-profiling not compatible with pandas v1.0

    Describe the bug

    pandas-profiling not compatible with pandas v1.0. The key method "ProfileReport" returns error "TypeError: concat() got an unexpected keyword argument 'join_axes'" as join_axes is deprecated starting from Pandas v1.0. https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.0.0.html?highlight=concat

    To Reproduce

    import pandas as pd
    import pandas_profiling

    def test_issueXXX():
        df = pd.read_csv(r'')
        pf = pandas_profiling.ProfileReport(df)

    TypeError: concat() got an unexpected keyword argument 'join_axes'

    Version information:

    • Python version: 3.7.
    • Environment: Command Line and Pycharm
    • pandas-profiling: 1.4.1
    • pandas: 1.0
    bug πŸ› 
    opened by mantou16 27
  • AttributeError: 'DataFrame' object has no attribute 'profile_report'

    Describe the bug

    Running the example in the readme generates an error.

    To Reproduce

    Running:

    import numpy as np
    import pandas as pd
    import pandas_profiling
    
    df = pd.DataFrame(
        np.random.rand(100, 5),
        columns=['a', 'b', 'c', 'd', 'e']
    )
    df.profile_report()
    

    in a Jupyter notebook gives:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-16-f9a7584e785c> in <module>
    ----> 1 df.profile_report()
    
    ~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
       5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
       5066                 return self[name]
    -> 5067             return object.__getattribute__(self, name)
       5068 
       5069     def __setattr__(self, name, value):
    
    AttributeError: 'DataFrame' object has no attribute 'profile_report'
    

    Version information: alabaster==0.7.12 anaconda-client==1.7.2 anaconda-navigator==1.9.7 anaconda-project==0.8.2 asn1crypto==0.24.0 astroid==2.2.5 astropy==3.1.2 atomicwrites==1.3.0 attrs==19.1.0 Babel==2.6.0 backcall==0.1.0 backports.os==0.1.1 backports.shutil-get-terminal-size==1.0.0 beautifulsoup4==4.7.1 bitarray==0.8.3 bkcharts==0.2 bleach==3.1.0 bokeh==1.0.4 boto==2.49.0 Bottleneck==1.2.1 certifi==2019.3.9 cffi==1.12.2 chardet==3.0.4 Click==7.0 cloudpickle==0.8.0 clyent==1.2.2 colorama==0.4.1 conda==4.6.14 conda-build==3.17.8 conda-verify==3.1.1 confuse==1.0.0 contextlib2==0.5.5 cryptography==2.6.1 cycler==0.10.0 Cython==0.29.6 cytoolz==0.9.0.1 dask==1.1.4 decorator==4.4.0 defusedxml==0.5.0 distributed==1.26.0 docutils==0.14 entrypoints==0.3 et-xmlfile==1.0.1 fastcache==1.0.2 filelock==3.0.10 Flask==1.0.2 future==0.17.1 gevent==1.4.0 glob2==0.6 gmpy2==2.0.8 greenlet==0.4.15 h5py==2.9.0 heapdict==1.0.0 hpat==0.28.1 html5lib==1.0.1 htmlmin==0.1.12 idna==2.8 imageio==2.5.0 imagesize==1.1.0 importlib-metadata==0.0.0 ipykernel==5.1.0 ipyparallel==6.2.4 ipython==7.4.0 ipython-genutils==0.2.0 ipywidgets==7.4.2 isort==4.3.16 itsdangerous==1.1.0 jdcal==1.4 jedi==0.13.3 jeepney==0.4 Jinja2==2.10 jsonschema==3.0.1 jupyter==1.0.0 jupyter-client==5.2.4 jupyter-console==6.0.0 jupyter-core==4.4.0 jupyterlab==0.35.4 jupyterlab-server==0.2.0 keyring==18.0.0 kiwisolver==1.0.1 lazy-object-proxy==1.3.1 libarchive-c==2.8 lief==0.9.0 lightgbm==2.2.3 llvmlite==0.28.0 locket==0.2.0 lxml==4.3.2 MarkupSafe==1.1.1 matplotlib==3.0.3 mccabe==0.6.1 missingno==0.4.1 mistune==0.8.4 mkl-fft==1.0.10 mkl-random==1.0.2 mock==2.0.0 more-itertools==6.0.0 mpi4py==3.0.1 mpmath==1.1.0 msgpack==0.6.1 multipledispatch==0.6.0 navigator-updater==0.2.1 nbconvert==5.4.1 nbformat==4.4.0 networkx==2.2 nltk==3.4 nose==1.3.7 notebook==5.7.8 numba==0.43.1 numerapi==1.5.1 numerox==3.7.0 numexpr==2.6.9 numpy==1.16.2 numpydoc==0.8.0 olefile==0.46 openpyxl==2.6.1 packaging==19.0 pandas==0.24.2 pandas-profiling==1.4.1 pandocfilters==1.4.2 parso==0.3.4 partd==0.3.10 path.py==11.5.0 pathlib2==2.3.3 patsy==0.5.1 pbr==5.2.0 pep8==1.7.1 pexpect==4.6.0 phik==0.9.8 pickleshare==0.7.5 Pillow==5.4.1 pkginfo==1.5.0.1 plotly==3.8.1 pluggy==0.9.0 ply==3.11 prometheus-client==0.6.0 prompt-toolkit==2.0.9 psutil==5.6.1 ptyprocess==0.6.0 py==1.8.0 pyarrow==0.11.1 pycodestyle==2.5.0 pycosat==0.6.3 pycparser==2.19 pycrypto==2.6.1 pycurl==7.43.0.2 pyflakes==2.1.1 Pygments==2.3.1 pylint==2.3.1 pyodbc==4.0.26 pyOpenSSL==19.0.0 pyparsing==2.3.1 pyrsistent==0.14.11 PySocks==1.6.8 pytest==4.3.1 pytest-arraydiff==0.3 pytest-astropy==0.5.0 pytest-doctestplus==0.3.0 pytest-openfiles==0.3.2 pytest-pylint==0.14.0 pytest-remotedata==0.3.1 python-dateutil==2.8.0 python-igraph==0.7.1.post6 pytz==2018.9 PyWavelets==1.0.2 PyYAML==5.1 pyzmq==18.0.0 QtAwesome==0.5.7 qtconsole==4.4.3 QtPy==1.7.0 requests==2.21.0 retrying==1.3.3 rope==0.12.0 ruamel-yaml==0.15.46 scikit-image==0.14.2 scikit-learn==0.20.3 scipy==1.2.1 seaborn==0.9.0 SecretStorage==3.1.1 Send2Trash==1.5.0 simplegeneric==0.8.1 singledispatch==3.4.0.3 six==1.12.0 snowballstemmer==1.2.1 sortedcollections==1.1.2 sortedcontainers==2.1.0 soupsieve==1.8 Sphinx==1.8.5 sphinxcontrib-websupport==1.1.0 spyder==3.3.3 spyder-kernels==0.4.2 SQLAlchemy==1.3.1 statsmodels==0.9.0 sympy==1.3 tables==3.5.1 tblib==1.3.2 terminado==0.8.1 testpath==0.4.2 toolz==0.9.0 tornado==6.0.2 tqdm==4.31.1 traitlets==4.3.2 typed-ast==1.4.0 unicodecsv==0.14.1 urllib3==1.24.1 wcwidth==0.1.7 webencodings==0.5.1 Werkzeug==0.14.1 
widgetsnbextension==3.4.2 wrapt==1.11.1 wurlitzer==1.0.2 xlrd==1.2.0 XlsxWriter==1.1.5 xlwt==1.3.0 zict==0.1.4 zipp==0.3.3

    bug πŸ› 
    opened by bdch1234 22
  • Plotting a response variable on the histograms

    Hey,

    Great job with pandas-profiling, I love it. I think it would be great to have an extra parameter to specify a response column. Plotting the average response for every bin of the histograms (for each variable) would make obvious trends/correlations visible and would be useful for any regression problem (it might be more tricky for classification, where the responses are discrete). What do you think?

    Thanks!

    feature request πŸ’¬ 
    opened by Optimox 17
  • feat: added filter to locate columns

    This is a follow-up PR to the PR made earlier (#1096). Closes #638. I have changed the input from a text field to a dropdown as per @fabclmnt's suggestion.

    Here's how it looks and works now:

    https://user-images.githubusercontent.com/57868024/194428807-a7642deb-6ba5-4404-95ef-3e9605ba10cd.mp4

    The dropdown isn't visible due to restrictions of the screen recorder; here's an image of it in action for reference.

    image

    P.S. I'm sorry for the hassle in the previous PR, I haven't worked with git very much. Thank you for your patience.

    opened by g-kabra 16
  • Potential incompatibility with Pandas 1.4.0

    Describe the bug

    Pandas version 1.4.0 was released a few days ago and some tests started failing. I was able to reproduce with a minimal example which fails with Pandas 1.4.0 and works with Pandas 1.3.5.

    To Reproduce

    import pandas as pd
    import pandas_profiling
    
    data = {"col1": [1, 2], "col2": [3, 4]}
    dataframe = pd.DataFrame(data=data)
    
    profile = pandas_profiling.ProfileReport(dataframe, minimal=False)
    profile.to_html()
    

    When running with Pandas 1.4.0, I get the following traceback:

    Traceback (most recent call last):
      File "/tmp/bug.py", line 8, in <module>
        profile.to_html()
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 368, in to_html
        return self.html
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 185, in html
        self._html = self._render_html()
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 287, in _render_html
        report = self.report
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 179, in report
        self._report = get_report_structure(self.config, self.description_set)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/profile_report.py", line 161, in description_set
        self._description_set = describe_df(
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/describe.py", line 71, in describe
        series_description = get_series_descriptions(
      File "/vemv/lib/python3.9/site-packages/multimethod/__init__.py", line 303, in __call__
        return func(*args, **kwargs)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/summary_pandas.py", line 92, in pandas_get_series_descriptions
        for i, (column, description) in enumerate(
      File "/home/lothiraldan/.pyenv/versions/3.9.1/lib/python3.9/multiprocessing/pool.py", line 870, in next
        raise value
      File "/home/lothiraldan/.pyenv/versions/3.9.1/lib/python3.9/multiprocessing/pool.py", line 125, in worker
        result = (True, func(*args, **kwds))
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/summary_pandas.py", line 72, in multiprocess_1d
        return column, describe_1d(config, series, summarizer, typeset)
      File "/vemv/lib/python3.9/site-packages/multimethod/__init__.py", line 303, in __call__
        return func(*args, **kwargs)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/summary_pandas.py", line 50, in pandas_describe_1d
        return summarizer.summarize(config, series, dtype=vtype)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/summarizer.py", line 37, in summarize
        _, _, summary = self.handle(str(dtype), config, series, {"type": str(dtype)})
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 62, in handle
        return op(*args)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 21, in func2
        return f(*res)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 21, in func2
        return f(*res)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 21, in func2
        return f(*res)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/handler.py", line 17, in func2
        res = g(*x)
      File "/vemv/lib/python3.9/site-packages/multimethod/__init__.py", line 303, in __call__
        return func(*args, **kwargs)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/summary_algorithms.py", line 65, in inner
        return fn(config, series, summary)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/summary_algorithms.py", line 82, in inner
        return fn(config, series, summary)
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/describe_categorical_pandas.py", line 205, in pandas_describe_categorical_1d
        summary.update(length_summary_vc(value_counts))
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/describe_categorical_pandas.py", line 162, in length_summary_vc
        "median_length": weighted_median(
      File "/vemv/lib/python3.9/site-packages/pandas_profiling/model/pandas/utils_pandas.py", line 13, in weighted_median
        w_median = (data[weights == np.max(weights)])[0]
    IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
    

    If I change minimal from False to True, the script passes.

    Version information:

    Failing environment

    Python version: Python 3.9.1
    Pip version: pip 21.3.1
    Pandas and pandas-profiling versions: 1.4.0 | 3.1.0
    Full pip list:

    Package               Version
    --------------------- ---------
    attrs                 21.4.0
    certifi               2021.10.8
    charset-normalizer    2.0.10
    cycler                0.11.0
    fonttools             4.28.5
    htmlmin               0.1.12
    idna                  3.3
    ImageHash             4.2.1
    Jinja2                3.0.3
    joblib                1.0.1
    kiwisolver            1.3.2
    MarkupSafe            2.0.1
    matplotlib            3.5.1
    missingno             0.5.0
    multimethod           1.6
    networkx              2.6.3
    numpy                 1.22.1
    packaging             21.3
    pandas                1.4.0
    pandas-profiling      3.1.0
    phik                  0.12.0
    Pillow                9.0.0
    pip                   21.3.1
    pydantic              1.9.0
    pyparsing             3.0.7
    python-dateutil       2.8.2
    pytz                  2021.3
    PyWavelets            1.2.0
    PyYAML                6.0
    requests              2.27.1
    scipy                 1.7.3
    seaborn               0.11.2
    setuptools            60.0.5
    six                   1.16.0
    tangled-up-in-unicode 0.1.0
    tqdm                  4.62.3
    typing_extensions     4.0.1
    urllib3               1.26.8
    visions               0.7.4
    wheel                 0.37.1
    

    Working environment

    Python version: Python 3.9.1
    Pip version: pip 21.3.1
    Pandas and pandas-profiling versions: 1.3.5 | 3.1.0
    Full pip list:

    Package               Version
    --------------------- ---------
    attrs                 21.4.0
    certifi               2021.10.8
    charset-normalizer    2.0.10
    cycler                0.11.0
    fonttools             4.28.5
    htmlmin               0.1.12
    idna                  3.3
    ImageHash             4.2.1
    Jinja2                3.0.3
    joblib                1.0.1
    kiwisolver            1.3.2
    MarkupSafe            2.0.1
    matplotlib            3.5.1
    missingno             0.5.0
    multimethod           1.6
    networkx              2.6.3
    numpy                 1.22.1
    packaging             21.3
    pandas                1.3.5
    pandas-profiling      3.1.0
    phik                  0.12.0
    Pillow                9.0.0
    pip                   21.3.1
    pydantic              1.9.0
    pyparsing             3.0.7
    python-dateutil       2.8.2
    pytz                  2021.3
    PyWavelets            1.2.0
    PyYAML                6.0
    requests              2.27.1
    scipy                 1.7.3
    seaborn               0.11.2
    setuptools            60.0.5
    six                   1.16.0
    tangled-up-in-unicode 0.1.0
    tqdm                  4.62.3
    typing_extensions     4.0.1
    urllib3               1.26.8
    visions               0.7.4
    wheel                 0.37.1
    

    Let me know if I can provide more details and thank you for your good work!

    bug πŸ› 
    opened by Lothiraldan 15
  • TypeError: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.

        stats['range'] = stats['max'] - stats['min']
    TypeError: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
    

    I got this error

    bug πŸ› information requested ❔ help wanted πŸ™‹ 
    opened by eyadsibai 15
  • 2.10.0 - TraitError: The 'value' trait of a HTML instance must be a unicode string...

    Describe the bug

    Hi there - Looks like the latest release (2.10.0) has broken the to_widgets functionality as outlined in the Getting started section of the docs. Confirmed rolling back to 2.9.0 does not produce the issue.

    To Reproduce

    # pandas_profiling==2.10.0
    import numpy as np
    import pandas as pd
    from pandas_profiling import ProfileReport
    
    df = pd.DataFrame(
        np.random.rand(100, 5),
        columns=["a", "b", "c", "d", "e"]
    )
    
    profile = ProfileReport(df, title='Pandas Profiling Report', explorative=True)
    
    profile.to_widgets()
    
    

    Returns:

    TraitError: The 'value' trait of a HTML instance must be a unicode string, but a value of Numeric <class 'visions.types.type.VisionsBaseTypeMeta'> was specified.
    

    Version information: 2.10.0

    opened by rynmccrmck 14
  • ZeroDivisionError when using version 1.4.1

    There was a change in behavior between versions 1.4.0 and 1.4.1 where some calls to ProfileReport that previously succeeded will now raise a ZeroDivisionError.

    An example reproduction is to take the following code and run it in a Jupyter notebook cell:

    import pandas
    import pandas_profiling
    
    import IPython
    
    df = pandas.DataFrame({'c': 'v'}, index=['c'])
    report = pandas_profiling.ProfileReport(df)
    IPython.core.display.HTML(report.html)
    

    With version 1.4.0 this produced an HTML report, but with version 1.4.1 it produces the following stack trace:

    ZeroDivisionErrorTraceback (most recent call last)
    <ipython-input-2-ffb5392b4284> in <module>()
          5 
          6 df = pandas.DataFrame({'c': 'v'}, index=['c'])
    ----> 7 report = pandas_profiling.ProfileReport(df)
          8 IPython.core.display.HTML(report.html)
    
    /usr/local/lib/python2.7/dist-packages/pandas_profiling/__init__.pyc in __init__(self, df, **kwargs)
         67 
         68         self.html = to_html(sample,
    ---> 69                             description_set)
         70 
         71         self.description_set = description_set
    
    /usr/local/lib/python2.7/dist-packages/pandas_profiling/report.pyc in to_html(sample, stats_object)
        192 
        193     # Add plot of matrix correlation
    --> 194     pearson_matrix = plot.correlation_matrix(stats_object['correlations']['pearson'], 'Pearson')
        195     spearman_matrix = plot.correlation_matrix(stats_object['correlations']['spearman'], 'Spearman')
        196     correlations_html = templates.template('correlations').render(
    
    /usr/local/lib/python2.7/dist-packages/pandas_profiling/plot.pyc in correlation_matrix(corrdf, title, **kwargs)
        134     plt.title(title, size=18)
        135     plt.colorbar(matrix_image)
    --> 136     axes_cor.set_xticks(np.arange(0, corrdf.shape[0], corrdf.shape[0] * 1.0 / len(labels)))
        137     axes_cor.set_yticks(np.arange(0, corrdf.shape[1], corrdf.shape[1] * 1.0 / len(labels)))
        138     axes_cor.set_xticklabels(labels, rotation=90)
    
    ZeroDivisionError: float division by zero
    
    opened by ojarjur 14
  • pandas_profiling.utils.cache

    ModuleNotFoundError: No module named 'pandas_profiling.utils'

    information requested ❔ 
    opened by ajaimes07 13
  • This call to matplotlib.use() has no effect because the backend has already

    /home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/pandas_profiling/base.py:20: UserWarning: This call to matplotlib.use() has no effect because the backend has already been chosen; matplotlib.use() must be called before pylab, matplotlib.pyplot, or matplotlib.backends is imported for the first time.

    The backend was originally set to 'module://ipykernel.pylab.backend_inline' by the following code: File "/home/flash1/work/software/python/anaconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main "main", fname, loader, pkg_name) File "/home/flash1/work/software/python/anaconda2/lib/python2.7/runpy.py", line 72, in _run_code exec code in run_globals File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel_launcher.py", line 16, in app.launch_new_instance() File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/traitlets/config/application.py", line 658, in launch_instance app.start() File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 477, in start ioloop.IOLoop.instance().start() File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 177, in start super(ZMQIOLoop, self).start() File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/tornado/ioloop.py", line 888, in start handler_func(fd_obj, events) File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 277, in null_wrapper return fn(*args, **kwargs) File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events self._handle_recv() File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv self._run_callback(callback, msg) File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback callback(*args, **kwargs) File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 277, in null_wrapper return fn(*args, **kwargs) File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher return self.dispatch_shell(stream, msg) File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell handler(stream, idents, msg) File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 399, in execute_request user_expressions, allow_stdin) File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/ipkernel.py", line 196, in do_execute res = shell.run_cell(code, store_history=store_history, silent=silent) File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/ipykernel/zmqshell.py", line 533, in run_cell return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs) File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2718, in run_cell interactivity=interactivity, compiler=compiler, result=result) File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2822, in run_ast_nodes if self.run_code(code, result): File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2882, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "", line 8, in import matplotlib.pyplot as plt File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/matplotlib/pyplot.py", line 69, in from matplotlib.backends import 
pylab_setup File "/home/flash1/work/software/python/anaconda2/lib/python2.7/site-packages/matplotlib/backends/init.py", line 14, in line for line in traceback.format_stack()

    matplotlib.use('Agg')

    opened by iweey 13
  • IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

    Describe the bug

    Running the example below gives this error: IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

    latest version on conda-forge

    To Reproduce

    wine.csv

    import numpy as np
    import pandas as pd
    from pandas_profiling import ProfileReport
    
    df = pd.read_csv("wine.csv")
    
    profile = ProfileReport(df, title="Pandas Profiling Report")
    
    profile.to_file("tmp.html")
    

    Version information:

    • Python 3.9
    • pandas-profiling 3.1.0 pyhd8ed1ab_0 conda-forge
    • pandas 1.4.2 py39h1832856_1 conda-forge
    bug πŸ› 
    opened by darenr 12
  • chore(deps): update dependency scipy to >=1.10, <1.11

    Mend Renovate

    This PR contains the following updates:

    | Package | Change |
    |---|---|
    | scipy (source) | >=1.4.1, <1.10 -> >=1.10, <1.11 |


    Release Notes

    scipy/scipy

    v1.10.0: SciPy 1.10.0

    Compare Source

    SciPy 1.10.0 Release Notes

    SciPy 1.10.0 is the culmination of 6 months of hard work. It contains many new features, numerous bug-fixes, improved test coverage and better documentation. There have been a number of deprecations and API changes in this release, which are documented below. All users are encouraged to upgrade to this release, as there are a large number of bug-fixes and optimizations. Before upgrading, we recommend that users check that their own code does not use deprecated SciPy functionality (to do so, run your code with python -Wd and check for DeprecationWarnings). Our development attention will now shift to bug-fix releases on the 1.10.x branch, and on adding new features on the main branch.

    This release requires Python 3.8+ and NumPy 1.19.5 or greater.

    For running on PyPy, PyPy3 6.0+ is required.

    Highlights of this release

    • A new dedicated datasets submodule (scipy.datasets) has been added, and is now preferred over usage of scipy.misc for dataset retrieval.
    • A new scipy.interpolate.make_smoothing_spline function was added. This function constructs a smoothing cubic spline from noisy data, using the generalized cross-validation (GCV) criterion to find the tradeoff between smoothness and proximity to data points.
    • scipy.stats has three new distributions, two new hypothesis tests, three new sample statistics, a class for greater control over calculations involving covariance matrices, and many other enhancements.

    New features

    scipy.datasets introduction

    • A new dedicated datasets submodule has been added. The submodule is meant for datasets that are relevant to other SciPy submodules and their content (tutorials, examples, tests), as well as to contain a curated set of datasets that are of wider interest. As of this release, all the datasets from scipy.misc have been added to scipy.datasets (and deprecated in scipy.misc).

    • The submodule is based on Pooch (a new optional dependency for SciPy), a Python package to simplify fetching data files. This move will, in a subsequent release, facilitate SciPy to trim down the sdist/wheel sizes, by decoupling the data files and moving them out of the SciPy repository, hosting them externally and downloading them when requested. After downloading the datasets once, the files are cached to avoid network dependence and repeated usage.

    • Added datasets from scipy.misc: scipy.datasets.face, scipy.datasets.ascent, scipy.datasets.electrocardiogram

    • Added download and caching functionality:

      • scipy.datasets.download_all: a function to download all the scipy.datasets associated files at once.
      • scipy.datasets.clear_cache: a simple utility function to clear cached dataset files from the file system.
      • scipy/datasets/_download_all.py can be run as a standalone script for packaging purposes to avoid any external dependency at build or test time. This can be used by SciPy packagers (e.g., for Linux distros) which may have to adhere to rules that forbid downloading sources from external repositories at package build time.

    scipy.integrate improvements

    • Added parameter complex_func to scipy.integrate.quad, which can be set True to integrate a complex integrand.

    scipy.interpolate improvements

    • scipy.interpolate.interpn now supports tensor-product interpolation methods (slinear, cubic, quintic and pchip)
    • Tensor-product interpolation methods (slinear, cubic, quintic and pchip) in scipy.interpolate.interpn and scipy.interpolate.RegularGridInterpolator now allow values with trailing dimensions.
    • scipy.interpolate.RegularGridInterpolator has a new fast path for method="linear" with 2D data, and RegularGridInterpolator is now easier to subclass
    • scipy.interpolate.interp1d now can take a single value for non-spline methods.
    • A new extrapolate argument is available to scipy.interpolate.BSpline.design_matrix, allowing extrapolation based on the first and last intervals.
    • A new function scipy.interpolate.make_smoothing_spline has been added. It is an implementation of the generalized cross-validation spline smoothing algorithm. The lam=None (default) mode of this function is a clean-room reimplementation of the classic gcvspl.f Fortran algorithm for constructing GCV splines.
    • A new method="pchip" mode was added to scipy.interpolate.RegularGridInterpolator. This mode constructs an interpolator using tensor products of C1-continuous monotone splines (essentially, a scipy.interpolate.PchipInterpolator instance per dimension).

    scipy.sparse.linalg improvements

    • The spectral 2-norm is now available in scipy.sparse.linalg.norm.

    • The performance of scipy.sparse.linalg.norm for the default case (Frobenius norm) has been improved.

    • LAPACK wrappers were added for trexc and trsen.

    • The scipy.sparse.linalg.lobpcg algorithm was rewritten, yielding the following improvements:

      • a simple tunable restart potentially increases the attainable accuracy for edge cases,
      • internal postprocessing runs one final exact Rayleigh-Ritz method giving more accurate and orthonormal eigenvectors,
      • output the computed iterate with the smallest max norm of the residual and drop the history of subsequent iterations,
      • remove the check for LinearOperator format input and thus allow a simple function handle of a callable object as an input,
      • better handling of common user errors with input data, rather than letting the algorithm fail.

    scipy.linalg improvements

    • scipy.linalg.lu_factor now accepts rectangular arrays instead of being restricted to square arrays.

    scipy.ndimage improvements

    • The new scipy.ndimage.value_indices function provides a time-efficient method to search for the locations of individual values with an array of image data.
    • A new radius argument is supported by scipy.ndimage.gaussian_filter1d and scipy.ndimage.gaussian_filter for adjusting the kernel size of the filter.

    scipy.optimize improvements

    • scipy.optimize.brute now coerces non-iterable/single-value args into a tuple.
    • scipy.optimize.least_squares and scipy.optimize.curve_fit now accept scipy.optimize.Bounds for bounds constraints.
    • Added a tutorial for scipy.optimize.milp.
    • Improved the pretty-printing of scipy.optimize.OptimizeResult objects.
    • Additional options (parallel, threads, mip_rel_gap) can now be passed to scipy.optimize.linprog with method='highs'.

    scipy.signal improvements

    • The new window function scipy.signal.windows.lanczos was added to compute a Lanczos window, also known as a sinc window.

    scipy.sparse.csgraph improvements

    • the performance of scipy.sparse.csgraph.dijkstra has been improved, and star graphs in particular see a marked performance improvement

    scipy.special improvements

    • The new function scipy.special.powm1, a ufunc with signature powm1(x, y), computes x**y - 1. The function avoids the loss of precision that can result when y is close to 0 or when x is close to 1.
    • scipy.special.erfinv is now more accurate as it leverages the Boost equivalent under the hood.

    scipy.stats improvements

    • Added scipy.stats.goodness_of_fit, a generalized goodness-of-fit test for use with any univariate distribution, any combination of known and unknown parameters, and several choices of test statistic (Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling).

    • Improved scipy.stats.bootstrap: Default method 'BCa' now supports multi-sample statistics. Also, the bootstrap distribution is returned in the result object, and the result object can be passed into the function as parameter bootstrap_result to add additional resamples or change the confidence interval level and type.

    • Added maximum spacing estimation to scipy.stats.fit.

    • Added the Poisson means test ("E-test") as scipy.stats.poisson_means_test.

    • Added new sample statistics.

      • Added scipy.stats.contingency.odds_ratio to compute both the conditional and unconditional odds ratios and corresponding confidence intervals for 2x2 contingency tables.
      • Added scipy.stats.directional_stats to compute sample statistics of n-dimensional directional data.
      • Added scipy.stats.expectile, which generalizes the expected value in the same way as quantiles are a generalization of the median.
    • Added new statistical distributions.

      • Added scipy.stats.uniform_direction, a multivariate distribution to sample uniformly from the surface of a hypersphere.
      • Added scipy.stats.random_table, a multivariate distribution to sample uniformly from m x n contingency tables with provided marginals.
      • Added scipy.stats.truncpareto, the truncated Pareto distribution.
    • Improved the fit method of several distributions.

      • scipy.stats.skewnorm and scipy.stats.weibull_min now use an analytical solution when method='mm', which also serves a starting guess to improve the performance of method='mle'.
      • scipy.stats.gumbel_r and scipy.stats.gumbel_l: analytical maximum likelihood estimates have been extended to the cases in which location or scale are fixed by the user.
      • Analytical maximum likelihood estimates have been added for scipy.stats.powerlaw.
    • Improved random variate sampling of several distributions.

      • Drawing multiple samples from scipy.stats.matrix_normal, scipy.stats.ortho_group, scipy.stats.special_ortho_group, and scipy.stats.unitary_group is faster.
      • The rvs method of scipy.stats.vonmises now wraps to the interval [-np.pi, np.pi].
      • Improved the reliability of scipy.stats.loggamma rvs method for small values of the shape parameter.
    • Improved the speed and/or accuracy of functions of several statistical distributions.

      • Added scipy.stats.Covariance for better speed, accuracy, and user control in multivariate normal calculations.
      • scipy.stats.skewnorm methods cdf, sf, ppf, and isf methods now use the implementations from Boost, improving speed while maintaining accuracy. The calculation of higher-order moments is also faster and more accurate.
      • scipy.stats.invgauss methods ppf and isf methods now use the implementations from Boost, improving speed and accuracy.
      • scipy.stats.invweibull methods sf and isf are more accurate for small probability masses.
      • scipy.stats.nct and scipy.stats.ncx2 now rely on the implementations from Boost, improving speed and accuracy.
      • Implemented the logpdf method of scipy.stats.vonmises for reliability in extreme tails.
      • Implemented the isf method of scipy.stats.levy for speed and accuracy.
      • Improved the robustness of scipy.stats.studentized_range for large df by adding an infinite degree-of-freedom approximation.
      • Added a parameter lower_limit to scipy.stats.multivariate_normal, allowing the user to change the integration limit from -inf to a desired value.
      • Improved the robustness of entropy of scipy.stats.vonmises for large concentration values.
    • Enhanced scipy.stats.gaussian_kde.

      • Added scipy.stats.gaussian_kde.marginal, which returns the desired marginal distribution of the original kernel density estimate distribution.
      • The cdf method of scipy.stats.gaussian_kde now accepts a lower_limit parameter for integrating the PDF over a rectangular region.
      • Moved calculations for scipy.stats.gaussian_kde.logpdf to Cython, improving speed.
      • The global interpreter lock is released by the pdf method of scipy.stats.gaussian_kde for improved multithreading performance.
      • Replaced explicit matrix inversion with Cholesky decomposition for speed and accuracy.
    • Enhanced the result objects returned by many scipy.stats functions

      • Added a confidence_interval method to the result object returned by scipy.stats.ttest_1samp and scipy.stats.ttest_rel.
      • The scipy.stats functions combine_pvalues, fisher_exact, chi2_contingency, median_test and mood now return bunch objects rather than plain tuples, allowing attributes to be accessed by name.
      • Attributes of the result objects returned by multiscale_graphcorr, anderson_ksamp, binomtest, crosstab, pointbiserialr, spearmanr, kendalltau, and weightedtau have been renamed to statistic and pvalue for consistency throughout scipy.stats. Old attribute names are still allowed for backward compatibility.
      • scipy.stats.anderson now returns the parameters of the fitted distribution in a scipy.stats._result_classes.FitResult object.
      • The plot method of scipy.stats._result_classes.FitResult now accepts a plot_type parameter; the options are 'hist' (histogram, default), 'qq' (Q-Q plot), 'pp' (P-P plot), and 'cdf' (empirical CDF plot).
      • Kolmogorov-Smirnov tests (e.g. scipy.stats.kstest) now return the location (argmax) at which the statistic is calculated and the variant of the statistic used.
    • Improved the performance of several scipy.stats functions.

      • Improved the performance of scipy.stats.cramervonmises_2samp and scipy.stats.ks_2samp with method='exact'.
      • Improved the performance of scipy.stats.siegelslopes.
      • Improved the performance of scipy.stats.mstats.hdquantile_sd.
      • Improved the performance of scipy.stats.binned_statistic_dd for several NumPy statistics, and binned statistics methods now support complex data.
    • Added the scramble optional argument to scipy.stats.qmc.LatinHypercube. It replaces centered, which is now deprecated.

    • Added a parameter optimization to all scipy.stats.qmc.QMCEngine subclasses to improve characteristics of the quasi-random variates.

    • Added tie correction to scipy.stats.mood.

    • Added tutorials for resampling methods in scipy.stats.

    • scipy.stats.bootstrap, scipy.stats.permutation_test, and scipy.stats.monte_carlo_test now automatically detect whether the provided statistic is vectorized, so passing the vectorized argument explicitly is no longer required to take advantage of vectorized statistics.

    • Improved the speed of scipy.stats.permutation_test for permutation types 'samples' and 'pairings'.

    • Added axis, nan_policy, and masked array support to scipy.stats.jarque_bera.

    • Added the nan_policy optional argument to scipy.stats.rankdata.

    Deprecated features

    • scipy.misc module and all the methods in misc are deprecated in v1.10 and will be completely removed in SciPy v2.0.0. Users are suggested to utilize the scipy.datasets module instead for the dataset methods.
    • scipy.stats.qmc.LatinHypercube parameter centered has been deprecated. It is replaced by the scramble argument for more consistency with other QMC engines.
    • scipy.interpolate.interp2d class has been deprecated. The docstring of the deprecated routine lists recommended replacements.

    Expired Deprecations

    • There is an ongoing effort to follow through on long-standing deprecations.

    • The following previously deprecated features are affected:

      • Removed cond & rcond kwargs in linalg.pinv
      • Removed wrappers scipy.linalg.blas.{clapack, flapack}
      • Removed scipy.stats.NumericalInverseHermite and removed tol & max_intervals kwargs from scipy.stats.sampling.NumericalInverseHermite
      • Removed local_search_options kwarg from scipy.optimize.dual_annealing.

    Other changes

    • scipy.stats.bootstrap, scipy.stats.permutation_test, and scipy.stats.monte_carlo_test now automatically detect whether the provided statistic is vectorized by looking for an axis parameter in the signature of statistic. If an axis parameter is present in statistic but should not be relied on for vectorized calls, users must pass option vectorized==False explicitly.
    • scipy.stats.multivariate_normal will now raise a ValueError when the covariance matrix is not positive semidefinite, regardless of which method is called.

    Authors

    • Name (commits)
    • h-vetinari (10)
    • Jelle Aalbers (1)
    • Oriol Abril-Pla (1) +
    • Alan-Hung (1) +
    • Tania Allard (7)
    • Oren Amsalem (1) +
    • Sven Baars (10)
    • Balthasar (1) +
    • Ross Barnowski (1)
    • Christoph Baumgarten (2)
    • Peter Bell (2)
    • Sebastian Berg (1)
    • Aaron Berk (1) +
    • boatwrong (1) +
    • boeleman (1) +
    • Jake Bowhay (50)
    • Matthew Brett (4)
    • Evgeni Burovski (93)
    • Matthias Bussonnier (6)
    • Dominic C (2)
    • Mingbo Cai (1) +
    • James Campbell (2) +
    • CJ Carey (4)
    • cesaregarza (1) +
    • charlie0389 (1) +
    • Hood Chatham (5)
    • Andrew Chin (1) +
    • Daniel Ching (1) +
    • Leo Chow (1) +
    • chris (3) +
    • John Clow (1) +
    • cm7S (1) +
    • cmgodwin (1) +
    • Christopher Cowden (2) +
    • Henry Cuzco (2) +
    • Anirudh Dagar (12)
    • Hans Dembinski (2) +
    • Jaiden di Lanzo (24) +
    • Felipe Dias (1) +
    • Dieter WerthmΓΌller (1)
    • Giuseppe Dilillo (1) +
    • dpoerio (1) +
    • drpeteb (1) +
    • Christopher Dupuis (1) +
    • Jordan Edmunds (1) +
    • Pieter Eendebak (1) +
    • JΓ©rome Eertmans (1) +
    • Fabian Egli (2) +
    • Sebastian Ehlert (2) +
    • Kian Eliasi (1) +
    • Tomohiro Endo (1) +
    • Stefan Endres (1)
    • Zeb Engberg (4) +
    • Jonas Eschle (1) +
    • Thomas J. Fan (9)
    • fiveseven (1) +
    • Neil Flood (1) +
    • Franz Forstmayr (1)
    • Sara Fridovich-Keil (1)
    • David Gilbertson (1) +
    • Ralf Gommers (251)
    • Marco Gorelli (2) +
    • Matt Haberland (387)
    • Andrew Hawryluk (2) +
    • Christoph Hohnerlein (2) +
    • LoΓ―c Houpert (2) +
    • Shamus Husheer (1) +
    • ideasrule (1) +
    • imoiwm (1) +
    • Lakshaya Inani (1) +
    • Joseph T. Iosue (1)
    • iwbc-mzk (1) +
    • Nathan Jacobi (3) +
    • Julien Jerphanion (5)
    • He Jia (1)
    • jmkuebler (1) +
    • Johannes MΓΌller (1) +
    • Vedant Jolly (1) +
    • Juan Luis Cano RodrΓ­guez (2)
    • Justin (1) +
    • jvavrek (1) +
    • jyuv (2)
    • Kai MΓΌhlbauer (1) +
    • Nikita Karetnikov (3) +
    • Reinert Huseby Karlsen (1) +
    • kaspar (2) +
    • Toshiki Kataoka (1)
    • Robert Kern (3)
    • Joshua Klein (1) +
    • Andrew Knyazev (7)
    • Jozsef Kutas (16) +
    • Eric Larson (4)
    • Lechnio (1) +
    • Antony Lee (2)
    • Aditya Limaye (1) +
    • Xingyu Liu (2)
    • Christian Lorentzen (4)
    • LoΓ―c EstΓ¨ve (2)
    • Thibaut Lunet (2) +
    • Peter Lysakovski (1)
    • marianasalamoni (2) +
    • mariprudencio (1) +
    • Paige Martin (1) +
    • Arno Marty (1) +
    • matthewborish (3) +
    • Damon McDougall (1)
    • Nicholas McKibben (22)
    • McLP (1) +
    • mdmahendri (1) +
    • Melissa Weber MendonΓ§a (9)
    • Jarrod Millman (1)
    • Naoto Mizuno (2)
    • Shashaank N (1)
    • Pablo S Naharro (1) +
    • nboudrie (2) +
    • Andrew Nelson (52)
    • Nico SchlΓΆmer (1)
    • NiMlr (1) +
    • o-alexandre-felipe (1) +
    • Maureen Ononiwu (1) +
    • Dimitri Papadopoulos (2) +
    • partev (1) +
    • Tirth Patel (10)
    • Paulius Ε arka (1) +
    • Josef Perktold (1)
    • Giacomo Petrillo (3) +
    • Matti Picus (1)
    • Rafael Pinto (1) +
    • PKNaveen (1) +
    • Ilhan Polat (6)
    • Akshita Prasanth (2) +
    • Sean Quinn (1)
    • Tyler Reddy (155)
    • Martin Reinecke (1)
    • Ned Richards (1)
    • Marie Roald (1) +
    • Sam Rosen (4) +
    • Pamphile Roy (105)
    • sabonerune (2) +
    • Atsushi Sakai (94)
    • Daniel Schmitz (27)
    • Anna Scholtz (1) +
    • Eli Schwartz (11)
    • serge-sans-paille (2)
    • JEEVANSHI SHARMA (1) +
    • ehsan shirvanian (2) +
    • siddhantwahal (2)
    • Mathieu Dutour Sikiric (1) +
    • Sourav Singh (1)
    • Alexander Soare (1) +
    • BjΓΈrge Solli (2) +
    • Scott Staniewicz (1)
    • Ethan Steinberg (3) +
    • Albert Steppi (3)
    • Thomas Stoeger (1) +
    • Kai Striega (4)
    • Tartopohm (1) +
    • Mamoru TASAKA (2) +
    • Ewout ter Hoeven (5)
    • TianyiQ (1) +
    • Tiger (1) +
    • Will Tirone (1)
    • Ajay Shanker Tripathi (1) +
    • Edgar AndrΓ©s Margffoy Tuay (1) +
    • Dmitry Ulyumdzhiev (1) +
    • Hari Vamsi (1) +
    • VitalyChait (1) +
    • Rik Voorhaar (1) +
    • Samuel Wallan (4)
    • Stefan van der Walt (2)
    • Warren Weckesser (145)
    • wei2222 (1) +
    • windows-server-2003 (3) +
    • Marek Wojciechowski (2) +
    • Niels Wouda (1) +
    • WRKampi (1) +
    • Yeonjoo Yoo (1) +
    • Rory Yorke (1)
    • Xiao Yuan (2) +
    • Meekail Zain (2) +
    • Fabio Zanini (1) +
    • Steffen Zeile (1) +
    • Egor Zemlyanoy (19)
    • Gavin Zhang (3) +

    A total of 184 people contributed to this release. People with a "+" by their names contributed a patch for the first time. This list of names is automatically generated, and may not be fully complete.


    Configuration

    πŸ“… Schedule: Branch creation - "after 9am and before 1pm every weekday" (UTC), Automerge - At any time (no schedule defined).

    🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

    β™» Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

    πŸ”• Ignore: Close this PR and you won't be reminded about this update again.


    • [ ] If you want to rebase/retry this PR, check this box

    This PR has been generated by Mend Renovate. View repository job log here.

    opened by renovate[bot] 1
  • chore(deps): update dependency numpy to >=1.24.1,<1.25

    Mend Renovate

    This PR contains the following updates:

    | Package | Change |
    |---|---|
    | numpy (source) | >=1.16.0,<1.24 -> >=1.24.1,<1.25 |


    Release Notes

    numpy/numpy

    v1.24.1

    Compare Source

    NumPy 1.24.1 Release Notes

    NumPy 1.24.1 is a maintenance release that fixes bugs and regressions discovered after the 1.24.0 release. The Python versions supported by this release are 3.8-3.11.

    Contributors

    A total of 12 people contributed to this release. People with a "+" by their names contributed a patch for the first time.

    • Andrew Nelson
    • Ben Greiner +
    • Charles Harris
    • ClΓ©ment Robert
    • Matteo Raso
    • Matti Picus
    • Melissa Weber MendonΓ§a
    • Miles Cranmer
    • Ralf Gommers
    • Rohit Goswami
    • Sayed Adel
    • Sebastian Berg

    Pull requests merged

    A total of 18 pull requests were merged for this release.

    • #​22820: BLD: add workaround in setup.py for newer setuptools
    • #​22830: BLD: CIRRUS_TAG redux
    • #​22831: DOC: fix a couple typos in 1.23 notes
    • #​22832: BUG: Fix refcounting errors found using pytest-leaks
    • #​22834: BUG, SIMD: Fix invalid value encountered in several ufuncs
    • #​22837: TST: ignore more np.distutils.log imports
    • #​22839: BUG: Do not use getdata() in np.ma.masked_invalid
    • #​22847: BUG: Ensure correct behavior for rows ending in delimiter in...
    • #​22848: BUG, SIMD: Fix the bitmask of the boolean comparison
    • #​22857: BLD: Help raspian arm + clang 13 about __builtin_mul_overflow
    • #​22858: API: Ensure a full mask is returned for masked_invalid
    • #​22866: BUG: Polynomials now copy properly (#​22669)
    • #​22867: BUG, SIMD: Fix memory overlap in ufunc comparison loops
    • #​22868: BUG: Fortify string casts against floating point warnings
    • #​22875: TST: Ignore nan-warnings in randomized out tests
    • #​22883: MAINT: restore npymath implementations needed for freebsd
    • #​22884: BUG: Fix integer overflow in in1d for mixed integer dtypes #​22877
    • #​22887: BUG: Use whole file for encoding checks with charset_normalizer.

    Checksums

    MD5
    9e543db90493d6a00939bd54c2012085  numpy-1.24.1-cp310-cp310-macosx_10_9_x86_64.whl
    4ebd7af622bf617b4876087e500d7586  numpy-1.24.1-cp310-cp310-macosx_11_0_arm64.whl
    0c0a3012b438bb455a6c2fadfb1be76a  numpy-1.24.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    0bddb527345449df624d3cb9aa0e1b75  numpy-1.24.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    b246beb773689d97307f7b4c2970f061  numpy-1.24.1-cp310-cp310-win32.whl
    1f3823999fce821a28dee10ac6fdd721  numpy-1.24.1-cp310-cp310-win_amd64.whl
    8eedcacd6b096a568e4cb393d43b3ae5  numpy-1.24.1-cp311-cp311-macosx_10_9_x86_64.whl
    50bddb05acd54b4396100a70522496dd  numpy-1.24.1-cp311-cp311-macosx_11_0_arm64.whl
    2a76bd9da8a78b44eb816bd70fa3aee3  numpy-1.24.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    9e86658a414272f9749bde39344f9b76  numpy-1.24.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    915dfb89054e1631574a22a9b53a2b25  numpy-1.24.1-cp311-cp311-win32.whl
    ab7caa2c6c20e1fab977e1a94dede976  numpy-1.24.1-cp311-cp311-win_amd64.whl
    8246de961f813f5aad89bca3d12f81e7  numpy-1.24.1-cp38-cp38-macosx_10_9_x86_64.whl
    58366b1a559baa0547ce976e416ed76d  numpy-1.24.1-cp38-cp38-macosx_11_0_arm64.whl
    a96f29bf106a64f82b9ba412635727d1  numpy-1.24.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    4c32a43bdb85121614ab3e99929e33c7  numpy-1.24.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    09b20949ed21683ad7c9cbdf9ebb2439  numpy-1.24.1-cp38-cp38-win32.whl
    9e9f1577f874286a8bdff8dc5551eb9f  numpy-1.24.1-cp38-cp38-win_amd64.whl
    4383c1137f0287df67c364fbdba2bc72  numpy-1.24.1-cp39-cp39-macosx_10_9_x86_64.whl
    987f22c49b2be084b5d72f88f347d31e  numpy-1.24.1-cp39-cp39-macosx_11_0_arm64.whl
    848ad020bba075ed8f19072c64dcd153  numpy-1.24.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    864b159e644848bc25f881907dbcf062  numpy-1.24.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    db339ec0b2693cac2d7cf9ca75c334b1  numpy-1.24.1-cp39-cp39-win32.whl
    fec91d4c85066ad8a93816d71b627701  numpy-1.24.1-cp39-cp39-win_amd64.whl
    619af9cd4f33b668822ae2350f446a15  numpy-1.24.1-pp38-pypy38_pp73-macosx_10_9_x86_64.whl
    46f19b4b147f8836c2bd34262fabfffa  numpy-1.24.1-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    e85b245c57a10891b3025579bf0cf298  numpy-1.24.1-pp38-pypy38_pp73-win_amd64.whl
    dd3aaeeada8e95cc2edf9a3a4aa8b5af  numpy-1.24.1.tar.gz
    
    SHA256
    179a7ef0889ab769cc03573b6217f54c8bd8e16cef80aad369e1e8185f994cd7  numpy-1.24.1-cp310-cp310-macosx_10_9_x86_64.whl
    b09804ff570b907da323b3d762e74432fb07955701b17b08ff1b5ebaa8cfe6a9  numpy-1.24.1-cp310-cp310-macosx_11_0_arm64.whl
    f1b739841821968798947d3afcefd386fa56da0caf97722a5de53e07c4ccedc7  numpy-1.24.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    0e3463e6ac25313462e04aea3fb8a0a30fb906d5d300f58b3bc2c23da6a15398  numpy-1.24.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    b31da69ed0c18be8b77bfce48d234e55d040793cebb25398e2a7d84199fbc7e2  numpy-1.24.1-cp310-cp310-win32.whl
    b07b40f5fb4fa034120a5796288f24c1fe0e0580bbfff99897ba6267af42def2  numpy-1.24.1-cp310-cp310-win_amd64.whl
    7094891dcf79ccc6bc2a1f30428fa5edb1e6fb955411ffff3401fb4ea93780a8  numpy-1.24.1-cp311-cp311-macosx_10_9_x86_64.whl
    28e418681372520c992805bb723e29d69d6b7aa411065f48216d8329d02ba032  numpy-1.24.1-cp311-cp311-macosx_11_0_arm64.whl
    e274f0f6c7efd0d577744f52032fdd24344f11c5ae668fe8d01aac0422611df1  numpy-1.24.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    0044f7d944ee882400890f9ae955220d29b33d809a038923d88e4e01d652acd9  numpy-1.24.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    442feb5e5bada8408e8fcd43f3360b78683ff12a4444670a7d9e9824c1817d36  numpy-1.24.1-cp311-cp311-win32.whl
    de92efa737875329b052982e37bd4371d52cabf469f83e7b8be9bb7752d67e51  numpy-1.24.1-cp311-cp311-win_amd64.whl
    b162ac10ca38850510caf8ea33f89edcb7b0bb0dfa5592d59909419986b72407  numpy-1.24.1-cp38-cp38-macosx_10_9_x86_64.whl
    26089487086f2648944f17adaa1a97ca6aee57f513ba5f1c0b7ebdabbe2b9954  numpy-1.24.1-cp38-cp38-macosx_11_0_arm64.whl
    caf65a396c0d1f9809596be2e444e3bd4190d86d5c1ce21f5fc4be60a3bc5b36  numpy-1.24.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    b0677a52f5d896e84414761531947c7a330d1adc07c3a4372262f25d84af7bf7  numpy-1.24.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    dae46bed2cb79a58d6496ff6d8da1e3b95ba09afeca2e277628171ca99b99db1  numpy-1.24.1-cp38-cp38-win32.whl
    6ec0c021cd9fe732e5bab6401adea5a409214ca5592cd92a114f7067febcba0c  numpy-1.24.1-cp38-cp38-win_amd64.whl
    28bc9750ae1f75264ee0f10561709b1462d450a4808cd97c013046073ae64ab6  numpy-1.24.1-cp39-cp39-macosx_10_9_x86_64.whl
    84e789a085aabef2f36c0515f45e459f02f570c4b4c4c108ac1179c34d475ed7  numpy-1.24.1-cp39-cp39-macosx_11_0_arm64.whl
    8e669fbdcdd1e945691079c2cae335f3e3a56554e06bbd45d7609a6cf568c700  numpy-1.24.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    ef85cf1f693c88c1fd229ccd1055570cb41cdf4875873b7728b6301f12cd05bf  numpy-1.24.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    87a118968fba001b248aac90e502c0b13606721b1343cdaddbc6e552e8dfb56f  numpy-1.24.1-cp39-cp39-win32.whl
    ddc7ab52b322eb1e40521eb422c4e0a20716c271a306860979d450decbb51b8e  numpy-1.24.1-cp39-cp39-win_amd64.whl
    ed5fb71d79e771ec930566fae9c02626b939e37271ec285e9efaf1b5d4370e7d  numpy-1.24.1-pp38-pypy38_pp73-macosx_10_9_x86_64.whl
    ad2925567f43643f51255220424c23d204024ed428afc5aad0f86f3ffc080086  numpy-1.24.1-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    cfa1161c6ac8f92dea03d625c2d0c05e084668f4a06568b77a25a89111621566  numpy-1.24.1-pp38-pypy38_pp73-win_amd64.whl
    2386da9a471cc00a1f47845e27d916d5ec5346ae9696e01a8a34760858fe9dd2  numpy-1.24.1.tar.gz
    

    v1.24.0

    Compare Source

    NumPy 1.24 Release Notes

    The NumPy 1.24.0 release continues the ongoing work to improve the handling and promotion of dtypes, increase the execution speed, and clarify the documentation. There are also a large number of new and expired deprecations due to changes in promotion and cleanups. This might be called a deprecation release. Highlights are

    • Many new deprecations, check them out.
    • Many expired deprecations.
    • New F2PY features and fixes.
    • New "dtype" and "casting" keywords for stacking functions.

    See below for the details.

    This release supports Python versions 3.8-3.11.

    Deprecations

    Deprecate fastCopyAndTranspose and PyArray_CopyAndTranspose

    The numpy.fastCopyAndTranspose function has been deprecated. Use the corresponding copy and transpose methods directly:

    arr.T.copy()
    

    The underlying C function PyArray_CopyAndTranspose has also been deprecated from the NumPy C-API.

    (gh-22313)

    Conversion of out-of-bound Python integers

    Attempting a conversion from a Python integer to a NumPy value will now always check whether the result can be represented by NumPy. This means the following examples will fail in the future and give a DeprecationWarning now:

    np.uint8(-1)
    np.array([3000], dtype=np.int8)
    

    Many of these did succeed before. Such code was mainly useful for unsigned integers with negative values such as np.uint8(-1) giving np.iinfo(np.uint8).max.

    Note that conversion between NumPy integers is unaffected, so that np.array(-1).astype(np.uint8) continues to work and use C integer overflow logic. For negative values, it will also work to view the array: np.array(-1, dtype=np.int8).view(np.uint8). In some cases, using np.iinfo(np.uint8).max or val % 2**8 may also work well.

    In rare cases input data may mix both negative values and very large unsigned values (i.e. -1 and 2**63). There it is unfortunately necessary to use % on the Python value or use signed or unsigned conversion depending on whether negative values are expected.

    (gh-22385)
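    As a minimal sketch (assuming NumPy >= 1.24 semantics), the workarounds above can be written as:

    import numpy as np

    # Reinterpret a negative NumPy integer as unsigned via a view:
    np.array(-1, dtype=np.int8).view(np.uint8)   # 255

    # Wrap an out-of-range Python int explicitly with modulo arithmetic:
    np.uint8(-1 % 2**8)                          # 255

    # Or use the dtype's own limits directly:
    np.uint8(np.iinfo(np.uint8).max)             # 255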

    Deprecate msort

    The numpy.msort function is deprecated. Use np.sort(a, axis=0) instead.

    (gh-22456)

    np.str0 and similar are now deprecated

    The scalar type aliases ending in a 0 bit size: np.object0, np.str0, np.bytes0, np.void0, np.int0, np.uint0 as well as np.bool8 are now deprecated and will eventually be removed.

    (gh-22607)

    Expired deprecations

    • The normed keyword argument has been removed from np.histogram, np.histogram2d, and np.histogramdd. Use density instead. If normed was passed by position, density is now used.

      (gh-21645)

    • Ragged array creation will now always raise a ValueError unless dtype=object is passed. This includes very deeply nested sequences.

      (gh-22004)

    • Support for Visual Studio 2015 and earlier has been removed.

    • Support for the Windows Interix POSIX interop layer has been removed.

      (gh-22139)

    • Support for Cygwin < 3.3 has been removed.

      (gh-22159)

    • The mini() method of np.ma.MaskedArray has been removed. Use either np.ma.MaskedArray.min() or np.ma.minimum.reduce().

    • The single-argument form of np.ma.minimum and np.ma.maximum has been removed. Use np.ma.minimum.reduce() or np.ma.maximum.reduce() instead.

      (gh-22228)

    • Passing dtype instances other than the canonical (mainly native byte-order) ones to dtype= or signature= in ufuncs will now raise a TypeError. We recommend passing the strings "int8" or scalar types np.int8 since the byte-order, datetime/timedelta unit, etc. are never enforced. (Initially deprecated in NumPy 1.21.)

      (gh-22540)

    • The dtype= argument to comparison ufuncs is now applied correctly. That means that only bool and object are valid values and dtype=object is enforced.

      (gh-22541)

    • The deprecation for the aliases np.object, np.bool, np.float, np.complex, np.str, and np.int is expired (introduced in NumPy 1.20). Some of these will now give a FutureWarning in addition to raising an error since they will be mapped to the NumPy scalars in the future.

      (gh-22607)

    Compatibility notes

    array.fill(scalar) may behave slightly differently

    numpy.ndarray.fill may in some cases behave slightly differently now, because the logic is aligned with item assignment:

    arr = np.array([1])  # with any dtype/value
    arr.fill(scalar)
    

    is now identical to:

    arr[0] = scalar
    

    Previously casting may have produced slightly different answers when using values that could not be represented in the target dtype or when the target had object dtype.

    (gh-20924)

    Subarray to object cast now copies

    Casting a dtype that includes a subarray to an object will now ensure a copy of the subarray. Previously an unsafe view was returned:

    arr = np.ones(3, dtype=[("f", "i", 3)])
    subarray_fields = arr.astype(object)[0]
    subarray = subarray_fields[0]  # "f" field
    
    np.may_share_memory(subarray, arr)
    

    is now always False, while previously it was True for this specific cast.

    (gh-21925)

    Returned arrays respect uniqueness of dtype kwarg objects

    When the dtype keyword argument is used with np.array() or asarray(), the dtype of the returned array now always exactly matches the dtype provided by the caller.

    In some cases this change means that a view rather than the input array is returned. The following is an example of this on 64-bit Linux, where long and longlong have the same precision but are different dtypes:

    >>> arr = np.array([1, 2, 3], dtype="long")
    >>> new_dtype = np.dtype("longlong")
    >>> new = np.asarray(arr, dtype=new_dtype)
    >>> new.dtype is new_dtype
    True
    >>> new is arr
    False
    

    Before the change, the dtype did not match because new is arr was True.

    (gh-21995)

    DLPack export raises BufferError

    When an array buffer cannot be exported via DLPack a BufferError is now always raised where previously TypeError or RuntimeError was raised. This allows falling back to the buffer protocol or __array_interface__ when DLPack was tried first.

    (gh-22542)
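    A hedged sketch of the fallback pattern this enables (the helper name is illustrative, not a NumPy API):

    import numpy as np

    def export_array(arr: np.ndarray):
        """Try DLPack first, then fall back to the buffer protocol."""
        try:
            return arr.__dlpack__()   # raises BufferError when not exportable
        except BufferError:
            return memoryview(arr)    # buffer-protocol fallback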

    NumPy builds are no longer tested on GCC-6

    Ubuntu 18.04 is deprecated for GitHub Actions and GCC-6 is not available on Ubuntu 20.04, so builds using that compiler are no longer tested. We still test builds using GCC-7 and GCC-8.

    (gh-22598)

    New Features

    New attribute symbol added to polynomial classes

    The polynomial classes in the numpy.polynomial package have a new symbol attribute which is used to represent the indeterminate of the polynomial. This can be used to change the value of the variable when printing:

    >>> P_y = np.polynomial.Polynomial([1, 0, -1], symbol="y")
    >>> print(P_y)
    1.0 + 0.0Β·yΒΉ - 1.0Β·yΒ²
    

    Note that the polynomial classes only support 1D polynomials, so operations that involve polynomials with different symbols are disallowed when the result would be multivariate:

    >>> P = np.polynomial.Polynomial([1, -1])  # default symbol is "x"
    >>> P_z = np.polynomial.Polynomial([1, 1], symbol="z")
    >>> P * P_z
    Traceback (most recent call last)
       ...
    ValueError: Polynomial symbols differ
    

    The symbol can be any valid Python identifier. The default is symbol="x", consistent with existing behavior.

    (gh-16154)

    F2PY support for Fortran character strings

    F2PY now supports wrapping Fortran functions with:

    • character (e.g. character x)
    • character array (e.g. character, dimension(n) :: x)
    • character string (e.g. character(len=10) x)
    • and character string array (e.g. character(len=10), dimension(n, m) :: x)

    arguments, including passing Python unicode strings as Fortran character string arguments.

    (gh-19388)

    New function np.show_runtime

    A new function numpy.show_runtime has been added to display the runtime information of the machine in addition to numpy.show_config which displays the build-related information.

    (gh-21468)
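    Usage is a single call (the output varies per machine):

    import numpy as np

    np.show_runtime()   # runtime information: SIMD extensions, threading, ...
    np.show_config()    # build-related information, for comparison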

    strict option for testing.assert_array_equal

    The strict option is now available for testing.assert_array_equal. Setting strict=True will disable the broadcasting behaviour for scalars and ensure that input arrays have the same data type.

    (gh-21595)
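    A small illustration of the difference (assuming NumPy >= 1.24):

    import numpy as np
    from numpy.testing import assert_array_equal

    a = np.array([2.0, 2.0])

    # Default behaviour: the scalar broadcasts and dtypes are not compared.
    assert_array_equal(a, 2.0)

    # strict=True disables scalar broadcasting and enforces matching dtypes:
    try:
        assert_array_equal(a, np.array([2, 2], dtype=np.int64), strict=True)
    except AssertionError:
        print("dtype mismatch detected")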

    New parameter equal_nan added to np.unique

    np.unique was changed in 1.21 to treat all NaN values as equal and return a single NaN. Setting equal_nan=False will restore pre-1.21 behavior to treat NaNs as unique. Defaults to True.

    (gh-21623)
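    For example:

    import numpy as np

    x = np.array([1.0, np.nan, np.nan])
    print(np.unique(x))                   # [ 1. nan]      (NaNs collapsed, default)
    print(np.unique(x, equal_nan=False))  # [ 1. nan nan]  (pre-1.21 behaviour)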

    casting and dtype keyword arguments for numpy.stack

    The casting and dtype keyword arguments are now available for numpy.stack. To use them, write np.stack(..., dtype=None, casting='same_kind').

    casting and dtype keyword arguments for numpy.vstack

    The casting and dtype keyword arguments are now available for numpy.vstack. To use them, write np.vstack(..., dtype=None, casting='same_kind').

    casting and dtype keyword arguments for numpy.hstack

    The casting and dtype keyword arguments are now available for numpy.hstack. To use them, write np.hstack(..., dtype=None, casting='same_kind').

    (gh-21627)
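    For instance, the output dtype can now be requested up front instead of casting afterwards (a small sketch):

    import numpy as np

    a = np.array([1, 2, 3], dtype=np.int32)
    b = np.array([4, 5, 6], dtype=np.int32)

    # int32 -> float64 is a safe cast, so it is permitted under "same_kind".
    out = np.stack([a, b], dtype=np.float64, casting="same_kind")
    print(out.dtype)   # float64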

    The bit generator underlying the singleton RandomState can be changed

    The singleton RandomState instance exposed in the numpy.random module is initialized at startup with the MT19937 bit generator. The new function set_bit_generator allows the default bit generator to be replaced with a user-provided bit generator. This function has been introduced to provide a method allowing seamless integration of a high-quality, modern bit generator in new code with existing code that makes use of the singleton-provided random variate generating functions. The companion function get_bit_generator returns the current bit generator being used by the singleton RandomState. This is provided to simplify restoring the original source of randomness if required.

    The preferred method to generate reproducible random numbers is to use a modern bit generator in an instance of Generator. The function default_rng simplifies instantiation:

    >>> rg = np.random.default_rng(3728973198)
    >>> rg.random()
    

    The same bit generator can then be shared with the singleton instance so that calling functions in the random module will use the same bit generator:

    >>> orig_bit_gen = np.random.get_bit_generator()
    >>> np.random.set_bit_generator(rg.bit_generator)
    >>> np.random.normal()
    

    The swap is permanent (until reversed) and so any call to functions in the random module will use the new bit generator. The original can be restored if required for code to run correctly:

    >>> np.random.set_bit_generator(orig_bit_gen)
    

    (gh-21976)

    np.void now has a dtype argument

    NumPy now allows constructing structured void scalars directly by passing the dtype argument to np.void.

    (gh-22316)
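    A minimal sketch of constructing a structured void scalar this way:

    import numpy as np

    dt = np.dtype([("x", "i4"), ("y", "f8")])
    scalar = np.void((1, 2.5), dtype=dt)
    print(scalar["x"], scalar["y"])   # 1 2.5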

    Improvements

    F2PY Improvements
    • The generated extension modules don't use the deprecated NumPy-C API anymore
    • Improved f2py generated exception messages
    • Numerous bug and flake8 warning fixes
    • Various CPP macros that one can use within C-expressions of signature files are now prefixed with f2py_. For example, one should use f2py_len(x) instead of len(x)
    • A new construct character(f2py_len=...) is introduced to support returning assumed-length character strings (e.g. character(len=*)) from wrapper functions

    A hook to support rewriting f2py internal data structures after reading all its input files is introduced. This is required, for instance, for backward compatibility in SciPy support, where character arguments are treated as character string arguments in C expressions.

    (gh-19388)

    IBM zSystems Vector Extension Facility (SIMD)

    Added support for SIMD extensions of zSystem (z13, z14, z15), through the universal intrinsics interface. This support leads to performance improvements for all SIMD kernels implemented using the universal intrinsics, including the following operations: rint, floor, trunc, ceil, sqrt, absolute, square, reciprocal, tanh, sin, cos, equal, not_equal, greater, greater_equal, less, less_equal, maximum, minimum, fmax, fmin, argmax, argmin, add, subtract, multiply, divide.

    (gh-20913)

    NumPy now gives floating point errors in casts

    In most cases, NumPy previously did not give floating point warnings or errors when these happened during casts. For example, casts like:

    np.array([2e300]).astype(np.float32)  # overflow for float32
    np.array([np.inf]).astype(np.int64)
    

    These casts should now generally give floating point warnings indicating that overflow occurred. When converting floating point values to integers, users should expect invalid value warnings instead.

    Users can modify the behavior of these warnings using np.errstate.
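    For example, the warnings can be silenced or escalated locally (a sketch, assuming the casts above produce the described warnings):

    import numpy as np

    # Silence the new casting warnings within a block:
    with np.errstate(over="ignore", invalid="ignore"):
        np.array([2e300]).astype(np.float32)
        np.array([np.inf]).astype(np.int64)

    # Or escalate them to exceptions:
    with np.errstate(over="raise"):
        try:
            np.array([2e300]).astype(np.float32)
        except FloatingPointError as exc:
            print("escalated:", exc)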

    Note that for float to int casts, the exact warnings that are given may be platform dependent. For example:

    arr = np.full(100, value=1000, dtype=np.float64)
    arr.astype(np.int8)
    

    May give a result equivalent to (the intermediate cast means no warning is given):

    arr.astype(np.int64).astype(np.int8)
    

    Or it may return an undefined result, with a warning set:

    RuntimeWarning: invalid value encountered in cast
    

    The precise behavior is subject to the C99 standard and its implementation in both software and hardware.

    (gh-21437)

    F2PY supports the value attribute

    The Fortran standard requires that variables declared with the value attribute must be passed by value instead of reference. F2PY now supports this use pattern correctly. So integer, intent(in), value :: x in Fortran codes will have correct wrappers generated.

    (gh-21807)

    Added pickle support for third-party BitGenerators

    The pickle format for bit generators was extended to allow each bit generator to supply its own constructor during pickling. Previous versions of NumPy only supported unpickling Generator instances created with one of the core set of bit generators supplied with NumPy. Attempting to unpickle a Generator that used a third-party bit generator would fail, since the constructor used during unpickling was only aware of the bit generators included in NumPy.

    (gh-22014)

    arange() now explicitly fails with dtype=str

    Previously, the np.arange(n, dtype=str) function worked for n=1 and n=2, but raised a non-specific exception for other values of n. Now, it raises a TypeError informing that arange does not support string dtypes:

    >>> np.arange(2, dtype=str)
    Traceback (most recent call last)
       ...
    TypeError: arange() not supported for inputs with DType <class 'numpy.dtype[str_]'>.
    

    (gh-22055)

    numpy.typing protocols are now runtime checkable

    The protocols used in numpy.typing.ArrayLike and numpy.typing.DTypeLike are now properly marked as runtime checkable, making them easier to use for runtime type checkers.

    (gh-22357)

    Performance improvements and changes

    Faster version of np.isin and np.in1d for integer arrays

    np.in1d (used by np.isin) can now switch to a faster algorithm (up to >10x faster) when it is passed two integer arrays. This is often automatically used, but you can use kind="sort" or kind="table" to force the old or new method, respectively.

    (gh-12065)
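    A quick sketch of forcing either algorithm:

    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.integers(0, 1000, size=100_000)
    b = rng.integers(0, 1000, size=1_000)

    mask_sort  = np.isin(a, b, kind="sort")   # older, sort-based method
    mask_table = np.isin(a, b, kind="table")  # newer lookup-table method
    assert (mask_sort == mask_table).all()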

    Faster comparison operators

    The comparison functions (numpy.equal, numpy.not_equal, numpy.less, numpy.less_equal, numpy.greater and numpy.greater_equal) are now much faster as they are now vectorized with universal intrinsics. For a CPU with SIMD extension AVX512BW, the performance gain is up to 2.57x, 1.65x and 19.15x for integer, float and boolean data types, respectively (with N=50000).

    (gh-21483)

    Changes

    Better reporting of integer division overflow

    Integer division overflow of scalars and arrays used to provide a RuntimeWarning, and the return value was undefined, occasionally leading to crashes:

    >>> np.array([np.iinfo(np.int32).min]*10, dtype=np.int32) // np.int32(-1)
    <stdin>:1: RuntimeWarning: divide by zero encountered in floor_divide
    array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
    

    Integer division overflow now returns the input dtype's minimum value and raises the following RuntimeWarning:

    >>> np.array([np.iinfo(np.int32).min]*10, dtype=np.int32) // np.int32(-1)
    <stdin>:1: RuntimeWarning: overflow encountered in floor_divide
    array([-2147483648, -2147483648, -2147483648, -2147483648, -2147483648,
           -2147483648, -2147483648, -2147483648, -2147483648, -2147483648],
          dtype=int32)
    

    (gh-21506)

    masked_invalid now modifies the mask in-place

    When used with copy=False, numpy.ma.masked_invalid now modifies the input masked array in-place. This makes it behave identically to masked_where and better matches the documentation.

    (gh-22046)
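    A small illustration of the in-place behaviour (assuming NumPy >= 1.24):

    import numpy as np

    arr = np.ma.masked_array([1.0, np.nan, np.inf], mask=[False, False, False])
    np.ma.masked_invalid(arr, copy=False)

    # The input's mask has been updated in place, as with masked_where:
    print(arr.mask)   # [False  True  True]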

    nditer/NpyIter allows allocating all operands

    The NumPy iterator available through np.nditer in Python and as NpyIter in C now supports allocating all arrays. The iterator shape defaults to () in this case. The operands' dtype must be provided, since a "common dtype" cannot be inferred from the other inputs.

    (gh-22457)
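    A minimal sketch (assuming NumPy >= 1.24): with all operands allocated by the iterator, the shape is () and the dtype must be given explicitly.

    import numpy as np

    with np.nditer([None], op_dtypes=[np.float64],
                   op_flags=[["writeonly", "allocate"]]) as it:
        for x in it:
            x[...] = 3.14
        result = it.operands[0]

    print(result)   # array(3.14)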

    Checksums

    MD5
    d60311246bd71b177258ce06e2a4ec57  numpy-1.24.0-cp310-cp310-macosx_10_9_x86_64.whl
    02022b335938af55cb83bbaebdbff8e1  numpy-1.24.0-cp310-cp310-macosx_11_0_arm64.whl
    02b35d6612369fcc614c6223aaec0119  numpy-1.24.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    7b8ad389a9619db3e1f8243fc0cfe63d  numpy-1.24.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    6ff4acbb7b1258ccbd528c151eb0fe84  numpy-1.24.0-cp310-cp310-win32.whl
    d194c96601222db97b0af54fce1cfb1d  numpy-1.24.0-cp310-cp310-win_amd64.whl
    5fe4eb551a9312e37492da9f5bfb8545  numpy-1.24.0-cp311-cp311-macosx_10_9_x86_64.whl
    a8e836a768f73e9f509b11c3873c7e09  numpy-1.24.0-cp311-cp311-macosx_11_0_arm64.whl
    10404d6d1a5a9624f85018f61110b2be  numpy-1.24.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    cfdb0cb844f1db9be2cde998be54d65f  numpy-1.24.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    73bc66ad3ae8656ba18d64db98feb5e1  numpy-1.24.0-cp311-cp311-win32.whl
    4bbc30a53009c48d364d4dc2c612af95  numpy-1.24.0-cp311-cp311-win_amd64.whl
    94ce5f6a09605a9675a0d464b1ec6597  numpy-1.24.0-cp38-cp38-macosx_10_9_x86_64.whl
    e5e42b69a209eda7e6895dda39ea8610  numpy-1.24.0-cp38-cp38-macosx_11_0_arm64.whl
    36eb6143d1e2aac3c618275edf636983  numpy-1.24.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    712c3718e8b53ff04c626cc4c78492aa  numpy-1.24.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    0a1a48a8e458bd4ce581169484c17e4f  numpy-1.24.0-cp38-cp38-win32.whl
    c8ab7e4b919548663568a5b5a8b5eab4  numpy-1.24.0-cp38-cp38-win_amd64.whl
    1783a5d769566111d93c474c79892c01  numpy-1.24.0-cp39-cp39-macosx_10_9_x86_64.whl
    c9e77130674372c73f8209d58396624d  numpy-1.24.0-cp39-cp39-macosx_11_0_arm64.whl
    14c0f2f52f20f13a81bba7df27f30145  numpy-1.24.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    c106393b46fa0302dbac49b14a4dfed4  numpy-1.24.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    c83e6d6946f32820f166c3f1ff010ab6  numpy-1.24.0-cp39-cp39-win32.whl
    acd5a4737d1094d5f40afa584dbd6d79  numpy-1.24.0-cp39-cp39-win_amd64.whl
    26e32f942c9fd62f64fd9bf6df95b5b1  numpy-1.24.0-pp38-pypy38_pp73-macosx_10_9_x86_64.whl
    4f027df0cc313ca626b106849999de13  numpy-1.24.0-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    ac58db9a90d0bec95bc7850b9e462f34  numpy-1.24.0-pp38-pypy38_pp73-win_amd64.whl
    1ca41c84ad9a116402a025d21e35bc64  numpy-1.24.0.tar.gz
    
    SHA256
    6e73a1f4f5b74a42abb55bc2b3d869f1b38cbc8776da5f8b66bf110284f7a437  numpy-1.24.0-cp310-cp310-macosx_10_9_x86_64.whl
    9387c7d6d50e8f8c31e7bfc034241e9c6f4b3eb5db8d118d6487047b922f82af  numpy-1.24.0-cp310-cp310-macosx_11_0_arm64.whl
    7ad6a024a32ee61d18f5b402cd02e9c0e22c0fb9dc23751991b3a16d209d972e  numpy-1.24.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    73cf2c5b5a07450f20a0c8e04d9955491970177dce8df8d6903bf253e53268e0  numpy-1.24.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    cec79ff3984b2d1d103183fc4a3361f5b55bbb66cb395cbf5a920a4bb1fd588d  numpy-1.24.0-cp310-cp310-win32.whl
    4f5e78b8b710cd7cd1a8145994cfffc6ddd5911669a437777d8cedfce6c83a98  numpy-1.24.0-cp310-cp310-win_amd64.whl
    4445f472b246cad6514cc09fbb5ecb7aab09ca2acc3c16f29f8dca6c468af501  numpy-1.24.0-cp311-cp311-macosx_10_9_x86_64.whl
    ec3e5e8172a0a6a4f3c2e7423d4a8434c41349141b04744b11a90e017a95bad5  numpy-1.24.0-cp311-cp311-macosx_11_0_arm64.whl
    f9168790149f917ad8e3cf5047b353fefef753bd50b07c547da0bdf30bc15d91  numpy-1.24.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    ada6c1e9608ceadaf7020e1deea508b73ace85560a16f51bef26aecb93626a72  numpy-1.24.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    f3c4a9a9f92734a4728ddbd331e0124eabbc968a0359a506e8e74a9b0d2d419b  numpy-1.24.0-cp311-cp311-win32.whl
    90075ef2c6ac6397d0035bcd8b298b26e481a7035f7a3f382c047eb9c3414db0  numpy-1.24.0-cp311-cp311-win_amd64.whl
    0885d9a7666cafe5f9876c57bfee34226e2b2847bfb94c9505e18d81011e5401  numpy-1.24.0-cp38-cp38-macosx_10_9_x86_64.whl
    e63d2157f9fc98cc178870db83b0e0c85acdadd598b134b00ebec9e0db57a01f  numpy-1.24.0-cp38-cp38-macosx_11_0_arm64.whl
    cf8960f72997e56781eb1c2ea256a70124f92a543b384f89e5fb3503a308b1d3  numpy-1.24.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    2f8e0df2ecc1928ef7256f18e309c9d6229b08b5be859163f5caa59c93d53646  numpy-1.24.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    fe44e925c68fb5e8db1334bf30ac1a1b6b963b932a19cf41d2e899cf02f36aab  numpy-1.24.0-cp38-cp38-win32.whl
    d7f223554aba7280e6057727333ed357b71b7da7422d02ff5e91b857888c25d1  numpy-1.24.0-cp38-cp38-win_amd64.whl
    ab11f6a7602cf8ea4c093e091938207de3068c5693a0520168ecf4395750f7ea  numpy-1.24.0-cp39-cp39-macosx_10_9_x86_64.whl
    12bba5561d8118981f2f1ff069ecae200c05d7b6c78a5cdac0911f74bc71cbd1  numpy-1.24.0-cp39-cp39-macosx_11_0_arm64.whl
    9af91f794d2d3007d91d749ebc955302889261db514eb24caef30e03e8ec1e41  numpy-1.24.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    8b1ddfac6a82d4f3c8e99436c90b9c2c68c0bb14658d1684cdd00f05fab241f5  numpy-1.24.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    ac4fe68f1a5a18136acebd4eff91aab8bed00d1ef2fdb34b5d9192297ffbbdfc  numpy-1.24.0-cp39-cp39-win32.whl
    667b5b1f6a352419e340f6475ef9930348ae5cb7fca15f2cc3afcb530823715e  numpy-1.24.0-cp39-cp39-win_amd64.whl
    4d01f7832fa319a36fd75ba10ea4027c9338ede875792f7bf617f4b45056fc3a  numpy-1.24.0-pp38-pypy38_pp73-macosx_10_9_x86_64.whl
    dbb0490f0a880700a6cc4d000384baf19c1f4df59fff158d9482d4dbbca2b239  numpy-1.24.0-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    0104d8adaa3a4cc60c2777cab5196593bf8a7f416eda133be1f3803dd0838886  numpy-1.24.0-pp38-pypy38_pp73-win_amd64.whl
    c4ab7c9711fe6b235e86487ca74c1b092a6dd59a3cb45b63241ea0a148501853  numpy-1.24.0.tar.gz
    

    Configuration

    πŸ“… Schedule: Branch creation - "after 9am and before 1pm every weekday" (UTC), Automerge - At any time (no schedule defined).

    🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

    β™» Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

    πŸ”• Ignore: Close this PR and you won't be reminded about this update again.


    • [ ] If you want to rebase/retry this PR, check this box

    This PR has been generated by Mend Renovate. View repository job log here.

    opened by renovate[bot] 1
  • fix: Issue#915, Error for large integers in Series

    This patch fixes issue#915: Error for series with large integers.

    The issue arises when the numpy.histogram function is used with large integers, which produces unevenly spaced bin edges, here:

    https://github.com/ydataai/pandas-profiling/blob/5b1abac48ed9ed5a9e7e662be30c913acc3e7a5b/src/pandas_profiling/model/summary_algorithms.py#L39

    and here:

    https://github.com/ydataai/pandas-profiling/blob/5b1abac48ed9ed5a9e7e662be30c913acc3e7a5b/src/pandas_profiling/model/summary_algorithms.py#L52

    This can cause the resulting histogram to be distorted or misleading, as the bin sizes may not be uniform.

    To resolve this issue, I used the numpy.histogram_bin_edges function to compute the bin edges for the data before passing them to the numpy.histogram function. This function allows specifying the number of bins and the range of the data, and computes the bin edges so that they are evenly spaced. With this fix, the error reported in the bug report is no longer raised and the report is generated successfully. I have also included a test_issue915.py for testing the generation of the report.
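    A minimal sketch of the idea (names are illustrative; this is not the actual patch):

    import numpy as np

    def robust_histogram(values, n_bins=10):
        """Compute evenly spaced edges first, then count with those edges."""
        values = np.asarray(values, dtype=np.float64)
        edges = np.histogram_bin_edges(values, bins=n_bins)
        return np.histogram(values, bins=edges)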

    opened by Sohaib90 0
  • Feature Request - Override templates of the html flavour

    Missing functionality

    Override templates of the html flavour.

    Proposed feature

    Allow overriding (some) templates in src/pandas_profiling/report/presentation/flavours/html/templates/ to personalize pdp.

    Alternatives considered

    I monkey-patched pdp to support overriding templates (jinja2 supports this), e.g. to change the layout a bit, but this isn't a clean way to do it.

    import jinja2
    from jinja2 import ChoiceLoader, FileSystemLoader

    from pandas_profiling.report.formatters import fmt, fmt_badge, fmt_numeric, fmt_percent
    from pandas_profiling.report.presentation.flavours.html import templates

    some_path = "/path/to/custom/templates"  # directory holding the overriding templates

    # Search the custom directory first, then fall back to the package's own templates.
    templates.package_loader = ChoiceLoader([
        FileSystemLoader(some_path),
        templates.package_loader,
    ])

    # Rebuild the environment with the combined loader and re-register the filters.
    templates.jinja2_env = jinja2.Environment(
        lstrip_blocks=True, trim_blocks=True, loader=templates.package_loader
    )
    templates.jinja2_env.filters["is_list"] = lambda x: isinstance(x, list)
    templates.jinja2_env.filters["fmt_badge"] = fmt_badge
    templates.jinja2_env.filters["fmt_percent"] = fmt_percent
    templates.jinja2_env.filters["fmt_numeric"] = fmt_numeric
    templates.jinja2_env.filters["fmt"] = fmt
    

    Additional context

    No response

    feature request πŸ’¬ 
    opened by prhbrt 0
  • Interaction plots for time series data

    Missing functionality

    Interaction plots for numeric time series variables.

    Proposed feature

    Calculate interaction plots between numeric variables and numeric time series variables. Is there a setting to enable this?

    Alternatives considered

    I considered setting tsmode=False, but then I lose the autocorrelation plots.

    needs-triage 
    opened by MauritsDescamps 0
  • Bug Report: KeyError: 'max_length' when comparing two profile_report (`minimal=True` is used to generate these report)

    Current Behaviour

    There is an error message:

    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    /var/folders/60/6qphmx_d7x7_11vpj8524vf40000gn/T/ipykernel_17862/709405443.py in <module>
          7 
          8 # Save report to file
    ----> 9 comparison_report.to_file("comparison.html")
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in to_file(self, output_file, silent)
        307                 create_html_assets(self.config, output_file)
        308 
    --> 309             data = self.to_html()
        310 
        311             if output_file.suffix != ".html":
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in to_html(self)
        418 
        419         """
    --> 420         return self.html
        421 
        422     def to_json(self) -> str:
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in html(self)
        229     def html(self) -> str:
        230         if self._html is None:
    --> 231             self._html = self._render_html()
        232         return self._html
        233 
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in _render_html(self)
        337         from pandas_profiling.report.presentation.flavours import HTMLReport
        338 
    --> 339         report = self.report
        340 
        341         with tqdm(
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
       1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
       1032         check_argument_types(memo)
    -> 1033         retval = func(*args, **kwargs)
       1034         try:
       1035             check_return_type(retval, memo)
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/profile_report.py in report(self)
        223     def report(self) -> Root:
        224         if self._report is None:
    --> 225             self._report = get_report_structure(self.config, self.description_set)
        226         return self._report
        227 
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/report/structure/report.py in get_report_structure(config, summary)
        376                     items=list(summary["variables"]),
        377                     item=Container(
    --> 378                         render_variables_section(config, summary),
        379                         sequence_type="accordion",
        380                         name="Variables",
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/report/structure/report.py in render_variables_section(config, dataframe_summary)
        157             variable_type = summary["type"]
        158         render_map_type = render_map.get(variable_type, render_map["Unsupported"])
    --> 159         template_variables.update(render_map_type(config, template_variables))
        160 
        161         # Ignore these
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/report/structure/variables/render_categorical.py in render_categorical(config, summary)
        405 
        406     if length:
    --> 407         length_table, length_histo = render_categorical_length(config, summary, varid)
        408         overview_items.append(length_table)
        409 
    
    ~/opt/miniconda3/envs/ml_project/lib/python3.8/site-packages/pandas_profiling/report/structure/variables/render_categorical.py in render_categorical_length(config, summary, varid)
         61             {
         62                 "name": "Max length",
    ---> 63                 "value": fmt_number(summary["max_length"]),
         64                 "alert": False,
         65             },
    
    KeyError: 'max_length'
    

    Expected Behaviour

    Run without error

    Data Description

    I'm running the code for dataset comparison. The original code in that link works well, but when I set minimal=True to create the reports and then compare them, an error occurs.

    Code that reproduces the bug

    import pandas as pd
    from pandas_profiling import ProfileReport
    
    train_df = pd.read_csv("train.csv")
    train_report = ProfileReport(train_df, title="Train", minimal=True)
    
    test_df = pd.read_csv("test.csv")
    test_report = ProfileReport(test_df, title="Test", minimal=True)
    
    comparison_report = train_report.compare(test_report)
    comparison_report.to_file("comparison.html")
    

    pandas-profiling version

    v3.5.0

    Dependencies

    pandas                       1.4.2
    pandas-profiling             3.5.0
    

    OS

    Mac

    Checklist

    • [X] There is not yet another bug report for this issue in the issue tracker
    • [X] The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
    • [X] The issue has not been resolved by the entries listed under Common Issues.
    bug πŸ› 
    opened by xiaoye-hua 0
Releases (v3.6.2)
  • v3.6.2(Jan 2, 2023)

  • v3.6.1(Dec 23, 2022)

  • v3.6.0(Dec 21, 2022)

    3.6.0 (2022-12-21)

    Bug Fixes

    • add css to cope with large tables (7f42f87)
    • adjust categoricals layout (f0bb45a)
    • categorical data not being obscured in the common values plot (40236bc)
    • compare report ignoring config parameter (3d60556)
    • compare report warnings always showing the last alert type (6b3c13d)
    • comparison fails when duplicates are disabled (#1208) (6d19620)
    • do not raise exception for percentage formatter (3ea626d)
    • enforce recomputation of description sets (a9fd1c8)
    • error comparing only one precomputed profile (00646cd)
    • html: sensible cloud-platform notebook html rendering (b22ece2)
    • ignoring config of precomputed reports (6478c40)
    • only compute auto correlation when no config is specified (d5d4f58)
    • remove malfunctioning hook (e2593f5)
    • remove unused test (2170338)
    • return the proper type for widgets (4c0b358)
    • set compute default to false (c70e491)
    • solve mypy error (9c4266e)
    • solve mypy issue (e3e7788)
    • uses colors from the specified config (c0c556d)
    • utils: use 'urllib.request' instead of 'requests' (#1177) (e4d020b), closes #1168

    Features

    • add heatmap values as a table under correlations (fc5da9e)
    • allow to specify the configuration for the comparison report (ad725b0)
    • design improvements on the correlations section (e5cd8cf)
    • implement imbalanced warning (ce84c81)
    • update variables layout (#1207) (cf0e0a7)
    Source code(tar.gz)
    Source code(zip)
  • v3.5.0(Nov 22, 2022)

    3.5.0 (2022-11-22)

    Bug Fixes

    Features

    Source code(tar.gz)
    Source code(zip)
  • v3.4.0(Oct 20, 2022)

    3.4.0 (2022-10-20)

    Bug Fixes

    Features

    Source code(tar.gz)
    Source code(zip)
  • v3.3.0(Sep 7, 2022)

  • v3.2.0(May 2, 2022)

  • v3.1.0(Sep 27, 2021)

  • v3.0.0(May 11, 2021)

  • v2.13.0(May 8, 2021)

  • v2.12.0(May 5, 2021)

  • v2.11.0(Feb 20, 2021)

  • v2.10.1(Feb 7, 2021)

  • v2.10.0rc1(Jan 5, 2021)

  • v2.9.0(Sep 3, 2020)

  • v2.9.0rc1(Jul 12, 2020)

    This release candidate improves handling of sensitive data and furthermore reduces technical debt with various fixes. The full changelog is available here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html.

    A warm thank you to everyone who has contributed to this release: @gauravkumar37 @Jooong @smaranjitghose @XavierBanos Tam Nguyen @andycraig @mgorsk1 @mbh86 @MHUNCHO @GaelVaroquaux @AmauryLepicard @baluyotraf @pvojnisek @abegong

    Source code(tar.gz)
    Source code(zip)
  • v2.8.0(May 12, 2020)

    pandas-profiling now has built-in support for Files and Images, such as extracting file sizes, creation dates and dimensions, and scanning for truncated images or those containing EXIF information. Moreover, the text analysis features have also been reworked, providing more informative statistics.

    Read the changelog v2.8.0 for more details.

    Contributors: @loopyme @Bradley-Butcher @willemhendriks, @IscaAy, @frellnick, @dataverz @ieaves

    Source code(tar.gz)
    Source code(zip)
  • v2.7.1(May 11, 2020)

  • v2.7.0(May 7, 2020)

    Announcement and changelog are available in the documentation.

    We are grateful to @loopyme and @kyleYang for creating parts of the features in this release.

    Thanks to all contributors who made this release possible @1313e @dataprofessor @neomatrix369 @jiangfangfangxm @WesleyTheGeolien @NickYi1990 @ricgu8086.

    Source code(tar.gz)
    Source code(zip)
  • v2.6.0(Apr 13, 2020)

    Dependency policy

    The current dependency policy is suboptimal. Pinning the dependencies is great for reproducibility (high guarantee to work), but on the downside requires frequent maintenance and introduces compatibility issues with other packages. Therefore, we are moving away from pinning dependencies and instead specify a minimum version.

    Pandas v1

    Early releases of pandas v1 demonstrated many regressions that broke functionality (as acknowledged by the authors here). At this point, pandas is more stable and we notice high demand for compatibility. We move on to support pandas' latest versions. To ensure compatibility with both versions, we have extended the test matrix to test against both pandas 0.x.y and 1.x.y.

    Python 3.6+ features

    Python 3.6 introduces ordered dicts and f-strings, which we now rely on. This means that from pandas-profiling 2.6 onwards, you should run at least Python 3.6. Users who for some reason cannot update can use pandas-profiling 2.5.0, but unfortunately won't benefit from updates or maintenance.

    Extended continuous integration

    Starting from this release, we use GitHub Actions and Travis CI combined to increase maintainability. Travis CI handles the testing; GitHub Actions automates part of the development process by running black and building the docs.

    Source code(tar.gz)
    Source code(zip)
  • v2.5.0(Feb 14, 2020)

    • Progress bar added (#224)
    • Character analysis for Text/NLP (#278)
    • Themes: configuration and demo's (Orange, Dark)
    • Tutorial on modifying the report's structure (#362; #281, #259, #253, #234). This Jupyter notebook also demonstrates how to use the Kaggle API together with pandas-profiling.
    • Toggle descriptions at correlations.

    Deprecation:

    • This is the last version to support Python 3.5.

    Stability:

    • The order of columns changed when sort="None" (#377, fixed).
    • Pandas v1.0.X is not yet supported (#367, #366, #363, #353, pinned pandas to < 1)
    • Improved mixed type detection (#351)
    • Refactor of report structures.
    • Correlations are more stable (e.g. Phi_k color scale now from 0-1, rows and columns with NaN values are dropped, #329).
    • Distinct counts exclude NaNs.
    • Fixed alerts in notebooks.

    Other improvements:

    • Warnings are now sorted.
    • Links to Binder and Google Colab are added for notebooks (#349)
    • The overview section is tabbed.
    Source code(tar.gz)
    Source code(zip)
  • v2.4.0(Jan 8, 2020)

    The v2.4.0 release decouples the data structure of reports from the actual rendering. It's now much simpler to change the user interface, whether the user is in a Jupyter notebook, a webpage, a native application, or just wants a JSON view of the data.

    We are also proud to announce that we have been accepted into the GitHub Sponsors programme. You are cordially invited to support the project through this programme; if you want to see continued work on it, your support also boosts community funding: GitHub will match your contribution!

    Other improvements:

    • extended configuration with better defaults, including minimal mode for big data (#258, #310)
    • more example datasets
    • rejection of highly correlated variables is generalized (#284, #299)
    • many structural and stability improvements (#254, #274, #239)

    Special thanks to @marco-cardoso @ajupton @lvwerra @gliptak @neomatrix369 for their contributions.

    Source code(tar.gz)
    Source code(zip)
  • v2.3.0(Jul 27, 2019)

    • (Experimental) Support for "path" type
    • Fix numeric precision (#225)
    • Force labels in missing values diagram for large number of columns (#222)
    • Add pull request template
    • Add Census Dataset from the UCI ML Repository

    Thanks @bensdm and @huaiweicheng for your valuable contributions to this version!

    Source code(tar.gz)
    Source code(zip)
  • v2.2.0(Jul 22, 2019)

    New release introducing variable size binning (via astropy), PyCharm integration and various fixes and optimizations.

    • Added variable bin sizing via Bayesian Blocks (feature request [#216])
    • PyCharm integration; the console attempts to detect the file type.
    • Fixed bug [#215].
    • Updated the missingno package to 0.4.2, fixing the font size in the bar diagram.
    • Various optimizations

    Thanks to: @Utsav37 @mansenfranzen @jakevdp

    Source code(tar.gz)
    Source code(zip)
  • v2.1.2(Jul 11, 2019)

  • v2.1.1(Jul 11, 2019)

  • v2.1.0(Jul 6, 2019)

    The pandas-profiling release version 2.1.0 includes:

    • Correlations: correlation calculations are now more fault tolerant ([#51] and [#197]), correlation names in the report are clarified.
    • Jupyter Notebook: rendering a profiling report is done inside the srcdoc attribute (which fixes [#199]), a full-width option is added and the column layout is improved.
    • User experience: The table styling and sample section formatting is improved.
    • Warnings: detection added for categorical variables that are suspected to be of the datetime type.
    • Documentation and community:
      • The Contribution page helps users that want to contribute.
      • Typos fixed [#195], thank you @abhilashshakti
      • Added more examples.
    • Other bugfixes and improvements:
      • Add version information to console interface.
      • Fix: Remove one-time used logger [#202]
      • Fix: Dealing with string indices [#200]

    Contributors: @abhilashshakti @adamrossnelson @manycoding @InsciteAnalytics

    Source code(tar.gz)
    Source code(zip)
  • v2.0.3(Jun 23, 2019)

  • v2.0.2(Jun 22, 2019)

    Revised version structure; fixed a recursion preventing installation of dependencies ([#184]).

    The setup.py file used to import utils from the package prior to installation. This caused errors when the dependencies were not yet present.

    Source code(tar.gz)
    Source code(zip)
  • v2.0.1(Jun 21, 2019)
