Visual analysis and diagnostic tools to facilitate machine learning model selection.

Last update: Dec 30, 2022

Overview

Yellowbrick


What is Yellowbrick?

Yellowbrick is a suite of visual diagnostic tools called "Visualizers" that extend the scikit-learn API to allow human steering of the model selection process. In a nutshell, Yellowbrick combines scikit-learn with matplotlib in the best tradition of the scikit-learn documentation, but to produce visualizations for your machine learning workflow!

For complete documentation on the Yellowbrick API, a gallery of available visualizers, the contributor's guide, tutorials and teaching resources, frequently asked questions, and more, please visit our documentation at www.scikit-yb.org.

Installing Yellowbrick

Yellowbrick is compatible with Python 3.4 or later and also depends on scikit-learn and matplotlib. The simplest way to install Yellowbrick and its dependencies is from PyPI with pip, Python's preferred package installer.

$ pip install yellowbrick

Note that Yellowbrick is an active project and routinely publishes new releases with more visualizers and updates. In order to upgrade Yellowbrick to the latest version, use pip as follows.

$ pip install -U yellowbrick

You can also use the -U flag to update scikit-learn, matplotlib, or any other third party utilities that work well with Yellowbrick to their latest versions.

If you're using Anaconda (recommended for Windows users), you can take advantage of the conda utility to install Yellowbrick:

conda install -c districtdatalabs yellowbrick

Using Yellowbrick

The Yellowbrick API is specifically designed to play nicely with scikit-learn. Here is an example of a typical workflow sequence with scikit-learn and Yellowbrick:

Feature Visualization

In this example, we see how Rank2D performs pairwise comparisons of each feature in the data set with a specific metric or algorithm and then returns them ranked as a lower left triangle diagram.

from yellowbrick.features import Rank2D

visualizer = Rank2D(
    features=features, algorithm='covariance'
)
visualizer.fit(X, y)                # Fit the data to the visualizer
visualizer.transform(X)             # Transform the data
visualizer.show()                   # Finalize and render the figure

Model Visualization

In this example, we instantiate a scikit-learn classifier and then use Yellowbrick's ROCAUC class to visualize the tradeoff between the classifier's sensitivity and specificity.

from sklearn.svm import LinearSVC
from yellowbrick.classifier import ROCAUC

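model = LinearSVC()
visualizer = ROCAUC(model)
visualizer.fit(X, y)                # Fit the model and the visualizer
visualizer.score(X, y)              # Score the model and draw the ROC curves
visualizer.show()                   # Finalize and render the figure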

For additional information on getting started with Yellowbrick, view the Quick Start Guide in the documentation and check out our examples notebook.

Contributing to Yellowbrick

Yellowbrick is an open source project that is supported by a community who will gratefully and humbly accept any contributions you might make to the project. Large or small, any contribution makes a big difference; and if you've never contributed to an open source project before, we hope you will start with Yellowbrick!

If you are interested in contributing, check out our contributor's guide. Beyond creating visualizers, there are many ways to contribute:

Submit a bug report or feature request on GitHub Issues.
Contribute a Jupyter notebook to our examples gallery.
Assist us with user testing.
Add to the documentation or help with our website, scikit-yb.org.
Write unit or integration tests for our project.
Answer questions on our issues, mailing list, Stack Overflow, and elsewhere.
Translate our documentation into another language.
Write a blog post, tweet, or share our project with others.
Teach someone how to use Yellowbrick.

As you can see, there are lots of ways to get involved and we would be very happy for you to join us! The only thing we ask is that you abide by the principles of openness, respect, and consideration of others as described in the Python Software Foundation Code of Conduct.

For more information, check out the CONTRIBUTING.md file in the root of the repository or the detailed documentation at Contributing to Yellowbrick.

Yellowbrick Datasets

Yellowbrick gives easy access to several datasets that are used for the examples in the documentation and testing. These datasets are hosted on our CDN and must be downloaded for use. Typically, when a user calls one of the data loader functions, e.g. load_bikeshare(), the data is automatically downloaded if it's not already on the user's computer. However, for development and testing, or if you know you will be working without internet access, it might be easier to simply download all the data at once.

The data downloader script can be run as follows:

$ python -m yellowbrick.download

This will download the data to the fixtures directory inside of the Yellowbrick site packages. You can specify the location of the download either as an argument to the downloader script (use --help for more details) or by setting the $YELLOWBRICK_DATA environment variable. Setting the environment variable is the preferred mechanism because it also influences how data is loaded in Yellowbrick.
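As a rough illustration of the lookup order described above, the sketch below resolves a data directory from an explicit argument, then from $YELLOWBRICK_DATA, then from a default. The function name resolve_data_home and the fallback path are assumptions made for this example; this is not Yellowbrick's actual implementation.

```python
import os
from pathlib import Path

# Hypothetical fallback for this sketch. The text above says the real default
# is the fixtures directory inside the installed Yellowbrick package.
DEFAULT_DATA_HOME = Path.home() / "yellowbrick_data"


def resolve_data_home(explicit=None, environ=None):
    """Pick a data directory: explicit argument, then env var, then default."""
    environ = os.environ if environ is None else environ
    if explicit is not None:
        return Path(explicit).expanduser()
    from_env = environ.get("YELLOWBRICK_DATA")
    if from_env:
        return Path(from_env).expanduser()
    return DEFAULT_DATA_HOME


print(resolve_data_home())
```

Because the environment variable is consulted whenever no explicit path is given, setting it once affects both downloading and loading, which is the behavior the paragraph above recommends.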

Note: Developers who have downloaded data from Yellowbrick versions earlier than v1.0 may experience some problems with the older data format. If this occurs, you can clear out your data cache as follows:

$ python -m yellowbrick.download --cleanup

This will remove old datasets and download the new ones. You can also use the --no-download flag to simply clear the cache without re-downloading data. Users who are having difficulty with datasets can also use this or they can uninstall and reinstall Yellowbrick using pip.

Citing Yellowbrick

We would be glad if you used Yellowbrick in your scientific publications! If you do, please cite us using the citation guidelines.

Affiliations

Comments

Conda packages and image test-driven text removal for Miniconda on Appveyor for #744 and #690
Summary

To address the failing imports in Windows VMs with Miniconda:

created a conda environment with conda packages in .appveyor.yml, which fixes #744.

The Matplotlib backend configuration has been addressed by #862. mpl.use('agg') has been working as intended, as explained in that PR.

To minimize the probability of image comparison tests failing in general:

removed x and y labels

changed default value of remove_legend to True

updated baseline images

Before removing the labels and legends, approximately 25% of Windows image comparison tests required an elevated value of tol in order to pass: 227 of the 894 image comparison tests have the check if sys.platform == 'win32'.

When 'win32' is changed to 'wwwww' purely for analysis purposes, ignoring the Windows-specific tol values, the number of failing tests with regular Python on Appveyor is at 24, and with Miniconda, the number is 31. Therefore, the number of Windows tests with RMSE values >0.1 has dropped roughly 90%. Follow-up efforts can remove the instances of Windows-specific tol values.

Builds for above analysis: PyPI/pip Python: https://ci.appveyor.com/project/nickpowersys/yellowbrick/builds/25047957 Anaconda: https://ci.appveyor.com/project/nickpowersys/yellowbrick/builds/25049306

To create the conda environment, I used tests/requirements.txt. The conda packages were determined by the conda solver. The name of the environment is yellowbrick. It is activated directly following the installation of the packages so that any further commands with the environment are done from the new environment.

In general, it can be expected that conda environments are used primarily with conda packages, and therefore using conda packages in Miniconda builds can help to verify conda users' and developers' experience through tests using these environments, including those involving the backend.
ready
opened by nickpowersys 33
Add random sampling and improve performance of Parallel Components viz

Fixes #59 (partially)

Allows for random sampling via "shuffle" parameter, consistent with sklearn train/test split, for instance.

Improves performance by at least an order of magnitude for several test data sets.
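One way to picture the sampling described above is the sketch below, which draws a subset of rows before they are plotted. Only the "shuffle" parameter name comes from the PR description; the function name, the sample parameter, and its semantics (a fraction when a float, an absolute count when an int) are assumptions for illustration and not the visualizer's actual code.

```python
import random


def sample_rows(rows, sample=1.0, shuffle=False, random_state=None):
    """Return a subset of rows, either the first n or a random n."""
    rows = list(rows)
    if isinstance(sample, int) and not isinstance(sample, bool):
        n = sample                      # absolute number of rows
    else:
        n = int(len(rows) * sample)     # fraction of the rows
    n = max(0, min(n, len(rows)))
    if shuffle:
        # A seeded generator keeps the random subset reproducible.
        return random.Random(random_state).sample(rows, n)
    return rows[:n]


subset = sample_rows(range(1000), sample=0.05, shuffle=True, random_state=7)
print(len(subset))
```

Drawing fewer lines is also the simplest performance lever for a plot that renders one line per instance.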

@DistrictDataLabs/team-oz-maintainers

opened by thekylesaurus 23
Improvements to Silhouette Visualizer
The following improvements to the Silhouette Visualizer are left over from #91:

Note to contributors: items in the below checklist don't need to be completed in a single PR; if you see one that catches your eye, feel free to pick it off the list!

[x] Improve the documentation describing what Silhouette scores are and how to use the visualizer to qualitatively evaluate a clustering solution.

[x] Find a real world example rather than just using make_blobs (note: we also have an example using the Iris dataset; ideally we'd have something a bit more unique to YB that we can add to the yellowbrick.datasets module - perhaps this should be a separate issue?).

[x] Instead of hard fixing the limits of the X-axis from -1.0 to 1.0; be more flexible so that the visualizer has a better display (or give the user the option of setting the limits).

[x] Move the cluster identity labels away from the middle and to the y-axis.

[x] Add ability to define cluster colors and improve color selection methodology.

[x] Add a legend/annotation that describes the average clustering coefficient (e.g. label the red axvline)

type: feature priority: medium level: novice type: documentation
opened by bbengfort 22
Ready: Feature 139 confusion matrix
Work in progress branch; initiating pull request for discussion with @rebeccabilbro

So far:

Made ConfusionMatrix class

Added init, fit, score, and draw methods. These mostly mimic the ClassificationReport

created example demo using the music data set I used in my user test. Can clean this up and include it in the commit, or remove this after development of this feature is complete (i.e. if we don't want it in the examples folder)

To discuss:

Should we use imshow like ClassificationReport, or convert over to pcolormesh like seaborn uses for their heatmap?

Hoisting duplicated code between ConfusionMatrix and ClassificationReport - what's worth moving up?

Handling of font size based on image size (so font size gets smaller with increased categories)

Handling of font size when predicted quantities have too-large-to-fit number of digits. Do we force the user over to percent-based heatmap?

Color defaults - force light gray background with medium gray font when the estimate is zero? Use a different categorical label for 100% right prediction squares (e.g. green, but w/ flexibility)?

Handling class imbalance - should we default to percent-of-true confusion matrix?

Handling class imbalance - should we create an option for a treemap-style resizing of row width, to visually demonstrate class sizes of the 'true' values?

type: feature
opened by NealHumphrey 22
Add elbow detection using the "kneedle" method to Elbow Visualizer
I. Are you merging from a feature branch into develop?

Yes

II. Summarize your PR

This PR fixes #764, which aimed to add a feature that annotates the optimal value of 'k' using the kneedle method described in https://github.com/arvkevi/kneed.

I have solved the issue in the following way:

By using the kneed library, which can be added as an optional dependency.

By plotting a vertical line (black dashed) showing optimal K value.
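To give a feel for what "detecting the elbow" means, here is a deliberately simplified sketch: it normalizes a decreasing curve of scores to the unit square and picks the k whose point lies furthest below the straight line joining the endpoints. This is only inspired by the kneedle idea; it is not the kneed library's algorithm and not the code in this PR, and the function name find_elbow is hypothetical.

```python
def find_elbow(ks, scores):
    """Return the k with the largest gap below the endpoint-to-endpoint chord.

    Assumes ks is increasing and scores is a decreasing, roughly convex
    curve (for example, distortion versus number of clusters).
    """
    if len(ks) != len(scores) or len(ks) < 3:
        raise ValueError("need at least three (k, score) pairs")
    x_span = ks[-1] - ks[0]
    y_span = scores[0] - scores[-1]
    if x_span == 0 or y_span == 0:
        raise ValueError("curve is flat; no elbow to detect")

    best_k, best_gap = ks[0], float("-inf")
    for k, score in zip(ks, scores):
        x = (k - ks[0]) / x_span             # 0 at the first k, 1 at the last
        y = (score - scores[-1]) / y_span    # 1 at the first score, 0 at the last
        gap = (1.0 - x) - y                  # vertical distance below the chord
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k


print(find_elbow([1, 2, 3, 4, 5, 6], [100, 40, 15, 12, 10, 9]))
```

The returned value is where a vertical dashed line, as described above, would be drawn.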

III. Include a sample plot

In the above plot k=7 is the optimal K.

IV. List any TODOs or questions

Still to do:

After feedback from the maintainers, it can be decided whether this should be added as an optional feature or a regular feature.

Questions for the @DistrictDataLabs/team-oz-maintainers:

Is the legend required? It can be removed if not.
opened by pswaldia 21

ImportError: cannot import name 'safe_indexing' from 'sklearn.utils'

Describe the bug

I'm trying to import KElbowVisualizer from yellowbrick.cluster, and it is returning the following error:

ImportError: cannot import name 'safe_indexing' from 'sklearn.utils'

To Reproduce

import pandas as pd
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

Traceback

ImportError: cannot import name 'safe_indexing' from 'sklearn.utils' (~/.venv/lib/python3.8/site-packages/sklearn/utils/__init__.py)
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-3-eb5694100c70> in <module>
      1 import pandas as pd
----> 2 from yellowbrick.cluster import KElbowVisualizer
~/.venv/lib/python3.8/site-packages/yellowbrick/__init__.py in <module>
     37 from .anscombe import anscombe
     38 from .datasaurus import datasaurus
---> 39 from .classifier import ROCAUC, ClassBalance, ClassificationScoreVisualizer
     40 
     41 # from .classifier import crplot, rocplot
~/.venv/lib/python3.8/site-packages/yellowbrick/classifier/__init__.py in <module>
     28 from .confusion_matrix import ConfusionMatrix, confusion_matrix
     29 from .rocauc import ROCAUC, roc_auc
---> 30 from .threshold import DiscriminationThreshold, discrimination_threshold
     31 from .prcurve import PrecisionRecallCurve, PRCurve, precision_recall_curve
     32 
~/Repositories/player-similarity-clusters/.venv/lib/python3.8/site-packages/yellowbrick/classifier/threshold.py in <module>
     28 from sklearn.model_selection import ShuffleSplit
     29 from sklearn.metrics import precision_recall_curve
---> 30 from sklearn.utils import indexable, safe_indexing
     31 from sklearn.utils.multiclass import type_of_target
     32 
ImportError: cannot import name 'safe_indexing' from 'sklearn.utils' (~/.venv/lib/python3.8/site-packages/sklearn/utils/__init__.py)

Desktop (please complete the following information):

OS: Ubuntu 20.04
Python Version: 3.8.5
Yellowbrick Version: 1.2

Additional context

sklearn Version: 0.24.0
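The traceback shows a library importing a name that the installed scikit-learn no longer exports. A common defensive pattern for this kind of breakage is to try several candidate names in order. The sketch below is generic and uses only the standard library; the helper name import_first is hypothetical, and using the private _safe_indexing as a stand-in is an assumption, not a statement of how Yellowbrick resolved this issue.

```python
import importlib


def import_first(module_name, candidates):
    """Return the first attribute in `candidates` that `module_name` provides."""
    module = importlib.import_module(module_name)
    for name in candidates:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"none of {list(candidates)!r} found in {module_name!r}")


# With scikit-learn installed, this would pick whichever name exists
# (assumption: the private `_safe_indexing` is an acceptable substitute):
# safe_indexing = import_first("sklearn.utils", ["safe_indexing", "_safe_indexing"])
```

For an end user, aligning the Yellowbrick and scikit-learn versions is usually simpler than patching imports; the pattern above is mainly of interest to library maintainers.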

opened by Paulj1989 20

Analyze and address image comparisons for Travis Miniconda builds to fix #690
Since image comparison tests are not yet addressed for Travis as they have been on Appveyor, this PR handles the image test failures and fixes #690. I will

[x] Make Miniconda builds fully active in .travis.yml as long as image comparisons are passing

[x] Assess the roughly 40 failing image comparison tests based on diffs with regular Python on Linux

[x] Determine if there is significant overlap with the set of image comparisons on Appveyor with RMSE>0.1. Are there any recurring patterns that could potentially be addressed through code across environments?

[x] Recommend how to handle failing comparisons based on the analysis

[x] Based on feedback, indicate the failing tests' status by resolving them primarily through configuration

[x] Update docs by removing reference to past PyQt4 dependency affecting Anaconda on Linux
opened by nickpowersys 19
Extend PCA Visualizer with Component-Feature Strength
Describe the solution you'd like

Provide an optional heatmap and color bar underneath the PCA visualizer (by shifting the lower axes) that shows the magnitude of each feature value to the component. This provides an explanation of which features are contributing the most to which component.

Is your feature request related to a problem? Please describe.

Although we have the biplot mode to plot feature strengths, they can sometimes be visually overlapping or unintelligible, particularly if there is a large number of features.

Examples

Code to generate this:

import numpy as np
import matplotlib.pyplot as plt

# Assumes `pca` is a fitted PCA with two components and `cancer` is a dataset exposing feature_names
fig, ax = plt.subplots(figsize=(8, 4))
plt.imshow(pca.components_, interpolation='none', cmap='plasma')
feature_names = list(cancer.feature_names)
# One tick per feature: newer Matplotlib rejects mismatched tick and label counts
ax.set_xticks(np.arange(-.5, len(feature_names) - 1))
ax.set_yticks(np.arange(0.5, 2))
ax.set_xticklabels(feature_names, rotation=90, ha='left', fontsize=12)
ax.set_yticklabels(['First PC', 'Second PC'], va='bottom', fontsize=12)
plt.colorbar(orientation='horizontal', ticks=[pca.components_.min(), 0, pca.components_.max()], pad=0.65)

Though we will probably want to use the pcolormesh rather than imshow as in Rank2D, ClassificationReport and ConfusionMatrix. Additionally it might be a tad nicer if the color bar was above the feature plot so that the axes names were the last thing in the chart.

Notes

This idea comes from page 55-56 of Data Science Documentation. I would be happy to include a citation to this in our documentation. (HTML version is here). @mapattacker any thoughts?

See also #476 for other updates to the PCA visualizer.
type: feature level: intermediate
opened by bbengfort 19
n_neighbors defaults for Manifold Visualizer

This PR fixes the n_neighbors concern from Issue #437, so that the default is None and the parameter is required for all transformers except mds and tsne.

opened by jimmyshah 18

JointPlotVisualizer not poofing

OS: Linux 18.04 LTS
Python: 3.6.8, Anaconda Inc.
Yellowbrick: 0.9

As suggested by @rebeccabilbro, I tried using fit_transform instead of fit in the JointPlotVisualizer, but nothing changed. Here is the code along with my terminal output.

CODE:

import pandas as pd
data=pd.read_csv('bikeshare.csv')
X=data[["season","month","hour","holiday","weekday","workingday","weather","temp","feelslike","humidity","windspeed"]]
y=data["riders"]


from yellowbrick.features import JointPlotVisualizer

visualizer=JointPlotVisualizer(feature='temp',target='feelslike')
visualizer.fit_transform(X["temp"],X["feelslike"])
visualizer.poof()

OUTPUT:

$ cd Desktop/YellowBrick/
$ python3 yb1.py
$ python3 yb1.py
$

No changes; no figures show up.

I know I am adding to your workload with my questions, sorry!

type: question

opened by dnabanita7 18

Add Support for Other ML Libraries

We currently have an experimental wrapper for StatsModels in: https://github.com/DistrictDataLabs/yellowbrick/pull/383

It would be fantastic if we can add visualizer support for Tensorflow, Keras and other classification selection visualizers using ATM.

opened by ndanielsen 18

ConfusionMatrix visualizer error with sklearn models

Hello! I try to do a simple example like:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split as tts
from sklearn.linear_model import LogisticRegression
from yellowbrick.classifier import ConfusionMatrix

# We'll use the handwritten digits data set from scikit-learn.
# Each feature of this dataset is an 8x8 pixel image of a handwritten number.
# Digits.data converts these 64 pixels into a single array of features
digits = load_digits()
X = digits.data
y = digits.target

X_train, X_test, y_train, y_test = tts(X, y, test_size =0.2, random_state=11)

model = LogisticRegression(multi_class="auto", solver="liblinear")

# The ConfusionMatrix visualizer takes a model
cm = ConfusionMatrix(model, classes=[0,1,2,3,4,5,6,7,8,9])

# Fit fits the passed model. This is unnecessary if you pass the visualizer a pre-fitted model
cm.fit(X_train, y_train)

# To create the ConfusionMatrix, we need some test data. Score runs predict() on the data
# and then creates the confusion_matrix from scikit-learn.
cm.score(X_test, y_test)

# How did we do?
cm.show()

The following error was raised: AttributeError: 'LogisticRegression' object has no attribute 'classes'

This type of confusion matrix visualization worked a few days ago. Could it be due to an incompatibility between library versions?

Thank you very much in advance!

opened by AmericaBG 2

Add CodeQL workflow
Hi DistrictDataLabs/yellowbrick!

This is not an automatic, 🤖-generated PR, as you can check in my GitHub profile, I work for GitHub and I am part of the GitHub Security Lab which is helping out with the migration of LGTM configurations to Code Scanning. You might have heard that we've integrated LGTM's underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!

With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.

This pull request enables code scanning by adding an auto-generated codeql.yml workflow file for GitHub Actions to your repository — take a look! We tested it before opening this pull request, so all should be working :heavy_check_mark:. In fact, you might already have seen some alerts appear on this pull request!

Where needed and if possible, we’ve adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.

Questions? Check out the FAQ below!

FAQ


How often will the code scanning analysis run?

By default, code scanning will trigger a scan with the CodeQL engine on the following events:

On every pull request — to flag up potential security problems for you to investigate before merging a PR.

On every push to your default branch and other protected branches — this keeps the analysis results on your repository’s Security tab up to date.

Once a week at a fixed time — to make sure you benefit from the latest updated security analysis even when no code was committed or PRs were opened.

What will this cost?

Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.

What types of problems does CodeQL find?

The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: we’ve enabled the security-and-quality query suite for you.

How do I upgrade my CodeQL engine?

No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.

The analysis doesn’t seem to be working

If you get an error in GitHub Actions that indicates that CodeQL wasn’t able to analyze your code, please follow the instructions here to debug the analysis.

How do I disable LGTM.com?

If you have LGTM’s automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You don’t actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).

Which source code hosting platforms does code scanning support?

GitHub code scanning is deeply integrated within GitHub itself. If you’d like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.

How do I know this PR is legitimate?

This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!

I have another question / how do I get in touch?

Please join the discussion here to ask further questions and send us suggestions!
opened by jorgectf 0
Radviz error from DataFrame which doesn't have sequential index
Describe the bug

Drawing a RadViz for the DataFrame described below raised the error shown below (see the Python code). I think the cause is the following code in radviz.py: y[i] should be changed to y.iloc[i].

To Reproduce

fig,ax = plt.subplots()
rad = RadViz(ax =ax, classes=["not diabete","diabete"])
rad.fit(train_X,train_y)
rad.show()

error message

KeyError                                  Traceback (most recent call last)
File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\core\indexes\base.py:3800, in Index.get_loc(self, key, method, tolerance)
   3799 try:
-> 3800 return self._engine.get_loc(casted_key)
   3801 except KeyError as err:

File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\_libs\index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()

File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\_libs\index.pyx:165, in pandas._libs.index.IndexEngine.get_loc()

File pandas\_libs\hashtable_class_helper.pxi:2263, in pandas._libs.hashtable.Int64HashTable.get_item()

File pandas\_libs\hashtable_class_helper.pxi:2273, in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In [8], line 7
      1 fig,ax = plt.subplots()
      2 rad = RadViz(ax =ax,
      3 classes=["not diabete","diabete"],
      4
      5 )
----> 7 rad.fit(train_X,train_y)
      8 rad.show()

File c:\Users\ksd20\Python\Python39\lib\site-packages\yellowbrick\features\radviz.py:159, in RadialVisualizer.fit(self, X, y, **kwargs)
    137 """
    138 The fit method is the primary drawing input for the
    139 visualization since it has both the X and y data required for the
   (...)
    156 Returns the instance of the transformer/visualizer
    157 """
    158 super(RadialVisualizer, self).fit(X, y)
--> 159 self.draw(X, y, **kwargs)
    160 return self

File c:\Users\ksd20\Python\Python39\lib\site-packages\yellowbrick\features\radviz.py:200, in RadialVisualizer.draw(self, X, y, **kwargs)
    198 row_ = np.repeat(np.expand_dims(row, axis=1), 2, axis=1)
    199 xy = (s * row_).sum(axis=0) / row.sum()
--> 200 label = self._label_encoder[y[i]]
    202 to_plot[label][0].append(xy[0])
    203 to_plot[label][1].append(xy[1])

File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\core\series.py:982, in Series.__getitem__(self, key)
    979 return self._values[key]
    981 elif key_is_scalar:
--> 982 return self._get_value(key)
    984 if is_hashable(key):
    985 # Otherwise index.get_value will raise InvalidIndexError
    986 try:
    987 # For labels that don't resolve as scalars like tuples and frozensets

File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\core\series.py:1092, in Series._get_value(self, label, takeable)
   1089 return self._values[label]
   1091 # Similar to Index.get_value, but we do not fall back to positional
-> 1092 loc = self.index.get_loc(label)
   1093 return self.index._get_values_for_loc(self, loc, label)

File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\core\indexes\base.py:3802, in Index.get_loc(self, key, method, tolerance)
   3800 return self._engine.get_loc(casted_key)
   3801 except KeyError as err:
...
   3805 # InvalidIndexError. Otherwise we fall through and re-raise
   3806 # the TypeError.
   3807 self._check_indexing_error(key)

KeyError: 0

Dataset: [used dataset] has an index that does not start at 0 and is not sequential.
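The failure mode reported here is a label lookup (y[i]) being used where a positional lookup was intended. The sketch below illustrates the distinction without requiring pandas: LabelIndexed is a tiny stand-in written for this example that mimics how a Series resolves [] by label and .iloc by position, and value_at is a hypothetical helper, not Yellowbrick's actual fix.

```python
class LabelIndexed:
    """Tiny stand-in for a pandas Series: [] looks up by label, .iloc by position."""

    def __init__(self, values, index):
        self._by_label = dict(zip(index, values))
        self.iloc = list(values)

    def __getitem__(self, label):
        return self._by_label[label]  # raises KeyError for unknown labels


def value_at(y, i):
    """Return the i-th target value by position, whatever the container."""
    return y.iloc[i] if hasattr(y, "iloc") else y[i]


# An index that neither starts at 0 nor is sequential, as in the report.
train_y = LabelIndexed([0, 1, 1], index=[17, 42, 99])
print(value_at(train_y, 0))    # positional lookup succeeds
print(value_at([0, 1, 1], 0))  # plain sequences still work
```

With this stand-in, train_y[0] raises KeyError: 0, matching the traceback above, while the positional helper does not depend on the index labels at all.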

Desktop (please complete the following information):

OS: [Windows 10]

Python Version [3.9]

Yellowbrick Version [1.5]

opened by KimByoungmo 3

Fix Warnings in Build and Deploy Process

When we deployed v1.5 we received the following warnings and deprecation errors:

python setup.py sdist bdist_wheel
/Users/benjamin/.pyenv/versions/3.10.2/envs/yellowbrick/lib/python3.10/site-packages/setuptools/dist.py:717: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
  warnings.warn(
Warning: 'classifiers' should be a list, got type 'tuple'
Warning: 'keywords' should be a list, got type 'tuple'
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

python setup.py register
/Users/benjamin/.pyenv/versions/3.10.2/envs/yellowbrick/lib/python3.10/site-packages/setuptools/dist.py:717: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
  warnings.warn(
Warning: 'classifiers' should be a list, got type 'tuple'
Warning: 'keywords' should be a list, got type 'tuple'
running register
running check
Registering yellowbrick to https://upload.pypi.org/legacy/
Server response (410): Project pre-registration is no longer required or supported, upload your files instead.
twine upload dist/*
Uploading distributions to https://upload.pypi.org/legacy/
Uploading yellowbrick-1.5-py3-none-any.whl
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 294.5/294.5 kB • 00:00 • 1.9 MB/s
Uploading yellowbrick-1.5.tar.gz
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.0/20.0 MB • 00:01 • 10.8 MB/s

View at:
https://pypi.org/project/yellowbrick/1.5/
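The two "should be a list, got type 'tuple'" warnings in the log above point at metadata fields declared as tuples. A minimal sketch of the fix is shown below, assuming the metadata is collected in a dictionary before being passed on as keyword arguments; the helper name normalize_metadata and the sample values are hypothetical and do not reproduce Yellowbrick's setup.py.

```python
def normalize_metadata(metadata, list_fields=("classifiers", "keywords")):
    """Return a copy of metadata with tuple-valued fields converted to lists."""
    fixed = dict(metadata)
    for field in list_fields:
        if isinstance(fixed.get(field), tuple):
            fixed[field] = list(fixed[field])
    return fixed


# Illustrative values only.
metadata = {
    "name": "yellowbrick",
    "classifiers": ("Programming Language :: Python :: 3",),
    "keywords": ("visualization", "machine learning"),
}
print(normalize_metadata(metadata)["keywords"])
```

Declaring the fields as lists in the first place achieves the same result; the helper is only useful when the values come from elsewhere.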

Other notes:

update classifiers to Python 3.10
Check to make sure build/deploy is correct (e.g. the wheel build)
Update to API tokens instead of basic login:

During your recent upload or upload attempt of yellowbrick to PyPI, we noticed you used basic authentication (username & password). However, your account has two-factor authentication (2FA) enabled.

In the near future, PyPI will begin prohibiting uploads using basic authentication for accounts with two-factor authentication enabled. Instead, we will require API tokens to be used.

What should I do?

First, generate an API token for your account or project at https://pypi.org/manage/account/token/. Then, use this token when publishing instead of your username and password. See https://pypi.org/help/#apitoken for help using API tokens to publish.

type: task priority: high level: expert

opened by bbengfort 0

Update Validation Curve Docs

Describe the issue

We updated the Validation curve so that you can change the marker style (#1258).

Now, we need to update the Validation curve documentation to state this new capability.

@DistrictDataLabs/team-oz-maintainers
type: documentation

opened by lwgray 0

Add Sklearn pipeline test for more complicated Visualizers

Some visualizers require additional work before an sklearn pipeline test can be written for them. The underlying visualizer likely needs to expose the learned attributes required to generate the visualization. For example, using an sklearn pipeline with the InterClusterDistanceMetric visualizer raises:

AttributeError: 'Pipeline' object has no attribute 'cluster_centers_'
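The error arises because learned attributes such as cluster_centers_ live on the last step of a pipeline, not on the pipeline object itself. One possible way to reach them is sketched below. It relies only on the fact that an sklearn Pipeline stores its steps as a list of (name, estimator) pairs in a steps attribute; the helper name final_estimator and the two stand-in classes are hypothetical, and this is not how Yellowbrick necessarily handles the case.

```python
class FakeKMeans:
    """Stand-in for a fitted clusterer exposing a learned attribute."""
    cluster_centers_ = [[0.0, 0.0], [1.0, 1.0]]


class FakePipeline:
    """Stand-in for sklearn.pipeline.Pipeline: holds (name, estimator) pairs."""

    def __init__(self, steps):
        self.steps = steps


def final_estimator(model):
    """Unwrap (possibly nested) pipelines and return the last estimator."""
    while getattr(model, "steps", None):
        model = model.steps[-1][1]
    return model


pipeline = FakePipeline([("scale", object()), ("kmeans", FakeKMeans())])
print(final_estimator(pipeline).cluster_centers_)
```

A visualizer that reads learned attributes through such a helper would work with both bare estimators and pipelines, since a bare estimator is returned unchanged.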

See issues and PR https://github.com/DistrictDataLabs/yellowbrick/issues/1253 https://github.com/DistrictDataLabs/yellowbrick/issues/1248 https://github.com/DistrictDataLabs/yellowbrick/pull/1249

Issue: https://github.com/DistrictDataLabs/yellowbrick/issues/1257 PR: https://github.com/DistrictDataLabs/yellowbrick/pull/1259

Issue: https://github.com/DistrictDataLabs/yellowbrick/issues/1256 PR: https://github.com/DistrictDataLabs/yellowbrick/pull/1262

[ ] Decision Boundaries
[ ] RFECV
[ ] ValidationCurve
[ ] Add a pipeline model input test and quick method test for feature importances
[ ] Add a pipeline model input test and quick method test for alpha selection
[ ] Add a pipeline model input test and quick method test for InterClusterDistanceMetric
[ ] KElbowVisualizer
[ ] SilhouetteVisualizer
[ ] GridSearchColorPlot

Example

    def test_within_pipeline(self):
        """
        Test that visualizer can be accessed within a sklearn pipeline
        """
        X, y = load_mushroom(return_dataset=True).to_numpy()
        X = OneHotEncoder().fit_transform(X).toarray()

        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=11)
        model = Pipeline([
            ('minmax', MinMaxScaler()), 
            ('cvscores', CVScores(BernoulliNB(), cv=cv))
        ])

        model.fit(X, y)
        model['cvscores'].finalize()
        self.assert_images_similar(model['cvscores'], tol=2.0)

    def test_within_pipeline_quickmethod(self):
        """
        Test that visualizer quickmethod can be accessed within a
        sklearn pipeline
        """
        X, y = load_mushroom(return_dataset=True).to_numpy()
        X = OneHotEncoder().fit_transform(X).toarray()
        
        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=11)
        model = Pipeline([
            ('minmax', MinMaxScaler()), 
            ('cvscores', cv_scores(BernoulliNB(), X, y, cv=cv, show=False,
                                      random_state=42))
            ])
        self.assert_images_similar(model['cvscores'], tol=2.0)

    def test_pipeline_as_model_input(self):
        """
        Test that visualizer can handle sklearn pipeline as model input
        """
        X, y = load_mushroom(return_dataset=True).to_numpy()
        X = OneHotEncoder().fit_transform(X).toarray()

        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=11)
        model = Pipeline([
            ('minmax', MinMaxScaler()), 
            ('nb', BernoulliNB())
        ])

        oz = CVScores(model, cv=cv)
        oz.fit(X, y)
        oz.finalize()
        self.assert_images_similar(oz, tol=2.0)

    def test_pipeline_as_model_input_quickmethod(self):
        """
        Test that visualizer can handle sklearn pipeline as model input
        within a quickmethod
        """
        X, y = load_mushroom(return_dataset=True).to_numpy()
        X = OneHotEncoder().fit_transform(X).toarray()

        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=11)
        model = Pipeline([
            ('minmax', MinMaxScaler()), 
            ('nb', BernoulliNB())
        ])

        oz = cv_scores(model, X, y, show=False, cv=cv)
        self.assert_images_similar(oz, tol=2.0)

@DistrictDataLabs/team-oz-maintainers

level: intermediate


Releases(v1.5)

v1.5(Aug 21, 2022)
Deployed: Sunday, August 21, 2022 Current Contributors: @stefmolin, @pdamodaran, @SangamSwadiK, @davidgilbertson, @lwgray, @bbengfort, @admo1, @charlesincharge, @uricod, @pdeziel, @rebeccabilbro

Major:

Added WordCorrelationPlot Visualizer

Built tests for using sklearn pipeline with visualizers

Allowed Marker Style to be specified in Validation Curve Visualizer

Fixed get_params for estimator wrapper to prevent AttributeError

Updated missing values visualizer to handle multiple data types and work on both numpy arrays and pandas data frames.

Added pairwise distance metrics to scoring metrics in KElbowVisualizer

Minor

Pegged Numba to v0.55.2

Updated Umap to v0.5.3

Fixed Missing labels in classification report visualizer

Updated Numpy to v1.22.0

Documentation

The Spanish language Yellowbrick docs are now live: https://www.scikit-yb.org/es/latest/

Added Dropping curve documentation

Added new example Notebook for Regression Visualizers

Fixed Typo in PR section of getting started docs

Fixed Typo in rank docs

Updated docstring in kneed.py utility file

Clarified how to run make html in PR template

Infrastructure

Added ability to run linting Actions on PRs

Implemented black code formatting as pre-commit hook

v1.4(Feb 19, 2022)
Deployed: Saturday, February 19, 2022 Current Contributors: @lwgray, @bbengfort, @falcaopetri, @pkaf, @akx, @pdamodaran, @emarkou, @ndanielsen, @rebeccabilbro, @pdeziel, @busFred, @charlesincharge, @wagner2010

Major:

Upgrade dependencies to support sklearn v1.0, Numpy 1.20+, Scipy 1.6, nltk 3.6.7, and Matplotlib 3.4.1

Implement new set_params and get_params on ModelVisualizers to ensure wrapped estimator is being correctly accessed via the new Estimator methods.

Fix the test dependencies to prevent variability in CI (must periodically review dependencies to ensure we're testing what our users are experiencing).

Change model param to estimator param to ensure that Visualizer arguments match their property names so that inspect works with get and set params and other scikit-learn utility functions.

Minor

Improved argmax handling in DiscriminationThreshold Visualizer

Improved error handling in FeatureImportances Visualizer

Gave option to remove colorer from ClassificationReport Visualizer

Allowed for more flexible KElbow colors that use default palette by default

Import scikit-learn private API _safe_indexing without error.

Remove any calls to set_params in Visualizer __init__ methods.

Modify test fixtures and baseline images to accommodate new sklearn implementation

Temporarily set the numpy dependency to be less than 1.20 because this is causing Pickle issues with joblib and umap

Add shuffle=True argument to any CV class that uses a random seed.

Set our CI matrix to Python and Miniconda 3.7 and 3.8

Bugs

Fixed score label display in PredictionError Visualizer

Fixed axes limit in PredictionError Visualizer

Fixed KElbowVisualizer to handle null cluster encounters

Fixed broken url to pytest fixtures

Fixed random_state to be in sync with PCA transformer

Fixed the inability to place FeatureCorrelations into subplots

Fixed hanging printing impacting model visualizers

Fixed error handling when decision function models encounter binary data

Fixed missing code in README.md

Infrastructure/Housekeeping/documentation

Updated status badges for build result and code coverage

Removed deprecated pytest-runner from testing

Replaced Travis with Github Actions

Changed our master branch to the main branch

Created a release issue template

Updated our CI to test Python 3.8 and 3.9

Managed test warnings

Added .gitattributes to handle white space changes

Updated to use add_css_file for documentation because of deprecation of add_stylesheet

Added a Sphinx build to GitHub Actions for ensuring that the docs build correctly

Switched to a YB-specific data lake for datasets storage

v1.3.post1(Feb 13, 2021)

Deployed: Saturday, February 13, 2021 Current Contributors: @rebeccabilbro, @bbengfort, @Viveckh

Fixes hanging print impacting ModelVisualizers.
v1.3(Feb 9, 2021)
Deployed: Tuesday, February 9, 2021 Current Contributors: @bbengfort, @rebeccabilbro, @Paulj1989, @phbillet, @pdamodaran, @pdeziel

This version primarily repairs the dependency issues we faced with scipy 1.6, scikit-learn 0.24 and Python 3.6 (or earlier). As part of the rapidly changing Python library landscape, we've been forced to react quickly to dependency changes, even where those libraries have been responsibly issuing future and deprecation warnings in our code base.

Major Changes:

Implement new set_params and get_params on ModelVisualizers to ensure wrapped estimator is being correctly accessed via the new Estimator methods.

Fix the test dependencies to prevent variability in CI (must periodically review dependencies to ensure we're testing what our users are experiencing).

Change model param to estimator param to ensure that Visualizer arguments match their property names so that inspect works with get and set params and other scikit-learn utility functions.
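
The renaming matters because scikit-learn style parameter introspection reads each constructor argument name from the signature and then looks up an attribute with the same name. The following is a simplified sketch of that idea, not scikit-learn's or Yellowbrick's actual implementation; ParamsMixin, GoodVisualizer, and BadVisualizer are hypothetical names used only for illustration.

```python
import inspect


class ParamsMixin:
    """Simplified get_params/set_params driven by the __init__ signature."""

    def get_params(self):
        sig = inspect.signature(type(self).__init__)
        names = [n for n in sig.parameters if n != "self"]
        # Each argument name must also exist as an attribute of that name.
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"invalid parameter {name!r}")
            setattr(self, name, value)
        return self


class GoodVisualizer(ParamsMixin):
    def __init__(self, estimator, alpha=1.0):
        self.estimator = estimator  # argument and attribute names match
        self.alpha = alpha


class BadVisualizer(ParamsMixin):
    def __init__(self, model, alpha=1.0):
        self.estimator = model      # mismatch: no attribute called "model"
        self.alpha = alpha


good = GoodVisualizer("ridge").set_params(alpha=0.5)
params = good.get_params()

try:
    BadVisualizer("ridge").get_params()
    mismatch_detected = False
except AttributeError:
    mismatch_detected = True
```

With matching names the round trip works; with a model argument stored as estimator, the lookup fails with an AttributeError.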

Minor Changes:

Import scikit-learn private API _safe_indexing without error.

Remove any calls to set_params in Visualizer __init__ methods.

Modify test fixtures and baseline images to accommodate new sklearn implementation

Set the numpy dependency to be less than 1.20 because this is causing Pickle issues with joblib and umap

Add shuffle=True argument to any CV class that uses a random seed.

Set our CI matrix to Python and Miniconda 3.7 and 3.8

v1.2.1(Jan 15, 2021)

Deployed: Friday, January 15, 2021 Current Contributors: @rebeccabilbro, @bbengfort, @Paulj1989, @mattharrison

On December 22, 2020, scikit-learn released version 0.24 which deprecated the external use of scikit-learn's internal utilities such as safe_indexing. Unfortunately, Yellowbrick depends on a few of these utilities and must refactor our internal code base to port this functionality or work around it. To ensure that Yellowbrick continues to work when installed via pip, we have temporarily changed our scikit-learn dependency to be less than 0.24. We will update our dependencies on the v1.3 release when we have made the associated fixes.
v1.2(Oct 9, 2020)
Deployed: Friday, October 9, 2020 Current Contributors: @rebeccabilbro, @lwgray, @VladSkripniuk, @Express50, @pdamodaran, @aldermartinez, @tktran, @bbengfort, @melonhead901, @Kautumn06, @ojedatony1616, @eschmier, @wagner2010, @ndanielsen

Major Changes:

Added Q-Q plot as side-by-side option to the ResidualsPlot visualizer.

More robust handling of binary classification in ROCAUC visualization, standardizing the way that classifiers with predict_proba and decision_function methods are handled. A binary hyperparameter was added to the visualizer to ensure correct interpretation of binary ROCAUC plots.

Fixes to ManualAlphaSelection to move it from prototype to prime time including documentation, tests, and quick method. This method allows users to perform alpha selection visualization on non-CV estimators.

Removal of AppVeyor from the CI matrix after too many out-of-core (non-Yellowbrick) failures with setup and installation on the VisualStudio images. Yellowbrick CI currently omits Windows and Miniconda from the test matrix and we are actively looking for new solutions.

Third party estimator wrapper in contrib to provide enhanced support for non-scikit-learn estimators such as those in Keras, CatBoost, and cuML.

Minor Changes:

Allow users to specify colors for the PrecisionRecallCurve.

Update ClassificationScoreVisualizer base class to have a class_colors_ learned attribute instead of a colors property; additional polishing of multi-class colors in PrecisionRecallCurve, ROCAUC, and ClassPredictionError.

Update KElbowVisualizer fit method and quick method to allow passing sample_weight parameter through the visualizer.

Enhancements to classification documentation to better discuss precision and recall and to diagnose with PrecisionRecallCurve and ClassificationReport visualizers.

Improvements to CooksDistance visualizer documentation.

Corrected KElbowVisualizer label and legend formatting.

Typo fixes to ROCAUC documentation, labels, and legend. Typo fix to Manifold documentation.

Use of tight_layout accessing the Visualizer figure property to finalize images and resolve discrepancies in plot directive images in documentation.

Add get_param_names helper function to identify keyword-only parameters that belong to a specific method.

Splits package namespace for yellowbrick.regressor.residuals to move PredictionError to its own module, yellowbrick.regressor.prediction_error.

Update tests to use SVC instead of LinearSVC and correct KMeans scores based on updates to scikit-learn v0.23.

Continued maintenance and management of baseline images following dependency updates; removal of mpl.cbook dependency.

Explicitly include license file in source distribution via MANIFEST.in.

Fixes to some deprecation warnings from sklearn.metrics.

Testing requirements depends on Pandas v1.0.4 or later.

Reintegrates pytest-spec and verbose test logging, updates pytest dependency to v0.5.0 or later.

Added Pandas v0.20 or later to documentation dependencies.

v1.1(Feb 26, 2020)
Deployed: Wednesday, February 12, 2020 Contributors: @rebeccabilbro @bbengfort @Kautumn06 @lwgray @pdamodaran @wagner2010 @mchestnut91, @mgarod, @shivendra90, @naresh-bachwani, @percygautam, @navarretedaniel, @mmorrison1670, @ekwska, @sjainit

Major Changes:

Quick methods (aka Oneliners), which return a fully fitted finalized visualizer object in only a single line, are now implemented for all Yellowbrick Visualizers. Test coverage has been added for all quick methods. The documentation has been updated to document and demonstrate the usage of the quick methods.

Added Part of Speech tagging for raw text using spaCy and NLTK to POSTagVisualizer.
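
The quick method pattern described above can be sketched as follows. This is a toy illustration rather than Yellowbrick code: ToyVisualizer and toy_visualizer are hypothetical names, and the show flag mirrors the show=False argument used in the pipeline test example earlier on this page.

```python
class ToyVisualizer:
    """Hypothetical visualizer with the fit/finalize/show life cycle."""

    def __init__(self, estimator):
        self.estimator = estimator
        self.fitted_ = False
        self.finalized_ = False
        self.shown_ = False

    def fit(self, X, y=None):
        self.fitted_ = True
        return self

    def finalize(self):
        # A real visualizer would set titles, legends, and axis labels here.
        self.finalized_ = True

    def show(self):
        self.finalize()
        self.shown_ = True  # a real visualizer would render the figure


def toy_visualizer(estimator, X, y=None, show=True):
    """Quick method: build, fit, and finalize (or show) in a single call."""
    viz = ToyVisualizer(estimator)
    viz.fit(X, y)
    if show:
        viz.show()
    else:
        viz.finalize()
    return viz


oz = toy_visualizer("any estimator", [[1, 2], [3, 4]], [0, 1], show=False)
```

Returning the visualizer lets callers (and image comparison tests) inspect it after the single call.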

Minor Changes:

Adds Board of Directors minutes for Spring meeting.

Miscellaneous documentation corrections and fixes.

Miscellaneous CI and testing corrections and fixes.

v1.0.1(Oct 6, 2019)
Deployed: Sunday, October 6, 2019 Contributors: @rebeccabilbro @Kautumn06 @bbengfort

Major API change: the poof() method is now deprecated; please use show() instead. After a significant discussion with community members, we have deprecated our original "make the magic happen" method due to concerns about the usage of the word. We've renamed the original method to show() and created a stub method with the original name that issues a deprecation warning and calls show().
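
The rename-plus-stub approach can be sketched with the standard library's warnings module. This is a minimal illustration, not the Yellowbrick source; the ToyVisualizer class and the warning text are assumptions.

```python
import warnings


class ToyVisualizer:
    """Hypothetical visualizer used only to illustrate the rename."""

    def show(self, **kwargs):
        # The real method would finalize and render the figure.
        return "shown"

    def poof(self, **kwargs):
        """Deprecated alias kept for backwards compatibility."""
        warnings.warn(
            "poof() is deprecated, please use show() instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return self.show(**kwargs)


# Record warnings so the deprecation can be inspected programmatically.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = ToyVisualizer().poof()
```

Existing code that calls poof() keeps working, while users see a DeprecationWarning pointing them to show().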

Major Changes:

Changes poof() to show().

Updated clustering and regression example notebooks.

Fixes a syntax error in Python 3.5 and earlier.

Updated Manifold documentation to fix example bug.

Added advisors names to the release changelog.

Adds advisory board minutes for Fall 2019.

Updates our Travis-CI semi-secure token for Slack integration.

v1.0(Aug 29, 2019)
Deployed: Wednesday, August 28, 2019 Contributors: @rebeccabilbro @bbengfort @Kautumn06 @lwgray @pdamodaran @naresh-bachwani @ndanielsen @mrdawson @navarretedaniel @fdion @haleemason @discdiver @joeyzhang823 @jimmyshah @jc-healy @justinormont @arvkevi @mgarod @mike-curry00 @Naba7 @nickpowersys @percygautam @pswaldia @rohit-ganapathy @rwhitt2049 @sangarshanan @souravsingh @thomasjpfan @zjpoh @xingularity

Note: Python 2 Deprecation: Please note that this release deprecates Yellowbrick's support for Python 2.7. After careful consideration and following the lead of our primary dependencies (NumPy, scikit-learn, and Matplotlib), we have chosen to move forward with the community and support Python 3.4 and later.

Major Changes:

New JointPlot visualizer that is specifically designed for machine learning. The new visualizer can compare a feature to a target, features to features, and even feature to feature to target using color. The visualizer gives correlation information at a glance and is designed to work on ML datasets.

New PosTagVisualizer is specifically designed for diagnostics around natural language processing and grammar-based feature extraction for machine learning. This new visualizer shows counts of different parts-of-speech throughout a tagged corpus.

New datasets module that provides greater support for interacting with Yellowbrick example datasets including support for Pandas, npz, and text corpora.

Management repository for Yellowbrick example data, yellowbrick-datasets.

Add support for matplotlib 3.0.1 or greater.

UMAPVisualizer as an alternative manifold to TSNE for corpus visualization that is fast enough to not require preprocessing PCA or SVD decomposition and preserves higher order similarities and distances.

Added .. plot:: directives to the documentation to automatically build the images along with the docs and keep them as up to date as possible. The directives also include the source code, making it much simpler to recreate examples.

Added target_color_type functionality to determine continuous or discrete color representations based on the type of the target variable.

Added alpha param for both test and train residual points in ResidualsPlot.

Added frameon param to Manifold.

Added frequency sort feature to PosTagVisualizer.

Added elbow detection using the "kneedle" method to the KElbowVisualizer.

Added governance document outlining new Yellowbrick structure.

Added CooksDistance regression visualizer.

Updated DataVisualizer to handle target type identification.

Extended DataVisualizer and updated its subclasses.

Added ProjectionVisualizer base class.

Restructured yellowbrick.target, yellowbrick.features, and yellowbrick.model_selection API.

Restructured regressor and classifier API.

Minor Changes:

Updated Rank2D to include Kendall-Tau metric.

Added user specification of ISO F1 values to PrecisionRecallCurve and updated the quick method to accept train and test splits.

Added code review checklist and conventions to the documentation and expanded the contributing docs to include other tricks and tips.

Added polish to missing value visualizers code, tests, and documentation.

Improved RankD tests for better coverage.

Added quick method test for DispersionPlot visualizer.

BugFix: fixed resolve colors bug in TSNE and UMAP text visualizers and added regression tests to prevent future errors.

BugFix: Added support for Yellowbrick palettes to return colormap.

BugFix: fixed PrecisionRecallCurve visual display problem with multi-class labels.

BugFix: fixed the RFECV step display bug.

BugFix: fixed error in distortion score calculation.

Extended FeatureImportances documentation and tests for stacked importances and added a warning when stack should be true.

Improved the documentation readability and structure.

Refreshed the README.md and added testing and documentation READMEs.

Updated the gallery to generate thumbnail-quality images.

Updated the example notebooks and created a quickstart notebook.

Fixed broken links in the documentation.

Enhanced the SilhouetteVisualizer with legend and color parameters, while also moving labels to the y-axis.

Extended FeatureImportances docs/tests for stacked importances.

Documented the yellowbrick.download script.

Added JOSS citation for "Yellowbrick: Visualizing the Scikit-Learn Model Selection Process".

Added new pull request (PR) template.

Added alpha param to PCA Decomposition Visualizer.

Updated documentation with affiliations.

Added a windows_tol for the visual unittest suite.

Added stacked barchart to PosTagVisualizer.

Let users set colors for FreqDistVisualizer and other ax_bar visualizers.

Updated Manifold to extend ProjectionVisualizer.

Check if an estimator is already fitted before calling fit method.

Ensure poof returns ax.

Compatibility Notes:

This version provides support for matplotlib 3.0.1 or greater and drops support for matplotlib versions less than 2.0.

This version drops support for Python 2

v0.9.1(Feb 6, 2019)
This hotfix adds matplotlib3 support by requiring any version of matplotlib except for 3.0.0 which had a backend bug that affected Yellowbrick. Note that this hotfix includes changes to tests that will need to be resolved when merging from develop (see #725).

Deployed: Tuesday, February 5, 2019

Contributors: @bbengfort, @rebeccabilbro, @ianozsvald, @fdion

v0.9(Nov 14, 2018)

Deployed: Wednesday, November 14, 2018 Contributors: @rebeccabilbro, @bbengfort, @zjpoh, @Kautumn06, @ndanielsen, @drwaterman, @lwgray, @pdamodaran, @Juan0001, @abatula, @peterespinosa, @jlinGG, @rlshuhart, @archaeocharlie, @dschoenleber, @black-tea, @iguk1987, @mohfadhil, @lacanlale, @agodbehere, @sivu1, @gokriznastic

Major Changes:

Target module added for visualizing dependent variable in supervised models.

Added a prototype for a missing values visualizer to the contrib module.

BalancedBinningReference visualizer for thresholding unbalanced data (undocumented).

CVScores visualizer to instrument cross-validation.

FeatureCorrelation visualizer to compare relationship between a single independent variable and the target.

ICDM visualizer, intercluster distance mapping using projections similar to those used in pyLDAVis.

PrecisionRecallCurve visualizer showing the relationship of precision and recall in a threshold-based classifier.

Enhanced FeatureImportance for multi-target and multi-coefficient models (e.g. probabilistic models) and allows stacked bar chart.

Adds option to plot PDF to ResidualsPlot histogram.

Adds document boundaries option to DispersionPlot and uses colored markers to depict class.

Added alpha parameter for opacity to the scatter plot visualizer.

Modify KElbowVisualizer to accept a list of k values.

ROCAUC bugfix to allow binary classifiers that only have a decision function.

TSNE bugfix so that title and size params are respected.

ConfusionMatrix bugfix to correct percentage displays adding to 100.

ResidualsPlot bugfix to ensure specified colors are both in histogram and scatterplot.

Fixed unicode decode error on Py2 compatible Windows using Hobbies corpus.

Require matplotlib 1.5.1 or matplotlib 2.0 (matplotlib 3.0 not supported yet).

Yellowbrick now depends on SciPy 1.0 and scikit-learn 0.20.

Deprecated percent and sample_weight arguments to ConfusionMatrix fit method.

Minor Changes:

Removed hardcoding of SilhouetteVisualizer axes dimensions.

Audit classifiers to ensure they conform to score API.

Fix for Manifold fit_transform bug.

Fixed Manifold import bug.

Started reworking datasets API for easier loading of examples.

Added Timer utility for keeping track of fit times.

Added slides to documentation for teachers teaching ML/Yellowbrick.

Added an FAQ to the documentation.

Manual legend drawing utility.

New examples notebooks for Regression and Clustering.

Example of interactive classification visualization using ipywidgets.

Example of using Yellowbrick with PyTorch.

Repairs to ROCAUC tests and binary/multiclass ROCAUC construction.

Rename tests/random.py to tests/rand.py to prevent NumPy errors.

Improves ROCAUC, KElbowVisualizer, and SilhouetteVisualizer documentation.

Fixed visual display bug in JointPlotVisualizer.

Fixed image in JointPlotVisualizer documentation.

Clear figure option to poof.

Fix color plotting error in residuals plot quick method.

Fixed bugs in KElbowVisualizer, FeatureImportance, Index, and Datasets documentation.

Use LGTM for code quality analysis (replacing Landscape).

Updated contributing docs for better PR workflow.

Submitted JOSS paper.
v0.8(Jul 12, 2018)

Deployed: Thursday, July 12, 2018 Contributors: @bbengfort, @ndanielsen, @rebeccabilbro, @lwgray, @RaulPL, @Kautumn06, @ariley1472, @ralle123, @thekylesaurus, @lumega, @pdamodaran, @chrisfs, @mitevpi, @sayali-sonawane

Major Changes:

Added Support to ClassificationReport (@ariley1472)

We have an updated Image Gallery (@ralle123)

Improved performance of ParallelCoordinates Visualizer (@thekylesaurus)

Added Alpha Transparency to RadViz Visualizer (@lumega)

CVScores Visualizer (@pdamodaran)

Added fast and alpha parameters to ParallelCoordinates visualizer (@bbengfort)

Make support an optional parameter for ClassificationReport (@lwgray)

Bug Fix for Usage of multidimensional arrays in FeatureImportance visualizer (@rebeccabilbro)

Deprecate ScatterVisualizer to contrib (@bbengfort)

Implements histogram alongside ResidualsPlot (@bbengfort)

Adds biplot to the PCADecomposition visualizer (@RaulPL)

Adds Datasaurus Dataset to show importance of visualizing data (@lwgray)

Add DispersionPlot Plot (@lwgray)

Minor Changes:

Fix grammar in tutorial.rst (@chrisfs)

Added Note to tutorial indicating subtle differences when working in Jupyter notebook (@chrisfs)

Update Issue template (@bbengfort)

Added Test to check for NLTK postag data availability (@sayali-sonawane)

Clarify quick start documentation (@mitevpi)

Deprecated DecisionBoundary

Threshold Visualization aliases deprecated
v0.7(May 18, 2018)
Deployed: Thursday, May 17, 2018 Contributors: @bbengfort, @ndanielsen, @rebeccabilbro, @lwgray, @ianozsvald, @jtpio, @bharaniabhishek123, @RaulPL, @tabishsada, @Kautumn06, @NealHumphrey

Changes:

New Feature! Manifold visualizers implement high-dimensional visualization for non-linear structural feature analysis.

New Feature! There is now a model_selection module with LearningCurve and ValidationCurve visualizers.

New Feature! The RFECV (recursive feature elimination) visualizer with cross-validation visualizes how removing the least performing features improves the overall model.

New Feature! The VisualizerGrid is an implementation of the MultipleVisualizer that creates axes for each visualizer using plt.subplots, laying the visualizers out as a grid.

New Feature! Added yellowbrick.datasets to load example datasets.

New Experimental Feature! An experimental StatsModelsWrapper was added to yellowbrick.contrib.statsmodels that allows users to use StatsModels estimators with visualizers.

Enhancement! ClassificationReport documentation to include more details about how to interpret each of the metrics and compare the reports against each other.

Enhancement! Modifies scoring mechanism for regressor visualizers to include the R2 value in the plot itself with the legend.

Enhancement! Updated and renamed ThreshViz to DiscriminationThreshold, which implements a few more discrimination features such as F1 score, maximizing arguments, and annotations.

Enhancement! Update clustering visualizers and corresponding distortion_score to handle sparse matrices.

Added code of conduct to meet the GitHub community guidelines as part of our contributing documentation.

Added is_probabilistic type checker and converted the type checking tests to pytest.

Added a contrib module and DecisionBoundaries visualizer has been moved to it until further work is completed.

Numerous fixes and improvements to documentation and tests. Add academic citation example and Zenodo DOI to the Readme.

Bug Fixes

Adds RandomVisualizer for testing and add it to the VisualizerGrid test cases.

Fix / update tests in tests.test_classifier.test_class_prediction_error.py to remove hardcoded data.

Deprecation Warnings

ScatterPlotVisualizer is being moved to contrib in 0.8

DecisionBoundaryVisualizer is being moved to contrib in 0.8

ThreshViz is renamed to DiscriminationThreshold.

NOTE: These deprecation warnings originally mentioned deprecation in 0.7, but their life was extended by an additional version.
v0.6(Mar 19, 2018)

Deployed: Saturday, March 17, 2018 Contributors: @bbengfort, @ndanielsen, @rebeccabilbro, @lwgray, @Kautumn06, @georgerichardson, @pbs929, @Aylr, @gary-mayfield, @jkeung

Changes

New Feature! The FeatureImportances Visualizer enables the user to visualize the most informative (relative and absolute) features in their model, plotting a bar graph of feature_importances_ or coef_ attributes.

New Feature! The ExplainedVariance Visualizer produces a plot of the explained variance resulting from a dimensionality reduction to help identify the best tradeoff between number of dimensions and amount of information retained from the data.

New Feature! The GridSearchVisualizer creates a color plot showing the best grid search scores across two parameters.

New Feature! The ClassPredictionError Visualizer is a heatmap implementation of the class balance visualizer, which provides a way to quickly understand how successfully your classifier is predicting the correct classes.

New Feature! The ThresholdVisualizer allows the user to visualize the bounds of precision, recall and queue rate at different thresholds for binary targets after a given number of trials.

New MultiFeatureVisualizer helper class to provide base functionality for getting the names of features for use in plot annotation.

Adds font size param to the confusion matrix to adjust its visibility.

Add quick method to the confusion matrix

Tests: In this version, we've switched from using nose to pytest. Image comparison tests have been added and the visual tests are updated to matplotlib 2.2.0. Test coverage has also been improved for a number of visualizers, including JointPlot, AlphaPlot, FreqDist, RadViz, ElbowPlot, SilhouettePlot, ConfusionMatrix, Rank1D, and Rank2D.

Documentation updates, including discussion of Image Comparison Tests for contributors.
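
To make the ThresholdVisualizer entry above concrete, here is a small sketch of the three quantities it plots, evaluated at a single threshold. The function name threshold_metrics is hypothetical, and the definitions are the usual ones (queue rate is taken to be the fraction of instances scored at or above the threshold); this is an assumption for illustration rather than a copy of the Yellowbrick implementation.

```python
def threshold_metrics(y_true, scores, threshold):
    """Return (precision, recall, queue_rate) for a binary target."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, y_true) if f and y == 1)
    n_flagged = sum(flagged)
    n_positive = sum(1 for y in y_true if y == 1)

    # Guard against division by zero when nothing is flagged or positive.
    precision = tp / n_flagged if n_flagged else 0.0
    recall = tp / n_positive if n_positive else 0.0
    queue_rate = n_flagged / len(scores)
    return precision, recall, queue_rate


y_true = [0, 0, 1, 1, 1]
scores = [0.1, 0.6, 0.4, 0.7, 0.9]

precision, recall, queue_rate = threshold_metrics(y_true, scores, 0.5)
```

Sweeping the threshold from 0 to 1 and repeating over several trials would give the curves and bounds that the visualizer draws.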

Bug Fixes:

Fixes the resolve_colors function. You can now pass in a number of colors and a colormap and get back the correct number of colors.

Fixes TSNEVisualizer Value Error when no classes are specified.

Adds the circle back to RadViz! This visualizer has also been updated to ensure there's a visualization even when there are missing values

Updated RocAuc to correctly check the number of classes

Switch from converting structured arrays to ndarrays using np.copy instead of np.tolist to avoid NumPy deprecation warning.

DataVisualizer updated to remove np.nan values and warn the user that nans are not plotted.

ClassificationReport no longer has lines that run through the numbers, is more grid-like

Deprecation Warnings:

ScatterPlotVisualizer is being moved to contrib in 0.7

DecisionBoundaryVisualizer is being moved to contrib in 0.7

v0.5(Aug 9, 2017)
Deployed: Wednesday, August 9, 2017 Contributors: @bbengfort, @rebeccabilbro, @ndanielsen, @cjmorale, @JimStearns206, @pbs929, @jkeung

Changes

Added VisualTestCase.

New PCADecomposition Visualizer, which decomposes high dimensional data into two or three dimensions so that each instance can be plotted in a scatter plot.

New and improved ROCAUC Visualizer, which now supports multiclass classification.

Prototype Decision Boundary Visualizer, which is a bivariate data visualization algorithm that plots the decision boundaries of each class.

Added Rank1D Visualizer, which is a one dimensional ranking of features that utilizes the Shapiro-Wilk ranking, taking into account only a single feature at a time (e.g. histogram analysis).

Improved Prediction Error Plot with identity line, shared limits, and r squared.

Updated FreqDist Visualizer to make word features a hyperparameter.

Added normalization and scaling to Parallel Coordinates.

Added Learning Curve Visualizer, which displays a learning curve based on the number of samples versus the training and cross validation scores to show how a model learns and improves with experience.

Added data downloader module to the yellowbrick library.

Complete overhaul of the yellowbrick documentation; categories of methods are located in separate pages to make it easier to read and contribute to the documentation.

Added a new color palette inspired by ANN-generated colors

Bug Fixes:

Repairs to PCA, RadViz, FreqDist unit tests

Repair to matplotlib version check in JointPlot Visualizer

v0.4.2(May 22, 2017)
Update to the deployment docs and package on both Anaconda and PyPI.

Deployed: Monday, May 22, 2017

Contributors: @bbengfort, @jkeung

v0.4.1(May 22, 2017)
This release is an intermediate version bump in anticipation of the PyCon 2017 sprints.

The primary goals of this version were to (1) update the Yellowbrick dependencies (2) enhance the Yellowbrick documentation to help orient new users and contributors, and (3) make several small additions and upgrades (e.g. pulling the Yellowbrick utils into a standalone module).

We have updated the Scikit-Learn and SciPy dependencies from version 0.17.1 or later to 0.18 or later. This primarily entails replacing the import statement "from sklearn.cross_validation import train_test_split" with "from sklearn.model_selection import train_test_split".

The updates to the documentation include new Quickstart and Installation guides as well as updates to the Contributors documentation, which is modeled on the Scikit-Learn contributing documentation.

This version also included upgrades to the KMeans visualizer, which now supports not only silhouette_score but also distortion_score and calinski_harabaz_score. The distortion_score computes the mean distortion of all samples as the sum of the squared distances between each observation and its closest centroid. This is the metric that K-Means attempts to minimize as it is fitting the model. The calinski_harabaz_score is defined as the ratio of the between-cluster dispersion to the within-cluster dispersion.
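
A small worked example may help make the two scores concrete. The pure-Python sketch below is illustrative only: the function names are hypothetical (they are not the Yellowbrick or scikit-learn functions), the distortion follows the sum-of-squared-distances description above, and the second function uses the standard Calinski-Harabasz form in which each dispersion is scaled by its degrees of freedom.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def distortion(X, centroids):
    """Sum of squared distances from each sample to its closest centroid."""
    return sum(min(sq_dist(x, c) for c in centroids) for x in X)


def calinski_harabasz(X, labels):
    """Between-cluster over within-cluster dispersion, each scaled by its
    degrees of freedom: (B / (k - 1)) / (W / (n - k))."""
    n, overall = len(X), mean_point(X)
    clusters = {}
    for x, label in zip(X, labels):
        clusters.setdefault(label, []).append(x)
    k = len(clusters)
    between = within = 0.0
    for members in clusters.values():
        center = mean_point(members)
        between += len(members) * sq_dist(center, overall)
        within += sum(sq_dist(x, center) for x in members)
    return (between / (k - 1)) / (within / (n - k))


# Two tight, well separated clusters of two points each.
X = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
labels = [0, 0, 1, 1]
centroids = [[0.0, 1.0], [10.0, 1.0]]

d = distortion(X, centroids)       # lower is better
ch = calinski_harabasz(X, labels)  # higher is better
```

Lower distortion and a higher Calinski-Harabasz value both indicate tighter, better separated clusters, which is why the two curves move in opposite directions when choosing K.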

Finally, this release includes a prototype of the VisualPipeline, which extends Scikit-Learn's Pipeline class, allowing multiple Visualizers to be chained or sequenced together.

Deployed: Monday, May 22, 2017 Contributors: @bbengfort, @rebeccabilbro, @ndanielsen

Changes

Score and model visualizers now wrap estimators as proxies so that all methods on the estimator can be directly accessed from the visualizer

Updated Scikit-learn dependency from >=0.17.1 to >=0.18

Replaced sklearn.cross_validation with model_selection

Updated SciPy dependency from >=0.17.1 to >=0.18

ScoreVisualizer now subclasses ModelVisualizer, a step towards allowing both fitted and unfitted models to be passed to Visualizers

Added CI tests for Python 3.6 compatibility

Added new quickstart guide and install instructions

Updates to the contributors documentation

Added distortion_score and calinski_harabaz_score computations and visualizations to KMeans visualizer.

Replaced the self.ax property on all of the individual draw methods with a new property on the Visualizer class that ensures all visualizers automatically have axes.

Refactored the utils module into a package

Continuing to update the docstrings to conform to Sphinx

Added a prototype visual pipeline class that extends the Scikit-learn pipeline class to ensure that visualizers get called correctly.
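The first change in the list above, wrapping estimators as proxies, can be illustrated with attribute delegation. This is a minimal sketch of the general pattern and not Yellowbrick's actual code: EstimatorProxy and MeanRegressor are hypothetical classes written for this example.

```python
class EstimatorProxy:
    """Hypothetical visualizer-like wrapper that forwards to an estimator."""

    def __init__(self, estimator):
        self.estimator = estimator

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup on the wrapper fails.
        # Guard against recursion if `estimator` itself is not set yet.
        if name == "estimator":
            raise AttributeError(name)
        return getattr(self.estimator, name)

    def draw(self):
        # Defined on the wrapper, so it is never delegated.
        return "drawing {}".format(type(self.estimator).__name__)


class MeanRegressor:
    """Toy estimator that always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


if __name__ == "__main__":
    viz = EstimatorProxy(MeanRegressor())
    viz.fit([[1], [2]], [2.0, 4.0])   # forwarded to the estimator
    print(viz.predict([[3]]))         # [3.0]
    print(viz.mean_)                  # 3.0, a fitted attribute on the estimator
```

With this pattern, methods and fitted attributes of the wrapped model are reachable from the wrapper without writing a pass-through for each one.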

Bug Fixes:

Fixed title bug in Rank2D FeatureVisualizer

v0.4(May 4, 2017)
This release is the culmination of the Spring 2017 DDL Research Labs that focused on developing Yellowbrick as a community effort guided by a sprint/agile workflow. We added several more visualizers, did a lot of user testing and bug fixes, updated the documentation, and generally discovered how best to make Yellowbrick a friendly project to contribute to.

Notable in this release is the inclusion of two new feature visualizers that use few, simple dimensions to visualize features against the target. The JointPlotVisualizer graphs a scatter plot of two dimensions in the data set and plots a best fit line across it. The ScatterVisualizer also uses two features, but also colors the graph by the target variable, adding a third dimension to the visualization.
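The best fit line mentioned for the JointPlotVisualizer can be understood as an ordinary least-squares fit of one feature against another. The sketch below shows that calculation in plain Python; best_fit_line is a hypothetical helper, and the assumption that the fit is a simple linear least-squares fit is ours, not something the release notes state.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Points that lie exactly on y = 2x + 1.
    print(best_fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```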

This release also adds support for clustering visualizations, namely the elbow method for selecting K with the KElbowVisualizer and a visualization of cluster size and density with the SilhouetteVisualizer. The release also adds support for regularization analysis using the AlphaSelection visualizer. Both the text and classification modules were also improved with the inclusion of the PosTagVisualizer and the ConfusionMatrix visualizer, respectively.
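As background for the silhouette visualization, the following sketch computes per-sample silhouette coefficients in plain Python. It follows the standard definition, s = (b - a) / max(a, b), where a is the mean distance to the other members of a sample's own cluster and b is the lowest mean distance to any other cluster. The function names are hypothetical, and this is not the code the SilhouetteVisualizer uses.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_samples(points, labels):
    """Return one silhouette coefficient per sample, in input order."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least 2 clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Common convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples."""
    scores = silhouette_samples(points, labels)
    return sum(scores) / len(scores)
```

Values near 1 indicate samples that sit well inside their cluster, while values near 0 or below suggest overlapping or misassigned samples.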

This release also added an Anaconda repository and distribution so that users can conda install yellowbrick. Even more notable, we got yellowbrick stickers! We've also updated the documentation to make it friendlier and a bit more visual, and fixed the API rendering errors. All in all, this was a big release with a lot of contributions, and we thank everyone who participated in the lab!

Deployed: Thursday, May 4, 2017

Contributors: @bbengfort, @rebeccabilbro, @ndanielsen, @mattandahalfew, @pdamodaran, @NealHumphrey, @jkeung, @balavenkatesan, @pbwitt, @morganmendis, @tuulihill

Changes

Part of speech tags visualizer -- PosTagVisualizer.

Alpha selection visualizer for regularized regression -- AlphaSelection

Confusion Matrix Visualizer -- ConfusionMatrix

Elbow method for selecting K vis -- KElbowVisualizer

Silhouette score cluster visualization -- SilhouetteVisualizer

Joint plot visualizer with best fit -- JointPlotVisualizer

Scatter visualization of features -- ScatterVisualizer

Added three more example datasets: mushroom, game, and bike share

Contributor's documentation and style guide

Maintainers listing and contacts

Light/Dark background color selection utility

Structured array detection utility

Updated classification report to use colormesh

Added Anaconda packaging and distribution

Refactoring of the regression, cluster, and classification modules

Image based testing methodology

Docstrings updated to a uniform style and rendering

Submission of several more user studies

v0.3.3(Feb 22, 2017)
Intermediate sprint to demonstrate prototype implementations of text visualizers for NLP models. Primary contributions were the FreqDistVisualizer and the TSNEVisualizer.

The TSNEVisualizer displays a projection of a vectorized corpus in two dimensions using TSNE, a nonlinear dimensionality reduction method that is particularly well suited to embedding in two or three dimensions for visualization as a scatter plot. TSNE is widely used in text analysis to show clusters or groups of documents or utterances and their relative proximities.

The FreqDistVisualizer implements a frequency distribution plot that tells us the frequency of each vocabulary item in the text. In general, it could count any kind of observable event. It is a distribution because it tells us how the total number of word tokens in the text is distributed across the vocabulary items.
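The counting behind a frequency distribution is simple to sketch with the standard library. The example below is not the FreqDistVisualizer itself; frequency_distribution is a hypothetical helper, and whitespace tokenization is an assumption made to keep the example short.

```python
from collections import Counter


def frequency_distribution(tokens, n=None):
    """Return (token, count) pairs, most frequent first.

    With n=None every vocabulary item is returned.
    """
    return Counter(tokens).most_common(n)


tokens = "the quick brown fox jumps over the lazy dog the end".split()
top = frequency_distribution(tokens, n=1)      # [("the", 3)]
full = frequency_distribution(tokens)

# The counts over the vocabulary add up to the total number of tokens,
# which is what makes this a distribution of tokens across vocabulary items.
total = sum(count for _, count in full)
```

A bar chart of the top pairs is essentially what a frequency distribution plot displays.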

Deployed: Wednesday, February 22, 2017

Contributors: @rebeccabilbro, @bbengfort

Changes

TSNEVisualizer for 2D projections of vectorized documents

FreqDistVisualizer for token frequency of text in a corpus

Added the user testing evaluation to the documentation

Created scikit-yb.org and hosted the documentation there with RTD (Read the Docs)

Created a sample corpus and text examples notebook

Created a base class for text, TextVisualizer

Model selection tutorial using Mushroom Dataset

Created a text examples notebook but have not yet added it to the documentation.

v0.3.2(Jan 20, 2017)
Hardened the Yellowbrick API to elevate the idea of a Visualizer to a first principle. This included reconciling shifts in the development of the preliminary versions to the new API, formalizing Visualizer methods like draw() and finalize(), and adding utilities that revolve around Scikit-Learn. To that end we also performed administrative tasks like refreshing the documentation and preparing the repository for more and varied open source contributions.
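The division of labour between draw() and finalize() can be sketched without any plotting library. The class below is a hypothetical stand-in that records drawing steps instead of rendering them. The assumptions here are that fit() triggers draw(), and that a render step (named render() only for this example) calls finalize() first.

```python
class SketchVisualizer:
    """Hypothetical visualizer that records drawing steps instead of plotting."""

    def __init__(self, title="Sketch"):
        self.title = title
        self.layers = []        # data drawn so far, one entry per draw() call
        self.finalized = False

    def fit(self, X, y=None):
        # Fitting looks at the data and then draws it.
        self.draw(X)
        return self

    def draw(self, X):
        # May be called several times; each call adds to the same figure.
        self.layers.append(list(X))
        return self.layers

    def finalize(self):
        # Titles, legends, and labels are applied once, after all drawing.
        self.finalized = True
        return "{} ({} layer(s))".format(self.title, len(self.layers))

    def render(self):
        # Complete the drawing before showing or saving it.
        return self.finalize()


if __name__ == "__main__":
    viz = SketchVisualizer("Demo").fit([1, 2])
    viz.draw([3])            # a second layer on the same figure
    print(viz.render())      # Demo (2 layer(s))
```

Keeping the finishing touches in one method means repeated draw() calls never duplicate titles or legends.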

Deployed: Friday, January 20, 2017

Contributors: @bbengfort, @rebeccabilbro, @StampedPassp0rt

Changes

Converted Mkdocs documentation to Sphinx documentation

Updated docstrings for all Visualizers and functions

Created a DataVisualizer base class for dataset visualization

Single call functions for simple visualizer interaction

Added yellowbrick specific color sequences and palettes and env handling

More robust examples with downloader from DDL host

Better axes handling in visualizer, matplotlib/sklearn integration

Added a finalize method to complete drawing before render

Improved testing on real data sets from examples

Bugfixes

Score visualizer renders in notebook but not in Python scripts.

Tests updated to support new API

v0.3.1a2(Oct 13, 2016)
Hotfix to solve pip install issues with Yellowbrick.

Deployed: Monday, October 10, 2016

Contributors: Benjamin Bengfort

Changes

Modified packaging and wheel for Python 2.7 and 3.5 compatibility

Modified deployment to PyPI and pip install ability

Fixed Travis-CI tests with the backend failures.

v0.3a1(Oct 13, 2016)
This release marks a major change from the previous MVP releases as Yellowbrick moves towards direct integration with Scikit-Learn for visual diagnostics and steering of machine learning and could therefore be considered the first alpha release of the library. To that end we have created a Visualizer model which extends sklearn.base.BaseEstimator and can be used directly in the ML Pipeline. There are a number of visualizers that can be used throughout the model selection process:

Feature Analysis

Model Selection

Hyperparameter Tuning

In this release specifically we focused on visualizers in the data space for feature analysis and visualizers in the model space for scoring and evaluating models. Future releases will extend these base classes and add more functionality.

Deployed: Sunday, October 9, 2016

Contributors: Benjamin Bengfort, Rebecca Bilbro, Marius van Niekerk

Enhancements

Created an API for visualization with machine learning: Visualizers that are BaseEstimators.

Created a class hierarchy for Visualizers throughout the ML process particularly feature analysis and model evaluation

The Visualizer interface consists of a draw method, which can be called multiple times on data or model spaces, and a poof method, which finalizes the figure and displays it or saves it to disk.

ScoreVisualizers wrap Scikit-Learn estimators and implement fit and predict (pass-throughs to the estimator) and also score which calls draw in order to visually score the estimator. If the estimator isn't appropriate for the scoring method an exception is raised.

ROCAUC is a ScoreVisualizer that plots the receiver operating characteristic curve and displays the area under the curve score.

ClassificationReport is a ScoreVisualizer that renders the classification report of a classifier (per-class precision, recall, and F1 score) as a heatmap.

PredictionError is a ScoreVisualizer that plots the actual vs. predicted values and the 45 degree accuracy line for regressors.

ResidualPlot is a ScoreVisualizer that plots the residuals (y - yhat) across the actual values (y) with the zero accuracy line for both train and test sets.

ClassBalance is a ScoreVisualizer that displays the support for each class as a bar plot.

FeatureVisualizers are Scikit-Learn Transformers that implement fit and transform and operate on the data space, calling draw to display instances.

ParallelCoordinates plots instances with class across each feature dimension as line segments across a horizontal space.

RadViz plots instances with class in a circular space where each feature dimension is an arc around the circumference and points are plotted relative to the weight of the feature.

Rank2D plots pairwise scores of features as a heatmap in the space [-1, 1] to show relative importance of features. Currently implemented ranking functions are Pearson correlation and covariance.

Coordinated and added palettes in the bgrmyck space and implemented a version of the Seaborn set_palette and set_color_codes functions as well as the ColorPalette object and other matplotlib.rc modifications.

Inherited Seaborn's notebook context and whitegrid axes style and made them the default; users cannot modify them from Yellowbrick (to do so, they will have to import Seaborn). This gives Yellowbrick a consistent look and feel without giving too much work to the user and prepares us for Matplotlib 2.0.

Jupyter Notebook with Examples of all Visualizers and usage.
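The pairwise ranking described in the Rank2D entry above can be sketched with Pearson correlation in plain Python. This is an illustration of the calculation only, not Rank2D's implementation: pearson and rank2d_scores are hypothetical names, and the input is assumed to be a dict that maps feature names to equal-length lists of numbers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / math.sqrt(var_x * var_y)


def rank2d_scores(columns):
    """Score every pair of features once, as in a lower-left triangle.

    Keys are (row_feature, column_feature) with the column feature
    appearing earlier than the row feature in `columns`.
    """
    names = list(columns)
    return {
        (row, col): pearson(columns[row], columns[col])
        for i, row in enumerate(names)
        for col in names[:i]
    }


if __name__ == "__main__":
    data = {"x": [1, 2, 3, 4], "double": [2, 4, 6, 8], "neg": [4, 3, 2, 1]}
    for pair, score in rank2d_scores(data).items():
        print(pair, round(score, 3))
```

Each score falls in [-1, 1], so the result maps directly onto a diverging heatmap.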

Bug Fixes

Fixed Travis-CI test failures with matplotlib.use('Agg').

Fixed broken link to Quickstart on README

Refactor of the original API to the Scikit-Learn Visualizer API

v0.2(Sep 4, 2016)
Intermediate steps towards a complete API for visualization. Preparatory stages for Scikit-Learn visual pipelines.

Deployed: Sunday, September 4, 2016

Contributors: Benjamin Bengfort, Rebecca Bilbro, Patrick O'Melveny, Ellen Lowy, Laura Lorenz

Changes

Continued attempts to fix the Travis-CI Scipy install failure (broken tests)

Utility function: get the name of the model

Specified a class based API and the basic interface (render, draw, fit, predict, score)

Added more documentation, converted to Sphinx, autodoc, docstrings for viz methods, and a quickstart

How to contribute documentation, repo images etc.

Prediction error plot for regressors (mvp)

Residuals plot for regressors (mvp)

Basic style settings a la seaborn

ROC/AUC plot for classifiers (mvp)

Best fit functions for "select best", linear, quadratic

Several Jupyter notebooks for examples and demonstrations

v0.1(May 18, 2016)
Created the yellowbrick library MVP with two primary operations: a classification report heat map and a ROC/AUC curve model analysis for classifiers. This is the base package deployment for continuing yellowbrick development.

Deployed: Wednesday, May 18, 2016

Contributors: Benjamin Bengfort, Rebecca Bilbro

Changes

Created the anscombe quartet visualization example

Added DDL specific color maps and a stub for more style handling

Created crplot which visualizes the confusion matrix of a classifier

Created rocplot_compare which compares two classifiers using ROC/AUC metrics

Stub tests/stub documentation



Owner

District Data Labs

Python implementation of R package breakDown

Implementation of linear CorEx and temporal CorEx.

Lucid library adapted for PyTorch

A library for debugging/inspecting machine learning classifiers and explaining their predictions

A python library for decision tree visualization and model interpretation.

treeinterpreter - Interpreting scikit-learn's decision tree and random forest predictions.

Pytorch Feature Map Extractor

Many Class Activation Map methods implemented in Pytorch for CNNs and Vision Transformers. Including Grad-CAM, Grad-CAM++, Score-CAM, Ablation-CAM and XGrad-CAM