A Python package to facilitate research on building and evaluating automated scoring models.

Last update: Oct 10, 2022

Overview

Rater Scoring Modeling Tool

Introduction

Automated scoring of written and spoken test responses is a growing field in educational natural language processing. Automated scoring engines employ machine learning models to predict scores for such responses based on features extracted from the text/audio of these responses. Examples of automated scoring engines include Project Essay Grade for written responses and SpeechRater for spoken responses.

Rater Scoring Modeling Tool (RSMTool) is a Python package that automates and combines, in a single pipeline, multiple analyses that are commonly conducted when building and evaluating such scoring models. The output of RSMTool is a comprehensive, customizable HTML statistical report that contains the output of these analyses. While RSMTool makes it simple to run a set of standard analyses using a single command, it is also fully customizable and allows users to easily exclude unneeded analyses, modify the default analyses, and even include custom analyses in the report.

We expect the primary users of RSMTool to be researchers working on developing new automated scoring engines or on improving existing ones. Note that RSMTool is not a scoring engine by itself but rather a tool for building and evaluating machine learning models that may be used in such engines.

RSMTool is driven by a configuration file that users have to supply. Given the large number of available options, this can get complicated especially for new users. That's why RSMTool can help users generate configuration files interactively via guided prompts! The video below demonstrates this feature.

Getting Started

To get started with RSMTool, please see the extensive official documentation. If you use the Dash app on macOS, you can also download the complete RSMTool documentation for offline use. Go to the Dash preferences, click on "Downloads", then "User Contributed", and search for "RSMTool".

Requirements

Python >=3.7, <3.10
numpy
scipy
scikit-learn
statsmodels
skll
pandas
ipython
jupyter
notebook
seaborn

Contributing

Contributions to RSMTool are very welcome. Please refer to the documentation for how to get started on developing new features or functionality for RSMTool.

Citing

If you are using RSMTool in your work, you can cite it as follows:

MLA

Madnani, Nitin and Loukina, Anastassia. "RSMTool: A Collection of Tools for Building and Evaluating Automated Scoring Models". Journal of Open Source Software 1(3), 2016.

BibTeX

@article{MadnaniLoukina2016,
  doi = {10.21105/joss.00033},
  url = {http://dx.doi.org/10.21105/joss.00033},
  year  = {2016},
  month = {jul},
  publisher = {The Open Journal},
  volume = {1},
  number = {3},
  author = {Nitin Madnani and Anastassia Loukina},
  title = {{RSMTool}: A Collection of Tools for Building and Evaluating Automated Scoring Models},
  journal = {{Journal of Open Source Software}}
}

Changelog

See GitHub Releases.

Comments

Fix CmdOption on Python 3.6

This PR fixes #401.

It does so by adding a workaround for the defaults keyword argument for collections.namedtuple that is only supported in Python 3.7+. It also replaces the capture_output keyword argument for subprocess.run() calls that is used in test_cli.py.
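The two workarounds mentioned above can be sketched as follows. This is a minimal illustration, not the code from the PR: the field names given to CmdOption here are hypothetical, and the child command is a stand-in for whatever test_cli.py actually runs.

```python
import subprocess
import sys
from collections import namedtuple

# Hypothetical fields; the real CmdOption in RSMTool may define different ones.
CmdOption = namedtuple("CmdOption", ["dest", "help", "shortname", "required"])

# Python 3.6-compatible equivalent of
# namedtuple(..., defaults=(None, False)), which needs Python 3.7+.
# The defaults apply to the rightmost fields.
CmdOption.__new__.__defaults__ = (None, False)

option = CmdOption(dest="output_dir", help="where to write the output")

# Python 3.6-compatible equivalent of subprocess.run(..., capture_output=True).
result = subprocess.run(
    [sys.executable, "-c", "print('ok')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
```

Setting `__new__.__defaults__` is the usual pre-3.7 idiom for namedtuple defaults; passing `stdout` and `stderr` as pipes is what `capture_output=True` does internally.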

It also makes sure that Travis CI builds use Python 3.6 and that Azure ones use Python 3.8. This ensures that we are covering the two Python versions most important to us: 3.6 for internal use and the latest (3.8) for external use.

Update: 3.8 support is broken right now (see #403). So, we have to make do with Python 3.7 instead of 3.8 in Azure.

opened by desilinguist 12
Wishlist for interactive generation
A few things to add to interactive generation:

Add -i flag (in addition to --interactive)

Make subgroups an option rather than flag: "Do you have subgroups?" I always forget to set the flag.

Add a question at the end: "Save output to a file?"

enhancement
opened by aloukina 11
Add interactive configuration generation
This PR closes #354 and #397.

It adds interactive configuration generation for all of the tools. Non-interactive or batch-mode configuration generation was previously added by #398.

To try out interactive generation, just run X generate --interactive where X is any of the five RSM command line tools. You can also add --subgroups for X = rsmtool, rsmeval, and rsmcompare.

The actual interaction bit is mostly handled by the excellent prompt_toolkit library. Here's what this PR does specifically:

The configuration generation (both non-interactive and interactive) is handled by a class called rsmtool.utils.commandline.ConfigurationGenerator. The generate() method handles the non-interactive generation and the interact() method handles the interactive generation.

All of the required fields for the 5 tools and some of the more important optional fields are displayed to the user to provide inputs for them. The appropriate auto-completion and value validation are attached to each field so that the user is shown appropriate options just like in interactive IPython sessions (which is where prompt_toolkit was spun out of). All of this is handled by the new class rsmtool.utils.commandline.InteractiveField. Essentially, when the interact() method is called, an InteractiveField instance is created for each of the chosen fields with the appropriate completer and validator attached and shown to the user. The values input by the user are then used to create a configuration dictionary, which is eventually output as a string.

The chosen fields are contained in rsmtool.utils.constants.INTERACTIVE_MODE_METADATA which also defines the metadata for each field. This metadata is used to structure the interaction model (what to display, how to complete, how to validate etc.). The metadata consists of:

the label to be shown for the field in interactive mode,

the type of the field

whether the field takes a single value or multiple values (think "subgroups")

whether the field takes its value from a fixed list of choices
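To make the four pieces of metadata listed above concrete, here is a rough sketch of how such a structure could drive parsing and validation of a user's answer. The dictionary layout, the key names, and the parse_answer helper are all hypothetical; the real INTERACTIVE_MODE_METADATA may be organized differently.

```python
# Hypothetical layout: label, type, single/multiple, optional fixed choices.
FIELD_METADATA = {
    "experiment_id": {"label": "Experiment ID", "type": "id", "count": "single"},
    "model": {
        "label": "Model to use",
        "type": "choice",
        "count": "single",
        "choices": ["LinearRegression", "SVR"],  # example values only
    },
    "subgroups": {"label": "Subgroup columns", "type": "text", "count": "multiple"},
}


def parse_answer(field_name, raw_text):
    """Turn the raw text typed by the user into a validated field value."""
    meta = FIELD_METADATA[field_name]
    if meta["count"] == "multiple":
        # Multi-valued fields are entered as a comma-separated list.
        values = [part.strip() for part in raw_text.split(",") if part.strip()]
    else:
        values = [raw_text.strip()]

    choices = meta.get("choices")
    if choices is not None:
        invalid = [value for value in values if value not in choices]
        if invalid:
            raise ValueError(f"invalid value(s) for {field_name}: {invalid}")

    return values if meta["count"] == "multiple" else values[0]
```

For example, `parse_answer("subgroups", "L1, gender")` returns a list of two column names, while an unknown model name raises a ValueError.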

Testing for this specific functionality was quite tricky. We don't really want to test the prompt_toolkit functionality as that is already tested outside of our code. So, in keeping with the best practices of unit testing, we use mocking to mock the functionality provided by prompt_toolkit and only test that our code is doing the right thing. The new testing classes added here are test_utils.TestInteractiveField and test_utils.TestInteractiveGenerate. The first class checks that InteractiveField works as expected and the second checks that the ConfigurationGenerator.interact() works as expected. Between the two of them, we are covering the whole functionality.
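The mocking approach described above might look roughly like this. To keep the sketch self-contained, the built-in input() stands in for the prompt_toolkit call, and ask_for_experiment_id is a hypothetical helper rather than anything in RSMTool.

```python
from unittest.mock import patch


def ask_for_experiment_id():
    """Hypothetical helper: read a value from the user and normalize it."""
    return input("Experiment ID: ").strip()


def check_ask_for_experiment_id():
    # Replace the interactive call with a mock so that only our own
    # logic (the whitespace stripping) is under test.
    with patch("builtins.input", return_value="  ASAP2 ") as mocked_input:
        answer = ask_for_experiment_id()
    mocked_input.assert_called_once_with("Experiment ID: ")
    return answer
```

The test never touches a real terminal: the mock supplies the "typed" text, and the assertions cover only the code that wraps the prompt.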

This PR also adds documentation as follows: it adds a main page at the top level called "Auto-generating configuration files" that contains documentation for interactive, non-interactive generation (which was previously in each tool's configuration file page), and also API usage. I also added multiple screencasts that illustrate how the interactive generation works. Finally, I added links to this top-level page from the configuration file pages of all 5 tools and also all 5 tutorials. I updated the API documentation to include the documentation for the ConfigurationGenerator class and its generate() method and pointed to that from the top-level page too.

This PR also removes all imports from __init__.py other than the 5 main API functions. This meant minor changes to the API documentation.
opened by desilinguist 10
Deprecation warning for xlsx

We get the following deprecation warning when using xlsx:

python3.7/site-packages/xlrd/xlsx.py:266: PendingDeprecationWarning: This method will be removed in future versions. Use 'tree.iter()' or 'list(tree.iter())' instead.
enhancement

opened by aloukina 10
Make check in RSMTool more explicit

There's currently a place in RSMTool where we raise a ValueError if a coefficients file already exists in the output directory. There's no documentation as to why this is being checked.

From @aloukina, this is why we do this:

This is what I think it’s doing in a roundabout way: it uses the existence of coefficients file as a shortcut to establish that we are dealing with a linear model. If you first run a LR experiment and then followed it with SVR using the same id and the same output_directory, you’ll get this error. The coefficients file will be there but the modeler will not be able to scale them because these lines will fail.

We should replace this roundabout check with a much more explicit check, e.g., simply testing whether predconfig.get_coefficients() fails. That would make it much more readable.
bug

opened by desilinguist 8
Add config generator

Several users expressed interest in a config generator. The concern is that it is sometimes hard to remember all the fields and their names.

We could have a script that you run as

generate_config rsmtool --groups --consistency >config.json

which generates a json file with all fields. The defaults would be prefilled and the obligatory fields will be filled with "ENTER_INFO_HERE". It would also be really helpful to list all sections since it will make it easy to delete the ones you don't want to run.
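A minimal sketch of what such a generator could do is shown below. The field lists are illustrative and far from complete, the default values are assumptions, and generate_config is a hypothetical function name, not an existing RSMTool API.

```python
import json

# Illustrative subsets only; the real configuration has many more fields.
REQUIRED_FIELDS = ["experiment_id", "model", "train_file", "test_file"]
DEFAULT_FIELDS = {"id_column": "spkitemid", "train_label_column": "sc1"}  # assumed defaults
PLACEHOLDER = "ENTER_INFO_HERE"


def generate_config(groups=False):
    """Return a JSON string with placeholders for the obligatory fields."""
    config = {field: PLACEHOLDER for field in REQUIRED_FIELDS}
    config.update(DEFAULT_FIELDS)
    if groups:
        # Only emitted when the user asks for subgroup analyses.
        config["subgroups"] = [PLACEHOLDER]
    return json.dumps(config, indent=4)


print(generate_config(groups=True))
```

Returning a string rather than writing a file keeps the same function usable from both a command-line wrapper and a notebook.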

In addition to CL script, I'd love this functionality exposed via API for using in Python notebooks.

The --groups and --consistency flags are necessary since some of the fields are conditioned on whether you want these.

Another idea we discussed is to have the script prompt for user input (e.g. to allow tab completion for paths).
enhancement

opened by aloukina 8
Fixed Regex in parse_json_with_comments

Regex to remove comments from JSON file in parse_json_with_comments was incorrectly matching URLs like "https://...".

Fixed by adding negative lookbehind to regex.

Added additional unit tests for parse_json_with_comments

Closes #492.
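The idea behind the fix can be illustrated with a small sketch. The pattern below is an assumption made for illustration and is not necessarily the regex used in parse_json_with_comments; strip_comments is likewise a hypothetical helper.

```python
import json
import re

# "//" starts a comment unless it is immediately preceded by ":",
# which is what protects URLs such as "https://...".
COMMENT_RE = re.compile(r"(?<!:)//.*$", re.MULTILINE)


def strip_comments(text):
    """Remove //-style comments from JSON-like text."""
    return COMMENT_RE.sub("", text)


raw = """{
  // experiment settings
  "experiment_id": "demo",  // trailing comment
  "docs": "https://example.com/rsmtool"
}"""

parsed = json.loads(strip_comments(raw))
```

Without the `(?<!:)` lookbehind, everything after `https:` on the last line would be deleted and the JSON would no longer parse. A simple pattern like this one would still misfire on a `//` inside other string values, so it is only a sketch of the lookbehind technique.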

opened by Frost45 7
LinearSVR fails to converge in new version

After switching to scipy 1.4.1 we get the following warning in tests:

envs/rsmtool_skll/lib/python3.7/site-packages/sklearn/svm/base.py:929: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.

I have re-run the tests using old environment and can confirm that this warning is new.

opened by aloukina 7
Update conda recipe and requirements
This PR is intended to address #297, but it includes several additional changes that should be reviewed closely.

Here is a quick rundown of the changes:

Removed conda_requirements.txt. Previously, we were maintaining two sets of requirements: requirements.txt and conda_requirements.txt. The second was being used to create conda environments (using conda create -n foo --file conda_requirements.txt). However, since these sets of requirements were virtually identical and both can be used to create conda environments, I discarded the conda_requirements.txt file. The one major difference between these files is that conda_requirements.txt had zeromq. This cannot be in the requirements.txt file, since it leads to an error when using pip install -e .. I think this is all right, since it's not entirely necessary (and may be installed by Jupyter anyway?), but we can add it to the meta.yaml separately, if it's a problem. (See below on the updates to meta.yaml.)

Switched to requirements.txt in .travis.yml. Related to the above, our travis.yml file had previously used conda_requirements.txt to set up the conda environment for our builds. This has been changed to use requirements.txt in our Travis build.

Updated requirements.txt. Some minor tweaks were also made to the requirements.txt file, mostly reordering the list of requirements. I also removed scikit-learn (since this is already a skll dependency).

Updated meta.yaml to use setup.py. Previously, the meta.yaml file was listing all of the requirements manually, which meant we had to ensure these requirements were always up-to-date. I changed this configuration file to instead rely on the setup.py data, and pull the requirements directly from install_requires. This update takes advantage of Jinja templating. I tested this on my Mac, and it worked. I am going to test it on a Windows machine today, and it would probably be good to have others test it on their machines, as well.

Re-generated environment.yml. Finally, I re-generated the environment.yml to ensure that it was consistent with our requirements.txt file.

Add missing test file. Added test_container.py to .travis.yml and Azure, per #310.

enhancement
opened by jbiggsets 7
Better handling of `.filepath` attribute of `Configuration`

Currently if Configuration is loaded from the dictionary the .filepath attribute is set to None. This breaks things in the Preprocessor which calls for configdir = dirname(abspath(config_obj.filepath)) (l. 1565).

We probably want to set .filepath to cwd?
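The failure and the suggested fallback can be sketched as follows. get_configdir is a hypothetical helper used only for illustration; it is not the fix that was eventually adopted.

```python
from os import getcwd
from os.path import abspath, dirname, join


def get_configdir(filepath):
    """Directory that relative paths in the configuration are resolved against."""
    if filepath is None:
        # Configuration was built from a dictionary, so there is no file:
        # fall back to the current working directory.
        return getcwd()
    return dirname(abspath(filepath))


# abspath(None) raises a TypeError, which is the breakage described above.
from_dict = get_configdir(None)
from_file = get_configdir(join("experiments", "config.json"))
```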
bug

opened by aloukina 7
Pip installation succeeds without installing dependencies
I think it should probably fail or at least specify that the dependencies need to be installed.

mkvirtualenv rsmtool-tester --python=/usr/bin/python3
cd $checkoutdir
pip install .
... # succeeds quietly

enhancement
opened by jkahn 7
RSMTool fairness section is broken with numpy 1.24
This is because of a change in the numpy API where they have deprecated a bunch of stuff:

AttributeError: module 'numpy' has no attribute 'warnings'

This will be fixed in the next release. For now, the workaround is to downgrade numpy to 1.23.5.
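For context, numpy.warnings was simply the standard library's warnings module exposed as an attribute of numpy, so code that relied on it can usually import warnings directly. The sketch below shows that general pattern; whether the RSMTool fix takes exactly this form is an assumption, and noisy_computation is a made-up stand-in.

```python
import warnings


def noisy_computation():
    """Stand-in for a computation that emits a RuntimeWarning."""
    warnings.warn("divide by zero encountered", RuntimeWarning)
    return 42


with warnings.catch_warnings(record=True) as caught:
    # Previously: np.warnings.filterwarnings("ignore", category=RuntimeWarning)
    warnings.filterwarnings("ignore", category=RuntimeWarning)
    result = noisy_computation()
```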
dependency
opened by desilinguist 0

Draft: Integrate feature preprocessor as step in SKLL learner pipeline

The basic idea is that one of the outputs of running RSMTool should be a model file that can be loaded and used immediately with the same type of raw features used to run the original experiment. This PR adds a named step to the SKLL learner pipeline and then also saves the pipeline separately.

In [1]: import joblib

In [2]: model = joblib.load(open("output/ASAP2.pipeline.model", "rb"))

In [3]: ! head -2 train.csv
ID,DISCOURSE,ORGANIZATION,GRAMMAR,MECHANICS,LENGTH,score,score2
RESPONSE_1,4.93806460126142,-0.0846667513334603,-0.316793975540994,4.65591397849462,279,3,3

In [4]: ! head -2 output/ASAP2_pred_train.csv
spkitemid,raw,sc1,scale,raw_trim,raw_trim_round,scale_trim,scale_trim_round
RESPONSE_1,3.467158796079344,3.0,3.487689689334681,3.467158796079344,3,3.487689689334681,3

In [5]: model.predict([{"DISCOURSE": 4.93806460126142, "ORGANIZATION": -0.0846667513334603, "GRAMMAR": -0.316793975540994, "MECHANICS": 4.65591397849462}])
Out[5]: array([3.4671588])

opened by mulhod 2

Add support for sections in rsmxval

The initial version of RSMTool cross-validation (rsmxval) does not support customizing report sections in any form. Support for this should be added in the next version.

opened by desilinguist 0
Runs both black and flynt formatters against the entire codebase

@desilinguist ,

This PR addresses #530 and since we're also planning to include flynt as part of the pre-commit hooks (PR #551 ), I've branched off of that branch and run both flynt and black against the entire code-base.

This seems to have resulted in quite a lot of changes - please take your time with the review and let me know if I can run any additional tests aside from the nosetests.

Thanks!

opened by srhrshr 2
Runs flynt on the code-base and adds flynt as a pre-commit check

@desilinguist ,

This PR addresses #550. I've run flynt on the codebase and added it to the pre-commit config.

I see that there is no existing pre-commit config in this repo. Let me know if you want me to mirror the config used on skll; it currently only contains flynt.

Thanks!

opened by srhrshr 2

Releases(v9.0.1)

v9.0.1(Dec 8, 2022)
What's Changed

This is a minor bugfix release.

Delete the stable branch by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/573

Disallow negative confidence intervals in fairness plots since they cause new versions of pandas to break by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/574

Add workaround for broken SVGs in nbconvert by overriding clean_html by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/575

Fix bug for integer IDs when using rsmxval by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/577

Update SKLL dependency to v3.1.0 by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/578

Full Changelog: https://github.com/EducationalTestingService/rsmtool/compare/v9.0.0...v9.0.1
v9.0.0(Mar 10, 2022)
This is a major new release. It includes new functionality and breaking changes to the API as well as to dependencies.

⚡️ RSMTool 9.0 is incompatible with previous versions ⚡️

💡 New features 💡

Dependencies

RSMTool is now compatible with SKLL v3.0 and, therefore, scikit-learn v1.0.2.

RSMTool now supports Python 3.10, in addition to 3.8 and 3.9. Python 3.7 is no longer supported.

tqdm is now a required dependency.

Native cross-validation support

Add native support for cross-validation experiments to RSMTool. Using a single train-test split may lead to biased estimates of performance since those estimates will depend on the specific characteristics of that split. However, using cross-validation instead can provide more accurate estimates of scoring model performance since those estimates are averaged over multiple train-test splits that are randomly selected based on the data.

Add new command-line utility rsmxval to run cross-validation experiments. Under the hood, it leverages the RSMTool API functions run_experiment(), run_evaluation(), and run_summary() to generate multiple useful reports for the users.

Add support for automated configuration generation to rsmxval in both batch and interactive mode.

Add comprehensive documentation on how to run cross-validation experiments.

Add comprehensive functional tests for cross-validation.
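The motivation for cross-validation given above (averaging performance over several train-test splits) can be illustrated with a toy sketch. This is not how rsmxval is implemented: the "model" here just predicts the training-set mean, and both function names are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n_items, n_folds, seed=0):
    """Yield (train, test) index lists for each of n_folds random folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[start::n_folds] for start in range(n_folds)]
    for held_out, test in enumerate(folds):
        train = [i for k, fold in enumerate(folds) if k != held_out for i in fold]
        yield train, test


def cross_validated_mae(scores, n_folds=5):
    """Toy model: predict the training mean; report mean absolute error."""
    fold_errors = []
    for train, test in k_fold_indices(len(scores), n_folds):
        prediction = mean(scores[i] for i in train)
        fold_errors.append(mean(abs(scores[i] - prediction) for i in test))
    # The reported estimate is the average over all folds.
    return mean(fold_errors), fold_errors


average_error, per_fold_errors = cross_validated_mae([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```

Each response is held out exactly once, so the averaged error does not hinge on the characteristics of a single split.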

API Changes

Add two new logging functions in rsmtool.utils.logging. These are only meant to be used by RSMTool developers, not users.

Factor out the code that was used to write a dataframe to disk into a separate utility method DataWriter.write_frame_to_disk() so that it can also be used by rsmxval. This can prove useful to advanced RSMTool users as well.

Add new cross-validation specific utility functions to rsmtool.utils.cross_validation.

Convert several class or static methods in various classes to instance methods in order to allow for passing and using an optional logger instance.

Tweak the check_scaled_coefficients() test utility function to take the output directory as an argument instead of taking an experiment name to allow its usage for rsmxval functional tests.

🛠 Bugfixes & Improvements 🛠

Fix the behavior of the use_thumbnails option in RSMTool configuration files. It was generating both the thumbnail as well as the full-sized figure due to the behavior of Matplotlib’s savefig(). The solution was to turn off interactive plotting in all header notebooks.

Replace deprecated methods and keywords in RSMTool code as recommended by the latest versions of pandas, numpy, and scikit-learn.

Fix several duplicate target warnings when compiling the documentation. Make sure included RST files have an extension of .rst.inc so that they are not compiled twice. Turn all web links into anonymous references so that there are no conflicts with the same target names.

Make feature boxplots for subgroups in reports more flexible in terms of the number of features. Specifically, if the experiment has more than 150 features, no boxplots are shown. Previously this limit was 30. In addition, the message that the boxplots have been omitted is displayed more prominently when it happens. Finally, if the number of features is > 30 but <=150, a new message asking the user to enable thumbnails is shown.

Update Gitlab CI plan to use Python 3.8 and Azure Pipelines to use Python 3.10. Add new cross-validation tests to both CI plans.

v8.1.2(Jul 20, 2021)
This is a bugfix release.

Update the code for compatibility with pandas 1.3.0 and scikit-learn 0.24.2.

v8.1.1(Jun 3, 2021)
This is a bugfix release with some minor improvements.

Continuous integration build for RSMTool migrated from Travis CI to Gitlab CI.

Minor bug fixed in parse_json_with_comments to handle URLs correctly.

Minor updates to warnings and documentation.

v8.1.0(Mar 3, 2021)
This is a minor but backwards-incompatible release which includes changes necessary to make RSMTool compatible with SKLL v2.5.

What's new

RSMTool is now compatible with SKLL 2.5!

💥 Breaking Changes 💥

Python 3.6 is no longer officially supported since the latest versions of pandas and numpy have dropped support for it. RSMTool officially supports Python 3.7, 3.8, and 3.9.

RSMTool no longer supports .xls files. For users who use Excel to prepare their data, we continue supporting xlsx files.

Models trained with earlier versions of RSMTool can no longer be used to generate predictions. If you use rsmpredict or compute_and_save_predictions to generate predictions based on existing models, you will need to re-train the models.

v8.0.2(Sep 30, 2020)
This is a bugfix release with some minor improvements.

The version of nbconvert used by RSMTool is now pinned to <6.0 due to a change in v6.0 and above that broke RSMTool report generation. We will remove the pin in a future release when the upstream issue is fixed.

RSMTool reports no longer display a pie chart for the model coefficients if any of the coefficients are negative.

Minor updates for compatibility with external packages.

Minor updates to warnings and documentation.

v8.0.1(Aug 7, 2020)
This is a bugfix release with some minor improvements.

Update the code for compatibility with pandas 1.1.0.

prmse_true no longer raises an error if there are no double-scored responses. Instead the function displays a warning and returns None.

Command line tools rsmtool, rsmeval, rsmpredict, rsmcompare and rsmsummarize no longer raise an error if a user does not provide any command line arguments. Instead the tools display the help message.

Minor updates to documentation.

Improvements to the testing and coverage measurement process.

v8.0.0(May 11, 2020)
This is a major new release. It includes a lot of new functionality and multiple changes to the API.

⚡️ RSMTool 8.0 is backwards incompatible with previous versions ⚡️

💡 New features 💡

Dependencies

RSMTool is now compatible with SKLL v2.1

All dependencies other than skll are now unpinned.

RSMTool now supports Python versions 3.6, 3.7 and 3.8.

Interactive generation of configuration files

Configuration files for rsmtool, rsmeval, rsmpredict, rsmcompare and rsmsummarize can now be generated automatically, either interactively or non-interactively. This exciting new functionality makes it easier to keep track of the many configuration options available in RSMTool and greatly simplifies the process of setting up the experiment. Watch the video demonstrating the new interactive generation or read the documentation.

Passing hyperparameters to SKLL models

It is now possible to pass custom hyperparameter values to skll learners used through RSMTool. This is done using a new configuration field skll_fixed_parameters. The parameters are also displayed in the report.

Generalized version of PRMSE

The formula for PRMSE has been updated to a more general version derived by Matthew S. Johnson that allows computation of PRMSE for any number of raters. For two raters, the formula returns the same result as the formula used in previous versions of the tool.

The API now provides a new function prmse_true() which accepts scikit-learn style parameters and returns the PRMSE value.

It is now possible to supply error variance of human raters necessary to compute PRMSE. This can be useful when the experiments require computing this parameter on data other than the evaluation set. This can be done via the rater_error_variance field in the configuration file or by passing the variance as a parameter to prmse_true().

Changes to RSMTool reports

The report now always displays the headers for the "Consistency" and "True score evaluations" sections. If no second score is available, the report will indicate this. If you do not want these section headers to appear in your report, use the general_section field to exclude these sections. TIP: If you use automatic configuration generation, your configuration file will contain the full list of available sections that you can edit to exclude unnecessary sections.

💥 Incompatible Changes 💥

File formats

rsmcompare and rsmsummarize no longer support experiments that were generated with earlier versions of RSMTool. You will need to re-run the experiments that you want to compare or summarize.

rsmtool no longer supports old-style configuration files (not used since v5.5 or earlier).

rsmtool no longer supports feature files in .json format (not used since v5.5 or earlier).

The Intermediate file containing true score evaluations true_score_eval no longer contains variance of human scores. This information can still be obtained from consistency files.

API Changes

The Configuration and ConfigurationParser objects in the configuration_parser module have been fully refactored. A new Configuration object can now be instantiated using a dictionary whose keys have the same names as the fields in the configuration file. Validation and normalization are now done as part of initialization. See this PR for more detail.

Configuration objects no longer have a filepath attribute. Use the configdir attribute to indicate what any relative paths in the dictionary are relative to.

Functions in the erstwhile rsmtool.utils module have been moved to new locations. This includes several functions for computing evaluation metrics (agreement, difference_of_standardized_means, partial_correlations, quadratic_weighted_kappa, and standardized_mean_difference). See the API documentation for the new location of these functions.

The API for computing PRMSE has changed. See the API documentation for new functions.

🛠 Bugfixes & Improvements 🛠

v7.1.0 did not allow run_* functions to accept pathlib.Path objects for paths to configuration files. This is now allowed.

Error messages and warnings produced by RSMTool are now more meaningful and consistent.

Multiple changes to improve code readability and consistency.

v7.1.0(Feb 24, 2020)
This is a minor release which includes changes necessary to make RSMTool compatible with SKLL 2.0.

What's new

RSMTool is now compatible with SKLL 2.0.

The implementation of scipy.stats.pearsonr used in RSMTool to compute Pearson's correlation coefficient has changed. The new implementation is equivalent to the old one in the majority of cases but tends to produce slightly different values for very small N. See https://github.com/EducationalTestingService/rsmtool/issues/343 for further detail.

If you use the Dash app on macOS, you can now download the complete RSMTool documentation for offline use. Go to Dash preferences, click on "Downloads", then "User Contributed", and search for "RSMTool".

The conda package for RSMTool is now available from the official ETS conda channel.

API changes

The run_experiment, run_evaluation, run_comparison, run_summary, and compute_and_save_predictions functions now accept Python dictionaries as input.

The .filepath attribute of the Configuration object will be deprecated in a future version and replaced with two new attributes: configdir and filename. Use join(configdir, filename) if you need the full path to the configuration file.

Other

Minor changes to the documentation.

Many functions used for tests have been refactored for efficiency.

v7.0.0(Dec 19, 2019)
This is a major release which includes changes to several key evaluation metrics computed by RSMTool.

What's new

Changes to evaluation metrics

The exact definitions of all evaluation metrics and their method of computation are now available in the RSMTool documentation under evaluation metrics.

Quadratic weighted kappa (QWK) for raw, raw_trim, scale and scale_trim scores is now computed on continuous score values using formula suggested by Haberman (2019). In previous versions of RSMTool such continuous score values were rounded to compute QWK.

Subgroup differences are now evaluated using a new metric, "Difference in standardized means". This metric was designed to be more robust to differences in scale between human and machine scores.

SMD for human-human agreement is now computed using pooled standard deviation of H1 and H2 for the double-scored sample in the denominator.

The default tolerance for score postprocessing is now set to 0.4998 (instead of 0.49998). This may result in small changes to the values of all evaluation metrics for raw_trim and scale_trim scores. See the new configuration settings below if you need to define a custom tolerance.
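As an illustration of computing QWK on continuous scores without rounding, here is a sketch that uses the standard continuous generalization of quadratic weighted kappa, 2·Cov(M, H) / (Var(M) + Var(H) + (mean(M) − mean(H))²). It is an assumption that this matches the Haberman (2019) formula used by RSMTool, and the choice of population (divide-by-N) variances is also an assumption; the RSMTool documentation on evaluation metrics is the authoritative source.

```python
from statistics import pvariance


def continuous_qwk(human, system):
    """Quadratic weighted kappa for continuous scores (sketch, see assumptions)."""
    n = len(human)
    mean_h = sum(human) / n
    mean_s = sum(system) / n
    # Population covariance between human and system scores.
    covariance = sum((h - mean_h) * (s - mean_s) for h, s in zip(human, system)) / n
    denominator = pvariance(human) + pvariance(system) + (mean_s - mean_h) ** 2
    return 2 * covariance / denominator


perfect = continuous_qwk([1, 2, 3], [1, 2, 3])
shifted = continuous_qwk([1, 2, 3], [2, 3, 4])
```

Identical scores give a value of 1, while a constant shift between human and system scores lowers the value even though the correlation is still perfect, because the squared mean difference enters the denominator.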

New evaluation metrics

Test-theory based evaluations: RSMTool and RSMEval now compute proportional reduction in mean squared error when using system scores to predict true scores.

RSMTool and RSMEval now compute various additional metrics of model fairness suggested in Loukina et al. 2019.

New configuration settings

A new configuration setting experiment_names for RSMSummarize allows specifying custom names for each experiment. These will be used to refer to the experiments in intermediate output files and in the report.

A new configuration setting trim_tolerance allows specifying custom tolerance when trimming scores to ceiling and floor values in RSMTool and RSMEval.

A new configuration setting min_n_per_group allows defining a threshold so that only groups with more than a certain number of members are included into the report. All groups are still included into the intermediate output files.
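The effect of trim_tolerance can be sketched as below, assuming that trimming clamps a predicted score to the range from the floor minus the tolerance to the ceiling plus the tolerance. The trim function and its parameter names are illustrative, not the RSMTool implementation.

```python
def trim(score, trim_min, trim_max, trim_tolerance=0.4998):
    """Clamp a score to [trim_min - tolerance, trim_max + tolerance]."""
    floor = trim_min - trim_tolerance
    ceiling = trim_max + trim_tolerance
    return min(max(score, floor), ceiling)


# On a 1-6 scale, out-of-range predictions are pulled back to the edges.
too_high = trim(6.9, trim_min=1, trim_max=6)
too_low = trim(0.2, trim_min=1, trim_max=6)
in_range = trim(3.3, trim_min=1, trim_max=6)
```

Because the tolerance is just under 0.5, a trimmed score that is later rounded still lands on a valid scale point.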

Other new functionality

.jsonlines format is now one of the supported input file formats.

API changes

Several additional methods for computing standardized mean difference (SMD) are now available via rsmtool.utils.standardized_mean_difference

The new routine for computing QWK is available via rsmtool.utils.quadratic_weighted_kappa

The new metric, difference of standardized means (DSM), is available via rsmtool.utils.difference_of_standardized_means

Functions for computing fairness analyses are now available via rsmtool.fairness_utils.get_fairness_analyses.

Bugfixes

partial_correlations() function has been updated to return a correctly formatted matrix in a situation where the covariance matrix is very close to zero.

The reports have been updated to correctly display plots for features with very long names.

v6.1.0(Dec 20, 2018)
This is a major release which includes a number of improvements primarily aimed to increase the flexibility of RSMTool API.

What's New

New functionality

RSMTool now supports input files in SAS SAS7BDAT format.

New learner NNLRIterative. This is a new built-in linear regression model that learns empirical OLS regression weights with feature selection using an iterative implementation of non-negative least squares regression.

Custom truncation thresholds. The user can now remove outliers using pre-existing truncation thresholds specified in the features file by using the field use_truncation_thresholds

Users can now run the .ipynb notebook generated from the experiment interactively, without having to set any environment variables. Each experiment now generates a (hidden) environment JSON file, which the notebook will automatically read.

API changes

There is now a separate function utils.standardized_mean_difference() that can be used to compute SMD.

A new function reader.try_to_load_file() allows API users to specify what should happen if a file cannot be loaded. The function can be set to return None, to raise a warning, or to raise an error.

The DataContainer class now includes additional helper methods. These methods allow users to drop() and rename() data frames in the DataContainer, and to select data frames using a specified prefix or suffix with the get_frames() method.

The Configuration class now includes the additional helper methods pop() and copy().

utils.get_thumbnail_as_html() now accepts an optional argument path_to_thumbnail which allows using two different paths for thumbnails and full-size images.
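
The three failure modes described above for reader.try_to_load_file() (return None, warn, or raise) follow a common pattern, sketched below with the standard csv module. The function load_rows_or_none, its parameter names, and the file name are hypothetical stand-ins chosen for illustration; they are not RSMTool's actual signature.

```python
import csv
import warnings


def load_rows_or_none(path, raise_error=False, raise_warning=False):
    """Load a CSV file as a list of dicts, or handle a missing file.

    Hypothetical illustration of the return-None / warn / raise pattern.
    """
    try:
        with open(path, newline="") as handle:
            return list(csv.DictReader(handle))
    except FileNotFoundError:
        if raise_error:
            raise
        if raise_warning:
            warnings.warn(f"File {path} could not be loaded.")
        return None


# With both flags off, a missing file quietly yields None.
rows = load_rows_or_none("no_such_file_for_sketch.csv")
print(rows)  # None
```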

Other

Support for seaborn 0.9.0 and statsmodels 0.9.0.

Support for numpy 1.14.0, scipy 1.1.0, and pandas 0.23.0+.

Support for ipython 6.5.0 and notebook 5.7.2.

The documentation previously stated the order of operations in the processing pipeline incorrectly. It now correctly says that the change of feature sign (if applicable) happens after standardization.

If the user specifies a list of features and one of such features has zero variance, the tool now displays the correct error message.

The logging messages displayed by check_flag_column now indicate the partition if different flag columns were used for training and evaluating the model.

Miscellaneous minor bug fixes in the notebooks.

v6.0.1(May 11, 2018)
This is a bugfix release.

The "System Information" section of the reports now uses pkg_resources instead of pip to get the list of installed packages since pip disallows the use of its internal API starting with v10.

Fix incorrect formatting in the documentation.

Update ipython and notebook package versions in order to address an incompatibility issue with the latest version of the tornado web server that affects interactive use of ipython notebook but not the report generation itself.

Updated the description of the marginal/partial correlation plot in the report.

v6.0(Feb 28, 2018)
What's new?

This is a major release. The entire code base has been fully refactored to use a much more object-oriented design. This should make it much easier to make improvements and to add extensions. As a result, there have been significant changes to the RSMTool API (see link in documentation below for more details).

New features

New learners

New regressors from the latest SKLL release (v1.5.1) have been added to rsmtool.

rsmtool can now be used with both regressors and classifiers from SKLL, including classifiers that produce probabilistic output which can be used to produce expected values as predictions.

See the SKLL documentation for the full list of learners.
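
As a small illustration of how probabilistic classifier output can be turned into an expected value, the sketch below weights each numeric score label by its predicted probability and sums the results. The function name expected_score and the example distribution are hypothetical; this is a sketch of the general idea, not RSMTool's or SKLL's implementation.

```python
def expected_score(label_probabilities):
    """Return the probability-weighted mean of numeric score labels.

    `label_probabilities` maps each score label to its predicted
    probability. Hypothetical helper for illustration only.
    """
    total = sum(label_probabilities.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("Probabilities must sum to 1.")
    return sum(label * prob for label, prob in label_probabilities.items())


# Hypothetical classifier output for one response on a 1-3 scale.
probabilities = {1: 0.2, 2: 0.5, 3: 0.3}
print(round(expected_score(probabilities), 4))  # 2.1
```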

Enhanced outputs

Users can now specify the file_format configuration option to save intermediate files in either tsv, csv, or xlsx format.

Users can specify a use_thumbnails configuration option that will embed clickable thumbnails in the HTML report, rather than full-sized images. Upon clicking the thumbnails, full-sized images will be displayed in a new window. This is particularly useful for larger reports with many images, improving both the readability and the loading speed of such reports.

Reports for rsmtool, rsmeval, and rsmsummarize now contain a new section containing links to intermediate files (intermediate_file_paths.ipynb) so that users can now easily inspect these files from the report itself.

New configuration options

Users can now specify features in the configuration file as a list. When providing a list of features, signs or transformations cannot be specified. This makes creating configuration files for simple experiments much easier and faster.

Users can now specify a skll_objective for tuning the SKLL learners used in their experiments.

Users can now specify a flag_column_test configuration option to use different flags for the test file and the training file.

Users can now set the boolean standardize_features option to false if they do not want the feature values standardized. Standardization remains the default.

New evaluations

rsmtool and rsmeval now compute disattenuated correlations if the data includes two human scores.
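
For readers unfamiliar with the term, a disattenuated correlation corrects the human-machine correlation for the unreliability of the human scores. The sketch below assumes the common definition: the human-machine Pearson correlation divided by the square root of the human-human correlation. The function names and the sample scores are hypothetical, and this is not RSMTool's code.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def disattenuated_correlation(human1, human2, system):
    """Human-machine correlation divided by sqrt(human-human correlation)."""
    human_machine = pearson(human1, system)
    human_human = pearson(human1, human2)
    return human_machine / math.sqrt(human_human)


# Hypothetical scores from two human raters and one scoring model.
h1 = [1, 2, 3, 4, 5]
h2 = [1, 3, 2, 4, 5]
sys_scores = [1.1, 2.2, 2.9, 4.2, 4.8]
print(disattenuated_correlation(h1, h2, sys_scores) >= pearson(h1, sys_scores))  # True
```

Because the human-human correlation here is below 1, the disattenuated value is larger than the raw human-machine correlation.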

Code changes

New helper classes have been added to rsmtool, which allow easy reading, writing, and manipulation of multiple pandas data frames.

container.DataContainer(): A class to encapsulate multiple data frames.

reader.DataReader(): A class to read multiple tabular files into a DataContainer() object.

writer.DataWriter(): A class to write all data frames contained in a DataContainer() object to separate files, with a specified file extension.

The rsmtool module is now installable via pip, in addition to being installable with conda.

preprocessor.trim() can now take both numpy arrays and lists as inputs.

Bugfixes

Fixed warning in rsmcompare when computing summary evaluations.

Previously confusion matrices forced human scores to integers, while score distributions used the value "as is". Now both analyses use rounded human scores.

Length columns are now forced to numeric, if they are non-numeric.

Documentation

Added documentation for refactored API.

Added detailed documentation about how to write RSMTool tests.

v5.7(Jan 12, 2018)
What's new?

Update Python to v3.6, pandas to v0.22.0 and SKLL to v1.5. This required minor changes to the code and updates to some of the test files.

The conda installation command has changed. See the new command here.

Improvements

In addition to bar plots, the evaluation_by_group notebook now includes a table showing the main metrics for each subgroup.

When using the RSMTool API, it is now possible to specify a tolerance keyword argument for the trim method. Read more here.
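
The role of a trimming tolerance can be sketched as follows: scores are clipped to the interval from the floor minus the tolerance to the ceiling plus the tolerance. The function trim_sketch is a hypothetical stand-in written for this illustration, and the default tolerance shown is an assumption rather than a documented value; consult the RSMTool documentation for the actual trim signature.

```python
def trim_sketch(values, trim_min, trim_max, tolerance=0.4998):
    """Clip each value to [trim_min - tolerance, trim_max + tolerance].

    Hypothetical illustration; the default tolerance is an assumption.
    """
    floor = trim_min - tolerance
    ceiling = trim_max + tolerance
    return [min(max(value, floor), ceiling) for value in values]


# Raw predictions on a 1-6 scale, trimmed with a tolerance of 0.5.
print(trim_sketch([0.2, 3.0, 6.9], trim_min=1, trim_max=6, tolerance=0.5))
# [0.5, 3.0, 6.5]
```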

Bugfixes

The differential feature functioning (DFF) plots are now correctly generated using preprocessed feature values. In the previous version, they incorrectly used raw feature values.

In v0.19.0 of scikit-learn, the computation of explained_variance_ in the PCA implementation received several bugfixes. As a result, the results of PCA analyses no longer match those produced by previous versions of RSMTool and had to be updated.

Other minor changes

Updated the utility script update_skll_model.py to make it compatible with SKLL v1.5.

Minor updates for the documentation.

v5.6(Jul 10, 2017)
This is an important release that has a critical bugfix as well as useful improvements.

Bugfixes

Fixed critical bug in computation of standardized mean differences. The denominator for SMDs should be using population standard deviations, not the ones computed over the subgroups themselves.

Added converters to the notebook header to allow correct treatment of candidate IDs with leading zeros.

Modified the test utility functions to catch discrepancies caused by missing leading zeros.
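
The SMD fix listed first above concerns the denominator. A minimal sketch of the idea: the mean difference is computed within the subgroup, but the standard deviations come from all responses rather than from the subgroup itself. The function name, the pooling of the two standard deviations, and the use of sample (n-1) variance are assumptions made for illustration; RSMTool's own implementation may differ.

```python
import math
import statistics


def smd_with_population_sd(group_human, group_system, all_human, all_system):
    """Standardized mean difference for one subgroup (illustrative sketch).

    Numerator: system mean minus human mean within the subgroup.
    Denominator: pooled standard deviation over ALL responses (assumption).
    """
    mean_difference = statistics.fmean(group_system) - statistics.fmean(group_human)
    pooled_sd = math.sqrt(
        (statistics.variance(all_human) + statistics.variance(all_system)) / 2
    )
    return mean_difference / pooled_sd


# Hypothetical data: the subgroup consists of the first two responses.
all_human = [1, 2, 3, 4, 5]
all_system = [2, 3, 3, 4, 5]
smd = smd_with_population_sd(all_human[:2], all_system[:2], all_human, all_system)
print(round(smd, 3))
```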

Improvements

The tables generated by rsmsummarize are now saved in the same way as for other tools.

rsmsummarize now shows a table with standardized coefficients for all models.

The predictions for the post-processed training set are now also saved.

Added a new notebook that shows differential feature functioning (DFF) plots by subgroup. To use it, add dff_by_group to the general_section configuration option. Read more here.

The features that have not been used in the model are now excluded from the datasets before they are sent to SKLL for prediction. This makes the prediction step much faster for large datasets.

When testing whether the feature std. dev. in the training set is zero, we previously set the tolerance to 1e-06. This is not sufficient for features with very low values (these can result from an inverse transform of acoustic likelihoods, which are logs of very small values). The tolerance has now been tightened to 1e-07.
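
The effect of the tolerance change can be seen in a small sketch: a feature whose values are tiny but genuinely vary is treated as constant under an absolute tolerance of 1e-06, but not under 1e-07. The function has_zero_std and the sample values are hypothetical, and the use of math.isclose with an absolute tolerance is an assumption about how such a check might be written.

```python
import math
import statistics


def has_zero_std(values, tolerance=1e-07):
    """Return True if the standard deviation is within `tolerance` of zero.

    Hypothetical helper for illustration only.
    """
    return math.isclose(statistics.pstdev(values), 0.0, abs_tol=tolerance)


# Hypothetical feature with very small but non-constant values.
tiny_feature = [0.000001, 0.000002, 0.000003]
print(has_zero_std(tiny_feature, tolerance=1e-06))  # True: wrongly flagged
print(has_zero_std(tiny_feature, tolerance=1e-07))  # False: variation detected
```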

Other Minor Changes

Update the utility script update_skll_model.py to allow it to be used with other tools.

Update tests and documentation.

v5.5.2(Feb 28, 2017)
This is primarily a bug fix release but it also has some improvements.

Bugfixes

The notebooks are fixed so that any plots are now shown in their assigned places (this was broken in v5.5.1 due to the underlying matplotlib dependency being upgraded to v2.0).

Improvements

The width of the subgroup plots is now determined more intelligently. No more plots with really wide bars when there are only a few groups.

Many of the unnecessary warnings that popped up in the reports and on the terminal are now suppressed and handled in code where appropriate.

v5.5.1(Feb 14, 2017)
This is a minor bugfix release.

What's new?

Update SKLL requirement to v1.3. This allows us to streamline the RSMTool conda recipe into a single recipe (using the MKL backend instead of OpenBLAS on macOS/Linux).

Update all other conda packages to their latest versions.

Minor fixes and updates to tests.

v5.5.0(Nov 1, 2016)
This is a major release.

What's new?

New tool: rsmsummarize which can summarize any number of rsmtool experiments and produce a summary report.

All input files can now be in any tabular format (CSV/TSV/XLS/XLSX). This is an improvement over previous releases where input files were required to be CSV files. For more details, see the documentation. This includes the feature description file although the old JSON format is still supported for backwards compatibility (you will get a DeprecationWarning when using that format).

rsmtool now includes a new model ScoreWeightedLR which estimates feature coefficients using weighted least squares regression. The weights are computed as an inverse proportion of total number of responses with a given score level.

rsmtool now produces the feature sub-directory as part of its output for all experiments. Previously, this sub-directory was only produced for experiments with some form of feature selection.

rsmcompare now requires the user to specify a "comparison ID" instead of generating one automatically from the experiment IDs of the two experiments being compared.
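
The weighting scheme described above for ScoreWeightedLR can be illustrated with a short sketch in which each response is weighted by the inverse of the proportion of responses sharing its score level, so that rare score levels count as much in total as common ones. The function name and the exact normalization are assumptions; this is one plausible reading of the description, not the model's actual code.

```python
from collections import Counter


def inverse_proportion_weights(scores):
    """Weight each response by 1 / (share of responses at its score level).

    Hypothetical illustration of inverse-proportion weighting.
    """
    counts = Counter(scores)
    n_total = len(scores)
    return [1.0 / (counts[score] / n_total) for score in scores]


# Hypothetical training scores: three responses at level 1, one at level 2.
weights = inverse_proportion_weights([1, 1, 1, 2])
print(weights[3])  # 4.0 for the single response at level 2
```

With these weights, the three responses at level 1 together carry the same total weight as the single response at level 2.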

Improvements

Improved CSS for HTML report printing.

Several updates and fixes to documentation.

Fix errors in PCA computation when the number of components was smaller than the total number of features.

Use skll API to convert featureset to data frame instead of writing our own function.

Separate the file reading and processing functions in rsmpredict for more modularity.

Wrap longer labels on box plots automatically.

Update package dependencies to latest releases.

Increase report generation timeout to be 60 minutes instead of 10 minutes. This is useful for experiments with very large data files.

Fix bug that had system and human scores reversed in the confusion matrix.

Limit the length of experiment IDs where appropriate such that we don't encounter "filename is too long" OS errors.

v5.2.1(Sep 13, 2016)

This is a minor release that fixes a bug in how some javascript was loaded in the Jupyter notebooks.
v5.2(Aug 29, 2016)
This release has minor features and bug fixes.

rsmcompare now includes extra checks to make sure the experiment paths and IDs specified by the user actually exist.

Factored out rsmcompare code from the header notebook and moved to comparison.py.

Factored out the float formatting functions from the rsmtool/rsmcompare header notebooks and moved them to utils.py.

Added new tests for comparison.py and the float formatting and highlighting functions in utils.py.

Fixed the bug in rsmcompare which seemed to ignore zero scores in confusion matrices.

Fixed a bug in rsmcompare that prevented the score distribution table from being displayed correctly if the score levels differed between the two models.

v5.1.1(Aug 26, 2016)
This is a minor bugfix release.

Previously, if rsmpredict was given a model requiring a transformation that could yield Inf/NaN values for new data (e.g. sqrt(-1)), it would raise an error and terminate. Now, it simply excludes such responses and displays a warning.

Updated various conda files to use newer versions of the ipython and notebook packages since there seem to have been some updates that broke older recipes and requirements files.
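
The rsmpredict change above (excluding responses whose transformed feature values are Inf/NaN instead of terminating) can be sketched as follows. The function name, the response IDs, and the choice of a square-root transform are hypothetical and serve only to illustrate the exclude-and-warn behavior.

```python
import math
import warnings


def sqrt_transform_excluding_invalid(feature_values):
    """Apply sqrt to each value; drop responses with non-finite results.

    `feature_values` maps response IDs to raw feature values.
    Returns (kept, excluded). Hypothetical helper for illustration only.
    """
    kept = {}
    excluded = []
    for response_id, value in feature_values.items():
        try:
            transformed = math.sqrt(value)
        except ValueError:
            # math.sqrt raises for negative input; treat the result as NaN.
            transformed = math.nan
        if math.isfinite(transformed):
            kept[response_id] = transformed
        else:
            excluded.append(response_id)
    if excluded:
        warnings.warn(f"Excluded {len(excluded)} response(s) with Inf/NaN values.")
    return kept, excluded


kept, excluded = sqrt_transform_excluding_invalid({"r1": 4.0, "r2": -1.0, "r3": 9.0})
print(kept)      # {'r1': 2.0, 'r3': 3.0}
print(excluded)  # ['r2']
```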

v5.1.0(Jul 27, 2016)
This is a major release.

Completely overhauled the documentation. Instead of relying on a collection of loosely organized markdown files, the documentation is much more cohesive and hosted on readthedocs. It now includes a clear introduction to what RSMTool is as well as tutorials.

The RSMTool API is now richer and explicitly documented.

rsmcompare can now compare two rsmeval experiments as well as an rsmtool experiment to an rsmeval experiment.

Code coverage is now automatically computed as part of CI testing.

Expected warnings are now suppressed when running the tests.

Fixed several stylistic issues in the codebase raised by pep8 and pyflakes.

v5.0.2(Jun 27, 2016)
Added files necessary for submission to the Journal of Open Source Software.

v5.0.1(Jun 7, 2016)
This is a hotfix release that fixes the following regression:

rsmcompare no longer accidentally swaps the old and the new experiments.

v5.0.0(Jun 4, 2016)
New features

Evaluations on the test set now include R2 and RMSE.

The rsmtool reports now include model fit parameters (R2 and adjusted R2) for the training set.

It is now possible to exclude candidates with fewer than X responses from model training/evaluation.

rsmcompare can now handle experiments which used SKLL models.

rsmcompare now includes a notebook for consistency between human raters (thanks @bndgyawali!).

Bug fixes

Correct handling of repeated feature names in the feature .json file.

Correct printing of feature coefficients for SKLL models.

Correct handling of quoted boolean values in config .json file.

Fixed rounding and highlighting in feature correlation table.

And several dozen more.

v4.6.0(Apr 6, 2016)

This is the first GitHub release for RSMTool. Before being open-sourced, RSMTool was an internal research project at the Educational Testing Service.
