An Explainable Leaderboard for NLP

Overview

ExplainaBoard: An Explainable Leaderboard for NLP

Introduction | Website | Download | Backend | Paper | Video | Bib

Introduction

ExplainaBoard is an interpretable, interactive, and reliable leaderboard with seven (so far) new features (F) compared with generic leaderboards.

  • F1: Single-system Analysis: What is a system good or bad at?
  • F2: Pairwise Analysis: Where is one system better (worse) than another?
  • F3: Data Bias Analysis: What are the characteristics of different evaluated datasets?
  • F5: Common errors: What are the common mistakes that the top-5 systems make?
  • F6: Fine-grained errors: Where will errors occur?
  • F7: System Combination: Is there potential complementarity between different systems?

Website

We deploy ExplainaBoard as a Web toolkit, which includes 9 NLP tasks, 40 datasets and 300 systems. Detailed information is as follows.

Task

Task                      | Sub-task          | Dataset | Model | Attribute
Text Classification      | Sentiment         | 8       | 40    | 2
Text Classification      | Topics            | 4       | 18    | 2
Text Classification      | Intention         | 1       | 3     | 2
Text-Span Classification | Aspect Sentiment  | 4       | 20    | 4
Text-pair Classification | NLI               | 2       | 6     | 7
Sequence Labeling        | NER               | 3       | 74    | 9
Sequence Labeling        | POS               | 3       | 14    | 4
Sequence Labeling        | Chunking          | 3       | 14    | 9
Sequence Labeling        | CWS               | 7       | 64    | 7
Structure Prediction     | Semantic Parsing  | 4       | 12    | 4
Text Generation          | Summarization     | 2       | 36    | 7

Download System Outputs

We haven't released the datasets or the corresponding system outputs that require licenses. However, if you have the licenses, please fill in this form and we will send them to you privately. (A description of the output format can be found here.) If these system outputs are useful for you, please cite our work.

Test Your Results

pip install -r requirements.txt

Description of Each Directory

  • task-[task_name]: fine-grained analysis for each task, aiming to generate fine-grained analysis results in JSON format. For example, task-mlqa calculates the fine-grained F1 scores for different systems and outputs the corresponding JSON files to task-mlqa/output/ .

  • meta-eval is a sort of controller, which can be used to start the fine-grained analysis of all tasks and to analyze the output JSON files.

    • calculate fine-grained results for all tasks: ./meta-eval/run-allTasks.sh
        cd ./meta-eval/
        ./run-allTasks.sh
    • merge the JSON files of all tasks into a CSV file, which is useful for further SQL import: ./meta-eval/genCSV/json2csv.py (a minimal sketch of this merge is shown after this list)
        cd ./meta-eval/genCSV/
        python json2csv.py > explainaboard.csv
  • src stores some auxiliary code.
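
For reference, the JSON-to-CSV merge could look roughly like the following minimal sketch. The directory layout, file names, and flattening scheme here are assumptions for illustration, not the actual json2csv.py implementation.

    # Hypothetical sketch: collect per-task JSON analysis files and flatten them
    # into one CSV row per file. Paths and field names are assumptions.
    import csv
    import glob
    import json
    import sys

    rows = []
    for path in glob.glob("../task-*/output/*.json"):
        with open(path, encoding="utf-8") as f:
            record = json.load(f)
        flat = {}
        for key, value in record.items():
            if isinstance(value, dict):
                # Flatten one level of nesting into "outer.inner" columns.
                for sub_key, sub_value in value.items():
                    flat[f"{key}.{sub_key}"] = sub_value
            else:
                flat[key] = value
        rows.append(flat)

    fieldnames = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(sys.stdout, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

Running python json2csv.py > explainaboard.csv would then redirect this CSV to a file, as in the command above.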

Submit Your Results

You can submit your system's outputs via this form, following the format description.

Acknowledgement

We thank all the authors who shared their system outputs with us: Ikuya Yamada, Stefan Schweter, Colin Raffel, Yang Liu, and Li Dong. We also thank Vijay Viswanathan, Yiran Chen, and Hiroaki Hayashi for useful discussion and feedback about ExplainaBoard.

Comments
  • Is the current applicable condition of t-test correct?

    opened by tetsuok 22
  • Allowed specification of the metric #dimensions

    This PR loosens the restriction that sufficient statistics must be a vector, and allows them to be a tensor whose number of dimensions equals Metric.stats_ndim().

    It also demonstrates how this works on the NLGMetaEvaluation metric.

    @pfliu-nlp and @odashi : could you please check this PR as a potential solution to the discussion in https://github.com/neulab/ExplainaBoard/pull/527 ?

    (sorry, after sending the review request I made a change of naming from dim->ndim, which I think is more in line with the naming in numpy)

    opened by neubig 12
  • test_generate_system_analysis in integration_tests.summarization_test.SummarizationTest is too slow

    commit 8c514c3d81a079d967d208f8bc330c2f202620bb (#437) increases the execution time of integration_tests.summarization_test.SummarizationTest. When I measured it on my GCP VM, the running time of the test increased by 430 seconds (from 6 seconds to 436 seconds), which is too slow to run as an automated test on pull requests. Slow tests need to be removed or replaced with more focused, faster tests. In general, slow tests drain productivity: updating pull requests takes longer, developers tend to pile large commits into pull requests to work around slow CI times, and pull requests become expensive to review, which makes it difficult to identify bugs or design flaws in code review.

    Repro steps

    rm -rf ~/.cache/explainaboard
    time python -m unittest -v integration_tests.summarization_test.SummarizationTest
    

    Output

    test_datalab_loader (integration_tests.summarization_test.SummarizationTest) ... skipped 'time consuming'
    test_default_features_dont_modify_condgen (integration_tests.summarization_test.SummarizationTest) ... ok
    test_generate_system_analysis (integration_tests.summarization_test.SummarizationTest) ... WARNING:datalabs.load:Couldn't find a directory or a dataset named 'cnn_dailymail' in this version. It was picked from the master branch on github instead.
    WARNING:datalabs.builder:No config specified, defaulting to: cnn_dailymail/3.0.0
    WARNING:datalabs.builder:Reusing dataset cnn_dailymail (/home/t/.cache/expressai/datalab/cnn_dailymail/3.0.0/3.0.0/6e2f5d689f0225c4f22eb78d11ba7a21399810c5cb853edafe39b1d006a1ff95)
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 287113/287113 [06:20<00:00, 755.03it/s]
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 287113/287113 [00:29<00:00, 9616.19it/s]
    INFO:explainaboard:caching stats for cnn_dailymail None
    calculating example-level features: 3it [00:00, 51.88it/s]
    calculating token-level features: 3it [00:00, 139.83it/s]
    /home/t/explainaboard-fork/explainaboard/metrics/metric.py:336: DeprecationWarning: Use of keyword argument `alpha` for method `interval` is deprecated. Use first positional argument or keyword argument `confidence` instead.
      return stats_t.interval(
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 349.50it/s]
    ok
    test_generate_system_human_eval (integration_tests.summarization_test.SummarizationTest) ... skipped 'Not yet fixed in v0.11'
    test_load_tsv (integration_tests.summarization_test.SummarizationTest) ... ok
    
    ----------------------------------------------------------------------
    Ran 5 tests in 438.659s
    
    OK (skipped=2)
    python -m unittest -v integration_tests.summarization_test.SummarizationTest  434.35s user 2.58s system 98% cpu 7:22.46 total
    
    opened by tetsuok 12
  • Use 'confidence' instead of deprecated 'alpha' for scipy.stats.t.interval

    Reducing heavy logging uncovered buried DeprecationWarnings in the tests. We get the following DeprecationWarning in the tests that invoke the scipy.stats.t.interval method:

    test_hits (explainaboard.tests.test_metric.TestMetric) ... /home/runner/work/ExplainaBoard/ExplainaBoard/explainaboard/metrics/metric.py:338: DeprecationWarning: Use of keyword argument `alpha` for method `interval` is deprecated. Use first positional argument or keyword argument `confidence` instead.
    

    This PR fixes the warning as the warning suggests.
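
    Illustratively, the change amounts to the following (the values here are placeholders, not the actual ones used in metric.py):

    # Hedged sketch of the keyword rename; numbers are placeholders.
    from scipy.stats import t as stats_t

    dof, mean, se = 9, 0.5, 0.05
    # Deprecated spelling (emits the DeprecationWarning above):
    #   low, high = stats_t.interval(alpha=0.95, df=dof, loc=mean, scale=se)
    # Preferred spelling since SciPy 1.9:
    low, high = stats_t.interval(confidence=0.95, df=dof, loc=mean, scale=se)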

    opened by tetsuok 12
  • Cache pip dependencies to speed up CI

    This PR attempts to speed up both the unit-tests and integration-tests CI jobs. Every CI job spends about 2 minutes installing pip packages. This step dominates about 90% of the total time of unit-tests and about 30% of the total time of integration-tests. The step can be skipped by creating virtual environments and caching the installed packages in those environments using actions/cache. Note that actions/[email protected] doesn't support caching installed packages; it only avoids re-downloading by caching the packages downloaded from PyPI under ~/.cache/pip.

    Dependencies listed in setup.py are moved to requirements.txt. This makes it possible to generate lock files for every Python version from requirements.txt. The generated lock files are used as cache keys so that the caches are properly invalidated when dependencies are updated. Unless dependencies change, every CI job should be reproducible (with respect to installing pip dependencies). Making the CI jobs reproducible and faster comes at the expense of periodically updating these lock files. Maintaining lock files for dependencies is pretty common in other programming languages such as JS and Rust. The update can be done by running cicd/gen_requirements_lock.sh.

    opened by tetsuok 12
  • Refactor/loaders

    1. Commit 1: refactored Loader.__init__()
    • made data a required argument
    • all loaders now call the __init__ method of the base loader
    2. Commit 2: implemented file-specific loaders to simplify the task-specific loaders
    • implements TSVFileLoader, JSONFileLoader, DatalabFileLoader and CoNLLFileLoader, which know how to load a certain type of file given the fields
    • refactored all the existing loaders to use these file-specific loaders instead
    • QAMultipleChoiceLoader and KgLinkTailPredictionLoader still use custom load() methods because they support user-defined features. The way they load these extra features is different, so I decided to leave them for now. It will be easy to incorporate user-defined features into the file loaders (we just need to update the fields based on self.user_defined_features_configs)
    • hellaswag was removed in https://github.com/neulab/ExplainaBoard/commit/4b93b9542b714754eb91d718cd82b98ab706d11c
    • This refactor makes it easier to do #141 in the future: we just need two sets of file loaders for each task-specific loader, one for the (input, reference_output) file and one for the predictions file.

    Please let me know what you think! Thanks!
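
    As a toy illustration of the file-specific loader idea (not the actual ExplainaBoard classes or signatures; all names here are assumptions), the design could look roughly like this:

    # Toy sketch only: field names, class shapes, and signatures are illustrative.
    import csv
    import json
    from dataclasses import dataclass

    @dataclass
    class Field:
        src: object        # column index (TSV) or key name (JSON) in the source file
        target: str        # canonical field name used by the task-specific loader

    class TSVFileLoader:
        def load(self, path, fields):
            with open(path, encoding="utf-8") as f:
                rows = csv.reader(f, delimiter="\t")
                return [{fld.target: row[fld.src] for fld in fields} for row in rows]

    class JSONFileLoader:
        def load(self, path, fields):
            with open(path, encoding="utf-8") as f:
                examples = json.load(f)
            return [{fld.target: ex[fld.src] for fld in fields} for ex in examples]

    A task-specific loader would then only declare its fields and pick a file loader, instead of re-implementing the parsing logic.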

    opened by lyuyangh 12
  • Potential issue with spearman R bootstrapping

    We observed the following test failure when integrating another PR:

    ======================================================================
    FAIL: test_sample_level_spearmanr_bootstrap (integration_tests.meta_eval_wmt_da_test.MetaEvalNLGCITest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/runner/work/ExplainaBoard/ExplainaBoard/integration_tests/meta_eval_wmt_da_test.py", line 191, in test_sample_level_spearmanr_bootstrap
        self.assertAlmostEqual(ci[0], 0.6488, 2)
    AssertionError: 0.7325904563487001 != 0.6488 within 2 places (0.08379045634870008 difference)
    
    ----------------------------------------------------------------------
    

    We are not sure whether this is an issue with the test or with the underlying code, but as a temporary measure we reduced the sensitivity of the test. We should go back and check whether this is just due to bootstrapping variance or due to a bug in the test itself.
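
    For context, a minimal sketch of a paired bootstrap of Spearman's rho (synthetic data and a fixed seed; this is not the failing test's setup):

    # Illustrative only: resample (x, y) pairs with replacement and collect the
    # Spearman correlation of each resample to form a percentile interval.
    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = x + rng.normal(scale=0.5, size=200)

    stats = []
    for _ in range(1000):
        idx = rng.integers(0, len(x), size=len(x))
        stats.append(spearmanr(x[idx], y[idx]).correlation)

    low, high = np.percentile(stats, [2.5, 97.5])

    Because the percentile endpoints themselves vary from run to run, bootstrap variance is one plausible (non-bug) explanation worth checking for the flaky assertion above.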

    opened by neubig 10
  • Implement CalibrationAnalysis

    Calibration measures whether a system's confidence is well correlated with whether the system actually got the answer right. It would be nice if we could do analyses related to calibration, such as calculating the expected calibration error: https://arxiv.org/abs/1706.04599
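
    For reference, a minimal sketch of expected calibration error with equal-width confidence bins (illustrative only, not a proposed ExplainaBoard implementation; the binning scheme is an assumption):

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=10):
        # Weighted average gap between per-bin mean confidence and accuracy.
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if not mask.any():
                continue
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap   # weight by fraction of examples in the bin
        return ece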

    I think this should probably be implemented as an additional variety of analysis, which would be simple and self-contained: https://github.com/neulab/ExplainaBoard/blob/main/explainaboard/analysis/analyses.py#L45

    good first issue new-analysis 
    opened by neubig 10
  • Correct training set feature field names

    Previously, calculation of training set features would fail if the datalab dataset used unconventional column names.

    This does the following things:

    1. Makes it an option to use Loader to load only datasets without system outputs, by setting output_data to None
    2. Changes _statistics_func to simply take in the samples and system info and return the statistics (in contrast to the previous use of the datalab aggregating() functionality)
    3. Loads the data used in calculating training set features through Loader, so that the appropriate field mapping is performed

    Fixes https://github.com/neulab/ExplainaBoard/issues/416

    Notably, @pfliu-nlp, "2." may require some discussion; here are the pros and cons of doing it this new way:

    Pros

    • it makes the statistics code self-contained and not reliant on an external library. Honestly, even though I'm very familiar with explainaboard, I was always a bit confused about what was actually going on here because the aggregating() decorator was a bit mysterious to me
    • statistics_func can now be called on any set of samples, so it could be called on a non-datalab dataset. This may be useful if we want to, for example, calculate training set features for custom datasets

    Cons

    • the datalab aggregating operator may implement parallelism, so this aggregation of statistics might be faster there, but I'm not sure whether that's actually the case in practice
    • something else I'm missing?
    opened by neubig 9
  • Unsafe en_core_web_sm downloading in setup.py

    Currently setup.py executes an external command, python -m spacy download en_core_web_sm, to install a spaCy model during setup. This approach has several issues regarding system consistency:

    • spaCy models are intentionally not registered on PyPI, and PyPI does not allow libraries to depend on external requirements.
    • The command is just a system command, which could possibly break the system or fail to work correctly.

    Since there is no recommended way to add spaCy models to install_requires, we need to take one of the following approaches:

    • Download the model programmatically when spacy.load() fails (a minimal sketch follows this list).
    • Bundle the model file into this repository.
    • Ask users to download the appropriate models separately.
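
    For instance, the first option could look roughly like this (a common spaCy pattern, shown here only as a sketch, not the repository's code):

    import spacy

    def load_spacy_model(name: str = "en_core_web_sm"):
        try:
            return spacy.load(name)
        except OSError:
            # The model is not installed yet: download it once, then retry.
            from spacy.cli import download
            download(name)
            return spacy.load(name)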
    opened by odashi 9
  • How to name metrics when registering them

    There are two ways to name metrics:

    (1)

    
    @dataclass
    @metric_config_registry.register("AccuracyConfig")
    class AccuracyConfig(MetricConfig):
        def to_metric(self):
            return Accuracy(self)
    
    

    (2)

    @dataclass
    @metric_config_registry.register("Accuracy")
    class AccuracyConfig(MetricConfig):
        def to_metric(self):
            return Accuracy(self)
    
    

    Currently, we are using (1), which, however, is inconsistent with how the Processor names them. For example:

    https://github.com/neulab/ExplainaBoard/blob/cd54c1b61e490295db8c1cfee8460aff4cce1880/explainaboard/processors/text_classification.py#L132

    Which one do you prefer?

    If we go with (2), this code should be modified to avoid a naming bug: https://github.com/neulab/ExplainaBoard/blob/cd54c1b61e490295db8c1cfee8460aff4cce1880/explainaboard/metrics/registry.py#L11

    config_cls = metric_config_registry.get_type(dikt["name"]) # instead of type
    

    I could send a PR for this.

    opened by pfliu-nlp 8
  • add tests for meval to replicate paper results

    Overview

    This PR adds tests to verify whether our implemented meta-evaluation processor is able to replicate reported results from existing published papers.

    Relevant issue: https://github.com/inspired-co/taskboard/issues/180

    Details

    • Collect system outputs for two metrics (rouge1 and bartscore) from this repo
    • Use ExplainaBoard to process these outputs and compare the results with those reported in the above repo.

    References

    • Paper: BARTScore: Evaluating Generated Text as Text Generation
    • Code: https://github.com/neulab/BARTScore
    opened by pfliu-nlp 0
  • `TypeError: 'type' object is not subscriptable` when attempt to import or use CLI

    How did I install it?

    pip install explainaboard
    or
    pip install -U --force-reinstall explainaboard
    

    Both cause the same problem.

    Version: 0.12.3

    When trying to import explainaboard or run explainaboard from the CLI, I get the same error:

    Python 3.8.15 (default, Nov 24 2022, 15:19:38) 
    [GCC 11.2.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import explainaboard
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/__init__.py", line 6, in <module>
        from explainaboard.loaders import DatalabLoaderOption, get_loader_class
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/loaders/__init__.py", line 5, in <module>
        from explainaboard.loaders import file_loader, loader_factory
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/loaders/file_loader.py", line 18, in <module>
        from explainaboard.analysis.analyses import Analysis
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/analysis/analyses.py", line 14, in <module>
        from explainaboard.analysis.bucketing import get_bucketing_method
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/analysis/bucketing.py", line 13, in <module>
        from explainaboard.serialization.types import SerializableData
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/serialization/__init__.py", line 8, in <module>
        from explainaboard.serialization.types import Serializable
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/serialization/types.py", line 21, in <module>
        list["PrimitiveData"],  # type: ignore
    TypeError: 'type' object is not subscriptable
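
    For context (not part of the original report): subscripting the built-in list type at runtime, as in list["PrimitiveData"], is only supported from Python 3.9 onward, so on the Python 3.8 interpreter shown above it raises exactly this TypeError. A minimal, hedged reproduction:

    import sys

    PrimitiveData = int  # stand-in for the real alias in serialization/types.py

    if sys.version_info >= (3, 9):
        Alias = list[PrimitiveData]    # built-in generics work on 3.9+
    else:
        from typing import List
        Alias = List[PrimitiveData]    # on 3.8, list[...] itself raises the TypeError above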
    
    
    opened by ttpro1995 0
  • Bump mypy version to 0.990

    Since mypy 0.990 was released yesterday (blog post), it would be better to bump the mypy version to 0.990 to take advantage of the new features and bug fixes. It seems some effort will be needed to adopt the new version: running mypy 0.990 on the explainaboard codebase produces the output below from pre-commit run mypy --color=never --all-files.

    mypy.....................................................................Failed
    - hook id: mypy
    - exit code: 1
    
    explainaboard/utils/spacy_loader.py:5: error: Cannot find implementation or library stub for module named "spacy"  [import]
    explainaboard/utils/spacy_loader.py:6: error: Cannot find implementation or library stub for module named "spacy.language"  [import]
    explainaboard/utils/agreement.py:5: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/sum_attribute.py:8: error: Cannot find implementation or library stub for module named "nltk"  [import]
    explainaboard/analysis/sum_attribute.py:10: error: Cannot find implementation or library stub for module named "nltk.util"  [import]
    explainaboard/utils/async_eaas.py:10: error: Cannot find implementation or library stub for module named "eaas"  [import]
    explainaboard/third_party/text_to_sql_test_suit_eval/parse.py:7: error: Cannot find implementation or library stub for module named "sqlparse"  [import]
    explainaboard/third_party/text_to_sql_test_suit_eval/parse.py:8: error: Cannot find implementation or library stub for module named "sqlparse.sql"  [import]
    explainaboard/third_party/text_to_sql_test_suit_eval/parse.py:9: error: Cannot find implementation or library stub for module named "sqlparse.tokens"  [import]
    setup.py:3: error: Skipping analyzing "setuptools": module is installed, but missing library stubs or py.typed marker  [import]
    explainaboard/metrics/auxiliary/qa_table_text_hybrid_auxiliary.py:16: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/auxiliary/qa_table_text_hybrid_auxiliary.py:17: error: Cannot find implementation or library stub for module named "scipy.optimize"  [import]
    explainaboard/utils/logging.py:9: error: Library stubs not installed for "tqdm"  [import]
    explainaboard/utils/logging.py:9: note: Hint: "python3 -m pip install types-tqdm"
    explainaboard/utils/logging.py:9: note: (or run "mypy --install-types" to install all missing stub packages)
    explainaboard/utils/logging.py:16: error: Incompatible default for argument "desc" (default has type "None", argument has type "str")  [assignment]
    explainaboard/utils/logging.py:16: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
    explainaboard/utils/logging.py:16: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
    explainaboard/visualizers/bar_chart.py:8: error: Cannot find implementation or library stub for module named "matplotlib"  [import]
    explainaboard/visualizers/bar_chart.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/bucketing.py:10: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/feature.py:239: error: Incompatible types in assignment (expression has type "Dict[str, FeatureType]", target has type "SerializableData")  [assignment]
    explainaboard/utils/agreement_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/utils/typing_utils_test.py:10: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/serialization/serializers.py:53: error: Incompatible return value type (got "Union[List[Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]], Tuple[Union[None, bool, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable], ...]]", expected "Union[None, bool, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]")  [return-value]
    explainaboard/serialization/serializers.py:53: error: Generator has incompatible item type "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"; expected "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [misc]
    explainaboard/serialization/serializers.py:89: error: Incompatible return value type (got "Union[List[Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]], Tuple[Union[None, bool, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]], ...]]", expected "Union[None, bool, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]")  [return-value]
    explainaboard/serialization/serializers.py:89: error: Generator has incompatible item type "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [misc]
    explainaboard/utils/tensor_analysis.py:12: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/metric.py:10: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/metric.py:11: error: Cannot find implementation or library stub for module named "scipy.stats"  [import]
    explainaboard/metrics/metric.py:178: error: Dict entry 0 has incompatible type "str": "Dict[str, MetricValue]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/metrics/metric.py:196: error: Argument 1 to "MetricResult" has incompatible type "Dict[str, Union[None, bool, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]]"; expected "Dict[str, MetricValue]"  [arg-type]
    explainaboard/third_party/text_to_sql_test_suit_eval/process_sql.py:30: error: Cannot find implementation or library stub for module named "nltk"  [import]
    explainaboard/utils/tokenizer.py:15: error: Cannot find implementation or library stub for module named "sacrebleu.tokenizers"  [import]
    explainaboard/utils/tokenizer.py:16: error: Cannot find implementation or library stub for module named "sacrebleu.tokenizers.tokenizer_intl"  [import]
    explainaboard/utils/tokenizer.py:17: error: Cannot find implementation or library stub for module named "sacrebleu.tokenizers.tokenizer_ja_mecab"  [import]
    explainaboard/utils/tokenizer.py:18: error: Cannot find implementation or library stub for module named "sacrebleu.tokenizers.tokenizer_zh"  [import]
    explainaboard/metrics/continuous.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/metric_test.py:10: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/external_eval.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/meta_evaluation.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/meta_evaluation.py:9: error: Cannot find implementation or library stub for module named "scipy"  [import]
    explainaboard/analysis/feature_test.py:69: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/feature_test.py:134: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/feature_test.py:205: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:230: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:231: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:232: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:233: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Collection[str]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:234: error: List item 0 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:234: error: List item 1 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:234: error: List item 2 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:235: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Tuple[Dict[str, object], Dict[str, object], Dict[str, object]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:237: error: Dict entry 0 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:237: error: Dict entry 1 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:237: error: Dict entry 2 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:240: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:241: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:242: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:243: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Collection[str]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:244: error: List item 0 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:244: error: List item 1 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:244: error: List item 2 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:245: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Tuple[Dict[str, object], Dict[str, object], Dict[str, object]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:247: error: Dict entry 0 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:247: error: Dict entry 1 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:247: error: Dict entry 2 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/metrics/eaas.py:9: error: Cannot find implementation or library stub for module named "eaas.async_client"  [import]
    explainaboard/metrics/eaas.py:10: error: Cannot find implementation or library stub for module named "eaas.config"  [import]
    explainaboard/metrics/eaas.py:11: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/eaas.py:12: error: Cannot find implementation or library stub for module named "sacrebleu"  [import]
    explainaboard/metrics/eaas.py:13: error: Cannot find implementation or library stub for module named "sacrebleu.metrics.base"  [import]
    explainaboard/metrics/eaas.py:13: error: Cannot find implementation or library stub for module named "sacrebleu.metrics"  [import]
    explainaboard/metrics/ranking.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/performance.py:51: error: Dict entry 1 has incompatible type "str": "List[int]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/performance.py:52: error: Dict entry 2 has incompatible type "str": "Dict[str, MetricResult]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/performance.py:72: error: Argument 1 to "float" has incompatible type "Union[str, None, int, float, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"; expected "Union[SupportsFloat, SupportsIndex, str, bytes, bytearray, memoryview, array[Any], mmap, _CData, PickleBuffer]"  [arg-type]
    explainaboard/analysis/performance.py:73: error: Argument 1 to "float" has incompatible type "Union[str, None, int, float, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"; expected "Union[SupportsFloat, SupportsIndex, str, bytes, bytearray, memoryview, array[Any], mmap, _CData, PickleBuffer]"  [arg-type]
    explainaboard/metrics/log_prob.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/accuracy.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/external_eval_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/performance_test.py:219: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/performance_test.py:241: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/metrics/qa_table_text_hybrid.py:10: error: Cannot find implementation or library stub for module named "numpy"  [import]
    integration_tests/meta_eval_nlg_test.py:5: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/accuracy_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/analyses.py:12: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/analyses.py:245: error: Dict entry 0 has incompatible type "str": "List[BucketPerformance]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:446: error: Dict entry 0 has incompatible type "str": "List[BucketPerformance]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:563: error: Argument "bucket_setting" to "__call__" of "BucketingFn" has incompatible type "List[Tuple[float, float]]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses.py:563: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses.py:563: note: Consider using "Sequence" instead, which is covariant
    explainaboard/analysis/analyses.py:658: error: Dict entry 2 has incompatible type "str": "List[int]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:722: error: Dict entry 1 has incompatible type "str": "List[ComboOccurence]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:841: error: Dict entry 1 has incompatible type "str": "Dict[str, FeatureType]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:842: error: Dict entry 2 has incompatible type "str": "Dict[str, MetricConfig]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/metrics/extractive_qa.py:11: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/analyses_test.py:90: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Collection[str]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:237: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[BucketPerformance]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:237: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:237: note: Consider using "Sequence" instead, which is covariant
    explainaboard/analysis/analyses_test.py:266: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:280: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:321: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[ComboOccurence]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:321: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:321: note: Consider using "Sequence" instead, which is covariant
    explainaboard/analysis/analyses_test.py:328: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Union[Sequence[str], None, int, float, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:350: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:477: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[BucketPerformance]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:477: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:477: note: Consider using "Sequence" instead, which is covariant
    explainaboard/analysis/analyses_test.py:507: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:518: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "Dict[str, FeatureType]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:518: note: "Dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:518: note: Consider using "Mapping" instead, which is covariant in the value type
    explainaboard/analysis/analyses_test.py:519: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "Dict[str, MetricConfig]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:519: note: "Dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:519: note: Consider using "Mapping" instead, which is covariant in the value type
    explainaboard/analysis/result.py:33: error: Dict entry 0 has incompatible type "str": "Dict[str, Dict[str, MetricResult]]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/result.py:34: error: Dict entry 1 has incompatible type "str": "List[AnalysisResult]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/loaders/file_loader.py:15: error: Cannot find implementation or library stub for module named "datalabs"  [import]
    explainaboard/loaders/file_loader.py:16: error: Cannot find implementation or library stub for module named "datalabs.features.features"  [import]
    explainaboard/loaders/file_loader.py:212: error: Incompatible default for argument "fields" (default has type "None", argument has type "List[FileLoaderField]")  [assignment]
    explainaboard/loaders/file_loader.py:212: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
    explainaboard/loaders/file_loader.py:212: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
    explainaboard/loaders/file_loader.py:475: error: Incompatible default for argument "fields" (default has type "None", argument has type "List[FileLoaderField]")  [assignment]
    explainaboard/loaders/file_loader.py:475: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
    explainaboard/loaders/file_loader.py:475: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
    explainaboard/loaders/file_loader.py:522: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/analysis/result_test.py:35: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Dict[str, MetricResult]]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/result_test.py:36: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[AnalysisResult]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/result_test.py:36: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/result_test.py:36: note: Consider using "Sequence" instead, which is covariant
    explainaboard/third_party/text_to_sql_test_suit_eval/exec_eval.py:11: error: Library stubs not installed for "tqdm"  [import]
    explainaboard/info.py:186: error: Dict entry 11 has incompatible type "str": "List[AnalysisLevel]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/info.py:187: error: Dict entry 12 has incompatible type "str": "List[Analysis]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/info.py:260: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/feature_funcs.py:8: error: Cannot find implementation or library stub for module named "lexicalrichness"  [import]
    explainaboard/analysis/feature_funcs.py:8: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
    explainaboard/analysis/feature_funcs.py:9: error: Cannot find implementation or library stub for module named "sacrebleu"  [import]
    explainaboard/meta_analyses/ranking.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/meta_analyses/ranking.py:9: error: Cannot find implementation or library stub for module named "pandas"  [import]
    explainaboard/metrics/f1_score.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/processors/processor.py:9: error: Cannot find implementation or library stub for module named "eaas.async_client"  [import]
    explainaboard/processors/processor.py:10: error: Cannot find implementation or library stub for module named "eaas.config"  [import]
    explainaboard/processors/sequence_labeling.py:43: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/processors/argument_pair_extraction.py:34: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/processors/qa_tat.py:7: error: Cannot find implementation or library stub for module named "datalabs"  [import]
    explainaboard/processors/language_modeling.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/processors/conditional_generation.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/processors/cloze_generative.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/processors/summarization.py:8: error: Cannot find implementation or library stub for module named "datalabs.operations.featurize.plugins.summarization.sum_attribute"  [import]
    integration_tests/summarization_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    integration_tests/meta_eval_wmt_da_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/text_to_sql.py:11: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/f1_score_test.py:7: error: Cannot find implementation or library stub for module named "sklearn.metrics"  [import]
    explainaboard/visualizers/draw_charts.py:24: error: Cannot find implementation or library stub for module named "matplotlib"  [import]
    explainaboard/visualizers/draw_charts.py:25: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/info_test.py:116: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[AnalysisLevel]"; expected "SerializableData"  [arg-type]
    explainaboard/info_test.py:116: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/info_test.py:116: note: Consider using "Sequence" instead, which is covariant
    explainaboard/info_test.py:117: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[Analysis]"; expected "SerializableData"  [arg-type]
    explainaboard/info_test.py:117: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/info_test.py:117: note: Consider using "Sequence" instead, which is covariant
    explainaboard/info_test.py:160: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Union[Collection[str], None, int, float, List[PrimitiveData], Tuple[PrimitiveData, ...]]]"; expected "PrimitiveData"  [arg-type]
    integration_tests/metric_test.py:6: error: Cannot find implementation or library stub for module named "eaas"  [import]
    integration_tests/metric_test.py:7: error: Cannot find implementation or library stub for module named "eaas.async_client"  [import]
    integration_tests/metric_test.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/explainaboard_main.py:10: error: Cannot find implementation or library stub for module named "eaas.endpoint"  [import]
    explainaboard/explainaboard_main.py:10: error: Cannot find implementation or library stub for module named "eaas"  [import]
    explainaboard/explainaboard_main.py:89: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:90: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:91: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:92: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:93: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:94: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:364: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:365: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:367: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:368: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:369: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:370: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:371: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:390: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:401: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:402: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:403: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:404: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:405: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:406: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:407: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:408: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:499: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    integration_tests/cli_test.py:10: error: Cannot find implementation or library stub for module named "datalabs"  [import]
    Found 141 errors in 59 files (checked 231 source files)
    
    opened by tetsuok 0
  • add_tasks.md is out of date

    It seems add_tasks.md is out of date. add_tasks.md mentions tasks.py in the three places below:

    • https://github.com/neulab/ExplainaBoard/blame/fcedd5d7aab172b943c6b0025685b09744f149fd/docs/add_new_tasks.md#L6
    • https://github.com/neulab/ExplainaBoard/blame/fcedd5d7aab172b943c6b0025685b09744f149fd/docs/add_new_tasks.md#L12
    • https://github.com/neulab/ExplainaBoard/blame/fcedd5d7aab172b943c6b0025685b09744f149fd/docs/add_new_tasks.md#L133

    but the Python script was removed in #373. add_tasks.md needs to be updated accordingly.

    opened by tetsuok 0
  • Add system metadata class

    Processor.process() takes metadata, which is used to directly initialize SysOutputInfo. However, these are essentially different data (in particular, "metadata" is a subset of SysOutputInfo, not equal to it), and the current implementation causes some confusion around this:

    The most significant abuse of this behavior is that FileLoaderMetadata is implicitly converted into SysOutputInfo. This shouldn't work without an explicit conversion: https://github.com/neulab/ExplainaBoard/blob/4cec0a01cbe2617e9a67a440be25ee4252f792b2/integration_tests/ner_test.py#L148-L154

    To this end, we need:

    • A struct defining the system metadata (a toy sketch follows this list).
    • Change the behavior of Processor to take the system metadata, not a dict.
    • Either:
      • A conversion method between system metadata and FileLoaderReturn/SysOutputInfo
      • Include system metadata as a direct member of FileLoaderReturn/SysOutputInfo
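
    As a toy sketch of the first bullet (field names here are purely illustrative, not a concrete proposal from this issue):

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class SystemMetadata:
        # Illustrative container for user-supplied system information.
        system_name: str
        task_name: str
        dataset_name: Optional[str] = None
        metric_names: List[str] = field(default_factory=list)

    Processor.process() could then accept such a struct and convert it explicitly into (part of) SysOutputInfo, rather than accepting a loosely typed dict.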
    opened by odashi 3
  • Reconsider default number of buckets

    Currently the default number of buckets is 4: https://github.com/neulab/ExplainaBoard/blob/38db95801cbd15e2e9b2db7b60c40bd7173e1deb/explainaboard/analysis/analyses.py#L117

    But this is probably too few when we're doing discrete bucketing. It would probably be better to keep the default at 4 for continuous bucketing and use more buckets (maybe 10) for discrete bucketing.

    opened by neubig 0
Releases (v0.8.5)
  • v0.8.5(Apr 2, 2022)

    This release:

    • Refactors the metrics class and the report structure.
    • Adds significance tests to all metrics.
    • Makes major code style improvements and adds type checking.
    • Fixes several bugs.
Owner
NeuLab
Graham Neubig's Lab at LTI/CMU
NeuLab
NLP project that works with news (NER, context generation, news trend analytics)

СоАвтор СоАвтор is a platform and an open set of tools for newsrooms and freelance journalists, designed to make the process of creating content

38 Jan 04, 2023
A design of MIDI language for music generation task, specifically for Natural Language Processing (NLP) models.

MIDI Language Introduction Reference Paper: Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions: code This

Robert Bogan Kang 3 May 25, 2022
NeMo: a toolkit for conversational AI

NVIDIA NeMo Introduction NeMo is a toolkit for creating Conversational AI applications. NeMo product page. Introductory video. The toolkit comes with

NVIDIA Corporation 5.3k Jan 04, 2023
Ελληνικά νέα (Python script) / Greek News Feed (Python script)

Ελληνικά νέα (Python script) / Greek News Feed (Python script) Ελληνικά English In 2017 I had implemented a Python script to display the current

Loren Kociko 1 Jun 14, 2022
The first online catalogue for Arabic NLP datasets.

Masader The first online catalogue for Arabic NLP datasets. This catalogue contains 200 datasets with more than 25 metadata annotations for each datas

ARBML 94 Dec 26, 2022
NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT

NeuralQA: A Usable Library for (Extractive) Question Answering on Large Datasets with BERT Still in alpha, lots of changes anticipated. View demo on n

Victor Dibia 220 Dec 11, 2022
Code for ACL 2021 main conference paper "Conversations are not Flat: Modeling the Intrinsic Information Flow between Dialogue Utterances".

Conversations are not Flat: Modeling the Intrinsic Information Flow between Dialogue Utterances This repository contains the code and pre-trained mode

ICTNLP 90 Dec 27, 2022
Code for the Python code smells video on the ArjanCodes channel.

7 Python code smells This repository contains the code for the Python code smells video on the ArjanCodes channel (watch the video here). The example

55 Dec 29, 2022
A toolkit for document-level event extraction, containing some SOTA model implementations

Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker Source code for ACL-IJCNLP 2021 Long paper: Document-le

84 Dec 15, 2022
A complete NLP guideline for enthusiasts

NLP-NINJA A complete guide for Natural Language Processing in Python Table of Contents S.No. Topic Level Meaning 1 Tokenization 🤍 Beginner 2 Stemming

MAINAK CHAUDHURI 22 Dec 27, 2022
MEDIALpy: MEDIcal Abbreviations Lookup in Python

A small python package that allows the user to look up common medical abbreviations.

Aberystwyth Systems Biology 7 Nov 09, 2022
PUA Programming Language written in Python.

pua-lang PUA Programming Language written in Python. Installation git clone https://github.com/zhaoyang97/pua-lang.git cd pua-lang pip install . Try

zy 4 Feb 19, 2022
Smart discord chatbot integrated with Dialogflow

academic-NLP-chatbot Smart discord chatbot integrated with Dialogflow to interact with students naturally and manage different classes in a school. De

Tom Huynh 5 Oct 24, 2022
Repo for Enhanced Seq2Seq Autoencoder via Contrastive Learning for Abstractive Text Summarization

ESACL: Enhanced Seq2Seq Autoencoder via Contrastive Learning for AbstractiveText Summarization This repo is for our paper "Enhanced Seq2Seq Autoencode

Rachel Zheng 14 Nov 01, 2022
Natural language computational chemistry command line interface.

nlcc Install pip install nlcc Must have Open-AI Codex key: export OPENAI_API_KEY=your key here then nlcc key bindings ctrl-w copy to clipboard (Note

Andrew White 37 Dec 14, 2022
Use the power of GPT3 to execute any function inside your programs just by giving some doctests

gptrun Don't feel like coding today? Use the power of GPT3 to execute any function inside your programs just by giving some doctests. How is this diff

Roberto Abdelkader Martínez Pérez 11 Nov 11, 2022
A simple visual front end to the Maya UE4 RBF plugin delivered with MetaHumans

poseWrangler Overview PoseWrangler is a simple UI to create and edit pose-driven relationships in Maya using the MayaUE4RBF plugin. This plugin is dis

Christopher Evans 105 Dec 18, 2022
A telegram bot to translate 100+ Languages

🔥 GOOGLE TRANSLATER 🔥 The owner would not be responsible for any kind of bans due to the bot. • ⚡ INSTALLING ⚡ • • 🔰 Deploy To Railway 🔰 • • ✅ OFF

Aɴᴋɪᴛ Kᴜᴍᴀʀ 5 Dec 20, 2021
This repository contains the code for "Generating Datasets with Pretrained Language Models".

Datasets from Instructions (DINO 🦕 ) This repository contains the code for Generating Datasets with Pretrained Language Models. The paper introduces

Timo Schick 154 Jan 01, 2023
Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation

Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation Official Code Repository for the paper "Unsupervised Documen

NLP*CL Laboratory 2 Oct 26, 2021