A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Overview

Website | Documentation | Tutorials | Installation | Release Notes

GitHub license PyPI version Conda Version GitHub issues Telegram

CatBoost is a machine learning method based on gradient boosting over decision trees.

Main advantages of CatBoost:

Get Started and Documentation

All CatBoost documentation is available here.

Install CatBoost by following the guide for the

Next you may want to investigate:

If you cannot open documentation in your browser try adding yastatic.net and yastat.net to the list of allowed domains in your privacy badger.

Catboost models in production

If you want to evaluate Catboost model in your application read model api documentation.

Questions and bug reports

Help to Make CatBoost Better

  • Check out open problems and help wanted issues to see what can be improved, or open an issue if you want something.
  • Add your stories and experience to Awesome CatBoost.
  • To contribute to CatBoost you need to first read CLA text and add to your pull request, that you agree to the terms of the CLA. More information can be found in CONTRIBUTING.md
  • Instructions for contributors can be found here.

News

Latest news are published on twitter.

Reference Paper

Anna Veronika Dorogush, Andrey Gulin, Gleb Gusev, Nikita Kazeev, Liudmila Ostroumova Prokhorenkova, Aleksandr Vorobev "Fighting biases with dynamic boosting". arXiv:1706.09516, 2017.

Anna Veronika Dorogush, Vasily Ershov, Andrey Gulin "CatBoost: gradient boosting with categorical features support". Workshop on ML Systems at NIPS 2017.

License

© YANDEX LLC, 2017-2019. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.

Comments
  • UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128)

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128)

    Problem:UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128) catboost version: catboost 0.25 Operating System:win10

    When I use setup.py to install Catboost, this error occurs, and if I look closely it is divided into two parts: 1. Using CUDA to create _catboost.pyd will cause an error like 'UnicodeDecodeError:' ASCII 'codec can't decode byte 0xCD in position 9: Ordinal not in range(128). 2. Do not use the CUDA to create _catboost. pyd, there will be "subprocess. CalledProcessError:Command '['D:\anaconda3\python.exe', 'D:\learn\catboost-master\ya', 'make', 'D:\learn\catboost-master\catboost\python-package\..\..\catboost\python-package\catboost', '--no-src-links', '--output', 'D:\ learn\ catboost-master\catboost\python-package\build\temp.win-amd64-3.8\Release', '-dpython_config =python3-config',' -duse_arcadia_python =no', '-dos_sdk =local', '-r','-DNO_DEBUGINFO', '-DHAVE_CUDA= NO '] returned non-zero exit status 1." I also tried converting _catboost.pyx from GitHub to _catboost.pyd using 'python setup.py build_ext --inplace' directly, but I got the same error as when installing CatBoost.

    C:\Users\王普聪>pip install -e D:\learn\catboost-master\catboost\python-package
    Obtaining file:///D:/learn/catboost-master/catboost/python-package
    Requirement already satisfied: graphviz in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (0.16)
    Requirement already satisfied: plotly in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (4.14.3)
    Requirement already satisfied: six in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (1.15.0)
    Requirement already satisfied: matplotlib in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (3.2.2)
    Requirement already satisfied: numpy>=1.16.0 in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (1.18.5)
    Requirement already satisfied: pandas>=0.24 in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (1.0.5)
    Requirement already satisfied: scipy in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (1.5.0)
    Requirement already satisfied: retrying>=1.3.3 in d:\anaconda3\lib\site-packages (from plotly->catboost==0.24.4) (1.3.3)
    Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in d:\anaconda3\lib\site-packages (from matplotlib->catboost==0.24.4) (2.4.7)
    Requirement already satisfied: cycler>=0.10 in d:\anaconda3\lib\site-packages (from matplotlib->catboost==0.24.4) (0.10.0)
    Requirement already satisfied: kiwisolver>=1.0.1 in d:\anaconda3\lib\site-packages (from matplotlib->catboost==0.24.4) (1.2.0)
    Requirement already satisfied: python-dateutil>=2.1 in d:\anaconda3\lib\site-packages (from matplotlib->catboost==0.24.4) (2.8.1)
    Requirement already satisfied: pytz>=2017.2 in d:\anaconda3\lib\site-packages (from pandas>=0.24->catboost==0.24.4) (2020.1)
    Installing collected packages: catboost
      Running setup.py develop for catboost
        ERROR: Command errored out with exit status 1:
         command: 'D:\anaconda3\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'D:\\learn\\catboost-master\\catboost\\python-package\\setup.py'"'"'; __file__='"'"'D:\\learn\\catboost-master\\catboost\\python-package\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
             cwd: D:\learn\catboost-master\catboost\python-package\
        Complete output (159 lines):
        running develop
        15:30:22 I Targeting for CUDA support with C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
        running egg_info
        writing catboost.egg-info\PKG-INFO
        writing dependency_links to catboost.egg-info\dependency_links.txt
        writing requirements to catboost.egg-info\requires.txt
        writing top-level names to catboost.egg-info\top_level.txt
        15:30:24 I Targeting for CUDA support with C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
        reading manifest file 'catboost.egg-info\SOURCES.txt'
        writing manifest file 'catboost.egg-info\SOURCES.txt'
        running build_ext
        15:30:24 I Targeting for CUDA support with C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
        15:30:24 I Buildling _catboost.pyd with ymake
        15:30:24 I EXECUTE: D:\anaconda3\python.exe D:\learn\catboost-master\ya make D:\learn\catboost-master\catboost\python-package\..\..\catboost\python-package\catboost --no-src-links --output D:\learn\catboost-master\catboost\python-package\build\temp.win-amd64-3.8\Release -DPYTHON_CONFIG=python3-config -DUSE_ARCADIA_PYTHON=no -DOS_SDK=local -r -DNO_DEBUGINFO -DHAVE_CUDA=yes "-DCUDA_ROOT=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1"
        Output root is subdirectory of Arcadia root, this may cause non-idempotent build
        Traceback (most recent call last):
          File "devtools/ya/app.py", line 422, in configure_exit_interceptor
            yield
          File "devtools/ya/app.py", line 65, in helper
            return action(args)
          File "devtools/ya/entry/entry.py", line 55, in do_main
            res = handler.handle(handler, args, prefix=['ya'])
          File "devtools/ya/core/handler.py", line 159, in handle
            return handler.handle(self, args[1:], prefix + [name])
          File "devtools/ya/core/dispatch.py", line 37, in handle
            return self.command().handle(root_handler, args, prefix)
          File "devtools/ya/core/handler.py", line 341, in handle
            return self._action(params)
          File "devtools/ya/app.py", line 92, in helper
            return action(ctx.params)
          File "devtools/ya/build/build_handler.py", line 85, in do_ya_make
            builder = ya_make.YaMake(params, app_ctx)
          File "devtools/ya/build/ya_make.py", line 895, in __init__
            self.ctx = Context(self.opts, app_ctx=app_ctx, graph=graph, tests=tests, stripped_tests=stripped_tests, configure_errors=configure_errors, make_files=make_files, lite_graph=lite_graph)
          File "devtools/ya/build/ya_make.py", line 574, in __init__
            self.graph, self.tests, self.stripped_tests, self.configure_errors, self.make_files = _build_graph_and_tests(self.opts, app_ctx)
          File "devtools/ya/build/ya_make.py", line 258, in _build_graph_and_tests
            graph, tests, stripped_tests, gh, make_files = lg.build_graph_and_tests(opts, check=True, ev_listener=ev_listener, display=display)
          File "devtools/ya/build/graph.py", line 1688, in build_graph_and_tests
            return _build_graph_and_tests(opts, check, ev_listener, exit_stack, display)
          File "devtools/ya/build/graph.py", line 1992, in _build_graph_and_tests
            real_ymake_bin = tools.tool('ymake')
          File "devtools/ya/yalibrary/tools/__init__.py", line 220, in tool
            return toolchain.find(name, with_params, for_platform, cache=cache)
          File "devtools/ya/yalibrary/tools/__init__.py", line 158, in find
            executable = cur_bottle[executable_name]  # if executable_name is None it's Ok
          File "devtools/ya/yalibrary/tools/__init__.py", line 64, in __getitem__
            path = self.resolve()
          File "devtools/ya/yalibrary/tools/__init__.py", line 46, in resolve
            return self.__fetcher.fetch_if_need(self.__formula["match"], tared, binname, cache=cache).where
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 385, in fetch_if_need
            self.__c[key] = self._fetch_if_need(*args, **kwargs)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 452, in _fetch_if_need
            if self._fetch(name, tared, lambda x: name.lower() in x.lower(), binname):
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 368, in _fetch
            _install(res_path, do_install)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 104, in _install
            fs_handler(install_guard)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 95, in fs_handler
            func(install_guard)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 350, in do_install
            deploy_params=(UNTAR, resource_info if resource_info else {"file_name": "FILE"}, ""))
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 137, in _deploy_tool
            exts.archive.extract_from_tar(archive, extract_to)
          File "devtools/ya/exts/archive.py", line 16, in extract_from_tar
            archive.extract_tar(tar_file_path, output_dir)
          File "library/python/archive/__init__.py", line 62, in extract_tar
            output_dir = encode(output_dir, ENCODING)
          File "library/python/archive/__init__.py", line 58, in encode
            return value.encode(encoding)
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128)
        15:30:37 E Cannot build _catboost.pyd with CUDA support, will build without CUDA
        15:30:37 I EXECUTE: D:\anaconda3\python.exe D:\learn\catboost-master\ya make D:\learn\catboost-master\catboost\python-package\..\..\catboost\python-package\catboost --no-src-links --output D:\learn\catboost-master\catboost\python-package\build\temp.win-amd64-3.8\Release -DPYTHON_CONFIG=python3-config -DUSE_ARCADIA_PYTHON=no -DOS_SDK=local -r -DNO_DEBUGINFO -DHAVE_CUDA=no
        Output root is subdirectory of Arcadia root, this may cause non-idempotent build
        Traceback (most recent call last):
          File "devtools/ya/app.py", line 422, in configure_exit_interceptor
            yield
          File "devtools/ya/app.py", line 65, in helper
            return action(args)
          File "devtools/ya/entry/entry.py", line 55, in do_main
            res = handler.handle(handler, args, prefix=['ya'])
          File "devtools/ya/core/handler.py", line 159, in handle
            return handler.handle(self, args[1:], prefix + [name])
          File "devtools/ya/core/dispatch.py", line 37, in handle
            return self.command().handle(root_handler, args, prefix)
          File "devtools/ya/core/handler.py", line 341, in handle
            return self._action(params)
          File "devtools/ya/app.py", line 92, in helper
            return action(ctx.params)
          File "devtools/ya/build/build_handler.py", line 85, in do_ya_make
            builder = ya_make.YaMake(params, app_ctx)
          File "devtools/ya/build/ya_make.py", line 895, in __init__
            self.ctx = Context(self.opts, app_ctx=app_ctx, graph=graph, tests=tests, stripped_tests=stripped_tests, configure_errors=configure_errors, make_files=make_files, lite_graph=lite_graph)
          File "devtools/ya/build/ya_make.py", line 574, in __init__
            self.graph, self.tests, self.stripped_tests, self.configure_errors, self.make_files = _build_graph_and_tests(self.opts, app_ctx)
          File "devtools/ya/build/ya_make.py", line 258, in _build_graph_and_tests
            graph, tests, stripped_tests, gh, make_files = lg.build_graph_and_tests(opts, check=True, ev_listener=ev_listener, display=display)
          File "devtools/ya/build/graph.py", line 1688, in build_graph_and_tests
            return _build_graph_and_tests(opts, check, ev_listener, exit_stack, display)
          File "devtools/ya/build/graph.py", line 1992, in _build_graph_and_tests
            real_ymake_bin = tools.tool('ymake')
          File "devtools/ya/yalibrary/tools/__init__.py", line 220, in tool
            return toolchain.find(name, with_params, for_platform, cache=cache)
          File "devtools/ya/yalibrary/tools/__init__.py", line 158, in find
            executable = cur_bottle[executable_name]  # if executable_name is None it's Ok
          File "devtools/ya/yalibrary/tools/__init__.py", line 64, in __getitem__
            path = self.resolve()
          File "devtools/ya/yalibrary/tools/__init__.py", line 46, in resolve
            return self.__fetcher.fetch_if_need(self.__formula["match"], tared, binname, cache=cache).where
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 385, in fetch_if_need
            self.__c[key] = self._fetch_if_need(*args, **kwargs)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 452, in _fetch_if_need
            if self._fetch(name, tared, lambda x: name.lower() in x.lower(), binname):
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 368, in _fetch
            _install(res_path, do_install)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 104, in _install
            fs_handler(install_guard)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 95, in fs_handler
            func(install_guard)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 350, in do_install
            deploy_params=(UNTAR, resource_info if resource_info else {"file_name": "FILE"}, ""))
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 137, in _deploy_tool
            exts.archive.extract_from_tar(archive, extract_to)
          File "devtools/ya/exts/archive.py", line 16, in extract_from_tar
            archive.extract_tar(tar_file_path, output_dir)
          File "library/python/archive/__init__.py", line 62, in extract_tar
            output_dir = encode(output_dir, ENCODING)
          File "library/python/archive/__init__.py", line 58, in encode
            return value.encode(encoding)
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128)
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "D:\learn\catboost-master\catboost\python-package\setup.py", line 259, in <module>
            setup(
          File "D:\anaconda3\lib\site-packages\setuptools\__init__.py", line 153, in setup
            return distutils.core.setup(**attrs)
          File "D:\anaconda3\lib\distutils\core.py", line 148, in setup
            dist.run_commands()
          File "D:\anaconda3\lib\distutils\dist.py", line 966, in run_commands
            self.run_command(cmd)
          File "D:\anaconda3\lib\distutils\dist.py", line 985, in run_command
            cmd_obj.run()
          File "D:\anaconda3\lib\site-packages\setuptools\command\develop.py", line 34, in run
            self.install_for_development()
          File "D:\anaconda3\lib\site-packages\setuptools\command\develop.py", line 136, in install_for_development
            self.run_command('build_ext')
          File "D:\anaconda3\lib\distutils\cmd.py", line 313, in run_command
            self.distribution.run_command(command)
          File "D:\anaconda3\lib\distutils\dist.py", line 985, in run_command
            cmd_obj.run()
          File "D:\learn\catboost-master\catboost\python-package\setup.py", line 186, in run
            self.build_with_ymake(topsrc_dir, build_dir, catboost_ext, put_dir, verbose, dry_run)
          File "D:\learn\catboost-master\catboost\python-package\setup.py", line 219, in build_with_ymake
            logging_execute(ymake_cmd + ['-DHAVE_CUDA=no'], verbose, dry_run)
          File "D:\learn\catboost-master\catboost\python-package\setup.py", line 62, in logging_execute
            subprocess.check_call(cmd, universal_newlines=True)
          File "D:\anaconda3\lib\subprocess.py", line 364, in check_call
            raise CalledProcessError(retcode, cmd)
        subprocess.CalledProcessError: Command '['D:\\anaconda3\\python.exe', 'D:\\learn\\catboost-master\\ya', 'make', 'D:\\learn\\catboost-master\\catboost\\python-package\\..\\..\\catboost\\python-package\\catboost', '--no-src-links', '--output', 'D:\\learn\\catboost-master\\catboost\\python-package\\build\\temp.win-amd64-3.8\\Release', '-DPYTHON_CONFIG=python3-config', '-DUSE_ARCADIA_PYTHON=no', '-DOS_SDK=local', '-r', '-DNO_DEBUGINFO', '-DHAVE_CUDA=no']' returned non-zero exit status 1.
        ----------------------------------------
    ERROR: Command errored out with exit status 1: 'D:\anaconda3\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'D:\\learn\\catboost-master\\catboost\\python-package\\setup.py'"'"'; __file__='"'"'D:\\learn\\catboost-master\\catboost\\python-package\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.
    
    opened by Wangpc-972 67
  • User description is used by default. Move metric creation metric to corresponding class factories.

    User description is used by default. Move metric creation metric to corresponding class factories.

    Each metric now uses user-specified parameters in their descriptions by default.

    Design

    TMetric now stores a TMap<TString, TString> of user parameters, which are used to construct a metric description (e.g. MetricName:key1=value1;key2=value2). This implementation is defined in the base class and is now the default behaviour for building metric descriptions.

    Some of specifiv GetDescription method implementations are kept in order to be consistent with the existing behaviour.

    Note

    UserQuerywiseMetric now uses the options in its representation as well.

    opened by ivanychev 38
  • Sum of shap values does not equal to the prediction

    Sum of shap values does not equal to the prediction

    Problem: Sum of shap values does not equal to the prediction catboost version: 0.18.1 Operating System: Ubuntu 19.10 CPU: i7-8565U

    It only happens sometimes but we find that the of shap values does not equal to the prediction. Please let us know how we can provide further information

    in progress bug 
    opened by hopoluicha 27
  • How catboost handle with big data?

    How catboost handle with big data?

    Hi! I try to use catboost in kaggle competition. https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection The size of my train set about 40m rows with 14 features. When i try to train model, kernel always dies without any errors...

    need info 
    opened by Mechanix12 27
  • Unknown class labels

    Unknown class labels

    I'm beginner using boosting models ,I'm trying to implement catboost . My input data has 6 categorical features and 2 numerical feature . My target variable is numerical data. I'm running on GPU . I'm facing the problem below please help me. Cannot chare data due privacy issue.

    Traceback (most recent call last): File "/work/ilt/css8222/cat_boost/cat_boost.py", line 127, in save_snapshot = True File "/fibus/fs2/15/css8222/.local/lib/python3.6/site-packages/catboost/core.py", line 4718, in fit silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr) File "/fibus/fs2/15/css8222/.local/lib/python3.6/site-packages/catboost/core.py", line 2042, in _fit train_params["init_model"] File "/fibus/fs2/15/css8222/.local/lib/python3.6/site-packages/catboost/core.py", line 1464, in _train self._object._train(train_pool, test_pool, params, allow_clear_pool, init_model._object if init_model else None) File "_catboost.pyx", line 4393, in _catboost._CatBoost._train File "_catboost.pyx", line 4442, in _catboost._CatBoost._train _catboost.CatBoostError: catboost/private/libs/target/target_converter.cpp:226: Unknown class label: "14289"

    opened by sujay003 25
  • Faster SHAP values for small batches

    Faster SHAP values for small batches

    For small batches use direct SHAP values calculation. Direct algorithm (without precalculation) is faster when (where DocumentsNumber < MeanLeafCount), because for preprocessing we find SHAP values for MeanLeafCount documents.

    (algorithm from https://arxiv.org/abs/1802.03888)

    With preprocessing final complexity was O(NT(D+F))+O(TL^2 D^2) where N is the number of documents(objects), T - number of trees, D - average tree depth, F - average number of features in tree, L - average number of leaves in tree. But if the batch is small we can use default algorithm with complexity O(NTLD^2), which is better when N < L.

    Example: On dataset gisette (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html) with 100 first features train CatBoostRegressor(iterations=500, depth=6, random_seed=42) and then use get_feature_importance to find SHAP values for the first object in test.

    Old:

    • 0.32 s

    New:

    • shap_mode="Auto" or "NoPreCalc"- 0.015 s
    • shap_mode="UsePreCalc" - 0.32 s (this is like it was before)

    I hereby agree to the terms of the CLA available at: link

    opened by Lokutrus 25
  • Tutorial for ranking modes in CatBoost

    Tutorial for ranking modes in CatBoost

    Hello.

    Looks like the current version of CatBoost supports learning to rank. There are some clues about it in the documentation, but I couldn't find any minimal working examples. I wonder which methods should be considered as a baseline approach and what are the prerequisites?

    Should we use YetiRank as the training metric and just provide a query id as the Pool group_id parameter? What other CatBoost parameters should be taken into account specifically for a ranking problem?

    Thank you!

    planned documentation 
    opened by hanky 24
  • GPU yields worse metric than CPU

    GPU yields worse metric than CPU

    Problem:various measurements become worse when I switch from CPU to GPU catboost version:0.22 Operating System:Linux 4.4.0-1100-aws x86_64 CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz

    GPU: Tesla M60

    I wanted to reduce the training time and so I specified 'task_type' as 'GPU'. I immediately noticed that its metrics got worse. The only change I made was setting task_type as GPU. The rest are the same.

    The training dataset has 1.2M rows and 218 columns. Among these 218 columns, 42 are categorical features. The rest are floats or integers, no text features. The validation dataset has 120K rows.

    The following are the parameters for the CPU version: {'nan_mode': 'Min', 'eval_metric': 'Logloss', 'combinations_ctr': ['Borders:CtrBorderCount=15:CtrBorderType=Uniform:TargetBorderCount=1:TargetBorderType=MinEntropy:Prior=0/1:Prior=0.5/1:Prior=1/1', 'Counter:CtrBorderCount=15:CtrBorderType=Uniform:Prior=0/1'], 'iterations': 1000, 'sampling_frequency': 'PerTree', 'fold_permutation_block': 0, 'leaf_estimation_method': 'Newton', 'od_pval': 0, 'counter_calc_method': 'SkipTest', 'grow_policy': 'SymmetricTree', 'boosting_type': 'Plain', 'model_shrink_mode': 'Constant', 'feature_border_type': 'GreedyLogSum', 'ctr_leaf_count_limit': 18446744073709551615, 'bayesian_matrix_reg': 0.10000000149011612, 'one_hot_max_size': 2, 'l2_leaf_reg': 3, 'random_strength': 1, 'od_type': 'Iter', 'rsm': 1, 'boost_from_average': False, 'max_ctr_complexity': 4, 'model_size_reg': 0.5, 'simple_ctr': ['Borders:CtrBorderCount=15:CtrBorderType=Uniform:TargetBorderCount=1:TargetBorderType=MinEntropy:Prior=0/1:Prior=0.5/1:Prior=1/1', 'Counter:CtrBorderCount=15:CtrBorderType=Uniform:Prior=0/1'], 'subsample': 0.800000011920929, 'use_best_model': True, 'od_wait': 35, 'class_names': [0, 1], 'random_seed': 42, 'depth': 6, 'ctr_target_border_count': 1, 'has_time': False, 'store_all_simple_ctr': False, 'border_count': 254, 'classes_count': 0, 'sparse_features_conflict_fraction': 0, 'leaf_estimation_backtracking': 'AnyImprovement', 'best_model_min_trees': 1, 'model_shrink_rate': 0, 'min_data_in_leaf': 1, 'loss_function': 'Logloss', 'learning_rate': 0.30000001192092896, 'score_function': 'Cosine', 'task_type': 'CPU', 'leaf_estimation_iterations': 10, 'bootstrap_type': 'MVS', 'max_leaves': 64, 'permutation_count': 4}

    The following are the parameters for the GPU version: {'nan_mode': 'Min', 'gpu_ram_part': 0.95, 'eval_metric': 'Logloss', 'combinations_ctr': ['Borders:CtrBorderCount=15:CtrBorderType=Uniform:TargetBorderCount=1:TargetBorderType=MinEntropy:Prior=0/1:Prior=0.5/1:Prior=1/1', 'FeatureFreq:CtrBorderCount=15:CtrBorderType=Median:Prior=0/1'], 'iterations': 1000, 'fold_permutation_block': 64, 'leaf_estimation_method': 'Newton', 'observations_to_bootstrap': 'TestOnly', 'od_pval': 0, 'counter_calc_method': 'SkipTest', 'grow_policy': 'SymmetricTree', 'boosting_type': 'Plain', 'ctr_history_unit': 'Sample', 'feature_border_type': 'GreedyLogSum', 'bayesian_matrix_reg': 0.10000000149011612, 'one_hot_max_size': 2, 'devices': '-1', 'pinned_memory_bytes': '104857600', 'l2_leaf_reg': 3, 'random_strength': 1, 'od_type': 'Iter', 'rsm': 1, 'boost_from_average': False, 'fold_size_loss_normalization': False, 'max_ctr_complexity': 4, 'gpu_cat_features_storage': 'GpuRam', 'simple_ctr': ['Borders:CtrBorderCount=15:CtrBorderType=Uniform:TargetBorderCount=1:TargetBorderType=MinEntropy:Prior=0/1:Prior=0.5/1:Prior=1/1', 'FeatureFreq:CtrBorderCount=15:CtrBorderType=MinEntropy:Prior=0/1'], 'use_best_model': True, 'od_wait': 35, 'class_names': [0, 1], 'random_seed': 42, 'depth': 6, 'ctr_target_border_count': 1, 'has_time': False, 'border_count': 128, 'min_fold_size': 100, 'data_partition': 'FeatureParallel', 'bagging_temperature': 1, 'classes_count': 0, 'leaf_estimation_backtracking': 'AnyImprovement', 'best_model_min_trees': 1, 'min_data_in_leaf': 1, 'add_ridge_penalty_to_loss_function': False, 'loss_function': 'Logloss', 'learning_rate': 0.30000001192092896, 'score_function': 'Cosine', 'task_type': 'GPU', 'leaf_estimation_iterations': 10, 'bootstrap_type': 'Bayesian', 'max_leaves': 64, 'permutation_count': 4}

    opened by kdlin 23
  • Using parameters from saved model for cross-validation leads to 'exclusive parameters' error.

    Using parameters from saved model for cross-validation leads to 'exclusive parameters' error.

    Problem: "Only one of parameters ['verbose', 'logging_level', 'verbose_eval', 'silent'] should be set" printed by cv function after loading from file previously saved model. catboost version: 0.12.2 Operating System: CentOS Linux release 7.4.1708 CPU: Intel(R) Xeon(R) CPU E5-2450 v2 @ 2.50GHz

    model = CatBoostClassifier(loss_function='MultiClass')
    model.fit(train_pool, 
      verbose=False, 
      plot=True,
      eval_set=validation_pool)
    model.save_model(str(model_path.absolute()))
    model = CatBoostClassifier()
    model.load_model(str(model_path.absolute()))
    cv_data = cv(
        whole_pool,
        params=model.get_params()
    )
    
    ---------------------------------------------------------------------------
    CatboostError                             Traceback (most recent call last)
    <ipython-input-40-f150897615b8> in <module>
          1 cv_data = cv(
          2     whole_pool,
    ----> 3     params=model.get_params()
          4 )
    
    ~/.conda/envs/catboost/lib/python3.6/site-packages/catboost/core.py in cv(pool, params, dtrain, iterations, num_boost_round, fold_count, nfold, inverted, partition_random_seed, seed, shuffle, logging_level, stratified, as_pandas, metric_period, verbose, verbose_eval, plot, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, max_time_spent_on_fixed_cost_ratio, dev_max_iterations_batch_size)
       2876 
       2877     params = deepcopy(params)
    -> 2878     _process_synonyms(params)
       2879 
       2880     metric_period, verbose, logging_level = _process_verbose(metric_period, verbose, logging_level, verbose_eval)
    
    ~/.conda/envs/catboost/lib/python3.6/site-packages/catboost/core.py in _process_synonyms(params)
        754         del params['silent']
        755 
    --> 756     metric_period, verbose, logging_level = _process_verbose(metric_period, verbose, logging_level, verbose_eval, silent)
        757 
        758     if metric_period is not None:
    
    ~/.conda/envs/catboost/lib/python3.6/site-packages/catboost/core.py in _process_verbose(metric_period, verbose, logging_level, verbose_eval, silent)
        133     at_most_one = sum(params.get(exclusive) is not None for exclusive in exclusive_params)
        134     if at_most_one > 1:
    --> 135         raise CatboostError('Only one of parameters {} should be set'.format(exclusive_params))
        136 
        137     if verbose is None:
    
    CatboostError: Only one of parameters ['verbose', 'logging_level', 'verbose_eval', 'silent'] should be set
    
    bug 
    opened by protsenkovi 23
  • Flag not copied unnecessarily with blank and whitespace

    Flag not copied unnecessarily with blank and whitespace

    Before submitting a pull request, please do the following steps:

    1. Read instructions for contributors here.
    2. Run ya make in catboost folder to make sure the code builds.
    3. Add tests that test your change.
    4. Run tests using ya make -t -A command.
    5. If you haven't already, complete the CLA. I hereby agree to the terms of the CLA available at https://yandex.ru/legal/cla/?lang=en.
    opened by sharaalfa 23
  • Issue trying to compile with specified gcc version

    Issue trying to compile with specified gcc version

    I'm trying to compile the catboost python wheel on my system. The default gcc version I have is 8, but I also have 7 installed so I'm trying to use that by setting the CC and CXX environment variables. However, when running:

    python mk_wheel.py -DCUDA_ROOT="/opt/cuda"
    

    I get the message:

    Info: Attention! Using system user-defined compiler: g++-7 (check CC and CXX env vars).
    Cross compilation with system CXX is not supported
    

    catboost version: git master Operating System: Linux CPU: i7 GPU: GTX 1080

    Thanks!

    build issues 
    opened by ctlaltdefeat 23
  • Prediction probability result mismatch - C API and Python

    Prediction probability result mismatch - C API and Python

    Problem: We used the Python API of catboost to train our multiclass classification model and the resultant .cbm model was used in python / C to do the prediction.

    I noticed that when making inferences using the same model and the same input data (the model expects 3 float features and 4 categorical features.), the prediction probability in Python is slightly different compared to the prediction probability using the C API.

    We use CatboostClassifier.predict_proba in Python with all default parameters, and we set SetPredictionType(modelHandle, APT_PROBABILITY); in C API.

    We found that the sum total of the probabilities returned in Python are always different from 1 (sometimes it is greater or less than 1), and in the case of the probabilities returned in C the sum of them is always equal to 1.

    We do not know if both ways to get the probability are the same (CatboostClassifier.predict_proba and SetPredictionType(modelHandle, APT_PROBABILITY);), but if they are the same, why is the result different?

    catboost version: 1.0.3

    Operating System: MacOS Ventura 13.1

    CPU: Apple M1

    opened by eli3xm 0
  • Spark Feature Importance issue

    Spark Feature Importance issue

    Problem: ai.catboost.CatBoostError: Unsupported data type for Label at ai.catboost.spark.DatasetLoadingContext$.getLabelCallback(DataHelpers.scala:465) catboost version: 1.1.1 Operating System: Linux, Spark 3.3.1

    The following method call fails with the error described above:

    ((CatBoostClassificationModel) model).getFeatureImportance(EFstrType.LossFunctionChange, evalPool, ECalcTypeShapValues.Regular)
    
    opened by eugene-kamenev 0
  •  Saved model's params are different from current model's params

    Saved model's params are different from current model's params

    Problem: Can't fit models on GPU, Saved model's params are different from current model's params catboost version: '1.1.1' Operating System: Windows 10 CPU: 0 GPU: 1

    model_cat_tm_1 = CatBoostClassifier( iterations=5000, loss_function ='Logloss', #eval_metric = 'AUC', learning_rate = 0.05, random_seed = 1, od_type = "Iter", od_wait = 200, depth = 5, task_type = "GPU", devices = '0:1', save_snapshot= False, )

    cv_params_tm_1 = model_cat_tm_1.get_params() cv_data_tm_1 = cv( Pool(train_tm_treatment_one_features, train_tm_treatment_one_target), cv_params_tm_1, plot=True, verbose=100, )

    Gettting this error (tried, rebooting the system, open another script - doesn't help)

    Training on fold [0/3]

    CatBoostError Traceback (most recent call last) ~\AppData\Local\Temp\ipykernel_3516\715857703.py in 1 cv_params_tm_1 = model_cat_tm_1.get_params() ----> 2 cv_data_tm_1 = cv( 3 Pool(train_tm_treatment_one_features, train_tm_treatment_one_target), 4 cv_params_tm_1, 5 plot=True,

    ~\AppData\Roaming\Python\Python39\site-packages\catboost\core.py in cv(pool, params, dtrain, iterations, num_boost_round, fold_count, nfold, inverted, partition_random_seed, seed, shuffle, logging_level, stratified, as_pandas, metric_period, verbose, verbose_eval, plot, plot_file, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, metric_update_interval, folds, type, return_models, log_cout, log_cerr) 6648 with log_fixup(log_cout, log_cerr), plot_wrapper(plot, plot_file=plot_file, plot_title='Cross-validation plot', train_dirs=plot_dirs): 6649 if not return_models: -> 6650 return _cv(params, pool, fold_count, inverted, partition_random_seed, shuffle, stratified, 6651 metric_update_interval, as_pandas, folds, type, return_models) 6652 else:

    _catboost.pyx in _catboost._cv()

    _catboost.pyx in _catboost._cv()

    CatBoostError: C:/Program Files (x86)/Go Agent/pipelines/BuildMaster/catboost.git/catboost/cuda/methods/boosting_progress_tracker.cpp:171: Saved model's params are different from current model's params

    opened by MiMakh 0
  • Catboost spark fit error java.lang.ClassCastException

    Catboost spark fit error java.lang.ClassCastException

    Problem: net.razorvine.pickle.objects.TimeDelta cannot be cast to java.time.Duration catboost version: 1.0.6 Operating System: 10.4 LTS ML (includes Apache Spark 3.2.1, Scala 2.12)

    Hi, I'm trying to test catboost_spark in a Databricks notebook using the example from the official documentation: https://catboost.ai/en/docs/concepts/spark-quickstart-python#binary-classification

    When I run this command:

    classifier.fit(dataset=trainPool, evalDatasets=[evalPool])
    

    The following error is raised:

    java.lang.ClassCastException: net.razorvine.pickle.objects.TimeDelta cannot be cast to java.time.Duration
    
    ...
    
    Py4JJavaError: An error occurred while calling o18779.w.
    : java.lang.ClassCastException: net.razorvine.pickle.objects.TimeDelta cannot be cast to java.time.Duration
    	at ai.catboost.spark.params.DurationParam.w(Helpers.scala:61)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    	at py4j.Gateway.invoke(Gateway.java:295)
    	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    	at py4j.commands.CallCommand.execute(CallCommand.java:79)
    	at py4j.GatewayConnection.run(GatewayConnection.java:251)
    	at java.lang.Thread.run(Thread.java:748)
    

    I believe there is a similar issue to this but it is now closed. Thank you in advance for the help.

    opened by vitormanita 0
  • parameter missing for non_linear regression

    parameter missing for non_linear regression

    Problem: Non Linear Regression "Poly" Kernal parameter missing catboost version: 0.26.1 Operating System: Linux CPU:True GPU:False

    Hi there, I am training a model for linear regression problem but my data has non-linear in nature. So I have decided to change kernel like Poly or something for non_linear that we have Support Vector Regressor. I have tried searching for same in Catboost parameters but i couldn't get. Do you have plans for adding it? Thanks

    opened by hamza1424 0
Releases(v1.1.1)
Owner
CatBoost
CatBoost is a fast, scalable, high performance gradient boosting on decision trees library. Used for ranking, classification, regression and other ML tasks.
CatBoost
🌊 River is a Python library for online machine learning.

River is a Python library for online machine learning. It is the result of a merger between creme and scikit-multiflow. River's ambition is to be the go-to library for doing machine learning on strea

OnlineML 4k Jan 03, 2023
A simple python program that draws a tree for incrementing values using the Collatz Conjecture.

Collatz Conjecture A simple python program that draws a tree for incrementing values using the Collatz Conjecture. Values which can be edited: Length

davidgasinski 1 Oct 28, 2021
Scikit-Garden or skgarden is a garden for Scikit-Learn compatible decision trees and forests.

Scikit-Garden or skgarden (pronounced as skarden) is a garden for Scikit-Learn compatible decision trees and forests.

260 Dec 21, 2022
CinnaMon is a Python library which offers a number of tools to detect, explain, and correct data drift in a machine learning system

CinnaMon is a Python library which offers a number of tools to detect, explain, and correct data drift in a machine learning system

Zelros 67 Dec 28, 2022
Bayesian optimization based on Gaussian processes (BO-GP) for CFD simulations.

BO-GP Bayesian optimization based on Gaussian processes (BO-GP) for CFD simulations. The BO-GP codes are developed using GPy and GPyOpt. The optimizer

KTH Mechanics 8 Mar 31, 2022
Python library which makes it possible to dynamically mask/anonymize data using JSON string or python dict rules in a PySpark environment.

pyspark-anonymizer Python library which makes it possible to dynamically mask/anonymize data using JSON string or python dict rules in a PySpark envir

6 Jun 30, 2022
Mortality risk prediction for COVID-19 patients using XGBoost models

Mortality risk prediction for COVID-19 patients using XGBoost models Using demographic and lab test data received from the HM Hospitales in Spain, I b

1 Jan 19, 2022
Module is created to build a spam filter using Python and the multinomial Naive Bayes algorithm.

Naive-Bayes Spam Classificator Module is created to build a spam filter using Python and the multinomial Naive Bayes algorithm. Main goal is to code a

Viktoria Maksymiuk 1 Jun 27, 2022
Simulate & classify transient absorption spectroscopy (TAS) spectral features for bulk semiconducting materials (Post-DFT)

PyTASER PyTASER is a Python (3.9+) library and set of command-line tools for classifying spectral features in bulk materials, post-DFT. The goal of th

Materials Design Group 4 Dec 27, 2022
A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

Machine Learning Notebooks, 3rd edition This project aims at teaching you the fundamentals of Machine Learning in python. It contains the example code

Aurélien Geron 1.6k Jan 05, 2023
Short PhD seminar on Machine Learning Security (Adversarial Machine Learning)

Short PhD seminar on Machine Learning Security (Adversarial Machine Learning)

141 Dec 27, 2022
Real-time domain adaptation for semantic segmentation

Advanced-Machine-Learning This repository contains the code for the project Real

Andrea Cavallo 1 Jan 30, 2022
Kaggle Tweet Sentiment Extraction Competition: 1st place solution (Dark of the Moon team)

Kaggle Tweet Sentiment Extraction Competition: 1st place solution (Dark of the Moon team)

Artsem Zhyvalkouski 64 Nov 30, 2022
Factorization machines in python

Factorization Machines in Python This is a python implementation of Factorization Machines [1]. This uses stochastic gradient descent with adaptive re

Corey Lynch 892 Jan 03, 2023
Python package for machine learning for healthcare using a OMOP common data model

This library was developed in order to facilitate rapid prototyping in Python of predictive machine-learning models using longitudinal medical data from an OMOP CDM-standard database.

Sontag Lab 75 Jan 03, 2023
OptaPy is an AI constraint solver for Python to optimize planning and scheduling problems.

OptaPy is an AI constraint solver for Python to optimize the Vehicle Routing Problem, Employee Rostering, Maintenance Scheduling, Task Assignment, School Timetabling, Cloud Optimization, Conference S

OptaPy 208 Dec 27, 2022
Learn Machine Learning Algorithms by doing projects in Python and R Programming Language

Learn Machine Learning Algorithms by doing projects in Python and R Programming Language. This repo covers all aspect of Machine Learning Algorithms.

Ravi Chaubey 6 Oct 20, 2022
My capstone project for Udacity's Machine Learning Nanodegree

MLND-Capstone My capstone project for Udacity's Machine Learning Nanodegree Lane Detection with Deep Learning In this project, I use a deep learning-b

Michael Virgo 407 Dec 12, 2022
Pandas Machine Learning and Quant Finance Library Collection

Pandas Machine Learning and Quant Finance Library Collection

148 Dec 07, 2022
Pragmatic AI Labs 421 Dec 31, 2022