Lale is a Python library for semi-automated data science.

Overview

Lale

README in other languages: 中文, deutsch, français, or contribute your own.

Lale is a Python library for semi-automated data science. Lale makes it easy to automatically select algorithms and tune hyperparameters of pipelines that are compatible with scikit-learn, in a type-safe fashion. If you are a data scientist who wants to experiment with automated machine learning, this library is for you! Lale adds value beyond scikit-learn along three dimensions: automation, correctness checks, and interoperability. For automation, Lale provides a consistent high-level interface to existing pipeline search tools including Hyperopt, GridSearchCV, and SMAC. For correctness checks, Lale uses JSON Schema to catch mistakes when there is a mismatch between hyperparameters and their type, or between data and operators. And for interoperability, Lale has a growing library of transformers and estimators from popular libraries such as scikit-learn, XGBoost, and PyTorch. Lale can be installed just like any other Python package and can be edited with off-the-shelf Python tools such as Jupyter notebooks.

The name Lale, pronounced laleh, comes from the Persian word for tulip. Similarly to popular machine-learning libraries such as scikit-learn, Lale is also just a Python library, not a new stand-alone programming language. It does not require users to install new tools nor learn new syntax.

Lale is distributed under the terms of the Apache 2.0 License; see LICENSE.txt. It is currently in an Alpha release, without warranties of any kind.

Comments
  • replace BaselineClassifier/Regressor by DummyClassifier/Regressor. Issue #618

    Replacing BaselineClassifier with DummyClassifier from scikit-learn. Replacing BaselineRegressor with DummyRegressor from scikit-learn. Created new operators with make_operator, using the schemas defined in BaselineClassifier and BaselineRegressor.

    Contributed by students of SRH University, Heidelberg. https://github.com/tauseefhashmi https://github.com/frankcode101 https://github.com/tanmaygaikwad https://github.com/RajathReddy9 https://github.com/vickyvishal/

    opened by vickyvishal 11
  • Improve Lale sklearn schema

    • BaggingClassifier - Add 2 constraints.
    • BaggingRegressor - Add 2 constraints.
    • ColumnTransformer - Add a sparse constraint.
    • ExtraTreesClassifier - Add a constraint.
    • ExtraTreesRegressor - Add 2 constraints.
    • FeatureAgglomeration - Add a sparse constraint.
    • FunctionTransformer - Fix a constraint.
    • GaussianNB - Add a sparse constraint.
    • KNeighborsClassifier - Remove 2 constraints (false negatives).
    • KNeighborsRegressor - Remove 2 constraints (false negatives).
      • For KNN-C and KNN-R, this constraint might be useful but the schema is long and contains many TODOs.

    Metric 'minkowski' not valid for sparse input. Use sorted(sklearn.neighbors.VALID_METRICS_SPARSE['brute']) to get valid options. Metric can also be a callable function.

    • LinearSVC - Improve constraints.
    • LinearSVR - Modify "loss" schema. Add a constraint.
    • LogisticRegression - Add 4 constraints.
    • MinMaxScaler - Add a sparse constraint.
    • MissingIndicator - Add a constraint.
    • OneHotEncoder - Add "drop" schema, new in 0.21. Add a constraint.
    • OrdinalEncoder - Remove "ignore" from "handle_unknown" schema. Add 2 constraints.
    • RandomForestClassifier - Add a constraint.
    • RandomForestRegressor - Add 2 constraints.
    • RidgeClassifier - Add 2 constraints.
    • Ridge - Add 2 constraints.
    • RobustScaler - Add a constraint.
    • SimpleImputer - Add a constraint.
    • SVC - Remove a constraint (false negative). Add a constraint.
    • SVR - Remove a constraint (false negative). Add a constraint.
    opened by Ingkarat 8
  • Out Scripts are Optimized in the Importing Task

    Applied the isort module to the outer scripts. None of search/*.py, lib/*.py, datasets/*.py, or utils/*.py were changed, so the schema2search_space.py file is unchanged.

    hacktoberfest 
    opened by lnxpy 8
  • Added additional logisticaix360 wrappers in the existing lale codebase

    Created an additional Logisticaix360 wrapper, added a notebook comparing Prejudice Remover, Logistic Regression, and Logisticaix360, and added test cases.

    opened by priyankabanda2202 6
  • ImportError: cannot import name '_UnstableArchMixin'

    IBM Watson Studio:Version 1.1.0-151 (1.1.0-151) on macOS Catalina 10.15.4

    from sklearn.preprocessing import Normalizer
    from sklearn.tree import DecisionTreeRegressor as Tree
    from lale.lib.lale import Hyperopt
    
    
    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-28-2eee442a0b4d> in <module>
    ----> 1 from lale.lib.lale import Hyperopt
    
    ~/WatsonStudioDesktop/miniconda3/envs/desktop/lib/python3.6/site-packages/lale/lib/lale/__init__.py in <module>
         61 from .baseline_classifier import BaselineClassifier
         62 from .baseline_regressor import BaselineRegressor
    ---> 63 from .grid_search_cv import GridSearchCV
         64 from .hyperopt import Hyperopt
         65 from .topk_voting_classifier import TopKVotingClassifier
    
    ~/WatsonStudioDesktop/miniconda3/envs/desktop/lib/python3.6/site-packages/lale/lib/lale/grid_search_cv.py in <module>
         15 from typing import Any, Dict
         16 
    ---> 17 import lale.lib.sklearn
         18 import lale.search.lale_grid_search_cv
         19 import lale.operators
    
    ~/WatsonStudioDesktop/miniconda3/envs/desktop/lib/python3.6/site-packages/lale/lib/sklearn/__init__.py in <module>
        130 from .extra_trees_classifier import ExtraTreesClassifier
        131 from .extra_trees_regressor import ExtraTreesRegressor
    --> 132 from .feature_agglomeration import FeatureAgglomeration
        133 from .function_transformer import FunctionTransformer
        134 from .gaussian_nb import GaussianNB
    
    ~/WatsonStudioDesktop/miniconda3/envs/desktop/lib/python3.6/site-packages/lale/lib/sklearn/feature_agglomeration.py in <module>
         13 # limitations under the License.
         14 
    ---> 15 import sklearn.cluster.hierarchical
         16 import lale.docstrings
         17 import lale.operators
    
    ~/WatsonStudioDesktop/miniconda3/envs/desktop/lib/python3.6/site-packages/sklearn/cluster/__init__.py in <module>
          4 """
          5 
    ----> 6 from .spectral import spectral_clustering, SpectralClustering
          7 from .mean_shift_ import (mean_shift, MeanShift,
          8                           estimate_bandwidth, get_bin_seeds)
    
    ~/WatsonStudioDesktop/miniconda3/envs/desktop/lib/python3.6/site-packages/sklearn/cluster/spectral.py in <module>
         15 from ..metrics.pairwise import pairwise_kernels
         16 from ..neighbors import kneighbors_graph
    ---> 17 from ..manifold import spectral_embedding
         18 from .k_means_ import k_means
         19 
    
    ~/WatsonStudioDesktop/miniconda3/envs/desktop/lib/python3.6/site-packages/sklearn/manifold/__init__.py in <module>
          3 """
          4 
    ----> 5 from .locally_linear import locally_linear_embedding, LocallyLinearEmbedding
          6 from .isomap import Isomap
          7 from .mds import MDS, smacof
    
    ~/WatsonStudioDesktop/miniconda3/envs/desktop/lib/python3.6/site-packages/sklearn/manifold/locally_linear.py in <module>
         10 from scipy.sparse.linalg import eigsh
         11 
    ---> 12 from ..base import BaseEstimator, TransformerMixin, _UnstableArchMixin
         13 from ..utils import check_random_state, check_array
         14 from ..utils.extmath import stable_cumsum
    
    ImportError: cannot import name '_UnstableArchMixin'
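
The report does not state a cause. When diagnosing an import error of this kind, a reasonable first step is to record which package versions are actually installed in the environment. The sketch below does that with the standard library only. The helper name `installed_versions` and the placeholder package `no-such-package-xyz` are illustrative assumptions, and `importlib.metadata` requires Python 3.8 or newer, which is later than the Python 3.6 environment shown in the traceback.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# "no-such-package-xyz" is a placeholder to show how a missing package is reported.
report = installed_versions(["lale", "scikit-learn", "scipy", "no-such-package-xyz"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Including this output in a bug report makes it easier to spot a mismatch between the versions a library expects and the versions present.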
    
    opened by sreev 6
  • Srh group8

    Replacing BaselineClassifier with DummyClassifier from sklearn. Replacing BaselineRegressor with DummyRegressor from sklearn.

    Contributed by students of SRH University, Heidelberg. https://github.com/tauseefhashmi https://github.com/frankcode101 https://github.com/tanmaygaikwad https://github.com/RajathReddy9 https://github.com/vickyvishal/

    opened by vickyvishal 5
  • replace BaselineClassifier/Regressor by DummyClassifier/Regressor

    Our lale.lib.lale package has a BaselineClassifier that simply predicts the majority class. Scikit-learn has a DummyClassifier that does the same, with a few additional useful configuration options. We should add the DummyClassifier to our lale.lib.sklearn package, and eliminate the BaselineClassifier, since it is redundant. Similarly, we should also replace lale.lib.lale.BaselineRegressor by scikit-learn's DummyRegressor.
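
To make the redundancy concrete, here is a minimal pure-Python sketch of a majority-class predictor, which is the behavior this issue attributes to BaselineClassifier. The class name `MajorityClassSketch` is hypothetical; this is neither Lale's nor scikit-learn's implementation.

```python
from collections import Counter


class MajorityClassSketch:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        # Counter.most_common(1) returns [(label, count)] for the top label.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # The features are ignored; every row gets the same label.
        return [self.majority_ for _ in X]


model = MajorityClassSketch().fit([[0], [1], [2], [3]], ["a", "b", "b", "b"])
predictions = model.predict([[9], [8]])
```

Because the features play no role, such a model is only useful as a floor against which real classifiers are compared.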

    good first issue 
    opened by hirzel 5
  • module resolution issue in pretty_print()

    Hi All,

    I have implemented a custom imputer based on Scikit-learn SimpleImputer as an example. My code lives in albert_imputer.py. Everything is fine until the final result is printed. This is what I see in the debugger:

    > /Users/albert/miniconda3/envs/lale/lib/python3.7/site-packages/lale/pretty_print.py(160)_get_module_name()
    -> op = find_op(mod_name_short, op_name)
    (Pdb) l
    155  	    mod_name_long = class_name[: class_name.rfind(".")]
    156  	    mod_name_short = mod_name_long[: mod_name_long.rfind(".")]
    157  	    unqualified = class_name[class_name.rfind(".") + 1 :]
    158  	    if class_name.startswith("lale.") and unqualified.endswith("Impl"):
    159  	        unqualified = unqualified[: -len("Impl")]
    160  ->	    op = find_op(mod_name_short, op_name)
    161  	    if op is not None:
    162  	        mod = mod_name_short
    163  	    else:
    164  	        op = find_op(mod_name_long, op_name)
    165  	        if op is not None:
    (Pdb) p mod_name_long, mod_name_short, unqualified,
    ('albert_imputer', 'albert_impute', 'AlbertImputerImpl')
    

    In "mod_name_short", the final "r" is missing. For this reason, importlib cannot load the module in find_op(). As a temporary workaround, I created a symbolic link "albert_impute.py" pointing to the "albert_imputer.py" file, and it works.
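
The dropped character is consistent with how `str.rfind` behaves: it returns -1 when the substring is absent, and slicing with `[:-1]` removes the last character. The sketch below reproduces the effect on the module name from the debugger output and shows one possible guard. The helper `strip_last_component` is hypothetical and is not necessarily the fix adopted in Lale.

```python
def strip_last_component(dotted_name: str) -> str:
    """Return dotted_name without its last dot-separated component.

    If there is no dot, return the name unchanged instead of slicing
    with -1 (the value str.rfind returns when nothing is found).
    """
    last_dot = dotted_name.rfind(".")
    if last_dot == -1:
        return dotted_name
    return dotted_name[:last_dot]


mod_name_long = "albert_imputer"

# Unguarded slicing: rfind returns -1, so the slice drops the final "r".
buggy_short = mod_name_long[: mod_name_long.rfind(".")]

# Guarded version: a name without dots is returned as-is.
fixed_short = strip_last_component(mod_name_long)

# Names that do contain dots still lose only their last component.
nested_short = strip_last_component("lale.lib.sklearn.pca")
```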

    opened by ghost 4
  • Prefix snapml estimators with 'Snap'

    Inside autoai_core, estimators are identified by their class name (rather than by a full path including the module). While there does not seem to be any limitation on the Lale side related to having multiple estimators with the same class name, it looks like significant changes would be required in AutoAI core in order to handle this, and these changes would propagate all the way up to the top-level interface.

    In order to have smooth integration without changing the KaggleBot interface, I propose to simply prefix the snapml estimator class names with Snap. I've tested these changes with the autoai_core development branch and everything seems to work nicely.

    opened by tdoublep 4
  • Adding random_state argument to fair_stratified_train_test_split

    Fixes #596

    In the existing implementation of fair_stratified_train_test_split, there is no argument for setting random_state; internally, its value is simply set to 42 when calling scikit-learn's train_test_split. It would be a good idea to add this argument to fair_stratified_train_test_split, similar to the corresponding scikit-learn routine.
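
As an illustration of what exposing the seed buys the caller, here is a small pure-Python sketch of a label-stratified split with a `random_state` parameter. The function `stratified_split_sketch` is hypothetical: it is not Lale's fair_stratified_train_test_split, it stratifies by label only, and it ignores protected attributes.

```python
import random
from collections import defaultdict


def stratified_split_sketch(X, y, test_size=0.2, random_state=None):
    """Hypothetical label-stratified split with a caller-controlled seed."""
    rng = random.Random(random_state)  # None gives a different shuffle per call
    by_label = defaultdict(list)
    for index, label in enumerate(y):
        by_label[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_size))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    def pick(data, idx):
        return [data[i] for i in idx]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)


X = list(range(20))
y = [0] * 10 + [1] * 10
split_a = stratified_split_sketch(X, y, test_size=0.2, random_state=7)
split_b = stratified_split_sketch(X, y, test_size=0.2, random_state=7)
```

Passing the same seed twice yields the same partition, while callers who want several different splits can vary the seed instead of being tied to one hard-coded value.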

    opened by vaisaxena 4
  • Hyperopt Algorithm used

    Not actually an issue, just a question.

    Hyperopt supports Random Search, Tree of Parzen Estimators (TPE), and Adaptive TPE.

    When using the Hyperopt optimizer in Lale, which search algorithm is used behind the scenes?

    Thank you in advance for your time and contribution.

    opened by tsiakmaki 4
  • redirect Lale autogen to lale.lib.sklearn where overlap

    There are 43 operators that exist in both lale.lib.autogen and lale.lib.sklearn. This overlap is problematic, because it can lead to unexpected behavior depending on the order of imports, and users might end up with a lower-quality version of an operator for which we also have a higher-quality version. We should simply remove lale.lib.autogen operators from the repository for which there is also a lale.lib.sklearn operator. To avoid breaking code that uses them, we can change the __init__.py file of lale.lib.autogen to forward to the relevant replacements.

    List of duplicate operators: ada_boost_classifier, ada_boost_regressor, decision_tree_classifier, decision_tree_regressor, extra_trees_classifier, extra_trees_regressor, function_transformer, gaussian_nb, gradient_boosting_classifier, gradient_boosting_regressor, isomap, k_means, k_neighbors_classifier, k_neighbors_regressor, linear_regression, linear_svc, linear_svr, logistic_regression, min_max_scaler, missing_indicator, mlp_classifier, multinomial_nb, nmf, normalizer, nystroem, one_hot_encoder, ordinal_encoder, passive_aggressive_classifier, pca, polynomial_features, quadratic_discriminant_analysis, quantile_transformer, random_forest_classifier, random_forest_regressor, ridge, ridge_classifier, robust_scaler, sgd_classifier, sgd_regressor, simple_imputer, standard_scaler, svc, svr
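
One way to forward names from one module to another is a module-level `__getattr__` (PEP 562), which Python calls only when normal attribute lookup on the module fails. The sketch below simulates this with two in-memory modules. The module names `curated_ops` and `legacy_ops`, the set `_FORWARDED`, and the toy classes are all hypothetical stand-ins; a plain re-import of each operator in `__init__.py` would be an equally valid approach.

```python
import types

# Stand-in for the package that holds the higher-quality operators.
curated = types.ModuleType("curated_ops")
curated.PCA = type("PCA", (), {"origin": "curated"})

# Stand-in for the package whose duplicate operators were removed.
legacy = types.ModuleType("legacy_ops")
legacy.Birch = type("Birch", (), {"origin": "legacy"})

_FORWARDED = {"PCA"}  # names that now live only in the curated package


def _forward(name):
    # Called only when `name` is not found in the legacy module itself,
    # so operators still defined there are unaffected.
    if name in _FORWARDED:
        return getattr(curated, name)
    raise AttributeError(f"module 'legacy_ops' has no attribute {name!r}")


legacy.__getattr__ = _forward

forwarded_pca = legacy.PCA    # resolved through _forward
native_birch = legacy.Birch   # resolved by normal lookup
```

With this arrangement, existing code that accesses the old location keeps working and always receives the single remaining implementation.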

    opened by hirzel 0
  • test suite for hyperparameter optimizers

    There is an existing ad-hoc set of tests for our optimizers.
    We should factor out a standard set of tests that are generic over the choice of optimizer and that can be re-used and run against each optimizer. This can then be used as a test suite for new optimizers, including ones that live in other repositories (such as the NSGA-II based optimizer recently added to lale-gpl).
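
A common way to structure such a suite is a mixin that holds the shared checks, plus one small `unittest.TestCase` subclass per optimizer. The sketch below shows the pattern with two toy optimizers. All class names and the `minimize(objective, candidates)` interface are hypothetical and do not reflect Lale's optimizer API.

```python
import unittest


class GridSketch:
    """Hypothetical optimizer: tries every candidate."""

    def minimize(self, objective, candidates):
        return min(candidates, key=objective)


class FirstHalfSketch:
    """Hypothetical optimizer: only looks at the first half of the candidates."""

    def minimize(self, objective, candidates):
        half = candidates[: max(1, len(candidates) // 2)]
        return min(half, key=objective)


class OptimizerContract:
    """Checks shared by all optimizers; subclasses set make_optimizer."""

    make_optimizer = None

    def test_returns_a_candidate(self):
        candidates = [3, -1, 4, 1, -5]
        best = self.make_optimizer().minimize(abs, candidates)
        self.assertIn(best, candidates)

    def test_single_candidate(self):
        self.assertEqual(self.make_optimizer().minimize(abs, [7]), 7)


class TestGridSketch(OptimizerContract, unittest.TestCase):
    make_optimizer = GridSketch


class TestFirstHalfSketch(OptimizerContract, unittest.TestCase):
    make_optimizer = FirstHalfSketch


def run_contract_tests():
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite(
        loader.loadTestsFromTestCase(case)
        for case in (TestGridSketch, TestFirstHalfSketch)
    )
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_contract_tests()
```

An optimizer that lives in another repository would only need to import the mixin and add its own two-line subclass to be covered by the same checks.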

    opened by shinnar 1
  • Handle Project producing zero columns

    It would be nice if the user could provide a pipeline with more preprocessing subpipelines than necessary. For example, if a pipeline contains a branch with one-hot encoding for string columns, but the data only has numeric columns, it would be convenient if it worked anyway. Unfortunately, some sklearn operators raise an exception when their input data has zero columns. This issue proposes preventing that exception during fit, and possibly even pruning them from the pipeline returned by fit.

    Example:

    import sklearn.datasets
    X, y = sklearn.datasets.load_digits(return_X_y=True)
    
    from lale.lib.lale import Project, ConcatFeatures
    from lale.lib.sklearn import LogisticRegression, OneHotEncoder
    
    proj_nums = Project(columns={"type": "number"})
    proj_cats = Project(columns={"type": "string"})
    one_hot = OneHotEncoder(handle_unknown="ignore")
    prep = (proj_nums & (proj_cats >> one_hot)) >> ConcatFeatures
    trainable = prep >> LogisticRegression()
    
    print(f"shapes: X {X.shape}, y {y.shape}, "
          f"nums {proj_nums.fit(X).transform(X).shape}, "
          f"cats {proj_cats.fit(X).transform(X).shape}")
    
    trained = trainable.fit(X, y)
    

    This prints:

    shapes: X (1797, 64), y (1797,), nums (1797, 64), cats (1797, 0)
    Traceback (most recent call last):
      File "~/tmp.py", line 17, in <module>
        trained = trainable.fit(X, y)
      File "~/git/user/lale/lale/operators.py", line 3981, in fit
        trained = trainable.fit(X=inputs)
      File "~/git/user/lale/lale/operators.py", line 2526, in fit
        trained_impl = trainable_impl.fit(X, y, **filtered_fit_params)
      File "~/git/user/lale/lale/lib/sklearn/one_hot_encoder.py", line 145, in fit
        self._wrapped_model.fit(X, y)
      File "~/python3.7venv/lib/python3.7/site-packages/sklearn/preprocessing/_encoders.py", line 385, in fit
        self._fit(X, handle_unknown=self.handle_unknown)
      File "~/python3.7venv/lib/python3.7/site-packages/sklearn/preprocessing/_encoders.py", line 74, in _fit
        X_list, n_samples, n_features = self._check_X(X)
      File "~/python3.7venv/lib/python3.7/site-packages/sklearn/preprocessing/_encoders.py", line 43, in _check_X
        X_temp = check_array(X, dtype=None)
      File "~/python3.7venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
        return f(**kwargs)
      File "~/python3.7venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 661, in check_array
        context))
    ValueError: Found array with 0 feature(s) (shape=(1797, 0)) while a minimum of 1 is required.
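
One possible shape for the proposed behavior is a wrapper that detects zero-column input during fit and bypasses the wrapped operator. The sketch below uses plain lists of rows instead of arrays so that it runs without dependencies. `SkipIfNoColumns` and `StrictUpper` are hypothetical names, and this is an illustration of the idea rather than the change made in Lale.

```python
class SkipIfNoColumns:
    """Hypothetical wrapper: bypass a transformer when its input has no columns."""

    def __init__(self, transformer):
        self.transformer = transformer
        self.skipped_ = False

    @staticmethod
    def _n_columns(rows):
        return len(rows[0]) if rows else 0

    def fit(self, rows, y=None):
        # Remember the decision so transform() stays consistent with fit().
        self.skipped_ = self._n_columns(rows) == 0
        if not self.skipped_:
            self.transformer.fit(rows, y)
        return self

    def transform(self, rows):
        if self.skipped_:
            return [[] for _ in rows]  # keep the row count, emit zero columns
        return self.transformer.transform(rows)


class StrictUpper:
    """Toy transformer that rejects zero-column input, like the error above."""

    def fit(self, rows, y=None):
        if not rows or not rows[0]:
            raise ValueError("Found array with 0 feature(s)")
        return self

    def transform(self, rows):
        return [[cell.upper() for cell in row] for row in rows]


empty_branch = SkipIfNoColumns(StrictUpper()).fit([[], [], []])
empty_out = empty_branch.transform([[], []])
text_out = SkipIfNoColumns(StrictUpper()).fit([["a"], ["b"]]).transform([["c"]])
```

The `skipped_` flag recorded during fit is also the information a pipeline would need in order to prune the bypassed branch from the trained result.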
    
    opened by hirzel 2
  • Update to newest Hyperopt

    Hyperopt 0.2.6 was released on November 15: https://pypi.org/project/hyperopt/0.2.6/

    Unfortunately, it breaks many Lale tests: https://github.com/IBM/lale/actions/runs/1467468837

    For example, the failures include some very basic tests such as:

    • test.test_core_transformers.TestFeaturePreprocessing.test_MinMaxScaler
    • test.test_core_transformers.TestFeaturePreprocessing.test_PCA
    • test.test_core_transformers.TestConcatFeatures.test_concat_with_hyperopt

    So for now, we limit the Hyperopt version to <=0.2.5: https://github.com/IBM/lale/pull/875/commits/24db05830ff79d0d1b474b5595a612bad9e62959

    We should try to update to the latest (in fact, if we are lucky, Hyperopt 0.2.7 fixes the problem).
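
The cap described in this issue amounts to a version comparison. The sketch below shows why such a comparison should be done on numeric components rather than on raw strings. The helper names are hypothetical, and the parser assumes purely numeric dotted versions; pre-release tags are not handled.

```python
def parse_release(version):
    """Turn a plain dotted release such as '0.2.5' into a tuple of ints.

    Assumption: purely numeric components; pre-release tags are not handled.
    """
    return tuple(int(part) for part in version.split("."))


HYPEROPT_CAP = "0.2.5"  # upper bound quoted in the issue


def within_cap(version, cap=HYPEROPT_CAP):
    return parse_release(version) <= parse_release(cap)


checks = {v: within_cap(v) for v in ("0.2.4", "0.2.5", "0.2.6", "0.2.10")}

# Comparing the raw strings would wrongly accept a hypothetical 0.2.10,
# because "1" sorts before "5" character by character.
naive_accepts_0_2_10 = "0.2.10" <= HYPEROPT_CAP
```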

    opened by hirzel 1
  • Add a test case to test_autoai_output_consumption.py to do fairness mitigation

    Add a test case to test_autoai_output_consumption.py covering the following scenario:

    1. Read an output AutoAI pipeline.
    2. Use DisparateImpactRemover on the preprocessing prefix and perform refinement with a choice of classifiers.
    3. Use Hyperopt to choose the best model with the pre-estimator mitigation of step 2.

    Here is some code for using the pipeline generated for the German credit dataset:

    fairness_info = {
        "protected_attributes": [
            {"feature": "Sex", "reference_group": ["male"], "monitored_group": ["female"]},
            {"feature": "Age", "reference_group": [[20, 40], [60, 90]], "monitored_group": [[41, 59]]},
        ],
        "favorable_labels": ["No Risk"],
        "unfavorable_labels": ["Risk"],
    }
    
    prefix = best_pipeline.remove_last().freeze_trainable()
    
    from sklearn.linear_model import LogisticRegression as LR
    from sklearn.ensemble import RandomForestClassifier as RF
    from lale.operator_wrapper import wrap_imported_operators
    from lale.lib.aif360 import DisparateImpactRemover
    wrap_imported_operators()
    
    di_remover = DisparateImpactRemover(**fairness_info, preparation=prefix, redact=True)
    planned_fairer = di_remover >> (LR | RF)
    
    from lale.lib.aif360 import accuracy_and_disparate_impact
    from lale.lib.aif360 import FairStratifiedKFold
    
    combined_scorer = accuracy_and_disparate_impact(**fairness_info)
    fair_cv = FairStratifiedKFold(**fairness_info, n_splits=3)
    
    from lale.lib.lale import Hyperopt
    
    import pandas as pd
    df = pd.read_csv("german_credit_data_biased_training.csv")
    y = df.iloc[:, -1]
    X = df.drop(columns=['Risk'])
    
    trained_fairer = planned_fairer.auto_configure(
        X, y, optimizer=Hyperopt, cv=fair_cv, verbose=True,
        max_evals=1, scoring=combined_scorer, best_score=1.0)
    
    opened by kiran-kate 0
Releases(v0.7.2)
  • v0.7.2(Oct 25, 2022)

  • v0.7.1(Oct 4, 2022)

  • v0.7.0(Oct 3, 2022)

    • Improves support for partial_fit
    • Improves the pretty printer
    • Improves support for typed users
    • Adds lale.lib.sklearn.perceptron (wrapping sklearn.linear_model.Perceptron)
    • RASL (experimental):
      • Removes support for Spark Dataframes that don't have an index
      • Moves HashingEncoder to category_encoders and improves documentation
  • v0.6.19(Sep 26, 2022)

  • v0.6.18(Sep 22, 2022)

  • v0.6.17(Sep 21, 2022)

  • v0.6.16(Sep 8, 2022)

  • v0.6.15(Sep 8, 2022)

    • Add support for scikit-learn 1.1
    • Add lower and upper bound constraints for scikit-learn to help suggest recommended versions
    • Add support for newer versions of XGBoost
  • v0.6.14(Aug 25, 2022)

  • v0.6.13(Aug 16, 2022)

  • v0.6.12(Aug 13, 2022)

  • v0.6.11(Jul 25, 2022)

  • v0.6.10(Jun 29, 2022)

  • v0.6.9(May 23, 2022)

  • v0.6.8(May 6, 2022)

    1. Batching can handle an iterable or data loader without knowing n_batches.
    2. XGBoost 1.6
    3. pretty_print lists a list of external modules in wrap_imported_operators.
  • v0.6.7(Apr 21, 2022)

    1. Batching changes to use task graphs
    2. Removed autoai_ts_libs operators
    3. BatchedTreeEnsemble estimators from SnapML
    4. New rasl operators such as BatchedBaggingClassifier and HashingEncoder
    5. Spilling in task graphs
  • v0.6.6(Mar 2, 2022)

  • v0.6.5(Feb 21, 2022)

  • v0.5.11(Feb 2, 2022)

  • v0.6.4(Jan 27, 2022)

  • v0.6.3(Jan 26, 2022)

  • v0.6.2(Jan 25, 2022)

  • v0.6.1(Jan 18, 2022)

    1. New RASL operators: MinMaxScaler, OrdinalEncoder and OneHotEncoder
    2. Fixes and changes for autoai-ts-libs
    3. Scikit-learn compatibility by creating a steps property on lale pipelines and a mechanism to forward attribute access.
  • v0.5.10(Jan 18, 2022)

    1. New RASL operators: MinMaxScaler, OrdinalEncoder and OneHotEncoder
    2. Fixes and changes for autoai-ts-libs
    3. Scikit-learn compatibility by creating a steps property on lale pipelines and a mechanism to forward attribute access.
  • v0.5.9(Dec 6, 2021)

  • v0.5.8(Dec 3, 2021)

    • schema changes for autoai_ts_libs.
    • partial_fit for a pipeline.
    • diff of pipelines.
    • some fixes and other changes.
    • fixes for autoai_ts_libs.
  • v0.6.0(Dec 2, 2021)

  • v0.5.7(Nov 17, 2021)

  • v0.5.6(Oct 12, 2021)

    1. RASL operator implementation such as Filter, Aggregate, GroupBy, OrderBy etc.
    2. Changes for ensembling experiments with lale.lib.aif360.
    3. Refactoring of lale.lib.aif360 and creation of a new setup target fairness.
    4. Customize schemas if the environment has sklearn 1.0.
    5. Update of schema constraints based on the "weakest precondition" work.
    6. Other changes and bug fixes.
  • v0.5.5(Jun 28, 2021)

Owner
International Business Machines