Orchest is a browser-based IDE for Data Science.

Overview



Orchest is a browser-based IDE for Data Science. It integrates your favorite Data Science tools out of the box, so you don’t have to. The application is easy to use and can run on your laptop as well as on a large-scale cloud cluster.


A preview of creating pipelines in Orchest. Watch the full video to learn more.

Features

For a complete list of Orchest's features, check out the overview in our docs!

  • Visually construct pipelines.
  • Run any subset of a pipeline directly or on a cron-like schedule.
  • Parametrize your data science pipelines to try out different modeling ideas.
  • Easily define your custom runtime environment that runs on any machine.
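
To make the parametrization idea concrete, here is a minimal sketch in plain Python. It is not Orchest's API: the function name `expand_parameters` and the parameter names are hypothetical, and it only illustrates how a set of candidate values can be expanded into one configuration per run.

```python
from itertools import product


def expand_parameters(grid):
    """Return one dict per combination of the candidate values in `grid`."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


# Hypothetical parameters for a modeling step.
runs = expand_parameters({"learning_rate": [0.01, 0.1], "max_depth": [3, 5]})
for run in runs:
    print(run)
```

Each resulting dictionary corresponds to one modeling idea to try out; two parameters with two candidate values each give four runs.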

Who should use Orchest?

  • Data Scientists who want to rapidly prototype.
  • Data Scientists who like to work in Notebooks.
  • Data Scientists who are looking to create pipelines through a visual interface instead of YAML.

Installation

NOTE: Orchest is in alpha.

For GPU support, language dependencies other than Python, and other installation methods, such as building from source, please refer to our installation docs.

Requirements

  • Docker

If you do not yet have Docker installed, please visit https://docs.docker.com/get-docker/.

NOTE: On Windows, Docker has to be configured to use WSL 2. Make sure to clone Orchest inside the Linux environment. For more info and installation steps for Docker with WSL 2 backend, please visit https://docs.docker.com/docker-for-windows/wsl/.

Linux, macOS and Windows

git clone https://github.com/orchest/orchest.git && cd orchest
./orchest install

# Verify the installation.
./orchest --help

# Start Orchest.
./orchest start

Now that you have installed Orchest, get started with our quickstart tutorial, check out pipelines made by your fellow users, or have a look at our knowledge base videos explaining and showing some of Orchest's core concepts.

License

The software in this repository is licensed as follows:

  • All content residing under the "orchest-sdk/" directory of this repository is licensed under the "Apache-2.0" license as defined in "orchest-sdk/LICENSE".
  • Content outside of the above-mentioned directory is available under the "AGPL-3.0" license.

We love your feedback

We would love to hear what you think and add features based on your ideas. Come chat with us on our Slack Channel or open an issue on GitHub.

Contributing

Contributions are more than welcome! Please see our contributor guides for more details.

Not sure where to start? Book a free, no-pressure pairing session with one of our core contributors.

Contributors

Comments
  • Support `containerd` as the container runtime

    Description

    This PR enables working with the containerd runtime by introducing an init container that pulls images. The controller detects the runtime and configures Orchest accordingly.

    To test this PR you need a cluster that uses the containerd runtime; microk8s is suggested. Once microk8s is installed, the following addons need to be enabled:

    microk8s enable hostpath-storage \
      && microk8s enable dns \
      && microk8s enable ingress
    

    To push images to the microk8s node, first rebuild all the images with a valid tag (for example v2022.06.4), then save them to a tar file with the following command:

    docker save $(docker images | awk '{if ($1 ~ /^orchest\//) new_var=sprintf("%s:%s", $1, $2); print new_var}' | grep v2022.06.4 | sort | uniq) -o orchest-images.tar
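
The selection that the awk/grep/sort/uniq part of this command performs can be sketched in Python as follows. This is only an illustration: the function name `select_images` and the sample rows are made up, and unlike `grep` (which matches the tag anywhere in the line) the sketch compares the tag column exactly.

```python
def select_images(rows, prefix="orchest/", tag="v2022.06.4"):
    """Pick unique `repository:tag` names from `docker images`-style rows."""
    selected = set()
    for row in rows:
        fields = row.split()
        if len(fields) < 2:
            continue  # skip empty or malformed rows
        repository, image_tag = fields[0], fields[1]
        if repository.startswith(prefix) and image_tag == tag:
            selected.add(f"{repository}:{image_tag}")
    return sorted(selected)


# Made-up sample in the column layout of `docker images`.
sample_rows = [
    "REPOSITORY                 TAG         IMAGE ID      CREATED      SIZE",
    "orchest/orchest-api        v2022.06.4  aaaaaaaaaaaa  2 hours ago  1GB",
    "orchest/orchest-api        v2022.06.4  aaaaaaaaaaaa  2 hours ago  1GB",
    "orchest/orchest-webserver  v2022.06.3  bbbbbbbbbbbb  3 days ago   1GB",
    "orchest/image-puller       v2022.06.4  cccccccccccc  2 hours ago  50MB",
    "postgres                   13.1        dddddddddddd  5 weeks ago  300MB",
]
images = select_images(sample_rows)
print(images)
```

The resulting names are what would be passed to `docker save`.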
    

    Then ship this tar file to the microk8s node via scp:

    scp ./orchest-images.tar {your_user}@${microk8s node ip}:~/

    Then, inside the microk8s node, import the images with the following command (note that ctr has to be installed):

    sudo ctr -n k8s.io -a /var/snap/microk8s/common/run/containerd.sock i import orchest-images.tar
    
    # Or use microk8s ctr
    microk8s ctr --namespace k8s.io --address /var/snap/microk8s/common/run/containerd.sock image import orchest-images.tar
    

    Then Orchest can be installed with the orchest-cli, using the following command:

    orchest install --socket-path=/var/snap/microk8s/common/run/containerd.sock --dev
    

    Note: the manifests must be generated via make manifestgen in the orchest-controller directory:

    TAGNAME=v2022.06.4 make -C ./services/orchest-controller manifestgen
    

    Checklist

    • [x] The documentation reflects the changes.
    • [x] The PR branch is set up to merge into dev instead of master.
    • [x] I haven't introduced breaking changes that would disrupt existing jobs, i.e. backwards compatibility is maintained.
    • [x] In case I changed the dependencies in any requirements.in I have run pip-compile to update the corresponding requirements.txt.
    • [x] In case I changed one of the services' models.py I have performed the appropriate database migrations (refer to the DB migration docs).
    • [x] In case I changed code in the orchest-sdk I followed its release checklist.
    • [x] In case I changed code in the orchest-cli I followed its release checklist.
    • [x] The newly added image-puller has to be pushed to DockerHub on release. So they need to be added to the correct .github/workflow/... file.
    • [x] add document about installing in microk8s
    • [x] merge both cli init containers into one
    • [x] ~~update thirdparties on update~~
    new feature request 
    opened by nhaghighat 32
  • Error attempting to connect to Gateway server url 'http://jupyter-EG-93c7d122-a3b1-435f-d8f6a2d0-6584-4c1c:8888'. Ensure gateway url is valid and the Gateway instance is running.

    Describe the bug
    When we open a pipeline in JupyterLab and run a cell, the cell fails to execute and the following error is thrown:

    Error attempting to connect to Gateway server url 'http://jupyter-EG-93c7d122-a3b1-435f-d8f6a2d0-6584-4c1c:8888'. Ensure gateway url is valid and the Gateway instance is running.

    To Reproduce

    Create new project -> new pipeline -> open in jupyter notebook

    Environment

    • elementary os (linux)
    bug 
    opened by Practcdi 30
  • [Feature] File Manager

    Description

    Add File Manager in Pipeline Editor as the default way of managing files in Orchest.

    Fixes: #612

    Checklist

    • [x] The PR branch is set up to merge into dev instead of master.
    new feature request 
    opened by iannbing 27
  • Sending notifications when jobs fail.

    Description

    This PR exposes the BE functionality of sending notifications to desired channels (e.g. Slack) when jobs fail.

    Closes: #120
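
As a rough illustration of what a failure notification to Slack could carry, the sketch below builds the JSON body for a Slack incoming webhook (which accepts an object with a `text` field). It is an assumption-laden example, not the implementation in this PR: the function name `failure_payload`, the message wording, and the job and pipeline names are all hypothetical, and nothing is actually sent.

```python
import json


def failure_payload(job_name, pipeline_name, failed_runs):
    """Build a Slack incoming-webhook body ({"text": ...}) for a failed job."""
    text = (
        f"Job '{job_name}' (pipeline '{pipeline_name}') failed: "
        f"{failed_runs} run(s) did not complete."
    )
    return json.dumps({"text": text})


# Hypothetical job and pipeline names.
body = failure_payload("nightly-training", "train.orchest", 2)
print(body)
```

Posting this body to a webhook URL is left out on purpose, so the sketch runs without network access.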

    Todo

    • [x] Items from https://github.com/orchest/orchest/pull/1008#issuecomment-1143731359

    Checklist

    • [x] The documentation reflects the changes.
    • [x] The PR branch is set up to merge into dev instead of master.
    • [x] In case I changed one of the services’ models.py I have performed the appropriate database migrations (refer to scripts/migration_manager.sh).
    • [x] In case I changed code in the orchest-sdk I followed its release checklist
    • [x] In case I changed code in the orchest-cli I followed its release checklist
    • [x] I haven't introduced breaking changes that would disrupt existing jobs, i.e. backwards compatibility is maintained.
    • [x] In case I changed the dependencies in any requirements.in I have run pip-compile to update the corresponding requirements.txt.
    new feature request 
    opened by iannbing 25
  • Improve robustness of Orchest Operator and update mechanism

    Description

    This PR fixes the order of deployment by introducing a new internal CRD named OrchestComponent. The OrchestCluster is created by the user or the orchest-cli; the OrchestCluster controller then creates an OrchestComponent for each service, and a dedicated component controller for each component controls the status of the objects underlying that OrchestComponent.

    Fixes: #952, #991
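
The ordering idea can be sketched in a few lines of Python. This is not the controller's code: the dependency map below is hypothetical (only the webserver waiting for the orchest-api is hinted at in the checklist), and the helper names `deployment_order` and `deploy` are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical map: component -> components that must be ready first.
DEPENDS_ON = {
    "orchest-database": [],
    "orchest-api": ["orchest-database"],
    "orchest-webserver": ["orchest-api"],
}


def deployment_order(depends_on):
    """Return components so that each one comes after everything it waits for."""
    return list(TopologicalSorter(depends_on).static_order())


def deploy(depends_on, is_ready):
    """Deploy in order and stop at the first component that does not become ready."""
    deployed = []
    for component in deployment_order(depends_on):
        deployed.append(component)
        if not is_ready(component):
            break
    return deployed


order = deployment_order(DEPENDS_ON)
stalled = deploy(DEPENDS_ON, is_ready=lambda name: name != "orchest-api")
print(order, stalled)
```

In the second call the webserver is never started because the component it waits for did not report ready, which is the kind of sequencing the description refers to.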

    Testing the PR

    minikube addons enable ingress
    eval $(minikube docker-env)
    scripts/build_container.sh -M -t "v2022.05.3" -o "v2022.05.3"
    pip install -e orchest-cli
    
    # yes, this is ALL that is needed now to install Orchest
    orchest install --dev
    

    Testing orchest update through the CLI

    orchest uninstall
    scripts/build_container.sh -M -t "v2022.04.4" -o "v2022.04.4"
    orchest install --dev
    scripts/build_container.sh -M -t "v2022.04.5" -o "v2022.04.5"
    orchest update --dev --version=v2022.04.5
    

    Testing orchest update through the UI

    NOTE: You need to have created the minikube with the orchest-dev-repo mount

    orchest uninstall
    scripts/build_container.sh -M -t "v2022.04.4" -o "v2022.04.4"
    orchest install --dev
    orchest patch --dev
    pnpm run dev
    scripts/build_container.sh -M -t "v2022.04.5" -o "v2022.04.5"
    INVOKE THROUGH UI (go to http://localorchest.io/update)
    scripts/build_container.sh -M -t "v2022.04.6" -o "v2022.04.6"
    INVOKE THROUGH UI (go to http://localorchest.io/update)
    ... (as often as you like)
    

    Checklist

    • [x] The documentation reflects the changes.
    • [x] The PR branch is set up to merge into dev instead of master.
    • [x] In case I changed code in the orchest-cli I followed its release checklist
    • [x] I haven't introduced breaking changes that would disrupt existing jobs, i.e. backwards compatibility is maintained.
    • [x] In case I changed the dependencies in any requirements.in I have run pip-compile to update the corresponding requirements.txt
    • [x] Start reports that Orchest is successfully started, but some deployments still have to start.
    • [x] webserver hangs then restarts after a while on start if it's started concurrently w.r.t. the orchest-api.
    • [x] Celery fails to boot on fresh install
    • [x] reliably report the status from orchest start about availability
    • [x] check Ingress/deployment and service status of each component.
    • [x] Updating the default pvc sizes: 50 GiB, 25 GiB, 25 GiB (userdir, registry, builder cache)
    • [x] ~~make possible through CLI to specify it as well as singleNode installation.~~ --> To be done later so we can get this PR merged.
    • [x] Enable to update all controller manifest in update.
      • [x] Update CRD changes through update.
      • [x] Add all manifests into one big file
      • [x] orchest-cli use the manifest from release assets
      • [x] GitHub Action to add the yaml files as assets to the release
      • [x] Update GitHub Action of updating controller image on manifests (removed)
    • [x] Add annotation in the namespace manifest to avoid kubectl warning (Not needed)
    • [x] Testing behavior
      • [x] orchest update through CLI
      • [x] orchest update through UI
        • [x] The UpdateView.tsx needs to parse the response from the controller endpoint correctly and show it as logs
      • [x] orchest restart through CLI and UI
      • [x] orchest patch
      • [x] orchest install
    • [x] Update documentation
      • [x] Update installation docs
      • [x] mention in the docs that an ingress controller needs to be present to move to the Running state
      • [x] ~~Note that the docker-registry is managed through Helm and not the orchest-controller.~~
      • [x] Document about how to run the operator outside of the cluster for easy debugging
      • [x] Improve docstrings/comments around key functionality of the controller
      • [x] Controller readme.
      • [x] Remove unused Helm charts, e.g. orchest-api ones, from services/orchest-controller/deploy
      • [X] ~~Mention in docs that pvc size can only be increased & that the default storage class is used if not specified.~~ --> Will be added to the CLI in another PR
      • [x] Update internal document about the release process given the changes that took place
    • [x] renaming pause/unpause to start/stop
    • [x] Migrate the CLI to install in one namespace only. Needs to parse release asset to set the namespace.
      • [x] Check whether #866 is resolved. --> Not yet, the Helm deployer in the orchest-controller doesn't seem to be picking up custom namespaces
      • [x] Depending on the choice, the CLI orchest uninstall might need to change slightly. If we choose to go for it, then there should be a default flag.
    improvement breaking change 
    opened by nhaghighat 25
  • Improv/material UI

    Description

    Replace MDC custom UI components with Material-UI components.

    Resolves: #413, #557, #554, #540, #598, #380, #613, #311

    Checklist

    • [x] I have manually tested the application to make sure the changes don’t cause any downstream issues, which includes making sure ./orchest status --ext is not reporting failures when Orchest is running.
    opened by iannbing 24
  • 404 error after installation on local mode

    Hello.

    I have tried to install Orchest in a virtualized Linux Ubuntu (20.04) environment. I followed the instructions at https://docs.orchest.io/en/stable/getting_started/installation.html, with the difference that I have used docker as the driver for minikube:

    minikube start --cpus=4 --driver=docker

    The installation seems to be OK, and no error message has been displayed.

    But when accessing localorchest.io, a 404 Not Found message is returned from nginx. The /etc/hosts file has been updated and the address is resolved, but no page is found (or the request is not properly redirected).

    Going into the minikube dashboard, the services seem to be running.

    JuanLuis has suggested using minikube addons enable ingress to try to solve the issue, but it doesn't seem to work. I have noticed that the entries in the Ingresses section of the minikube dashboard are not resolved.

    Best regards, Alvaro

    bug 
    opened by AlvaroGarciaTEK 23
  • New UI design part 2: Project List

    Description

    The new Project List view.

    Some explanations on not implementing the details in the design:

    • The drop-files-to-screen-to-create-project functionality is removed due to unclear value and usability concerns (now that the Examples tab resides in the same view, it is confusing; for example, a user might want to drag and drop files to submit an example).
    • The pagination of the Project List in the design didn't consider specifying the number of projects per page; we need further discussion on this. At the moment the original TablePagination is kept.
    • I think the "Sorting" in the Example List needs more discussion: 1) it's a new feature; 2) its location in the design is far away from the list, which might raise usability concerns (normally the filter/sorting option should be directly on top of the list). So, I decided to postpone the implementation.
    opened by iannbing 22
  • Save "dirty" open files in JupyterLab when navigating away from the JupyterLab page

    Describe the solution you'd like
    Save "dirty" open files in JupyterLab when navigating away from the JupyterLab page

    What does your solution aim to solve?
    It's not always clear to the user that saving is necessary to propagate changes of behavior. In addition, unsaved changes cause a browser navigation prompt ("Are you sure you want to leave...") without JupyterLab showing. Finally, JupyterLab already autosaves by default, so it's not a big change in behavior.

    Note: saving files in the JupyterLab UI is a bit glitchy with the recent addition of real-time collaboration in JupyterLab. So we probably want to track JupyterLab upstream closely to make sure we get rid of this glitchiness as soon as possible. The glitchiness is there as of JupyterLab 3.1.12.

    improvement 
    opened by ricklamers 18
  • Remove unnecessary side effects in Step details

    Description

    StepDetailsProperties contains some side effects that save pipeline steps multiple times, which could result in data loss (new data being overwritten by old data). This PR removes these side effects, and also fixes a bug in FilePicker.

    Checklist

    • [x] The PR branch is set up to merge into dev instead of master.
    opened by iannbing 17
  • Merge the functionality of the `Pipelines` view into PipelineEditor

    Description

    To simplify the workflow, this PR merges the functionality of the Pipelines view into PipelineEditor, so that users can manage pipelines directly without going back and forth between Pipelines and PipelineEditor.

    Major changes:

    • The old "Pipelines" view is removed. Therefore, the navigation item "Pipelines" in the main navigation bar will now lead to PipelineEditor directly.
    • When landing on PipelineEditor, it loads the first pipeline in the project. PipelineEditor also tries to automatically load the first of the remaining pipelines in the project; for example, when the user deletes the currently open pipeline, the next one is loaded automatically.
    • Added a "Sessions" panel in PipelineEditor for managing Orchest sessions; the panel is vertically resizable. Added "Create Pipeline" in FileManager for opening the "Create a new pipeline" dialog.
    • Added the pipeline file path under the pipeline name in the HeaderBar to make it more explicit.

    Checklist

    • [x] The PR branch is set up to merge into dev instead of master.
    improvement 
    opened by iannbing 17
  • chore(deps): bump gitpython from 3.1.27 to 3.1.30 in /services/jupyter-server

    Bumps gitpython from 3.1.27 to 3.1.30.

    Release notes

    Sourced from gitpython's releases.

    v3.1.30 - with important security fixes

    See gitpython-developers/GitPython#1515 for details.

    dependencies python 
    opened by dependabot[bot] 1
  • Make environment builds more robust w.r.t. `.orchest` being git ignored

    Describe the problem this improvement solves
    Orchest expects the .orchest/environments directory to be versioned (or, at least, to not be .gitignore'd). This is because an environment build takes a snapshot of the project, and that snapshot excludes files and directories according to the .gitignore file. However, some users would like to not version any content of the .orchest directory.

    Describe the solution you'd like
    The build should work regardless of whether the .orchest directory is in the snapshot. Making the build read the environment properties and setup script prior to the snapshot should be feasible. Note: the PR fixing this should include some changes to docs/source/fundamentals/environments.md to remove the notion that .orchest/environments shouldn't be git ignored (i.e. revert https://github.com/orchest/orchest/commit/02c2fa4caaf2cccd770721265b812caa437e86c0).
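
The proposed ordering (read the build inputs first, snapshot afterwards) might look roughly like the sketch below. Everything here is an assumption for illustration: the helper names, the environment name `example-env`, the file name `setup_script.sh`, and the simplification of treating `.orchest` as the only ignored entry instead of parsing a real .gitignore.

```python
import shutil
import tempfile
from pathlib import Path


def read_environment_files(project_dir, env_name):
    """Read an environment's files straight from the project directory."""
    env_dir = Path(project_dir) / ".orchest" / "environments" / env_name
    return {p.name: p.read_text() for p in sorted(env_dir.iterdir()) if p.is_file()}


def take_snapshot(project_dir, snapshot_dir, ignored=(".orchest",)):
    """Copy the project, skipping names treated as git ignored (simplified)."""
    shutil.copytree(project_dir, snapshot_dir, ignore=shutil.ignore_patterns(*ignored))


def prepare_build(project_dir, snapshot_dir, env_name):
    # Read the build inputs first, so the snapshot may safely omit .orchest.
    env_files = read_environment_files(project_dir, env_name)
    take_snapshot(project_dir, snapshot_dir)
    return env_files


# Demo on a throwaway project; all file names and contents are made up.
workspace = Path(tempfile.mkdtemp())
project = workspace / "project"
env_dir = project / ".orchest" / "environments" / "example-env"
env_dir.mkdir(parents=True)
(env_dir / "setup_script.sh").write_text("pip install pandas\n")
(project / "train.py").write_text("print('training')\n")

env_files = prepare_build(project, workspace / "snapshot", "example-env")
snapshot_names = sorted(p.name for p in (workspace / "snapshot").iterdir())
print(env_files, snapshot_names)
```

The setup script is still available to the build even though the snapshot no longer contains the .orchest directory.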

    good first issue improvement 
    opened by fruttasecca 0
  • WIP: New home view

    Description

    This removes the /projects view in favor of a new, more flexible "home" view, where things other than Projects can be displayed, such as all Job and Interactive Runs.

    Features

    • [x] Remove the /projects page and fix any inbound links
    • [x] Redirect /projects to /?tab=projects
    • [x] Move projects to the new home page and align components with new design
    • [ ] Update the Project selector menu with the new design
    • [x] Implement interactive job runs under "all runs"
    • [ ] Implement all job runs under "all runs"
      • [ ] Make the back-end support querying of multiple (or all) projects
      • [ ] Implement filtering, pagination, etc... in the new front-end based on the new API.

    Checklist

    • [ ] I have manually tested my changes and I am happy with the result.
    • [ ] The documentation reflects the changes.
    • [ ] The PR branch is set up to merge into dev instead of master.
    improvement 
    opened by mausworks 0
  • Environment view code editor horizontal bar overlap environment name

    Describe the bug
    In the environment view, if the code editor has the horizontal scrolling bar, the bar overlaps the element containing the environment name when scrolling down the page.

    To Reproduce
    Steps to reproduce the behavior:

    • in the environment setup script editor, write a line that's long enough to trigger the presence of the horizontal scroll bar
    • scroll down the page

    bug 
    opened by fruttasecca 0
  • build(deps): bump certifi from 2021.10.8 to 2022.12.7 in /services/session-sidecar

    Bumps certifi from 2021.10.8 to 2022.12.7.

    dependencies python 
    opened by dependabot[bot] 1
  • Tab switch URL change is debounced causing jumpy behavior

    Describe the bug
    The URL change on tab switch is debounced, causing jumpy behavior.

    When you are on the Projects page, for example, and switch between "My projects" and "Examples", the URL changes only after a delay, so navigating quickly between the tabs causes jumpy behavior.

    Expected behavior
    When you click on the tabs quickly, the view should "follow your click".

    To Reproduce
    Steps to reproduce the behavior:

    1. Go to the Projects page
    2. Click on the Examples tab
    3. Quickly click on the My projects tab
    4. See that it goes to "My projects" and jumps back to "Examples" shortly after (after the URL change)

    Environment

    • OS (e.g. macOS): Linux
    • Browser (e.g. Chrome): Brave
    • Orchest's version (in the settings page): v2022.11.2
    bug 
    opened by ricklamers 0
Releases(v2023.01.2)
Owner
Orchest
A new kind of IDE for Data Science.
Orchest