BigDL - Evaluate the performance of BigDL (Distributed Deep Learning on Apache Spark) in big data analysis problems

Last update: Jan 06, 2022

Overview

Evaluate the performance of BigDL (Distributed Deep Learning on Apache Spark) in big data analysis problems.

Introduction

BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can write their deep learning applications as standard Spark programs, which can directly run on top of existing Spark or Hadoop clusters.

Installation

Download the BigDL packages, or install BigDL with pip (for example, inside a conda environment).

How to run a program on Spark

Usage: spark-submit-with-bigdl.sh [options] file.py

Options:

--master MASTER_URL: the Spark master to connect to (spark, yarn, k8s, or local).
local[k]: run Spark locally with k worker threads, ideally one per logical core on your machine.
file.py: the Python file containing the program to run.
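The usage line can be assembled programmatically, which helps when the same program is launched with several different worker counts. The following is a minimal sketch, assuming the wrapper script forwards the standard spark-submit `--master` option; the helper name `build_submit_command` and the default of 4 local threads are hypothetical and not part of BigDL.

```python
import shlex


def build_submit_command(script, master="local[4]",
                         launcher="spark-submit-with-bigdl.sh"):
    """Return the argument list that launches `script` through the BigDL wrapper.

    `launcher` is the wrapper script named in the usage line; `master` is any
    value accepted by spark-submit's --master option (assumed to be forwarded).
    """
    return [launcher, "--master", master, script]


# Hypothetical example: run file.py locally with 4 worker threads.
cmd = build_submit_command("file.py", master="local[4]")

# shlex.join quotes arguments so the line can be pasted into a shell safely.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` on a machine where the wrapper script is on the PATH.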

System configuration

The program was run on a system with the following configuration:

System/Host Processor: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
CPU(s): 48
Core(s) per socket: 12
Socket(s): 2
Memory: 183 G (free)

Data Description and Run Model

MNIST is a dataset of 70,000 small 28×28 pixel grayscale images of handwritten single digits between 0 and 9. It is split into two parts: 60,000 training images and 10,000 test images.

For this evaluation, we use an LSTM model for the MNIST digit classification problem.
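An LSTM consumes sequences, so each image has to be presented as a series of time steps. How the images are fed to the model is not stated here; a common convention, assumed in the sketch below, is to treat each of the 28 pixel rows as one time step with 28 features. The function name `image_to_sequence` is hypothetical.

```python
ROWS, COLS = 28, 28  # MNIST image height and width in pixels


def image_to_sequence(pixels):
    """Split a flat list of 784 pixel values into 28 time steps of 28 features.

    Assumption: rows are used as time steps; this is a common convention,
    not necessarily the layout used in this evaluation.
    """
    if len(pixels) != ROWS * COLS:
        raise ValueError(f"expected {ROWS * COLS} pixels, got {len(pixels)}")
    return [pixels[r * COLS:(r + 1) * COLS] for r in range(ROWS)]


# Placeholder image: pixel i simply holds the value i.
flat_image = list(range(ROWS * COLS))
sequence = image_to_sequence(flat_image)
print(len(sequence), len(sequence[0]))
```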

BigDL Performance Evaluation

Execution running time

Computation Evaluation (SPEED UP)
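Speedup is conventionally defined as the running time of a baseline configuration divided by the running time of a configuration with more workers. The following is a minimal sketch of that calculation under this conventional definition; the helper names `timed`, `speedup`, and `efficiency` are hypothetical and not taken from BigDL.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def speedup(baseline_seconds, parallel_seconds):
    """Baseline running time divided by the parallel running time."""
    if parallel_seconds <= 0:
        raise ValueError("parallel_seconds must be positive")
    return baseline_seconds / parallel_seconds


def efficiency(baseline_seconds, parallel_seconds, workers):
    """Speedup per worker; 1.0 would mean perfect linear scaling."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup(baseline_seconds, parallel_seconds) / workers
```

With running times measured for `local[1]` and `local[k]`, `speedup(t_1, t_k)` gives the scaling factor for k worker threads.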

Owner

Vo Cong Thanh
