BigDL - Evaluate the performance of BigDL (Distributed Deep Learning on Apache Spark) in big data analysis problems

Last update: Jan 06, 2022


Overview

Evaluate the performance of BigDL (Distributed Deep Learning on Apache Spark) in big data analysis problems.

Introduction

BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can write their deep learning applications as standard Spark programs, which can directly run on top of existing Spark or Hadoop clusters.

Installation

Download the BigDL packages, or install BigDL with pip (for example, inside a conda environment).
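After installing, it can help to confirm that the Python interpreter Spark will use can actually see BigDL. The sketch below uses only the standard library. It assumes the package exposes a top-level module named `bigdl`; the helper name `is_installed` is hypothetical and not part of BigDL.

```python
import importlib.util


def is_installed(module_name: str = "bigdl") -> bool:
    """Return True if `module_name` can be located by the current interpreter.

    Assumption: the pip/conda package installs a top-level module called "bigdl".
    """
    # find_spec returns None when a top-level module cannot be found
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed():
        print("BigDL is importable")
    else:
        print("BigDL not found in this environment")
```

Run it with the same Python that `spark-submit-with-bigdl.sh` will launch; a conda environment that differs from Spark's Python is a common reason an import succeeds in one shell and fails on the cluster.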

How to run a program on Spark

Usage: spark-submit-with-bigdl.sh [options] file.py

Options:

--master MASTER_URL: the cluster manager to connect to: spark, yarn, k8s, or local.
local[k]: run Spark locally with k worker threads (for example, one per logical core on your machine).
file.py: the Python file containing the program to execute.
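The usage line above can be assembled programmatically, which is handy when sweeping over several `local[k]` settings. This is a minimal sketch: the helpers `local_master` and `submit_command` and the script name `lstm_mnist.py` are hypothetical, and the only option it sets is `--master`; any extra options are passed through unchanged.

```python
import os
import shlex
from typing import List, Optional


def local_master(threads: Optional[int] = None) -> str:
    """Build a local[k] master URL; k defaults to the number of logical cores."""
    k = threads if threads is not None else (os.cpu_count() or 1)
    if k < 1:
        raise ValueError("thread count must be >= 1")
    return f"local[{k}]"


def submit_command(script: str, master: str,
                   extra_options: Optional[List[str]] = None) -> List[str]:
    """Assemble argv in the form: spark-submit-with-bigdl.sh [options] file.py."""
    cmd = ["spark-submit-with-bigdl.sh", "--master", master]
    cmd.extend(extra_options or [])
    cmd.append(script)  # the Python program goes last, after all options
    return cmd


if __name__ == "__main__":
    # shlex.join quotes "local[4]" so the brackets survive a shell
    print(shlex.join(submit_command("lstm_mnist.py", local_master(4))))
```

Returning a list rather than a single string lets the result be passed straight to `subprocess.run` without shell quoting issues.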

System configuration

The program was run on a system with the following configuration:

System/Host Processor: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
CPU(s): 48
Core(s) per socket: 12
Socket(s): 2
Memory: 183 GB (free)
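These figures are consistent with each other: 2 sockets × 12 cores per socket gives 24 physical cores, and 48 logical CPUs over 24 cores means 2 hardware threads per core (Hyper-Threading). The small check below makes that arithmetic explicit; the function name is hypothetical, and the result is the upper bound for k in `local[k]` if one thread per logical core is wanted.

```python
def threads_per_core(logical_cpus: int, sockets: int, cores_per_socket: int) -> int:
    """Infer hardware threads per physical core from a CPU topology summary."""
    physical_cores = sockets * cores_per_socket
    if logical_cpus % physical_cores != 0:
        raise ValueError("logical CPU count is not a multiple of physical cores")
    return logical_cpus // physical_cores


if __name__ == "__main__":
    # Values from the system configuration listed above
    tpc = threads_per_core(logical_cpus=48, sockets=2, cores_per_socket=12)
    print(f"{2 * 12} physical cores, {tpc} threads per core, local[k] up to k={48}")
```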

Data Description and Run Model

MNIST is a dataset of 70,000 small square 28×28-pixel grayscale images of handwritten single digits between 0 and 9. It is split into two parts: 60,000 training images and 10,000 test images.
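An LSTM consumes sequences, so a common way to feed it MNIST is to read each image row by row: 28 time steps, each a vector of 28 pixel intensities. The source does not say how the images were prepared, so this is a sketch under that assumption; the helper names are hypothetical and the 0 to 255 pixel range is the usual raw MNIST encoding.

```python
from typing import List, Sequence

IMAGE_SIDE = 28
TRAIN_SIZE, TEST_SIZE = 60_000, 10_000


def normalize(pixels: Sequence[float], max_value: float = 255.0) -> List[float]:
    """Scale raw 0..255 intensities into the range 0..1."""
    return [p / max_value for p in pixels]


def image_to_sequence(pixels: Sequence[float], side: int = IMAGE_SIDE) -> List[List[float]]:
    """Turn a flat, row-major image into `side` time steps of `side` features each."""
    if len(pixels) != side * side:
        raise ValueError(f"expected {side * side} pixels, got {len(pixels)}")
    # Row r of the image becomes time step r of the sequence
    return [list(pixels[r * side:(r + 1) * side]) for r in range(side)]


if __name__ == "__main__":
    flat = normalize([0.0] * (IMAGE_SIDE * IMAGE_SIDE))
    seq = image_to_sequence(flat)
    print(len(seq), "time steps x", len(seq[0]), "features;",
          TRAIN_SIZE + TEST_SIZE, "images in total")
```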

For this BigDL problem, we use an LSTM model to classify the MNIST digits.
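The source does not describe the network's layers. To show what an LSTM classifier computes, here is a plain-Python sketch of the standard LSTM recurrence: input, forget, candidate, and output gates update a cell state at each time step, and the final hidden state is mapped to 10 class probabilities with a linear layer and softmax. This is not BigDL code; the hidden size, the uniform initialization, and all function names are assumptions for illustration.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]  # gate -> (W, U, b)


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z: Vector) -> Vector:
    top = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


def init_params(input_size: int, hidden_size: int, seed: int = 0) -> Params:
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    # i: input gate, f: forget gate, g: candidate cell, o: output gate
    return {gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size),
                   [0.0] * hidden_size)
            for gate in ("i", "f", "g", "o")}


def lstm_step(x: Vector, h: Vector, c: Vector, params: Params) -> Tuple[Vector, Vector]:
    def pre(gate: str) -> Vector:
        W, U, b = params[gate]
        return [a + r + bb for a, r, bb in zip(matvec(W, x), matvec(U, h), b)]

    i = [sigmoid(z) for z in pre("i")]
    f = [sigmoid(z) for z in pre("f")]
    g = [math.tanh(z) for z in pre("g")]
    o = [sigmoid(z) for z in pre("o")]
    c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
    return h_new, c_new


def run_lstm(sequence: List[Vector], params: Params, hidden_size: int) -> Vector:
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x in sequence:  # one image row per time step
        h, c = lstm_step(x, h, c, params)
    return h  # last hidden state summarizes the whole image


def init_readout(hidden_size: int, n_classes: int = 10, seed: int = 1) -> Tuple[Matrix, Vector]:
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)] for _ in range(n_classes)]
    return W, [0.0] * n_classes


def classify(h: Vector, readout: Tuple[Matrix, Vector]) -> Vector:
    W, b = readout
    return softmax([z + bb for z, bb in zip(matvec(W, h), b)])
```

Classifying from only the last hidden state is the usual choice for whole-sequence labels like a digit class; a distributed library runs the same arithmetic in batched tensor form across Spark workers.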

BigDL Performance Evaluation

Execution running time

Computation Evaluation (SPEED UP)
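Speedup is conventionally defined as S_k = T_base / T_k, the baseline running time divided by the running time with k workers, and parallel efficiency as S_k divided by the relative increase in workers. A small sketch of that computation follows; the function name is hypothetical and it takes measured timings as input rather than supplying any, since the measured results are not reproduced here.

```python
from typing import Dict, Tuple


def speedup_table(times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map worker count -> (speedup, efficiency) relative to the smallest worker count.

    `times` maps a worker/thread count (e.g. k in local[k]) to its running time.
    """
    base_workers = min(times)
    base_time = times[base_workers]
    result = {}
    for workers, t in sorted(times.items()):
        if t <= 0:
            raise ValueError("running times must be positive")
        speedup = base_time / t
        # Ideal (linear) scaling gives efficiency 1.0
        efficiency = speedup / (workers / base_workers)
        result[workers] = (speedup, efficiency)
    return result
```

Efficiency below 1.0 as k grows is expected on a shared-memory machine, since threads beyond the physical core count share execution units.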

Owner

Vo Cong Thanh

LynxKite: a complete graph data science platform for very large graphs and other datasets.

Business Intelligence (BI) in Python, OLAP

💬 Python scripts to parse Messenger, Hangouts, WhatsApp and Telegram chat logs into DataFrames.

Generates a simple report about the current Covid-19 cases and deaths in Malaysia

ETL (extraction, transformation, loading) process carried out by the team in the final project of the Soul Code Academy course.

Using Python to scrape some basic player information from www.premierleague.com and then use Pandas to analyse said data.

Semi-Automated Data Processing

Elementary is an open-source data reliability framework for modern data teams. The first module of the framework is data lineage.

Improving your data science workflows with

Demonstrate a Dataflow pipeline that saves data from an API into BigQuery table

Stitch together Nanopore tiled amplicon data without polishing a reference

Fit models to your data in Python with Sherpa.

Data collection, enhancement, and metrics calculation.

Sentiment analysis on streaming twitter data using Spark Structured Streaming & Python

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

The OHDSI OMOP Common Data Model allows for the systematic analysis of healthcare observational databases.

University Challenge 2021 With Python

Methylation/modified base calling separated from basecalling.

Fitting thermodynamic models with pycalphad

Airflow ETL With EKS EFS Sagemaker