Overview

S22-W4111-HW-1-0:
W4111 - Intro to Databases HW0 and HW1

Introduction

This project is the implementation template for HW 0 and HW 1 for both the programming and non-programming tracks.

HW 0 - All Students

You have completed the first step, which is cloning the project template.

Note: You are Columbia students. You should be able to install SW and follow instructions.

MySQL:

  • Download the installation files for MySQL Community Server.

    • Make sure you download for the correct operating system.
    • If you are on a Mac, make sure you choose the correct architecture. ARM is for Apple silicon. x86 is for Intel-based Macs.
    • On Windows, you can download and use the MSI.
  • Follow the installation instructions for MySQL. There are official instructions and many online tutorials.

  • Remember the root user ID and password that you set during installation. Also, choose "Legacy Authentication" when prompted.

    • If you forget your root user or password, you are on your own. The TAs and I will not fix any problems due to forgetting the information.
    • Also, if you say something like, "It did not prompt me for a user ID and password when I installed ...," we will laugh. We will say something like, "Sure. 20 million MySQL installations asked for the information, but it decided not to ask you."
    • If you tell us that you are sure you are entering the correct user ID and password, we will laugh. We will say something like, "Which is more likely: that a DATABASE forgot something, or that you did?"
  • You only need to install the server. All other SW packages are optional. A quick connectivity check from Python is sketched after this list.
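If you want to confirm that the server is running and that your root credentials work, here is a minimal sanity check from Python. This is a sketch, not part of the assignment: it assumes you have installed the pymysql package (pip install pymysql), and the password is a placeholder for the one you set.

```python
# Minimal MySQL connectivity check (a sketch, not part of the assignment).
# Assumes: pip install pymysql. Replace the password placeholder with yours.
import pymysql

connection = pymysql.connect(
    host="localhost",                 # a default install listens locally
    port=3306,                        # default MySQL port
    user="root",
    password="<your-root-password>",  # placeholder
)

try:
    with connection.cursor() as cursor:
        cursor.execute("SELECT VERSION()")
        print("Connected. MySQL server version:", cursor.fetchone()[0])
finally:
    connection.close()
```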

Anaconda:

  • I strongly recommend uninstalling any existing version of Anaconda. If you keep an older version and hit issues due to conflicting Anaconda installations during the semester, you are on your own.

  • Download the most recent version of Anaconda.

  • Follow the installation instructions. Choose "Install for me" when prompted. If you hit a problem and I find your Anaconda installation in the wrong directory, you are on your own. If you say something like, "But it did not give me that option," you can guess what will happen. A quick check that you are running the Anaconda interpreter is sketched below.
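If you want to be sure you are running the Anaconda interpreter and not some other Python on your machine, a quick check is below. Run it from a Python prompt or notebook started through Anaconda; the exact paths vary by machine and OS.

```python
# Quick check of which Python interpreter is active (a sketch; paths vary).
import sys

print("Python version:", sys.version)
print("Interpreter path:", sys.executable)

# On a typical install the path contains "anaconda3" (or "miniconda3").
if "conda" in sys.executable.lower():
    print("Looks like an Anaconda/conda interpreter.")
else:
    print("This may not be the Anaconda interpreter; check your installation.")
```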

DataGrip:

  • Download DataGrip. Make sure you choose the correct OS and architecture.

  • Follow the installation instructions.

  • Apply for a student license.

  • When you receive confirmation of your student license, set the license information in DataGrip.

HW0: Non-Programming

Step 1: Initial Files

  1. Create a folder in the project of the form <UNI>_src, where <UNI> is your UNI. I created an example, which is dff9_src.

  2. Create a file in the directory named <UNI>_HW0.

  3. Copy the Jupyter notebook file dff9_src/dff9_HW0.ipynb into the directory you created, replacing dff9 in the file name with your UNI.

  4. Do the same for dff9_HW0.py. A short script that performs these copy-and-rename steps is sketched below.
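If you prefer to script the copy-and-rename steps above, a minimal sketch is below. The UNI value is a placeholder, and the file names follow the dff9 examples; run it from the repository root and adjust as needed.

```python
# Sketch of Step 1: copy the dff9 template files into your own <UNI>_src
# folder, renaming them with your UNI. Run from the repository root.
import shutil
from pathlib import Path

UNI = "ab1234"  # placeholder: replace with your UNI

src_dir = Path("dff9_src")
dst_dir = Path(f"{UNI}_src")
dst_dir.mkdir(exist_ok=True)

for template in ["dff9_HW0.ipynb", "dff9_HW0.py"]:
    target = dst_dir / template.replace("dff9", UNI)
    shutil.copyfile(src_dir / template, target)
    print("Copied", src_dir / template, "->", target)
```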

Step 2: Jupyter Notebook

  • Start Anaconda.

  • Open Jupyter Notebook in Anaconda.

  • Navigate to the directory where you cloned the repository, and then go into the folder you created.

  • Open the notebook (the file ending in .ipynb).

  • The remaining steps in HW0: Non-Programming are in the notebook that you opened. A quick sanity-check cell you can run first is sketched below.
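Before working through the notebook, you can confirm that the kernel is using the interpreter and folder you expect with a throwaway first cell like the one below (my suggestion, not part of the assignment).

```python
# Throwaway sanity-check cell: confirm the notebook's interpreter and
# working directory before starting the assignment.
import os
import sys

print("Interpreter:", sys.executable)
print("Working directory:", os.getcwd())
```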

HW 0: Programming

  • Complete the steps for HW0: Non-Programming.

  • The programming track is not "harder" than the non-programming track. The initial setup is a little more work, however.

  • Download and install PyCharm. Make sure you choose the Professional edition.

  • Follow the instructions to set the license key using the JetBrains account you used to get the DataGrip license.

  • Start PyCharm, navigate to and open the project that you cloned from GitHub.

  • Follow the instructions for creating a new virtual Conda environment for the project.

  • Select the root folder in the project, right click, and add a new Python Package named <UNI>_web_src. My example is dff9_web_src.

  • Copy the files from dff9_web_src into the package you created.

  • Follow the instructions for adding a package to your virtual environment. You should add the package flask.

  • Right click the application.py file that you copied and select Run. A console window will open and display a URL. Copy the URL.

  • Open a browser. Paste the URL and append '/health'. My URL looks like http://172.20.1.14:5000/health. Yours may be a little different.

  • Hit Enter. You should see a health message. Take a screenshot of the browser window and add the file to the directory. (My example screenshot is in the repository.) A minimal sketch of what application.py might contain follows below.
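For reference, here is a minimal sketch of what an application.py with a /health endpoint could look like. This is my illustration, not the actual template file in dff9_web_src; only the /health route name is taken from the steps above.

```python
# Minimal Flask application with a /health endpoint (an illustrative
# sketch, not the official template file).
from flask import Flask

app = Flask(__name__)


@app.route("/health")
def health():
    # Flask serializes a returned dict to JSON automatically.
    return {"status": "ok", "message": "The application is healthy."}


if __name__ == "__main__":
    # host="0.0.0.0" exposes the server on your LAN address, which is why
    # the console may show a URL like http://172.x.x.x:5000 instead of
    # localhost.
    app.run(host="0.0.0.0", port=5000)
```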

Owner

Donald F. Ferguson
Senior Technical Fellow and Chief SW Architect, Ansys, Inc. Adjunct Professor, Dept. of Computer Science, Columbia University. CTO and Co-Founder, Seeka.TV