This repository contains answers to the Shopify Summer 2022 Data Science Intern Challenge.

Last update: Jan 11, 2022

Data-Science-Intern-Challenge

Summer 2022 Data Science Intern Challenge

Please complete the following questions, and provide your thought process/work. You can attach your work in a text file, link, etc. on the application page. Please ensure answers are easily visible for reviewers!

Question 1: Given some sample data, write a program to answer the following: click here to access the required data set

On Shopify, we have exactly 100 sneaker shops, and each of these shops sells only one model of shoe. We want to do some analysis of the average order value (AOV). When we look at orders data over a 30-day window, we naively calculate an AOV of $3145.13. Given that we know these shops are selling sneakers, a relatively affordable item, something seems wrong with our analysis.

Think about what could be going wrong with our calculation. Think about a better way to evaluate this data.

Answer: The naive average was calculated as the sum of all order values divided by the number of orders. This is misleading because the formula ignores the fact that a single order can contain multiple items, so a few orders with many items pull the average far above the price of one pair of sneakers. I have tried to explain the problem with code. Click here to view it.
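As a minimal illustration (not the code linked above), the sketch below computes the naive AOV on a handful of made-up orders in which every pair of sneakers costs $150. The field names `order_amount` and `total_items` and all the values are hypothetical toy data, not the challenge dataset.

```python
# Hypothetical toy orders: every pair costs $150, one order is a bulk purchase.
# Field names and values are assumptions for illustration only.
orders = [
    {"order_id": 1, "order_amount": 150, "total_items": 1},
    {"order_id": 2, "order_amount": 300, "total_items": 2},
    {"order_id": 3, "order_amount": 150, "total_items": 1},
    {"order_id": 4, "order_amount": 30000, "total_items": 200},  # bulk order
]


def naive_aov(records):
    """Naive AOV: sum of order amounts divided by the number of orders."""
    return sum(r["order_amount"] for r in records) / len(records)


with_bulk = naive_aov(orders)         # 30600 / 4 orders
without_bulk = naive_aov(orders[:3])  # 600 / 3 orders

print(with_bulk)     # 7650.0
print(without_bulk)  # 200.0
```

A single 200-item order lifts the naive figure from $200 to $7,650, even though no sneaker costs more than $150.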

What metric would you report for this dataset?

Answer: Divide the total of all order values by the sum of total_items. Because the denominator counts items rather than orders, this gives the average value per item sold, so orders that contain many items no longer inflate the result.
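The per-item calculation can be sketched in Python on the same kind of hypothetical toy data (assumed field names `order_amount` and `total_items`; values are made up). It sums values and items across all orders before dividing, rather than averaging per order.

```python
# Hypothetical toy orders (same assumptions as a naive-AOV example):
# every pair costs $150, one order is a 200-item bulk purchase.
orders = [
    {"order_id": 1, "order_amount": 150, "total_items": 1},
    {"order_id": 2, "order_amount": 300, "total_items": 2},
    {"order_id": 3, "order_amount": 150, "total_items": 1},
    {"order_id": 4, "order_amount": 30000, "total_items": 200},
]


def value_per_item(records):
    """Total order value divided by the total number of items sold."""
    total_value = sum(r["order_amount"] for r in records)
    total_items = sum(r["total_items"] for r in records)
    return total_value / total_items


result = value_per_item(orders)  # 30600 / 204 items
print(result)  # 150.0
```

On this toy data the metric recovers the $150 unit price regardless of how many pairs each order contains.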

What is its value?

Answer: $357.92

Question 2: For this question you’ll need to use SQL. Follow this link to access the data set required for the challenge. Please use queries to answer the following questions. Paste your queries along with your final numerical answers below.

How many orders were shipped by Speedy Express in total?

Answer: 54
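The following is a minimal sketch of one way to write this query, run against an in-memory SQLite database via Python's `sqlite3` module. The Northwind-style table and column names (`Orders`, `Shippers`, `ShipperID`, `ShipperName`) are assumptions, and the rows are made-up toy data, so the printed count is not the answer of 54 above.

```python
import sqlite3

# Hypothetical toy data in an assumed Northwind-style schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Shippers (ShipperID INTEGER PRIMARY KEY, ShipperName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, ShipperID INTEGER);
INSERT INTO Shippers VALUES (1, 'Speedy Express'), (2, 'United Package');
INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Count the orders whose shipper is Speedy Express.
QUERY = """
SELECT COUNT(*) AS total_orders
FROM Orders AS o
JOIN Shippers AS s ON o.ShipperID = s.ShipperID
WHERE s.ShipperName = 'Speedy Express';
"""

(count,) = conn.execute(QUERY).fetchone()
print(count)  # 2 on this toy data
```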

What is the last name of the employee with the most orders?

Answer: Peacock
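A possible query groups orders by employee, sorts by order count, and keeps the top row. The sketch runs it with Python's `sqlite3` on an in-memory database; the `Employees`/`Orders` table and column names are assumptions, and the employee names and rows are hypothetical toy data.

```python
import sqlite3

# Hypothetical toy data in an assumed Northwind-style schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, EmployeeID INTEGER);
INSERT INTO Employees VALUES (1, 'Alvarez'), (2, 'Baker');
INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Count orders per employee and return the employee with the most.
QUERY = """
SELECT e.LastName, COUNT(o.OrderID) AS num_orders
FROM Orders AS o
JOIN Employees AS e ON o.EmployeeID = e.EmployeeID
GROUP BY e.EmployeeID, e.LastName
ORDER BY num_orders DESC
LIMIT 1;
"""

row = conn.execute(QUERY).fetchone()
print(row)  # ('Alvarez', 2) on this toy data
```

Grouping by `EmployeeID` as well as `LastName` keeps two employees who happen to share a last name from being merged.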

What product was ordered the most by customers in Germany?

Answer: Boston Crab Meat, with 160 units ordered in total.
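A sketch of one way to answer this: join order details to orders, customers, and products, keep German customers, and rank products by total quantity. It assumes "ordered the most" means the most units, and it assumes Northwind-style names (`Customers.Country`, `OrderDetails.Quantity`, `Products.ProductName`). The rows are hypothetical toy data run through Python's `sqlite3`.

```python
import sqlite3

# Hypothetical toy data in an assumed Northwind-style schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE OrderDetails (
    OrderDetailID INTEGER PRIMARY KEY,
    OrderID INTEGER, ProductID INTEGER, Quantity INTEGER
);
INSERT INTO Customers VALUES (1, 'Germany'), (2, 'France');
INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO Products VALUES (1, 'Product A'), (2, 'Product B');
-- The French order (12) has the largest quantity but must be excluded.
INSERT INTO OrderDetails VALUES (1, 10, 1, 5), (2, 10, 2, 3),
                                (3, 11, 2, 4), (4, 12, 1, 50);
""")

# Sum quantities per product for German customers and keep the top one.
QUERY = """
SELECT p.ProductName, SUM(od.Quantity) AS total_quantity
FROM OrderDetails AS od
JOIN Orders AS o ON od.OrderID = o.OrderID
JOIN Customers AS c ON o.CustomerID = c.CustomerID
JOIN Products AS p ON od.ProductID = p.ProductID
WHERE c.Country = 'Germany'
GROUP BY p.ProductID, p.ProductName
ORDER BY total_quantity DESC
LIMIT 1;
"""

top = conn.execute(QUERY).fetchone()
print(top)  # ('Product B', 7) on this toy data
```

Counting order lines with `COUNT(*)` instead of `SUM(od.Quantity)` measures how often a product appears in orders rather than how many units were bought, and can rank products differently.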

Click here to view the SQL queries.
