Materials to reproduce our findings in our stories, "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up"

Overview

Amazon Brands and Exclusives

This repository contains code to reproduce the findings featured in our story "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up" from our series Amazon's Advantage.

Our methodology is described in "How We Analyzed Amazon’s Treatment of Its Brands in Search Results".

Data that we collected and analyzed is in the data folder.
To use the full input dataset (which is not hosted here), please refer to Download data.

Jupyter notebooks used for data preprocessing and analysis are available in the notebooks folder.
Descriptions for each notebook are outlined in the Notebooks section below.

Installation

Python

Make sure you have Python 3.6+ installed. We used Miniconda to create a Python 3.8 virtual environment.

Then install the Python packages:
pip install -r requirements.txt

Notebooks

These notebooks are intended to be run sequentially, but they are not dependent on one another. If you want a quick overview of the methodology, you only need to concern yourself with the notebooks with an asterisk(*).

0-data-preprocessing.ipynb

This notebook parses Amazon search results and Amazon product pages, and produces the intermediary datasets (data/output/datasets/) used in ranking analysis and random forest classifiers.

1-data-analysis-search-results.ipynb *

Bulk of the ranking analysis and stats in the data analysis.

2-random-forest-analysis.ipynb *

Feature engineering training set, finding optimal hyperparameters, and performing the ablation study on a random forest model. The most predictive feature is verified using three separate methods.

3-survey-results.ipynb

Visualizing the survey results from our national panel of 1,000 adults.

4-limiations-product-page-changes.ipynb

Analysis of how often the Buy Box's default shipper and seller change between Amazon and a third party.

utils.py

Contains convenient functions used in the notebooks.

parsers.py

Contains parsers for search results and product pages.

Data

This directory is where inputs, intermediaries, and outputs are saved.

data
├── output
│   ├── figures
│   ├── tables
│   └── datasets
│       ├── amazon_private_label.csv.xz
│       ├── products.csv.xz
│       ├── searches.csv.xz
│       ├── training_set.csv.gz
│       ├── pairwise_training_set.csv.gz
│       └── trademarks
└── input
    ├── combined_queries_with_source.csv
    ├── best_sellers
    ├── generic_search_terms
    ├── search-private-label
    ├── search-selenium
    ├── search-selenium-our-brands-filter_
    ├── selenium-products
    ├── seller_central
    └── spotcheck

data/output/ contains tables, figures, and datasets used in our methodology.

data/output/datasets/amazon_private_label.csv.xz is our dataset of Amazon brands, exclusives, and proprietary electronics (N=137,428 products). We use each product's unique ID (called an ASIN) to identify Amazon's own products in our methodology.

data/output/datasets/trademarks contains a dataset of trademarked brands registered by Amazon. The data was collected from USPTO.gov and Amazon. We included an additional README with the exact steps we took to build this dataset in the directory.

data/output/datasets/searches.csv.xz parsed search result pages from top and generic searches (N=187,534 product positions). You can filter this by search_term for each of these subsets from data/input/combined_queries_with_source.csv.

data/output/datasets/products.csv.xz parsed product pages from the searches above (N=157,405 product pages).

data/output/training_set.csv.gz metadata used to train and evaluate the random forest. Additionally, feature engineering is conducted in notebooks/2-random-forest-analysis.ipynb, which produces pairwise_training_set.csv.gz.

Every file in data/input except combined_queries_with_source.csv is stored in AWS s3. Those are not hosted in this repository.

Download Data

You can find the raw inputs in data/input in s3://markup-public-data/amazon-brands/.

If you trust us, you can download the HTML and JSON files in data/input using this script: sh data/download_input_data.sh

Note this is not necessary to run notebooks and see full results.

data/input/search-selenium/ (12 GB uncompressed)

First page of search results collected in January 2021. Download the HTML files search-selenium.tar.xz (238 MB compressed) here.

data/input/selenium-products/ (220 GB uncompressed)

Product pages collected in February 2021. Download the HTML files selenium-products.tar.xz (9 GB compressed) here.

data/input/search-selenium-our-brands-filter_/ (35 GB uncompressed)

Search results filtered by "our brands". Contains every page of search results. Download search-selenium-our-brands-filter_.tar.xz (403 MB compressed) here.

data/input/search-private-label/ (25 GB uncompressed)

API responses for search results filtered down to products Amazon identifies as "our brands". Contains paginated API results. Download the JSON files search-private-label.tar.xz (402 MB uncompressed) here.

data/input/seller_central/ (105 MB)

Seller central data for Q4 2020. Download the CSV file All_Q4_2020.csv.xz (105 MB compressioned) here.

data/input/best_sellers/ (4 GB)

Amazon's best sellers under the category "Amazon Devices & Accessories". Download the HTML files best_sellers.tar.xz (60MB compressed) here.

data/input/spotcheck/ (4 GB)

A sub-sample of product pages for spot-checking Buy Box changes. Download the HTML files spotcheck.tar.xz (159 MB compressed) here.

Ts-matterbridge - Integrate TeamSpeak Chat with MatterBridge

TeamSpeak-MatterBridge Bot You can use this bot to integrate TeamSpeak Chat with

4 Sep 25, 2022
Library to manage your own custom RPC on your desktop

Info I don't recommend novices setting this up yourself. It requires Redis, a server to host the API on, and a bit of understanding of Windows & Pytho

Isaac K 1 Apr 16, 2022
Select random winners for a Twitter giveaway

twitter_picker Select random winners for a Twitter giveaway Once the Twitter giveaway (or airdrop) is closed, assign a number to each participant. The

Michael Rawner 1 Dec 11, 2021
multi-purpose discord bot

virus multi-purpose discord bot ⚠️ WARNING This project is incomplete and may not work as expected. Download & Run Install Python =3.10 Clone the sou

miten 2 Jan 17, 2022
Python client for Toyota North America service API

toyota-na Python client for Toyota North America service API Install pip install toyota-na[qt] [qt] is required for generating authorization code. Us

Gavin Ni 18 Sep 06, 2022
AWS SQS event redrive Lambda

This repository contains the Lambda function to redrive sqs events from source to destination queue while controlling maxRetry per event.

1 Oct 19, 2021
Definitive Guide to Creating a SQL Database on Cloud with AWS and Python

Definitive Guide to Creating a SQL Database on Cloud with AWS and Python An easy-to-follow comprehensive guide on integrating Amazon RDS, MySQL Workbe

Kenneth Leung 6 Aug 17, 2022
Send OpenWeatherMap alerts (One Call API) to telegram users.

OpenWeatherMap Telegram Alert Send OpenWeatherMap alerts (One Call API) to telegram users. Installation Requirements: $ apt install python3-yaml pytho

Michael Hacker 1 Jun 04, 2022
An advanced QR Code telegram bot with more features.

QR Code Bot A telegram qr code encode and decode bot Advanced Features 1. Database ( MongoDB ) Support 2. Broadcast Support 3. Status Command 4. Setti

Fayas Noushad 16 Nov 12, 2022
This Telegram bot is created to help monitor individual mood. Lean and mean

Mood bot This bot is created to help monitor your mood. Lean and mean. Deployment Install Docker and Docker Compose Populate .env file cp .env.dist .e

Piotr Markielau 1 Dec 05, 2021
A python discord client interaction emulator for the DC29 badge code channel

dc29-discord-signalbot A python discord client interaction emulator for the DC29 badge code channel Prep Open Developer mode Open the developer mode f

8 Aug 23, 2021
ServiceX DID Finder Girder

ServiceX_DID_Finder_Girder Access datasets for ServiceX from yt Hub Finding datasets This DID finder is designed to take a collection id (https://gird

1 Dec 07, 2021
SongFinder Bot helps you to find song name by recognising via voice note or instagram reels shared link.

SongFinder V1.1 SongFinder to detect songs name by just sending voice note or instagram reels links to your telegram bot. FFMPEG must be installed on

Abhishek Pathak 4 Dec 30, 2022
This automation protect against subdomain takeover on AWS env which also send alerts on slack.

AWS_Subdomain_Takeover_Detector Purpose The purpose of this automation is to detect misconfigured Route53 entries which are vulnerable to subdomain ta

Puneet Kumar Maurya 8 May 18, 2022
Written in Python, freezed into stand-alone executable with PyInstaller. This app will make sure you stay in New World without getting kicked for inactivity.

New World - AFK Written in Python, freezed into stand-alone executable with PyInstaller. This app will make sure you stay in New World without getting

Rodney 5 Oct 31, 2021
A bot that can play songs in Telegram group voice chats like AK 47

🎧 47Music Player 🎧 A bot that can play songs in Telegram group voice chats like AK 47 ✨ Easy To Deploy Pyrogram Session Config Vars API_ID : Assista

Janindu Malshan 23 Dec 07, 2022
The wrapper you need for the osu!api v2

oppy (op.py) oppy is the wrapper for use on the osu! v2 API. Version 1.0.0 Installation To install please use pip to install oppy pip install op.py To

Wayde 2 May 01, 2022
Recommendation systems are among most widely preffered marketing strategies.

Recommendation systems are among most widely preffered marketing strategies. Their popularity comes from close prediction scores obtained from relationships of users and items. In this project, two r

Sübeyte 8 Oct 06, 2021
TORNADO CASH Pancakeswap Sniper BOT 2022-V1 (MAC WINDOWS ANDROID LINUX)

TORNADO CASH Pancakeswap Sniper BOT 2022-V1 (MAC WINDOWS ANDROID LINUX)

Crypto Trader 1 Jan 06, 2022
A Telegram bot to extracting text from images. All languages supported.

OCR Bot A Telegram bot to extracting text from images. All languages supported. Deploy to Heroku Local Deploying Clone the repo git clone https://gith

6 Oct 21, 2022