Scraping and analysis of leetcode-compensations page.

Overview

Leetcode compensations report

Scraping and analysis of leetcode-compensations page.

Salary Distribution Salary

Report

INDIA : 5th Jan 2019 - 5th Aug 2021 / fixed salary

INDIA : 5th Jan 2019 - 5th Aug 2021 / fixed salary, dark mode

INDIA : 5th Jan 2019 - 5th Aug 2021 / total salary

INDIA : 5th Jan 2019 - 5th Aug 2021 / total salary, dark mode

Directory structure

  • data
    • imgs - images for reports
    • logs - scraping logs
    • mappings - standardized company, location and title mappings as well as unmapped entities
    • meta - meta information for the posts like post_id, date, title, href.
    • out - data from info.all_info.get_clean_records_for_india()
    • posts - text from the post
    • reports - salary analysis by companies, titles and experience
  • info - functions to posts data(along with the standardized entities) in a tabular format
  • leetcode - scraper
  • utils - constants and helper methods

Setup

  1. Clone the repo.
  2. Put the chromedriver in the utils directory.
  3. Setup virual enviroment python -m venv leetcode.
  4. Install necessary packages pip install -r requirements.txt.
  5. To create the reports npm install vega-lite vega-cli canvas(needed to save altair plots).

Scraping

$ export PTYHONPATH=<project_directory>
$ python leetcode/posts_meta.py --till_date 2021/08/03

# sample output
2021-08-03 19:36:07.474 | INFO     | __main__:<module>:48 - page no: 1 | # posts: 15
$ python leetcode/posts.py

# sample output
2021-08-03 19:36:25.997 | INFO     | __main__:<module>:45 - post_id: 1380805 done!
2021-08-03 19:36:28.995 | INFO     | __main__:<module>:45 - post_id: 1380646 done!
2021-08-03 19:36:31.631 | INFO     | __main__:<module>:45 - post_id: 1380542 done!
2021-08-03 19:36:34.727 | INFO     | __main__:<module>:45 - post_id: 1380068 done!
2021-08-03 19:36:37.280 | INFO     | __main__:<module>:45 - post_id: 1379990 done!
2021-08-03 19:36:40.509 | INFO     | __main__:<module>:45 - post_id: 1379903 done!
2021-08-03 19:36:41.096 | WARNING  | __main__:<module>:34 - sleeping extra for post_id: 1379487
2021-08-03 19:36:44.530 | INFO     | __main__:<module>:45 - post_id: 1379487 done!
2021-08-03 19:36:47.115 | INFO     | __main__:<module>:45 - post_id: 1379208 done!
2021-08-03 19:36:49.660 | INFO     | __main__:<module>:45 - post_id: 1378689 done!
2021-08-03 19:36:50.470 | WARNING  | __main__:<module>:34 - sleeping extra for post_id: 1378620
2021-08-03 19:36:53.866 | INFO     | __main__:<module>:45 - post_id: 1378620 done!
2021-08-03 19:36:57.203 | INFO     | __main__:<module>:45 - post_id: 1378334 done!
2021-08-03 19:37:00.570 | INFO     | __main__:<module>:45 - post_id: 1378288 done!
2021-08-03 19:37:03.226 | INFO     | __main__:<module>:45 - post_id: 1378181 done!
2021-08-03 19:37:05.895 | INFO     | __main__:<module>:45 - post_id: 1378113 done!

Report DataFrame

$ ipython

In [1]: from info.all_info import get_clean_records_for_india                                                               
In [2]: df = get_clean_records_for_india()                                                                                  
2021-08-04 15:47:11.615 | INFO     | info.all_info:get_raw_records:95 - n records: 4134
2021-08-04 15:47:11.616 | WARNING  | info.all_info:get_raw_records:97 - missing post_ids: ['1347044', '1193859', '1208031', '1352074', '1308645', '1206533', '1309603', '1308672', '1271172', '214751', '1317751', '1342147', '1308728', '1138584']
2021-08-04 15:47:11.696 | WARNING  | info.all_info:_save_unmapped_labels:54 - 35 unmapped company saved
2021-08-04 15:47:11.705 | WARNING  | info.all_info:_save_unmapped_labels:54 - 353 unmapped title saved
2021-08-04 15:47:11.708 | WARNING  | info.all_info:get_clean_records_for_india:122 - 1779 rows dropped(location!=india)
2021-08-04 15:47:11.709 | WARNING  | info.all_info:get_clean_records_for_india:128 - 385 rows dropped(incomplete info)
2021-08-04 15:47:11.710 | WARNING  | info.all_info:get_clean_records_for_india:134 - 7 rows dropped(internships)
In [3]: df.shape                                                                                                            
Out[3]: (1963, 14)

Report

$ python reports/plots.py # generate fixed comp. plots
$ python reports/report.py # fixed comp.
$ python reports/report_dark.py # fixed comp., dark mode

$ python reports/plots_tc.py # generate total comp. plots
$ python reports/report_tc.py # total comp.
$ python reports/report_dark.py # total comp., dark mode

Samples

title : Flipkart | Software Development Engineer-1 | Bangalore
url : https://leetcode.com/discuss/compensation/834212/Flipkart-or-Software-Development-Engineer-1-or-Bangalore
company : flipkart
title : sde 1
yoe : 0.0 years
salary : ₹ 1800000.0
location : bangalore
post Education: B.Tech from NIT (2021 passout) Years of Experience: 0 Prior Experience: Fresher Date of the Offer: Aug 2020 Company: Flipkart Title/Level: Software Development Engineer-1 Location: Bangalore Salary: INR 18,00,000 Performance Incentive: INR 1,80,000 (10% of base pay) ESOPs: 48 units => INR 5,07,734 (vested over 4 years. 25% each year) Relocation Reimbursement: INR 40,000 Telephone Reimbursement: INR 12,000 Home Broadband Reimbursement: INR 12,000 Gratuity: INR 38,961 Insurance: INR 27,000 Other Benefits: INR 40,000 (15 days accomodation + travel) (this is different from the relocation reimbursement) Total comp (Salary + Bonus + Stock): Total CTC: INR 26,57,695; First year: INR 22,76,895 Other details: Standard Offer for On-Campus Hire Allowed Branches: B.Tech CSE/IT (6.0 CGPA & above) Process consisted of Coding test & 3 rounds of interviews. I don't remember questions exactly. But they vary from topics such as Graph(Topological Sort, Bi-Partite Graph), Trie based questions, DP based questions both recursive and dp approach, trees, Backtracking.

title : Cloudera | SSE | Bangalore | 2019
url : https://leetcode.com/discuss/compensation/388432/Cloudera-or-SSE-or-Bangalore-or-2019
company : cloudera
title : sde 2
yoe : 2.5 years
salary : ₹ 2800000.0
location : bangalore
post Education: MTech from Tier 1 College Years of Experience: 2.5 Prior Experience: SDE at Flipkart Date of the Offer: Sept 10, 2019 Company: Cloudera Title/Level: Senior Software Engineer (SSE) Location: Bangalore, India Salary: Rs 28,00,000 Bonus: Rs 2,80,000 (10 % of base) PF & Gratuity: Rs 1,88,272 Stock bonus: 5000 units over 4 years ($9 per unit) Other Benefits: Rs 4,00,000 (Health, Term Life and Personal Accident Insurance, Annual Medical Health Checkup, Transportation, Education Reimbursement) Total comp (Salary + Bonus + Stock): Rs 4070572

title : Amadeus Labs | MTS | Bengaluru
url : https://leetcode.com/discuss/compensation/1109046/Amadeus-Labs-or-MTS-or-Bengaluru
company : amadeus labs
title : mts 1
yoe : 7.0 years
salary : ₹ 1700000.0
location : bangalore
post Education: B.Tech. in ECE Years of Experience: 7 Prior Experience: Worked at few MNCs Date of the Offer: Jan 2021 Company: Amadeus Labs Title/Level: Member of Technical Staff Location: Bengaluru, India Salary: ₹ 1,700,000 Signing Bonus: ₹ 50,000 Stock bonus: None Bonus: 137,000 Total comp (Salary + Bonus + Stock): ~₹1,887,000 Benefits: Employee and family Insurance

Owner
utsav
Lead MLE @ freshworks
utsav
Data and code accompanying the paper Politics and Virality in the Time of Twitter

Politics and Virality in the Time of Twitter Data and code accompanying the paper Politics and Virality in the Time of Twitter. In specific: the code

Cardiff NLP 3 Jul 02, 2022
A Python adaption of Augur to prioritize cell types in perturbation analysis.

A Python adaption of Augur to prioritize cell types in perturbation analysis.

Theis Lab 2 Mar 29, 2022
BioMASS - A Python Framework for Modeling and Analysis of Signaling Systems

Mathematical modeling is a powerful method for the analysis of complex biological systems. Although there are many researches devoted on produ

BioMASS 22 Dec 27, 2022
A probabilistic programming language in TensorFlow. Deep generative models, variational inference.

Edward is a Python library for probabilistic modeling, inference, and criticism. It is a testbed for fast experimentation and research with probabilis

Blei Lab 4.7k Jan 09, 2023
Using Python to derive insights on particular Pokemon, Types, Generations, and Stats

Pokémon Analysis Andreas Nikolaidis February 2022 Introduction Exploratory Analysis Correlations & Descriptive Statistics Principal Component Analysis

Andreas 1 Feb 18, 2022
Handle, manipulate, and convert data with units in Python

unyt A package for handling numpy arrays with units. Often writing code that deals with data that has units can be confusing. A function might return

The yt project 304 Jan 02, 2023
Using Python to scrape some basic player information from www.premierleague.com and then use Pandas to analyse said data.

PremiershipPlayerAnalysis Using Python to scrape some basic player information from www.premierleague.com and then use Pandas to analyse said data. No

5 Sep 06, 2021
Two phase pipeline + StreamlitTwo phase pipeline + Streamlit

Two phase pipeline + Streamlit This is an example project that demonstrates how to create a pipeline that consists of two phases of execution. In betw

Rick Lamers 1 Nov 17, 2021
Exploratory Data Analysis of the 2019 Indian General Elections using a dataset from Kaggle.

2019-indian-election-eda Exploratory Data Analysis of the 2019 Indian General Elections using a dataset from Kaggle. This project is a part of the Cou

Souradeep Banerjee 5 Oct 10, 2022
Pyspark project that able to do joins on the spark data frames.

SPARK JOINS This project is to perform inner, all outer joins and semi joins. create_df.py: load_data.py : helps to put data into Spark data frames. d

Joshua 1 Dec 14, 2021
pyETT: Python library for Eleven VR Table Tennis data

pyETT: Python library for Eleven VR Table Tennis data Documentation Documentation for pyETT is located at https://pyett.readthedocs.io/. Installation

Tharsis Souza 5 Nov 19, 2022
Collections of pydantic models

pydantic-collections The pydantic-collections package provides BaseCollectionModel class that allows you to manipulate collections of pydantic models

Roman Snegirev 20 Dec 26, 2022
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms

MatrixProfile MatrixProfile is a Python 3 library, brought to you by the Matrix Profile Foundation, for mining time series data. The Matrix Profile is

Matrix Profile Foundation 302 Dec 29, 2022
Example Of Splunk Search Query With Python And Splunk Python SDK

SSQAuto (Splunk Search Query Automation) Example Of Splunk Search Query With Python And Splunk Python SDK installation: ➜ ~ git clone https://github.c

AmirHoseinTangsiriNET 1 Nov 14, 2021
Code for the DH project "Dhimmis & Muslims – Analysing Multireligious Spaces in the Medieval Muslim World"

Damast This repository contains code developed for the digital humanities project "Dhimmis & Muslims – Analysing Multireligious Spaces in the Medieval

University of Stuttgart Visualization Research Center 2 Jul 01, 2022
Processo de ETL (extração, transformação, carregamento) realizado pela equipe no projeto final do curso da Soul Code Academy.

Processo de ETL (extração, transformação, carregamento) realizado pela equipe no projeto final do curso da Soul Code Academy.

Débora Mendes de Azevedo 1 Feb 03, 2022
Yet Another Workflow Parser for SecurityHub

YAWPS Yet Another Workflow Parser for SecurityHub "Screaming pepper" by Rum Bucolic Ape is licensed with CC BY-ND 2.0. To view a copy of this license,

myoung34 8 Dec 22, 2022
ASOUL直播间弹幕抓取&&数据分析

ASOUL直播间弹幕抓取&&数据分析(更新中) 这些文件用于爬取ASOUL直播间的弹幕(其他直播间也可以)和其他信息,以及简单的数据分析生成。

159 Dec 10, 2022
Hidden Markov Models in Python, with scikit-learn like API

hmmlearn hmmlearn is a set of algorithms for unsupervised learning and inference of Hidden Markov Models. For supervised learning learning of HMMs and

2.7k Jan 03, 2023
MeSH2Matrix - A set of Python codes for the generation of biomedical ontologies from the MeSH keywords of the PubMed scholarly publications

A set of Python codes for the generation of biomedical ontologies from the MeSH keywords of the PubMed scholarly publications

SisonkeBiotik 6 Nov 30, 2022