This is a Python script to scrape company overviews and reviews from Glassdoor.

Last update: Jun 23, 2022

Overview

Data Scraping for Glassdoor

This Python script scrapes company overviews and reviews from Glassdoor. Please use it carefully and follow Glassdoor's Terms of Service, which explicitly prohibit web scraping.

Built With

Python
ChromeDriver

Getting Started

Download the SeleniumGlassdor.py file and change the ChromeDriver path to match its location on your machine. Supply your own file containing the list of the companies' Glassdoor URLs; the company URL CSV file is also attached here. That file was also generated with Selenium: for each company, search Google for 'glassdoor' + the company name and extract the URL from the first result. I can also upload the file on request.
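The setup steps above might look roughly like the sketch below. It is not the code from SeleniumGlassdor.py: `CHROMEDRIVER_PATH`, the CSV column name `url`, and every helper name (`make_driver`, `load_company_urls`, `google_search_url`, `first_glassdoor_link`) are hypothetical. The driver is created in the Selenium 4 style, passing the ChromeDriver path through a `Service` object.

```python
import csv
from urllib.parse import quote_plus

# Hypothetical placeholder: point this at the ChromeDriver binary on your machine.
CHROMEDRIVER_PATH = "/path/to/chromedriver"


def make_driver(chromedriver_path=CHROMEDRIVER_PATH):
    """Start Chrome through the ChromeDriver at the given path (Selenium 4 style)."""
    # Imported lazily so the CSV and URL helpers below work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=chromedriver_path))


def load_company_urls(csv_path, column="url"):
    """Read one Glassdoor URL per row from the column named `column` (assumed name)."""
    urls = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Missing or blank cells are skipped rather than returned as "".
            url = (row.get(column) or "").strip()
            if url:
                urls.append(url)
    return urls


def google_search_url(company_name):
    """Build the Google query 'glassdoor <company name>' used to find a company's page."""
    return "https://www.google.com/search?q=" + quote_plus(f"glassdoor {company_name}")


def first_glassdoor_link(hrefs):
    """Return the first search-result link pointing to glassdoor.com, or None."""
    for href in hrefs:
        if href and "glassdoor.com" in href:
            return href
    return None
```

A typical run would call `make_driver()` once, then loop over `load_company_urls("companies.csv")` and `driver.get(url)` each entry; `companies.csv` is likewise a hypothetical file name.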

Prerequisites

Install Selenium before using the script.

selenium
```
pip install selenium
```

For the other sections

If you want to scrape data from other sections, such as jobs or salaries, you can use the following methods to first extract each section's URL and then use a similar method to download the section.

```
from selenium.webdriver.common.by import By

# `browser` is the Selenium WebDriver already showing the company's Glassdoor page.
# Selenium 4.3+ removed find_element_by_xpath; use find_element(By.XPATH, ...) instead.
reviewsUrl = browser.find_element(By.XPATH, "//a[@data-label='Reviews']").get_attribute('href')
jobsUrl = browser.find_element(By.XPATH, "//a[@data-label='Jobs']").get_attribute('href')
salariesUrl = browser.find_element(By.XPATH, "//a[@data-label='Salaries']").get_attribute('href')
interviewsUrl = browser.find_element(By.XPATH, "//a[@data-label='Interviews']").get_attribute('href')
benefitsUrl = browser.find_element(By.XPATH, "//a[@data-label='Benefits']").get_attribute('href')
photosUrl = browser.find_element(By.XPATH, "//a[@data-label='Photos']").get_attribute('href')
```
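The section links above can also be collected in one loop and then downloaded, as a minimal sketch. The helper names, the `pages` output directory, and saving raw `page_source` HTML are assumptions, not the project's own method. It uses `find_elements` (plural), which returns an empty list instead of raising when a company page lacks a section, and passes `"xpath"`, the string value of Selenium's `By.XPATH`.

```python
import os

# Values of the data-label attribute used in the XPath expressions above.
SECTION_LABELS = ["Reviews", "Jobs", "Salaries", "Interviews", "Benefits", "Photos"]


def section_xpath(label):
    """XPath for the navigation link whose data-label equals `label`."""
    return f"//a[@data-label='{label}']"


def collect_section_urls(driver, labels=SECTION_LABELS, by="xpath"):
    """Map each label to its link's href, or None if the page has no such link.

    `by` defaults to "xpath", the string value of selenium's By.XPATH.
    """
    urls = {}
    for label in labels:
        # find_elements returns [] instead of raising when nothing matches.
        links = driver.find_elements(by, section_xpath(label))
        urls[label] = links[0].get_attribute("href") if links else None
    return urls


def save_section_pages(driver, urls, out_dir="pages"):
    """Open each found section URL and save the rendered HTML as <label>.html."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for label, url in urls.items():
        if url is None:
            continue  # the company page had no link for this section
        driver.get(url)
        path = os.path.join(out_dir, f"{label.lower()}.html")
        with open(path, "w", encoding="utf-8") as f:
            f.write(driver.page_source)
        saved.append(path)
    return saved
```

Returning `None` for missing sections keeps one absent tab (for example, no Photos) from stopping the whole run.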

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

Houping - [email protected]
