Make Selenium work on Github Actions

Overview

Make Selenium work on Github Actions

Scraping with BeautifulSoup on GitHub Actions is easy-peasy. But what about Selenium?? After you jump through some hoops it's just as simple.

How to use this

I think you can just fork this repository and get to work on scraper.py.

Otherwise, just steal scraper.py and .github/workflows/scrape.yml and you'll be good to go.

Saving CSV files

If you want every scrape to update a CSV or something like this, you'll need to edit your scrape.yml to commit to the repo after the scraper is done running.

Take a look at the bottom of autoscraper-history to see how to do that. It should be one more step, something like this:

      - name: Commit and push if content changed
        run: |-
          git config user.name "Automated"
          git config user.email "[email protected]"
          git add -A
          timestamp=$(date -u)
          git commit -m "Latest data: ${timestamp}" || exit 0
          git push

I'm pretty sure I lifted that code 100% from Simmon Willison.

Scheduling

Right now the yaml is set to only scrape when you click the "Run workflow" button in the Actions tab, but you can add something like

  schedule:
    - cron: '0 * * * *'

To make it run the first minute of every hour.

How this all works

To make Selenium + GitHub Actions work, this repo does a few magic things. In a normal world, you start Chrome like this:

from selenium import webdriver

driver = webdriver.Chrome()

But we.... do things a little differently. Let me walk you through the changes!

webdriver-manager to manage the webdriver

You can add a special action to set up Chromedriver but I feel it's honestly easier to use webdriver-manager. It magically picks out the right version of chromedriver for you.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())

It works on your local machine, too!

Chromium, not Chrome

When you're running GitHub Actions, it's probably on a nice little Ubuntu Linux machine. In those situations, you install software using apt. Since you can't install Chrome with apt, you'll install Chromium instead, the open-source version of Chrome. Works the same, just opens a little differently.

As a result, our chromedriver install code changed a little more:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType

driver_path = ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()
driver = webdriver.Chrome(driver_path)

This also means we need to add a line that does apt-get install -y chromium-browser to our scrape.yml)

Headless Chromium

Since we don't have a monitor plugged into GitHub Actions, we can't actually see what's going on in the browser. In the olden days you had to construct some odd technical fake screen, but these days you just run in headless mode!

from selenium import webdriver 
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)

Other options and more management of things

To combine the webdriver-manager driver path and the headless Chrome option, there are a lot of hoops to jump through. During the research process a lot of extra chrome options poked up, so I thought hey, let's just add all those, too.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_service = Service(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install())

chrome_options = Options()
options = [
    "--headless",
    "--disable-gpu",
    "--window-size=1920,1200",
    "--ignore-certificate-errors",
    "--disable-extensions",
    "--no-sandbox",
    "--disable-dev-shm-usage"
]
for option in options:
    chrome_options.add_argument(option)

driver = webdriver.Chrome(service=chrome_service, options=chrome_options)

The biggest change here beyond all those options is the Service thing. Apparently just giving it the path to chromdriver isn't good enough? Who knows, I just do what works.

Owner
Jonathan Soma
baby data journo wrangler @ledeprogram + @littlecolumns, cat wrangler @cat-republic
Jonathan Soma
Headless chrome/chromium automation library (unofficial port of puppeteer)

Pyppeteer Pyppeteer has moved to pyppeteer/pyppeteer Unofficial Python port of puppeteer JavaScript (headless) chrome/chromium browser automation libr

miyakogi 3.5k Dec 30, 2022
Python Rest Testing

pyresttest Table of Contents What Is It? Status Installation Sample Test Examples Installation How Do I Use It? Running A Simple Test Using JSON Valid

Sam Van Oort 1.1k Dec 28, 2022
Tattoo - System for automating the Gentoo arch testing process

Naming origin Well, naming things is very hard. Thankfully we have an excellent

Arthur Zamarin 4 Nov 07, 2022
Automating the process of sorting files in my downloads folder by file type.

downloads-folder-automation Automating the process of sorting files in a user's downloads folder on Windows by file type. This script iterates through

Eric Mahasi 27 Jan 07, 2023
MultiPy lets you conveniently keep track of your python scripts for personal use or showcase by loading and grouping them into categories. It allows you to either run each script individually or together with just one click.

MultiPy About MultiPy is a graphical user interface built using Dear PyGui Python GUI Framework that lets you conveniently keep track of your python s

56 Oct 29, 2022
To automate the generation and validation tests of COSE/CBOR Codes and it's base45/2D Code representations

To automate the generation and validation tests of COSE/CBOR Codes and it's base45/2D Code representations, a lot of data has to be collected to ensure the variance of the tests. This respository was

160 Jul 25, 2022
A web scraping using Selenium Webdriver

Savee - Images Downloader Project using Selenium Webdriver to download images from someone's profile on https:www.savee.it website. Usage The project

Caio Eduardo Lobo 1 Dec 17, 2021
Load and performance benchmark tool

Yandex Tank Yandextank has been moved to Python 3. Latest stable release for Python 2 here. Yandex.Tank is an extensible open source load testing tool

Yandex 2.2k Jan 03, 2023
API Rest testing FastAPI + SQLAchmey + Docker

Transactions API Rest Implement and design a simple REST API Description We need to a simple API that allow us to register users' transactions and hav

TxeMac 2 Jun 30, 2022
Generate random test credit card numbers for testing, validation and/or verification purposes.

Generate random test credit card numbers for testing, validation and/or verification purposes.

Dark Hunter 141 5 Nov 14, 2022
Pytest-typechecker - Pytest plugin to test how type checkers respond to code

pytest-typechecker this is a plugin for pytest that allows you to create tests t

vivax 2 Aug 20, 2022
UUM Merit Form Filler is a web automation which helps automate entering a matric number to the UUM system in order for participants to obtain a merit

About UUM Merit Form Filler UUM Merit Form Filler is a web automation which helps automate entering a matric number to the UUM system in order for par

Ilham Rachmat 3 May 31, 2022
Test for generating stylized circuit traces from images

I test of an image processing idea to take an image and make neat circuit board art automatically. Inspired by this twitter post by @JackRhysider

Miller Hooks 3 Dec 12, 2022
A complete test automation tool

Golem - Test Automation Golem is a test framework and a complete tool for browser automation. Tests can be written with code in Python, codeless using

486 Dec 30, 2022
The definitive testing tool for Python. Born under the banner of Behavior Driven Development (BDD).

mamba: the definitive test runner for Python mamba is the definitive test runner for Python. Born under the banner of behavior-driven development. Ins

Néstor Salceda 502 Dec 30, 2022
a wrapper around pytest for executing tests to look for test flakiness and runtime regression

bubblewrap a wrapper around pytest for assessing flakiness and runtime regressions a cs implementations practice project How to Run: First, install de

Anna Nagy 1 Aug 05, 2021
A rewrite of Python's builtin doctest module (with pytest plugin integration) but without all the weirdness

The xdoctest package is a re-write of Python's builtin doctest module. It replaces the old regex-based parser with a new abstract-syntax-tree based pa

Jon Crall 174 Dec 16, 2022
Nokia SR OS automation

Nokia SR OS automation Nokia is one of the biggest vendors of the telecommunication equipment, which is very popular in the Service Provider segment.

Karneliuk.com 7 Jul 23, 2022
Selenium Manager

SeleniumManager I'm fed up with always having to struggle unnecessarily when I have to use Selenium on a new machine, so I made this little python mod

Victor Vague 1 Dec 24, 2021
A Proof of concept of a modern python CLI with click, pydantic, rich and anyio

httpcli This project is a proof of concept of a modern python networking cli which can be simple and easy to maintain using some of the best packages

Kevin Tewouda 17 Nov 15, 2022