Make Selenium work on Github Actions

Overview

Make Selenium work on Github Actions

Scraping with BeautifulSoup on GitHub Actions is easy-peasy. But what about Selenium?? After you jump through some hoops it's just as simple.

How to use this

I think you can just fork this repository and get to work on scraper.py.

Otherwise, just steal scraper.py and .github/workflows/scrape.yml and you'll be good to go.

Saving CSV files

If you want every scrape to update a CSV or something like this, you'll need to edit your scrape.yml to commit to the repo after the scraper is done running.

Take a look at the bottom of autoscraper-history to see how to do that. It should be one more step, something like this:

      - name: Commit and push if content changed
        run: |-
          git config user.name "Automated"
          git config user.email "[email protected]"
          git add -A
          timestamp=$(date -u)
          git commit -m "Latest data: ${timestamp}" || exit 0
          git push

I'm pretty sure I lifted that code 100% from Simmon Willison.

Scheduling

Right now the yaml is set to only scrape when you click the "Run workflow" button in the Actions tab, but you can add something like

  schedule:
    - cron: '0 * * * *'

To make it run the first minute of every hour.

How this all works

To make Selenium + GitHub Actions work, this repo does a few magic things. In a normal world, you start Chrome like this:

from selenium import webdriver

driver = webdriver.Chrome()

But we.... do things a little differently. Let me walk you through the changes!

webdriver-manager to manage the webdriver

You can add a special action to set up Chromedriver but I feel it's honestly easier to use webdriver-manager. It magically picks out the right version of chromedriver for you.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())

It works on your local machine, too!

Chromium, not Chrome

When you're running GitHub Actions, it's probably on a nice little Ubuntu Linux machine. In those situations, you install software using apt. Since you can't install Chrome with apt, you'll install Chromium instead, the open-source version of Chrome. Works the same, just opens a little differently.

As a result, our chromedriver install code changed a little more:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType

driver_path = ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()
driver = webdriver.Chrome(driver_path)

This also means we need to add a line that does apt-get install -y chromium-browser to our scrape.yml)

Headless Chromium

Since we don't have a monitor plugged into GitHub Actions, we can't actually see what's going on in the browser. In the olden days you had to construct some odd technical fake screen, but these days you just run in headless mode!

from selenium import webdriver 
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)

Other options and more management of things

To combine the webdriver-manager driver path and the headless Chrome option, there are a lot of hoops to jump through. During the research process a lot of extra chrome options poked up, so I thought hey, let's just add all those, too.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_service = Service(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install())

chrome_options = Options()
options = [
    "--headless",
    "--disable-gpu",
    "--window-size=1920,1200",
    "--ignore-certificate-errors",
    "--disable-extensions",
    "--no-sandbox",
    "--disable-dev-shm-usage"
]
for option in options:
    chrome_options.add_argument(option)

driver = webdriver.Chrome(service=chrome_service, options=chrome_options)

The biggest change here beyond all those options is the Service thing. Apparently just giving it the path to chromdriver isn't good enough? Who knows, I just do what works.

Owner
Jonathan Soma
baby data journo wrangler @ledeprogram + @littlecolumns, cat wrangler @cat-republic
Jonathan Soma
Make Selenium work on Github Actions

Make Selenium work on Github Actions Scraping with BeautifulSoup on GitHub Actions is easy-peasy. But what about Selenium?? After you jump through som

Jonathan Soma 33 Dec 27, 2022
Doing dirty (but extremely useful) things with equals.

Doing dirty (but extremely useful) things with equals. Documentation: dirty-equals.helpmanual.io Source Code: github.com/samuelcolvin/dirty-equals dir

Samuel Colvin 602 Jan 05, 2023
A python bot using the Selenium library to auto-buy specified sneakers on the nike.com website.

Sneaker-Bot-UK A python bot using the Selenium library to auto-buy specified sneakers on the nike.com website. This bot is still in development and is

Daniel Hinds 4 Dec 14, 2022
Scraping Bot for the Covid19 vaccination website of the Canton of Zurich, Switzerland.

Hi šŸ‘‹ , I'm David A passionate developer from France. 🌱 I’m currently learning Kotlin, ReactJS and Kubernetes šŸ‘Øā€šŸ’» All of my projects are available

1 Nov 14, 2021
pytest plugin for testing mypy types, stubs, and plugins

pytest plugin for testing mypy types, stubs, and plugins Installation This package is available on PyPI pip install pytest-mypy-plugins and conda-forg

TypedDjango 74 Dec 31, 2022
A single module to link Python ecosystem to the Web

A single module to link Python ecosystem to the Web. Have a quick look at the Gallery first to get convinced ! FAQ For any questions, please use Stack

66 Dec 21, 2022
Sixpack is a language-agnostic a/b-testing framework

Sixpack Sixpack is a framework to enable A/B testing across multiple programming languages. It does this by exposing a simple API for client libraries

1.7k Dec 24, 2022
Django test runner using nose

django-nose django-nose provides all the goodness of nose in your Django tests, like: Testing just your apps by default, not all the standard ones tha

Jazzband 880 Dec 15, 2022
An interactive TLS-capable intercepting HTTP proxy for penetration testers and software developers.

mitmproxy mitmproxy is an interactive, SSL/TLS-capable intercepting proxy with a console interface for HTTP/1, HTTP/2, and WebSockets. mitmdump is the

mitmproxy 29.7k Jan 02, 2023
Connexion-faker - Auto-generate mocks from your Connexion API using OpenAPI

Connexion Faker Get Started Install With poetry: poetry add connexion-faker # a

Erle Carrara 6 Dec 19, 2022
Implement unittest, removing all global variable and returning values

Implement unittest, removing all global variable and returning values

Placide 1 Nov 01, 2021
pytest splinter and selenium integration for anyone interested in browser interaction in tests

Splinter plugin for the pytest runner Install pytest-splinter pip install pytest-splinter Features The plugin provides a set of fixtures to use splin

pytest-dev 238 Nov 14, 2022
A configurable set of panels that display various debug information about the current request/response.

Django Debug Toolbar The Django Debug Toolbar is a configurable set of panels that display various debug information about the current request/respons

Jazzband 7.3k Jan 02, 2023
Coverage plugin for pytest.

Overview docs tests package This plugin produces coverage reports. Compared to just using coverage run this plugin does some extras: Subprocess suppor

pytest-dev 1.4k Dec 29, 2022
The best, free, all in one, multichecking, pentesting utility

The best, free, all in one, multichecking, pentesting utility

Mickey 58 Jan 03, 2023
Fully functioning price detector built with selenium and python

Fully functioning price detector built with selenium and python

mark sikaundi 4 Mar 30, 2022
FFPuppet is a Python module that automates browser process related tasks to aid in fuzzing

FFPuppet FFPuppet is a Python module that automates browser process related tasks to aid in fuzzing. Happy bug hunting! Are you fuzzing the browser? G

Mozilla Fuzzing Security 24 Oct 25, 2022
pytest plugin for distributed testing and loop-on-failures testing modes.

xdist: pytest distributed testing plugin The pytest-xdist plugin extends pytest with some unique test execution modes: test run parallelization: if yo

pytest-dev 1.1k Dec 30, 2022
A Simple Unit Test Matcher Library for Python 3

pychoir - Python Test Matchers for humans Super duper low cognitive overhead matching for Python developers reading or writing tests. Implemented in p

Antti Kajander 15 Sep 14, 2022
Python tools for penetration testing

pyTools_PT python tools for penetration testing Please don't use these tool for illegal purposes. These tools is meant for penetration testing for leg

Gourab 1 Dec 01, 2021