guardian/google-ad-database-processing-scripts

Making Google's political ad transparency library suck less

This is a series of scripts that take Google's political ad transparency data and make the ad content searchable. Ironically, the world's most powerful search company does not make its own ad data searchable.

It can also take the ad targeting information and map it to electorates, but this only works for postcodes at the moment, so it isn't in the main group of scripts yet.
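As a rough illustration of the postcode-to-electorate idea, here is a minimal sketch. It is not the repository's code: the lookup table, its placeholder electorate names, and the function name `electorates_for_targeting` are all assumptions made for the example. It also shows why non-postcode targets are a problem: anything not in the table is simply left unmapped.

```python
# Hypothetical lookup table: postcode -> electorates it overlaps.
# The entries are placeholders, not real boundary data.
POSTCODE_TO_ELECTORATES = {
    "0001": ["Electorate A"],
    "0002": ["Electorate A", "Electorate B"],
}


def electorates_for_targeting(targets, lookup=POSTCODE_TO_ELECTORATES):
    """Split targeting entries into mapped electorates and unmapped entries."""
    electorates = set()
    unmapped = []
    for target in targets:
        key = target.strip()
        if key in lookup:
            # A postcode can straddle more than one electorate.
            electorates.update(lookup[key])
        else:
            # Suburbs, cities, states and so on are not handled here.
            unmapped.append(key)
    return sorted(electorates), unmapped
```

Calling `electorates_for_targeting(["0002", "Sydney"])` with the placeholder table returns the two placeholder electorates plus `["Sydney"]` as unmapped.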

It is aimed at Australian content, but most of the scripts could be applied to all ad content if you'd like to use them elsewhere.

Get the data

The current output is a work in progress, but you can find the latest file here as gzipped CSV or JSON.
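If you just want to search the gzipped CSV, something like the following sketch may be enough. It uses only the Python standard library. The column name `ad_text` is an assumption and should be replaced with whatever the header row of the downloaded file actually contains.

```python
import csv
import gzip


def search_ads(path, term, column="ad_text"):
    """Yield rows whose `column` contains `term`, ignoring case.

    `column` defaults to a hypothetical name; check the file's header row.
    """
    needle = term.lower()
    # "rt" opens the gzip stream in text mode so csv can read it directly.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            # Missing or empty cells are treated as empty strings.
            if needle in (row.get(column) or "").lower():
                yield row
```

Because the function is a generator, it streams the file row by row rather than loading the whole thing into memory, for example `for row in search_ads("ads.csv.gz", "climate"): print(row)` (the file name here is also hypothetical).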

What it does:

  • Gets the text content from text ads
  • Gets the YouTube title for YouTube ads
  • Gets the YouTube transcript for YouTube ads if it is available
  • Gets the image URL for image ads
  • Runs images through OCR and adds text to database
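Fetching a YouTube title or transcript generally starts from the video ID. The sketch below shows one way to pull that ID out of a URL using only the standard library. It assumes the ad record carries a YouTube URL in one of the common forms (`watch?v=`, `youtu.be/`, `/embed/`); the source does not say how the scripts locate the video, and the function name `youtube_video_id` is made up for the example.

```python
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url):
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Standard links: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        # Embedded players: https://www.youtube.com/embed/<id>
        for prefix in ("/embed/", "/v/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None
    return None
```

Returning `None` for unrecognised URLs lets a caller skip non-YouTube video ads, which the list above does not cover.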

Still to do:

  • Figure out if there's a good way to get text from animated html ads
  • Run the non-YouTube video ads through speech-to-text and put the text in the database
  • Add electorates for Australian ads (this is mostly done but I need to run it over the whole thing)

About

Makes Google's political ad database actually useful
