First Party data integration solution built for marketing teams to enable audience and conversion onboarding into Google Marketing products (Google Ads, Campaign Manager, Google Analytics).

Overview

Megalista

Sample integration code for onboarding offline/CRM data from BigQuery as custom audiences or offline conversions in Google Ads, Google Analytics 360, Google Display & Video 360 and Google Campaign Manager.

Disclaimer: This is not an officially supported Google product.

Supported integrations

  • Google Ads

    • Contact Info Customer Match (email, phone, address) [details]
    • Id Based Customer Match (device Id, user id)
    • Offline Conversions through gclid [details]
    • Store Sales Direct (SSD) conversions [details]
  • Google Analytics (Universal analytics)

  • Campaign Manager

    • Offline Conversions API (user id, device id, match id, gclid, dclid) [details]
  • Google Analytics 4

  • Appsflyer

    • S2S Offline events API (conversion upload), to be used for audience creation and in-app events with Google Ads and DV360 [details]

How does it work

Megalista was design to separate the configuration of conversion/audience upload rules from the engine, giving more freedom for non-technical teams (i.e. Media and Business Inteligence) to setup multiple upload rules on their own.

The solution consists in #1 a Google Spreadsheet (template) in which all rules are defined by mapping a data source (BigQuery Table) to a destination (data upload endpoint) and #2, an apache beam workflow running on Google Dataflow, scheduled to upload the data in batch mode.

Prerequisites

Google Cloud Services

  • Google Cloud Platform account
    • Billing enabled
    • BigQuery enabled
    • Dataflow enabled
    • Cloud storage enabled
    • Cloud scheduler enabled
  • At least one of:
    • Google Ads API Access
    • Campaign Manager API Access
    • Google Analytics API Access
  • Python3
  • Google Cloud SDK

Access Requirements

Those are the minimum roles necessary to deploy Megalista:

  • OAuth Config Editor
  • BigQuery User
  • BigQuery Job User
  • BigQuery Data Viewer
  • Cloud Scheduler Admin
  • Storage Admin
  • Dataflow Admin
  • Service Account Admin
  • Logs Viewer
  • Service Consumer

APIs

Required APIs will depend on upload endpoints in use. We recomend you to enable all of them:

  • Google Sheets (required for any use case) [link]
  • Google Analytics [link]
  • Google Analytics Reporting [link]
  • Google Ads [link]
  • Campaign Manager [link]

Installation

Create a copy of the configuration Spreadsheet

WIP

Creating required access tokens

To access campaigns and user lists on Google's platforms, this dataflow will need OAuth tokens for a account that can authenticate in those systems.

In order to create it, follow these steps:

  • Access GCP console
  • Go to the API & Services section on the top-left menu.
  • On the OAuth Consent Screen and configure an Application name
  • Then, go to the Credentials and create an OAuth client Id with Application type set as Desktop App
  • This will generate a Client Id and a Client secret
  • Run the generate_megalist_token.sh script in this folder providing these two values and follow the instructions
    • Sample: ./generate_megalist_token.sh client_id client_secret
  • This will generate the Access Token and the Refresh token

Creating a bucket on Cloud Storage

This bucket will hold the deployed code for this solution. To create it, navigate to the Storage link on the top-left menu on GCP and click on Create bucket. You can use Regional location and Standard data type for this bucket.

Running Megalista

We recommend first running it locally and make sure that everything works. Make some sample tables on BigQuery for one of the uploaders and make sure that the data is getting correctly to the destination. After that is done, upload the Dataflow template to GCP and try running it manually via the UI to make sure it works. Lastly, configure the Cloud Scheduler to run Megalista in the frequency desired and you'll have a fully functional data integration pipeline.

Running locally

python3 megalist_dataflow/main.py \
  --runner DirectRunner \
  --developer_token ${GOOGLE_ADS_DEVELOPER_TOKEN} \
  --setup_sheet_id ${CONFIGURATION_SHEET_ID} \
  --refresh_token ${REFRESH_TOKEN} \
  --access_token ${ACCESS_TOKEN} \
  --client_id ${CLIENT_ID} \
  --client_secret ${CLIENT_SECRET} \
  --project ${GCP_PROJECT_ID} \
  --region us-central1 \
  --temp_location gs://{$GCS_BUCKET}/tmp

Deploying Pipeline

To deploy, use the following command: ./deploy_cloud.sh project_id bucket_name region_name

Manually executing pipeline using Dataflow UI

To execute the pipeline, use the following steps:

  • Go to Dataflow on GCP console
  • Click on Create job from template
  • On the template selection dropdown, select Custom template
  • Find the megalist file on the bucket you've created, on the templates folder
  • Fill in the parameters required and execute

Scheduling pipeline

To schedule daily/hourly runs, go to Cloud Scheduler:

Creating a Service Account

It's recommended to create a new Service Account to be used with the Cloud Scheduler

  • Go to IAM & Admin > Service Accounts
  • Create a new Service Account with the following roles:
    • Cloud Dataflow Service Agent
    • Dataflow Admin
    • Storage Objects Viewer

Usage

Every upload method expects as source a BigQuery data with specific fields, in addition to specific configuration metadata. For details on how to setup your upload routines, refer to the Megalista Wiki or the Megalista user guide.

Comments
  • Add Firestore source

    Add Firestore source

    Hello. I've implemented a Firestore source, which is meant to work as an alternative for Sheets for parametrization purposes.

    • Why Firestore? Some of our clients are unable to access the Spreadsheets domain for security purposes, and Firestore proved to be a great option. It provides reliable and dynamic storage, and is quite simple to use. Also, the expected usage level for Megalist should fall into the free tier.

    Additionally, Firestore has great integration with App Engine. In a future PR, I’d like to add a highly customizable App Engine form integrated with Firestore, which provides an easy to use alternative to Sheets, especially for non-technical users unable to access it.

    • Requirements: For now, Firestore usage requires a GCP project with native Firestore mode.

    • Usage: The default fields for any upload type are: active (yes/no), bq_dataset, bq_table, source and type. Valid upload types and their required fields can be seen in the firestore_execution_source file.

    As with Sheets, account IDs are included separately. In this case, in a Firestore document called account_config, within the same collection. In other words, the hierarchy is: Firestore collection -> document entries for each schedule + account_config document.

    In order to check Firestore, Megalist requires the setup_firestore_collection command line parameter. If setup_sheet_id is provided, Sheets will be used instead.

    • Improvement opportunities:
    1. For now, the Firestore source expects BigQuery parameters, as it is the only ingest option currently available. This should be made flexible in the future, to allow options such as GCS.

    2. The list of parameter metadata was included in firestore_execution_source, and could be modularized in the future.

    • Testing I have only been able to test uploads to Google Ads and Google Analytics so far, as we generally lack access/test data to other platforms. Help with further testing would be greatly appreciated.
    opened by nivaldoh 10
  • Add partial failure support for Google Ads

    Add partial failure support for Google Ads

    Hi, we've added support for partial failures in Google Ads conversion uploads. By default, if a single row contains errors, the entire batch is blocked. This change aims to allow any valid rows to be uploaded, regardless of errors in other rows in the same batch. For more information: https://developers.google.com/adwords/api/docs/guides/partial-failure

    Topics for future consideration:

    1. We could make this optional, if needed.
    2. Megalist currently shows only the amount of rows that reached the uploader. We intend to contribute again soon with changes that display the number of rows that were, in fact, accepted by the API, as well as logs that register invalid rows individually and the reason for their rejections.
    opened by nivaldoh 10
  • Not being to upload Custom Variables as a part of CM offline conversion data

    Not being to upload Custom Variables as a part of CM offline conversion data

    Hi There,

    We are running Megalista implementation for a client for quite a long time. It's been working wonderfully so far. However, in one of the scenarios, where we use the CM offline conversion upload functionality, it uploads all the data except Custom Variables. Below are the steps that we have performed and confirmed at our end.

    1. All these custom variables are available and enabled on the floodlight level.
    2. We also used regular API calls to push these same variables into the CM platform and that worked fine. So the problem is not with the data or setup on the CM part.

    We value your time and effort. However, it would be of huge help if it would be possible to look at this from Megalista's end. Any help or guidance would be highly appreciated, as we are not sure how to proceed beyond this point.

    Additionally, if any input is needed from our end, we would be happy to contribute.

    Thanks & Regards, Sandhya

    opened by Sandhya-1988 7
  • Fixed error when saving the uploaded data to a BigQuery table

    Fixed error when saving the uploaded data to a BigQuery table

    The process fails to create a BigQuery table with the uploaded data:

    table_name = self._bq_ops_dataset.get() + '.' + execution.source.source_metadata[1] + "_uploaded"
    TypeError: unsupported operand type(s) for +: 'NoneType' and 'str' 
    

    The approach as to change it to execution.source.source_metadata[0] instead of self._bq_ops_dataset.get(), which is returning None instead of the Dataset name.

    opened by gabrielmpaula 7
  • Add dclid implementation to the CM connector and fixed a bug in the customVariables column

    Add dclid implementation to the CM connector and fixed a bug in the customVariables column

    Add dclid implementation to the CM connector and fixed a bug where the customVariables (type/value) info was not retrieved from the table since it was not matching the regex for the filtered columns. Migrate CM API version to 4 since it will be deprecated in Feb 2023. Updated google-api-python-client to the latest version to support the CM API version 4. All tests passed. Tested CM connector and Customer Match connector.

    opened by anaesqueda 3
  • Trouble uploading audience to GoogleAds

    Trouble uploading audience to GoogleAds

    I'm trying to send audience lists to Google Ads, but having the following error in all forms of Customer Match:

    ERROR:megalista.GoogleAdsCustomerMatchAbstractUploader:'int' object has no attribute 'name' Traceback (most recent call last): File "/home/bianca_santos/google-marketing-data-sync/megalista-v2/megalista/megalista_dataflow/uploaders/utils.py", line 72, in inner return func(*args, **kwargs) File "/home/bianca_santos/google-marketing-data-sync/megalista-v2/megalista/megalista_dataflow/uploaders/google_ads/customer_match/abstract_uploader.py", line 209, in process execution.destination.destination_metadata)) File "/home/bianca_santos/google-marketing-data-sync/megalista-v2/megalista/megalista_dataflow/uploaders/google_ads/customer_match/abstract_uploader.py", line 60, in _create_list_if_it_does_not_exist customer_id, list_name, list_definition) File "/home/bianca_santos/google-marketing-data-sync/megalista-v2/megalista/megalista_dataflow/uploaders/google_ads/customer_match/abstract_uploader.py", line 70, in _do_create_list_if_it_does_not_exist resource_name = self._get_user_list_resource_name(customer_id, list_name) File "/home/bianca_santos/google-marketing-data-sync/megalista-v2/megalista/megalista_dataflow/uploaders/google_ads/customer_match/abstract_uploader.py", line 109, in _get_user_list_resource_name query_aux = f"AND user_list.access_reason={ads_client.enums.AccessReasonEnum.OWNED.name}" AttributeError: 'int' object has no attribute 'name' ERROR:megalista.GoogleAdsCustomerMatchAbstractUploader:Error uploading data. Traceback (most recent call last): File "/home/bianca_santos/google-marketing-data-sync/megalista-v2/megalista/megalista_dataflow/uploaders/utils.py", line 72, in inner return func(*args, **kwargs) File "/home/bianca_santos/google-marketing-data-sync/megalista-v2/megalista/megalista_dataflow/uploaders/google_ads/customer_match/abstract_uploader.py", line 209, in process execution.destination.destination_metadata)) File "/home/bianca_santos/google-marketing-data-sync/megalista-v2/megalista/megalista_dataflow/uploaders/google_ads/customer_match/abstract_uploader.py", line 60, in _create_list_if_it_does_not_exist customer_id, list_name, list_definition) File "/home/bianca_santos/google-marketing-data-sync/megalista-v2/megalista/megalista_dataflow/uploaders/google_ads/customer_match/abstract_uploader.py", line 70, in _do_create_list_if_it_does_not_exist resource_name = self._get_user_list_resource_name(customer_id, list_name) File "/home/bianca_santos/google-marketing-data-sync/megalista-v2/megalista/megalista_dataflow/uploaders/google_ads/customer_match/abstract_uploader.py", line 109, in _get_user_list_resource_name query_aux = f"AND user_list.access_reason={ads_client.enums.AccessReasonEnum.OWNED.name}" AttributeError: 'int' object has no attribute 'name' INFO:megalista.GoogleAdsOfflineUploader:Uploading 1000 rows... INFO:megalista.GoogleAdsOfflineConversionsUploader:Uploading 1000 offline conversions on customers/5852184472/conversionActions/792598949 to Google Ads. ERROR:megalista.GoogleAdsOfflineConversionsUploader:Error on uploading offline conversions: Multiple errors in ‘details’. First error: The click or call is owned by a customer account that the uploading customer does not manage., at conversions[0].gclid. INFO:megalista:Completed successfully!

    The account is a MCC Google Ads Account.

    opened by BiancaK3 2
  • [Docs] Possible outdated documentation

    [Docs] Possible outdated documentation

    We have identified 1 possible instance of outdated documentation:

    About

    This is part of a research project that aims to automatically detect outdated documentation in GitHub repositories. We are evaluating the validity of our approach by identifying instances of outdated documentation in real-world projects.

    We hope that this research will be a step towards keeping documentation up-to-date. If this has been helpful, consider updating the documentation to keep it in sync with the source code. If this has not been helpful, consider updating this issue with an explanation, so that we can improve our approach. Thanks!

    opened by wesleytanws 2
  • Customer Match Upload with login_customer_id

    Customer Match Upload with login_customer_id

    Change to take the AccountConfig Customer Id to be used as the login_customer_id for the gAds API Requests. It takes each Audience's Metadata 5 (Account) if exists for the customer_id value in requests. If this Metadata does not exist, it takes also the AccountConfig Customer Id for the customer_id

    This allows to upload audiences to non-MCC accounts where the manager account needs to be passed in the login_customer_id in the new gAds API

    opened by alvarolamas10 2
  • Recode change and reformatted

    Recode change and reformatted

    recode change and reformatted code contain:

    • [x] reformatted code to easily readable code maintenance
    • [x] change %s to fstring for more readable and less error
    • [x] passed local test flake8
    opened by slowy07 2
  • Documentation mismatch in Google Ads Customer Match Device ID schema

    Documentation mismatch in Google Ads Customer Match Device ID schema

    According to the documentation, the expected schema for Google Ads Customer Match Device ID table is | Column name | Type | Description | Requirement | | :---: | :---: | :---: | :---: | | mobile_device_id | STRING | Mobile device Id identifier (android AdId or IOS IDFA) | required |

    But in the source code, the field is mobileId.

    def get_row_keys(self) -> List[str]:
        return ['mobileId']
    
    opened by xfer 2
  • Question: project name is megalist or megalista?

    Question: project name is megalist or megalista?

    I have a doubt, @astivi and @caiotomazelli

    The name of the repository and documentation is Megalista But the folder structure uses the name megalist and some parameters as well.

    We understand that the name of the solution is Megalista, and all coding must use the megalist nomenclature. Is correct?

    opened by joaquimsn 2
  • Allow auth via manager access

    Allow auth via manager access

    I noticed that the README states:

    Calls to the Google Ads API will fail if the user that generated the OAuth2 credentials (Access Token and Refresh Token) doesn't have direct access to the Google Ads account to which the calls are being directed. It's not enough for the user to have access to a MCC above this account and being able to access the account through the interface, it's required that the user has permissions on the account itself.

    However, the Google Ads API client library for Java supports auth via manager access by specifying login-customer-id as described in https://developers.google.com/google-ads/api/docs/client-libs/java/config-file and https://developers.google.com/google-ads/api/docs/concepts/call-structure#cid.

    opened by jradcliff 0
  • BigQuery column names schema should be

    BigQuery column names schema should be "dimension" not "cd" for Data Import Destination.

    Hi Guys, I'm using megalista to upload some audiences from bigquery to google analytics data import.

    Right now you're checking the column names in the data source in bigquery with the schema 'cd\d+', but it doesn't work when we upload the data into google analytics, since the data import only accept 'dimension\d+' schema.

    So, my recommendation is to change in the script megalista_dataflow/data_sources/data_source.py, the line:

    'GA_DATA_IMPORT': {
        'columns': [
            {'name': 'cd\\d+', 'required': True, 'data_type': 'string'},
            {'name': 'cd\\d+', 'required': True, 'data_type': 'string'},
            {'name': 'cd\\d+', 'required': False, 'data_type': 'string'},
        ],
        'groups': []
    },
    

    for:

    'GA_DATA_IMPORT': {
        'columns': [
            {'name': 'dimension\\d+', 'required': True, 'data_type': 'string'},
            {'name': 'dimension\\d+', 'required': True, 'data_type': 'string'},
            {'name': 'dimension\\d+', 'required': False, 'data_type': 'string'},
        ],
        'groups': []
    },
    

    Best,

    Gibran

    opened by gibrano 0
  • [GA4 MP] timestamp_micros should be sent inside the events array

    [GA4 MP] timestamp_micros should be sent inside the events array

    CHANGELOG:

    • Alter timestamp_micros location within the measurement protocol payload for GA4 to inside the events array

    OBSERVATIONS

    Looking at the Measurement Protocol Reference, it seems like timestamp_micros should be sent at the top level of the payload. However, empirically, it does not work. I performed experiments both by using Megalista's workflow and by performing individual post requests to the collection endpoint

    Intuitively, it makes sense that it works this way as a same request could send multiple events for a given user with distinct timestamps

    opened by fsalhani 2
Releases(v4.4)
  • v4.4(Apr 12, 2022)

    Add error notification by email capabilities.

    Every error occurred inside uploaders can now be aggregated and sent by email.
    For setup instructions, refer to the "Errors notifications by email" section of the readme file.

    Source code(tar.gz)
    Source code(zip)
  • v4.3(Feb 23, 2022)

    Update the Google Ads API from v8 to v10.

    This update introduced resend controls for Google Ads Offline Conversions given. This change was driven by a new error being thrown starting by the API v9. More information on possible changes necessaries do sources database tables on the Update Instructions page.

    Source code(tar.gz)
    Source code(zip)
  • v4.2(Sep 15, 2021)

    Changes execution of Google Ads API Customer Match uploader to create a single job and append operations to that single job. Also splits user identifiers by operation.

    Source code(tar.gz)
    Source code(zip)
  • v4.1(Jun 22, 2021)

  • v4.0(May 26, 2021)

Owner
Google
Google ❤️ Open Source
Google
Buy early bsc gems with custom gas fee, slippage, amount. Auto approve token after buy. Sell buyed token with custom gas fee, slippage, amount. And more.

Pancakeswap Sniper bot Full version of Pancakeswap sniping bot used to snipe during fair coin launches. With advanced options and a graphical user int

Jesus Crypto 204 Apr 27, 2022
=>|<= the MsgRoom bot for Windows 96

=|= bot A MsgRoom bot in Python 3 for Windows96.net. The bot joins as =|=, if parameter name_lasts is not true and default_name is =|=. The full

Larry Holst 2 Jun 07, 2022
BaiduPCS API & App 百度网盘客户端

BaiduPCS-Py A BaiduPCS API and An App BaiduPCS-Py 是百度网盘 pcs 的非官方 api 和一个命令行运用程序。

Peter Ding 450 Jan 05, 2023
Python wrapper for the Sportradar APIs ⚽️🏈

Sportradar APIs This is a Python wrapper for the sports APIs provided by Sportradar. You'll need to sign up for an API key to use the service. Sportra

John W. Miller 39 Jan 01, 2023
A small python script which runs a speedtest using speedtest.net and inserts it into a Google Docs Spreadsheet.

speedtest-google-sheets This is a small python script which runs a speedtest using speedtest.net and inserts it into a Google Docs Spreadsheet. Setup

marie 2 Feb 10, 2022
Trading bot - A Trading bot With Python

Trading_bot Trading bot intended for 1) Tracking current prices of tokens 2) Set

Tymur Kotkov 29 Dec 01, 2022
This asynchronous telegram bot sells books.

Selling_Books_Bot Description Say, you have a bunch of items you need no more and you want to sell it all out. That's where you're going to have to us

Roman 1 Oct 24, 2021
AWS Glue PySpark - Apache Hudi Quick Start Guide

AWS Glue PySpark - Apache Hudi Quick Start Guide Disclaimer: This is a quick start guide for the Apache Hudi Python Spark connector, running on AWS Gl

Gabriel Amazonas Mesquita 8 Nov 14, 2022
Gorrabot is a bot made to automate checks and processes in the development process.

Gorrabot is a Gitlab bot made to automate checks and processes in the Faraday development. Features Check that the CHANGELOG is modified By default, m

Faraday 7 Dec 14, 2022
TG-Url-Uploader-Bot - Telegram RoBot to Upload Links

MW-URL-Uploader Bot Telegram RoBot to Upload Links. Features: 👉 Only Auth Users

Aadhi 3 Jun 27, 2022
Advance Anonymous Sender bot with Caption Editor

AnonyMous Sender 👨‍💻 Advanced Anonymous Sender with Caption Editor Join @DaisySupport_Official 🎵 for help Features Get forwarded messages without f

Inuka Asith 13 Oct 09, 2022
A Telelgram Bot to Extract Text from an Image

Text-Scanner-OCR A Telelgram Bot to Extract Text from an Image Configs Vars API_KEY: Your API_KEY from OCR Space GROUP: Your Group Username without '@

ALBY 8 Feb 20, 2022
Save data from Instagram takeout to a SQLite database

instagram-to-sqlite Save data from a Instagram takeout to a SQLite database. Mise En Place git clone https://github.com/gavindsouza/instagram-to-sqlit

gavin 8 Dec 13, 2022
This bot can mention members upto 10,000 in groups and can mention members upto 200 in channels !

Mention All Bot This bot can mention members upto 10,000 in groups and can mention members upto 200 in channels ! 🏷 Infomation Language: Python. Tele

Anjana Madu 52 Dec 29, 2022
Follow pixiv account from twitter profile

Follow pixiv account from twitter profile

Genshi 7 Apr 11, 2022
A stock information collector and parser for Taiwan and US market. Automatically send LINE message if the pre-defined rules are triggered.

agastock 開發動機 就在海運飆漲的2021年7月,差點跪在地上喜迎財富自由的當下,EPS超高好消息不斷的長榮竟然套在202元一去不回,有圖有真相(哭) 忽然體會到追高殺低不是辦法,魯蛇我得靠邏輯分析也能出頭天,經過三個月無數個不出門的周末,產出簡單的爬蟲和分析工具。 上過金融研訓院的量化交易

Gavin Lee 12 Nov 16, 2022
An Open Source ALL-In-One Telegram RoBot, that can do lot of things.

URL Uploader Bot An Open Source ALL-In-One Telegram RoBot, that can do lot of things. My Features Installation The Easy Way You can also tap the Deplo

NT BOTS 1 Oct 23, 2021
RP2 is a privacy-focused, free, open-source US cryptocurrency tax calculator

Privacy-focused, free, open-source cryptocurrency US tax calculator, up to date for 2021: it handles multiple coins/exchanges and computes long/short-term capital gains, cost bases, in/out lot relati

eprbell 123 Jan 04, 2023
Projeto do segundo módulo da Resilia

@ Projeto Resilia : Módulo 2 Vamos jogar Forca ! O jogo da forca é um jogo em que o jogador tem que acertar qual é a palavra proposta, tendo como dica

Mateus Sartorio 2 Feb 24, 2022
A Discord/Xenforo bot!

telathbot A Discord/Xenforo bot! Pre-requisites pyenv (via installer) poetry Docker (with Go version of docker compose enabled) Local development Crea

Telath 4 Mar 09, 2022