This repository is home to the Optimus data transformation plugins for various data processing needs.

Overview

Transformers

test workflow build workflow

Optimus's transformation plugins are implementations of Task and Hook interfaces that allows execution of arbitrary jobs in optimus.

To install plugins via homebrew

brew tap odpf/taps
brew install optimus-plugins-odpf

To install plugins via shell

curl -sL ${PLUGIN_RELEASE_URL} | tar xvz
chmod +x optimus-*
mv optimus-* /usr/bin/
Comments
  • Fix: fix ignoreupstream helper for big query view

    Fix: fix ignoreupstream helper for big query view

    Hello, Currently, for any query, we try to find the dependancy and ignoredependancy with FindDependenciesWithRegex and then we again pull the Refereced table with big query dry run.

    If query contains view which is marked with /* @ignoreupstream */ helper, then ignoredependancy will contain the view name but not the table referenced by view.

    The change here is to revise ignoredependancy list with table referenced by view.

    I kept the loop execution in sequential manner, please let me know if should add concurrency here

    enhancement 
    opened by SumitAgrawal03071989 2
  • @ignoreupstream ineffective on big query view

    @ignoreupstream ineffective on big query view

    We have a query referencing to table as well as view. select * from proj.dataset.table t1 left join proj.dataset.view v1 on t1.date = v1.date and t1.id = v1.id

    • now if we apply @ignoreupstream helper on table proj.dataset.table then it correctly ignores to create upstream dependancy for this table.
    • But if we apply @ignoreupstream helper on view proj.dataset.view ( note the view query refers to 2 more tables ) then it does not ignore view or table referenced by view.
    opened by SumitAgrawal03071989 2
  • feat : migrate plugins for the inti-container changes in optimus

    feat : migrate plugins for the inti-container changes in optimus

    As per Optimus PR, the executor boot process is standardised and maintained at optimus. Plugin devs need no longer have to wrap the executor image. closes odpf/optimus#405

    opened by smarch-int 1
  • monthly job didn't run for the last day of month

    monthly job didn't run for the last day of month

    Hi team,

    I have a bq2bq job with window configuration

      window:
        size: 720h
        offset: -48h
        truncate_to: M
    

    I expect to have transformation for date 01 to last day of the month, e.g on April, I expect got transformation from date 01 - 30. but currently only got transformation from date 01 - 29

    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-26 00:00:00+00:00
    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-27 00:00:00+00:00
    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-28 00:00:00+00:00
    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-29 00:00:00+00:00
    [2022-06-13 15:08:12,324] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: start transformation job
    [2022-06-13 15:08:12,324] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: sql transformation query:
    

    after checking, I suspect the logic may related to this line, where the last day generated by windows class not included as the transformation partition.

    https://github.com/odpf/transformers/blob/ea1de4f0de3d17d9be7ccefb1e2f3beab1a685f1/task/bq2bq/executor/bumblebee/transformation.py#L393

    please kindly check it, and release the fix. thank you

    opened by novanxyz 1
  • feat: add support for secret env vars

    feat: add support for secret env vars

    With this we are adding support for using secrets in macros, we do not want to print the env vars in the logs, so exporting them as a separate file from optimus.

    Plugins can export this extra file to get env vars.

    opened by sbchaos 1
  • feat : remove wrapper image and use bq2bq executor image in plugin

    feat : remove wrapper image and use bq2bq executor image in plugin

    As per https://github.com/odpf/optimus/pull/425, the executor boot process is standardised and maintained at optimus. Plugin devs need no longer have to wrap the executor image. closes https://github.com/odpf/optimus/issues/405

    opened by smarch-int 0
  • Generate Dependencies is using the dry run apis which is bound to fail with macros

    Generate Dependencies is using the dry run apis which is bound to fail with macros

    The most intuitive way is to parse the query and hit the metadata apis instead of going through the dry run which should be definitly costly then the metadata fetch apis.

    enhancement performance 
    opened by sravankorumilli 0
  • BQ2BQ Replace load dispostion doesn't handle aggregations

    BQ2BQ Replace load dispostion doesn't handle aggregations

    Options

    1. Add an extra option in Replace load dispostition to take input from users to replace a specific or range of partitions using literals / all , dstart, dend. Default is all : all represents splitting of query to multiple partitions from dstart to dend.
    2. Use a new Load Disposition, to replace to a single destination partition which is window start
    bug 
    opened by sravankorumilli 0
Releases(v0.2.1)
Owner
Open Data Platform
Next-gen collaborative, domain-driven and distributed data platform
Open Data Platform
Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

TextDistance TextDistance -- python library for comparing distance between two or more sequences by many algorithms. Features: 30+ algorithms Pure pyt

Life4 3k Jan 06, 2023
Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow. Documentation Proper documentation is available at

HUSEIN ZOLKEPLI 151 Jan 05, 2023
This converter will create the exact measure for your cappuccino recipe from the grandiose Rafaella Ballerini!

About CappuccinoJs This converter will create the exact measure for your cappuccino recipe from the grandiose Rafaella Ballerini! Este conversor criar

Arthur Ottoni Ribeiro 48 Nov 15, 2022
This is a really simple text-to-speech app made with python and tkinter.

Tkinter Text-to-Speech App by Souvik Roy This is a really simple tkinter app which converts the text you have entered into a speech. It is created wit

Souvik Roy 1 Dec 21, 2021
A relatively simple python program to generate one of those reddit text to speech videos dominating youtube.

Reddit text to speech generator A basic reddit tts video generator Current functionality Generate videos for subs based on comments,(askreddit) so rea

Aadvik 17 Dec 19, 2022
A fast hierarchical dimensionality reduction algorithm.

h-NNE: Hierarchical Nearest Neighbor Embedding A fast hierarchical dimensionality reduction algorithm. h-NNE is a general purpose dimensionality reduc

Marios Koulakis 35 Dec 12, 2022
Write Python in Urdu - اردو میں کوڈ لکھیں

UrduPython Write simple Python in Urdu. How to Use Write Urdu code in سامپل۔پے The mappings are as following: "۔": ".", "،":

Saad A. Bazaz 26 Nov 27, 2022
ProtFeat is protein feature extraction tool that utilizes POSSUM and iFeature.

Description: ProtFeat is designed to extract the protein features by employing POSSUM and iFeature python-based tools. ProtFeat includes a total of 39

GOKHAN OZSARI 5 Dec 16, 2022
Sequence Modeling with Structured State Spaces

Structured State Spaces for Sequence Modeling This repository provides implementations and experiments for the following papers. S4 Efficiently Modeli

HazyResearch 902 Jan 06, 2023
This project deals with a simplified version of a more general problem of Aspect Based Sentiment Analysis.

Aspect_Based_Sentiment_Extraction Created on: 5th Jan, 2022. This project deals with an important field of Natural Lnaguage Processing - Aspect Based

Naman Rastogi 4 Jan 01, 2023
Implementaion of our ACL 2022 paper Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation

Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation This is the implementaion of our paper: Bridging the

hezw.tkcw 20 Dec 12, 2022
Python implementation of TextRank for phrase extraction and summarization of text documents

PyTextRank PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, used to: extract the top-ranked phrases from text document

derwen.ai 1.9k Jan 06, 2023
A collection of Classical Chinese natural language processing models, including Classical Chinese related models and resources on the Internet.

GuwenModels: 古文自然语言处理模型合集, 收录互联网上的古文相关模型及资源. A collection of Classical Chinese natural language processing models, including Classical Chinese related models and resources on the Internet.

Ethan 66 Dec 26, 2022
KoBERT - Korean BERT pre-trained cased (KoBERT)

KoBERT KoBERT Korean BERT pre-trained cased (KoBERT) Why'?' Training Environment Requirements How to install How to use Using with PyTorch Using with

SK T-Brain 1k Jan 02, 2023
Common Voice Dataset explorer

Common Voice Dataset Explorer Common Voice Dataset is by Mozilla Made during huggingface finetuning week Usage pip install -r requirements.txt streaml

Ceyda Cinarel 22 Nov 16, 2022
This is a Prototype of an Ai ChatBot "Tea and Coffee Supplier" using python.

Ai-ChatBot-Python A chatbot is an intelligent system which can hold a conversation with a human using natural language in real time. Due to the rise o

1 Oct 30, 2021
An ActivityWatch watcher to pose questions to the user and record her answers.

aw-watcher-ask An ActivityWatch watcher to pose questions to the user and record her answers. This watcher uses Zenity to present dialog boxes to the

Bernardo Chrispim Baron 33 Dec 03, 2022
Unsupervised Language Model Pre-training for French

FlauBERT and FLUE FlauBERT is a French BERT trained on a very large and heterogeneous French corpus. Models of different sizes are trained using the n

GETALP 212 Dec 10, 2022
Text Analysis & Topic Extraction on Android App user reviews

AndroidApp_TextAnalysis Hi, there! This is code archive for Text Analysis and Topic Extraction from user_reviews of Android App. Dataset Source : http

Fitrie Ratnasari 1 Feb 14, 2022
NVDA, the free and open source Screen Reader for Microsoft Windows

NVDA NVDA (NonVisual Desktop Access) is a free, open source screen reader for Microsoft Windows. It is developed by NV Access in collaboration with a

NV Access 1.6k Jan 07, 2023