ML Lineage Helper

This library is a wrapper around the SageMaker SDK to support ease of lineage tracking across the ML lifecycle. Lineage artifacts include data, code, feature groups, features in a feature group, feature group queries, training jobs, and models.

Install

pip install git+https://github.com/aws-samples/ml-lineage-helper

Usage

Import ml_lineage_helper.

from ml_lineage_helper import *
from ml_lineage_helper.query_lineage import QueryLineage

Creating and Displaying ML Lineage

Lineage tracking can tie together a SageMaker Processing job, the raw data being processed, the processing code, the query you used against the Feature Store to fetch your training and test sets, the training and test data in S3, and the training code into a lineage represented as a DAG.

ml_lineage = MLLineageHelper()
lineage = ml_lineage.create_ml_lineage(estimator_or_training_job_name, model_name=model_name,
                                       query=query, sagemaker_processing_job_description=preprocessing_job_description,
                                       feature_group_names=['customers', 'claims'])
lineage

If you cloned your code from a version control hosting platform like GitHub or GitLab, ml_lineage_tracking can associate the URLs of the code with the artifacts that will be created. See below:

# Get repo links to processing and training code
processing_code_repo_url = get_repo_link(os.getcwd(), 'processing.py')
training_code_repo_url = get_repo_link(os.getcwd(), 'pytorch-model/train_deploy.py', processing_code=False)
repo_links = [processing_code_repo_url, training_code_repo_url]

# Create lineage
ml_lineage = MLLineageHelper()
lineage = ml_lineage.create_ml_lineage(estimator, model_name=model_name,
                                       query=query, sagemaker_processing_job_description=preprocessing_job_description,
                                       feature_group_names=['customers', 'claims'],
                                       repo_links=repo_links)
lineage

Name/Source	Association	Name/Destination	Artifact Source ARN	Artifact Destination ARN	Source URI	Base64 Feature Store Query String	Git URL
pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job	Produced	Model	arn:aws:sagemaker:us-west-2:000000000000:experiment-trial-component/pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job	arn:aws:sagemaker:us-west-2:000000000000:artifact/013fa1be4ec1d192dac21abaf94ddded	None	None	None
TrainingCode	ContributedTo	pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job	arn:aws:sagemaker:us-west-2:000000000000:artifact/902d23ff64ef6d85dc27d841a967cd7d	arn:aws:sagemaker:us-west-2:000000000000:experiment-trial-component/pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job	s3://sagemaker-us-west-2-000000000000/pytorch-hosted-model-2021-08-26-15-55-22-071/source/sourcedir.tar.gz	None	https://gitlab.com/bwlind/ml-lineage-tracking/blob/main/ml-lineage-tracking/pytorch-model/train_deploy.py
TestingData	ContributedTo	pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job	arn:aws:sagemaker:us-west-2:000000000000:artifact/1ae9dfab7a3817cbf14708d932d9142d	arn:aws:sagemaker:us-west-2:000000000000:experiment-trial-component/pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job	s3://sagemaker-us-west-2-000000000000/ml-lineage-tracking-v1/test.npy	None	None
TrainingData	ContributedTo	pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job	arn:aws:sagemaker:us-west-2:000000000000:artifact/a0fd47c730f883b8e5228577fc5d5ef4	arn:aws:sagemaker:us-west-2:000000000000:experiment-trial-component/pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job	s3://sagemaker-us-west-2-000000000000/ml-lineage-tracking-v1/train.npy	CnNlbGVjdCAqCmZyb20gImJvc3Rvbi1ob3VzaW5nLXY1LTE2Mjk3MzEyNjkiCg==	None
fg-boston-housing-v5	ContributedTo	TestingData	arn:aws:sagemaker:us-west-2:000000000000:artifact/1969cb21bf48405e0f2bb2d33f48b7b2	arn:aws:sagemaker:us-west-2:000000000000:artifact/1ae9dfab7a3817cbf14708d932d9142d	arn:aws:sagemaker:us-west-2:000000000000:feature-group/boston-housing-v5	None	None
fg-boston-housing	ContributedTo	TestingData	arn:aws:sagemaker:us-west-2:000000000000:artifact/d1b82165341cd78b93995d492b5adf7f	arn:aws:sagemaker:us-west-2:000000000000:artifact/1ae9dfab7a3817cbf14708d932d9142d	arn:aws:sagemaker:us-west-2:000000000000:feature-group/boston-housing	None	None
ProcessingJob	ContributedTo	fg-boston-housing-v5	arn:aws:sagemaker:us-west-2:000000000000:artifact/0a665c42c57f3b561e18a51a327d0a2f	arn:aws:sagemaker:us-west-2:000000000000:artifact/1969cb21bf48405e0f2bb2d33f48b7b2	arn:aws:sagemaker:us-west-2:000000000000:processing-job/pytorch-workflow-preprocessing-26-15-41-18	None	None
ProcessingInputData	ContributedTo	ProcessingJob	arn:aws:sagemaker:us-west-2:000000000000:artifact/2204290e557c4c9feaaa4ef7e4d88f0c	arn:aws:sagemaker:us-west-2:000000000000:artifact/0a665c42c57f3b561e18a51a327d0a2f	s3://sagemaker-us-west-2-000000000000/ml-lineage-tracking-v1/data/raw	None	None
ProcessingCode	ContributedTo	ProcessingJob	arn:aws:sagemaker:us-west-2:000000000000:artifact/69de4723ab0643c6ca8257bc6fbcfb4f	arn:aws:sagemaker:us-west-2:000000000000:artifact/0a665c42c57f3b561e18a51a327d0a2f	s3://sagemaker-us-west-2-000000000000/pytorch-workflow-preprocessing-26-15-41-18/input/code/preprocessing.py	None	https://gitlab.com/bwlind/ml-lineage-tracking/blob/main/ml-lineage-tracking/processing.py
ProcessingJob	ContributedTo	fg-boston-housing	arn:aws:sagemaker:us-west-2:000000000000:artifact/0a665c42c57f3b561e18a51a327d0a2f	arn:aws:sagemaker:us-west-2:000000000000:artifact/d1b82165341cd78b93995d492b5adf7f	arn:aws:sagemaker:us-west-2:000000000000:processing-job/pytorch-workflow-preprocessing-26-15-41-18	None	None
fg-boston-housing-v5	ContributedTo	TrainingData	arn:aws:sagemaker:us-west-2:000000000000:artifact/1969cb21bf48405e0f2bb2d33f48b7b2	arn:aws:sagemaker:us-west-2:000000000000:artifact/a0fd47c730f883b8e5228577fc5d5ef4	arn:aws:sagemaker:us-west-2:000000000000:feature-group/boston-housing-v5	None	None
fg-boston-housing	ContributedTo	TrainingData	arn:aws:sagemaker:us-west-2:000000000000:artifact/d1b82165341cd78b93995d492b5adf7f	arn:aws:sagemaker:us-west-2:000000000000:artifact/a0fd47c730f883b8e5228577fc5d5ef4	arn:aws:sagemaker:us-west-2:000000000000:feature-group/boston-housing	None	None

You can optionally see the lineage represented as a graph instead of a Pandas DataFrame:

ml_lineage.graph()

If you're jumping in a notebook fresh and already have a model whose ML Lineage has been tracked, you can get this MLLineage object by using the following line of code:

ml_lineage = MLLineageHelper(sagemaker_model_name_or_model_s3_uri='my-sagemaker-model-name')
ml_lineage.df

Querying ML Lineage

If you have a data source, you can find associated Feature Groups by providing the data source's S3 URI or Artifact ARN:

query_lineage = QueryLineage()
query_lineage.get_feature_groups_from_data_source(artifact_arn_or_s3_uri)

You can also start with a Feature Group, and find associated data sources:

query_lineage = QueryLineage()
query_lineage.get_data_sources_from_feature_group(artifact_or_fg_arn, max_depth=3)

Given a Feature Group, you can also find associated models:

query_lineage = QueryLineage()
query_lineage.get_models_from_feature_group(artifact_or_fg_arn)

Given a SageMaker model name or artifact ARN, you can find associated Feature Groups.

query_lineage = QueryLineage()
query_lineage.get_feature_groups_from_model(artifact_arn_or_model_name)

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

A wrapper around SageMaker ML Lineage Tracking extending ML Lineage to end-to-end ML lifecycles, including additional capabilities around Feature Store groups, queries, and other relevant artifacts.

Related tags

Overview

ML Lineage Helper

Install

Usage

Creating and Displaying ML Lineage

Querying ML Lineage

Security

License

Owner

AWS Samples

Code release for ICCV 2021 paper "Anticipative Video Transformer"

Temporal Dynamic Convolutional Neural Network for Text-Independent Speaker Verification and Phonemetic Analysis

Pytorch implementation of DeepMind's differentiable neural computer paper.

Meli Data Challenge 2021 - First Place Solution

Another pytorch implementation of FCN (Fully Convolutional Networks)

3D Pose Estimation for Vehicles

A Dying Light 2 (DL2) PAKFile Utility for Modders and Mod Makers.

A new GCN model for Point Cloud Analyse

Deploy tensorflow graphs for fast evaluation and export to tensorflow-less environments running numpy.

A package to predict protein inter-residue geometries from sequence data

PyTorch implementation of "VRT: A Video Restoration Transformer"

Traductor de lengua de señas al español basado en Python con Opencv y MedaiPipe

Source code for our EMNLP'21 paper 《Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning》

Count GitHub Stars ⭐

Projects of Andfun Yangon

Source code of our TTH paper: Targeted Trojan-Horse Attacks on Language-based Image Retrieval.

You Only Look Once for Panopitic Driving Perception

Voxel Transformer for 3D object detection

This repo will contain code to reproduce and build upon understanding transfer learning

Unicorn can be used for performance analyses of highly configurable systems with causal reasoning