ML Lineage Helper

This library is a wrapper around the SageMaker SDK to support ease of lineage tracking across the ML lifecycle. Lineage artifacts include data, code, feature groups, features in a feature group, feature group queries, training jobs, and models.

Install

pip install git+https://github.com/aws-samples/ml-lineage-helper

Usage

Import ml_lineage_helper.

from ml_lineage_helper import *
from ml_lineage_helper.query_lineage import QueryLineage

Creating and Displaying ML Lineage

Lineage tracking can tie together a SageMaker Processing job, the raw data being processed, the processing code, the query you used against the Feature Store to fetch your training and test sets, the training and test data in S3, and the training code into a lineage represented as a DAG.

ml_lineage = MLLineageHelper()
lineage = ml_lineage.create_ml_lineage(estimator_or_training_job_name, model_name=model_name,
                                       query=query, sagemaker_processing_job_description=preprocessing_job_description,
                                       feature_group_names=['customers', 'claims'])
lineage

If you cloned your code from a version control hosting platform like GitHub or GitLab, ml_lineage_tracking can associate the URLs of the code with the artifacts that will be created. See below:

# Get repo links to processing and training code
processing_code_repo_url = get_repo_link(os.getcwd(), 'processing.py')
training_code_repo_url = get_repo_link(os.getcwd(), 'pytorch-model/train_deploy.py', processing_code=False)
repo_links = [processing_code_repo_url, training_code_repo_url]

# Create lineage
ml_lineage = MLLineageHelper()
lineage = ml_lineage.create_ml_lineage(estimator, model_name=model_name,
                                       query=query, sagemaker_processing_job_description=preprocessing_job_description,
                                       feature_group_names=['customers', 'claims'],
                                       repo_links=repo_links)
lineage

Name/Source	Association	Name/Destination	Artifact Source ARN	Artifact Destination ARN	Source URI	Base64 Feature Store Query String	Git URL
pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job	Produced	Model	arn:aws:sagemaker:us-west-2:000000000000:experiment-trial-component/pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job	arn:aws:sagemaker:us-west-2:000000000000:artifact/013fa1be4ec1d192dac21abaf94ddded	None	None	None
TrainingCode	ContributedTo	pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job	arn:aws:sagemaker:us-west-2:000000000000:artifact/902d23ff64ef6d85dc27d841a967cd7d	arn:aws:sagemaker:us-west-2:000000000000:experiment-trial-component/pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job	s3://sagemaker-us-west-2-000000000000/pytorch-hosted-model-2021-08-26-15-55-22-071/source/sourcedir.tar.gz	None	https://gitlab.com/bwlind/ml-lineage-tracking/blob/main/ml-lineage-tracking/pytorch-model/train_deploy.py
TestingData	ContributedTo	pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job	arn:aws:sagemaker:us-west-2:000000000000:artifact/1ae9dfab7a3817cbf14708d932d9142d	arn:aws:sagemaker:us-west-2:000000000000:experiment-trial-component/pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job	s3://sagemaker-us-west-2-000000000000/ml-lineage-tracking-v1/test.npy	None	None
TrainingData	ContributedTo	pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job	arn:aws:sagemaker:us-west-2:000000000000:artifact/a0fd47c730f883b8e5228577fc5d5ef4	arn:aws:sagemaker:us-west-2:000000000000:experiment-trial-component/pytorch-hosted-model-2021-08-26-15-55-22-071-aws-training-job	s3://sagemaker-us-west-2-000000000000/ml-lineage-tracking-v1/train.npy	CnNlbGVjdCAqCmZyb20gImJvc3Rvbi1ob3VzaW5nLXY1LTE2Mjk3MzEyNjkiCg==	None
fg-boston-housing-v5	ContributedTo	TestingData	arn:aws:sagemaker:us-west-2:000000000000:artifact/1969cb21bf48405e0f2bb2d33f48b7b2	arn:aws:sagemaker:us-west-2:000000000000:artifact/1ae9dfab7a3817cbf14708d932d9142d	arn:aws:sagemaker:us-west-2:000000000000:feature-group/boston-housing-v5	None	None
fg-boston-housing	ContributedTo	TestingData	arn:aws:sagemaker:us-west-2:000000000000:artifact/d1b82165341cd78b93995d492b5adf7f	arn:aws:sagemaker:us-west-2:000000000000:artifact/1ae9dfab7a3817cbf14708d932d9142d	arn:aws:sagemaker:us-west-2:000000000000:feature-group/boston-housing	None	None
ProcessingJob	ContributedTo	fg-boston-housing-v5	arn:aws:sagemaker:us-west-2:000000000000:artifact/0a665c42c57f3b561e18a51a327d0a2f	arn:aws:sagemaker:us-west-2:000000000000:artifact/1969cb21bf48405e0f2bb2d33f48b7b2	arn:aws:sagemaker:us-west-2:000000000000:processing-job/pytorch-workflow-preprocessing-26-15-41-18	None	None
ProcessingInputData	ContributedTo	ProcessingJob	arn:aws:sagemaker:us-west-2:000000000000:artifact/2204290e557c4c9feaaa4ef7e4d88f0c	arn:aws:sagemaker:us-west-2:000000000000:artifact/0a665c42c57f3b561e18a51a327d0a2f	s3://sagemaker-us-west-2-000000000000/ml-lineage-tracking-v1/data/raw	None	None
ProcessingCode	ContributedTo	ProcessingJob	arn:aws:sagemaker:us-west-2:000000000000:artifact/69de4723ab0643c6ca8257bc6fbcfb4f	arn:aws:sagemaker:us-west-2:000000000000:artifact/0a665c42c57f3b561e18a51a327d0a2f	s3://sagemaker-us-west-2-000000000000/pytorch-workflow-preprocessing-26-15-41-18/input/code/preprocessing.py	None	https://gitlab.com/bwlind/ml-lineage-tracking/blob/main/ml-lineage-tracking/processing.py
ProcessingJob	ContributedTo	fg-boston-housing	arn:aws:sagemaker:us-west-2:000000000000:artifact/0a665c42c57f3b561e18a51a327d0a2f	arn:aws:sagemaker:us-west-2:000000000000:artifact/d1b82165341cd78b93995d492b5adf7f	arn:aws:sagemaker:us-west-2:000000000000:processing-job/pytorch-workflow-preprocessing-26-15-41-18	None	None
fg-boston-housing-v5	ContributedTo	TrainingData	arn:aws:sagemaker:us-west-2:000000000000:artifact/1969cb21bf48405e0f2bb2d33f48b7b2	arn:aws:sagemaker:us-west-2:000000000000:artifact/a0fd47c730f883b8e5228577fc5d5ef4	arn:aws:sagemaker:us-west-2:000000000000:feature-group/boston-housing-v5	None	None
fg-boston-housing	ContributedTo	TrainingData	arn:aws:sagemaker:us-west-2:000000000000:artifact/d1b82165341cd78b93995d492b5adf7f	arn:aws:sagemaker:us-west-2:000000000000:artifact/a0fd47c730f883b8e5228577fc5d5ef4	arn:aws:sagemaker:us-west-2:000000000000:feature-group/boston-housing	None	None

You can optionally see the lineage represented as a graph instead of a Pandas DataFrame:

ml_lineage.graph()

If you're jumping in a notebook fresh and already have a model whose ML Lineage has been tracked, you can get this MLLineage object by using the following line of code:

ml_lineage = MLLineageHelper(sagemaker_model_name_or_model_s3_uri='my-sagemaker-model-name')
ml_lineage.df

Querying ML Lineage

If you have a data source, you can find associated Feature Groups by providing the data source's S3 URI or Artifact ARN:

query_lineage = QueryLineage()
query_lineage.get_feature_groups_from_data_source(artifact_arn_or_s3_uri)

You can also start with a Feature Group, and find associated data sources:

query_lineage = QueryLineage()
query_lineage.get_data_sources_from_feature_group(artifact_or_fg_arn, max_depth=3)

Given a Feature Group, you can also find associated models:

query_lineage = QueryLineage()
query_lineage.get_models_from_feature_group(artifact_or_fg_arn)

Given a SageMaker model name or artifact ARN, you can find associated Feature Groups.

query_lineage = QueryLineage()
query_lineage.get_feature_groups_from_model(artifact_arn_or_model_name)

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

A wrapper around SageMaker ML Lineage Tracking extending ML Lineage to end-to-end ML lifecycles, including additional capabilities around Feature Store groups, queries, and other relevant artifacts.

Related tags

Overview

ML Lineage Helper

Install

Usage

Creating and Displaying ML Lineage

Querying ML Lineage

Security

License

Owner

AWS Samples

The InterScript dataset contains interactive user feedback on scripts generated by a T5-XXL model.

Source code for Fathony, Sahu, Willmott, & Kolter, "Multiplicative Filter Networks", ICLR 2021.

🍀 Pytorch implementation of various Attention Mechanisms, MLP, Re-parameter, Convolution, which is helpful to further understand papers.⭐⭐⭐

On the Analysis of French Phonetic Idiosyncrasies for Accent Recognition

SeisComP/SeisBench interface to enable deep-learning (re)picking in SeisComP

A framework for multi-step probabilistic time-series/demand forecasting models

The code of NeurIPS 2021 paper "Scalable Rule-Based Representation Learning for Interpretable Classification".

Official PyTorch implementation of StyleGAN3

Unofficial implementation of "TTNet: Real-time temporal and spatial video analysis of table tennis" (CVPR 2020)

Official Implement of CVPR 2021 paper “Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting”

PyTorch code of "SLAPS: Self-Supervision Improves Structure Learning for Graph Neural Networks"

A voice recognition assistant similar to amazon alexa, siri and google assistant.

A Python Package for Convex Regression and Frontier Estimation

RLMeta is a light-weight flexible framework for Distributed Reinforcement Learning Research.

Official public repository of paper "Intention Adaptive Graph Neural Network for Category-Aware Session-Based Recommendation"

Inverse Rendering for Complex Indoor Scenes: Shape, Spatially-Varying Lighting and SVBRDF From a Single Image

TorchOk - The toolkit for fast Deep Learning experiments in Computer Vision

Implementing yolov4 target detection and tracking based on nao robot

Advanced Signal Processing Notebooks and Tutorials

Memory Defense: More Robust Classificationvia a Memory-Masking Autoencoder