Amundsen is a metadata-driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

Last update: Jan 03, 2023

Overview

Amundsen is a data discovery and metadata engine for improving the productivity of data analysts, data scientists and engineers when interacting with data. It does that today by indexing data resources (tables, dashboards, streams, etc.) and powering a PageRank-style search based on usage patterns (e.g. highly queried tables show up earlier than less queried tables). Think of it as Google search for data. The project is named after Norwegian explorer Roald Amundsen, who led the first expedition to reach the South Pole.
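To make the usage-based ranking idea concrete, here is a minimal sketch. It is not Amundsen's search implementation (which the list below says is backed by Elasticsearch); the `TableDoc` class, the `query_count` field and the sample tables are all hypothetical and only illustrate "highly queried tables show up earlier".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableDoc:
    """Hypothetical search document for one table."""
    name: str
    description: str
    query_count: int  # assumed usage signal, e.g. queries in the last 30 days


def rank_tables(docs: List[TableDoc], term: str) -> List[TableDoc]:
    """Return tables matching the term, most heavily queried first."""
    term = term.lower()
    matches = [
        d for d in docs
        if term in d.name.lower() or term in d.description.lower()
    ]
    return sorted(matches, key=lambda d: d.query_count, reverse=True)


docs = [
    TableDoc("rides_raw", "raw ride events", 12),
    TableDoc("rides_daily", "daily ride rollup", 480),
    TableDoc("drivers", "driver profiles", 300),
]
print([d.name for d in rank_tables(docs, "ride")])
# prints ['rides_daily', 'rides_raw']
```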

Amundsen is hosted by the LF AI & Data Foundation. It includes three microservices, one data ingestion library, and several supporting libraries:

amundsenfrontendlibrary: Frontend service which is a Flask application with a React frontend.
amundsensearchlibrary: Search service, which leverages Elasticsearch for search capabilities, is used to power frontend metadata searching.
amundsenmetadatalibrary: Metadata service, which leverages Neo4j or Apache Atlas as the persistent layer, to provide various metadata.
amundsendatabuilder: Data ingestion library for building the metadata graph and search index. Users can load the data either with a Python script that uses the library or with an Airflow DAG that imports it.
amundsencommon: Amundsen Common library holds code shared among the Amundsen microservices.
amundsengremlin: Amundsen Gremlin library holds code used for converting model objects into vertices and edges in gremlin. It's used for loading data into an AWS Neptune backend.
amundsenrds: Amundsenrds contains ORM models that support a relational database as the metadata backend store in Amundsen. The schema in the ORM models follows the logic of the databuilder models. Amundsenrds is used in databuilder and metadatalibrary for metadata storage and retrieval with relational databases.

Homepage

https://www.amundsen.io/

Documentation

https://www.amundsen.io/amundsen/

Requirements

Python = 3.6 or 3.7
Node = v10 or v12 (v14 may have compatibility issues)
npm >= 6

User Interface

Please note that the mock images are for demonstration purposes only.

Landing Page: The landing page for Amundsen, including 1. search bars; 2. popular tables.
Search Preview: See inline search results as you type
Table Detail Page: Visualization of a Hive / Redshift table
Column detail: Visualization of columns of a Hive / Redshift table which includes an optional stats display
Data Preview Page: Visualization of table data preview which could integrate with Apache Superset or other Data Visualization Tools.

Get Involved in the Community

Want help or want to help? Use the button in our header to join our Slack channel. Contributions are more than welcome! As explained in CONTRIBUTING.md, there are many ways to contribute beyond code for new features and bug fixes: documentation such as FAQ entries, bug reports, and blog posts sharing experiences all help move Amundsen forward. If you find a security vulnerability, please follow this guide.

Getting Started

Please visit the Amundsen installation documentation for a quick start to bootstrap a default version of Amundsen with dummy data.

Architecture Overview

Please visit Architecture for an overview of the Amundsen architecture.

Supported Entities

Tables (from Databases)
People (from HR systems)
Dashboards

Supported Integrations

Table Connectors

Amazon Athena
Amazon Glue and anything built over it (like Databricks Delta - which is a work in progress).
Amazon Redshift
Apache Cassandra
Apache Druid
Apache Hive
CSV
dbt
Delta Lake
Google BigQuery
IBM DB2
Microsoft SQL Server
MySQL
Oracle (through dbapi or sql_alchemy)
PostgreSQL
Trino (formerly Presto SQL)
Vertica
Snowflake

Amundsen can also connect to any database that provides a dbapi or sql_alchemy interface (which most databases do).
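As an illustration of why a DB-API interface is enough to pull table metadata, here is a minimal sketch that uses only the standard-library `sqlite3` driver. It does not use the databuilder extractors; the function names are hypothetical, and the catalog queries (`sqlite_master`, `PRAGMA table_info`) are SQLite-specific, so another database would need its own catalog queries (for example against `information_schema`).

```python
import sqlite3
from typing import List, Tuple


def list_tables(conn: sqlite3.Connection) -> List[str]:
    """Return user table names from the SQLite catalog."""
    cur = conn.cursor()
    cur.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in cur.fetchall()]


def extract_columns(conn: sqlite3.Connection, table: str) -> List[Tuple[str, str]]:
    """Return (column_name, column_type) pairs for one table."""
    cur = conn.cursor()
    # PRAGMA rows are (cid, name, type, notnull, dflt_value, pk)
    cur.execute(f'PRAGMA table_info("{table}")')
    return [(row[1], row[2]) for row in cur.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (id INTEGER, city TEXT)")
for table in list_tables(conn):
    print(table, extract_columns(conn, table))
# prints: rides [('id', 'INTEGER'), ('city', 'TEXT')]
```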

Dashboard Connectors

ETL Orchestration

Apache Airflow

BI Viz Tool

Apache Superset

Installation

Please visit the Installation guideline to learn how to install Amundsen.

Roadmap

Please visit Roadmap if you are interested in upcoming Amundsen roadmap items.

Blog Posts and Interviews

Amundsen - Lyft's data discovery & metadata engine (April 2019)
Software Engineering Daily podcast on Amundsen (April 2019)
How Lyft Drives Data Discovery (July 2019)
Data Engineering podcast on Solving Data Discovery At Lyft (Aug 2019)
Open Sourcing Amundsen: A Data Discovery And Metadata Platform (Oct 2019)
Adding Data Quality into Amundsen with Programmatic Descriptions by Sam Shuster from Edmunds.com (May 2020)
Facilitating Data discovery with Apache Atlas and Amundsen by Mariusz Górski from ING (June 2020)
Using Amundsen to Support User Privacy via Metadata Collection at Square by Alyssa Ransbury from Square (July 14, 2020)
Amundsen Joins LF AI as New Incubation Project (Aug 11, 2020)
Amundsen: one year later (Oct 6, 2020)

Talks

Disrupting Data Discovery {slides, recording} (Strata SF, March 2019)
Amundsen: A Data Discovery Platform from Lyft {slides} (Data Council SF, April 2019)
Disrupting Data Discovery {slides} (Strata London, May 2019)
ING Data Analytics Platform (Amundsen is mentioned) {slides, recording } (Kubecon Barcelona, May 2019)
Disrupting Data Discovery {slides, recording} (Making Big Data Easy SF, May 2019)
Disrupting Data Discovery {slides, recording} (Neo4j Graph Tour Santa Monica, September 2019)
Disrupting Data Discovery {slides} (IDEAS SoCal AI & Data Science Conference, Oct 2019)
Data Discovery with Amundsen by Gerard Toonstra from Coolblue {slides} and {talk} (BigData Vilnius 2019)
Towards Enterprise Grade Data Discovery and Data Lineage with Apache Atlas and Amundsen by Verdan Mahmood and Marek Wiewiorka from ING {slides, talk} (Big Data Technology Warsaw Summit 2020)
Airflow @ Lyft (which covers how we integrate Airflow and Amundsen) by Tao Feng {slides and website} (Airflow Summit 2020)
Data DAGs with lineage for fun and for profit by Bolke de Bruin {website} (Airflow Summit 2020)

Community meetings

Community meetings are held on the first Thursday of every month at 9 AM Pacific, Noon Eastern, 6 PM Central European Time. Link to join

Upcoming meetings & notes

You can find the exact date of the next meeting and its agenda a few weeks before the meeting in this doc.

Notes from all past meetings are available here.

Who uses Amundsen?

Here is the list of organizations that are using Amundsen today. If your organization uses Amundsen, please file a PR and update this list.

Currently officially using Amundsen:

License

Apache 2.0 License.

Comments

Programmatic and Manual pathways for table and column descriptions

Overview

As a data engineer, there are quite a few properties that we can extract programmatically that are currently not supported by amundsen as first class properties in the UI. For some of these properties that are likely widely useful, I can understand creating issues to ingest them so that they appear in the panel on the right. However, some of these properties are very company specific and wouldn't be needed by other companies. Therefore, I think that it would be useful to allow users to ingest structured data without needing to make changes to amundsen infrastructure.

What do we do now? And why won't it work in the long run?

Currently, we get around this by programmatically updating the table description with prepared markdown. However, in the long run we also want users to be able to edit table and column descriptions through amundsen, which will put us in a bind: we would no longer be able to update programmatically without a lot of added complexity concerning reconciliation and merging of changes.

Proposal

My proposal is that there be two types of descriptions for tables and columns.

One would be programmatic and cannot be modified manually. The other would be the current description. By default, the programmatic description would not appear on the page unless it is populated.
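A minimal sketch of how the two description types could be kept apart, assuming a hypothetical `TableDescriptions` class (not the model that was eventually implemented). The UI path only ever writes the manual description, the ingestion path only ever writes the programmatic one, and the programmatic section is hidden unless it is populated.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TableDescriptions:
    """Hypothetical holder for the two description types."""
    manual: Optional[str] = None        # edited by users in the UI
    programmatic: Optional[str] = None  # written only by ingestion jobs

    def edit_from_ui(self, text: str) -> None:
        # A user edit never touches the programmatic description.
        self.manual = text

    def ingest(self, text: str) -> None:
        # An ingestion run never overwrites what a user typed.
        self.programmatic = text

    def visible_sections(self) -> List[Tuple[str, str]]:
        """Sections to render; empty descriptions are not shown."""
        sections = []
        if self.manual:
            sections.append(("description", self.manual))
        if self.programmatic:
            sections.append(("programmatic description", self.programmatic))
        return sections


desc = TableDescriptions()
desc.edit_from_ui("Daily ride rollup, owned by the marketplace team.")
print(desc.visible_sections())  # only the manual section so far
desc.ingest("**freshness:** updated daily")
print(desc.visible_sections())  # both sections, no merging needed
```

Because each writer owns its own field, no reconciliation between programmatic updates and user edits is required.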

Note about column level

This is true on the column level as well, where we may want to include company-specific attributes, but I can understand why column-level programmatic descriptions may be too much to ask for in terms of cluttering the UI.
type:feature status:completed

opened by samshuster 35
feat: Upgrade feast to 0.17
Summary of Changes

Upgrade feast extractor with new Feast architecture

Tests

Update old tests and remove deprecated ones

CheckList

Make sure you have checked all steps below to ensure a timely review.

[x] PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"

[x] PR includes a summary of changes.

[x] PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.

[x] In case of new functionality, my PR adds documentation that describes how to use it.

All the public functions and the classes in the PR contain docstrings that explain what it does

keep fresh area:common area:databuilder area:all
opened by amommendes 29

OIDC authentication - session not persists issue

I'm trying to implement Google OIDC authentication on an existing amundsen. All the micro services (search, metadata, frontend) are deployed with helm on AWS EKS and, without authentication, it works. But when I enable OIDC, as you'll see below, session information does not persist and frontend starts showing a "Something went wrong..." message.

Here are the related logs from frontend. (Sensitive info is replaced with an arbitrary number of *'s, and I put the 'test_data' in the session for debugging purposes.)

2021-08-13T17:23:20+09:00 /usr/local/lib/python3.7/site-packages/flask/json/__init__.py:179: DeprecationWarning: Importing 'itsdangerous.json' is deprecated and will be removed in ItsDangerous 2.1. Use Python's 'json' module instead.
2021-08-13T17:23:20+09:00   rv = _json.dumps(obj, **kwargs)
2021-08-13T17:23:20+09:00 /usr/local/lib/python3.7/site-packages/flask/json/__init__.py:205: DeprecationWarning: Importing 'itsdangerous.json' is deprecated and will be removed in ItsDangerous 2.1. Use Python's 'json' module instead.
2021-08-13T17:23:20+09:00   return _json.loads(s, **kwargs)
2021-08-13T17:23:20+09:00 2021-08-13T08:23:20+0000.168 [DEBUG] __init__._before_request:23 (11:MainThread) - Whitelisted Endpoint: status,healthcheck,health,logout
2021-08-13T17:23:20+09:00 2021-08-13T08:23:20+0000.169 [ERROR] models._fetch_token:88 (11:MainThread) - Calling _fetch_token(name=google)...
2021-08-13T17:23:20+09:00 2021-08-13T08:23:20+0000.169 [ERROR] models._fetch_token:92 (11:MainThread) - <SecureCookieSession {'test_data': 'test data', 'user': {'__id': '*****@*************.com', 'at_hash': '**************', 'aud': '***********************', 'azp': '*****************', 'display_name': '******************', 'email': '*****************', 'email_verified': True, 'exp': 1628846069, 'family_name': '******', 'given_name': '***********', 'hd': '***************, 'iat': 1628842469, 'iss': 'https://accounts.google.com', 'locale': 'en', 'name': '************', 'nonce': '*************', 'picture': '****************', 'profile_url': '', 'sub': '**********', 'user_id': '*****************'}}>
2021-08-13T17:23:32+09:00 /usr/local/lib/python3.7/site-packages/flask/json/__init__.py:179: DeprecationWarning: Importing 'itsdangerous.json' is deprecated and will be removed in ItsDangerous 2.1. Use Python's 'json' module instead.
2021-08-13T17:23:32+09:00   rv = _json.dumps(obj, **kwargs)
2021-08-13T17:23:32+09:00 /usr/local/lib/python3.7/site-packages/flask/json/__init__.py:179: DeprecationWarning: Importing 'itsdangerous.json' is deprecated and will be removed in ItsDangerous 2.1. Use Python's 'json' module instead.
2021-08-13T17:23:32+09:00   rv = _json.dumps(obj, **kwargs)
2021-08-13T17:23:32+09:00 /usr/local/lib/python3.7/site-packages/flask/json/__init__.py:179: DeprecationWarning: Importing 'itsdangerous.json' is deprecated and will be removed in ItsDangerous 2.1. Use Python's 'json' module instead.
2021-08-13T17:23:32+09:00   rv = _json.dumps(obj, **kwargs)
2021-08-13T17:23:32+09:00 2021-08-13T08:23:32+0000.720 [DEBUG] __init__._before_request:23 (10:MainThread) - Whitelisted Endpoint: status,healthcheck,health,logout
2021-08-13T17:23:32+09:00 2021-08-13T08:23:32+0000.720 [ERROR] models._fetch_token:88 (10:MainThread) - Calling _fetch_token(name=google)...
2021-08-13T17:23:32+09:00 2021-08-13T08:23:32+0000.720 [ERROR] models._fetch_token:92 (10:MainThread) - <SecureCookieSession {'test_data': 'test data'}>
2021-08-13T17:23:32+09:00 2021-08-13T08:23:32+0000.721 [ERROR] models._fetch_token:100 (10:MainThread) - 'user'
2021-08-13T17:23:32+09:00 2021-08-13T08:23:32+0000.721 [ERROR] __init__._before_request:59 (10:MainThread) - User not logged in, redirecting to auth
2021-08-13T17:23:32+09:00 Traceback (most recent call last):
2021-08-13T17:23:32+09:00   File "/usr/local/lib/python3.7/site-packages/flaskoidc/models.py", line 94, in _fetch_token
2021-08-13T17:23:32+09:00     user_id=session["user"]["__id"],
2021-08-13T17:23:32+09:00   File "/usr/local/lib/python3.7/site-packages/flask/sessions.py", line 83, in __getitem__
2021-08-13T17:23:32+09:00     return super(SecureCookieSession, self).__getitem__(key)
2021-08-13T17:23:32+09:00 KeyError: 'user'

My suspicions are:

Google OIDC has different specs, for example, in my setup, OAuth2Token (flaskoidc) saves access token, refresh token and such, instead of name or user_id. So I tweaked the flaskoidc lib to authenticate the current session user using some google apis, but still the error persists.
Metadata also produces suspicious logs. Except the healthcheck signals, it always returns 302. Is it meant to work like this? I'm not sure.
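The log above shows the second request arriving with a session that has lost its 'user' entry, which is what raises the KeyError at `session["user"]["__id"]`. The sketch below does not explain why the entry disappears, and it is not flaskoidc's code; it only illustrates, with a hypothetical helper and plain dictionaries standing in for the Flask session, how a lookup can report "not logged in" instead of raising.

```python
from collections.abc import Mapping
from typing import Optional


def current_user_id(session: Mapping) -> Optional[str]:
    """Return the logged-in user's id, or None if the session has no user."""
    user = session.get("user")
    if not isinstance(user, Mapping):
        return None  # caller can redirect to the auth flow
    return user.get("__id")


# Shapes taken from the two logged requests (values shortened).
first_request = {"test_data": "test data", "user": {"__id": "someone@example.com"}}
second_request = {"test_data": "test data"}

print(current_user_id(first_request))   # someone@example.com
print(current_user_id(second_request))  # None
```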

Expected Behavior

meant to work out of the box?

Current Behavior

frontend shows 'something went wrong' messages after login when hooked with google oidc

Possible Solution

Steps to Reproduce

Deploy amundsen using the helm chart in the GitHub repo, using frontend-oidc: 3.11.1, metadata-oidc: 3.5.0, search: 2.5.1, with the flask OIDC env variables set for Google OIDC.

opened by woodchuck1206 28

Python setup.py egg_info did not run successfully.
I'm trying to install Amundsen on Docker running on Windows 10. I'm getting an error while running docker-compose with Atlas.

Expected Behavior

Install success

Current Behavior

Getting an error when executing: docker-compose -f docker-amundsen-atlas.yml up

Steps to Reproduce

Clone the repo

Execute: docker-compose -f docker-amundsen-atlas.yml up (get an error in this step)

Error:

error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [10 lines of output]
    /bin/sh: 1: npm: not found
    ERROR:root:npm must be available
    /bin/sh: 1: npm: not found
    /app/setup.py:30: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
      logging.warn('Installation of npm dependencies failed')
    WARNING:root:Installation of npm dependencies failed
    /app/setup.py:31: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
      logging.warn(str(e))
    WARNING:root:Command '['npm install']' returned non-zero exit status 127.
    error in amundsen-frontend setup command: 'extras_require' must be a dictionary whose values are strings or lists of strings containing valid project/version requirement specifiers.
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
1 error occurred:
    * Status: The command '/bin/sh -c pip3 install -e .' returned a non-zero code: 1, Code: 1

Screenshots (if appropriate)

Context

Your Environment

Amundsen version used: 6.7.4

Data warehouse stores: none yet

Deployment (k8s or native): native

Link to your fork or repository:
opened by FelipeArruda 25
More support for get_user_details
Expected Behavior or Use Case

The docs suggest setting USER_DETAIL_METHOD = get_user_details within the metadata_service config. After defining the function given in the docs:

def get_user_details(user_id):
    user_info = {
        'email': 'email',
        'user_id': user_id,
        'first_name': 'Firstname',
        'last_name': 'Lastname',
        'full_name': 'Firstname Lastname',
    }
    return user_info

The only info attached to the user is the email and the users are created with names Firstname Lastname. We're able to properly authenticate the user based on their email with Okta, but we are unable to create the user with their proper names.

Possible Implementation

For more info to be available to the function, the function header needs to change to accept other arguments, such as a JSON object of the header. Something like:

def get_user_details(data):
    user_info = {
        'email': data['email'],
        'user_id': data['user_id'],
        'first_name': data['first_name'],
        'last_name': data['last_name'],
        'full_name': data['first_name'] + " " + data['last_name'],
    }
    return user_info

where data is a json object with the request header info.

If this is implemented, then all of the function calls will need to be changed so that the correct info is passed along. We would need to fork the metadata submodule, build our own image and test our changes. After finishing testing, we'd push back upstream.
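One practical concern with the proposed signature is that not every identity provider sends every field. The sketch below is a hypothetical variant, not the function from the docs or from the metadata service, that tolerates missing keys; falling back to the email as `user_id` is an assumption made purely for illustration.

```python
from typing import Dict, Mapping


def get_user_details(data: Mapping[str, str]) -> Dict[str, str]:
    """Build the user record from header-derived data, tolerating gaps."""
    email = data.get("email", "")
    first = data.get("first_name", "")
    last = data.get("last_name", "")
    return {
        "email": email,
        # Assumption: use the email as the id when no user_id is supplied.
        "user_id": data.get("user_id", email),
        "first_name": first,
        "last_name": last,
        # Join only the parts that are present, so no stray spaces appear.
        "full_name": " ".join(part for part in (first, last) if part),
    }


print(get_user_details({"email": "ada@example.com", "first_name": "Ada", "last_name": "Lovelace"}))
```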

Context

This is needed to properly create users with their correct names.

I was following #852 but opened a new issue to provide more context and its current state.
opened by alldoami 25

Feature Proposal: Search service ElasticSearch AWS (and potentially other) authentication support

Currently, the ES Proxy in the Search service only allows a simple user-password pair in a non-SSL setup. We would like to use the AWS Elasticsearch Service, which requires specific ES client initialisation; today that means modifying code to inject another Elasticsearch client.

Expected Behavior or Use Case

Users can set this up in the config using env variables:

PROXY_CLIENT=ELASTICSEARCH_AWS
CREDENTIALS_PROXY_USER=aws_access_key
CREDENTIALS_PROXY_PASSWORD=aws_secret_key

Service or Ingestion ETL

amundsen-search service.

Possible Implementation

The change would be mostly to the initialisation of ES Proxy, eg.

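# Imports the snippet below relies on (module paths as used by the search service and its dependencies)
import os

from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
from search_service.proxy.base import BaseProxy

# Current ES Proxy would require ES client, without any other changes to the class business logic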
class ElasticsearchProxy(BaseProxy):
    """
    ElasticSearch connection handler
    """

    def __init__(self, *,
                 client: Elasticsearch = None,
                 page_size: int = 10
                 ) -> None:
        """
        Wraps an Elasticsearch client for interactions with the cluster.
        Allows caller to pass a fully constructed Elasticsearch client, {client}.

        :param client: Elasticsearch client to use, if provided
        :param page_size: Number of search results to return per request
        """
        self.elasticsearch = client

        self.page_size = page_size
...

# Old implementation would only be for creating simple ES client but most of the logic is still in ElasticsearchProxy; 
# can be setup as before by env variable PROXY_CLIENT=ELASTICSEARCH
class SimpleElasticsearchProxy(ElasticsearchProxy):
    """
    ElasticSearch connection handler
    """

    def __init__(self, *,
                 host: str = None,
                 user: str = '',
                 password: str = '',
                 page_size: int = 10
                 ) -> None:
        """
        Constructs simple Elasticsearch client from the parameters provided.

        :param host: Elasticsearch host we should connect to
        :param user: user name to use for authentication
        :param password: user password to use for authentication
        :param  page_size: Number of search results to return per request
        """
        http_auth = (user, password) if user else None
        client = Elasticsearch(host, http_auth=http_auth)

        super().__init__(client=client, page_size=page_size)

# AWS ES Proxy connector can be setup via env variable PROXY_CLIENT=ELASTICSEARCH_AWS
class AwsElasticsearchProxy(ElasticsearchProxy):
    """
    ElasticSearch connection handler
    """

    def __init__(self, *,
                 host: str = None,
                 user: str = '',
                 password: str = '',
                 page_size: int = 10
                 ) -> None:
        """
        Constructs an AWS-authenticated Elasticsearch client from the parameters provided.

        :param host: Elasticsearch host we should connect to
        :param user: AWS access key
        :param password: AWS secret key
        :param  page_size: Number of search results to return per request
        """
        region = os.environ.get('AWS_REGION')
        awsauth = AWS4Auth(user, password, region, 'es')

        client = Elasticsearch(
            hosts=[{'host': host, 'port': 443}],
            http_auth=awsauth,
            use_ssl=True,
            verify_certs=True,
            connection_class=RequestsHttpConnection
        )

        super().__init__(client=client, page_size=page_size)
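The selection by `PROXY_CLIENT` described above could be wired up roughly as follows. This is a sketch, not the search service's actual configuration code: the two stub classes stand in for `SimpleElasticsearchProxy` and `AwsElasticsearchProxy`, and `PROXY_ENDPOINT` and the `build_proxy` helper are hypothetical names introduced for illustration.

```python
from typing import Dict, Mapping, Type


class ProxyStub:
    """Stand-in for a proxy class; records its arguments instead of connecting."""

    def __init__(self, *, host: str, user: str, password: str) -> None:
        self.host, self.user, self.password = host, user, password


class SimpleProxyStub(ProxyStub):
    """Placeholder for SimpleElasticsearchProxy."""


class AwsProxyStub(ProxyStub):
    """Placeholder for AwsElasticsearchProxy."""


# Registry from the PROXY_CLIENT value to the class to instantiate.
PROXY_CLIENTS: Dict[str, Type[ProxyStub]] = {
    "ELASTICSEARCH": SimpleProxyStub,
    "ELASTICSEARCH_AWS": AwsProxyStub,
}


def build_proxy(env: Mapping[str, str]) -> ProxyStub:
    """Pick and construct a proxy from environment-style settings."""
    name = env.get("PROXY_CLIENT", "ELASTICSEARCH")
    if name not in PROXY_CLIENTS:
        raise ValueError(f"Unknown PROXY_CLIENT: {name!r}")
    return PROXY_CLIENTS[name](
        host=env.get("PROXY_ENDPOINT", ""),  # hypothetical variable name
        user=env.get("CREDENTIALS_PROXY_USER", ""),
        password=env.get("CREDENTIALS_PROXY_PASSWORD", ""),
    )


proxy = build_proxy({
    "PROXY_CLIENT": "ELASTICSEARCH_AWS",
    "CREDENTIALS_PROXY_USER": "aws_access_key",
    "CREDENTIALS_PROXY_PASSWORD": "aws_secret_key",
})
print(type(proxy).__name__)  # AwsProxyStub
```

In a real service, `os.environ` could be passed as `env`, and adding another backend would only mean registering one more class.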

Context

This would allow different implementations of ES and other proxy clients, which could be selected via configuration and would not require writing new code or manually building a docker image.

status:completed area:search

opened by jsnowacki 24

open source amundsen neo4j backup scripts
AC

there will be scripts provided that allow amundsen neo4j data to be backed up (on a schedule) to cloud provider blob storage. aws s3 makes the most sense, and if others need other providers (e.g. azure), then they can provide an extension to this functionality

once these scripts are established, we should extend them to the k8s setup as well

keep fresh
opened by javamonkey79 24
Would like a guide for How-To deploy Amundsen in production
Please add points on what you expect from such a guide in a comment below. I will then try to consolidate input and draft up an outline in this comment.

The guide can end up as ~~/docs/deployment.md~~ is /docs/owners_manual.md better?

Initial outline:

[ ] Basic install of services (in different environments)

[x] Docker-compose “vanilla”, but with Gunicorn (WIP #109) ~~data in volumes etc.~~

[ ] AWS ECS. original PR: https://github.com/lyft/amundsenfrontendlibrary/pull/216 (or EC2): https://github.com/lyft/amundsenfrontendlibrary/issues/186

[x] Kubernetes helm chart install ~~(convert from Compose using https://kompose.io?)~~ (upcoming PR see https://github.com/lyft/amundsen/issues/53#issuecomment-538575978 below)

[ ] Setting up ingest (with or without Airflow, see https://github.com/lyft/amundsen/issues/53#issuecomment-617370073)

Figure out which parts of this belongs with Architecture.md and which in Databuilder repo?

[ ] Compared to Quickstart ingest (https://github.com/lyft/amundsen/issues/75)

[ ] Then mention source by source; Extractor(s), Model, Metadata - Table Metadata: - Users - Table Usage: (How it works and why in https://github.com/lyft/amundsen/issues/381#issuecomment-613387814) - ...

[ ] Configuration - custom build of frontend (to not have to maintain a fork we need to get https://github.com/lyft/amundsen/issues/408 transmogrified into proper documentation/tooling)

[x] Small tweaks to turn on/off features, adding logo etc. (mostly Done) https://github.com/lyft/amundsenfrontendlibrary/commit/c256115f7d64da121de4ea36ea9c55592c11f9d5 in PR https://github.com/lyft/amundsenfrontendlibrary/pull/255

[x] Config of email notification/feedback Done in PR https://github.com/lyft/amundsenfrontendlibrary/pull/291

[x] Data preview (integration to SuperSet) - https://github.com/lyft/amundsen/issues/27#issuecomment-517477074 has some draft contextual lead in and reasoning and a link to example setup. But ultimately what ticks off the box for this is Taos Guide in https://github.com/lyft/amundsen/blob/master/docs/tutorials/data-preview-with-superset.md (or on the https://lyft.github.io/amundsen/ site, search for SuperSet!)

[ ] Security

[ ] Auth - passwords etc.

[ ] secure communication

[ ] production grade docker as per Production-ready Docker images (via https://www.youtube.com/watch?v=cDzFm68aMao)

[x] Backup - initial WiP in https://github.com/lyft/amundsen/issues/53#issuecomment-516159598 below ... current result in https://github.com/lyft/amundsen/issues/381#issuecomment-614534794 - and restore (on K8s) implemented in https://github.com/lyft/amundsen/pull/394

[ ] Monitoring (statsd etc.?)

[ ] Handling upgrades

[ ] ....

type:documentation status:needs_votes area:all
opened by jornh 23

Neo4jCsvPublisher Speed Optimization (Parallelism)

Hi Team, I’m wondering if there’s a plan to apply multiprocessing to the publishers. We have a large amount of metadata in production, which ended up running 3 million queries on neo4j. It takes about 90 minutes to finish.

To investigate the bottleneck, I looked into the code and logged the time elapsed for each step of a single iteration in the _publish_node function. This is the result:

Neo4j query: 0.1ms
Create statement: 1ms
Others: super fast, doesn’t matter

Surprisingly, the bottleneck is not the db query, it’s the statement creation. The process is basically

loop each row in csv
parse the row into a dictionary
loop through each key value pair in the dictionary to get the props
fill the statement Jinja template with the props
execute the query with the statement

I’m thinking that instead of reading one row and creating one node in the graph db at a time, maybe we could use multiprocessing to speed up the process. I believe there will be no dependency issue as long as we publish all the nodes before publishing relations, which is already handled in the current codebase. I’m planning on implementing multiprocessing for this. Is there any potential problem, such as dependencies or graph db load?

Expected Behavior or Use Case

Speed up the performance of the publisher. Currently, a 90 min sync is not acceptable for our use case 😢

Service or Ingestion ETL

Ingestion ETL, publisher

Possible Implementation

Thanks to @dkunitsk 's idea, I think there are three possible implementations

Multiprocessing on call side
Multiprocessing on Neo4j publisher
Neo4j UNWIND (Batch processing)

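# Imports the snippet relies on (databuilder module paths as used in its sample scripts)
import multiprocessing

from pyhocon import ConfigFactory

from databuilder.extractor.hive_table_metadata_extractor import HiveTableMetadataExtractor
from databuilder.job.job import DefaultJob
from databuilder.loader.file_system_neo4j_csv_loader import FsNeo4jCSVLoader
from databuilder.publisher import neo4j_csv_publisher
from databuilder.publisher.neo4j_csv_publisher import Neo4jCsvPublisher
from databuilder.task.task import DefaultTask


class HiveParallelIndexer: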
    # Shim for adding all node labels to the NEO4J_DEADLOCK_NODE_LABELS config
    # which enables retries for those node labels. This is important for parallel writing
    # since we see intermittent Neo4j deadlock errors relatively often.
    class ContainsAllList(list):
        def __contains__(self, item):
            return True

    def __init__(self, publish_tag: str, parallelism: int):
        self.publish_tag = publish_tag
        self.parallelism = parallelism

    def __call__(self, worker_index: int):
        # Sharding:
        #   - take the md5 hash of the schema.table_name
        #   - convert the first 3 characters of the hash to decimal (3 chosen arbitrarily)
        #   - mod by total number of processes
        where_clause_suffix = """
            WHERE MOD(CONV(LEFT(MD5(CONCAT(d.NAME, '.', t.TBL_NAME)), 3), 16, 10), {total_parallelism}) = {worker_index}
            AND t.TBL_TYPE IN ('EXTERNAL_TABLE', 'MANAGED_TABLE', 'VIRTUAL_VIEW')
            AND (t.VIEW_EXPANDED_TEXT != '/* Presto View */' OR t.VIEW_EXPANDED_TEXT is NULL)
        """.format(total_parallelism=self.parallelism,
            worker_index=worker_index)

        # configs relevant for multiprocessing
        job_config = ConfigFactory.from_dict({
            'extractor.hive_table_metadata.{}'.format(HiveTableMetadataExtractor.WHERE_CLAUSE_SUFFIX_KEY):
                where_clause_suffix,
            # keeping this relatively low, in our experience, reduces neo4j deadlocks
            'publisher.neo4j.{}'.format(neo4j_csv_publisher.NEO4J_TRANSACTION_SIZE):
                100,
            'publisher.neo4j.{}'.format(neo4j_csv_publisher.NEO4J_DEADLOCK_NODE_LABELS):
                HiveParallelIndexer.ContainsAllList(),
        })
        job = DefaultJob(conf=job_config,
                         task=DefaultTask(
                             extractor=HiveTableMetadataExtractor(),
                             loader=FsNeo4jCSVLoader()),
                         publisher=Neo4jCsvPublisher())
        job.launch()


parallelism = 16
indexer = HiveParallelIndexer(
    publish_tag='2021-12-03',
    parallelism=parallelism)

with multiprocessing.Pool(processes=parallelism) as pool:
    def callback(_):
        # fast fail in case of exception in any process
        print('terminating due to exception')
        pool.terminate()
    res = pool.map_async(indexer, [i for i in range(parallelism)], error_callback=callback)
    res.get()
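The sharding rule in the WHERE clause above (MD5 of schema.table_name, first 3 hex characters converted to decimal, modulo the number of workers) can be mirrored in Python to check how tables are spread across workers. The `shard_for` helper and the sample table names are hypothetical, and the sketch assumes the database hashes the same UTF-8 bytes.

```python
import hashlib
from collections import Counter


def shard_for(schema: str, table: str, total_parallelism: int) -> int:
    """Worker index for a table: int(first 3 hex chars of md5) mod workers."""
    digest = hashlib.md5(f"{schema}.{table}".encode("utf-8")).hexdigest()
    return int(digest[:3], 16) % total_parallelism


tables = [("core", f"table_{i}") for i in range(1000)]
distribution = Counter(shard_for(s, t, 16) for s, t in tables)
print(sorted(distribution.items()))  # tables per worker index
```

Three hex characters give 4096 possible values, so with 16 workers each index covers exactly 256 of them and the split stays even as long as the hashes are well spread.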

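The third option listed above, Neo4j UNWIND, replaces one statement per row with one statement per batch of rows. The sketch below only shows the batching and the statement shape; it does not touch `Neo4jCsvPublisher`, the helper names are hypothetical, and it assumes every row is a flat property map with a `key` field.

```python
from typing import Dict, Iterable, Iterator, List


def batched(rows: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield lists of at most `size` rows, preserving order."""
    batch: List[Dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def unwind_statement(label: str) -> str:
    """Build one Cypher statement that upserts a whole batch of nodes."""
    # Labels cannot be passed as query parameters, so validate before interpolating.
    if not label.isidentifier():
        raise ValueError(f"Unsafe node label: {label!r}")
    return f"UNWIND $rows AS row MERGE (n:{label} {{key: row.key}}) SET n += row"


rows = [{"key": f"hive://gold.core/table_{i}", "name": f"table_{i}"} for i in range(5)]
statement = unwind_statement("Table")
for batch in batched(rows, 2):
    # With a real driver this would be something like session.run(statement, rows=batch)
    print(statement, len(batch))
```

The statement text is built once per label rather than once per row, which targets the statement-creation cost measured above.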
Screenshots of Slack Discussion

type:feature status:needs_votes area:databuilder

opened by chonyy 22

Feature Proposal: Editable Custom Table Attributes
Users of Amundsen occasionally want to display table level attributes on the table detail page that are specific to their business or data source. Unlike programmatic descriptions, they also want to be able to edit this information directly in the UI as these attributes are typically human-generated.

Some examples include:

retention policy

data usage policy

Expected Behavior or Use Case

Users can display and edit additional custom attributes in the table detail page.

Service or Ingestion ETL

frontend and metadata services

Possible Implementation

Define additional custom table attributes via configuration in the frontend. The custom table attributes are then displayed in the table detail page using the EditableText component. Custom attributes are persisted to the graph using the metadata service and a new PATCH table API endpoint.
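As a rough illustration of the flow described above, the sketch below keeps a configured set of custom attributes and applies a PATCH-style update to a table's stored values. The attribute keys, the `apply_patch` helper and the dictionary storage are hypothetical; the actual proposal places the configuration in the frontend and persistence in the metadata service.

```python
from typing import Dict

# Hypothetical configuration: attribute key -> label shown in the UI.
CUSTOM_TABLE_ATTRIBUTES: Dict[str, str] = {
    "retention_policy": "Retention policy",
    "data_usage_policy": "Data usage policy",
}


def apply_patch(stored: Dict[str, str], patch: Dict[str, str]) -> Dict[str, str]:
    """Return the stored attributes with the patch applied.

    Only configured attributes may be edited; anything else is rejected.
    """
    unknown = sorted(set(patch) - set(CUSTOM_TABLE_ATTRIBUTES))
    if unknown:
        raise ValueError(f"Unknown custom attributes: {unknown}")
    updated = dict(stored)  # leave the caller's dictionary untouched
    updated.update(patch)
    return updated


stored = {"retention_policy": "90 days"}
print(apply_patch(stored, {"data_usage_policy": "internal only"}))
# prints {'retention_policy': '90 days', 'data_usage_policy': 'internal only'}
```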

Example Screenshots (if appropriate):

Context

This would allow users to add human-generated business-specific metadata to Amundsen and maintain it directly in the UI.
opened by jkulzick 22
Receive error "Something went wrong..." when uploading data from a PostgreSQL database
I have uploaded information about my PostgreSQL tables. When I try to view information about the tables, I see the error "Something went wrong...".

Expected Behavior

I want to see metadata information about tables.

Current Behavior

I can see that the upload was successful because information about the tables is available in the neo4j store.

I used the docker-amundsen.yml script for deployment. In the amundsenfrontend container logs I receive this error:

2021-03-14T13:11:33+0000.463 [ERROR] v0._get_table_metadata:149 (11:MainThread) - Encountered exception: {'columns': {0: {'col_type': ['Field may not be null.']}, 1: {'col_type': ['Field may not be null.']}}}
Traceback (most recent call last):
  File "/app/amundsen_application/api/metadata/v0.py", line 143, in _get_table_metadata
    results_dict['tableData'] = marshall_table_full(table_data_raw)
  File "/app/amundsen_application/api/utils/metadata_utils.py", line 109, in marshall_table_full
    table: Table = schema.load(table_dict).data
  File "/usr/local/lib/python3.7/site-packages/marshmallow/schema.py", line 588, in load
    result, errors = self._do_load(data, many, partial=partial, postprocess=True)
  File "/usr/local/lib/python3.7/site-packages/marshmallow/schema.py", line 711, in _do_load
    raise exc
marshmallow.exceptions.ValidationError: {'columns': {0: {'col_type': ['Field may not be null.']}, 1: {'col_type': ['Field may not be null.']}}}
2021-03-14T13:11:33+0000.463 [DEBUG] action_log_callback.on_post_execution:70 (11:MainThread) - Calling callbacks: [<function logging_action_log at 0x7f26506bf8c0>]
2021-03-14T13:11:33+0000.464 [DEBUG] action_log_callback.logging_action_log:85 (11:MainThread) - logging_action_log: ActionLogParams(command='_get_table_metadata', start_epoch_ms=1615727493322, end_epoch_ms=1615727493463, user='[email protected]', host_name='ee9f995f6b9d', pos_args_json='[]', keyword_args_json='{"table_key": "postgres://ioekgftt.public/demo", "index": "0", "source": "search_results"}', output='{"tableData": {}, "msg": "Encountered exception: {\'columns\': {0: {\'col_type\': [\'Field may not be null.\']}, 1: {\'col_type\': [\'Field may not be null.\']}}}", "status_code": 500}', error=None)
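The validation error above says that the columns at index 0 and 1 reached the frontend with a null col_type. A small check like the one below, run against the table payload before it is loaded into the schema, can point to the offending columns. The helper and the payload shape are assumptions for illustration; only the field names `columns` and `col_type` come from the error message.

```python
from typing import Any, Dict, List


def columns_missing_type(table: Dict[str, Any]) -> List[int]:
    """Return the indexes of columns whose col_type is missing or null."""
    return [
        index
        for index, column in enumerate(table.get("columns", []))
        if column.get("col_type") is None
    ]


# Hypothetical payload shaped like the failing response.
table_payload = {
    "name": "demo",
    "columns": [
        {"name": "id", "col_type": None},
        {"name": "title", "col_type": None},
    ],
}
print(columns_missing_type(table_payload))  # [0, 1]
```

If the indexes match the ones in the error, the likely place to look is the ingestion step that should have written the column types, rather than the frontend.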

Steps to Reproduce

Deploy amundsen on docker

Use sample_postgres_loader.py to upload data

Your Environment

Amundsen version used: amundsen-frontend:3.1.0, amundsen-search:2.4.1, amundsen-metadata:3.3.0

Data warehouse stores: postgresql

Deployment (k8s or native): native(docker)
opened by Arkronus 21
feat: support table/column lineage for mysql backend
Summary of Changes

This change relates to issue #2072. It adds end-to-end support for table/column lineage with the MySQL backend.

databuilder: updated the table_lineage model for the table/column lineage record iterator, along with the corresponding sample_data_loader_mysql.py.

metadata_service: updated mysql_proxy.py to add the lineage-related endpoint.

Tests

Added unit tests in both databuilder and metadata_service for the lineage-related changes.

Documentation

N/A

CheckList

Make sure you have checked all steps below to ensure a timely review.

[X] PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"

In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

[X] PR includes a summary of changes.

[X] PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.

[X] In case of new functionality, my PR adds documentation that describes how to use it.

All the public functions and the classes in the PR contain docstrings that explain what it does

area:databuilder area:metadata category:models category:proxy
opened by xuan616 0
feat: Make default depth configurable for table lineage graph view
Summary of Changes

For certain deployments the graph view can get unreadable when it displays nodes at too great a depth. The app hardcodes the table lineage graph depth to 5 currently. This PR makes that value configurable using the config-types pattern that the rest of the front end uses.

There is some discussion about the feature in a previous PR that I botched: https://github.com/amundsen-io/amundsen/pull/2069
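
The idea in this summary, replacing a hardcoded depth with a configurable default, can be sketched as follows. The actual change lives in the TypeScript front-end config; this Python sketch only illustrates the pattern, and `LineageConfig`, `default_graph_depth`, and `get_default_graph_depth` are hypothetical names rather than identifiers from the PR.

```python
from dataclasses import dataclass
from typing import Optional

HARDCODED_DEPTH = 5  # the value the summary says was hardcoded


@dataclass(frozen=True)
class LineageConfig:
    """Hypothetical config holder; None means 'use the built-in default'."""
    default_graph_depth: Optional[int] = None


def get_default_graph_depth(config: LineageConfig) -> int:
    """Return the configured depth, falling back to the previous constant."""
    depth = config.default_graph_depth
    if depth is None:
        return HARDCODED_DEPTH
    if depth < 1:
        raise ValueError("graph depth must be at least 1")
    return depth


print(get_default_graph_depth(LineageConfig()))
print(get_default_graph_depth(LineageConfig(default_graph_depth=2)))
```

Keeping the old constant as the fallback means deployments that do not set the option should see no change in behavior.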

Tests

Added a unit test for the function that gets the default graph depth

Documentation

CheckList

Make sure you have checked all steps below to ensure a timely review.

[x] PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"

In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

[x] PR includes a summary of changes.

[x] PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.

[ ] In case of new functionality, my PR adds documentation that describes how to use it.

All the public functions and the classes in the PR contain docstrings that explain what it does

status:completed area:frontend category:ui
opened by jsnb-devoted 10
Changed Ports for Gunicorn Command
fix: the gunicorn start command used the same port in all three places. Changed them to the three different ports configured in the ports section.

Summary of Changes

Changed the commands to use three different ports.

Tests

Documentation

CheckList

Make sure you have checked all steps below to ensure a timely review.

[x] PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"

In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

[x] PR includes a summary of changes.

[x] PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.

[x] In case of new functionality, my PR adds documentation that describes how to use it.

All the public functions and the classes in the PR contain docstrings that explain what it does

area:dev-tools
opened by cppandi 2
Updated MSSQL databuilder connection string
Summary of Changes

Updated the MSSQL databuilder connection string, since the existing way of writing the SQL connection string with SQLAlchemy does not work with the latest version of SQL Server. After spending a few hours, I found this way to build the connection string for MSSQL. I have mainly tested it with Azure MSSQL using SQL Server authentication and was able to index the schema into Neo4j.

Changes were made in this file: https://github.com/amundsen-io/amundsen/blob/main/databuilder/example/scripts/sample_mssql_metadata.py
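
The summary does not show the new connection string itself. One commonly used way to hand a full ODBC connection string to SQLAlchemy is the `mssql+pyodbc:///?odbc_connect=` URL form, with the ODBC string URL-encoded. The sketch below assumes that approach; the function name, driver name, server, database, and credentials are placeholders, not values taken from the PR.

```python
from urllib.parse import quote_plus


def mssql_connection_string(server: str, database: str, user: str,
                            password: str,
                            driver: str = "ODBC Driver 17 for SQL Server") -> str:
    """Build a SQLAlchemy URL that wraps a raw ODBC connection string."""
    odbc = (
        f"DRIVER={{{driver}}};"
        f"SERVER={server};"
        f"DATABASE={database};"
        f"UID={user};"
        f"PWD={password}"
    )
    # quote_plus escapes characters such as ';', '=', '{' and spaces.
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc)


print(mssql_connection_string("example.database.windows.net", "test_db",
                              "sa_user", "p@ss word"))
```

The resulting string could then be supplied wherever the sample script passes its SQLAlchemy connection string, provided pyodbc and a matching ODBC driver are installed.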

Tests

Documentation

CheckList

Make sure you have checked all steps below to ensure a timely review.

[ ] PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"

In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

[ ] PR includes a summary of changes.

[ ] PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.

[ ] In case of new functionality, my PR adds documentation that describes how to use it.

All the public functions and the classes in the PR contain docstrings that explain what it does

status:in_progress area:databuilder
opened by akumarseth 2

fix: issue 2066 - updated sample MySQL data loader file.

fixes: #2066

Summary of Changes

Changes made to: https://github.com/amundsen-io/amundsen/blob/main/databuilder/example/scripts/sample_mysql_loader.py

import pymysql
# Let pymysql stand in for the MySQLdb module.
pymysql.install_as_MySQLdb()

def connection_string():
    # Connection details for a locally hosted MySQL instance.
    user = 'root'
    password = 'root'
    host = 'localhost'
    port = '3307'
    db = 'test_db'
    return "mysql+pymysql://%s:%s@%s:%s/%s" % (user, password, host, port, db)

job_config = ConfigFactory.from_dict({
        f'extractor.mysql_metadata.{MysqlMetadataExtractor.WHERE_CLAUSE_SUFFIX_KEY}': where_clause_suffix,
        f'extractor.mysql_metadata.{MysqlMetadataExtractor.USE_CATALOG_AS_CLUSTER_NAME}': True,
        f'extractor.mysql_metadata.extractor.sqlalchemy.{SQLAlchemyExtractor.CONN_STRING}': connection_string(),
        f'loader.filesystem_csv_neo4j.{FsNeo4jCSVLoader.NODE_DIR_PATH}': node_files_folder,
        f'loader.filesystem_csv_neo4j.{FsNeo4jCSVLoader.RELATION_DIR_PATH}': relationship_files_folder,
        f'publisher.neo4j.{neo4j_csv_publisher.NODE_FILES_DIR}': node_files_folder,
        f'publisher.neo4j.{neo4j_csv_publisher.RELATION_FILES_DIR}': relationship_files_folder,
        f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_END_POINT_KEY}': neo4j_endpoint,
        f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_USER}': neo4j_user,
        f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_PASSWORD}': neo4j_password,
        f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_ENCRYPTED}': False,
        f'publisher.neo4j.{neo4j_csv_publisher.JOB_PUBLISH_TAG}': 'unique_tag',  # should use unique tag here like {ds}
    })
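
The `connection_string()` helper interpolates the password directly into the URL, which works for `root` but can break if the password contains characters such as `@` or `/`. A small sketch, assuming the same `mysql+pymysql` URL layout, that escapes the credentials with the standard library (`build_mysql_url` is a hypothetical name, not part of the sample script):

```python
from urllib.parse import quote_plus


def build_mysql_url(user: str, password: str, host: str, port: int, db: str) -> str:
    """Return a mysql+pymysql URL with the user and password URL-escaped."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db}"
    )


print(build_mysql_url("root", "root", "localhost", 3307, "test_db"))
print(build_mysql_url("root", "p@ss/word", "localhost", 3307, "test_db"))
```

For the credentials used in this PR the output is identical to the original helper; the escaping only matters once special characters appear.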

Changes made to: https://github.com/amundsen-io/amundsen/blob/main/databuilder/setup.py

requirements_path = os.path.join(os.path.dirname(os.path.realpath(__file__)),
                                 '../requirements.txt')

CheckList

Make sure you have checked all steps below to ensure a timely review.

[x] PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
- In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.
[x] PR includes a summary of changes.
[ ] PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
[ ] In case of new functionality, my PR adds documentation that describes how to use it.
- All the public functions and the classes in the PR contain docstrings that explain what it does

area:databuilder

opened by MalavikaN1 2

Amundsen is unable to import MYSQL data

Expected Behavior

I changed the connection string in https://github.com/amundsen-io/amundsen/blob/main/databuilder/example/scripts/sample_mysql_loader.py to load locally hosted MySQL data into Amundsen. Changes made in the above file:

import pymysql
# Let pymysql stand in for the MySQLdb module.
pymysql.install_as_MySQLdb()

def connection_string():
    # Connection details for a locally hosted MySQL instance.
    user = 'root'
    password = 'root'
    host = 'localhost'
    port = '3307'
    db = 'test_db'
    return "mysql+pymysql://%s:%s@%s:%s/%s" % (user, password, host, port, db)

Current Behavior

While running the python file, I get the following error: ERROR:neo4j:Failed to write data to connection IPv4Address(('127.0.0.1', 7687)) (IPv4Address(('127.0.0.1', 7687))).

I tried loading the sample data by running the https://github.com/amundsen-io/amundsen/blob/main/databuilder/example/scripts/sample_data_loader.py file and it worked.

Possible Solution

Fix: adding the line below to job_config in sample_mysql_loader.py fixed the issue.

f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_ENCRYPTED}': False

So now the code looks like this:

job_config = ConfigFactory.from_dict({
        f'extractor.mysql_metadata.{MysqlMetadataExtractor.WHERE_CLAUSE_SUFFIX_KEY}': where_clause_suffix,
        f'extractor.mysql_metadata.{MysqlMetadataExtractor.USE_CATALOG_AS_CLUSTER_NAME}': True,
        f'extractor.mysql_metadata.extractor.sqlalchemy.{SQLAlchemyExtractor.CONN_STRING}': connection_string(),
        f'loader.filesystem_csv_neo4j.{FsNeo4jCSVLoader.NODE_DIR_PATH}': node_files_folder,
        f'loader.filesystem_csv_neo4j.{FsNeo4jCSVLoader.RELATION_DIR_PATH}': relationship_files_folder,
        f'publisher.neo4j.{neo4j_csv_publisher.NODE_FILES_DIR}': node_files_folder,
        f'publisher.neo4j.{neo4j_csv_publisher.RELATION_FILES_DIR}': relationship_files_folder,
        f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_END_POINT_KEY}': neo4j_endpoint,
        f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_USER}': neo4j_user,
        f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_PASSWORD}': neo4j_password,
        f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_ENCRYPTED}': False,
        f'publisher.neo4j.{neo4j_csv_publisher.JOB_PUBLISH_TAG}': 'unique_tag',  # should use unique tag here like {ds}
    })
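
Because the reported error is a failed write to `127.0.0.1:7687`, it can help to rule out a plain connectivity problem before changing the encryption setting. A minimal sketch using only the standard library; `bolt_port_reachable` is a hypothetical helper, and a successful TCP connect says nothing about whether the server expects TLS.

```python
import socket


def bolt_port_reachable(host: str = "127.0.0.1", port: int = 7687,
                        timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(bolt_port_reachable())
```

If the port is reachable and the publisher still fails to write, a mismatch between the driver's encryption setting and the server configuration, as addressed by the `NEO4J_ENCRYPTED` line above, becomes the more likely cause.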

Your Environment

Amundsen databuilder version used: 7.4.3
Deployment (k8s or native): native
Link to your fork or repository: (https://github.com/MalavikaN1/amundsen)

type:bug type:question status:needs_triage area:databuilder

opened by MalavikaN1 2

Releases(databuilder-7.4.3)

databuilder-7.4.3(Dec 16, 2022)
What's Changed

feat: add configurable message to lineage tabs by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/2038

feat: tweaks styling of Alerts by @Golodhros in https://github.com/amundsen-io/amundsen/pull/2043

fix: add parenthesis to upstream tab title by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/2046

feat: extends resource notices to support extra information by @Golodhros in https://github.com/amundsen-io/amundsen/pull/2045

refactor - move homepage components in preparation implementation of configurable widgets. by @B-T-D in https://github.com/amundsen-io/amundsen/pull/2041

Fix: Corrected position of arguments in _par by @loojovi in https://github.com/amundsen-io/amundsen/pull/2037

Fixes UI crashing on "search page" if we multiple filters with the same category are added (issue #2053) by @mikaalanwar in https://github.com/amundsen-io/amundsen/pull/2057

Chore: Bump databuilder version to 7.4.3 by @sahithi03 in https://github.com/amundsen-io/amundsen/pull/2056

New Contributors

@loojovi made their first contribution in https://github.com/amundsen-io/amundsen/pull/2037

Full Changelog: https://github.com/amundsen-io/amundsen/compare/common-0.30.0...databuilder-7.4.3
common-0.30.0(Nov 29, 2022)
What's Changed

fix: better styling for disabled items by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/2014

fix: adds loading spinners to table lineage tabs by @Golodhros in https://github.com/amundsen-io/amundsen/pull/2016

fix: fixes storybook installation by @Golodhros in https://github.com/amundsen-io/amundsen/pull/2017

fix: fixes Collapse text button overlapping lineage tabs by @Golodhros in https://github.com/amundsen-io/amundsen/pull/2019

fix: fixes cached lineage list content by @Golodhros in https://github.com/amundsen-io/amundsen/pull/2020

fix: fixes scrolling issue after tab changes by @Golodhros in https://github.com/amundsen-io/amundsen/pull/2021

chore: updates logging to cover tour and feedback widget by @Golodhros in https://github.com/amundsen-io/amundsen/pull/2024

chore: support amundsen-rds == 0.0.7 for mysql_proxy in metadata_service by @xuan616 in https://github.com/amundsen-io/amundsen/pull/2022

fix: Use isColumnLineagePageEnabled by @ozandogrultan in https://github.com/amundsen-io/amundsen/pull/2012

chore:feedback cleanup by @Golodhros in https://github.com/amundsen-io/amundsen/pull/2027

Fix: Change default value for 'description' in BigQuery_metadata_extractor results by @sahithi03 in https://github.com/amundsen-io/amundsen/pull/2034

feat: Add lineage item counts to lineage response by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/2039

Full Changelog: https://github.com/amundsen-io/amundsen/compare/common-0.29.0...common-0.30.0
common-0.29.0(Oct 18, 2022)
What's Changed

feat: added optional in_amundsen bool to lineage items by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/2010

Full Changelog: https://github.com/amundsen-io/amundsen/compare/common-0.28.0...common-0.29.0
common-0.28.0(Oct 12, 2022)
What's Changed

fix: support capitalized table names by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/2004

feat: use different internal link than the table details page on lineage by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/2006

Full Changelog: https://github.com/amundsen-io/amundsen/compare/databuilder-7.4.2...common-0.28.0
databuilder-7.4.2(Oct 3, 2022)
What's Changed

chore--update amundsen-rds version in databuilder requirements.txt by @B-T-D in https://github.com/amundsen-io/amundsen/pull/2001

Full Changelog: https://github.com/amundsen-io/amundsen/compare/databuilder-7.4.1...databuilder-7.4.2
databuilder-7.4.1(Sep 29, 2022)
What's Changed

Fix toggle filter and styling; update tests by @B-T-D in https://github.com/amundsen-io/amundsen/pull/1995

docs: Fix typo on BigQuery's instructions by @LieAlbertTriAdrian in https://github.com/amundsen-io/amundsen/pull/1997

chore: remove sqlalchemy dependency upper bound by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/2000

New Contributors

@B-T-D made their first contribution in https://github.com/amundsen-io/amundsen/pull/1995

@LieAlbertTriAdrian made their first contribution in https://github.com/amundsen-io/amundsen/pull/1997

Full Changelog: https://github.com/amundsen-io/amundsen/compare/databuilder-7.4.0...databuilder-7.4.1
databuilder-7.4.0(Sep 19, 2022)
What's Changed

fix: more logging for badges by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1991

feat: Add configurable prop types to neo4j csv publisher by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1993

Full Changelog: https://github.com/amundsen-io/amundsen/compare/databuilder-7.3.0...databuilder-7.4.0
databuilder-7.3.0(Sep 13, 2022)
What's Changed

fix: no reason to raise a 404 when a user has no bookmarks or reads by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1964

fix: Dashboard User relationships raised 404s also by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1965

Feat/kafka schema registry integration by @farbodahm in https://github.com/amundsen-io/amundsen/pull/1959

Get the first item from the healthcheck response by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1968

feat: Extend Lineage list view configuration by @ozandogrultan in https://github.com/amundsen-io/amundsen/pull/1961

chore: Update 'Who uses Amundsen?' by @ozandogrultan in https://github.com/amundsen-io/amundsen/pull/1973

fix: Fix hideNonClickableBadges configuration by @ozandogrultan in https://github.com/amundsen-io/amundsen/pull/1974

fix: Correct sharded table prefix extraction in Bigquery Usage Extractor by @sahithi03 in https://github.com/amundsen-io/amundsen/pull/1980

feat-use-retryable-query-executor by @Owen-LCH in https://github.com/amundsen-io/amundsen/pull/1941

fix: add f to search filter logging by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1984

feat: Allow null values set for empty props in neo4j unwind publisher and multiple rels between nodes by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1983

chore: Bump databuilder version to 7.3.0 by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1985

New Contributors

@farbodahm made their first contribution in https://github.com/amundsen-io/amundsen/pull/1959

Full Changelog: https://github.com/amundsen-io/amundsen/compare/databuilder-7.2.1...databuilder-7.3.0
databuilder-7.2.1(Aug 17, 2022)
What's Changed

feat: Neo4j driver 4.4.5 on metadata by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1952

fix: For new publisher fix error handling for already created constraints/indices by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1963

Full Changelog: https://github.com/amundsen-io/amundsen/compare/databuilder-7.2.0...databuilder-7.2.1
databuilder-7.2.0(Aug 16, 2022)
What's Changed

fix: Add postgres compatibility in HiveTableLastUpdatedExtractor by @chonyy in https://github.com/amundsen-io/amundsen/pull/1879

fix: Add postgres compatibility in PrestoViewMetadataExtractor by @chonyy in https://github.com/amundsen-io/amundsen/pull/1878

fix: Fix handling of big ints in preview data by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1950

fix: Fix column description overflow-y value by @ozandogrultan in https://github.com/amundsen-io/amundsen/pull/1937

feat: Add config to hide non-clickable badges by @ozandogrultan in https://github.com/amundsen-io/amundsen/pull/1943

fix: Don't retrieve column lineage when it is not enabled by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1956

fix: Fix links in Announcements by @ozandogrultan in https://github.com/amundsen-io/amundsen/pull/1934

feat: Add optional configuration to disable Lineage list view links by @ozandogrultan in https://github.com/amundsen-io/amundsen/pull/1958

perf: New neo4j csv publisher to improve performance using batched params by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1957

Full Changelog: https://github.com/amundsen-io/amundsen/compare/databuilder-7.1.2...databuilder-7.2.0
databuilder-7.1.2(Aug 1, 2022)
What's Changed

fix: session db name by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1948

Full Changelog: https://github.com/amundsen-io/amundsen/compare/databuilder-7.1.1...databuilder-7.1.2
databuilder-7.1.1(Jul 28, 2022)
What's Changed

fix: driver object pickle error by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1944

Full Changelog: https://github.com/amundsen-io/amundsen/compare/databuilder-7.1.0...databuilder-7.1.1
databuilder-7.1.0(Jul 27, 2022)
What's Changed

feat: Exclude stats icon if configured stat types are the only ones present by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1939

fix: Show the column in the center of the table when navigating to a column link by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1940

feat: Neo4j 4.x support by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1942

Full Changelog: https://github.com/amundsen-io/amundsen/compare/databuilder-7.0.0...databuilder-7.1.0
databuilder-7.0.0(Jul 21, 2022)
THIS RELEASE IS BACKWARDS INCOMPATIBLE FOR ANYONE USING NEO4J DB < 3.5

What's Changed

chore: migrate databuilder to neo4j-driver 4.4.5 by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1938

Full Changelog: https://github.com/amundsen-io/amundsen/compare/metadata-3.11.0...databuilder-7.0.0
metadata-3.11.0(Jul 12, 2022)
What's Changed

feat: add get_dashbaord support for neptune by @Owen-LCH in https://github.com/amundsen-io/amundsen/pull/1927

fix: Fix nested UI for eventbridge metadata by @kahrabian in https://github.com/amundsen-io/amundsen/pull/1912

Full Changelog: https://github.com/amundsen-io/amundsen/compare/databuilder-6.12.0...metadata-3.11.0
databuilder-6.12.0(Jul 7, 2022)
What's Changed

fix: Change TabsComponent styling to only be sticky in certain cases by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1925

feat: Adding new Trino type parser and other type metadata updates by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1917

Full Changelog: https://github.com/amundsen-io/amundsen/compare/databuilder-6.11.1...databuilder-6.12.0
databuilder-6.11.1(Jul 6, 2022)
What's Changed

feat: Sticky TabsComponent and Table headers by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1924

feat: add get_lineage support for neptune backend by @Owen-LCH in https://github.com/amundsen-io/amundsen/pull/1915

fix: Update bounds for databuilder google-auth versions by @ozandogrultan in https://github.com/amundsen-io/amundsen/pull/1918

New Contributors

@Owen-LCH made their first contribution in https://github.com/amundsen-io/amundsen/pull/1915

Full Changelog: https://github.com/amundsen-io/amundsen/compare/databuilder-6.11.0...databuilder-6.11.1
databuilder-6.11.0(Jul 5, 2022)
What's Changed

refactor: Remove all old frontend parsing for nested columns by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1919

docs: Improves the frontend documentation on announcements by @xfiderek in https://github.com/amundsen-io/amundsen/pull/1921

feat: Extract search results per page into a config variable by @ozandogrultan in https://github.com/amundsen-io/amundsen/pull/1922

feat: added ngram subfield with no stemming on ES mappings by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1895

chore: bumped databuilder to 6.11.0 by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1923

New Contributors

@xfiderek made their first contribution in https://github.com/amundsen-io/amundsen/pull/1921

Full Changelog: https://github.com/amundsen-io/amundsen/compare/databuilder-6.10.0...databuilder-6.11.0
databuilder-6.10.0(Jun 30, 2022)

databuilder-6.9.0(Jun 23, 2022)
What's Changed

feat: added addition fields config to publisher by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1898

Full Changelog: https://github.com/amundsen-io/amundsen/compare/frontend-4.2.0...databuilder-6.9.0
frontend-4.2.0(Jun 22, 2022)
What's Changed

chore: updated company list in readme by @xuan616 in https://github.com/amundsen-io/amundsen/pull/1863

docs: Update docs for Windows workaround to solve databuilder extras_require error. by @alanmcruickshank in https://github.com/amundsen-io/amundsen/pull/1861

refactor: Refactor various column details and add TypeMetadata to TableColumn model by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1864

feat: Adding nested columns to be displayed in the column dropdowns as rows by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1865

feat: Allow dangerous html based on config variable by @MrwanBaghdad in https://github.com/amundsen-io/amundsen/pull/1459

chore: updated search readme by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1859

feat: Nested columns special type rows and expand by default by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1872

feat: Use type metadata description get/update apis by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1876

feat: Search Service Highlighting by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1856

feat: search highlighting UI by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1850

fix: typos in search proxy impl by @mgorsk1 in https://github.com/amundsen-io/amundsen/pull/1884

fix: Highlighting styling improvement by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1887

test: Fix databuilder PR unit test import error by @chonyy in https://github.com/amundsen-io/amundsen/pull/1891

feat: Adding expand all/collapse all functionality for nested columns by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1888

fix: relax pandas version requirements for databuilder by @henridwyer in https://github.com/amundsen-io/amundsen/pull/1858

feat: Add clickable rows to table detail page and new expand/collapse arrow icons by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1897

fix: Various fixes to nested columns based on feedback by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1901

fix: Updating tour feature to store wildcard path instead of the url path by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1904

fix: Fixing markdown for truncated column descriptions in the table by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1905

fix: Handle qualified tableau datasources more gracefully by @alanmcruickshank in https://github.com/amundsen-io/amundsen/pull/1869

feat: Amazon EventBridge Extractor by @kahrabian in https://github.com/amundsen-io/amundsen/pull/1881

fix: Fixing column description markdown to handle more multiline cases by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1906

feat: Enable new nested columns by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1907

New Contributors

@alanmcruickshank made their first contribution in https://github.com/amundsen-io/amundsen/pull/1861

@chonyy made their first contribution in https://github.com/amundsen-io/amundsen/pull/1891

@henridwyer made their first contribution in https://github.com/amundsen-io/amundsen/pull/1858

@kahrabian made their first contribution in https://github.com/amundsen-io/amundsen/pull/1881

Full Changelog: https://github.com/amundsen-io/amundsen/compare/frontend-4.1.2...frontend-4.2.0
search-4.0.2(May 16, 2022)
What's Changed

fix: toggle filter should clear when off by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1848

refactor: Refactor various column details and add TypeMetadata to TableColumn model by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1847

fix: fixes tour not resetting on different pages by @Golodhros in https://github.com/amundsen-io/amundsen/pull/1849

fix: better behavior for search filters by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1852

fix: fixes state sharing between tours of different pages by @Golodhros in https://github.com/amundsen-io/amundsen/pull/1854

fix: avoid extra load url search call when default filters are applied by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1853

feat: enable UI config overrides post build by @rajasekharreddyparvatha in https://github.com/amundsen-io/amundsen/pull/1830

feat: update search service to use new search mappings by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1832

fix: fix misaligned source icon by @youngyjd in https://github.com/amundsen-io/amundsen/pull/1855

chore: bump release versions for search and frontend by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1862

New Contributors

@rajasekharreddyparvatha made their first contribution in https://github.com/amundsen-io/amundsen/pull/1830

Full Changelog: https://github.com/amundsen-io/amundsen/compare/databuilder-6.8.0...search-4.0.2
frontend-4.1.2(May 16, 2022)
What's Changed

fix: toggle filter should clear when off by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1848

refactor: Refactor various column details and add TypeMetadata to TableColumn model by @kristenarmes in https://github.com/amundsen-io/amundsen/pull/1847

fix: fixes tour not resetting on different pages by @Golodhros in https://github.com/amundsen-io/amundsen/pull/1849

fix: better behavior for search filters by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1852

fix: fixes state sharing between tours of different pages by @Golodhros in https://github.com/amundsen-io/amundsen/pull/1854

fix: avoid extra load url search call when default filters are applied by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1853

feat: enable UI config overrides post build by @rajasekharreddyparvatha in https://github.com/amundsen-io/amundsen/pull/1830

feat: update search service to use new search mappings by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1832

fix: fix misaligned source icon by @youngyjd in https://github.com/amundsen-io/amundsen/pull/1855

chore: bump release versions for search and frontend by @allisonsuarez in https://github.com/amundsen-io/amundsen/pull/1862

New Contributors

@rajasekharreddyparvatha made their first contribution in https://github.com/amundsen-io/amundsen/pull/1830

Full Changelog: https://github.com/amundsen-io/amundsen/compare/databuilder-6.8.0...frontend-4.1.2
databuilder-6.8.0(May 2, 2022)

common-0.27.1(Apr 27, 2022)

common-0.27.0(Apr 26, 2022)

databuilder-6.7.5(Apr 25, 2022)

common-0.26.2(Mar 29, 2022)

Fix naming of Type_Metadata resource type
common-0.26.1(Mar 29, 2022)

Adds TypeMetadata as a possible ResourceType to allow implementing metadata service endpoints like put_resource_description which take a ResourceType.
databuilder-6.7.4(Mar 18, 2022)

Minor cleanups in TypeMetadata utils.
