A Critical Assessment of State-of-the-Art in Entity Alignment

Badges: arXiv · Python 3.8 · PyTorch · License: MIT

This repository contains the source code for the paper

A Critical Assessment of State-of-the-Art in Entity Alignment
Max Berrendorf, Ludwig Wacker, and Evgeniy Faerman
https://arxiv.org/abs/2010.16314

Installation

Set up and activate a virtual environment:

python3.8 -m venv ./venv
source ./venv/bin/activate

Install requirements (in this virtual environment):

pip install -U pip
pip install -U -r requirements.txt

In order to run the DGMC scripts, you additionally need to set up its requirements as described in the corresponding GitHub repository's README. We do not include them in requirements.txt, since their installation is more involved and includes non-Python dependencies.

Preparation

MLflow

To track results with an MLflow server, first start it by running

mlflow server

Note: When storing results for many configurations, we recommend setting up a database backend following the instructions. For the following examples, we assume that the server is running at

TRACKING_URI=http://localhost:5000
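As a rough illustration of how results end up in this server, the following minimal Python sketch logs a placeholder run against the URI above via the standard MLflow client API. The experiment name, parameter, and metric are illustrative only, not the values used by the scripts in this repository:

# Minimal sketch: log a placeholder run to the MLflow server started above.
# "ea-sota-comparison", "model", and "hits_at_1" are illustrative names only.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("ea-sota-comparison")

with mlflow.start_run():
    mlflow.log_param("model", "gcn_align")
    mlflow.log_metric("hits_at_1", 0.42)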

OpenEA RDGCN embeddings


Please download the RDGCN embeddings extracted with the OpenEA codebase from here and place them in ~/.kgm/openea_rdgcn_embeddings. Their file names match the pattern *_*_15K_V2.pt, and they require around 160 MiB of storage in total.
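To verify that the download is in place, a single file can be loaded with plain PyTorch. The sketch below uses a hypothetical file name matching the pattern; the exact contents of the files are an assumption and may differ:

# Sketch: inspect one of the downloaded embedding files.
# "EN_DE_15K_V2.pt" is a hypothetical instance of the *_*_15K_V2.pt pattern.
from pathlib import Path
import torch

path = Path.home() / ".kgm" / "openea_rdgcn_embeddings" / "EN_DE_15K_V2.pt"
data = torch.load(path, map_location="cpu")
print(type(data))
# If the file holds a single tensor, print its shape (e.g. num_entities x dim).
if torch.is_tensor(data):
    print(data.shape)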

BERT initialization


To generate data for the BERT-based initialization, run

(venv) PYTHONPATH=./src python3 executables/prepare_bert.py

We also provide preprocessed files at this URL. If you prefer to use those, please download them and place them in ~/.kgm/bert_prepared. Their file names match *_bert-base-multilingual-cased_*, and they require around 6.1 GiB of storage in total.
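For intuition, the BERT-based initialization amounts to encoding entity labels with bert-base-multilingual-cased. The following sketch shows the general idea using the Hugging Face transformers API; it is not the code in executables/prepare_bert.py, and the pooling choice ([CLS] token) is an assumption:

# Sketch: encode a few entity labels with multilingual BERT.
# Labels and pooling strategy are illustrative; prepare_bert.py handles the
# actual datasets and writes its output to ~/.kgm/bert_prepared.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

labels = ["Ludwig-Maximilians-Universität München", "University of Munich"]
batch = tokenizer(labels, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    output = model(**batch)
label_embeddings = output.last_hidden_state[:, 0]  # [CLS] representation
print(label_embeddings.shape)  # (2, 768)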

Experiments

For all experiments, the results are logged to the running MLflow instance.

Note: The hyperparameter searches take a significant amount of time (multiple days) and require access to GPU(s). You can abort the script at any time and inspect the current results via the MLflow web interface.
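Besides the web interface, partial results can also be fetched programmatically. The following minimal sketch uses the public mlflow.search_runs API; the experiment id is a placeholder (the actual ids are shown in the MLflow UI):

# Sketch: pull the runs logged so far into a pandas DataFrame.
# The experiment id "1" is a placeholder; look it up in the MLflow UI.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
runs = mlflow.search_runs(experiment_ids=["1"])
print(runs[["run_id", "status", "start_time"]].head())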

Zero-Shot

For the zero-shot evaluation, run

(venv) PYTHONPATH=./src python3 executables/zero_shot.py --tracking_uri=${TRACKING_URI} 
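Conceptually, zero-shot evaluation matches the entities of two graphs by nearest-neighbour search directly on the (pre-trained) embeddings, without any alignment training. The sketch below is a simplified illustration of that idea with random data and cosine similarity; it is not the logic of executables/zero_shot.py:

# Conceptual sketch: Hits@1 for nearest-neighbour matching of two embedding sets.
# left[i] and right[i] are assumed to embed the i-th aligned entity pair.
import torch

def hits_at_1(left: torch.Tensor, right: torch.Tensor) -> float:
    left = torch.nn.functional.normalize(left, dim=-1)
    right = torch.nn.functional.normalize(right, dim=-1)
    scores = left @ right.t()  # pairwise cosine similarities
    prediction = scores.argmax(dim=-1)
    target = torch.arange(left.shape[0])
    return (prediction == target).float().mean().item()

# With random embeddings, Hits@1 is close to 1 / num_entities.
print(hits_at_1(torch.randn(100, 300), torch.randn(100, 300)))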

GCN-Align

To run the hyperparameter search, run

(venv) PYTHONPATH=./src python3 executables/tune_gcn_align.py --tracking_uri=${TRACKING_URI} 

RDGCN

To run the hyperparameter search, run

(venv) PYTHONPATH=./src python3 executables/tune_rdgcn.py --tracking_uri=${TRACKING_URI} 

DGMC

To run the hyperparameter search, run

(venv) PYTHONPATH=./src python3 executables/tune_dgmc.py  --tracking_uri=${TRACKING_URI} 

Evaluation

To summarize the dataset statistics, run

(venv) PYTHONPATH=./src python3 executables/summarize.py --target datasets --force

To summarize all experiments, run

(venv) PYTHONPATH=./src python3 executables/summarize.py --target results --tracking_uri=${TRACKING_URI} --force

To generate the ablation study table, run

(venv) PYTHONPATH=./src python3 executables/summarize.py --target ablation --tracking_uri=${TRACKING_URI} --force