Grounding Representation Similarity with Statistical Testing

Overview

Grounding Representation Similarity with Statistical Testing

This repo contains code to replicate the results in our paper, which evaluates representation similarity measures with a series of benchmark tasks. The experiments in the paper require first computing neural network embeddings of a dataset and computing accuracy scores of that neural network, which we provide pre-computed. This repo contains the code that implements our benchmark evaluation, given these embeddings and performance scores.

File descriptions

This repo: sim_metric

This repo is organized as follows:

  • experiments/ contains code to run the experiments in part 4 of the paper:
    • layer_exp is the first experiment in part 4, with different random seeds and layer depths
    • pca_deletion is the second experiment in part 4, with different numbers of principal components deleted
    • feather is the first experiment in part 4.1, with different finetuning seeds
    • pretrain_finetune is the second experiment in part 4.2, with different pretraining and finetuning seeds
  • dists/ contains functions to compute dissimilarities between representations.

Pre-computed resources: sim_metric_resources

The pre-computed embeddings and scores available at https://zenodo.org/record/5117844 can be downloaded and unzipped into a folder titled sim_metric_resources, which is organized as follows:

  • embeddings contains the embeddings between which we are computing dissimilarities
  • dists contains, for every experiment, the dissimilarities between the corresponding embeddings, for every metric:
    • dists.csv contains the precomputed dissimilarities
    • dists_self_computed.csv contains the dissimilarities computed by running compute_dists.py (see below)
  • scores contains, for every experiment, the accuracy scores of the embeddings
  • full_dfs contains, for every experiment, a csv file aggregating the dissimilarities and accuracy differences between the embeddings

Instructions

  • clone this repository
  • go to https://zenodo.org/record/5117844 and download sim_metric_resources.tar
  • untar it with tar -xvf sim_metric_resources sim_metric_resources.tar
  • in sim_metric/paths.py, modify the path to sim_metric_resources

Replicating the results

For every experiment (eg feather, pretrain_finetune, layer_exp, or pca_deletion):

  • the relevant dissimilarities and accuracies differences have already been precomputed and aggregated in a dataframe full_df
  • make sure that dists_path and full_df_path in compute_full_df.py, script.py and notebook.ipynb are set to dists.csv and full_df.csv, and not dists_self_computed.csv and full_df_self_computed.csv.
  • to get the results, you can:
    • run the notebook notebook.ipynb, or
    • run script.py in the experiment's folder, and find the results in results.txt, in the same folder To run the scripts for all four experiments, run experiments/script.py.

Recomputing dissimilarities

For every experiment, you can:

  • recompute the dissimilarities between embeddings by running compute_dists.py in this experiment's folder
  • use these and the accuracy scores to recompute the aggregate dataframe by running compute_full_df.py in this experiment's folder
  • change dists_path and full_df_path in compute_full_df.py, script.py and notebook.ipynb from dists.csv and full_df.csv to dists_self_computed.csv and full_df_self_computed.csv
  • run the experiments with script.py or notebook.ipynb as above.

Adding a new metric

This repo also allows you to test a new representational similarity metric and see how it compares according to our benchmark. To add a new metric:

  • add the corresponding function at the end of dists/scoring.py
  • add a condition in dists/score_pair.py, around line 160
  • for every experiment in experiments, add the name of the metric to the metrics list in compute_dists.py
The Codebase for Causal Distillation for Language Models.

Causal Distillation for Language Models Zhengxuan Wu*,Atticus Geiger*, Josh Rozner, Elisa Kreiss, Hanson Lu, Thomas Icard, Christopher Potts, Noah D.

Zen 20 Dec 31, 2022
MicroNet: Improving Image Recognition with Extremely Low FLOPs (ICCV 2021)

MicroNet: Improving Image Recognition with Extremely Low FLOPs (ICCV 2021) A pytorch implementation of MicroNet. If you use this code in your research

Yunsheng Li 293 Dec 28, 2022
img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation

img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation Figure 1: We estimate the 6DoF rigid transformation of a 3D face (rendered in si

Vítor Albiero 519 Dec 29, 2022
Code to reproduce the results in the paper "Tensor Component Analysis for Interpreting the Latent Space of GANs".

Tensor Component Analysis for Interpreting the Latent Space of GANs [ paper | project page ] Code to reproduce the results in the paper "Tensor Compon

James Oldfield 4 Jun 17, 2022
[NeurIPS 2021] Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods Large Scale Learning on Non-Homophilous Graphs: New Benchmark

60 Jan 03, 2023
Real-Time and Accurate Full-Body Multi-Person Pose Estimation&Tracking System

News! Aug 2020: v0.4.0 version of AlphaPose is released! Stronger tracking! Include whole body(face,hand,foot) keypoints! Colab now available. Dec 201

Machine Vision and Intelligence Group @ SJTU 6.7k Dec 28, 2022
When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset of 53,000+ Legal Holdings

When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset of 53,000+ Legal Holdings This is the repository for t

RegLab 39 Jan 07, 2023
YuNetのPythonでのONNX、TensorFlow-Lite推論サンプル

YuNet-ONNX-TFLite-Sample YuNetのPythonでのONNX、TensorFlow-Lite推論サンプルです。 TensorFlow-LiteモデルはPINTO0309/PINTO_model_zoo/144_YuNetのものを使用しています。 Requirement Op

KazuhitoTakahashi 8 Nov 17, 2021
Web-interface + rest API for classification and regression (https://jeff1evesque.github.io/machine-learning.docs)

Machine Learning This project provides a web-interface, as well as a programmatic-api for various machine learning algorithms. Supported algorithms: S

Jeff Levesque 252 Dec 11, 2022
Official code for UnICORNN (ICML 2021)

UnICORNN (Undamped Independent Controlled Oscillatory RNN) [ICML 2021] This repository contains the implementation to reproduce the numerical experime

Konstantin Rusch 21 Dec 22, 2022
Python script to download the celebA-HQ dataset from google drive

download-celebA-HQ Python script to download and create the celebA-HQ dataset. WARNING from the author. I believe this script is broken since a few mo

133 Dec 21, 2022
Reinforcement learning library(framework) designed for PyTorch, implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...

Automatic, Readable, Reusable, Extendable Machin is a reinforcement library designed for pytorch. Build status Platform Status Linux Windows Supported

Iffi 348 Dec 24, 2022
Kaggle-titanic - A tutorial for Kaggle's Titanic: Machine Learning from Disaster competition. Demonstrates basic data munging, analysis, and visualization techniques. Shows examples of supervised machine learning techniques.

Kaggle-titanic This is a tutorial in an IPython Notebook for the Kaggle competition, Titanic Machine Learning From Disaster. The goal of this reposito

Andrew Conti 800 Dec 15, 2022
Exporter for Storage Area Network (SAN)

SAN Exporter Prometheus exporter for Storage Area Network (SAN). We all know that each SAN Storage vendor has their own glossary of terms, health/perf

vCloud 32 Dec 16, 2022
Cours d'Algorithmique Appliquée avec Python pour BTS SIO SISR

Course: Introduction to Applied Algorithms with Python (in French) This is the source code of the website for the Applied Algorithms with Python cours

Loic Yvonnet 0 Jan 27, 2022
Automatic Number Plate Recognition using Contours and Convolution Neural Networks (CNN)

Cite our paper if you find this project useful https://www.ijariit.com/manuscripts/v7i4/V7I4-1139.pdf Abstract Image processing technology is used in

Adithya M 2 Jun 28, 2022
InsightFace: 2D and 3D Face Analysis Project on MXNet and PyTorch

InsightFace: 2D and 3D Face Analysis Project on MXNet and PyTorch

Deep Insight 13.2k Jan 06, 2023
Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing Paper Introduction Multi-task indoor scene understanding is widely considered a

62 Dec 05, 2022
Usable Implementation of "Bootstrap Your Own Latent" self-supervised learning, from Deepmind, in Pytorch

Bootstrap Your Own Latent (BYOL), in Pytorch Practical implementation of an astoundingly simple method for self-supervised learning that achieves a ne

Phil Wang 1.4k Dec 29, 2022
Customer-Transaction-Analysis - This analysis is based on a synthesised transaction dataset containing 3 months worth of transactions for 100 hypothetical customers.

Customer-Transaction-Analysis - This analysis is based on a synthesised transaction dataset containing 3 months worth of transactions for 100 hypothetical customers. It contains purchases, recurring

Ayodeji Yekeen 1 Jan 01, 2022