PyTorch implementation of Integrating Tree Path in Transformer for Code Representation

Last update: Dec 23, 2022

Overview

This is the official PyTorch implementation of the approaches proposed in:

Han Peng, Ge Li, Wenhan Wang, Yunfei Zhao, Zhi Jin “Integrating Tree Path in Transformer for Code Representation”

which appeared at NeurIPS 2021 [Paper Link] [Poster] [Slides].

In this paper, we investigate the interaction between absolute and relative path encodings and propose a novel code representation model, TPTrans, together with its variants. These models introduce path encoding as an inductive bias into the attention module of the Transformer, enabling it to capture the structure of source code.
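To make the two kinds of path concrete, here is a minimal sketch on a hand-built toy tree. It assumes that an absolute path runs from the root to a leaf and that a relative path runs from one leaf up to the lowest common ancestor and down to another leaf. The `Node` class, the helper names, and the choice to include both endpoints are illustrative assumptions, not code from this repository.

```python
class Node:
    """Toy AST node (hypothetical); `kind` stands in for a grammar node type."""

    def __init__(self, kind, parent=None):
        self.kind = kind
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def ancestors(node):
    """Return the chain [node, parent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def absolute_path(leaf):
    """Node types from the root down to `leaf`."""
    return [n.kind for n in reversed(ancestors(leaf))]


def relative_path(src, dst):
    """Node types from `src` up to the lowest common ancestor, then down to `dst`."""
    up = ancestors(src)
    down = ancestors(dst)
    dst_ids = {id(n) for n in down}
    # The first ancestor of src that is also an ancestor of dst is the LCA.
    lca_idx = next(i for i, n in enumerate(up) if id(n) in dst_ids)
    lca = up[lca_idx]
    down_idx = next(i for i, n in enumerate(down) if n is lca)
    up_part = up[: lca_idx + 1]          # src ... lca
    down_part = down[:down_idx][::-1]    # child of lca ... dst
    return [n.kind for n in up_part + down_part]


# Toy tree for a statement shaped like `x = y + 1`.
root = Node("module")
assign = Node("assignment", root)
left = Node("identifier", assign)
binop = Node("binary_operator", assign)
y = Node("identifier", binop)
one = Node("integer", binop)

print(absolute_path(one))
print(relative_path(left, one))
```

In the actual pipeline the tree would come from the parser rather than being built by hand, and the paths would be encoded before reaching the attention module. The sketch only shows which nodes each kind of path visits.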

Please cite our paper if you use the model, experimental results, or our code in your own work.

1.1 Raw data

To run experiments with TPTrans and its variants, first create the datasets from the raw code snippets of the CodeSearchNet (CSN) dataset. Download the raw jsonl data of CSN and unzip it into the raw_data dir as follows:

├── raw_data        
│   ├── python         
│   │   ├── train    
│   │   │   ├── XXXX.jsonl...
│   │   ├── test    
│   │   ├── valid   
│   ├── ruby          
│   ├── go        
│   ├── javascript
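As a quick sanity check on the layout above, the following sketch iterates over the unzipped records for one language and split. The function name `iter_snippets` is hypothetical and is not part of this repository. It assumes one JSON object per line, as in the CodeSearchNet jsonl files.

```python
import json
from pathlib import Path
from typing import Dict, Iterator


def iter_snippets(raw_dir: str, language: str, split: str) -> Iterator[Dict]:
    """Yield one decoded record per non-empty line of raw_dir/<language>/<split>/*.jsonl."""
    split_dir = Path(raw_dir) / language / split
    for jsonl_path in sorted(split_dir.glob("*.jsonl")):
        with jsonl_path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    yield json.loads(line)
```

For example, `next(iter_snippets("raw_data", "python", "train"))["code"]` should return the source text of the first Python training snippet, assuming the files have been unzipped as described.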

1.2 Tree-Sitter

Tree-Sitter is an open-source parser that supports multiple programming languages. Please install it, then download the grammar files for the four programming languages into the vendor dir as follows:

├── vendor        
│   ├── tree-sitter-python  (from https://github.com/tree-sitter/tree-sitter-python)         
│   ├── tree-sitter-javascript  (from https://github.com/tree-sitter/tree-sitter-javascript)     
│   ├── tree-sitter-go  (from https://github.com/tree-sitter/tree-sitter-go)
│   ├── tree-sitter-ruby  (from https://github.com/tree-sitter/tree-sitter-ruby)
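Before parsing, it can help to confirm that all four grammar directories are in place. The helper below is a hypothetical convenience check, not part of this repository. It only tests that the directories listed above exist under the vendor dir.

```python
from pathlib import Path
from typing import List

# Directory names taken from the layout above.
EXPECTED_GRAMMARS = (
    "tree-sitter-python",
    "tree-sitter-javascript",
    "tree-sitter-go",
    "tree-sitter-ruby",
)


def missing_grammars(vendor_dir: str = "vendor") -> List[str]:
    """Return the expected grammar directories that are absent from vendor_dir."""
    root = Path(vendor_dir)
    return [name for name in EXPECTED_GRAMMARS if not (root / name).is_dir()]
```

An empty list from `missing_grammars()` means the layout matches the tree shown above. It says nothing about whether the grammars themselves build correctly.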

After that, run multi_language_parse.py in the parser dir to parse the raw code snippets into the data dir.

1.3 Training

After preprocessing, run _main.py_ to train the model.

To run TPTrans, set relation_path=True and absolute_path=False.

To run TPTrans-α, set relation_path=True and absolute_path=True.

For other options, please refer to the inline comments in the code for details.
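The two variant settings above can be summarised in a small lookup table. This is only an illustrative sketch: the names `VARIANT_FLAGS` and `flags_for` are hypothetical, and how the flags are actually passed to the training script is not shown in this README.

```python
from typing import Dict

# Flag values as stated above; the key "TPTrans-alpha" stands for TPTrans-α.
VARIANT_FLAGS: Dict[str, Dict[str, bool]] = {
    "TPTrans": {"relation_path": True, "absolute_path": False},
    "TPTrans-alpha": {"relation_path": True, "absolute_path": True},
}


def flags_for(variant: str) -> Dict[str, bool]:
    """Return a copy of the flag settings for the named variant."""
    if variant not in VARIANT_FLAGS:
        known = ", ".join(sorted(VARIANT_FLAGS))
        raise ValueError(f"unknown variant {variant!r}; expected one of: {known}")
    return dict(VARIANT_FLAGS[variant])
```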

Contact

If you have any questions, please contact me via email ([email protected]) or open an issue on GitHub.

Owner

Han Peng
