OSCAR

This repository contains the source code for our ICML 2021 paper, "How could Neural Networks understand Programs?"

Environment

Run the following commands to build a Docker image for the environment:

cd docker
sudo docker build -t oscar:latest .

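Then launch a container with the nvidia-docker command. Run it from the repository root (not from the docker directory), because the current directory is mounted at /oscar: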

sudo nvidia-docker run -it --mount type=bind,source="$(pwd)",target=/oscar oscar:latest
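
On hosts without the nvidia-docker wrapper, a similar launch may be possible with Docker's built-in --gpus flag. The sketch below assumes Docker 19.03 or later with the NVIDIA Container Toolkit installed; the README itself only documents nvidia-docker, and launch_oscar is a hypothetical helper name. By default it only prints the command so it can be inspected first.

```shell
# Hypothetical helper: print (or, with RUN=1, execute) a docker run command
# that mounts the repository root at /oscar and requests all GPUs.
# Assumption: Docker >= 19.03 with the NVIDIA Container Toolkit.
launch_oscar() {
  local repo_root="${1:-$(pwd)}"   # directory to mount at /oscar
  set -- docker run -it --gpus all \
    --mount "type=bind,source=${repo_root},target=/oscar" \
    oscar:latest
  if [ "${RUN:-0}" = "1" ]; then
    sudo "$@"      # actually start the container
  else
    echo "$@"      # dry run: show the command only
  fi
}
```

Calling launch_oscar from the repository root prints the command; setting RUN=1 in the environment runs it.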

To compile the binaries for processing the data, run the following inside the container:

cd /oscar/bin
make

This compiles the OSCAR LLVM analyzer pass (located in analyzer), the IR Lexer (located in irlexer), and FastBPE (located in fastBPE).

Processing the data

First, download the data for pretraining and the downstream tasks from https://1drv.ms/u/s!AjYwgux2zLgMiAhYpoCU3jLu20Z6?e=XR52y9 and extract the downloaded tarballs to the data-raw directory.

To process the data for pretraining or a downstream task, enter the corresponding directory and execute ./process.sh. Raw data must be placed in the data-raw directory; processed data will be written to the data-bin directory.
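
When several tasks need processing, the step above can be scripted. The following is a minimal sketch, not part of the repository: run_process and process_all are hypothetical helper names, and the directories to process are left to the caller because their names are not listed here.

```shell
# Hypothetical helper: run ./process.sh inside one directory.
run_process() {
  local dir="$1"
  if [ ! -x "$dir/process.sh" ]; then
    echo "skip: no executable process.sh in $dir" >&2
    return 1
  fi
  # The subshell keeps the caller's working directory unchanged.
  ( cd "$dir" && ./process.sh )
}

# Hypothetical helper: process every directory given as an argument.
# Returns non-zero if any directory failed or was skipped.
process_all() {
  local dir failed=0
  for dir in "$@"; do
    echo "processing: $dir"
    run_process "$dir" || failed=1
  done
  return "$failed"
}
```

A call would look like process_all followed by the paths of the pretraining and downstream task directories. Continuing past a failed directory, rather than stopping at the first error, is a design choice of this sketch so that one bad task does not block the others.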

Train the model

Use the following commands to pretrain the model:

cd /oscar/model
./scripts/pretrain.sh

For downstream tasks, the procedure is similar.

Owner

Dinglan Peng
