Unofficial implementation (replicates paper results!) of MINER: Multiscale Implicit Neural Representations in pytorch-lightning

Overview

MINER_pl

Unofficial implementation of MINER: Multiscale Implicit Neural Representations in pytorch-lightning.

image

📖 Ref readings

⚠️ Main differences w.r.t. the original paper before continue:

  • In the pseudo code on page 8, where the author states Weight sharing for images, it means finer level networks are initialized with coarser level network weights. However, I did not find the correct way to implement this. Therefore, I initialize the network weights from scratch for all levels.
  • The paper says it uses sinusoidal activation (does he mean SIREN? I don't know), but I use gaussian activation (in hidden layers) with trainable parameters (per block) like my experiments in the other repo. In finer levels where the model predicts laplacian pyramids, I use sinusoidal activation x |-> sin(ax) with trainable parameters a (per block) as output layer (btw, this performs significantly better than simple tanh). Moreover, I precompute the maximum amplitude for laplacian residuals, and use it to scale the output, and I find it to be better than without scaling.
  • I experimented with a common trick in coordinate mlp: positional encoding and find that using it can increase training time/accuracy with the same number of parameters (by reducing 1 layer). This can be turned on/off by specifying the argument --use_pe. The optimal number of frequencies depends on the patch size, the larger patch sizes, the more number of frequencies you need and vice versa.
  • Some difference in the hyperparameters: the default learning rate is 3e-2 instead of 5e-4. Optimizer is RAdam instead of Adam. Block pruning happens when the loss is lower than 1e-4 (i.e. when PSNR>=40) for image and 5e-3 for occupancy rather than 2e-7.

💻 Installation

  • Run pip install -r requirements.txt.
  • Download the images from Acknowledgement or prepare your own images into a folder called images.
  • Download the meshes from Acknowledgement or prepare your own meshes into a folder called meshes.

🔑 Training

image

Pluto example:

python train.py \
    --task image --path images/pluto.png \
    --input_size 4096 4096 --patch_size 32 32 --batch_size 256 --n_scales 4 \
    --use_pe --n_layers 3 \
    --num_epochs 50 50 50 200 \
    --exp_name pluto4k_4scale

Tokyo station example:

python train.py \
    --task image --path images/tokyo-station.jpg \
    --input_size 6000 4000 --patch_size 25 25 --batch_size 192 --n_scales 5 \
    --use_pe --n_layers 3 \
    --num_epochs 50 50 50 50 150 \
    --exp_name tokyo6k_5scale
Image (size) Train time (s) GPU mem (MiB) #Params (M) PSNR
Pluto (4096x4096) 53 3171 9.16 42.14
Pluto (8192x8192) 106 6099 28.05 45.09
Tokyo station (6000x4000) 68 6819 35.4 42.48
Shibuya (7168x2560) 101 8967 17.73 37.78
Shibuya (14336x5120) 372 8847 75.42 39.32
Shibuya (28672x10240) 890 10255 277.37 41.93
Shibuya (28672x10240)* 1244 6277 98.7 37.59

*paper settings (6 scales, each network has 4 layer with 9 hidden units)

The original image will be resized to img_wh for reconstruction. You need to make sure img_wh divided by 2^(n_scales-1) (the resolution at the coarsest level) is still a multiple of patch_wh.


mesh

First, convert the mesh to N^3 occupancy grid by

python preprocess_mesh.py --N 512 --M 1 --T 1 --path <path/to/mesh> 

This will create N^3 occupancy to be regressed by the neural network. For detailed options, please see preprocess_mesh.py. Typically, increase M or T if you find the resulting occupancy bad.

Next, start training (bunny example):

python train.py \
    --task mesh --path occupancy/bunny_512.npy \
    --input_size 512 --patch_size 16 --batch_size 512 --n_scales 4 \
    --use_pe --n_freq 5 --n_layers 2 --n_hidden 8 \
    --loss_thr 5e-3 --b_chunks 512 \
    --num_epochs 50 50 50 150 \
    --exp_name bunny512_4scale

For full options, please see here. Some important options:

  • If your GPU memory is not enough, try reducing batch_size.
  • By default it will not log intermediate images to tensorboard to save time. To visualize image reconstruction and active blocks, add --log_image argument.

You are recommended to monitor the training progress by

tensorboard --logdir logs

where you can see training curves and images.

🟥 🟩 🟦 Block decomposition

To reconstruct the image using trained model and to visualize block decomposition per scale like Fig. 4 in the paper, see image_test.ipynb or mesh_test.ipynb

Examples:

💡 Implementation tricks

  • Setting num_workers=0 in dataloader increased the speed a lot.
  • As suggested in training details on page 4, I implement parallel block inference by defining parameters of shape (n_blocks, n_in, n_out) and use @ operator (same as torch.bmm) for faster inference.
  • To perform block pruning efficiently, I create two copies of the same network, and continually train and prune one of them while copying the trained parameters to the target network (somehow like in reinforcement learning, e.g. DDPG). This allows the network as well as the optimizer to shrink, therefore achieve higher memory and speed performance.
  • In validation, I perform inference in chunks like NeRF, and pass each chunk to cpu to reduce GPU memory usage.

💝 Acknowledgement

Further readings

During a stream, my audience suggested me to test on this image with random pixels:

random

The default 32x32 patch size doesn't work well, since the texture varies too quickly inside a patch. Decreasing to 16x16 and increasing network hidden units make the network converge right away to 43.91 dB under a minute. Surprisingly, with the other image reconstruction SOTA instant-ngp, the network is stuck at 17 dB no matter how long I train.

ngp-random

Is this a possible weakness of instant-ngp? What effect could it bring to real application? You are welcome to test other methods to reconstruct this image!

Owner
AI葵
AI R&D in computer vision. Doing VTuber about DL algorithms. Check my channel! If you find my works helpful, please consider sponsoring! 我有在做VTuber,歡迎訂閱我的頻道!
AI葵
Algorithms for outlier, adversarial and drift detection

Alibi Detect is an open source Python library focused on outlier, adversarial and drift detection. The package aims to cover both online and offline d

Seldon 1.6k Dec 31, 2022
Monitor your ML jobs on mobile devices📱, especially for Google Colab / Kaggle

TF Watcher TF Watcher is a simple to use Python package and web app which allows you to monitor 👀 your Machine Learning training or testing process o

Rishit Dagli 54 Nov 01, 2022
ByteTrack(Multi-Object Tracking by Associating Every Detection Box)のPythonでのONNX推論サンプル

ByteTrack-ONNX-Sample ByteTrack(Multi-Object Tracking by Associating Every Detection Box)のPythonでのONNX推論サンプルです。 ONNXに変換したモデルも同梱しています。 変換自体を試したい方はByteT

KazuhitoTakahashi 16 Oct 26, 2022
A library for uncertainty quantification based on PyTorch

Torchuq [logo here] TorchUQ is an extensive library for uncertainty quantification (UQ) based on pytorch. TorchUQ currently supports 10 representation

TorchUQ 96 Dec 12, 2022
Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring

Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring

2 Dec 28, 2021
Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode

🤗 Transformers Wav2Vec2 + PyCTCDecode Introduction This repo shows how 🤗 Transformers can be used in combination with kensho-technologies's PyCTCDec

Patrick von Platen 102 Oct 22, 2022
This is the official code for the paper "Learning with Nested Scene Modeling and Cooperative Architecture Search for Low-Light Vision"

RUAS This is the official code for the paper "Learning with Nested Scene Modeling and Cooperative Architecture Search for Low-Light Vision" A prelimin

Vision & Optimization Group (VOG) 2 May 05, 2022
Official implementation of the paper "Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering"

Light Field Networks Project Page | Paper | Data | Pretrained Models Vincent Sitzmann*, Semon Rezchikov*, William Freeman, Joshua Tenenbaum, Frédo Dur

Vincent Sitzmann 130 Dec 29, 2022
Genetic Programming in Python, with a scikit-learn inspired API

Welcome to gplearn! gplearn implements Genetic Programming in Python, with a scikit-learn inspired and compatible API. While Genetic Programming (GP)

Trevor Stephens 1.3k Jan 03, 2023
Pytorch implemenation of Stochastic Multi-Label Image-to-image Translation (SMIT)

SMIT: Stochastic Multi-Label Image-to-image Translation This repository provides a PyTorch implementation of SMIT. SMIT can stochastically translate a

Biomedical Computer Vision Group @ Uniandes 37 Mar 01, 2022
PFFDTD is an open-source FDTD simulator for 3D room acoustics

PFFDTD is an open-source FDTD simulator for 3D room acoustics

Brian Hamilton 34 Nov 24, 2022
Reliable probability face embeddings

ProbFace, arxiv This is a demo code of training and testing [ProbFace] using Tensorflow. ProbFace is a reliable Probabilistic Face Embeddging (PFE) me

Kaen Chan 34 Dec 31, 2022
Implement of "Training deep neural networks via direct loss minimization" in PyTorch for 0-1 loss

This is the implementation of "Training deep neural networks via direct loss minimization" published at ICML 2016 in PyTorch. The implementation targe

Cuong Nguyen 1 Jan 18, 2022
Official pytorch implementation of Active Learning for deep object detection via probabilistic modeling (ICCV 2021)

Active Learning for Deep Object Detection via Probabilistic Modeling This repository is the official PyTorch implementation of Active Learning for Dee

NVIDIA Research Projects 130 Jan 06, 2023
An attempt at the implementation of Glom, Geoffrey Hinton's new idea that integrates neural fields, predictive coding, top-down-bottom-up, and attention (consensus between columns)

GLOM - Pytorch (wip) An attempt at the implementation of Glom, Geoffrey Hinton's new idea that integrates neural fields, predictive coding,

Phil Wang 173 Dec 14, 2022
Code for the paper Language as a Cognitive Tool to Imagine Goals in Curiosity Driven Exploration

IMAGINE: Language as a Cognitive Tool to Imagine Goals in Curiosity Driven Exploration This repo contains the code base of the paper Language as a Cog

Flowers Team 26 Dec 22, 2022
Code for our ACL 2021 paper "One2Set: Generating Diverse Keyphrases as a Set"

One2Set This repository contains the code for our ACL 2021 paper “One2Set: Generating Diverse Keyphrases as a Set”. Our implementation is built on the

Jiacheng Ye 63 Jan 05, 2023
Shape Matching of Real 3D Object Data to Synthetic 3D CADs (3DV project @ ETHZ)

Real2CAD-3DV Shape Matching of Real 3D Object Data to Synthetic 3D CADs (3DV project @ ETHZ) Group Member: Yue Pan, Yuanwen Yue, Bingxin Ke, Yujie He

24 Jun 22, 2022
Leveraging Social Influence based on Users Activity Centers for Point-of-Interest Recommendation

SUCP Leveraging Social Influence based on Users Activity Centers for Point-of-Interest Recommendation () Direct Friends (i.e., users who follow each o

Kosar 8 Nov 26, 2022
OCR Post Correction for Endangered Language Texts

📌 Coming soon: an update to the software including features from our paper on semi-supervised OCR post-correction, to be published in the Transaction

Shruti Rijhwani 96 Dec 31, 2022