Convert Apple NeuralHash model for CSAM Detection to ONNX.

Overview

AppleNeuralHash2ONNX

Convert Apple NeuralHash model for CSAM Detection to ONNX.

Intro

Apple NeuralHash is a perceptual hashing method for images based on neural networks. It can tolerate image resize and compression. The steps of hashing is as the following:

  1. Convert image to RGB.
  2. Resize image to 360x360.
  3. Normalize RGB values to [-1, 1] range.
  4. Perform inference on the NeuralHash model.
  5. Calculate dot product of a 96x128 matrix with the resulting vector of 128 floats.
  6. Apply binary step to the resulting 96 float vector.
  7. Convert the vector of 1.0 and 0.0 to bits, resulting in 96-bit binary data.

In this project, we convert Apple's NeuralHash model to ONNX format. A demo script for testing the model is also included.

Prerequisite

OS

Both macOS and Linux will work. In the following sections Debian is used for Linux example.

LZFSE decoder

  • macOS: Install by running brew install lzfse.
  • Linux: Build and install from lzfse source.

Python

Python 3.6 and above should work. Install the following dependencies:

pip install onnx coremltools

Conversion Guide

Step 1: Get NeuralHash model

You will need 4 files from a recent macOS or iOS build:

  • neuralhash_128x96_seed1.dat
  • NeuralHashv3b-current.espresso.net
  • NeuralHashv3b-current.espresso.shape
  • NeuralHashv3b-current.espresso.weights

Option 1: From macOS or jailbroken iOS device (Recommended)

If you have a recent version of macOS (11.4+) or jailbroken iOS (14.7+) installed, simply grab these files from /System/Library/Frameworks/Vision.framework/Resources/ (on macOS) or /System/Library/Frameworks/Vision.framework/ (on iOS).

Option 2: From iOS IPSW (click to reveal)
  1. Download any .ipsw of a recent iOS build (14.7+) from ipsw.me.
  2. Unpack the file:
cd /path/to/ipsw/file
mkdir unpacked_ipsw
cd unpacked_ipsw
unzip ../*.ipsw
  1. Locate system image:
ls -lh

What you need is the largest .dmg file, for example 018-63036-003.dmg.

  1. Mount system image. On macOS simply open the file in Finder. On Linux run the following commands:
# Build and install apfs-fuse
sudo apt install fuse libfuse3-dev bzip2 libbz2-dev cmake g++ git libattr1-dev zlib1g-dev
git clone https://github.com/sgan81/apfs-fuse.git
cd apfs-fuse
git submodule init
git submodule update
mkdir build
cd build
cmake ..
make
sudo make install
sudo ln -s /bin/fusermount /bin/fusermount3
# Mount image
mkdir rootfs
apfs-fuse 018-63036-003.dmg rootfs

Required files are under /System/Library/Frameworks/Vision.framework/ in mounted path.

Put them under the same directory:

mkdir NeuralHash
cd NeuralHash
cp /System/Library/Frameworks/Vision.framework/Resources/NeuralHashv3b-current.espresso.* .
cp /System/Library/Frameworks/Vision.framework/Resources/neuralhash_128x96_seed1.dat .

Step 2: Decode model structure and shapes

Normally compiled Core ML models store structure in model.espresso.net and shapes in model.espresso.shape, both in JSON. It's the same for NeuralHash model but compressed with LZFSE.

dd if=NeuralHashv3b-current.espresso.net bs=4 skip=7 | lzfse -decode -o model.espresso.net
dd if=NeuralHashv3b-current.espresso.shape bs=4 skip=7 | lzfse -decode -o model.espresso.shape
cp NeuralHashv3b-current.espresso.weights model.espresso.weights

Step 3: Convert model to ONNX

cd ..
git clone https://github.com/AsuharietYgvar/TNN.git
cd TNN
python3 tools/onnx2tnn/onnx-coreml/coreml2onnx.py ../NeuralHash

The resulting model is NeuralHash/model.onnx.

Usage

Inspect model

Netron is a perfect tool for this purpose.

Calculate neural hash with onnxruntime

  1. Install required libraries:
pip install onnxruntime pillow
  1. Run nnhash.py on an image:
python3 nnhash.py /path/to/model.onnx /path/to/neuralhash_128x96_seed1.dat image.jpg

Example output:

ab14febaa837b6c1484c35e6

Note: Neural hash generated here might be a few bits off from one generated on an iOS device. This is expected since different iOS devices generate slightly different hashes anyway. The reason is that neural networks are based on floating-point calculations. The accuracy is highly dependent on the hardware. For smaller networks it won't make any difference. But NeuralHash has 200+ layers, resulting in significant cumulative errors.

Device Hash
iPad Pro 10.5-inch 2b186faa6b36ffcc4c4635e1
M1 Mac 2b5c6faa6bb7bdcc4c4731a1
iOS Simulator 2b5c6faa6bb6bdcc4c4731a1
ONNX Runtime 2b5c6faa6bb6bdcc4c4735a1

Credits

  • nhcalc for uncovering NeuralHash private API.
  • TNN for compiled Core ML to ONNX script.
Owner
Asuhariet Ygvar
Asuhariet Ygvar
Joint parameterization and fitting of stroke clusters

StrokeStrip: Joint Parameterization and Fitting of Stroke Clusters Dave Pagurek van Mossel1, Chenxi Liu1, Nicholas Vining1,2, Mikhail Bessmeltsev3, Al

Dave Pagurek 44 Dec 01, 2022
retweet 4 satoshi ⚡️

rt4sat retweet 4 satoshi This bot is the codebase for https://twitter.com/rt4sat please feel free to create an issue if you saw any bugs basically thi

6 Sep 30, 2022
Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening

Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening Introduction This is an implementation of the model used for breast

757 Dec 30, 2022
Automated Evidence Collection for Fake News Detection

Automated Evidence Collection for Fake News Detection This is the code repo for the Automated Evidence Collection for Fake News Detection paper accept

Mrinal Rawat 2 Apr 12, 2022
Keras udrl - Keras implementation of Upside Down Reinforcement Learning

keras_udrl Keras implementation of Upside Down Reinforcement Learning This is me

Eder Santana 7 Jan 24, 2022
Intrusion Detection System using ensemble learning (machine learning)

IDS-ML implementation of an intrusion detection system using ensemble machine learning methods Data set This project is carried out using the UNSW-15

4 Nov 25, 2022
CondLaneNet: a Top-to-down Lane Detection Framework Based on Conditional Convolution

CondLaneNet: a Top-to-down Lane Detection Framework Based on Conditional Convolution This is the official implementation code of the paper "CondLaneNe

Alibaba Cloud 311 Dec 30, 2022
Paddle-Skeleton-Based-Action-Recognition - DecoupleGCN-DropGraph, ASGCN, AGCN, STGCN

Paddle-Skeleton-Action-Recognition DecoupleGCN-DropGraph, ASGCN, AGCN, STGCN. Yo

Chenxu Peng 3 Nov 02, 2022
Implementation of CaiT models in TensorFlow and ImageNet-1k checkpoints. Includes code for inference and fine-tuning.

CaiT-TF (Going deeper with Image Transformers) This repository provides TensorFlow / Keras implementations of different CaiT [1] variants from Touvron

Sayak Paul 9 Jun 26, 2022
A Protein-RNA Interface Predictor Based on Semantics of Sequences

PRIP PRIP:A Protein-RNA Interface Predictor Based on Semantics of Sequences installation gensim==3.8.3 matplotlib==3.1.3 xgboost==1.3.3 prettytable==2

李优 0 Mar 25, 2022
Bilinear attention networks for visual question answering

Bilinear Attention Networks This repository is the implementation of Bilinear Attention Networks for the visual question answering and Flickr30k Entit

Jin-Hwa Kim 506 Nov 29, 2022
CAPRI: Context-Aware Interpretable Point-of-Interest Recommendation Framework

CAPRI: Context-Aware Interpretable Point-of-Interest Recommendation Framework This repository contains a framework for Recommender Systems (RecSys), a

RecSys Lab 8 Jul 03, 2022
Pytorch implementation of SenFormer: Efficient Self-Ensemble Framework for Semantic Segmentation

SenFormer: Efficient Self-Ensemble Framework for Semantic Segmentation Efficient Self-Ensemble Framework for Semantic Segmentation by Walid Bousselham

61 Dec 26, 2022
Cervix ROI Segmentation Using U-NET

Cervix ROI Segmentation Using U-NET Overview This code illustrate how to segment the ROI in cervical images using U-NET. The ROI here meant to include

Scotty Kwok 35 Sep 14, 2022
Code repository for "Reducing Underflow in Mixed Precision Training by Gradient Scaling" presented at IJCAI '20

Reducing Underflow in Mixed Precision Training by Gradient Scaling This project implements the gradient scaling method to improve the performance of m

Ruizhe Zhao 5 Apr 14, 2022
Pytorch implementations of popular off-policy multi-agent reinforcement learning algorithms, including QMix, VDN, MADDPG, and MATD3.

Off-Policy Multi-Agent Reinforcement Learning (MARL) Algorithms This repository contains implementations of various off-policy multi-agent reinforceme

183 Dec 28, 2022
Adversarial Learning for Modeling Human Motion

Adversarial Learning for Modeling Human Motion This repository contains the open source code which reproduces the results for the paper: Adversarial l

wangqi 6 Jun 15, 2021
Distributed Asynchronous Hyperparameter Optimization in Python

Hyperopt: Distributed Hyperparameter Optimization Hyperopt is a Python library for serial and parallel optimization over awkward search spaces, which

6.5k Jan 01, 2023
Class activation maps for your PyTorch models (CAM, Grad-CAM, Grad-CAM++, Smooth Grad-CAM++, Score-CAM, SS-CAM, IS-CAM, XGrad-CAM, Layer-CAM)

TorchCAM: class activation explorer Simple way to leverage the class-specific activation of convolutional layers in PyTorch. Quick Tour Setting your C

F-G Fernandez 1.2k Dec 29, 2022
PyTorch code of paper "LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering"

LiVLR-VideoQA We propose a Lightweight Visual-Linguistic Reasoning framework (LiVLR) for VideoQA. The overview of LiVLR: Evaluation on MSRVTT-QA Datas

JJ Jiang 7 Dec 30, 2022