Convert Apple NeuralHash model for CSAM Detection to ONNX.

Overview

AppleNeuralHash2ONNX

Intro

Apple NeuralHash is a perceptual hashing method for images based on neural networks. It tolerates image resizing and compression. Hashing proceeds as follows:

  1. Convert image to RGB.
  2. Resize image to 360x360.
  3. Normalize RGB values to [-1, 1] range.
  4. Perform inference on the NeuralHash model.
  5. Calculate dot product of a 96x128 matrix with the resulting vector of 128 floats.
  6. Apply binary step to the resulting 96 float vector.
  7. Convert the vector of 1.0 and 0.0 to bits, resulting in 96-bit binary data.
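Steps 3 and 5 to 7 can be sketched in plain Python as below. This is an illustrative sketch, not the project's demo script: the function names are hypothetical, the data is a toy stand-in for the real model output and seed matrix, and the threshold of the binary step (values >= 0 map to 1) is an assumption, since the list above does not state it.

```python
def normalize_pixel(value):
    """Step 3: map an 8-bit channel value (0..255) to the [-1, 1] range."""
    return value / 255.0 * 2.0 - 1.0


def project(seed_matrix, embedding):
    """Step 5: multiply the 96x128 matrix by the 128-float model output."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in seed_matrix]


def to_hash_hex(projected):
    """Steps 6-7: apply a binary step, then pack the bits into a hex string."""
    # Assumption: values >= 0 become 1, everything else becomes 0.
    bits = "".join("1" if v >= 0 else "0" for v in projected)
    return "{:0{}x}".format(int(bits, 2), len(bits) // 4)


# Toy data standing in for the real model output and seed matrix.
embedding = [1.0] * 128
seed_matrix = [[1.0 if r % 2 == 0 else -1.0] * 128 for r in range(96)]

hash_hex = to_hash_hex(project(seed_matrix, embedding))
print(hash_hex, len(hash_hex) * 4, "bits")
```

With 96 projected values the result is always 24 hex digits, which matches the length of the example hashes later in this document.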

In this project, we convert Apple's NeuralHash model to ONNX format. A demo script for testing the model is also included.

Prerequisites

OS

Both macOS and Linux will work. The following sections use Debian for the Linux examples.

LZFSE decoder

  • macOS: Install by running brew install lzfse.
  • Linux: Build and install from lzfse source.

Python

Python 3.6 and above should work. Install the following dependencies:

pip install onnx coremltools

Conversion Guide

Step 1: Get NeuralHash model

You will need 4 files from a recent macOS or iOS build:

  • neuralhash_128x96_seed1.dat
  • NeuralHashv3b-current.espresso.net
  • NeuralHashv3b-current.espresso.shape
  • NeuralHashv3b-current.espresso.weights

Option 1: From macOS or jailbroken iOS device (Recommended)

If you have a recent version of macOS (11.4+) or jailbroken iOS (14.7+) installed, simply grab these files from /System/Library/Frameworks/Vision.framework/Resources/ (on macOS) or /System/Library/Frameworks/Vision.framework/ (on iOS).

Option 2: From iOS IPSW
  1. Download any .ipsw of a recent iOS build (14.7+) from ipsw.me.
  2. Unpack the file:
cd /path/to/ipsw/file
mkdir unpacked_ipsw
cd unpacked_ipsw
unzip ../*.ipsw
  3. Locate system image:
ls -lh

What you need is the largest .dmg file, for example 018-63036-003.dmg.

  4. Mount system image. On macOS simply open the file in Finder. On Linux run the following commands:
# Build and install apfs-fuse
sudo apt install fuse libfuse3-dev bzip2 libbz2-dev cmake g++ git libattr1-dev zlib1g-dev
git clone https://github.com/sgan81/apfs-fuse.git
cd apfs-fuse
git submodule init
git submodule update
mkdir build
cd build
cmake ..
make
sudo make install
sudo ln -s /bin/fusermount /bin/fusermount3
# Mount image
mkdir rootfs
apfs-fuse 018-63036-003.dmg rootfs

The required files are under /System/Library/Frameworks/Vision.framework/ inside the mounted image.

Put them under the same directory:

mkdir NeuralHash
cd NeuralHash
cp /System/Library/Frameworks/Vision.framework/Resources/NeuralHashv3b-current.espresso.* .
cp /System/Library/Frameworks/Vision.framework/Resources/neuralhash_128x96_seed1.dat .
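Before moving on, it can help to confirm that all four files actually landed in the directory. The check below is a small sketch; `REQUIRED_FILES` and `missing_files` are hypothetical names introduced here, and the directory name `NeuralHash` is the one used in the commands above.

```python
from pathlib import Path

# The four files listed in Step 1.
REQUIRED_FILES = [
    "neuralhash_128x96_seed1.dat",
    "NeuralHashv3b-current.espresso.net",
    "NeuralHashv3b-current.espresso.shape",
    "NeuralHashv3b-current.espresso.weights",
]


def missing_files(directory):
    """Return the required file names that are absent from `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


# Prints an empty list when everything is in place.
print(missing_files("NeuralHash"))
```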

Step 2: Decode model structure and shapes

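Compiled Core ML models normally store their structure in model.espresso.net and their shapes in model.espresso.shape, both as JSON. The NeuralHash model does the same, but the files are compressed with LZFSE. The commands below skip the first 28 bytes of each file (7 blocks of 4 bytes) and pass the rest to the LZFSE decoder: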

dd if=NeuralHashv3b-current.espresso.net bs=4 skip=7 | lzfse -decode -o model.espresso.net
dd if=NeuralHashv3b-current.espresso.shape bs=4 skip=7 | lzfse -decode -o model.espresso.shape
cp NeuralHashv3b-current.espresso.weights model.espresso.weights
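The byte arithmetic of the `dd` commands, plus a quick sanity check of the decoded output, can be sketched in Python as follows. The helper names are hypothetical, and LZFSE decoding itself is not shown because it still needs the `lzfse` tool. The check for a top-level `layers` key is an assumption based on the `net_dict['layers']` lookup that appears in a traceback in the issue list further down.

```python
import json

# dd bs=4 skip=7 skips 7 blocks of 4 bytes each.
SKIPPED_BYTES = 4 * 7


def strip_prefix(raw):
    """Drop the leading bytes that the dd command skips."""
    return raw[SKIPPED_BYTES:]


def has_layers(decoded_text):
    """Check that decoded text parses as JSON with a top-level 'layers' key."""
    try:
        net = json.loads(decoded_text)
    except ValueError:
        return False
    return isinstance(net, dict) and "layers" in net


# Toy input: 28 filler bytes followed by a payload.
print(strip_prefix(b"\x00" * 28 + b"payload"))
print(has_layers('{"layers": []}'))
```

If `has_layers` returns False for the decoded model.espresso.net, the decoding step most likely did not produce valid JSON and the conversion in Step 3 should not be expected to work.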

Step 3: Convert model to ONNX

cd ..
git clone https://github.com/AsuharietYgvar/TNN.git
cd TNN
python3 tools/onnx2tnn/onnx-coreml/coreml2onnx.py ../NeuralHash

The resulting model is NeuralHash/model.onnx.

Usage

Inspect model

Netron is a perfect tool for this purpose.

Calculate neural hash with onnxruntime

  1. Install required libraries:
pip install onnxruntime pillow
  2. Run nnhash.py on an image:
python3 nnhash.py /path/to/model.onnx /path/to/neuralhash_128x96_seed1.dat image.jpg

Example output:

ab14febaa837b6c1484c35e6
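For readers curious about how the seed file might map to the 96x128 matrix from the Intro, here is a hedged sketch. It assumes the file is a 128-byte header followed by 96 x 128 little-endian float32 values. That layout is not stated in this document; it is merely consistent with the 48.13 KB file size reported in the issue list (128 + 96 * 128 * 4 = 49280 bytes). The function name is hypothetical.

```python
import struct

ROWS, COLS = 96, 128
HEADER_BYTES = 128  # assumption, not confirmed by the text above


def parse_seed_matrix(data):
    """Interpret bytes as a header plus a 96x128 little-endian float32 matrix."""
    payload = data[HEADER_BYTES:]
    expected = ROWS * COLS * 4
    if len(payload) != expected:
        raise ValueError(f"expected {expected} payload bytes, got {len(payload)}")
    flat = struct.unpack(f"<{ROWS * COLS}f", payload)
    # Split the flat tuple into 96 rows of 128 values.
    return [list(flat[r * COLS:(r + 1) * COLS]) for r in range(ROWS)]


# Synthetic stand-in for the real file contents.
fake = bytes(HEADER_BYTES) + struct.pack(f"<{ROWS * COLS}f", *([0.5] * (ROWS * COLS)))
matrix = parse_seed_matrix(fake)
print(len(matrix), len(matrix[0]))
```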

Note: The neural hash generated here might be a few bits off from one generated on an iOS device. This is expected, since different iOS devices generate slightly different hashes anyway. Neural networks rely on floating-point calculations, whose accuracy depends heavily on the hardware. For smaller networks this makes no difference, but NeuralHash has 200+ layers, so the errors accumulate to a significant level.

Device                Hash
iPad Pro 10.5-inch    2b186faa6b36ffcc4c4635e1
M1 Mac                2b5c6faa6bb7bdcc4c4731a1
iOS Simulator         2b5c6faa6bb6bdcc4c4731a1
ONNX Runtime          2b5c6faa6bb6bdcc4c4735a1
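To quantify "a few bits off", the hashes in the table can be compared by Hamming distance, that is, the number of differing bits. The sketch below uses only the values from the table; the function name is hypothetical.

```python
def hamming_distance(hash_a, hash_b):
    """Count differing bits between two equal-length hex hash strings."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


# Values copied from the table above.
hashes = {
    "iPad Pro 10.5-inch": "2b186faa6b36ffcc4c4635e1",
    "M1 Mac": "2b5c6faa6bb7bdcc4c4731a1",
    "iOS Simulator": "2b5c6faa6bb6bdcc4c4731a1",
    "ONNX Runtime": "2b5c6faa6bb6bdcc4c4735a1",
}

reference = hashes["ONNX Runtime"]
for device, value in hashes.items():
    print(f"{device}: {hamming_distance(value, reference)} bits from ONNX Runtime")
```

Of the 96 bits, the ONNX Runtime hash differs from the iOS Simulator hash in 1 bit, from the M1 Mac hash in 2 bits, and from the iPad Pro hash in 7 bits.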

Credits

  • nhcalc for uncovering NeuralHash private API.
  • TNN for compiled Core ML to ONNX script.

Issues

  • VirusTotal flags neuralhash_128x96_seed1.dat with Trojan:Script/Wacatac.B!ml from Microsoft

    File distributed by Apple
    312344458ca5468eced6f50163c09d88dbc9f3470891f1b078852b01c9a0fce9
    neuralhash_128x96_seed1.dat
    48.13 KB
    known-distributor
    Microsoft: Trojan:Script/Wacatac.B!ml

    opened by laserjobs 7
  • IsADirectoryError: NeuralHash/

    ➤ ls NeuralHash/ -1
    model.espresso.net
    model.espresso.shape
    model.espresso.weights
    neuralhash_128x96_seed1.dat
    NeuralHashv3b-current.espresso.net
    NeuralHashv3b-current.espresso.shape
    NeuralHashv3b-current.espresso.weights
    (AppleNeuralHash2ONNX) ~/code/vcs/git/com/github/@/AsuharietYgvar/TNN|master⚡?
    ➤ python3 tools/onnx2tnn/onnx-coreml/onnx2coreml.py NeuralHash/
    dir: NeuralHash
    Traceback (most recent call last):
      File "/home/user/code/vcs/git/com/github/@/AsuharietYgvar/TNN/tools/onnx2tnn/onnx-coreml/onnx2coreml.py", line 60, in <module>
        main()
      File "/home/user/code/vcs/git/com/github/@/AsuharietYgvar/TNN/tools/onnx2tnn/onnx-coreml/onnx2coreml.py", line 32, in main
        onnx_model = onnx.load(onnx_net_path)
      File "/home/user/.local/share/venvs/home/user/code/vcs/git/com/github/@/AsuharietYgvar/AppleNeuralHash2ONNX/lib/python3.9/site-packages/onnx/__init__.py", line 120, in load_model
        s = _load_bytes(f)
      File "/home/user/.local/share/venvs/home/user/code/vcs/git/com/github/@/AsuharietYgvar/AppleNeuralHash2ONNX/lib/python3.9/site-packages/onnx/__init__.py", line 34, in _load_bytes
        with open(cast(Text, f), 'rb') as readable:
    IsADirectoryError: [Errno 21] Is a directory: 'NeuralHash/'
    

    What should I do?

    opened by ioistired 3
  • Error when run "python3 tools/onnx2tnn/onnx-coreml/coreml2onnx.py ../NeuralHash_model"

    I got the following error when I ran "python3 tools/onnx2tnn/onnx-coreml/coreml2onnx.py ../NeuralHash_model":

    -> python3 tools/onnx2tnn/onnx-coreml/coreml2onnx.py ../NeuralHash_model
    Traceback (most recent call last):
      File "tools/onnx2tnn/onnx-coreml/coreml2onnx.py", line 538, in <module>
        main()
      File "tools/onnx2tnn/onnx-coreml/coreml2onnx.py", line 35, in main
        net_layers = net_dict['layers']
    KeyError: 'layers'

    Could you please give some pointers on how I can resolve it?

    Thanks!

    opened by testagain36 1
  • Model convert coreml2onnx error

    Using the latest coreml2onnx.py to convert another mlmodelc, I ran into the following problems:

    Error: Unsupported layer type: deconvolution (line 507). The deconvolution layer type is not supported. I hope support can be added.

    ValueError: cannot reshape array of size 10752 into shape (672,4) (line 415). I need to change 4 to 16, but I wonder if this change is correct?

    Finally, check_model and inferred_model report errors at line 527 and lines 531-533. I look forward to a fix. If necessary, you can contact us at [email protected] about sending the test model. Thank you very much!

    opened by psiydown 0
  • Normalization seems to be unnecessary

    Hi there! Thanks so much for this! While playing around with the model, I realized that the initial normalization step seems to be unnecessary, since the model already has InstanceNormalization layers.

    I tried running a few tests and it seems to give the same hash regardless of whether we normalize the inputs.

    opened by greentfrapp 1
  • NeuralHash classifier

    I tried to build a classifier for NeuralHash: it takes a NeuralHash as input and outputs a class and probability.

    I hashed all images of the ILSVRC2012 dataset and trained a simple NN model.

    Performance on the ImageNet validation dataset: (1,000 possible choices)

    • Top-1 Accuracy: 5.25% (If random, 0.1% expected)
    • Top-5 Accuracy: 14.09% (If random, 0.5% expected)

    So... It seems that NeuralHash can't anonymize images well.

    You can try this in Colab.

    opened by kjsman 3
  • Added Docker image

    This allows a Docker image to be generated from an IPSW file and provides a script to generate an ONNX model along with instructions on how to copy a file to the running Docker container and generate the NeuralHash value.

    opened by jeremytieman 6
  • Large scale testing

    I think collaborating would be fun.

    My issues: I'm not much of a python person, my mac doesn't have their neuralhash files, and my iPhone isn't jail broken.

    My offerings: I run FotoForensics and I have nearly 5 million pictures.

    I read in the issues that the NCMEC hash is also able to be extracted from a jail-broken iPhone. That's the thing we should be testing against.

    If someone can package everything up into a minimal docker image and point me to a simple command line (find /mnt/allpictures -type f | run_this_command), then I can do a large-scale test using a real-world collection of pictures. This won't be an immediate result (depending on the code's speed, it might take weeks or months to run), but I can look for real-world false-positive matches.

    opened by hackerfactor 5
  • Working Collision?

    Can you verify that these two images collide? (Images: beagle360, collision)

    Here's what I see from following your directions:

    $ python3 nnhash.py NeuralHash/model.onnx neuralhash_128x96_seed1.dat beagle360.png
    59a34eabe31910abfb06f308
    $ python3 nnhash.py NeuralHash/model.onnx neuralhash_128x96_seed1.dat collision.png
    59a34eabe31910abfb06f308
    
    opened by dxoigmn 120
Owner
Asuhariet Ygvar