Privacy-Preserving Portrait Matting [ACM MM-21]

This is the official repository of the paper Privacy-Preserving Portrait Matting.

Jizhizi Li, Sihan Ma, Jing Zhang, and Dacheng Tao

Introduction | PPT and P3M-10k | P3M-Net | Benchmark | Results | Train and Test | Inference code | Statement


📮 News

[2021-11-21]: Published the P3M-10k dataset (the largest privacy-preserving portrait matting dataset, containing 10,421 high-resolution real-world face-blurred portrait images with manually labeled alpha mattes), along with the training code and the test code. The P3M-10k dataset can be accessed from the following links; please make sure that you have read and agreed to the agreement. The training and test code can be viewed on this code-base page.

[2021-12-06]: Published the face masks of the training set and the P3M-500-P validation set of the P3M-10k dataset.

| Dataset | Dataset Link (Google Drive) | Dataset Link (Baidu Wangpan 百度网盘) | Dataset Release Agreement |
| --- | --- | --- | --- |
| P3M-10k | Link | Link (pw: fgmc) | Agreement (MIT License) |
| P3M-10k facemask (optional) | Link | Link (pw: f772) | Agreement (MIT License) |

[2021-11-20]: Published the inference code and the pretrained model (Google Drive | Baidu Wangpan (pw: 2308)), which can be used to test on your own privacy-preserving or normal portrait images. Some test results on P3M-10k can be viewed on this demo page.

Introduction

Recently, there has been an increasing concern about the privacy issue raised by using personally identifiable information in machine learning. However, previous portrait matting methods were all based on identifiable portrait images.

To fill the gap, we present P3M-10k in this paper, which is the first large-scale anonymized benchmark for Privacy-Preserving Portrait Matting. P3M-10k consists of 10,000 high-resolution face-blurred portrait images along with high-quality alpha mattes. We systematically evaluate both trimap-free and trimap-based matting methods on P3M-10k and find that existing matting methods show different generalization capabilities when following the Privacy-Preserving Training (PPT) setting, i.e., training on face-blurred images and testing on arbitrary images.

To devise a better trimap-free portrait matting model, we propose P3M-Net, which leverages the power of a unified framework for both semantic perception and detail matting, and specifically emphasizes the interaction between them and the encoder to facilitate the matting process. Extensive experiments on P3M-10k demonstrate that P3M-Net outperforms the state-of-the-art methods in terms of both objective metrics and subjective visual quality. Besides, it shows good generalization capacity under the PPT setting, confirming the value of P3M-10k for facilitating future research and enabling potential real-world applications.

PPT Setting and P3M-10k Dataset

PPT Setting: Due to privacy concerns, we propose the Privacy-Preserving Training (PPT) setting in portrait matting, i.e., training on privacy-preserved images (e.g., processed by face obfuscation) and testing on arbitrary images with or without private content. As an initial step towards the privacy-preserving portrait matting problem, we only define the identifiable faces in frontal and some profile portrait images as the private content in this work.
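To make "face obfuscation" concrete, here is a minimal, dependency-free sketch that box-blurs a rectangular face region of a grayscale image. It is only an illustration: the obfuscation procedure actually used to build the dataset is not described in this section, and the function name `blur_region`, the `(top, left, bottom, right)` box format, and the `radius` parameter are all hypothetical.

```python
def blur_region(image, box, radius=1):
    """Box-blur the pixels inside `box` of a grayscale image (a list of rows).

    `box` is (top, left, bottom, right), end-exclusive. This format and the
    box blur itself are assumptions for illustration, not the dataset's method.
    """
    height, width = len(image), len(image[0])
    top, left, bottom, right = box
    out = [row[:] for row in image]  # copy so pixels outside the box stay untouched
    for y in range(max(top, 0), min(bottom, height)):
        for x in range(max(left, 0), min(right, width)):
            # Average over a (2 * radius + 1)^2 window clipped to the image border.
            ys = range(max(y - radius, 0), min(y + radius + 1, height))
            xs = range(max(x - radius, 0), min(x + radius + 1, width))
            window = [image[j][i] for j in ys for i in xs]
            out[y][x] = sum(window) / len(window)
    return out


# Toy 4x4 "image" with one bright pixel; blur only the top-left 2x2 region.
image = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
blurred = blur_region(image, box=(0, 0, 2, 2))
```

Under the PPT setting, a model would be trained on images obfuscated in this spirit and then evaluated on both obfuscated and unmodified images.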

P3M-10k Dataset: To further explore the effect of the PPT setting, we establish the first large-scale privacy-preserving portrait matting benchmark, named P3M-10k. It contains 10,000 high-resolution portrait images anonymized by face obfuscation, along with high-quality ground truth alpha mattes. Specifically, we carefully collect, filter, and annotate about 10,000 high-resolution images from the Internet under free-use licenses. There are 9,421 images in the training set and 500 images in the test set, denoted as P3M-500-P. In addition, we collect and annotate another 500 public celebrity images from the Internet without face obfuscation, denoted as P3M-500-NP, to evaluate the performance of matting models under the PPT setting on normal portrait images. Some examples are shown below, where (a) is from the training set, (b) is from P3M-500-P, and (c) is from P3M-500-NP.

P3M-10k and the facemask are now published! You can access them from the following links; please make sure that you have read and agreed to the agreement. Note that the facemask is not used in our work, so downloading it is optional.

| Dataset | Dataset Link (Google Drive) | Dataset Link (Baidu Wangpan 百度网盘) | Dataset Release Agreement |
| --- | --- | --- | --- |
| P3M-10k | Link | Link (pw: fgmc) | Agreement (MIT License) |
| P3M-10k facemask (optional) | Link | Link (pw: f772) | Agreement (MIT License) |

P3M-Net

Our proposed P3M-Net consists of four parts:

  • A Multi-task Framework: To benefit from explicitly modeling both the semantic segmentation and detail matting tasks and jointly optimizing them for trimap-free matting, we follow [1] and [2] and adopt a multi-task framework based on a modified version of ResNet-34. The backbone pretrained on ImageNet is listed below;

  • TFI: Tripartite-Feature Integration: The TFI module is used in each matting decoder block to model the interaction between the encoder, the segmentation decoder, and the matting decoder. TFI has three inputs: the feature map of the previous matting decoder block, the feature map from the semantic decoder block at the same level, and the feature map from the symmetrical encoder block. TFI passes each of them through a projection layer, concatenates the outputs, and feeds the result into a convolutional block to generate the output feature;

  • sBFI: Shallow Bipartite-Feature Integration: The sBFI module is used to model the interaction between the encoder and the matting decoder. sBFI uses the feature map from the first encoder block as guidance to refine the output feature map from the previous matting decoder block, since shallow layers in the encoder contain many details and much local structural information;

  • dBFI: Deep Bipartite-Feature Integration: The dBFI module is used to model the interaction between the encoder and the segmentation decoder. dBFI uses the feature map from the last encoder block as guidance for the semantic decoder, since it contains abundant global semantics. Specifically, dBFI fuses the feature map from the last encoder block with the ones from the semantic decoder to improve the feature representation ability for the high-level semantic segmentation task.
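The TFI data flow described in the list above (project each input, concatenate along channels, then convolve) can be sketched without any deep learning framework. This is a toy illustration under stated assumptions, not the repository's implementation: 1x1 convolutions stand in for both the projection layers and the convolutional block, all three inputs are assumed to share the same spatial size, and the names `conv1x1`, `tfi`, `proj_weights`, and `fuse_weights` are hypothetical. A real module would use framework layers and may include larger kernels, normalization, and activations.

```python
def conv1x1(feature, weights):
    """Apply a 1x1 convolution to a feature map stored as [channel][row][col].

    `weights` has one row per output channel; each row holds one weight per
    input channel, so every output pixel is a weighted sum across channels.
    """
    height, width = len(feature[0]), len(feature[0][0])
    return [
        [
            [sum(w * feature[c][y][x] for c, w in enumerate(channel_weights))
             for x in range(width)]
            for y in range(height)
        ]
        for channel_weights in weights
    ]


def tfi(prev_matting, semantic, encoder, proj_weights, fuse_weights):
    """Project the three inputs, concatenate along channels, then fuse."""
    inputs = (prev_matting, semantic, encoder)
    projected = [conv1x1(f, w) for f, w in zip(inputs, proj_weights)]
    # Channel-wise concatenation: stack the projected channels into one list.
    concatenated = [channel for feature in projected for channel in feature]
    return conv1x1(concatenated, fuse_weights)


# Three single-channel 1x2 feature maps with identity projections (toy values).
prev_matting = [[[1.0, 2.0]]]
semantic = [[[3.0, 4.0]]]
encoder = [[[5.0, 6.0]]]
proj_weights = [[[1.0]], [[1.0]], [[1.0]]]  # one 1->1 projection per input
fuse_weights = [[1.0, 1.0, 1.0]]            # fuse 3 channels into 1

fused = tfi(prev_matting, semantic, encoder, proj_weights, fuse_weights)
```

With these toy weights the fused map is simply the per-pixel sum of the three inputs, which makes the data flow easy to check by hand.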

Here we provide the model we pretrained on P3M-10k and the backbone we pretrained on ImageNet.

| Model | Pretrained Backbone on ImageNet | Pretrained P3M-NET on P3M-10k |
| --- | --- | --- |
| Google Drive | Link | Link |
| Baidu Wangpan (百度网盘) | Link (pw: 2v1t) | Link (pw: 2308) |

Benchmark

A systematic evaluation of the existing trimap-based and trimap-free matting methods on P3M-10k is conducted to investigate the impact of the privacy-preserving training (PPT) setting on different matting models and to gain some useful insights. Some of the results are shown below. Please refer to the paper for the full tables.

In the following tables, "B" denotes blurred images and "N" denotes normal images. Each setting is written as train:test; for example, "B:N" denotes training on blurred images and testing on normal images, and "N:B" denotes the reverse.

Table 1. Results of trimap-based deep learning methods on P3M-500-P.

| Method | B:B SAD | B:B MSE | B:N SAD | B:N MSE | N:B SAD | N:B MSE | N:N SAD | N:N MSE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DIM | 4.8906 | 0.0115 | 4.8940 | 0.0116 | 4.8050 | 0.0116 | 4.7941 | 0.0116 |
| AlphaGAN | 5.2669 | 0.0112 | 5.2367 | 0.0112 | 5.7060 | 0.0120 | 5.6696 | 0.0119 |
| GCA | 4.3593 | 0.0088 | 4.3469 | 0.0089 | 4.4068 | 0.0089 | 4.4002 | 0.0089 |
| IndexNet | 5.1959 | 0.0156 | 5.2188 | 0.0158 | 5.8267 | 0.0202 | 5.8509 | 0.0204 |
| FBA | 4.1330 | 0.0088 | 4.1267 | 0.0088 | 4.1666 | 0.0086 | 4.1544 | 0.0086 |

Table 2. Results of trimap-free methods on P3M-500-P.

| Method | B:B SAD | B:B MSE | B:N SAD | B:N MSE | N:B SAD | N:B MSE | N:N SAD | N:N MSE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SHM | 21.56 | 0.0100 | 24.33 | 0.0116 | 23.91 | 0.0115 | 17.13 | 0.0075 |
| LF | 42.95 | 0.0191 | 30.84 | 0.0129 | 41.01 | 0.0174 | 31.22 | 0.0123 |
| HATT | 25.99 | 0.0054 | 26.5 | 0.0055 | 35.02 | 0.0103 | 22.93 | 0.0040 |
| GFM | 13.20 | 0.0050 | 13.08 | 0.0050 | 13.54 | 0.0048 | 10.73 | 0.0033 |
| BASIC | 15.13 | 0.0058 | 15.52 | 0.0060 | 24.38 | 0.0109 | 14.52 | 0.0054 |
| P3M-Net (Ours) | 8.73 | 0.0026 | 9.22 | 0.0028 | 11.22 | 0.0040 | 9.06 | 0.0028 |
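For readers unfamiliar with the two metrics in the tables, SAD (sum of absolute differences) and MSE (mean squared error) compare a predicted alpha matte with the ground truth pixel by pixel. The sketch below is a plain-Python illustration under assumptions: alpha values are taken to lie in [0, 1], and both metrics are computed over the whole image. The exact conventions behind the reported numbers (for example, restricting trimap-based methods to the unknown region, or scaling SAD by 1/1000 as is common in matting work) are not stated in this section, so the helper names and scaling here are hypothetical.

```python
def sad(pred, gt):
    """Sum of absolute differences between two alpha mattes (lists of rows)."""
    return sum(
        abs(p - g)
        for pred_row, gt_row in zip(pred, gt)
        for p, g in zip(pred_row, gt_row)
    )


def mse(pred, gt):
    """Mean squared error between two alpha mattes, averaged over all pixels."""
    num_pixels = sum(len(row) for row in gt)
    squared_error = sum(
        (p - g) ** 2
        for pred_row, gt_row in zip(pred, gt)
        for p, g in zip(pred_row, gt_row)
    )
    return squared_error / num_pixels


# Toy 2x2 mattes with alpha values in [0, 1].
pred = [[0.0, 0.5], [1.0, 1.0]]
gt = [[0.0, 1.0], [1.0, 0.0]]
print(sad(pred, gt))  # 1.5
print(mse(pred, gt))  # 0.3125
```

Lower values are better for both metrics.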

Results

We test our network on the proposed P3M-500-P and P3M-500-NP and compare it with previous SOTA methods; the results are listed below. More results on the P3M-10k test set can be found here.

Inference Code - How to Test on Your Images

Here we provide the procedure for testing on sample images with our pretrained model:

  1. Set up the environment following this instruction page;

  2. Insert the path REPOSITORY_ROOT_PATH in the file core/config.py;

  3. Download the pretrained P3M-Net model from here (Google Drive | Baidu Wangpan (pw: 2308)) and unzip it to the folder models/pretrained/;

  4. Save your sample images in the folder samples/original/;

  5. Set up the parameters in the file scripts/test_samples.sh and run it by:

    chmod +x scripts/test_samples.sh

    scripts/test_samples.sh

  6. The resulting alpha mattes and transparent color images will be saved in the folders samples/result_alpha/ and samples/result_color/.
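To illustrate how an alpha matte relates to a transparent color image, the following sketch attaches a matte to an RGB image as an alpha channel and, alternatively, composites the image over a solid background using the standard matting equation I = alpha * F + (1 - alpha) * B. It is a dependency-free illustration, not the repository's saving code: the function names `to_rgba` and `composite`, the nested-list image layout, and the [0, 1] alpha range are assumptions.

```python
def to_rgba(rgb, alpha):
    """Attach an alpha matte (values in [0, 1]) to an RGB image as an 8-bit channel."""
    return [
        [(r, g, b, round(a * 255)) for (r, g, b), a in zip(rgb_row, alpha_row)]
        for rgb_row, alpha_row in zip(rgb, alpha)
    ]


def composite(rgb, alpha, background):
    """Blend the image over a solid background: alpha * F + (1 - alpha) * B."""
    return [
        [
            tuple(round(a * f + (1 - a) * b) for f, b in zip(pixel, background))
            for pixel, a in zip(rgb_row, alpha_row)
        ]
        for rgb_row, alpha_row in zip(rgb, alpha)
    ]


# Toy 1x2 image: a fully opaque red pixel and a fully transparent green pixel.
rgb = [[(255, 0, 0), (0, 255, 0)]]
alpha = [[1.0, 0.0]]

rgba = to_rgba(rgb, alpha)
on_white = composite(rgb, alpha, background=(255, 255, 255))
```

In this toy case the opaque pixel keeps its color and the transparent pixel takes the background color.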

We show some sample images, the predicted alpha mattes, and their transparent results below. We use the pretrained model from the P3M-Net section with the Hybrid (1 & 1/2) test strategy.

Statement

If you are interested in our work, please consider citing the following:

@inproceedings{10.1145/3474085.3475512,
author = {Li, Jizhizi and Ma, Sihan and Zhang, Jing and Tao, Dacheng},
title = {Privacy-Preserving Portrait Matting},
year = {2021},
isbn = {9781450386517},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3474085.3475512},
doi = {10.1145/3474085.3475512},
booktitle = {Proceedings of the 29th ACM International Conference on Multimedia},
pages = {3501–3509},
numpages = {9},
keywords = {trimap, benchmark, portrait matting, deep learning, semantic segmentation, privacy-preserving},
location = {Virtual Event, China},
series = {MM '21}
}

This project is released under the MIT licence.

For further questions, please contact Jizhizi Li at [email protected] or Sihan Ma at [email protected].

Relevant Projects

[1] Bridging Composite and Real: Towards End-to-end Deep Image Matting, IJCV, 2021 | Paper | Github
     Jizhizi Li, Jing Zhang, Stephen J. Maybank, Dacheng Tao

[2] Deep Automatic Natural Image Matting, IJCAI, 2021 | Paper | Github
     Jizhizi Li, Jing Zhang, and Dacheng Tao
