Complete portable pipeline for masking of Aadhaar Number adhering to Govt. Privacy Guidelines.

Overview

Aadhaar Number Masking Pipeline

Implementation of a complete pipeline that masks the Aadhaar Number in given images to adhere to Govt. of India's Privacy Guidelines for storage of Aadhaar Card images in digital repository. The following project was carried out as an internship for Muthoot Finance. We make use of the open source packages CRAFT text detector | Paper | Pretrained Model | Github Repo provided by Clova AI Research for OSD and combine a heurestic model with pytesseract OCR for masking.

Rohit Ranjan, Ram Sundaram.

Sample Results

teaser

Versions

The search for the best masking pipeline led us to experiment with several different approaches. We have documented our experiments in other branches.

Branch(->model) Speed/Performance Pipeline
main Best performing CRAFT + pytesseract + dimensional heuristics
CNN_OCR->cnn_model Fastest masking CRAFT + LeNet trained by us
CNN_OCR->cnn_model_2 Fastest masking CRAFT + LeNet trained by us
UNET_OCDR Theoretically Fastest but trained model unavailable** UNet

**We proposed and implemented a pipeline which uses a single UNet model for achieving a desirable mask. A single model would have made the inference very fast and real time use capable on mobile devices. Training meant creating a dataset since the company could not legally provide us the needed data. After several trials, we halted work on this model because with barely 150 unique datapoints available, a data hungry UNet Model is simply unsatiable for now.

Datasets

Lenet was trained on our self-created labelled dataset | labels.

Getting started

Install dependencies

Requirements

  • torch
  • opencv-python
  • tesseract-ocr
  • check requirements.txt
pip install -r requirements.txt

Test instructions

  • Clone this repository
git clone https://github.com/thefurorjuror/Aadhaar_Masker.git
  • Run on an image folder
python [folder path to the cloned repo]/masker.py --test_folder=[folder path to test images] --output_folder=[folder path to output images] --cuda=[True/False]
#Example- When one is inside the cloned repo
python masker.py --test_folder=./images/ --output_folder=./output/ --cuda=True

cuda is set to False by default.

Citation

@inproceedings{baek2019character,
  title={Character Region Awareness for Text Detection},
  author={Baek, Youngmin and Lee, Bado and Han, Dongyoon and Yun, Sangdoo and Lee, Hwalsuk},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={9365--9374},
  year={2019}
}

License

Copyright (c) 2021-present Rohit Ranjan & Ram Sundaram.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
A Telegram Music Bot written in Python using Pyrogram and Py-Tgcalls

A Telegram Music Bot written in Python using Pyrogram and Py-Tgcalls. This is Also The Source Code of The UserBot Which is Playing Music in @S1-BOTS Support Group ❤️

SAF ONE 224 Dec 28, 2022
Projeto Informações Conta do Instagram - Instagram Account Information Project

VESTA-tools A collection of simple tools that proved to be needed for handling large periodic calculations with the VASP software package. distTotCalc

Thiago Souza 1 Dec 02, 2021
Video Stream is a telegram bot project that's allow you to play video on telegram group video chat

Video Stream is a telegram bot project that's allow you to play video on telegram group video chat 🚀 Get SESSION_NAME from below: Pyrogram ## ✨ Featu

1 Nov 10, 2021
=>|<= the MsgRoom bot for Windows 96

=|= bot A MsgRoom bot in Python 3 for Windows96.net. The bot joins as =|=, if parameter name_lasts is not true and default_name is =|=. The full

Larry Holst 2 Jun 07, 2022
Reverse engineering the dengue virus (under development construction)

Reverse engineering the dengue virus (under development 🚧 ) What is dengue? Dengue is a viral infection transmitted to humans through the bite of inf

kjain 4 Feb 09, 2022
We propose the adversarial blur attack (ABA) against visual object tracking.

ABA We propose the adversarial blur attack (ABA) against visual object tracking. The ICCV link: https://arxiv.org/abs/2107.12085 and, https://openacce

Qing Guo 13 Dec 01, 2022
Provide discord buttons feature for discord.py

dpy_buttons wrapper library for discord.py, providing discord buttons feature. Future of the library Will be merged into discord interaction api libra

Minjun Kim (Lapis0875) 17 Feb 02, 2022
Pack up to 3MB of data into a tweetable PNG polyglot file.

tweetable-polyglot-png Pack up to 3MB of data into a tweetable PNG polyglot file. See it in action here: https://twitter.com/David3141593/status/13719

David Buchanan 2.4k Dec 29, 2022
This is a simple unofficial async Api-wrapper for tio.run

Async-Tio This is a simple unofficial async Api-wrapper for tio.run

Tom-the-Bomb 7 Oct 28, 2022
A simple and stupid Miinto API wrapper

miinto-api-wrapper Miinto API Wrapper is a simple python wrapper for Miinto API. Miinto is a fashion luxury marketplace. For more information see the

Giuseppe Checchia 3 Jan 09, 2022
Minimal API for the COVID Booking System of the Offices at the UniPD Math Dep

Simple and easy to use python BOT for the COVID registration booking system of the math department @ unipd (torre archimede). This API creates an interface with the official website, with more useful

Guglielmo Camporese 4 Dec 24, 2021
Policy and data administration, distribution, and real-time updates on top of Open Policy Agent

⚡ OPAL ⚡ Open Policy Administration Layer OPAL is an administration layer for Open Policy Agent (OPA), detecting changes to both policy and policy dat

8 Dec 07, 2022
quote is a python wrapper for the Goodreads Quote API, powered by gazpacho.

About quote is a python wrapper for the Goodreads Quote API, powered by gazpacho.

Max Humber 11 Nov 10, 2022
Tracks twitter spaces and sends it to a discord webhook.

Tracks twitter spaces and sends it to a discord webhook. Uses the twitter api to find twitter spaces and then the m3u8 url for the space is found using selenium and will have it posted using a discor

Sam Phung 20 Dec 17, 2022
🐍 The official Python client library for Google's discovery based APIs.

Google API Client This is the Python client library for Google's discovery based APIs. To get started, please see the docs folder. These client librar

Google APIs 6.2k Dec 31, 2022
Plataforma para atendimento a outras empresas que necessitam de atendimento técnico.

Plataforma para atendimento a outras empresas que necessitam de atendimento técnico. É possível que os usuarios de empresas parceiras registrem solici

Kelvin Alisson Cantarino 2 Jun 29, 2022
Free and Open Source Group Voice chat music player for telegram ❤️ with button support youtube playback support

Free and Open Source Group Voice chat music player for telegram ❤️ with button support youtube playback support

Sehath Perera 1 Jan 08, 2022
An automated tool that fetches information about your crypto stake and generates historical data in time.

Introduction Yield explorer is a WIP! I needed a tool that would show me historical data and performance of my staked crypto but was unable to find a

Sedat Can Yalçın 42 Nov 26, 2022
CRUD database for python discord bot developers that stores data on discord text channels

Discord Database A CRUD (Create Read Update Delete) database for python Discord bot developers. All data is stored in key-value pairs directly on disc

Ankush Singh 7 Oct 22, 2022
A Bot to Track Kernel Upstreams from kernel.org and Post it on Telegram Channel

Channel Kernel Tracker is the channel where the bot will be sending the updates in. Introduction This is a Telegram Bot to Track Kernel Upstreams kern

Kartikeya Hegde 3 Oct 05, 2021