Complete portable pipeline for masking of Aadhaar Number adhering to Govt. Privacy Guidelines.

Overview

Aadhaar Number Masking Pipeline

Implementation of a complete pipeline that masks the Aadhaar Number in given images to adhere to Govt. of India's Privacy Guidelines for storage of Aadhaar Card images in digital repository. The following project was carried out as an internship for Muthoot Finance. We make use of the open source packages CRAFT text detector | Paper | Pretrained Model | Github Repo provided by Clova AI Research for OSD and combine a heurestic model with pytesseract OCR for masking.

Rohit Ranjan, Ram Sundaram.

Sample Results

teaser

Versions

The search for the best masking pipeline led us to experiment with several different approaches. We have documented our experiments in other branches.

Branch(->model) Speed/Performance Pipeline
main Best performing CRAFT + pytesseract + dimensional heuristics
CNN_OCR->cnn_model Fastest masking CRAFT + LeNet trained by us
CNN_OCR->cnn_model_2 Fastest masking CRAFT + LeNet trained by us
UNET_OCDR Theoretically Fastest but trained model unavailable** UNet

**We proposed and implemented a pipeline which uses a single UNet model for achieving a desirable mask. A single model would have made the inference very fast and real time use capable on mobile devices. Training meant creating a dataset since the company could not legally provide us the needed data. After several trials, we halted work on this model because with barely 150 unique datapoints available, a data hungry UNet Model is simply unsatiable for now.

Datasets

Lenet was trained on our self-created labelled dataset | labels.

Getting started

Install dependencies

Requirements

  • torch
  • opencv-python
  • tesseract-ocr
  • check requirements.txt
pip install -r requirements.txt

Test instructions

  • Clone this repository
git clone https://github.com/thefurorjuror/Aadhaar_Masker.git
  • Run on an image folder
python [folder path to the cloned repo]/masker.py --test_folder=[folder path to test images] --output_folder=[folder path to output images] --cuda=[True/False]
#Example- When one is inside the cloned repo
python masker.py --test_folder=./images/ --output_folder=./output/ --cuda=True

cuda is set to False by default.

Citation

@inproceedings{baek2019character,
  title={Character Region Awareness for Text Detection},
  author={Baek, Youngmin and Lee, Bado and Han, Dongyoon and Yun, Sangdoo and Lee, Hwalsuk},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={9365--9374},
  year={2019}
}

License

Copyright (c) 2021-present Rohit Ranjan & Ram Sundaram.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
Python client for the iNaturalist APIs

pyinaturalist Introduction iNaturalist is a community science platform that helps people get involved in the natural world by observing and identifyin

Nicolas Noé 79 Dec 22, 2022
Lamblayer: a minimal deployment tool for AWS Lambda layers

lamblayer lamblayer is a minimal deployment tool for AWS Lambda layers. lamblayer does, Create a Layers of built pip-installable python packages. Crea

Yusuke Takahashi 2 Aug 19, 2022
Generate visualizations of GitHub user and repository statistics using GitHubActions

GitHub Stats Visualization Generate visualizations of GitHub user and repository

Jun Shi 3 Dec 15, 2022
Utility for downloading fanfiction in bulk from the Archive of Our Own

What is this? This is a program intended to help you download fanfiction from the Archive of Our Own in bulk. This program is primarily intended to wo

73 Dec 30, 2022
Anti-league-discordbot - Harrasses imbeciles for playing league of legends

anti-league-discordbot harrasses imbeciles for playing league of legends Running

Chris Clem 2 Feb 12, 2022
A melhor maneira de atender seus clientes no Telegram!

Clientes.Chat Sobre o serviço Configuração Banco de Dados Variáveis de Ambiente Docker Python Heroku Contribuição Sobre o serviço A maneira mais organ

Gabriel R F 10 Oct 12, 2022
A Telegram Bot Plays With Words!!!

TheWordzBot ➠ I Can Turn Text Into Audio ➠ I Can Get Results From Dictionary ➠ I Can Make Google Search For You ➠ I Can Suggest Strong Passwords For Y

RAVEEN KUMAR 8 Feb 28, 2022
Nonebot2 简易群管

简易群管 ✨ NoneBot2 简易群管 ✨ _ 踢 改 禁 欢迎issue pr 权限说明:permission=SUPERUSER 安装 💿 pip install nonebot-plugin-admin 导入 📲 在bot.py 导入,语句: nonebot.load_plugin("n

幼稚园园长 74 Dec 22, 2022
A repo-watcher to watch for commits on a repo an trigger GitHub action by sending a `repository_dispatch` event to destinantion repo

repo-watcher-dispatch-sender This app is used to send a repository_dispatch event to the destination repo set in config.py or Environmental Variables

Divide Projects™ 2 Feb 06, 2022
:snake: Python SDK to query Scaleway APIs.

Scaleway SDK Python SDK to query Scaleway's APIs. Stable release: Development: Installation The package is available on pip. To install it in a virtua

Scaleway 114 Dec 11, 2022
A tool for extracting plain text from Wikipedia dumps

WikiExtractor WikiExtractor.py is a Python script that extracts and cleans text from a Wikipedia database dump. The tool is written in Python and requ

Giuseppe Attardi 3.2k Dec 31, 2022
A minimalist file manager for those who want to use Linux mobile devices.

Portfolio A minimalist file manager for those who want to use Linux mobile devices. Usage Tap to activate and press to select, to browse, open, copy,

Martin Abente Lahaye 71 Nov 18, 2022
Cities bot - A simple example of using aiogram and the wikipedia package

Cities game A simple example of using aiogram and the wikipedia package. The bot

Artem Meller 2 Jan 29, 2022
A Telegram bot for remotely managing Binance Trade Bot

Binance Trade Bot Manager Telegram A Telegram bot for remotely managing Binance Trade Bot. If you have feature requests please open an issue on this r

Lorenzo Callegari 乐子睿 350 Jan 01, 2023
ANKIT-OS/TG-SESSION-HACK-BOT: A Special Repository.Telegram Bot Which Can Hack The Victim By Using That Victim Session

🔰 ᵀᴱᴸᴱᴳᴿᴬᴹ ᴴᴬᶜᴷ ᴮᴼᵀ 🔰 The owner would not be responsible for any kind of bans due to the bot. • ⚡ INSTALLING ⚡ • • 🛠️ Lᴀɴɢᴜᴀɢᴇs Aɴᴅ Tᴏᴏʟs 🔰 • If

ANKIT KUMAR 2 Dec 24, 2021
Companion "receiver" to matrix-appservice-webhooks for [matrix].

Matrix Webhook Receiver Companion "receiver" to matrix-appservice-webhooks for [matrix]. The purpose of this app is to listen for generic webhook mess

Kim Brose 13 Sep 29, 2022
L3DAS22 challenge supporting API

L3DAS22 challenge supporting API This repository supports the L3DAS22 IEEE ICASSP Grand Challenge and it is aimed at downloading the dataset, pre-proc

L3DAS 38 Dec 25, 2022
Eva Maria Bot With Python

Eva Maria Bot Features Auto Filter Manual Filter IMDB Admin Commands Broadcast Index IMDB search Inline Search Random pics ids and User info Stats, Us

Aadhi 3 Jan 06, 2022
Pydapper - A pure python port of the NuGet library dapper

pydapper A pure python library inspired by the NuGet library dapper. pydapper is

Zach Schumacher 38 Jan 02, 2023
SickNerd aims to slowly enumerate Google Dorks via the googlesearch API then requests found pages for metadata

CLI tool for making Google Dorking a passive recon experience. With the ability to fetch and filter dorks from GHDB.

Jake Wnuk 21 Jan 02, 2023