GazeScroller - Using Facial Movements to perform Hands-free Gesture on the system

Overview

GazeScroller

Using Facial Movements to perform Hands-free Gesture on the system

Abstract

As our world is getting digitized on an fast rate, every person is having a device that is making life better. Also, there is a considerate amount of the society that do not have interactions as others to these devices. One such example are the quadriplegic people (people suffering from paralysis) which constitute to 5.4 million people people in the world*. Our aim here is to make them interact with the digital world. In this project, facial movements of the person's face is fed to the system on real-time and a certain list of operations can be performed on the system using these facial actions.Additionally, we will extend this system to mini-games on the internet like the Dino Game. Finally, I have evaluated the system by five people and found that they have positively to the system. These results imply that we can generalise this system to the entire world.

Approach

The project captures live stream of the video via webcam of the system. It then maps the face to 68 landmark points via the library Dlib. The movements of the points corresponding to the eye and nose are monitored continously. The functionalities covered in the project include : • Detect blink of one eye to enable/disable scrolling. • Detect the scroll movement based on the movement of the point on the nose. Using Blink to toggle scroll and head direction to scroll

Background Study

Blinking is an involuntary action of a human being.Blinks can be spontaneous, reflex and voluntary, and eye blink rate depends on various factors including environmental factors, type of activity.

In order to segregate natural blink of the eye with the intentional blink of one eye of the user for functionality 1 as discussed above, I have studied the eye width ratios of by conducting experiments study over 5 users with each subject testing for 10 times. This data analysis is used to understand to difference in the eye width ratio between both the eyes to when a user blinks one of the eye. Secondly, the intentional blink of the eye is put on a threshold for 3 frames to detect blink. These procedures helped detect the intentional one eye blink from the natural blink of the eyes. The information from the Fig 1 gives us the details of the eye ratio and the delta (difference between the eye ratios). We take the mean and use them as a reference in our code as threshold.

Technical Tools :

• Dlib - a library used to detect face per frame via webcam • Python - language to write the code • landmarksPoints.dat file - this file is used to superimpose landmarks onto the face detected. • pynput - library to invoke keyboard and mouse keys.

System Setup :

By using the tools of mentioned above, we get the face of the user per frame superimposed by landmark points. Calculations for each frame include :

rightEyeWidthRatio = height of the right eye/ width of the right eye leftEyeWidthRatio = height of the left eye/ width of the left eye delta = abs(leftEyeWidthRatio - rightEyeWidthRatio) Whenever a user blinks one eye, following cases are checked • Check 1 : if delta > threshold of delta taken from fig.1 • Check 2 : if leftEyeWidthRatio < threshold value of blink and frame count is 3. • If Check 1 and Check 2 true , trigger Blink and enable scrolling. UX Aspects : Trigger notifications in the system when scrolling is toggled.

Discussion & Future Scope:

In the present work I have not made much effort into perfectly the model and in CV. I have worked towards the thresholds and correlating to the use case I mentioned in the abstract. If substantial work is detecting the exact eye wink using ML models, the system would be much better. The false blinks being recorded is because we lack a model here. In the future scope , we can use this feature to build interactive games to the quadriplegic people to improve their psychological status too.

Conclusion :

All the subjects who have tested responded positively to the system and felt good about it. Therefore, we can say that our system is performing good to scroll pages using the nose and to capture the blink of the eye as a toggle gesture.

Hence, such a model will be beneficial to quadriplegic people and help them to interact with the digital world.Since the false blinks are low, the system is good to be used. It can be further perfected with ML models to give better accuracy to be used by the quadriplegic people.

AniGAN: Style-Guided Generative Adversarial Networks for Unsupervised Anime Face Generation

AniGAN: Style-Guided Generative Adversarial Networks for Unsupervised Anime Face Generation AniGAN: Style-Guided Generative Adversarial Networks for U

Bing Li 81 Dec 14, 2022
Python utility to generate filesystem content for Obsidian.

Security Vault Generator Quickly parse, format, and output common frameworks/content for Obsidian.md. There is a strong focus on MITRE ATT&CK because

Justin Angel 73 Dec 02, 2022
Simple improvement of VQVAE that allow to generate x2 sized images compared to baseline

vqvae_dwt_distiller.pytorch Simple improvement of VQVAE that allow to generate x2 sized images compared to baseline. It allows to generate 512x512 ima

Sergei Belousov 25 Jul 19, 2022
Source code for Zalo AI 2021 submission

zalo_ltr_2021 Source code for Zalo AI 2021 submission Solution: Pipeline We use the pipepline in the picture below: Our pipeline is combination of BM2

128 Dec 27, 2022
Language model Prompt And Query Archive

LPAQA: Language model Prompt And Query Archive This repository contains data and code for the paper How Can We Know What Language Models Know? Install

127 Dec 20, 2022
Hough Transform and Hough Line Transform Using OpenCV

Hough transform is a feature extraction method for detecting simple shapes such as circles, lines, etc in an image. Hough Transform and Hough Line Transform is implemented in OpenCV with two methods;

Happy N. Monday 3 Feb 15, 2022
Implementation of experiments in the paper Clockwork Variational Autoencoders (project website) using JAX and Flax

Clockwork VAEs in JAX/Flax Implementation of experiments in the paper Clockwork Variational Autoencoders (project website) using JAX and Flax, ported

Julius Kunze 26 Oct 05, 2022
Time Series Forecasting with Temporal Fusion Transformer in Pytorch

Forecasting with the Temporal Fusion Transformer Multi-horizon forecasting often contains a complex mix of inputs – including static (i.e. time-invari

Nicolás Fornasari 6 Jan 24, 2022
Recreate CenternetV2 based on MMDET.

Introduction This project is trying to Recreate CenternetV2 based on MMDET, which is proposed in paper Probabilistic two-stage detection. This project

25 Dec 09, 2022
The original weights of some Caffe models, ported to PyTorch.

pytorch-caffe-models This repo contains the original weights of some Caffe models, ported to PyTorch. Currently there are: GoogLeNet (Going Deeper wit

Katherine Crowson 9 Nov 04, 2022
A Repository of Community-Driven Natural Instructions

A Repository of Community-Driven Natural Instructions TLDR; this repository maintains a community effort to create a large collection of tasks and the

AI2 244 Jan 04, 2023
CN24 is a complete semantic segmentation framework using fully convolutional networks

Build status: master (production branch): develop (development branch): Welcome to the CN24 GitHub repository! CN24 is a complete semantic segmentatio

Computer Vision Group Jena 123 Jul 14, 2022
A PyTorch Implementation of Gated Graph Sequence Neural Networks (GGNN)

A PyTorch Implementation of GGNN This is a PyTorch implementation of the Gated Graph Sequence Neural Networks (GGNN) as described in the paper Gated G

Ching-Yao Chuang 427 Dec 13, 2022
A 1.3B text-to-image generation model trained on 14 million image-text pairs

minDALL-E on Conceptual Captions minDALL-E, named after minGPT, is a 1.3B text-to-image generation model trained on 14 million image-text pairs for no

Kakao Brain 604 Dec 14, 2022
Implements Stacked-RNN in numpy and torch with manual forward and backward functions

Recurrent Neural Networks Implements simple recurrent network and a stacked recurrent network in numpy and torch respectively. Both flavours implement

Vishal R 1 Nov 16, 2021
A LiDAR point cloud cluster for panoptic segmentation

Divide-and-Merge-LiDAR-Panoptic-Cluster A demo video of our method with semantic prior: More information will be coming soon! As a PhD student, I don'

YimingZhao 65 Dec 22, 2022
Enabling dynamic analysis of Legacy Embedded Systems in full emulated environment

PENecro This project is based on "Enabling dynamic analysis of Legacy Embedded Systems in full emulated environment", published on hardwear.io USA 202

Ta-Lun Yen 10 May 17, 2022
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)

AudioCLIP Extending CLIP to Image, Text and Audio This repository contains implementation of the models described in the paper arXiv:2106.13043. This

458 Jan 02, 2023
Pytorch implementation of DeePSiM

Pytorch implementation of DeePSiM

1 Nov 05, 2021
All materials of Cassandra Event, Udyam'22

Cassandra 2022 Workspace Workshop Materials Workshop-1 Workshop-2 Workshop-3 Workshop-4 Assignments Assignment-1 Assignment-2 Assignment-3 Resources P

36 Dec 31, 2022