Similarity checking of sign languages

This repository checks for similarity between

BSL & ASL
BSL & Auslan
BSL & ISL

using a segmentation model from "Temporal segmentation of sign language videos" that is pre-trained on British Sign Language (BSL).

Although our paper, "A Machine Learning-based Segmentation Approach for Measuring Similarity Between Sign Languages", reports only these three comparisons (the only ones for which we found existing comparative values), we experimented with a total of 5 datasets and kept all of them in this repository.

For the Auslan dataset, we have a total of 5 folders: AUSLAN, AUSLAN2, AUSLAN3, AUSLAN4, and AUSLAN5.
For the ASL dataset, we have a total of 2 folders: how2sign and how2sign2.
For ISL, we have only one folder, named ISL.
For AUTSL, we have only one folder, named AUTSL.

All these folders share a common structure. Each has 3 core subfolders named output, processed, and raw, along with some .py files written to preprocess the dataset.
Here:
1. processed_<dataset name>: Inside the processed subfolder, there are three more subfolders: srt, video, and vtt. The srt folder contains the ground truth files, in SRT format, for each sentence. The vtt folder contains the predicted files for each of those sentences. The video folder holds the video files for each of those sentences.
2. raw_<dataset name>: It has srt and videos folders, holding the signs with temporal boundaries and the input videos, respectively.
3. process_<dataset name>.py: This is the model we implemented to process the raw dataset. The processed folder is where we put the final preprocessed dataset.
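
The processed layout described above can be walked with a few lines of Python. This is only a minimal sketch, not code from this repository: the helper name `pair_processed_files` is hypothetical, and it assumes that the ground truth, prediction, and video files for one sentence share the same base file name, which the repository may or may not guarantee.

```python
from pathlib import Path


def pair_processed_files(dataset_dir):
    """Match ground truth, prediction, and video files by shared file stem.

    Assumption: the three files for one sentence have the same base name,
    e.g. processed/srt/a.srt, processed/vtt/a.vtt, processed/video/a.mp4.
    """
    processed = Path(dataset_dir) / "processed"
    # Index each subfolder by file stem (file name without its extension).
    srt = {p.stem: p for p in (processed / "srt").glob("*.srt")}
    vtt = {p.stem: p for p in (processed / "vtt").glob("*.vtt")}
    video = {p.stem: p for p in (processed / "video").iterdir() if p.is_file()}
    # Keep only the sentences that are present in all three subfolders.
    common = sorted(srt.keys() & vtt.keys() & video.keys())
    return [(srt[stem], vtt[stem], video[stem]) for stem in common]
```

Calling `pair_processed_files("datasets/ISL")` (the path is an assumption about where the folder lives) would return one tuple per sentence that has all three files, which makes it easy to spot sentences with a missing prediction or video.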
Other than the datasets folder, we have two more important folders: processed_input_output and processed_matrices. There are also .py files named process_input_output.py and process_matric_info.py, and a final output generator, process_result.py. As the project grew, we split the code into separate steps so that we could keep track of the results step by step.
We store results and intermediate information in similarly named pickle files, and also save them as CSV for later use.
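
Storing a step's results both as a pickle and as a CSV can be sketched as follows. This is an illustration under stated assumptions, not the repository's actual code: the function name `save_results`, the file stem, and the row fields in the usage note are all hypothetical, and the rows are assumed to be a non-empty list of dictionaries with identical keys.

```python
import csv
import pickle
from pathlib import Path


def save_results(rows, stem, out_dir="."):
    """Write the same rows to <stem>.pkl and <stem>.csv in out_dir.

    Assumption: rows is a non-empty list of dicts sharing the same keys.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pkl_path = out / f"{stem}.pkl"
    csv_path = out / f"{stem}.csv"
    # The pickle keeps the Python objects exactly as they are.
    with open(pkl_path, "wb") as f:
        pickle.dump(rows, f)
    # The CSV is a human-readable copy; all values are written as text.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return pkl_path, csv_path
```

For example, `save_results([{"word": "mother", "segments": 2}], "example_step")` would write example_step.pkl and example_step.csv to the current directory. Using the same stem for both files mirrors the similar naming mentioned above.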

Setups

All setup instructions for "Temporal segmentation of sign language videos" can be found here. The other pre-processing models are Python files.

Models

The models for pre-processing the datasets are in the datasets folder. The segmentation model that we modified for our experiment is here.

Results

A summary of our results is shown in summary.xlsx, and an elaborated version of the main result is shown in matrices_final.csv.
