The audio-video synchronization of MKV Container Format is exploited to achieve data hiding

Overview

1.0 Data Hiding in MKV Container Format

1.1 Brief Description

The audio-video synchronization of the MKV container format is exploited to achieve data hiding. The hidden data can be used for various management purposes, including hyperlinking, annotation, and authentication.

1.2 Video Demonstration @ YouTube

Data Hiding (Hidden Watermark) in MKV Container Format

1.3 Requirements

  • Linux (not tested anywhere else)
  • Python
  • .MKV reader (like VLC player)
  • All the files are required:
    • .MKV video (./VideoForTesting/2mb.mkv)
    • ./convert_xml2mkv.py
    • ./parse_and_convert_mkv2xml.py
    • ./find_data.py
    • ./hide_data.py
    • ./find
    • ./hide
  • Ensure that you have permission to execute these files. Run the following command: chmod +x convert_xml2mkv.py && chmod +x find_data.py && chmod +x hide_data.py && chmod +x parse_and_convert_mkv2xml.py
  • If the command above doesn't work and Linux still denies access, run the following command on each affected file: chmod +x filename.extension
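
The file and permission checks above can also be scripted. The following is a minimal sketch, not part of the repository: the helper name ensure_executable is hypothetical, the file names are taken from the list above, and setting the execute bits directly is only roughly equivalent to chmod +x.

```python
import stat
from pathlib import Path

# File names copied from the requirements list above.
REQUIRED_FILES = [
    "convert_xml2mkv.py",
    "parse_and_convert_mkv2xml.py",
    "find_data.py",
    "hide_data.py",
    "find",
    "hide",
]


def ensure_executable(folder="."):
    """Hypothetical helper: make the required files executable.

    Returns the names of required files that are missing from `folder`.
    """
    missing = []
    for name in REQUIRED_FILES:
        path = Path(folder) / name
        if not path.is_file():
            missing.append(name)
            continue
        mode = path.stat().st_mode
        # Roughly what `chmod +x` does: add execute for user, group, others.
        path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return missing
```

Calling ensure_executable(".") from the repository folder returns an empty list when every required script is present.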

1.4 How To Run Data Embedding Process

Note: for screenshots refer to the end of the ./Maxim_Zaika_Data_Hiding_in_MKV_Container.pdf file

  1. Ensure 1.3 Requirements are fulfilled
  2. Run ./hide from your terminal within the folder where the files are located.
  3. Enter the name of the .MKV container: 2mb.mkv.
  4. Enter the data that needs to be hidden: 'example'. Write it down!
  5. Enter the SECRET KEY that will be used to decrypt your data in the data detecting process: 'encryption key'. Write it down!
  6. Enter the timecode where data will be saved to: 10.523 or type 'help' to display all the available timecodes. Write it down!
  7. The file modified_mkv.mkv, which stores your hidden data, should now be created.

Note: do not lose the text of the hidden data, the SECRET KEY, or the timecode. Otherwise, you won't be able to verify the hidden data later.
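
Step 6 offers a 'help' option that displays the available timecodes. A rough sketch of how such a list could be collected from the intermediate XML file is shown here. It is an illustration only: the function name list_timecodes and the assumed XML layout (a timecode written as <timecode>10.523</timecode>) are assumptions, not taken from hide_data.py.

```python
import re

# ASSUMPTION: timecodes appear in the XML as <timecode>10.523</timecode>.
# The real tag name and layout produced by the parser may differ.
TIMECODE_PATTERN = re.compile(
    r"<timecode>\s*([0-9]+(?:\.[0-9]+)?)\s*</timecode>", re.IGNORECASE
)


def list_timecodes(xml_text):
    """Return the unique timecodes found in the text, in order of appearance."""
    seen = []
    for match in TIMECODE_PATTERN.finditer(xml_text):
        value = match.group(1)
        if value not in seen:
            seen.append(value)
    return seen


# Hypothetical XML fragment used only to demonstrate the function.
sample = """
<Cluster>
  <timecode>0.0</timecode>
</Cluster>
<Cluster>
  <timecode>10.523</timecode>
</Cluster>
"""
print(list_timecodes(sample))  # ['0.0', '10.523']
```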

1.5 How To Run Data Detecting Process

  1. Ensure 1.3 Requirements are fulfilled
  2. Run ./find from your terminal within the folder where the files are located.
  3. Enter the file name: modified_mkv.mkv.
  4. Enter the text of your hidden data: 'example'.
  5. Enter the SECRET KEY used: 'encryption key'.
  6. Enter the timecode used: 10.523.
  7. If the data matches, a success message is shown.

2.0 Data Embedding Process

2.1 Software Architecture of Data Embedding

DataEmbeddingDesign

2.2 Data Embedding Design

DataEmbeddingDesign

2.3 Data Embedding Pseudocode

Note: this is an incomplete representation.

Function main {
  Set a_word -> "word that needs to be written in"
  Set encryption_key -> "key used for the encryption"
  If (length of encryption_key) < (length of a_word) {
    Set encryption_key -> same length as a_word
  }
  Set ascii_a_word -> a_word converted to ASCII
  Set ascii_encryption_key -> encryption_key converted to ASCII
  Set ascii_a_word -> convert to hexadecimal
  Set ascii_encryption_key -> convert to hexadecimal
  If (length of ascii_encryption_key) < (length of ascii_a_word) {
    Set ascii_encryption_key -> same length as ascii_a_word
  }
  Set encrypted_word -> Encrypt(ascii_a_word, ascii_encryption_key, a_word) // encrypt ASCII word
                                                                            // using original word
  Set encrypted_word_in_hexadecimal_format -> encrypted_word converted to hexadecimal
                                        // because the MKV parser accepts hexadecimals
                                        // inside the cluster's timecode
  Set timecodes -> [] // read the XML file and identify the timecodes
  Set input_timecode -> "input timecode here"
  Call function embed data (filename, input_timecode, encrypted_word_in_hexadecimal_format)
}

Function embed data (filename, input_timecode, data) {
  Loop through the file {
    If the location of the timecode is identified {
      Identify the location of the data inside the cluster's timecode {
        Write in the data
      }
    } else timecode not found {
      Try again
    }
  }
}
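
The key-extension and hexadecimal-conversion steps of the pseudocode can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in hide_data.py: the pseudocode does not say how the key is extended or which cipher is used, so repeating the key and a plain XOR are stand-ins chosen here, and the names extend_key, encrypt_to_hex, and decrypt_from_hex are hypothetical.

```python
def extend_key(key, length):
    """Repeat the key until it covers `length` characters (ASSUMED strategy)."""
    if not key:
        raise ValueError("key must not be empty")
    if len(key) >= length:
        return key
    repeats = -(-length // len(key))  # ceiling division
    return (key * repeats)[:length]


def encrypt_to_hex(word, key):
    """XOR the ASCII codes of `word` with the key and return a hex string.

    ASSUMPTION: XOR stands in for the unspecified Encrypt step.
    """
    word_codes = word.encode("ascii")
    key_codes = extend_key(key, len(word)).encode("ascii")
    # zip stops at the shorter input, so a longer key is simply truncated.
    encrypted = bytes(w ^ k for w, k in zip(word_codes, key_codes))
    return encrypted.hex()


def decrypt_from_hex(hex_text, key):
    """Reverse encrypt_to_hex: hex string back to the original ASCII word."""
    encrypted = bytes.fromhex(hex_text)
    key_codes = extend_key(key, len(encrypted)).encode("ascii")
    return bytes(c ^ k for c, k in zip(encrypted, key_codes)).decode("ascii")


hidden = encrypt_to_hex("example", "encryption key")
print(hidden)                                      # hex string, 2 characters per input character
print(decrypt_from_hex(hidden, "encryption key"))  # example
```

The hexadecimal output uses two characters per input character, which is worth keeping in mind when estimating how much text fits into a timecode.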

3.0 Data Detecting Process

3.1 Software Architecture of Data Detecting

DataEmbeddingDesign

3.2 Data Detecting Design

DataEmbeddingDesign

3.3 Data Detecting Pseudocode

Note: this is an incomplete representation.

Function detect data {
  Set hexadecimal_word -> 'the encrypted word' // obtained by the same process as in
                                               // the data hiding process
  Loop through the file {
    Loop each line of the file {
      Identify the location of the timecode {
        Identify the data inside the cluster's timecode {
          Read through the line ignoring first 6 characters // format
        }
        If there is at least 1 mismatch {
          Return error
        } else fully matched {
          Return success
        }
      }
    }
  }
}
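
The comparison step of the detection pseudocode can be sketched as below. This is an illustration only, not the code in find_data.py: the function name hidden_data_matches is hypothetical, and the sample line layout (a 6-character prefix followed by the hexadecimal payload) is an assumption based on the "ignoring first 6 characters" comment.

```python
def hidden_data_matches(line, expected_hex, prefix_length=6):
    """Compare the hex payload on one line with the expected hex string.

    ASSUMPTION: after stripping surrounding whitespace, the first
    `prefix_length` characters are formatting and the payload follows directly.
    """
    if not expected_hex:
        return False
    start = prefix_length
    end = prefix_length + len(expected_hex)
    candidate = line.strip()[start:end]
    # Any single differing character counts as a mismatch.
    return candidate.lower() == expected_hex.lower()


# Hypothetical line layout used only for demonstration.
sample_line = "    <data>1f0a3c</data>"
print(hidden_data_matches(sample_line, "1f0a3c"))  # True
print(hidden_data_matches(sample_line, "1f0a3d"))  # False
```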

4.0 Results

| Description | Explanation |
| --- | --- |
| Limited Number of Cluster Timecodes | Modifying more than two cluster timecodes causes slight video distortion; modifying even more timecodes causes both video and audio distortion. |
| Embedding Capacity | Passed tests of up to 2,500 characters, which is assumed to be more than enough for the user. |
| File Size Increment | Original file: 2.1 MB (2,097,641 bytes). Modified file (2,500 characters): 2.1 MB (2,122,058 bytes). Increase: 24,417 bytes (about 1.2%). |

5.0 Additional Information

For more information (like testing and background information), refer to the .PDF file attached to this repository: ./Maxim_Zaika_Data_Hiding_in_MKV_Container.pdf

6.0 Credits

It would not be possible to complete this project without MKV > XML > MKV parser created by Vitaly "_Vi" Shukela: https://github.com/vi/mkvparse.

The parser was rewritten for my own needs (for better understanding) and is included in this repository to ensure that there is no mismatch with Vitaly's version. If you are interested in the parser, please refer to his repository linked above. I do not take any credit for its creation.

Owner
Maxim Zaika