Skip to content

harshithvh/Apple-voice-recognition

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Apple-voice-recognition

Machine Learning


Visual Studio Code

How does Siri work?


Siri is based on large-scale Machine Learning systems that employ many aspects of data science.

Upon receiving your request, Siri records the frequencies and sound waves from your voice and translates them into a code. Siri then breaks down the code to identify particular patterns, phrases, and keywords. This data gets input into an algorithm that sifts through thousands of combinations of sentences to determine what the inputted phrase means. This algorithm is complex enough that it is capable of working around idioms, homophones and other literary expressions to determine the context of a sentence.

Once Siri determines its request, it begins to assess what tasks needs to be carried out, determining whether or not the information needed can be accessed from within the phone’s data banks or from online servers. Siri is then able to craft complete and cohesive sentences relevant to the type of question or command requested.

Technology behind Voice Identification


Voice identification technology captures and measures the physical qualities of a person’s voice when speaking as well as the unique biological parameters that combine to produce that voice.

Visual Studio Code

These parameters Include:

#1 Pitch


Pitch is an important perceptual dimension by which listeners discriminate and categorize voice quality. It affects the perceived brightness of the sound, and brightness may be one of several perceptual features of a sound used by listeners to distinguish one voice quality from another.

#2 Intensity


The increased vocal intensity results from a greater resistance by the vocal folds to increased airflow. The vocal folds are blown wider apart, releasing a larger puff of air that sets up a sound pressure wave of greater amplitude.

#3 Dynamics


Within-person variability in our vocal signals is substantial: we volitionally modulate our voices to express our thoughts and intentions or adjust our vocal outputs to suit a particular audience, speaking environment, or situation.

Prerequisites


On the Terminal run - pip install speaker-verification-toolkit
On the Terminal run - pip install numba==0.48
In case an ERROR occurs while installing numba==0.48 then :
On the Terminal run - pip install librosa --ignore-installed llvmlite

Extra


> Numba is an upgraded version of Numpy.
> Librosa is a python package for music and audio analysis.
> svt.rms_silence_filter() used for filtering environment noise.
> Mel-Frequency Cepstral Coefficients (MFCC) feature extraction method is a leading approach for speech feature extraction that include pitch, intensity and dynamics.
> Known_1, Known_2, Unknown are sample audio voices.
> Covert audio from .mp4 to .wav beacuse librosa supports .wav
> .wav are decompressed files which consume more memory( better quality).

About

Machine Learning

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages