Apple-voice-recognition

Machine Learning

How does Siri work?

Siri is based on large-scale Machine Learning systems that employ many aspects of data science.

Upon receiving your request, Siri records the frequencies and sound waves of your voice and translates them into a code. Siri then breaks down the code to identify particular patterns, phrases, and keywords. This data is fed into an algorithm that sifts through thousands of sentence combinations to determine what the phrase means. The algorithm is complex enough to work around idioms, homophones and other figures of speech when determining the context of a sentence.
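The first step above, turning a recorded waveform into frequency information, can be illustrated with a discrete Fourier transform. This is a minimal sketch and not Siri's actual pipeline: the function names, the 8000 Hz sample rate and the synthetic 440 Hz tone are all assumptions chosen for illustration.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Return DFT magnitudes for bins 0..N/2 using a naive O(N^2) transform."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc))
    return spectrum


def dominant_frequency(samples, sample_rate):
    """Return the frequency in Hz of the strongest non-DC bin."""
    mags = magnitude_spectrum(samples)
    peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
    return peak_bin * sample_rate / len(samples)


# Hypothetical input: a pure 440 Hz tone sampled at 8000 Hz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(400)]
print(dominant_frequency(tone, SAMPLE_RATE))
```

Real systems use a fast Fourier transform over short overlapping windows rather than one naive transform over the whole signal; the naive version is used here only to keep the example dependency-free.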

Once Siri determines the request, it assesses which tasks need to be carried out and whether the information needed can be accessed from the phone’s own data or from online servers. Siri is then able to craft complete and cohesive sentences relevant to the type of question or command.

Technology behind Voice Identification

Voice identification technology captures and measures the physical qualities of a person’s voice when speaking as well as the unique biological parameters that combine to produce that voice.

These parameters include:

#1 Pitch

Pitch is an important perceptual dimension by which listeners discriminate and categorize voice quality. It affects the perceived brightness of the sound, and brightness may be one of several perceptual features of a sound used by listeners to distinguish one voice quality from another.
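As a rough illustration of how pitch can be measured from a signal, the sketch below estimates the fundamental frequency by autocorrelation. It is one common textbook technique, not necessarily what this project or the toolkit uses; the function name, the 50 to 500 Hz search range and the synthetic 200 Hz signal are assumptions.

```python
import math


def estimate_pitch(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency in Hz via autocorrelation.

    The lag with the highest autocorrelation inside the search range
    is taken as the pitch period.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(
            samples[i] * samples[i + lag] for i in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Hypothetical input: a 200 Hz sine wave, 0.1 s long, sampled at 8000 Hz.
SAMPLE_RATE = 8000
voiced = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(800)]
print(estimate_pitch(voiced, SAMPLE_RATE))
```

Real speech is noisier than a pure sine, so practical pitch trackers add windowing, normalization and voicing detection on top of this idea.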

#2 Intensity

Increased vocal intensity results from greater resistance by the vocal folds to increased airflow. The vocal folds are blown wider apart, releasing a larger puff of air that sets up a sound pressure wave of greater amplitude.
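Since intensity corresponds to the amplitude of the pressure wave, it can be approximated from audio samples as a root-mean-square (RMS) level expressed in decibels. The sketch below is illustrative only; the function names and the synthetic signals are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def level_db(samples, reference=1.0):
    """Signal level in decibels relative to `reference` amplitude."""
    return 20 * math.log10(rms(samples) / reference)


# Hypothetical input: the same 200 Hz tone at two amplitudes.
SAMPLE_RATE = 8000
quiet = [0.25 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(800)]
loud = [2 * x for x in quiet]

# Doubling the amplitude raises the level by 20*log10(2), about 6 dB.
print(level_db(loud) - level_db(quiet))
```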

#3 Dynamics

Within-person variability in our vocal signals is substantial: we volitionally modulate our voices to express our thoughts and intentions or adjust our vocal outputs to suit a particular audience, speaking environment, or situation.

Prerequisites

In a terminal, run: pip install speaker-verification-toolkit
In a terminal, run: pip install numba==0.48
If an error occurs while installing numba==0.48, run: pip install librosa --ignore-installed llvmlite
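After installing, a quick check along these lines can confirm that the packages are importable. The module names below are assumptions: in particular, `speaker_verification_toolkit` is assumed to be the import name of the speaker-verification-toolkit package, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the packages installed above.
REQUIRED_MODULES = ("speaker_verification_toolkit", "numba", "librosa")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All prerequisites are importable.")
```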

Extra

> Numba is a just-in-time (JIT) compiler that speeds up numerical Python code, including NumPy-based code. librosa depends on it.
> Librosa is a Python package for music and audio analysis.
> svt.rms_silence_filter() is used to filter out silence and low-level environment noise.
> Mel-Frequency Cepstral Coefficients (MFCC) are a leading approach to speech feature extraction; the extracted features reflect voice characteristics such as pitch, intensity and dynamics.
> known_1.wav, Known_2.wav and Unknown.wav are sample voice recordings.
> Convert audio from .mp4 to .wav before loading it, because librosa reads .wav files directly.
> .wav files are uncompressed, so they take up more storage but preserve audio quality.
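To give a sense of what an RMS-based silence filter does, here is a minimal sketch that drops frames whose RMS energy falls below a threshold. It is not the toolkit's implementation of svt.rms_silence_filter(): the function name, frame length and threshold are assumptions, and the real function's parameters and behaviour may differ.

```python
import math


def rms_silence_filter_sketch(samples, frame_length=256, threshold=0.01):
    """Keep only the frames whose RMS amplitude reaches `threshold`."""
    kept = []
    for start in range(0, len(samples), frame_length):
        frame = samples[start:start + frame_length]
        frame_rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if frame_rms >= threshold:
            kept.extend(frame)
    return kept


# Hypothetical input: a burst of "speech" padded with silence on both sides.
silence = [0.0] * 512
speech = [0.5 * math.sin(2 * math.pi * 200 * t / 8000) for t in range(1024)]
filtered = rms_silence_filter_sketch(silence + speech + silence)
print(len(filtered))  # 1024: only the speech frames remain
```

Removing silent frames before feature extraction keeps the features focused on the voiced parts of the recording.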

harshithvh/Apple-voice-recognition
