The first and earliest approach uses long-term
averages of acoustic features, such as spectrum
representations or pitch. The second approach models
the speaker-dependent acoustic features within the
individual phonetic sounds that comprise the
utterance. The third and most recent approach uses
discriminative neural networks (NN). Speech and
speaker recognition are, in general, subsets of
pattern recognition. Thus, any speaker recognition
task involves three stages: (1) training, (2) testing
and (3) implementation.
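To make the first approach concrete, the following is a minimal sketch of a long-term average spectrum used as a speaker template. It is an illustration under stated assumptions, not a method taken from this work: the function names (`long_term_average_spectrum`, `identify`, `synth_voice`), the frame length, the Euclidean distance and the synthetic two-harmonic "voices" are all hypothetical choices, and real speech would replace the synthetic signals.

```python
import numpy as np

def long_term_average_spectrum(signal, frame_len=256, hop=128):
    """Average the log-magnitude spectrum over all frames of a signal."""
    window = np.hanning(frame_len)
    frames = np.array([
        signal[start:start + frame_len] * window
        for start in range(0, len(signal) - frame_len + 1, hop)
    ])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    # Small offset avoids log(0) on silent bins.
    return np.log(magnitudes + 1e-10).mean(axis=0)

def identify(test_signal, templates):
    """Return the enrolled name whose template is nearest (Euclidean distance)."""
    test_vec = long_term_average_spectrum(test_signal)
    return min(templates, key=lambda name: np.linalg.norm(templates[name] - test_vec))

def synth_voice(f0, rng, sample_rate=8000, seconds=1.0):
    """Hypothetical stand-in for speech: two harmonics of f0 plus white noise."""
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    tone = np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
    return tone + 0.05 * rng.standard_normal(t.size)

rng = np.random.default_rng(0)

# Training: one template per enrolled speaker.
templates = {
    "A": long_term_average_spectrum(synth_voice(120.0, rng)),
    "B": long_term_average_spectrum(synth_voice(220.0, rng)),
}

# Testing: new noise realisations of each speaker.
result_a = identify(synth_voice(120.0, rng), templates)
result_b = identify(synth_voice(220.0, rng), templates)
print(result_a, result_b)
```

The training step here is nothing more than storing one averaged vector per speaker, and testing is a nearest-template lookup. This shows why the long-term average approach is simple, and also why it discards the phonetic detail that the second approach models.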
The logic behind speaker recognition is to classify
differences in the speakers' articulatory organs, the
shape of the vocal tract, the size of the nasal cavity,
intonation and prosody in order to identify the
speaker correctly. Furthermore, a language model
can be used to improve performance. In practice,
significant errors are introduced in the training and
testing data due to environmental noise,
convolutional or white noise, and the speaker's