Final Report for CS5611
(Fuzzy Sets: Theory and Applications):
Speaker Recognition Using Neuro-Fuzzy Techniques
陳智偉
(mr854347, first-year M.S. student, Computer Science)
³¯§¶±l
(mr844367, first-year M.S. student, Computer Science)
Abstract
This report describes our attempt to apply neural networks and fuzzy
systems to speaker recognition. The preliminary results show that a backpropagation
network is a better approach than ANFIS for speaker recognition,
which is a classification problem rather than a regression problem.
Problem Definition
This project tries to recognize the speaker of an input speech sample using a pretrained
network. There are several types of speaker recognition:
- Text-dependent or text-independent
- Open-set or closed-set
This project uses text-independent speech and a closed set of speakers for
speaker recognition. That is, we identify your voice from a limited group of
people, regardless of what you say. Many methods in the neuro-fuzzy
repertoire can be used to achieve this.
For neural networks:
- Multilayer Perceptron
- Backpropagation Network
- Radial Basis Networks
- Learning Vector Quantization
For fuzzy logic:
- Fuzzy C-means clustering
- K-nearest neighbor rule
- Mahalanobis distance
There is also their hybrid, ANFIS (Adaptive Neuro-Fuzzy Inference System). We
chose only a few representative methods to show the feasibility of speaker recognition
using neuro-fuzzy techniques.
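As a small illustration of closed-set identification with one of the listed methods, the sketch below applies the K-nearest neighbor rule to toy data. The function name `knn_identify`, the two-feature frames, and the speaker labels are all hypothetical; the report does not describe how (or whether) this rule was actually implemented.

```python
import math
from collections import Counter

def knn_identify(train, query, k=3):
    """Return the majority speaker label among the k nearest training frames.

    train: list of (feature_vector, speaker_label) pairs (hypothetical toy data).
    query: feature vector of the frame to identify.
    """
    # Sort training frames by Euclidean distance to the query frame.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # Closed set: the answer is always one of the enrolled speakers.
    return votes.most_common(1)[0][0]

# Made-up two-feature frames for three enrolled speakers.
train = [
    ((0.1, 0.2), 1), ((0.2, 0.1), 1),
    ((0.9, 0.8), 2), ((0.8, 0.9), 2),
    ((0.1, 0.9), 3), ((0.2, 0.8), 3),
]
print(knn_identify(train, (0.85, 0.85)))
```

Because the set is closed, the function never answers "unknown speaker"; an open-set system would additionally need a rejection threshold.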
Data Set Description
- Origin: This dataset comes from our senior, who is preparing
a paper on speaker recognition. The dataset was obtained by recording
several people's voices and converting the recordings into a MATLAB-readable format,
so a new dataset can be generated at any time. Consecutive frames
overlap by 256 samples (32 ms).
- Description: The original sound file is transformed into
large matrices, which contain
- sample data: used to train the speaker recognition network.
- test data: applied to the trained network.
Both matrices are two-dimensional:
- dimension 1: the voice samples.
- dimension 2: the features extracted from the sound file for each sample;
the last column is the class (speaker) the sample belongs to.
Therefore, the network should accept n feature inputs and output the
class the sample belongs to.
- Past Usage: This dataset is courtesy of our senior.
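The matrix layout described above (features first, class label in the last column) can be sketched as follows. The values and the helper name `split_features_labels` are made up for illustration, and we assume one sample per row; the real matrices are MATLAB data that is not reproduced here.

```python
def split_features_labels(matrix):
    """Split rows of [f1, ..., fn, label] into feature rows and labels."""
    features = [row[:-1] for row in matrix]
    labels = [int(row[-1]) for row in matrix]
    return features, labels

# Made-up rows: three features followed by the speaker index.
sample_data = [
    [0.12, 1.50, -0.30, 1],
    [0.10, 1.42, -0.25, 1],
    [0.80, 0.20, 0.55, 2],
]
X, y = split_features_labels(sample_data)
n_inputs = len(X[0])  # number of network inputs (n in the text)
print(n_inputs, y)
```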
Approach
Our approach to this problem has two aspects:
- Input Selection: Many features can be extracted from a sample,
some important, some unimportant, and others mere noise. The inputs should be
filtered so that only the most important features remain; this reduces the number
of inputs and keeps the network from becoming too big to train.
- Network Training: For the backpropagation network, we found that Levenberg-Marquardt
training is much faster than the original backpropagation training method. For ANFIS, the features should
be reduced to around three. Neural networks have no such restriction.
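The report does not state which criterion was used for input selection. As one possible illustration, the sketch below ranks features by a Fisher-style score (between-class variance divided by within-class variance) and keeps the top k. The function names and the toy data are assumptions, not the procedure actually used in this project.

```python
from collections import defaultdict

def fisher_scores(X, y):
    """Score each feature by between-class over within-class variance."""
    n_features = len(X[0])
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    scores = []
    for j in range(n_features):
        overall_mean = sum(row[j] for row in X) / len(X)
        between = within = 0.0
        for rows in by_class.values():
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            between += len(col) * (mean - overall_mean) ** 2
            within += sum((v - mean) ** 2 for v in col)
        # Small constant avoids division by zero for a constant feature.
        scores.append(between / (within + 1e-12))
    return scores

def select_top_features(X, y, k):
    """Return the indices of the k highest-scoring features, best first."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[0.1, 0.5], [0.2, 0.9], [0.9, 0.6], [1.0, 0.8]]
y = [1, 1, 2, 2]
print(select_top_features(X, y, 1))
```

With k set to about three, a ranking of this kind would produce the reduced input set that ANFIS requires.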
Simulation Results
- Data Preprocessing and Input selection:
- Radial basis network:
- Method: Solution
- Number of layers: 2
- First layer: radbas
- Second layer: purelin
- LVQ network:
- Class number: Number of dataset classes
- ANFIS
- Input features: 4
- Class number: 3
- Other information output by the program
- Number of nodes: 55
- Number of linear parameters: 80
- Number of nonlinear parameters: 24
- Total number of parameters: 104
- Number of training data pairs: 1063
- Number of checking data pairs: 578
- Number of fuzzy rules: 16
[Figure: First two features of the data.]
- Training Results:
[Figure: Training result]
[Figure: Training RMSE]
[Figure: Training result]
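The ANFIS figures listed in this section (16 fuzzy rules, 80 linear and 24 nonlinear parameters, 104 in total for 4 inputs) are consistent with a grid-partitioned first-order Sugeno system that has 2 membership functions per input and 3 parameters per membership function. That configuration is our inference from the numbers, not something the program output states. The sketch below reproduces the counts under those assumptions and shows how quickly they grow with the number of inputs.

```python
def anfis_grid_counts(n_inputs, mfs_per_input, params_per_mf=3):
    """Parameter counts for a first-order Sugeno ANFIS with grid partitioning.

    Assumes every input has the same number of membership functions (MFs)
    and every MF has params_per_mf adjustable parameters.
    """
    rules = mfs_per_input ** n_inputs              # one rule per MF combination
    linear = rules * (n_inputs + 1)                # consequent coefficients plus bias
    nonlinear = n_inputs * mfs_per_input * params_per_mf  # premise MF parameters
    return {"rules": rules, "linear": linear,
            "nonlinear": nonlinear, "total": linear + nonlinear}

# 4 inputs, 2 MFs each: matches the 16 / 80 / 24 / 104 reported above.
print(anfis_grid_counts(4, 2))

# Rule and parameter counts grow exponentially with the number of inputs.
for n in (3, 4, 6, 8):
    print(n, anfis_grid_counts(n, 2)["total"])
```

The exponential growth of the rule count is one reason the features fed to ANFIS have to be kept to a handful.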
Concluding Remarks and Future Work
- Concluding Remarks:
Neural networks seem to be better methods for speaker recognition than ANFIS.
There are two major categories of neuro-fuzzy applications: regression and
classification. Neural networks can do both, whereas ANFIS is better suited to
regression. When an ANFIS is fed too many features, its efficiency drops
quickly, whereas that of neural networks degrades gradually. As for the recognition rate,
the Levenberg-Marquardt method seems to give the best result, owing to its fast
training and its ability to accept many features.
- Future Work:
As mentioned earlier, there are several types of speaker recognition. We
hope that someday we will be able to do open-set speaker recognition,
even in the presence of noise. That would be useful in practice.
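To make the regression-versus-classification remark concrete: a regression model such as ANFIS produces one continuous output, which must then be mapped to a speaker label. The report does not say how this mapping was done; the sketch below assumes the simplest option, snapping the output to the nearest class index. All names and values are hypothetical.

```python
def to_class(output, class_labels):
    """Map a continuous model output to the nearest valid class label."""
    return min(class_labels, key=lambda c: abs(c - output))

def recognition_rate(outputs, targets, class_labels):
    """Fraction of samples whose snapped output equals the target label."""
    hits = sum(to_class(o, class_labels) == t for o, t in zip(outputs, targets))
    return hits / len(targets)

# Made-up continuous outputs for a 3-speaker closed set.
outputs = [0.9, 1.4, 2.2, 2.4, 3.7]
targets = [1, 1, 2, 3, 3]
print(recognition_rate(outputs, targets, [1, 2, 3]))
```

Note that an output halfway between speakers 1 and 3 is read as speaker 2, even though the label numbering carries no such ordering. This is one way to see why a regression model fits the task poorly.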
Computer Programs
Our programs do not stand alone. They are functions called by our senior's
main program. Therefore, there is no separate program to test. Sorry, pals...
Division of Labor
- 陳智偉:
Neural networks tester, homepage modification.
- ³¯§¶±l:
Fuzzy and ANFIS tester, oral presentation.
References
- [Jang97] J.-S. R. Jang, C.-T. Sun, and E. Mizutani, "Neuro-Fuzzy
and Soft Computing: A Computational Approach to Learning and Machine Intelligence,"
Prentice Hall, 1997.
- Neural Network Toolbox, The MathWorks, Inc.
- Fuzzy Logic Toolbox, The MathWorks, Inc.