CS5611 (Fuzzy Sets: Theory and Applications)

Final Report for CS5611 (Fuzzy Sets: Theory and Applications):

Speaker Recognition Using Neuro-Fuzzy Techniques

陳智偉 (mr854347, first-year CS master's student)
陳孜彬 (mr844367, first-year CS master's student)


Abstract

This report describes our attempt to apply neural networks and fuzzy systems to speaker recognition. Our preliminary results show that a backpropagation network works better than ANFIS for this task, which is a classification problem rather than a regression problem.

Problem Definition

This project tries to recognize the speaker of an input speech signal using a pretrained network. Speaker recognition can be categorized along two axes:

Text-dependent or text-independent
Open-set or closed-set

This project uses text-independent speech and a closed set of speakers. That is, we identify your voice among a limited group of people, regardless of what you say. Many methods in the neuro-fuzzy family can be used to achieve this.

For neural networks:

Multilayer perceptron
Backpropagation network
Radial basis network
Learning vector quantization

For fuzzy logic:

Fuzzy C-means clustering
K-nearest neighbor rule
Mahalanobis distance

And their hybrid, ANFIS (Adaptive Neuro-Fuzzy Inference System). We chose only a few representative methods to show that speaker recognition with neuro-fuzzy techniques is feasible.

Data Set Description

Origin: This data set comes from a senior student who is preparing a paper on speaker recognition. It was obtained by recording several people's voices and converting the recordings into a MATLAB-readable format, so it can be regenerated at any time. All frames overlap by 256 samples (32 ms).

Description: The original sound files are transformed into large matrices, which contain:

Sample data: used to train the speaker recognition network.
Test data: applied to the trained network described above.

Both matrices are two-dimensional:

Dimension 1 (rows): voice samples.
Dimension 2 (columns): the features extracted from the sound file for each sample; the last column is the label of the speaker the sample belongs to.

Therefore, the network takes n features as input and outputs the speaker the sample belongs to.
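The matrix layout described above can be illustrated with a minimal sketch. The function name and the toy values are hypothetical; the only thing taken from the report is that each row is one sample and that the last column holds the speaker label.

```python
def split_features_and_labels(matrix):
    """Split rows of [f1, ..., fn, label] into (features, labels)."""
    features = [row[:-1] for row in matrix]       # first n columns: features
    labels = [int(row[-1]) for row in matrix]     # last column: speaker label
    return features, labels


# Hypothetical toy data: 3 samples, 2 features each, speaker label last.
sample_data = [
    [0.12, -0.40, 1],
    [0.33, 0.05, 2],
    [-0.21, 0.18, 1],
]
features, labels = split_features_and_labels(sample_data)
```

Here `features` is what the network receives as input and `labels` is the target it is trained to reproduce.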

Past Usage: This dataset is courtesy of our senior.

Approach

Our approach to this problem can be explained in two aspects:

Input selection: A sample can yield many features, of which some are important, some unimportant, and others mere noise. The inputs should therefore be filtered to keep only the most important features and to reduce the number of inputs; otherwise the network becomes too large to train.

Network training: For the backpropagation network, we found that Levenberg-Marquardt training is much faster than standard gradient-descent backpropagation. For ANFIS, the number of features should be reduced to around three; neural networks have no such restriction.
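For reference, the textbook form of the Levenberg-Marquardt weight update is given below. The symbols are our notation, not the report's: J is the Jacobian of the network errors with respect to the weights, e is the error vector, and mu is the damping factor.

```latex
\Delta \mathbf{w} = -\left(\mathbf{J}^{\top}\mathbf{J} + \mu \mathbf{I}\right)^{-1}\mathbf{J}^{\top}\mathbf{e}
```

With a large mu the step approaches a small gradient-descent step; with a small mu it approaches a Gauss-Newton step, which is where the faster convergence on small and medium-sized networks comes from.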

Simulation Results

Data Preprocessing and Input selection:

Feature extraction: With a frame defined as a run of consecutive samples, the cepstrum of each frame is computed as

cepstrum(frame) = real(IFFT(log|FFT(frame)|))

where FFT is the fast Fourier transform and IFFT is the inverse FFT.
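The cepstrum formula can be sketched in a few lines of Python. This is an illustration only, not the project's MATLAB code: a naive O(N^2) DFT stands in for the FFT so that the example needs nothing beyond the standard library, and the `floor` argument that guards against log(0) is our own addition.

```python
import cmath
import math


def dft(signal, inverse=False):
    """Naive discrete Fourier transform; gives the same result as FFT/IFFT."""
    n = len(signal)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = 0j
        for t, value in enumerate(signal):
            total += value * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        # The inverse transform carries the 1/N scaling.
        result.append(total / n if inverse else total)
    return result


def real_cepstrum(frame, floor=1e-12):
    """Compute real(IFFT(log|FFT(frame)|)) for one frame of samples."""
    spectrum = dft(frame)
    # Clamp the magnitude so that an exactly-zero bin does not raise in log().
    log_magnitude = [math.log(max(abs(c), floor)) for c in spectrum]
    return [c.real for c in dft(log_magnitude, inverse=True)]


# A scaled impulse has a flat spectrum, so its cepstrum is
# concentrated in coefficient 0.
frame = [math.e] + [0.0] * 7
cepstrum = real_cepstrum(frame)
```

In practice a library FFT would replace `dft`, and only the first few cepstral coefficients of each frame would typically be kept as features.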

Input normalization: none.
Output normalization: All outputs are linearly mapped to an integer that indicates the predicted speaker.
Data reduction: The samples are reduced using the following techniques:

Editing: To eliminate boundary (regarded as noise) data.
Condensing: To eliminate redundant (deeply embedded) data.
Vector quantization: To find representative data.
Principal component projection: To reduce the dimensions of the feature sets.
Discriminant projection: To find the set of vectors that best separates the patterns.

Others: none
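The output normalization step can be sketched as follows. The report does not specify the exact mapping, so this assumes the simplest reading: speakers are labeled 1 to `num_speakers`, and the real-valued network output is rounded to the nearest valid label. Both function names are hypothetical.

```python
import math


def to_speaker_label(raw_output, num_speakers):
    """Round a real-valued output to the nearest label in 1..num_speakers."""
    label = int(math.floor(raw_output + 0.5))   # round half up
    return min(max(label, 1), num_speakers)     # clamp to the valid range


def recognition_rate(raw_outputs, true_labels, num_speakers):
    """Fraction of samples whose rounded output matches the true label."""
    hits = sum(
        1
        for raw, truth in zip(raw_outputs, true_labels)
        if to_speaker_label(raw, num_speakers) == truth
    )
    return hits / len(true_labels)


# Hypothetical network outputs for four test samples and three speakers.
rate = recognition_rate([1.2, 2.7, 2.4, 0.3], [1, 3, 3, 1], num_speakers=3)
```

This rounding is what turns a regression-style output into a class decision, which is why outputs that drift between two labels are counted as errors.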

Data utilization:

Training data: the sample data described above.
Test data: a separate data set.

Network structure:

Neural networks:

n (features)-input backpropagation network with 8 hidden neurons.
n (features)-input radial basis network.
n (features)-input LVQ network.

ANFIS network:

Structure is decided by program. See below for detail.

Training method:

Backpropagation network:

Method: Levenberg-Marquardt
Number of layers: 2
First layer: tansig
Second layer: purelin
Hidden-layer size: 8
Error goal: depends on the number of samples
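The forward pass of such a two-layer network can be sketched as below. The weights here are placeholders chosen for illustration, not trained values; the only facts used are that tansig is mathematically the hyperbolic tangent and that purelin is the identity.

```python
import math


def forward(x, w1, b1, w2, b2):
    """Two-layer network: tansig hidden layer, purelin (identity) output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w2, b2)
    ]


# Placeholder weights: 2 input features, 8 hidden neurons, 1 output.
w1 = [[0.0, 0.0] for _ in range(8)]
b1 = [0.0] * 8
w2 = [[1.0] * 8]
b2 = [0.5]
output = forward([0.3, -0.7], w1, b1, w2, b2)
```

Training adjusts `w1`, `b1`, `w2`, and `b2` so that the single output approaches the speaker label of each training sample.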

Radial basis network:

Method: Solution
Number of layers: 2
First layer: radbas
Second layer: purelin

LVQ network:

Class number: Number of dataset classes

ANFIS

Input features: 4
Class number: 3
Other information output by the program:

Number of nodes: 55
Number of linear parameters: 80
Number of nonlinear parameters: 24
Total number of parameters: 104
Number of training data pairs: 1063
Number of checking data pairs: 578
Number of fuzzy rules: 16
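The parameter and rule counts above are consistent with a grid-partitioned, first-order Sugeno ANFIS that has two three-parameter membership functions per input. That structure is our inference from the numbers, not something the program output states, and the function below is a hypothetical helper that only reproduces the arithmetic.

```python
def anfis_grid_counts(num_inputs, mfs_per_input, params_per_mf=3):
    """Count rules and parameters for a grid-partitioned first-order Sugeno ANFIS."""
    rules = mfs_per_input ** num_inputs                     # one rule per MF combination
    linear = rules * (num_inputs + 1)                       # consequent coefficients plus bias
    nonlinear = num_inputs * mfs_per_input * params_per_mf  # membership function parameters
    return {
        "rules": rules,
        "linear": linear,
        "nonlinear": nonlinear,
        "total": linear + nonlinear,
    }


# The configuration inferred for this report: 4 inputs, 2 MFs per input.
counts = anfis_grid_counts(num_inputs=4, mfs_per_input=2)

# The same structure with 10 inputs, to show how quickly the rule base grows.
larger = anfis_grid_counts(num_inputs=10, mfs_per_input=2)
```

Because the rule count grows exponentially with the number of inputs under these assumptions, ANFIS becomes impractical long before a neural network does as features are added.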

[Figure: first two features of the data]

Training Results:

Backpropagation network:

[Figure: training result]

ANFIS:

[Figure: training RMSE]

[Figure: training result]

Concluding Remarks and Future Work

Concluding Remarks:

Neural networks appear to be better suited to speaker recognition than ANFIS. Neuro-fuzzy applications fall into two major categories: regression and classification. Neural networks can handle both, whereas ANFIS is mainly suited to regression. When ANFIS is fed too many features, its efficiency drops quickly, whereas that of neural networks degrades gradually. As for the recognition rate, the backpropagation network trained with the Levenberg-Marquardt method seems to give the best result, because it trains quickly and accepts many features.

Future Work:

As noted in the problem definition, there are several types of speaker recognition. We hope that someday we will be able to perform open-set speaker recognition, even in the presence of noise. Recognition alone under those conditions should already be useful.

Computer Programs

Our programs are not standalone: they are functions called by our senior's main program. Therefore there is no separate program for testing. Sorry, pals...

Division of Labor

陳智偉: Neural network testing, homepage modification.
陳孜彬: Fuzzy and ANFIS testing, oral presentation.

References

[Jang97] J.-S. R. Jang, C.-T. Sun, and E. Mizutani, "Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence," Prentice Hall, 1997.
Neural Network Toolbox, The MathWorks, Inc.
Fuzzy Logic Toolbox, The MathWorks, Inc.