A Novel Representation of Sequence Data based on Structural Information for Effective Music Retrieval

Similarity search in a sequence database consists of two major issues, i.e. the representation of a sequence and the similarity measure between two sequences. In this paper, we propose a novel representation of sequences based on the structural information of the sequences. A sequence is represented by a set of rules, which are derived from its subsequences. There are two types of subsequences of interest. One is called frequent pattern, which is a subsequence appearing often enough in the sequence. The other is called correlative pattern, which is a subsequence composed of highly correlated elements. The rules derived from the frequent patterns are called frequent rules, while the ones derived from the correlative patterns are called correlative rules. By considering music objects as sequences, we represent each of them as a set of rules and design a similarity function for effective music retrieval. The experimental results show that our approach based on the frequent rules achieves 23% improvement on the average precision to the approaches based on the Markov-model, while our approach based on the correlative rules achieves 56% improvement.