On 2013-01-07 at 15:00:00 (Brussels Time)
Abstract
Pattern mining is a very popular topic in the data mining community. Literally hundreds of algorithms have been proposed for efficiently enumerating many different types of frequent patterns. These exhaustive algorithms, however, all suffer from the pattern explosion problem: depending on the minimum support threshold, millions of patterns may be generated even for moderately sized databases. Although this problem is by now well recognized in the pattern mining community, it has not yet been solved satisfactorily. Recently, promising methods based on the minimum description length principle, information theory, and statistical models have been introduced. We will give an overview of these techniques and present a compression-based algorithm for mining non-redundant frequent subsequences in a stream.
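As a rough illustration of the pattern explosion problem, the following minimal sketch counts frequent itemsets in a tiny, made-up transaction database by brute force. The function name `frequent_itemsets`, the toy data, and the choice of itemsets (rather than subsequences) are assumptions made only for this example; it is not one of the algorithms covered in the talk.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every non-empty itemset that is
    contained in at least `min_support` transactions (brute force)."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            # Support = number of transactions containing the whole candidate.
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                frequent[candidate] = support
                found_any = True
        if not found_any:
            # Anti-monotonicity: if no itemset of this size is frequent,
            # no larger itemset can be frequent either.
            break
    return frequent


# Hypothetical toy database: six transactions over four items.
transactions = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"b", "c", "d"},
    {"a", "b", "c", "d"},
]

# Lowering the threshold makes the number of reported patterns grow quickly.
for min_support in (6, 5, 4, 3, 2):
    count = len(frequent_itemsets(transactions, min_support))
    print(f"min_support={min_support}: {count} frequent itemsets")
```

With only four items, the count is capped at 15 itemsets; with n items the cap is 2^n - 1, which is why even moderately sized databases can yield millions of patterns at low thresholds.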
Keywords
pattern mining, non-redundant, minimum description length, data stream