On 2010-06-15 at 10:30:00 (Brussels Time)
Abstract
Data analysis increasingly deals with multidimensional datasets containing very large numbers of samples. Extracting knowledge from such datasets is a very complicated task. The main difficulties are the performance limits of computer systems when processing huge samples and the methodological obstacles of high-dimensional data analysis. In my talk I will introduce a technique for simultaneous reduction of dimensionality and sample length, based on a parallelized variant of Fast Simulated Annealing (FSA). The proposed method relies on a distance-preserving linear transformation of the given dataset to a feature space of reduced dimensionality. It will be applied to exploratory data mining tasks that use statistical kernel density estimation, i.e. cluster analysis, outlier detection and classification. I will give a detailed description of the algorithm, taking a closer look at the important customized elements of FSA, such as neighbor generation, the cooling schedule, the parallelization scheme and the termination criterion. Some preliminary experimental results will be shown, along with the scope of further research on the subject.
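To make the idea of a distance-preserving linear transformation found by annealing more concrete, here is a minimal serial sketch in Python. It is not the algorithm presented in the talk: the function names (`fsa_reduce`, `stress`, `cauchy_step`), the squared-difference cost, the Cauchy-distributed neighbor step, the cooling schedule T_k = T0 / (1 + k) and the fixed step budget used as termination criterion are all assumptions chosen for illustration. Parallelization and sample length reduction are left out.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix of a list of equal-length vectors."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def project(points, matrix):
    """Apply a linear map; `matrix` holds one row per output dimension."""
    return [[sum(w * x for w, x in zip(row, p)) for row in matrix] for p in points]


def stress(points, matrix, original):
    """Assumed cost: sum of squared differences between original and projected distances."""
    projected = pairwise_distances(project(points, matrix))
    n = len(points)
    return sum((original[i][j] - projected[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def cauchy_step(scale, rng):
    """Draw a Cauchy-distributed perturbation (heavy tails allow occasional long jumps)."""
    return scale * math.tan(math.pi * (rng.random() - 0.5))


def fsa_reduce(points, target_dim, steps=2000, t0=1.0, seed=0):
    """Search for a target_dim x dim matrix that approximately preserves distances."""
    rng = random.Random(seed)
    dim = len(points[0])
    original = pairwise_distances(points)
    current = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(target_dim)]
    current_cost = stress(points, current, original)
    best, best_cost = [row[:] for row in current], current_cost
    for k in range(1, steps + 1):  # assumed termination: fixed number of steps
        temperature = t0 / (1 + k)  # assumed cooling schedule
        candidate = [[w + cauchy_step(temperature, rng) for w in row] for row in current]
        cost = stress(points, candidate, original)
        delta = cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = [row[:] for row in candidate], cost
    return best, best_cost
```

Calling `fsa_reduce(points, 2)` on a list of three-dimensional points returns a 2 x 3 matrix together with its cost; `project(points, matrix)` then yields the reduced dataset. Every cost evaluation is quadratic in the number of samples, which illustrates why large sample lengths are a bottleneck for this kind of search.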
Keywords
Data analysis, Dimensionality reduction