Abstrakt

MST Clustering and Relevancy Analysis for Key Element Identification Process

Satheesh V, Poongothai T

Data items are grouped with reference to the similarity under the clustering process. Similarity measures are used to analyze the relationship between the transactions. Vector based similarity models are suitable for low dimensional data values. High dimensional data values are clustered using subspace clustering methods. Feature selection involves identifying a subset of the most useful features that produces compatible results. A feature selection algorithm is constructed with the consideration of efficiency and effectiveness factors. The efficiency concerns the time required to find a subset of features. The effectiveness is related to the quality of the subset of features. High dimensional data clustering and feature selection process is carried out using the Fast clustering algorithm. FAST algorithm is divided into two steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second step, the most representative feature is selected from each cluster to form a subset of features. Features in different clusters are relatively independent. The clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. Minimum-Spanning Tree (MST) clustering method is adopted to ensure the efficiency of FAST. Feature subset selection algorithm is used to identify the features from the clusters. Transaction similarity analysis is carried out with different type of correlation measures in the feature selection process. Dynamic feature intervals can be used to distinguish features. Redundant feature filtering mechanism is used to filter the similar features. Custom threshold is used to improve the cluster accuracy.

Haftungsausschluss: Dieser Abstract wurde mit Hilfe von Künstlicher Intelligenz übersetzt und wurde noch nicht überprüft oder verifiziert