Abstract

Classification of XML Document by Extracting Structural and Textual Features

Gnana Vardhini. H, Anju Abraham

In this paper, XML documents are classified using both structural and content-based features, so that each document is represented by informative feature vectors. Structural features are extracted with a tree-mining algorithm. Textual features are extracted with an algorithm built on fuzzy c-means clustering. Once both sets of features have been extracted, a supervised classification algorithm combines the structural and textual feature vectors to produce the classifier model. This approach achieves 85% to 87% classification accuracy, which is higher than the previously achieved classification accuracy.
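As an illustration of the clustering step named above, the following is a minimal sketch of the standard fuzzy c-means update rules in plain Python. It is not the authors' textual extraction algorithm, which the abstract does not detail. The function names (`fcm_memberships`, `fcm_centers`, `fuzzy_c_means`), the fuzzifier value `m=2.0`, the fixed iteration count, and the use of small numeric vectors in place of real term vectors are all assumptions made for the example.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Membership u[i][j] of point i in cluster j (standard FCM rule)."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on one or more centers:
            # share full membership among those centers only.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [1.0 / sum((dj / dk) ** exponent for dk in dists)
               for dj in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Recompute each center as a mean weighted by u[i][j] ** m."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        centers.append([
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ])
    return centers


def fuzzy_c_means(points, initial_centers, m=2.0, iterations=50):
    """Alternate membership and center updates (assumed fixed count)."""
    centers = [list(c) for c in initial_centers]
    for _ in range(iterations):
        u = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, u, m)
    return centers, fcm_memberships(points, centers, m)


# Hypothetical 2-D feature vectors standing in for document term vectors.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, memberships = fuzzy_c_means(points, [(0.0, 0.0), (10.0, 10.0)])
```

Unlike hard clustering, each document receives a degree of membership in every cluster, and each row of `memberships` sums to 1. In a pipeline like the one described, such soft memberships could serve as textual features alongside the structural ones, though how the authors combine them is not specified here.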