Abstract

Discovering Relations among Documents Using Novel Text Retrieval Technique

Manjiri Gajanan Ghadi, Carmen Lysandra Pereira, Manimozhi R.

Text categorization is one of the key techniques in text mining for classifying documents in a supervised manner. In this paper we study the automatic categorization of news items. The categorization algorithm transforms each document into a vector of weights corresponding to an automatically extracted set of keywords. This process is applied to a large set of news items, forming a multi-dimensional space populated by news items of known categories. An unknown news item is likewise transformed into a vector of keyword weights and then categorized using the k-means method in this space. Finally, the documents are compared on the basis of their weighted keywords to find which are most similar.
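
As an illustration of the keyword-weight representation and the final comparison step, the following is a minimal sketch and not the implementation used in this work. It assumes TF-IDF as the weighting scheme and cosine similarity as the comparison measure, neither of which is specified above. The function names (tokenize, tfidf_vectors, cosine, most_similar) are hypothetical, and the k-means step is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: keywords are lowercase alphabetic tokens split on whitespace.
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    # Build one sparse {keyword: weight} vector per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()  # number of documents containing each keyword
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        if not tokens:
            vectors.append({})
            continue
        tf = Counter(tokens)
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        vectors.append({
            t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse keyword-weight vectors.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(vectors, i):
    # Index of the document most similar to document i (excluding i itself).
    others = [j for j in range(len(vectors)) if j != i]
    return max(others, key=lambda j: cosine(vectors[i], vectors[j]))
```

Given a list of news items, tfidf_vectors(docs) yields the weighted keyword vectors, and most_similar(vectors, i) returns the index of the item closest to item i under these assumptions. Two items that share no keywords have a similarity of zero.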