Survey of Text Mining I: Clustering, Classification, and Retrieval (No. 1) Review by W Boudville
subjective extraction of clusters
The book is relatively brief, given the technical nature of its chapters, each of which is written by different authors. Many clustering methods are described. Most involve some degree of subjectivity in defining what ends up in a given cluster, or even in deciding whether a cluster exists at all.
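To make the point about subjectivity concrete, here is a minimal sketch of my own, not a method taken from the book. It groups one-dimensional points by a gap threshold; the function name `threshold_clusters`, the sample points, and the threshold values are all made up for illustration.

```python
def threshold_clusters(points, threshold):
    """Group 1-D points: start a new cluster whenever the gap to the
    previous point exceeds `threshold` (a hypothetical, analyst-chosen value)."""
    clusters = []
    for p in sorted(points):
        if clusters and p - clusters[-1][-1] <= threshold:
            clusters[-1].append(p)   # close enough: extend the current cluster
        else:
            clusters.append([p])     # gap too large: open a new cluster
    return clusters


# Illustrative data only; the gaps between neighbours are 2, 7, 11, 1, 29.
points = [10, 12, 19, 30, 31, 60]

for threshold in (5, 10, 20):
    found = threshold_clusters(points, threshold)
    print(threshold, len(found), found)
```

The same six points come out as four, three, or two clusters depending only on the threshold. Nothing in the data says which answer is right, which is the kind of judgment call the chapters keep running into.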
The analysis of Web documents forms a major portion of the book. This data set is vast, and it is continually changing and expanding. It is also noisy, unlike the clean data sets that might be extracted from, for example, a corpus of books. Methods for automatically extracting information from the Web therefore deserve attention.
The book does not go far into the higher-level problem of defining ontologies, which is a very hard task. The closest it seems to get is finding similar words in documents, which is still very useful.
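For readers unfamiliar with the idea of finding similar words, one common approach (my own sketch, not necessarily what any chapter does) is to represent each word by the documents it appears in and compare those vectors by cosine similarity. The function names and the four toy documents below are assumptions made purely for illustration.

```python
import math
from collections import Counter


def word_vectors(docs):
    """Map each word to a Counter of {document index: occurrence count}."""
    vectors = {}
    for i, doc in enumerate(docs):
        for word in doc.lower().split():
            vectors.setdefault(word, Counter())[i] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def most_similar(word, vectors):
    """Return the other word whose document vector is closest to `word`'s."""
    target = vectors[word]
    scored = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != word]
    return max(scored)[1]


# Toy documents, invented for this example.
docs = [
    "clustering groups similar documents",
    "classification labels documents",
    "clustering groups web pages",
    "retrieval ranks web pages",
]

vectors = word_vectors(docs)
print(most_similar("clustering", vectors))
```

Here "groups" comes out as the nearest neighbour of "clustering" because the two words occur in exactly the same documents. That is useful, but it is a long way from an ontology: the method says the words behave alike, not how they are related.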