Text Classification Systems

    Many text classification systems are currently available. Broadly, they fall into two categories: binary and multiclass systems. Binary classification systems sort documents into exactly two classes. As Maranis and Bebenko (2009) explain, such a system answers a yes/no question: does this document belong to class X? This makes binary systems well suited to tasks such as sorting email into spam and non-spam, or flagging commercial transactions as fraudulent or legitimate. Where only two classes are involved, a binary classifier is both the natural and the simpler choice. Multiclass systems, in turn, divide documents into more than two classes. As the name suggests, these classifiers assign each document or data point to one of several classes, each covering a distinct domain. Newspaper articles, for example, can be sorted into categories such as news, sports, culture, business and money, politics, and science.

    This thesis, however, is concerned only with the grouping of texts; that is, it makes no a priori assumptions about the interrelations between Hardy's prose works. Computational methods for grouping text fall into two main categories: linguistic methods and mathematical-statistical methods (Srivastava and Sahami, 2009; Justo and Torres, 2005). Linguistic methods rely on natural language processing techniques; they typically apply morphological and syntactic analysis to extract meaning and identify relationships within documents. Mathematical and statistical classification...... middle of article ......sks, including SenseClusters (Purandare and Pedersen, 2004). SenseClusters and similar programs allow users to group similar contexts, such as emails and web pages (Pedersen, 2008).
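As a rough illustration of the binary/multiclass distinction, the Python sketch below uses a nearest-centroid classifier over bag-of-words counts. This technique is an assumption chosen for brevity, not a method attributed to the sources cited here, and the training sentences and label names are invented toy data. The same code serves both cases: a binary classifier is simply one trained with two labels, a multiclass classifier one trained with more.

```python
from collections import Counter
import math

def bag_of_words(text):
    """Lower-cased, whitespace-tokenised term counts (a deliberately crude tokeniser)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def train_centroids(labelled_docs):
    """Sum term counts per label. Summing rather than averaging is fine here
    because cosine similarity ignores vector length."""
    centroids = {}
    for text, label in labelled_docs:
        centroids.setdefault(label, Counter()).update(bag_of_words(text))
    return centroids

def classify(text, centroids):
    """Assign the label whose centroid is most similar to the document."""
    vec = bag_of_words(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Binary: two labels answer "is this document spam or not?"
binary = train_centroids([
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting moved to monday", "not_spam"),
    ("please review the attached report", "not_spam"),
])
print(classify("free prize money", binary))  # spam

# Multiclass: the same mechanism with more than two labels.
multi = train_centroids([
    ("the team won the match", "sports"),
    ("striker scores late goal", "sports"),
    ("shares fell as markets closed", "business"),
    ("the bank raised interest rates", "business"),
    ("parliament passed the new bill", "politics"),
    ("minister announces election date", "politics"),
])
print(classify("markets react to interest rates", multi))  # business
```

A practical system would add a real tokeniser, term weighting such as tf-idf, and far more training data; the sketch only shows that the binary and multiclass cases differ in the number of labels, not in the underlying mechanism.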
The operating principle of such programs is that documents can be grouped based on their mutual contextual similarity (Purandare and Pedersen, 2004). Programs of this type have proven effective for clustering web pages, and their advantages are even more apparent with multimedia material. The approach nevertheless has limitations. Perhaps the most important is that it does not analyze the content of the documents themselves. Another drawback is that in almost all applications of contextual classification, "identical repetitions of controlled experiments result in different conclusions" (Martin et al., 2005: 470).
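The idea of grouping documents by their mutual similarity can be sketched, very loosely, as threshold-based single-link clustering. Each document becomes a bag-of-words vector, every pair whose cosine similarity reaches a threshold is linked, and the connected groups form the clusters. This is a hypothetical Python illustration, not SenseClusters' actual pipeline; `group_by_similarity`, the whitespace tokeniser, the 0.3 threshold and the sample documents are all assumptions.

```python
from collections import Counter
import math

def bag_of_words(text):
    """Lower-cased, whitespace-tokenised term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def group_by_similarity(docs, threshold=0.3):
    """Hypothetical helper: link every pair of documents whose cosine similarity
    reaches `threshold` (an arbitrary assumed value) and return the connected
    components as clusters, each a sorted list of document indices."""
    vecs = [bag_of_words(d) for d in docs]
    parent = list(range(len(docs)))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

docs = [
    "cheap flights to paris",
    "cheap hotel deals in paris",
    "football results and league tables",
    "league tables after the football weekend",
]
print(group_by_similarity(docs))  # [[0, 1], [2, 3]]
```

Because the sketch compares only shared surface words, it never considers what the documents mean. It is also deterministic; algorithms with random initialisation, such as k-means, can return different partitions on repeated runs unless the random seed is fixed, which is one practical source of irreproducible clustering results.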