

Pseudo-Label Generation for Multi-Label Text Classification

Shared by Nikunj Oza, updated on Mar 28, 2013

Summary

Author(s):
Mohammad Salim Ahmed, Latifur Khan, Nikunj C. Oza
Abstract

With the advent and expansion of social networking, the amount of generated text data has increased sharply. Handling such a huge volume of text data requires new and improved text mining techniques. One characteristic of text data that makes text mining difficult is multi-labelity, and to build a robust and effective text classification method, which is an integral part of text mining research, we must consider this property more closely. Multi-labelity is not unique to text data, since it also occurs in non-text (e.g., numeric) data, but it is most prevalent in text data. This property places the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class labels instead of a single class, as in conventional classification. In this paper, we explore how, and under what circumstances, generating pseudo labels (i.e., combinations of existing class labels) can improve text classification. The classification also takes the high and sparse dimensionality of text data into account. Although we propose and evaluate a text classification technique, our main focus is on handling the multi-labelity of text data while exploiting the correlation among the multiple labels in the data set. Our technique, called pseudo-LSC (pseudo-Label Based Subspace Clustering), is a subspace clustering algorithm that considers both the high and sparse dimensionality of the data and the correlation among class labels during classification, in order to provide better performance than existing approaches. Results on three real-world multi-label data sets give insight into how multi-labelity is handled in our classification process and show the effectiveness of our approach.
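The abstract defines pseudo labels only as combinations of existing class labels. The following Python sketch is a hypothetical illustration of that idea, not the authors' pseudo-LSC procedure: it assumes a pseudo label is a set of labels (pairs by default) that co-occur in at least `min_support` training instances. The function names `generate_pseudo_labels` and `attach_pseudo_labels` and the parameters `min_support` and `max_size` are invented for this example.

```python
from collections import Counter
from itertools import combinations


def generate_pseudo_labels(label_sets, min_support=2, max_size=2):
    """Count label combinations of size 2..max_size across instances and
    keep those seen in at least min_support instances (assumed criterion)."""
    counts = Counter()
    for labels in label_sets:
        unique = sorted(set(labels))  # sorted so each combination has one canonical form
        for size in range(2, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


def attach_pseudo_labels(labels, pseudo_labels):
    """Augment one instance's label set with every pseudo label
    (a tuple of original labels) that the instance fully contains."""
    label_set = set(labels)
    extra = {combo for combo in pseudo_labels if label_set.issuperset(combo)}
    return label_set | extra


# Hypothetical toy data: each document carries a set of topic labels.
docs = [
    {"politics", "economy"},
    {"politics", "economy", "sports"},
    {"sports"},
    {"economy", "politics"},
]
pseudo = generate_pseudo_labels(docs, min_support=2)
print(pseudo)  # {('economy', 'politics'): 3}
print(sorted(map(str, attach_pseudo_labels(docs[1], pseudo))))
```

Treating each frequent combination as an additional class is one way to let a classifier see label correlation explicitly; the support threshold keeps rare combinations from turning into many sparsely populated classes.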

Publication Name
Conference on Intelligent Data Understanding
Publication Location
N/A
Year Published
2011

Files

ahkh11.pdf
673.1 KB

Other projects using this item:

Text Mining Algorithms & Applications
