http://www.cnr.it/ontology/cnr/individuo/prodotto/ID206834
Using micro-documents for feature selection: the case of ordinal text classification (Conference proceedings contribution)
- Type
- Label
- Using micro-documents for feature selection: the case of ordinal text classification (Conference proceedings contribution) (literal)
- Year
- 2011-01-01T00:00:00+01:00 (literal)
- Alternative label
Baccianella, S., Esuli, A., Sebastiani, F. (2011)
Using micro-documents for feature selection: the case of ordinal text classification
in 2nd Italian Information Retrieval Workshop, IIR 2011, Milan, IT, 27-28 January 2011
(literal)
- http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
- Baccianella, S., Esuli, A., Sebastiani, F. (literal)
- Start page
- http://www.cnr.it/ontology/cnr/pubblicazioni.owl#altreInformazioni
- Evaluation area 01 - Mathematical and computer sciences. - Document number/Original code: /cnr.isti/2011-TR-001 (literal)
- http://www.cnr.it/ontology/cnr/pubblicazioni.owl#url
- http://ceur-ws.org/Vol-704/7.pdf (literal)
- http://www.cnr.it/ontology/cnr/pubblicazioni.owl#volumeInCollana
- Note
- Scopus (literal)
- PuMa (literal)
- http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
- CNR-ISTI, Pisa, Italy; CNR-ISTI, Pisa, Italy; CNR-ISTI, Pisa, Italy (literal)
- Title
- Using micro-documents for feature selection: the case of ordinal text classification (literal)
- http://www.cnr.it/ontology/cnr/pubblicazioni.owl#curatoriVolume
- Massimo Melucci, Stefano Mizzaro, Gabriella Pasi (literal)
- Abstract
- Most popular feature selection (FS) methods for text classification (TC), such as information gain (a.k.a. mutual information), chi-square, and odds ratio, are based on binary information concerning the presence/absence of the feature in each training document. As such, these methods do not exploit a rich source of information, namely, how frequently the feature occurs in each training document (term frequency). To overcome this drawback we break down each training document of length k into k training "micro-documents", each consisting of a single word occurrence and endowed with the same class information as the original training document. This move has the double effect of (a) keeping all the original FS methods straightforwardly applicable, and (b) making them sensitive to term frequency. We study the impact of this strategy in the case of ordinal TC, using four recently introduced FS functions, two SVM-based learning methods, and two large datasets of product reviews. The experiments show that the use of this strategy substantially improves the accuracy of ordinal TC. (literal)
- Product of
- CNR author
- Keyword set
Incoming links:
- Product
- CNR author of
- Keyword set of
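
The micro-document strategy described in the abstract can be illustrated with a minimal Python sketch. It assumes toy tokenized reviews with star ratings as class labels (invented here for illustration) and uses plain chi-square, one of the binary FS functions the abstract names, as a stand-in; it does not reproduce the four ordinal FS functions or the SVM-based learners evaluated in the paper. The function names `to_micro_documents` and `chi_square` are hypothetical.

```python
def to_micro_documents(docs):
    """Split each (tokens, label) document of length k into k single-token
    micro-documents that inherit the label of the original document."""
    return [([tok], label) for tokens, label in docs for tok in tokens]


def chi_square(docs, term, cls):
    """Chi-square of `term` against class `cls`, computed only from binary
    presence/absence of the term in each (micro-)document."""
    n = len(docs)
    a = sum(1 for toks, lab in docs if term in toks and lab == cls)
    b = sum(1 for toks, lab in docs if term in toks and lab != cls)
    c = sum(1 for toks, lab in docs if term not in toks and lab == cls)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


# Toy training set (assumed data): (tokens, star rating).
docs = [
    ("great great great phone".split(), 5),
    ("great but broken".split(), 1),
    ("broken screen".split(), 1),
    ("nice phone".split(), 5),
]
micro = to_micro_documents(docs)

# Same FS function, applied first to whole documents, then to micro-documents.
doc_score = chi_square(docs, "great", 5)
micro_score = chi_square(micro, "great", 5)
print(doc_score, micro_score)
```

At document level, "great" appears in one 5-star and one 1-star review, so the binary score cannot separate the classes. After the split, the three occurrences in the 5-star review count as three micro-documents, and the same unchanged function now gives the term a positive score.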