| Full-text access is restricted to the Embrapa Agricultura Digital library. For additional information, contact cnptia.biblioteca@embrapa.br. |
Full Record |
Library(ies): |
Embrapa Agricultura Digital. |
Current date: |
27/06/2018 |
Last updated: |
07/01/2020 |
Type of scientific output: |
Article in Indexed Journal |
Authors: |
SANTOS, F. F. dos; DOMINGUES, M. A.; SUNDERMANN, C. V.; CARVALHO, V. O. de; MOURA, M. F.; REZENDE, S. O. |
Affiliation: |
FABIANO FERNANDES DOS SANTOS, ICMC/USP; MARCOS AURÉLIO DOMINGUES, UEM; CAMILA VACCARI SUNDERMANN, ICMC/USP; VERONICA OLIVEIRA DE CARVALHO, Unesp Rio Claro; MARIA FERNANDA MOURA, CNPTIA; SOLANGE OLIVEIRA REZENDE, ICMC/USP. |
Title: |
Latent association rule cluster based model to extract topics for classification and recommendation applications. |
Year of publication: |
2018 |
Source/Imprint: |
Expert Systems with Applications, New York, v. 112, n. 1, p. 34-60, Dec. 2018. |
DOI: |
https://doi.org/10.1016/j.eswa.2018.06.021 |
Language: |
English |
Abstract: |
The quality of any text mining technique is highly dependent on the features that are used to represent the document collection. A classical form of document representation is the vector space model (VSM), according to which the documents are represented as vectors of weights that correspond to the features of the documents. The bag-of-words model is the most popular VSM approach due to its simplicity and general applicability, but this model does not include term dependency and has a high dimensionality. In the literature, several models for document representation have been proposed in order to capture the dependency of terms. Among them, the topic model representation is one of the most interesting approaches, since it describes the collection of documents in a way that reveals their internal structure and the interrelationships therein, and also provides a dimensionality reduction. However, even for topic models, the efficient extraction of information concerning the relations among terms for document representation is still a major research challenge. In order to address this issue, we proposed the latent association rule cluster based model (LARCM). The LARCM is a non-probabilistic topic model that makes use of association rule clustering to build a document representation with low dimensionality in such a way that each feature (i.e., topic) is comprised of information concerning relations among the terms. We evaluated the interpretability of the topics obtained by using our proposed model against the ones provided by the traditional latent Dirichlet allocation (LDA) model and the LDA model using a document representation that includes correlated terms (i.e., bag-of-related-words). The experimental results indicated that the LARCM provides topics with better interpretability than the LDA models. Additionally, we used the topics obtained by the LARCM in two different applications: text classification and page recommendation. With respect to text classification, the topics were used to improve document collection representation. Concerning page recommendation, topics were used as contextual information in context-aware recommender systems. Results have shown that the topics provided by the LARCM can be used to improve both applications. |
Keywords: |
Association rules; Clustering; Clusterização; Context-aware recommender systems; Document representation; Mineração de textos; Regras de associação; Text classification; Text mining; Topic model. |
NAL Thesaurus: |
Cluster analysis. |
Subject category: |
X Research, Technology and Engineering |
MARC: |
LEADER 03339naa a2200325 a 4500
001 2092838
005 2020-01-07
008 2018 bl uuuu u00u1 u #d
024 7 $ahttps://doi.org/10.1016/j.eswa.2018.06.021$2DOI
100 1 $aSANTOS, F. F. dos
245 $aLatent association rule cluster based model to extract topics for classification and recommendation applications.$h[electronic resource]
260 $c2018
520 $aThe quality of any text mining technique is highly dependent on the features that are used to represent the document collection. A classical form of document representation is the vector space model (VSM), according to which the documents are represented as vectors of weights that correspond to the features of the documents. The bag-of-words model is the most popular VSM approach due to its simplicity and general applicability, but this model does not include term dependency and has a high dimensionality. In the literature, several models for document representation have been proposed in order to capture the dependency of terms. Among them, the topic model representation is one of the most interesting approaches, since it describes the collection of documents in a way that reveals their internal structure and the interrelationships therein, and also provides a dimensionality reduction. However, even for topic models, the efficient extraction of information concerning the relations among terms for document representation is still a major research challenge. In order to address this issue, we proposed the latent association rule cluster based model (LARCM). The LARCM is a non-probabilistic topic model that makes use of association rule clustering to build a document representation with low dimensionality in such a way that each feature (i.e., topic) is comprised of information concerning relations among the terms. We evaluated the interpretability of the topics obtained by using our proposed model against the ones provided by the traditional latent Dirichlet allocation (LDA) model and the LDA model using a document representation that includes correlated terms (i.e., bag-of-related-words). The experimental results indicated that the LARCM provides topics with better interpretability than the LDA models. Additionally, we used the topics obtained by the LARCM in two different applications: text classification and page recommendation. With respect to text classification, the topics were used to improve document collection representation. Concerning page recommendation, topics were used as contextual information in context-aware recommender systems. Results have shown that the topics provided by the LARCM can be used to improve both applications.
650 $aCluster analysis
653 $aAssociation rules
653 $aClustering
653 $aClusterização
653 $aContext-aware recommender systems
653 $aDocument representation
653 $aMineração de textos
653 $aRegras de associação
653 $aText classification
653 $aText mining
653 $aTopic model
700 1 $aDOMINGUES, M. A.
700 1 $aSUNDERMANN, C. V.
700 1 $aCARVALHO, V. O. de
700 1 $aMOURA, M. F.
700 1 $aREZENDE, S. O.
773 $tExpert Systems with Applications, New York$gv. 112, n. 1, p. 34-60, Dec. 2018.
Original record: |
Embrapa Agricultura Digital (CNPTIA) |
Records retrieved: 1 |
1. SANTOS, F. F. dos; DOMINGUES, M. A.; SUNDERMANN, C. V.; CARVALHO, V. O. de; MOURA, M. F.; REZENDE, S. O. Latent association rule cluster based model to extract topics for classification and recommendation applications. Expert Systems with Applications, New York, v. 112, n. 1, p. 34-60, Dec. 2018. Type: Article in Indexed Journal | Circulation/Level: A - 1 |
Library(ies): Embrapa Agricultura Digital. |