http://www.cnr.it/ontology/cnr/individuo/prodotto/ID128275

Investigation of named entity recognition in molecular biology by data fusion (Conference contribution)

Type

Conference contribution (Class)
Research product (Class)

Label

Investigation of named entity recognition in molecular biology by data fusion (Conference contribution) (literal)

Year

2006-01-01T00:00:00+01:00 (literal)

Alternative label

P. Arrigo, P. P. Cardo (2006)
Investigation of named entity recognition in molecular biology by data fusion
in Fifth Conference on Bioinformatics of genome regulation and structure, Novosibirsk, July 16-22
(literal)

http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori

P. Arrigo, P. P. Cardo (literal)

Start page

255 (literal)

End page

258 (literal)

http://www.cnr.it/ontology/cnr/pubblicazioni.owl#pagineTotali

4 (literal)

http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni

Istituto di Dermatologia, Università di Genova; CNR ISMAC (literal)

Title

Investigation of named entity recognition in molecular biology by data fusion (literal)

Abstract

Motivation: The published scientific literature is expanding rapidly, and its management and processing have become burdensome. Text mining (TM) is acquiring a key role in bioinformatics; it appears to be one of the more suitable approaches for integrating heterogeneous data sources. Textual data have recently been used to support the generation of scientific hypotheses (Literature Based Discovery). In this work we consider the screening of previously unknown molecular [interactions] from a literature-based discovery perspective. The identification of molecular species that could interact with one another is the first step in the in silico design of molecular interaction networks. To achieve this goal, we need to extract the most standardized set of potentially interacting molecules. Published articles reflect the fragmentation of biomedical research, and this could affect the reliability of the set. Over time, newly published papers can modify the knowledge about molecular interactions. Evaluating these changes in knowledge is important from a model development perspective. The screening of potentially interacting molecules can be considered equivalent to the linguistic process of named entity recognition. In this paper we apply an ensemble of unsupervised learning machines to the selection and extraction of named entities associated with potentially interacting molecules; the analysis focuses on the changes that emerged in the PubMed repository during the period 1985-2000.
Results: A set of PubMed queries was analyzed, each of which was a molecular entity. Each corresponding set of PubMed abstracts was retrieved and processed separately; retrieval was limited to the period 1985-2000. Each set was split into three chunks, each representing a five-year sub-interval. This procedure allowed us to screen named entities, specific to each time interval, associated with potentially interacting molecules. The recognition of time-invariant named entities is essential for subsequent molecular interaction screening. A data-fusion system based on the self-organization paradigm appears able to evaluate temporal modifications in textual information. In this preliminary analysis, our system detected several named entities that can be functionally related to the original query.
Availability: http://biocomp.ge.ismac.cnr.it/ (literal)

Product of

CNR author

PATRIZIO ARRIGO (Person)

Keyword set

Keywords of "Investigation of named entity recognition in molecular biology by data fusion" (Keyword set)

Incoming links:

Product

CNR author of

PATRIZIO ARRIGO (Person)

Keyword set of

Keywords of "Investigation of named entity recognition in molecular biology by data fusion" (Keyword set)

data.CNR.it