Investigation of named entity recognition in molecular biology by data fusion (Conference paper)

Type
Label
  • Investigation of named entity recognition in molecular biology by data fusion (Conference paper) (literal)
Year
  • 2006-01-01T00:00:00+01:00 (literal)
Alternative label
  • P. Arrigo, P. P. Cardo (2006)
    Investigation of named entity recognition in molecular biology by data fusion
    in Fifth Conference on Bioinformatics of Genome Regulation and Structure, Novosibirsk, July 16-22
    (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
  • P. Arrigo, P. P. Cardo (literal)
Start page
  • 255 (literal)
End page
  • 258 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#pagineTotali
  • 4 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
  • Istituto di Dermatologia, Università di Genova; CNR ISMAC (literal)
Title
  • Investigation of named entity recognition in molecular biology by data fusion (literal)
Abstract
  • Motivation: The published scientific literature is expanding rapidly, and its management and processing have become a burdensome task. Text mining (TM) is acquiring a key role in bioinformatics; it appears to be one of the more suitable approaches for integrating heterogeneous data sources. Textual data have recently been used to support the generation of scientific hypotheses (literature-based discovery). In this work we consider the screening of previously unknown molecular [interactions] from a literature-based discovery perspective. The identification of molecular species that could interact with one another is the first step in the in silico design of molecular interaction networks. To achieve this goal, we need to extract the most standardized set of potentially interacting molecules. Published articles reflect the fragmentation of biomedical research, and this could affect the reliability of the set. Over time, newly published papers can modify the knowledge about molecular interactions. Evaluating these changes in knowledge is important from a model-development perspective. The screening of potentially interacting molecules can be considered equivalent to the linguistic process of named entity recognition. In this paper we apply an ensemble of unsupervised learning machines to the selection and extraction of named entities associated with potentially interacting molecules; the analysis focuses on the changes that emerged in the PubMed repository during the period 1985-2000.
    Results: A set of PubMed queries was analyzed, each of which was a molecular entity. The set of PubMed abstracts corresponding to each query was retrieved and processed separately; retrieval was limited to the period 1985-2000. Each set was split into three chunks, each representing a five-year sub-interval. This procedure allowed us to screen the named entities, specific to each time interval, associated with potentially interacting molecules; the recognition of time-invariant named entities is essential for subsequent molecular interaction screening. A data-fusion system based on the self-organization paradigm seems able to evaluate temporal modifications in textual information. In this preliminary analysis, our system detected several named entities that can be functionally related to the original query.
    Availability: http://biocomp.ge.ismac.cnr.it/
    (literal)
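
The interval bookkeeping described in the abstract (splitting each query's abstracts into sub-intervals, then looking for named entities that recur in every interval) could be sketched roughly as follows. This is only an illustration, not the authors' data-fusion system: it assumes the entities have already been extracted for each abstract, the interval boundaries in `INTERVALS` are a guess (the abstract does not state them), and the function names and sample entity labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical inclusive boundaries; the abstract only says "three chunks"
# of roughly five years each within 1985-2000.
INTERVALS = [(1985, 1989), (1990, 1994), (1995, 2000)]


def split_by_interval(records, intervals=INTERVALS):
    """Group (year, entities) records by the interval containing the year.

    Records whose year falls outside every interval are dropped.
    """
    chunks = defaultdict(list)
    for year, entities in records:
        for start, end in intervals:
            if start <= year <= end:
                chunks[(start, end)].append(set(entities))
                break
    return chunks


def entities_per_interval(chunks, intervals=INTERVALS):
    """Union of all entities seen in each interval (empty set if none)."""
    return {iv: set().union(*chunks.get(iv, [])) for iv in intervals}


def time_invariant(per_interval):
    """Entities that occur in every interval."""
    sets = list(per_interval.values())
    return set.intersection(*sets) if sets else set()


# Placeholder data: (publication year, entities found in one abstract).
records = [
    (1986, ["entity_a", "entity_b"]),
    (1992, ["entity_a", "entity_c"]),
    (1999, ["entity_a", "entity_b"]),
]
per_interval = entities_per_interval(split_by_interval(records))
print(sorted(time_invariant(per_interval)))  # ['entity_a']
```

A plain set intersection is the strictest reading of "time invariant"; a softer criterion, such as presence in at least two of the three intervals, would be an equally plausible assumption.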