text mining (was RE: ... mapping to KEGG 'orthology')
andrew brian clegg
a.clegg at mail.cryst.bbk.ac.uk
Thu May 16 05:03:27 PDT 2002
On the subject of text mining...
I'm embarking on a project to build a smart search engine for biological
literature with Gene Ontology (and its cross-references to other
databases) as the central framework.
The initial goal is to allow users to enter search terms, and be presented
with a scored list of fuzzy hits, where a document matching a search term
exactly scores highly, and a document matching a GO term closely related
to a search term scores less highly, with a weighting based on closeness
in the ontology, the direction and class of the relationship, etc.
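As a rough illustration of the scoring idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the toy ontology fragment (not the real GO structure), the weight values per relationship class and direction, and the names EDGE_WEIGHT, PARENTS, closeness and rank. Closeness is taken as the best product of edge weights along any path from the search term, so an exact match scores 1.0 and more distant terms score progressively less.

```python
import heapq

# Hypothetical decay factors per (relationship class, direction).
# The values are illustrative only and would need tuning.
EDGE_WEIGHT = {
    ("is_a", "up"): 0.8,       # towards a more general term
    ("is_a", "down"): 0.9,     # towards a more specific term
    ("part_of", "up"): 0.6,
    ("part_of", "down"): 0.7,
}

# Toy ontology fragment: child -> [(relationship, parent)].
PARENTS = {
    "glycolysis": [("is_a", "carbohydrate metabolism")],
    "carbohydrate metabolism": [("is_a", "metabolism")],
    "mitochondrial membrane": [("part_of", "mitochondrion")],
}


def neighbours(term):
    """Yield (adjacent term, edge weight) in both directions."""
    for rel, parent in PARENTS.get(term, []):
        yield parent, EDGE_WEIGHT[(rel, "up")]
    for child, links in PARENTS.items():
        for rel, parent in links:
            if parent == term:
                yield child, EDGE_WEIGHT[(rel, "down")]


def closeness(query):
    """Best (maximum-product) closeness from query to each reachable term."""
    best = {query: 1.0}
    heap = [(-1.0, query)]          # max-heap via negated scores
    while heap:
        neg_score, term = heapq.heappop(heap)
        score = -neg_score
        if score < best.get(term, 0.0):
            continue                # stale heap entry
        for nxt, weight in neighbours(term):
            candidate = score * weight
            if candidate > best.get(nxt, 0.0):
                best[nxt] = candidate
                heapq.heappush(heap, (-candidate, nxt))
    return best


def rank(query, docs, min_score=0.1):
    """Score each document by its closest annotated term; best first."""
    close = closeness(query)
    scored = []
    for doc_id, terms in docs.items():
        score = max((close.get(t, 0.0) for t in terms), default=0.0)
        if score >= min_score:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: -pair[1])


docs = {
    "doc1": {"glycolysis"},
    "doc2": {"carbohydrate metabolism"},
    "doc3": {"metabolism"},
    "doc4": {"mitochondrion"},
}
results = rank("glycolysis", docs)
```

With these toy inputs, doc1 (exact match) ranks first, followed by doc2 and doc3 as the matched term moves further up the hierarchy, while doc4 is dropped because its term is not connected to the query. Because every weight is at most 1, the best-path search behaves like Dijkstra's algorithm on the product of weights.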
I hope to extend this to cover clustering of documents around terms,
detection of similar themes across sets of documents, and eventually more
sophisticated discovery of connections between themes, hopefully leading
to a rudimentary assault on the knowledge-discovery problem.
Does anyone have any experiences or ideas they would like to relate? I'm
particularly keen to hear of any false starts to be wary of or wrong trees
to avoid barking up. There's a lot of literature around on the subject of
information retrieval and text mining but I've found little so far that
covers the use of ontologies -- except for special purposes like
disambiguation of terms. Also the problem of indexing a document in terms
of its relationship to a set or structure of terms -- without the multi-
dimensionality involved getting out of control -- is going to prove
interesting, I'm sure.
(Actually if anyone knows of any papers or textbooks that cover
efficient ways to index a document against a flat list of terms, I'd
love to hear of them. I'm sure that kind of approach could be adapted
Thanks in advance for any pointers.
School of Crystallography
Birkbeck, University of London
On Wed, 15 May 2002, C. E. Crangle wrote:
> We are also interested in this question, with a focus on text data mining
> C. E. Crangle, Ph.D.
> Senior Partner, ConverSpeech LLC
> Tel: (001)1-650-322-9257
> Fax: (001)1-650-328-6138
> > Second: Do you know of any attempt to map between (or even
> > unify) the GO ontologies, and the 'orthology' used by KEGG?
> > Here at the Institute of Systems Biology, our biologists are
> > very fond of KEGG; I -- as a programmer -- am naturally
> > drawn to the clean design of GO, which I have used for the last year.
This message is from the GOFriends moderated mailing list. A list of public
announcements and discussion of the Gene Ontology (GO) project.
Problems with the list? E-mail: owner-gofriends at geneontology.org
Subscribing: send "subscribe" to gofriends-request at geneontology.org
Unsubscribing: send "unsubscribe" to gofriends-request at geneontology.org