[bioontology-support] extraction of subtree of ontology
eugene at nextbio.com
Tue Mar 16 17:32:51 PDT 2010
Is there a way to download the whole disease or compound category by just one command?
The example given in the email requires specifying a concept id. However, the disease category, for example, has more than 10 top-level concepts, so I would need to make more than 10 calls and then merge those 10+ files into one. That is not very straightforward.
From: drnigam at gmail.com [mailto:drnigam at gmail.com] On Behalf Of Nigam Shah
Sent: Tuesday, March 16, 2010 3:46 PM
To: Eugene; Satnam Alag
Subject: Re: Introductions
BTW, for questions like this, emailing support at bioontology.org is best. It will go to people who respond to users regularly. I merely end up reading the documentation and replying to you.
On Tue, Mar 16, 2010 at 3:36 PM, Nigam Shah <nigam at stanford.edu> wrote:
Yes it is. You are not using a conceptid ("Peptides" is not a concept id in MSH). See: http://bioportal.bioontology.org/visualize/42142 to find a valid ID for "Peptides" .. use the search box above the tree view.
For example, for "Melanoma", the ID is D008545. And if I use that, I will get an OWL file from: http://rest.bioontology.org/bioportal/viewextractor/42142/?conceptid=D008545&ontologyname=testing.owl&delay=2000. I just tested it. So the URL is correct; you just need to use the right parameters.
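The URL above can also be assembled programmatically, which helps when several concept IDs have to be requested one after another. The following is only a minimal sketch: the endpoint and the parameter names (conceptid, ontologyname, delay) are copied from the URL quoted above, while the function names and the choice to name each output file after its concept ID are assumptions made for illustration. The meaning of the delay value is not explained in this thread, so it is passed through unchanged. The sketch only builds the URLs and does not contact the service.

```python
from urllib.parse import urlencode

# Endpoint copied from the URL quoted in the message above.
BASE = "http://rest.bioontology.org/bioportal/viewextractor"


def build_extract_url(ontology_version_id, concept_id, ontology_name, delay=2000):
    """Build one view-extraction URL (hypothetical helper, not part of BioPortal)."""
    params = {
        "conceptid": concept_id,        # must be a real concept ID, e.g. D008545
        "ontologyname": ontology_name,  # name given to the extracted OWL file
        "delay": delay,                 # passed through as in the quoted URL
    }
    return f"{BASE}/{ontology_version_id}/?{urlencode(params)}"


def build_extract_urls(ontology_version_id, concept_ids, delay=2000):
    """Return (file name, URL) pairs, one per concept ID.

    Naming each file '<concept id>.owl' is an assumption for this sketch.
    """
    return [
        (f"{cid}.owl", build_extract_url(ontology_version_id, cid, f"{cid}.owl", delay))
        for cid in concept_ids
    ]


if __name__ == "__main__":
    # Reproduces the Melanoma example from the message above.
    print(build_extract_url("42142", "D008545", "testing.owl"))
```

Each returned URL would still have to be fetched separately, and the resulting OWL files merged afterwards.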
On Tue, Mar 16, 2010 at 3:29 PM, Eugene <eugene at nextbio.com> wrote:
Is the URL right?
Should the ontology name have the prefix "http"? I used the above URL, but I cannot get it.
42142 is MeSH ID (according to the link http://firstname.lastname@example.org ).
I have followed the instruction: http://www.bioontology.org/wiki/index.php/View_Extraction
From: drnigam at gmail.com [mailto:drnigam at gmail.com] On Behalf Of Nigam Shah
Sent: Tuesday, March 16, 2010 2:31 PM
Subject: Fwd: Introductions
Might be useful for you to know the options too .. sorry for the short one-liners .. trying to multitask at a meeting.
---------- Forwarded message ----------
From: Nigam Shah <nigam at stanford.edu>
Date: Tue, Mar 16, 2010 at 2:27 PM
Subject: Re: Introductions
To: Satnam Alag <satnam at nextbio.com>
I am in Seattle right now at a meeting .. so can't get on the phone. Overall, here are the options:
1) The simple extraction of a sub-tree of an ontology (say the 'disease branch' of MeSH .. or the subtree under 'Melanoma'). That can be done using our production services .. for example the one at: http://www.bioontology.org/wiki/index.php/View_Extraction
2) A somewhat more detailed extraction that gives more "knobs" beyond just the sub-tree .. i.e. allows you to 'exclude' terms that are not the right semantic type, that have a high frequency in medline, or that have the wrong syntactic type on average (say not noun-phrase). That can be done using our prototype Lexicon Builder .. the one at: http://www.bioontology.org/wiki/index.php/Lexicon_Builder
3) Getting the compound and diseases sections of multiple (or all) UMLS ontologies; the 13% actually used in medline (per Rong Xu's paper). This data is in mysql tables that are not open to the public. Someone (i.e. me or my student) would have to work with you to figure out the exact query you want to run and then run it.
My guess is the number (2) WILL get you what you are after. However, in order to do that, you (or probably Eugene) need to:
- identify what UMLS ontologies you want (MSH, SNOMEDCT, NCIT, ICD9 .. what else?). You can't do all "UMLS" in the Lexicon Builder.
- make a few trial runs (say with MSH) ... trying different parameters for the semantic types, the syntactic types and the frequency cutoffs; try including or excluding synonyms, and using or not using "mappings" to terms from other ontologies. Basically read the Lexicon Builder paper and try out different parameter combinations
- if it gets you what you need, great .. if not, we can iterate to define the right parameter set for you.
If you are convinced that (2) will not work, or you don't want to (or can't) spend time digging into this, we can explore option (3). But that requires my time, which I don't have much of right now.
On Tue, Mar 16, 2010 at 1:51 PM, Satnam Alag <satnam at nextbio.com> wrote:
Is there an easy way for us to get the compound and diseases sections of the UMLS ontology that has been processed by you? We are particularly interested in the 13% of the ontology that is actually used in medline. Is there a good phone number I can call you at? Thx
V.P. of Engineering
Ph: 408 582 4160 (C)