[bioontology-support] Get "standard" gene and protein names from "standard" ontologies
mdorf at stanford.edu
Tue Mar 19 16:25:13 PDT 2019
Please see my answers inline below:
On Mar 19, 2019, at 11:34 AM, John Zobolas <john.zobolas at ntnu.no> wrote:
I have built a web module in which users type a known gene or protein name (e.g. TP53, BRCA1) into a box, and an autocomplete component fires queries at your REST API asking for terms that match these strings. The thing is, users being users, they want to see names from "standard" ontologies (or standard databases) like UniProt/ChEBI/Ensembl, which, as far as I can tell, have no equivalent, regularly updated ontologies.
The closest ontology I found that returns meaningful results is the OGG, though it seems it's not actively maintained (and users didn't know about it). HGNC was another one, which is actively maintained, but its terms lack synonyms (so it's not searchable via the "standard name", like TP53 for example). In short, this returns nothing (the string in the q parameter is what the user fills in):
http://data.bioontology.org/search?q=TP53&ontologies=HGNC
while this:
http://data.bioontology.org/search?q=HGNC_11998&ontologies=HGNC
returns the entry for TP53, but there are no synonyms there, whereas the entry in the BioPortal web interface (https://bioportal.bioontology.org/ontologies/HGNC/?p=classes&conceptid=http%3A%2F%2Fncicb.nci.nih.gov%2Fxml%2Fowl%2FEVS%2FHugo.owl%23HGNC_11998&jump_to_nav=true) has the aliases and approved symbol, which seem reasonable choices for that term's synonyms list (and as such searchable)...
So my first question is: why does the above happen?
Looking at the full data for this term, the value “TP53” appears in two properties:
Neither of these is explicitly set as a “synonym property” in the ontology, which means their values aren’t indexed in the “synonym” field:
By default, if the ontology owner does not specify a property to be used as a "synonym property", these are the ones we look for:
And what would you suggest me doing?
The only way to get the HGNC_11998 as a result of querying for “TP53” is to add the “include_properties=true” flag to your search:
Unfortunately, properties other than the explicitly defined ones (prefLabel, synonym, definition, cui) are indexed as bulk key-value pairs in a single search field. This means you cannot use the “require_exact_match=true” flag with this search. You will get results for all terms in which any property contains the value “TP53”. Not ideal, but it can at least serve as a workaround.
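To make the workaround concrete, here is a minimal sketch in Python. It builds the search URL with the “include_properties” flag and then narrows the results on the client side, since “require_exact_match” is not available here. The helper names are hypothetical, and the assumption that each result carries a "properties" mapping of property name to values is mine; check an actual response before relying on it. The REST API also normally expects an API key, passed here as an optional "apikey" parameter.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "http://data.bioontology.org/search"


def build_search_url(query, ontology, include_properties=False, apikey=None):
    """Build a search URL for one ontology (hypothetical helper)."""
    params = {"q": query, "ontologies": ontology}
    if include_properties:
        params["include_properties"] = "true"
    if apikey is not None:
        params["apikey"] = apikey
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def exact_property_matches(results, query):
    """Keep results where some property value equals the query, ignoring case.

    Assumes each result is a dict with a "properties" mapping whose values
    are strings or lists of strings. That shape is an assumption, not
    something confirmed in this thread.
    """
    wanted = query.strip().lower()
    kept = []
    for item in results:
        for values in item.get("properties", {}).values():
            if isinstance(values, str):
                values = [values]
            if any(str(v).strip().lower() == wanted for v in values):
                kept.append(item)
                break
    return kept


url = build_search_url("TP53", "HGNC", include_properties=True)
```

The filter would be applied to the list of results parsed from the JSON response, so that terms which merely mention the query somewhere inside a longer property value are dropped before they reach the autocomplete box.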
I made the link to BioPortal because I thought that I wouldn't need to write another dictionary module to translate from other databases/ontologies to my terms/data format, but in the end I may need to do just that? Or would it make more sense to make an .obo file from e.g. the UniProt database and actively maintain it (with you pulling it as well)?
Creating a custom ontology is definitely an option, though it could amount to a hefty task. I’d try the “include_properties” flag first to see if it serves your use case before creating a full-blown ontology.
I thought it better to ask your opinion about this, since you might have come across similar questions/issues in the past and I am fairly new to this ontology business :)
Department of Biology, Faculty of Natural Sciences, NTNU
Department of Clinical and Molecular Medicine, Faculty of Medicine and Health Sciences, NTNU