
[bioontology-support] Questions regarding the REST API

Michael Dorf mdorf at stanford.edu
Mon Nov 12 16:51:40 PST 2018


Hi John,

Really helpful!!! I noticed also that when you do: http://data.bioontology.org/ontologies/NCIT/classes/ you get a total count of 144695 classes, while the query http://data.bioontology.org/search?ontologies=NCIT&ontology_types=ONTOLOGY returns 140329 as a total count and the difference must be that the first has the obsolete terms as well, right? Because if I do: http://data.bioontology.org/search?ontologies=NCIT&ontology_types=ONTOLOGY&also_search_obsolete=true, then I get 144695!

At the moment, the pagination counts aren’t always reported correctly. Some time ago, we implemented a system that prevents expensive COUNT queries from running live against our 4store backend. These queries used to really bog down our servers, often resulting in downtime. The COUNT queries used to be executed on paged REST services, such as search: in order to determine the correct number of pages for a given call, our system would first execute a COUNT query and store the result in the output. The new system pre-caches these counts, so when a paged service call is made, the count is retrieved from a static repository. Unfortunately, there appears to be a bug in this process that triggers the different numbers you are seeing. There is an issue in our GitHub repository that tracks our progress on fixing this problem:

https://github.com/ncbo/ontologies_linked_data/issues/88

It's best to simply use an iterator to go through ALL pages of available results until you hit an empty collection instead of relying on the reported totalCount.
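A minimal Python sketch of that approach, for illustration only. It assumes each page comes back as a JSON object with the results under a "collection" key, that pages are numbered from 1, and that the page and pagesize parameters behave as described elsewhere in this thread. The helper names are hypothetical, and authentication (your API key) is left out.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator

SEARCH_URL = "http://data.bioontology.org/search"


def fetch_search_page(params: Dict[str, str], page: int, pagesize: int = 50) -> dict:
    """Fetch one page of search results and decode the JSON body.

    Authentication is not shown; add whatever your API key setup requires.
    """
    query = urllib.parse.urlencode({**params, "page": page, "pagesize": pagesize})
    with urllib.request.urlopen(f"{SEARCH_URL}?{query}") as response:
        return json.load(response)


def iter_all_results(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every result, page by page, until a page comes back empty.

    Assumes results live under the "collection" key (an assumption).
    The reported totalCount is deliberately ignored.
    """
    page = 1  # assumption: the first page is page 1
    while True:
        items = fetch_page(page).get("collection") or []
        if not items:
            return
        yield from items
        page += 1


# Usage (performs network requests, so it is left commented out):
# params = {"ontologies": "NCIT", "ontology_types": "ONTOLOGY"}
# for hit in iter_all_results(lambda p: fetch_search_page(params, p)):
#     print(hit.get("prefLabel"))
```

Passing the page fetcher in as a function keeps the stopping rule separate from the HTTP details, so the loop can be exercised against canned pages.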

Is there a specific ordering on the returned results on the query: http://data.bioontology.org/search?ontologies=NCIT&ontology_types=ONTOLOGY? Are the results ordered by prefLabel or @id, for example? (It doesn't seem to be the case.)

That’s a good question! Normally, the results are sorted by the match rank and ontology rank (see more on that below) respectively. In this case, however, there is neither a search string, nor multiple ontologies to rank against. So, there really isn’t any deterministic order in this case.

Also, does this query return all classes in all ontologies in BioPortal: http://data.bioontology.org/search?ontologies=&ontology_types=ONTOLOGY ?
And if so, what ordering is applied to the results?

Yes, that should return results for all publicly available ontologies. Sort order (or lack thereof) is the same.

Actually I kinda thought that the `search?q=something` searched for `something` in the prefLabel and synonyms only! (But it seems this is not the case, since it searches for a match with the @id as well :) Which other fields does it look for a match in?

These are the fields being searched in their order of rank priority:

id
prefLabelExact (match on the full pref label)
prefLabel (match on partial pref label)
synonymExact (match on the full synonym(s))
synonym (match on the partial synonym(s))
notation (last fragment of id)
cui (for UMLS ontologies)
semantic_types

Also I noticed that the `something` must always be URL encoded (otherwise you get no results). Do you think that the `ontologies=GO,BAO` part should be also (for the comma mostly)?

Not strictly necessary, but it’s definitely an option:

ontologies=NCIT%2CGO
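As a small illustration of the encoding discussed here, Python's standard urllib.parse can produce both forms. This is only a sketch; the variable names are made up for the example.

```python
from urllib.parse import quote, urlencode

class_id = "http://purl.obolibrary.org/obo/BFO_0000002"

# quote(..., safe="") escapes every reserved character, including ":" and "/".
encoded_id = quote(class_id, safe="")

# The comma works unencoded as noted above, but encoding it is harmless.
encoded_ontologies = quote("NCIT,GO", safe="")

# urlencode escapes each value and joins the pairs with "&"
# (note that it would turn spaces into "+").
query_string = urlencode({"q": class_id, "ontologies": "NCIT,GO"})

print(encoded_id)          # http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FBFO_0000002
print(encoded_ontologies)  # NCIT%2CGO
```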

Do you know if there is an id I can request that has more than 50 matches (a class that is included in more than 50 ontologies)? This would mean that I would have a second page of results (since the default pagesize is 50).

Sure:

http://data.bioontology.org/search?q=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FBFO_0000002&require_exact_match=true

BTW, you can control the number of results per page by passing the "pagesize=10" parameter.

Here, is there a preferential sorting happening for these results (first BAO entry and then DOID for example? - though I see you get them the other way around)
http://data.bioontology.org/search?q=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FDOID_1909&require_exact_match=true&ontologies=BAO

We have an algorithm that ranks ontologies, based on a variety of flags. The latest rankings can be found here:

https://gist.github.com/mdorf/cea96433cf4bf7dd94d109c8e06e29c0

"BAO"=>{:bioportalScore=>0.648, :umlsScore=>0.0},

"DOID"=>{:bioportalScore=>0.711, :umlsScore=>0.0},


As you can see, DOID is ranked higher, therefore it appears before the BAO result.
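To make the effect of the ranking concrete, here is a toy Python sketch that orders ontology acronyms by the bioportalScore values quoted above. It is not the server's ranking algorithm: it ignores match rank and umlsScore, and the function name is hypothetical.

```python
# Scores copied from the two ranking entries quoted above.
ONTOLOGY_RANK = {
    "BAO": {"bioportalScore": 0.648, "umlsScore": 0.0},
    "DOID": {"bioportalScore": 0.711, "umlsScore": 0.0},
}


def order_by_ontology_rank(acronyms):
    """Return the acronyms sorted from highest to lowest bioportalScore."""
    return sorted(
        acronyms,
        key=lambda acronym: ONTOLOGY_RANK[acronym]["bioportalScore"],
        reverse=True,
    )


print(order_by_ontology_rank(["BAO", "DOID"]))  # ['DOID', 'BAO']
```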

Thanks again for your great questions!

Michael


On Nov 11, 2018, at 12:18 PM, John Zobolas <john.zobolas at ntnu.no> wrote:

Hi Michael,

Thanks so much for the answers and very helpful indeed! Please see inline for further discussion/clarification on some of those.

BR, John.
________________________________
From: Michael Dorf <mdorf at stanford.edu>
Sent: Saturday, November 10, 2018 12:46 AM
To: John Zobolas
Cc: support at bioontology.org
Subject: Re: [bioontology-support] Questions regarding the REST API

Hi John,

Thanks for contacting us. See my answers inline below.

On Nov 9, 2018, at 5:57 AM, John Zobolas <john.zobolas at ntnu.no> wrote:

Hi,

I am developing a module that uses your API to get results back from different ontologies and I want to ask a few things:

  1.  I see that when I search for a string, in the results there is a property obsolete (e.g. http://data.bioontology.org/search?q=melanoma) which almost always (as far as I can tell) is false. Can I ever find it to be true (meaning that the entry is not used any more, so I will have to prune that result)? Or do you automatically filter the results to show only obsolete:false ones?

The results are filtered on obsolete:false by default. There is a parameter called also_search_obsolete={true|false} if you want more granular control over this flag.

  2.  Is the URL parameter no_contexts=true equal to display_context=false? (I accidentally discovered that they work the same, but the first one is not mentioned in the documentation, so I should probably use the latter!)

Both parameters work, but the correct one to use is display_context={true|false}, which is the one advertised in our documentation.

  3.  I was looking at the documentation in the available media types section, and I was wondering: if I send an HTTP request with method DELETE to a URL like http://data.bioontology.org/groups/:acronym or http://data.bioontology.org/ontologies/:acronym, will I actually be deleting that specific group/ontology? I mean, do I even have the privileges to do that, or is that something that only an `admin` could do? (In the documentation it is not specified who can do what for every HTTP verb and media type.)

You can create ontologies/groups programmatically using your own API key via a POST call, but you cannot delete anything from the system. That function is limited to admins only.

  4.  Is the format of error responses always the same no matter what query I use in your provided REST service (an object with errors and status properties, the first having an array of strings as a value and the second the status code as a value)? E.g. what I get when I hit: http://data.bioontology.org/ontologies/GOfr

Yes, the errors should all have a uniform response. If you notice otherwise, that would probably constitute a bug. Let us know if you find an error that deviates from this format.

{
  "errors": [
    "You must provide a valid `acronym` to retrieve an ontology"
  ],
  "status": 404
}

{
  "errors": [
    "The search query must be provided via /search?q=<query>[&page=<pagenum>&pagesize=<pagesize>]"
  ],
  "status": 400
}
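Given that uniform shape, a client could recognize error bodies with a small check like the following Python sketch. It relies only on the two properties described above (an "errors" list and a numeric "status"); the function name is hypothetical.

```python
import json
from typing import List, Optional, Tuple


def parse_error(body: str) -> Optional[Tuple[int, List[str]]]:
    """Return (status, errors) if body matches the error format, else None."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    errors, status = data.get("errors"), data.get("status")
    if isinstance(errors, list) and isinstance(status, int):
        return status, [str(message) for message in errors]
    return None


sample = '{"errors": ["You must provide a valid `acronym` to retrieve an ontology"], "status": 404}'
print(parse_error(sample))
```

Returning None for anything that does not match makes it easy to spot (and report) a response that deviates from the format.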

  5.  If I have the acronym of an ontology, I can access (all) the classes (e.g. http://data.bioontology.org/ontologies/MCCL/classes). The first result in the previous query has the prefLabel:FetalCellLine. So, if I query http://data.bioontology.org/search?q=FetalCellLine&ontologies=MCCL I get this one result, and whatever property was empty in the 'classes query' result (e.g. definition:[ ]) is not shown in the latter query, right?

Correct. These two endpoints handle empty fields slightly differently. The classes endpoint displays empty lists, whereas the search endpoint just drops the empty attributes from the response. Example:

http://data.bioontology.org/ontologies/NCIT/classes/http%3A%2F%2Fncicb.nci.nih.gov%2Fxml%2Fowl%2FEVS%2FThesaurus.owl%23C129834
http://data.bioontology.org/search?q=CALR%20NM_004343.3:c.1092_1143del52&ontologies=NCIT&require_exact_match=true

  6.  Is there a way to get all the results from an ontology (paginated of course) through a query different than the one like /ontologies/:acronym/classes and have the results pruned (no empty properties) as when you query by search string?

You can try calling the search endpoint without passing in a search string and limiting the results to a specific ontology. You need to pass a “special” parameter, ontology_types=ONTOLOGY, in order for the query-less search to work:

http://data.bioontology.org/search?ontologies=NCIT&ontology_types=ONTOLOGY

This will give you ALL results stored in our search index for the ontology NCIT. This isn’t the “advertised” method of getting all classes, but rather a “workaround” that lets you execute a search call without passing a query string.
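For illustration, a small Python helper could assemble this workaround URL from its parts. This is a sketch with a hypothetical function name; the optional flag uses the also_search_obsolete parameter mentioned earlier in this thread.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://data.bioontology.org/search"


def queryless_search_url(ontology: str, include_obsolete: bool = False) -> str:
    """Build a search URL with no q parameter, limited to one ontology."""
    params = {"ontologies": ontology, "ontology_types": "ONTOLOGY"}
    if include_obsolete:
        # Obsolete classes are filtered out unless this flag is set.
        params["also_search_obsolete"] = "true"
    return f"{SEARCH_URL}?{urlencode(params)}"


print(queryless_search_url("NCIT"))
# http://data.bioontology.org/search?ontologies=NCIT&ontology_types=ONTOLOGY
```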

Really helpful!!! I noticed also that when you do: http://data.bioontology.org/ontologies/NCIT/classes/ you get a total count of 144695 classes, while the query http://data.bioontology.org/search?ontologies=NCIT&ontology_types=ONTOLOGY returns 140329 as a total count and the difference must be that the first has the obsolete terms as well, right? Because if I do: http://data.bioontology.org/search?ontologies=NCIT&ontology_types=ONTOLOGY&also_search_obsolete=true, then I get 144695!

Is there a specific ordering on the returned results on the query: http://data.bioontology.org/search?ontologies=NCIT&ontology_types=ONTOLOGY? Are the results ordered by prefLabel or @id, for example? (It doesn't seem to be the case.)

Also, does this query return all classes in all ontologies in BioPortal: http://data.bioontology.org/search?ontologies=&ontology_types=ONTOLOGY ?
And if so, what ordering is applied to the results?

  7.  I have noticed that the @id in the results is not a unique id, right? For example, there are many results with this id: http://purl.obolibrary.org/obo/DOID_1909, belonging to different ontologies - and the difference between them is small, for example one result does not provide the definition while the other does.

Again, correct! The @id represents the class ID as defined in the original source ontology. The same class can be reused in multiple ontologies. The “unique” ID of a class in BioPortal is a combination of the @id and the ontology acronym, as in:

http://data.bioontology.org/ontologies/NCIT/classes/http%3A%2F%2Fncicb.nci.nih.gov%2Fxml%2Fowl%2FEVS%2FThesaurus.owl%23C129834

Useful to know!
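The combination of ontology acronym and @id can be turned into that class URL, or used as a lookup key, along these lines. This is a Python sketch with hypothetical helper names; the URL layout simply follows the example above.

```python
from typing import Tuple
from urllib.parse import quote


def class_key(acronym: str, class_id: str) -> Tuple[str, str]:
    """Composite key: the same @id in two ontologies yields two distinct keys."""
    return (acronym, class_id)


def class_url(acronym: str, class_id: str) -> str:
    """Build the class URL from the acronym and the URL-encoded @id."""
    encoded = quote(class_id, safe="")
    return f"http://data.bioontology.org/ontologies/{acronym}/classes/{encoded}"


doid_1909 = "http://purl.obolibrary.org/obo/DOID_1909"
print(class_key("BAO", doid_1909) != class_key("DOID", doid_1909))  # True
print(class_url("NCIT", "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C129834"))
```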

  8.  Is there a way to get results by id? Something like: /searchByID/id=URLid (according to (7) this could result in more than 1 result) or /searchByID/ids={URLid1,URLid2,URLid3,…}?

Yes, you can pass the URL-encoded full ID in the “q” parameter to the /search endpoint:

http://data.bioontology.org/search?q=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FDOID_1909&require_exact_match=true

Actually I kinda thought that the `search?q=something` searched for `something` in the prefLabel and synonyms only! (But it seems this is not the case, since it searches for a match with the @id as well :) Which other fields does it look for a match in?
Also I noticed that the `something` must always be URL encoded (otherwise you get no results). Do you think that the `ontologies=GO,BAO` part should be also (for the comma mostly)?
Do you know if there is an id I can request that has more than 50 matches (a class that is included in more than 50 ontologies)? This would mean that I would have a second page of results (since the default pagesize is 50).

You cannot search by multiple IDs.

  9.  Is there a way to get results by a combination of id+ontology acronym? Something like /searchByIDAndOntology/id=URL&ontology=OntologyAcronym? Actually, if you merge the 2 last questions, what I am asking is this query: /searchByIDAndOntology/id=[list of URLids]&ontologies=[list of ontologies]? The nearest I found in the documentation was the query with the subtree_root_id which needs the search string (so I can't use it in my case). Also, this could cover the (6) if the list of ids is empty!

If you want to limit results to a given class within one or more given ontologies, just add ontologies=BAO,DOID to the query:

http://data.bioontology.org/search?q=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FDOID_1909&require_exact_match=true&ontologies=BAO,DOID
Here, is there a preferential sorting happening for these results (first BAO entry and then DOID for example? - though I see you get them the other way around)
http://data.bioontology.org/search?q=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FDOID_1909&require_exact_match=true&ontologies=BAO


Hope this helps! Feel free to reach out if you have further questions.

Michael


----------------------------------------------------
Michael Dorf
Chief Software Architect
The National Center for Biomedical Ontology
Stanford Biomedical Informatics Research
mdorf at stanford.edu
O: 650-723-0357
M: 650-995-4374
----------------------------------------------------


​BR, John.
​------------------
John Zobolas
PhD Student
Department of Biology, Faculty of Natural Sciences, NTNU
Department of Clinical and Molecular Medicine, Faculty of Medicine and Health Sciences, NTNU

_______________________________________________
bioontology-support mailing list
bioontology-support at lists.stanford.edu
https://mailman.stanford.edu/mailman/listinfo/bioontology-support


