[bioontology-support] Any suggestions for improving concept search results from the new API?

Ray Fergerson ray.fergerson at stanford.edu
Tue Mar 18 16:25:35 PDT 2014


Lee,



I have confirmed that subtree search is unusably slow for large ontologies. 
We have isolated the problem but the solution is going to take some time. 
The problem is that the filtering step first gets all of the descendants of 
the subtree node. This process is very slow if the hierarchy below the 
subtree node is deep. We have a couple of ideas that we are going to try 
out. It is likely to take a month or so, though.
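
To make the bottleneck concrete, here is a minimal Python sketch. It is not 
the actual BioPortal code: the toy hierarchy, the class IDs and all function 
names are hypothetical. The first filter collects every descendant of the 
subtree node before looking at the hits, so its cost grows with the size of 
the subtree. The second filter is only one conceivable alternative (walk up 
from each hit and check whether the subtree node is reached); the thread does 
not say which ideas will actually be tried.

```python
from collections import deque


def descendants(children, root):
    """Collect every class below root; cost grows with the size of the subtree."""
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def filter_by_descendants(hits, children, root):
    """Approach described in the thread: gather all descendants, then filter hits."""
    allowed = descendants(children, root)
    return [h for h in hits if h in allowed]


def is_under(parents, node, root):
    """Walk upward from node; True if root is one of its ancestors."""
    seen = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current == root:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return False


def filter_by_ancestors(hits, parents, root):
    """Hypothetical alternative: cost depends on the hits, not on subtree size."""
    return [h for h in hits if is_under(parents, h, root)]


# Toy hierarchy (hypothetical IDs): A -> B, C ; B -> D ; D -> E
children = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["E"]}
parents = {}
for parent, kids in children.items():
    for kid in kids:
        parents.setdefault(kid, []).append(parent)

hits = ["E", "C", "X", "A"]  # pretend these came back from the text search
print(filter_by_descendants(hits, children, "B"))
print(filter_by_ancestors(hits, parents, "B"))
```

Both filters keep the same hits; they differ only in how much of the 
hierarchy they have to touch.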



As for definition matches not being given enough weight, can you provide a 
specific example of a multiterm search where an exact match in the definition 
loses out to single-term matches in prefLabel and synonym? We may need to do 
some fiddling with weights, but a specific example (or even better, several) 
would be very helpful.



Ray



From: Lee M Surprenant [mailto:lmsurpre at us.ibm.com]
Sent: Tuesday, March 18, 2014 3:48 PM
To: Ray Fergerson
Cc: support
Subject: RE: [bioontology-support] Any suggestions for improving concept 
search results from the new API?



Ray, please excuse my bumping this old thread...

To me, subtree search is not usable in its current state.  For instance, 
consider the Melanoma example in the documentation:

If you perform this search one level higher, at Melanocytic Neoplasm (C7058), 
it takes almost 30 seconds:
http://data.bioontology.org/search?q=melanoma&ontology=NCIT&subtree_id=http%3a%2f%2fncicb.nci.nih.gov%2fxml%2fowl%2fEVS%2fThesaurus.owl%23C7058
If you perform it two levels higher, at Neoplasm by Morphology (C4741), it 
returns a 502 Bad Gateway (presumably due to a timeout):
http://data.bioontology.org/search?q=melanoma&ontology=NCIT&subtree_id=http%3a%2f%2fncicb.nci.nih.gov%2fxml%2fowl%2fEVS%2fThesaurus.owl%23C4741
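
For anyone reproducing these requests, the following Python sketch shows how 
such a URL can be assembled with the percent-encoded subtree_id. The 
parameter names (q, ontology, subtree_id) and the NCIT IRI prefix are taken 
from the URLs above; the helper name subtree_search_url is made up for 
illustration. The sketch only builds the URL and sends no request.

```python
from urllib.parse import quote, urlencode

BASE = "http://data.bioontology.org/search"
# IRI prefix as it appears (percent-encoded) in the URLs above.
NCIT_PREFIX = "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#"


def subtree_search_url(query, ontology, concept_code):
    """Build a subtree search URL; the concept IRI must be fully percent-encoded."""
    params = {
        "q": query,
        "ontology": ontology,
        "subtree_id": NCIT_PREFIX + concept_code,
    }
    # quote_via=quote encodes spaces as %20 and, with urlencode's default
    # safe='', also encodes ':', '/' and '#'.
    return BASE + "?" + urlencode(params, quote_via=quote)


print(subtree_search_url("melanoma", "NCIT", "C7058"))
```

The hex digits come out in upper case (%3A rather than %3a), which is 
equivalent to the lower-case form used above.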


Regarding the AND vs OR stuff, my main concern was whether hits in the 
DEFINITION field (via 'include_properties') are being given any weight since 
the concepts which match BOTH terms in the definition field were returned 
below the concepts that matched a single search term in the 
prefLabel/synonym fields.
That said, I may have obsessed on this single example too much...need to see 
how often this is really an issue now that we'll be using it more.  Will 
follow up again if I find more evidence of this causing issues.

thanks,

Lee Surprenant
IBM Emerging Technologies | jStart Team


From: Ray Fergerson <ray.fergerson at stanford.edu>
To: Lee M Surprenant/Raleigh/IBM at IBMUS
Cc: "support" <support at bioontology.org>
Date: 01/10/2014 09:21 PM
Subject: RE: [bioontology-support] Any suggestions for improving concept 
search results from the new API?





Lee,

Yes, we did change the search behavior as a result of your earlier 
suggestions. Currently we use the default Lucene behavior for search. Thus 
we return good matches first and then increasingly bad ones (by Lucene’s 
judgement). This means that a match on all words in the string will come 
first but eventually you will get matches that are only good on one of the 
words. It seems to us that this is good behavior. If you, for example, 
misspell a word you will get results from one word but not the other. This 
should help to locate the problem. This seems better than returning no 
matches in the event of one misspelled word.

Ray

From: Lee M Surprenant [mailto:lmsurpre at us.ibm.com]
Sent: Monday, January 6, 2014 6:36 AM
To: Ray Fergerson
Cc: support
Subject: RE: [bioontology-support] Any suggestions for improving concept 
search results from the new API?


Ray,

It looks like the search mechanism was updated in December?

The new results are much better.  My sample search, "Tretinoin Cytarabine", 
now returns 149 results, with the top 7 being the ones containing both 
terms.


However, I'm still not getting the desired results from my other sample: 
"history and physical".  What I was hoping for is results similar to the 
v1 api, which surfaced strong matches like "Work-up" (C85833) and "Review of 
Systems" (C95618). Here is what I'm seeing instead:


search query:    history and physical
options:         ontologies=NCIT
number of hits:  58 pages * 50/page = 2900
comment:         shouldn't you be using stopwords to prevent matching the 
                 word 'and'?

search query:    history physical
options:         ontologies=NCIT
number of hits:  5 pages * 50/page = 250
comment:         highest results include only "history" OR "physical", but 
                 not both

search query:    history physical
options:         added include_properties=true
number of hits:  41 pages * 50/page = 2050
comment:         still no sign of the "good matches" which include both terms

search query:    history physical
options:         added subtree search - look only in Activity subtree.  Same 
                 result with or without include_properties, so I think it has 
                 more to do with size of subtree than with number of search 
                 results?
number of hits:  INTERNAL SERVER ERROR
comment:         took long time to respond.  maybe a performance/timeout issue?


So, it seems like the search now performs a simple "OR" on the search terms, 
but in this case I'd much prefer an AND.  "OR" would be OK if the results 
were well-ordered (like for "Tretinoin Cytarabine"), but in this case none 
of the top results contain both search terms.  Maybe it is related to 
the fact that the matches are in the description (include_properties=true) 
and not the concept name?
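
As a stopgap on the client side, AND semantics can be approximated by 
post-filtering whatever the OR search returns. The Python sketch below is 
only an illustration under assumptions: the record layout (prefLabel, 
synonym and definition keys) and the sample records are hypothetical, not 
real API output, and the function names are made up.

```python
import re


def tokens(text):
    """Lower-case a string and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def record_tokens(record, fields=("prefLabel", "synonym", "definition")):
    """Gather the tokens of every text field; a field may be a string or a list."""
    found = set()
    for field in fields:
        value = record.get(field, [])
        if isinstance(value, str):
            value = [value]
        for item in value:
            found |= tokens(item)
    return found


def require_all_terms(results, query):
    """Keep only the records that contain every query term in some field."""
    wanted = tokens(query)
    return [r for r in results if wanted <= record_tokens(r)]


# Hypothetical records standing in for one page of search results.
results = [
    {"prefLabel": "Physical Examination", "synonym": ["Physical"],
     "definition": ["An exam of the body."]},
    {"prefLabel": "Sample Concept",
     "definition": ["Includes the history and a physical assessment."]},
    {"prefLabel": "Family History"},
]

for record in require_all_terms(results, "history physical"):
    print(record["prefLabel"])
```

This only filters the pages that have already been fetched, so it does not 
help if the concepts matching both terms are ranked many pages down.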

Here are the latter two queries for testing (and reproducing that subtree 
search error):
http://data.bioontology.org/search?q=history%20physical&ontologies=NCIT&include_properties=true
http://data.bioontology.org/search?q=history%20physical&ontology=NCIT&subtree_id=http%3A%2F%2Fncicb.nci.nih.gov%2Fxml%2Fowl%2FEVS%2FThesaurus.owl%23C43431

PS. Number of hits would have been easier for me to calculate if the 
documentation page indicated that the default pagesize is 50.

-Lee

From: Ray Fergerson <ray.fergerson at stanford.edu>
To: Lee M Surprenant/Raleigh/IBM at IBMUS, "support" <support at bioontology.org>
Date: 12/13/2013 08:27 PM
Subject: RE: [bioontology-support] Any suggestions for improving concept 
search results from the new API?

Lee,

Sorry for the non-response on this. The search is working as designed but I 
think that you bring up a fair point. We will investigate changing this 
behavior.

Ray

From: bioontology-support-bounces at lists.stanford.edu 
[mailto:bioontology-support-bounces at lists.stanford.edu] On Behalf Of Lee M 
Surprenant
Sent: Tuesday, November 26, 2013 5:44 AM
To: support
Subject: [bioontology-support] Any suggestions for improving concept search 
results from the new API?

*bump*

Even if it's just clarification on whether search is working as expected, can 
someone reply to this?

-Lee

----- Forwarded by Lee M Surprenant/Raleigh/IBM on 11/26/2013 08:39 AM -----

From: Lee M Surprenant/Raleigh/IBM at IBMUS
To: "support at bioontology.org Support" <support at bioontology.org>
Date: 11/19/2013 06:30 PM
Subject: [bioontology-support] Any suggestions for improving concept search 
results from the new API?
Sent by: bioontology-support-bounces at lists.stanford.edu

In the past, I was seeing good search results (5 hits) when searching NCI 
Thesaurus (including Properties) with terms like the following: "history and 
physical".

With the new backend, this search now returns 0 results.

A few other searches I've tried also offer fewer matches than before.  For 
example, "Tretinoin Cytarabine" used to return 7 results, now returns 0.

It seems to me like, in the past, it did a simple AND. And now maybe it only 
matches contiguous text (e.g. like doing a google search with " " around the 
phrase)?

Is there a new syntax to match two non-contiguous words (ignoring order) 
when they appear in the same concept name/description?

Any other ideas for me?

thanks,

Lee Surprenant
IBM Emerging Technologies | jStart Team
lmsurpre at us.ibm.com | (919)543-8919

_______________________________________________
bioontology-support mailing list
bioontology-support at lists.stanford.edu
https://mailman.stanford.edu/mailman/listinfo/bioontology-support