[bioontology-support] BioPortal: Questions

Samson Tu swt at
Fri Jul 13 15:17:31 PDT 2018

Dear Andon,

No doubt the semantic type information is useful in many applications.

I don’t think it’s appropriate to “discard anything that lives outside of the namespace of the terminology or ontology of interest.” A terminology or ontology, as constructed, may reuse entities from other ontologies. These reused entities are very much part of the source ontology, as its developers intended.

The issue is how to represent UMLS semantic type information that UMLS/NCBO adds to a source like ATC. My preferred design would involve an ATC-UMLS ontology that imports the ATC ontology, where the ATC ontology includes only information in the original source terminology and ATC-UMLS contains the UMLS extensions that are deemed useful. With this design, a user or an application can easily determine how to access the information they need.
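The layered design described above can be sketched in Python as a partition of triples into a source layer and an extension layer that imports it. This is only an illustration under stated assumptions: every example.org IRI below is a placeholder (the real namespaces are elided in this archive), the sample triples are made up, and split_source_and_extensions is a hypothetical helper, not part of any NCBO code.

```python
# Placeholder IRIs: the real namespaces are not given in this thread.
ATC_NS = "http://example.org/ontology/ATC/"
STY_NS = "http://example.org/ontology/STY/"
UMLS_NS = "http://example.org/umls/"
ATC_ONTOLOGY = "http://example.org/ontology/ATC"
ATC_UMLS_ONTOLOGY = "http://example.org/ontology/ATC-UMLS"

OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def split_source_and_extensions(triples):
    """Partition (subject, predicate, object) triples into two layers.

    A triple goes to the extension layer if its predicate is in the UMLS
    namespace or its subject is a semantic type; everything else is
    treated as belonging to the original source terminology.
    """
    source, extension = [], []
    for subject, predicate, obj in triples:
        is_extension = predicate.startswith(UMLS_NS) or subject.startswith(STY_NS)
        (extension if is_extension else source).append((subject, predicate, obj))
    # The extension ontology declares that it imports the source ontology.
    extension.insert(0, (ATC_UMLS_ONTOLOGY, OWL_IMPORTS, ATC_ONTOLOGY))
    return source, extension


# Made-up sample data for illustration only.
triples = [
    (ATC_NS + "R", SKOS_PREF_LABEL, "RESPIRATORY SYSTEM DRUGS"),
    (ATC_NS + "R", UMLS_NS + "hasSty", STY_NS + "T051"),
    (STY_NS + "T051", SKOS_PREF_LABEL, "Event"),
]

atc, atc_umls = split_source_and_extensions(triples)
print(len(atc), "source triple(s);", len(atc_umls), "extension triple(s)")
```

An application that needs only the original terminology would load the first layer; one that needs semantic types would load the second, which pulls in the first through the import.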

With best regards,

On Jul 12, 2018, at 5:25 AM, Andon Tchechmedjiev wrote:

Dear Samson,

Generally speaking, the ontologies and terminologies integrated into UMLS become distinct entities from their source ontologies and terminologies, and should never be treated as canonical when used in applications (there are numerous differences that aren’t always obvious at first). As Jennifer stated, semantic types and CUIs are pieces of information intrinsic to UMLS and relevant for the resources in BioPortal that were extracted from UMLS. The reason for including the umls:hasSty and umls:cui properties is to enable the BioPortal Annotator to use this information to filter annotations based on semantic types, which is an extremely common use-case and an important feature. You will notice that the aforementioned properties live in their own namespace, “@prefix umls: <> .”, while everything else lives in “<>”. This is actually the right way of doing things as far as ontology engineering standards and practices go, as opposed to an ad-hoc string encoding of the information (a bad practice that makes querying with SPARQL much less practical).

The proper (semantically enabled) way of dealing with this for applications that rely on the “purity” of the terminological information is to discard anything that lives outside of the namespace of the terminology or ontology of interest (i.e. only keeping entities that are in “” in the case of BioPortal ontologies extracted from UMLS). The fact that STY concepts appear in the search results and the concept hierarchy, and the fact that this is problematic or inconvenient for some specific applications, are different issues altogether, however. Changing the behaviour of BioPortal because of an issue with one specific use-case among many (others may be reliant on this specific behaviour) wouldn’t necessarily be the most logical solution. In this instance, if the behaviour is problematic for REDCap users, fixing the issue directly in REDCap’s BioPortal adapter by filtering out all concepts in the umls prefix namespace would very likely be the simplest solution.
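A minimal Python sketch of the namespace filtering described above. The two namespace IRIs are placeholders, since the real ones are elided in this archive, and the record layout is hypothetical rather than the actual BioPortal search response format.

```python
# Placeholder namespaces: substitute the real IRIs of the ontology of interest.
ATC_NS = "http://example.org/ontology/ATC/"
STY_NS = "http://example.org/ontology/STY/"


def keep_in_namespace(concepts, namespace):
    """Return only the concepts whose IRI starts with the given namespace."""
    return [c for c in concepts if c["iri"].startswith(namespace)]


# Made-up search results mixing a source-terminology class and a semantic type.
results = [
    {"iri": ATC_NS + "R", "label": "RESPIRATORY SYSTEM DRUGS"},
    {"iri": STY_NS + "T051", "label": "Event"},
]

atc_only = keep_in_namespace(results, ATC_NS)
print([c["label"] for c in atc_only])
```

A client-side adapter could apply a filter of this kind to search results before showing them to users, leaving BioPortal's own behaviour unchanged for applications that rely on it.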

Best Regards,

Andon Tchechmedjiev, PhD. Postdoctoral researcher in Natural Language Processing in the Biomedical Domain at LIRMM, Office 3/164, Bât. 5 - 860 rue de St Priest 34095 Montpellier cedex 5, FRANCE

On 12 Jul 2018, at 08:53, Samson Tu wrote:

Hi Jennifer,

For me, the question is not whether the Metathesaurus browser displays semantic type information for a UMLS concept derived from a terminology. The question is why BioPortal should model semantic type information as part of a terminology’s representation as an ontology, and therefore include STY as part of the ontology. First, when NCBO makes a UMLS-derived terminology available on BioPortal, is the expectation that it has UMLS-augmented relations or that it has only relations that are in the original terminology? Until today I didn’t realize that UMLS-derived terminologies I download from BioPortal have extra concepts that are not in the original terminology. Second, why isn’t it sufficient to encode the semantic types as string annotations (as it does already)? Doing so would avoid the problem of unexpected concepts turning up in search results.

I suspect that there are good reasons for the design choices made in the UMLS-terminology-extraction code. In any case, we probably have to live with the constraints imposed by these choices. It is better that we clarify that UMLS-derived terminologies on BioPortal, by design, have STY included so that semantic type information is explicitly modeled. Those who need a “clean” version can probably adapt the UMLS-terminology-extraction code to get different versions of the terminologies. (Or they need to filter out STY concepts in search results obtained from BioPortal. Or the BioPortal search API could do the filtering.)

With best regards,

On Jul 11, 2018, at 5:25 PM, Jennifer Leigh Vendetti wrote:

Hi Samson,

On Jul 11, 2018, at 10:16 AM, Samson Tu wrote:

When I searched NLM’s Metathesaurus browser for “Antibiotic” with ATC as the source terminology, I didn't get STY’s Antibiotic in the results. In that respect BioPortal’s behavior is different from that of the UMLS Metathesaurus, from which NCBO gets ATC. I find it surprising that BioPortal’s ATC should contain STY. Is there some documentation on how BioPortal extracts terminologies from UMLS?

Each time the UMLS releases a new version (e.g., 2018AA), we use their MetamorphoSys installation wizard [1] to install a local copy of the relational database that contains source vocabulary data.

We then execute a Python program [2] that reads data out of that relational database and generates ontology files for each of the source vocabularies. These ontologies are written in RDF (Turtle syntax) and are in turn consumable by the BioPortal application. The developer who wrote the Python program is no longer here, so I can’t speak with complete authority on this topic. I’ll try to add some information to this thread after doing some initial research today. The UMLS relational database has an “MRSTY” table with an “STY” column that provides semantic type information for terms. For example, I can execute a search in the MRSTY table for CUI C0282686 (class from ATC with label “RESPIRATORY SYSTEM DRUGS”), and see that it provides semantic type data in the result:

<Screenshot 2018-07-11 16.58.22.png>
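As a rough illustration of the kind of MRSTY lookup described above, here is a Python sketch against a toy in-memory SQLite table. The assumptions are considerable: a real MetamorphoSys installation is loaded into a full database server rather than SQLite, only a subset of the MRSTY columns (CUI, TUI, STY) is modelled, the row is made up (the CUI "C0000000" is a dummy value, not a real concept), and semantic_types is a hypothetical helper rather than a function from the extraction program.

```python
import sqlite3

# Toy stand-in for the UMLS relational database (assumption: SQLite in memory).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MRSTY (CUI TEXT, TUI TEXT, STY TEXT)")

# Made-up row: a dummy CUI mapped to semantic type T051 ("Event").
conn.execute(
    "INSERT INTO MRSTY (CUI, TUI, STY) VALUES (?, ?, ?)",
    ("C0000000", "T051", "Event"),
)


def semantic_types(connection, cui):
    """Return (TUI, STY) pairs recorded in MRSTY for the given CUI."""
    cursor = connection.execute(
        "SELECT TUI, STY FROM MRSTY WHERE CUI = ? ORDER BY TUI",
        (cui,),
    )
    return cursor.fetchall()


print(semantic_types(conn, "C0000000"))
```

The parameterized query keeps the CUI out of the SQL string, which matters if the identifier ever comes from user input.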

It appears that our Python program includes semantic type data in the ontology files. I looked at the TTL file for the ATC ontology and can see the class declaration for Event:

<> a owl:Class ;
skos:notation "T051"^^xsd:string ;
skos:prefLabel "Event"@en .

This sort of semantic type information is displayed by the UMLS Metathesaurus Browser on the right-hand side:

<Screenshot 2018-07-11 17.21.29.png>
