[bioontology-support] Still "writing" MTHSPL triples after 24 hrs, even with 128 GB RAM

Miller, Mark markampa at
Thu Apr 4 08:03:55 PDT 2019

I submitted an issue to the ncbo/umls2rdf GitHub repo today.  I thought I'd share it here, too, in case there isn't much overlap in the audiences.

I'm running the umls2rdf script on an Ubuntu 16 AWS EC2 server. I bump the RAM up to 128 GB when I'm doing this. I have extracted several other, larger sources with zero or minimal difficulty. I'm using UMLS 2018AA. I'm extracting on CUIs.
I haven't done any MySQL tuning, but the SQL portion of the extraction goes quickly: less than 5 minutes, I think. I have tried this with the MTHSPL content combined with other sources in a single mmsys extract/MySQL database. I have also tried putting MTHSPL in a database all by itself, an approach that has helped with some of the other sources.
The triple-writing step has been running for over a day, but I don't think the Turtle file has grown beyond roughly 400 MB in the last 10 hours. top shows the python process at 100% CPU but fairly low RAM usage, about 10 GB, I think.
Running select count(distinct CUI) from MRCONSO; against an MTHSPL-only database says that MTHSPL uses 58,041 CUIs. I loaded the Turtle content produced after one day into a triplestore, and this query shows only 3,633 CUIs from MTHSPL:

PREFIX umls: <>
select (count(distinct ?o) as ?count)
where {
    graph <> {
        ?s umls:cui ?o
    }
}
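
The same count can probably be taken straight from the partial Turtle file, without loading a triplestore. The following is only a rough sketch under stated assumptions, not part of umls2rdf: it assumes each CUI triple sits on a line that contains the prefixed predicate umls:cui, and that a CUI is the letter C followed by seven digits. The function names and the file path are hypothetical.

```python
import re

# Assumption: a CUI is "C" followed by exactly seven digits.
CUI_PATTERN = re.compile(r"\bC\d{7}\b")


def distinct_cuis(lines, predicate="umls:cui"):
    """Collect the distinct CUIs on lines that mention the predicate.

    Assumption: the Turtle output writes the predicate in prefixed form
    (umls:cui) on the same line as its CUI literal.
    """
    found = set()
    for line in lines:
        if predicate in line:
            found.update(CUI_PATTERN.findall(line))
    return found


def distinct_cuis_in_file(path):
    """Stream a Turtle file line by line so memory use stays small."""
    with open(path, encoding="utf-8") as handle:
        return distinct_cuis(handle)


def coverage(found_count, expected_count):
    """Fraction of the expected CUIs that have been written so far."""
    return found_count / expected_count
```

Calling distinct_cuis_in_file on the partial output (for example a hypothetical MTHSPL.ttl) and comparing the size of the result with the 58,041 CUIs reported by MySQL gives the same picture as the SPARQL query: 3,633 of 58,041 is roughly 6% of the expected CUIs.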

