[protege-owl] Database backend performance

Timothy Redmond tredmond at stanford.edu
Thu May 1 08:50:21 PDT 2008


> I have another question: how can I load an OWL model into the database
> WITHOUT loading the whole model into memory? My .owl file is almost 4GB
> and loading it would consume too much memory.

This is what we call a streaming parse.  Here is some code from a test
I did once that shows how you can do this programmatically:

         CreateOWLDatabaseFromFileProjectPlugin creator = new CreateOWLDatabaseFromFileProjectPlugin();
         creator.setKnowledgeBaseFactory(new OWLDatabaseKnowledgeBaseFactory());
         creator.setDriver(driver);
         creator.setURL(url);
         creator.setTable(table);
         creator.setUsername(user);
         creator.setPassword(password);
         creator.setOntologyFileURI(new URI(uri));
         creator.setUseExistingSources(true);

         Project p = creator.createProject();
         OWLModel om = (OWLModel) p.getKnowledgeBase();

It can also be done from the UI: File -> New, choose the OWL database
option with "use existing sources", and then enter the URI.

But in the current version this can be extremely slow for large owl
files because there is a lot of postprocessing that occurs after the
database is loaded.  I haven't done a lot of testing (Tania may have)
but we believe that this will be *much* faster in the version in the
latest svn.

> I also have some other questions:
> 1. Is it possible to import a database model into a file model?
> 2. Is it possible to handle imports in database models?


What timing!  This has been an open problem for a very long time.  But
with the code in the latest svn, database and file models can import
each other easily.  We hope to release this this week or next, but it
is a very big refactor and will probably be somewhat unstable at
first.  I am writing a wiki page explaining how this is done.

-Timothy



On Apr 30, 2008, at 5:58 AM, Bartosz Porzuczek wrote:
> Timothy,
>
> Thank you for your answer.
>> This is very slow and your performance is significantly worse than I
>> have seen with database mode.  I forget the exact amount of time, but
>> I believe we can do a streaming parse of the thesaurus (1.5 million
>> rows) in 30 minutes or so.  I first want to make sure that you are
>> not using the odbc drivers (probably not).  This would be horrible
>> and would explain everything.  You should be sure to use the postgres
>> jdbc drivers.  It also makes a noticeable difference if the database
>> is on localhost and the disk is not mounted with nfs.  Just some
>> random thoughts.
>>
> I'm using the JDBC postgres driver. The problem is that I don't have
> the OWLModel in a file - I'm creating it "on the fly" from another
> XML format. So there are many updates to the DatabaseModel that cause
> insert and delete queries.
>
> I'll try to convert the model to an .owl file before inserting it into
> the database.
>
> I have another question: how can I load an OWL model into the database
> WITHOUT loading the whole model into memory? My .owl file is almost 4GB
> and loading it would consume too much memory.
>
> I also have some other questions:
> 1. Is it possible to import a database model into a file model?
> 2. Is it possible to handle imports in database models?
>
> Thank you,
>
> Best regards,
> Bartek
>
>> Without further information I am not sure what to say.  If we had
>> some stack traces of what Protege is doing when it is slow, this
>> could be useful.  We might then know if there is something that
>> Protege is doing that could easily be removed (for instance, there
>> are several frame stores that perform various functions (undo,
>> journaling, etc.) that can be turned off, and this might help).
>> Perhaps we could see or run a sample of your code (out of line?).
>>
>> There are other possibilities if you get a bit more desperate such as
>> bypassing some of the high level interfaces.
>>
>>
>>> Would it be possible to serialize OWLModel into database in bursts,
>>> e.g. every 1000 individuals or so?
>>>
>>
>>
>> We do this when copying a project from memory mode to the database.
>> What you really want is the technique that Jena uses, which allows
>> the developer to mark the beginning and the end of a bulk operation.
>> This is a nice idea but it does not exist in Protege.  It would take
>> some work to implement (you would need a narrow frame store that
>> tracked changes in memory and then wrote them out - not too difficult
>> but not instantaneous either).
>>
>> -Timothy
>>
>>
>> On Apr 14, 2008, at 1:34 AM, Bartosz Porzuczek wrote:
>>
>>> Hi all,
>>>
>>> I'm converting an XML file to an ontology. As the resulting ontology
>>> would be very large (approximately 50-100M triples), I can't store it
>>> in memory. I decided to use the Protege database backend, but there
>>> is a serious performance problem.
>>>
>>> I'm creating an OWLDatabaseModel on the fly while converting the XML
>>> model, and the process of storing it in the database is very slow. It
>>> takes about one day to write 600k rows, so it would take several
>>> months to create the whole model, which is unacceptable. I'm curious
>>> whether there is a way to speed up this process.
>>> I examined my postgres logs*, and there are not only inserts but also
>>> multiple select and delete queries. I suppose the delete queries are
>>> caused by updates to the model (setting properties, etc.).
>>> Would it be possible to serialize the OWLModel into the database in
>>> bursts, e.g. every 1000 individuals or so?
>>> Do you have any advice on how to make it faster?
>>>
>>> Thank you,
>>>
>>> Best regards,
>>> Bartek
>>>
>>> * logging was turned on only for testing
>>>
>>> _______________________________________________
>>> protege-owl mailing list
>>> protege-owl at lists.stanford.edu
>>> https://mailman.stanford.edu/mailman/listinfo/protege-owl
>>>
>>> Instructions for unsubscribing: http://protege.stanford.edu/doc/faq.html#01a.03
>>>
>>
>

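The "write in bursts" idea discussed in the thread (track changes in
memory, then write them out every 1000 individuals or so) can be
sketched in plain Java.  This is only an illustration under stated
assumptions: BurstBuffer and its methods are hypothetical names, not
part of the Protege API, and the sink stands in for whatever actually
writes to the database.  A real Protege version would have to be a
narrow frame store, as described in the quoted message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not a Protege class): collects items in memory
 * and hands them to a sink in bursts of a fixed size.
 */
final class BurstBuffer<T> implements AutoCloseable {
    private final int burstSize;
    private final Consumer<List<T>> sink;   // assumed to do the real write
    private final List<T> pending = new ArrayList<>();

    BurstBuffer(int burstSize, Consumer<List<T>> sink) {
        if (burstSize <= 0) {
            throw new IllegalArgumentException("burstSize must be positive");
        }
        this.burstSize = burstSize;
        this.sink = sink;
    }

    /** Buffers one item and writes a burst once the buffer is full. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= burstSize) {
            flush();
        }
    }

    /** Passes a copy of the buffered items to the sink, then clears the buffer. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }

    /** Marks the end of the bulk operation: writes whatever is left. */
    @Override
    public void close() {
        flush();
    }
}
```

With plain JDBC, the sink could wrap each burst in one transaction and
use PreparedStatement.addBatch() and executeBatch(), so that each burst
costs one round trip instead of one per row.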


