[p4-feedback] Questions re the Future of 4.0

Timothy Redmond tredmond at stanford.edu
Tue Apr 28 13:00:40 PDT 2009



>> Even for really large ontologies, the changes being transferred
>> between clients will not grow so the network will not be overloaded.
>
> I read your client-server design wiki posting - when you say that a
> baseline must be downloaded to the client at startup, does this include
> all the instances/individuals as well, or just the OWL model?  Is that
> what's happening in 3.4 as well?  We will likely have several instances
> of the Protégé client API running on the same host, so this could be a
> problem for us.

The Protege 3.4 and the proposed Protege 4 client-server architectures
are very different.  In Protege 3.4, the data going across the network
corresponds to fine-grained knowledge base calls.  We have shown that
we can make this work but it adds a lot of complexity.  In order to
perform well, the Protege client must make use of an efficient cache
mechanism.  In order to have the right data in the cache, the Protege
server must anticipate the needs of the client before the client
asks.  And all this must work in the presence of transactions, which
are quite complex.  The advantage of this approach is that it allows
the Protege client to access very large ontologies without requiring
that the client load all the data in those ontologies.  But
maintaining the client cache does use significant client-side memory
and network bandwidth.
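
To make the cache idea concrete, here is a minimal sketch in Java.  It
is not Protege code: the class name, the `prefetch` method and the
string-valued lookup are all assumptions made for illustration, and it
ignores transactions and cache invalidation entirely, which is where
most of the real complexity lies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class CachingClientSketch {

    /** Hypothetical client that wraps a fine-grained remote lookup with a local cache. */
    static class CachingClient {
        private final Function<String, String> remote; // stands in for one network call
        private final Map<String, String> cache = new HashMap<>();
        int remoteCalls = 0; // counts how often we actually go over the network

        CachingClient(Function<String, String> remote) {
            this.remote = remote;
        }

        /** Answers from the cache when possible, otherwise makes one remote call. */
        String get(String frameName) {
            String value = cache.get(frameName);
            if (value == null) {
                remoteCalls++;
                value = remote.apply(frameName);
                cache.put(frameName, value);
            }
            return value;
        }

        /** The server pushes values it expects the client to ask for next. */
        void prefetch(Map<String, String> anticipated) {
            cache.putAll(anticipated);
        }
    }

    public static void main(String[] args) {
        CachingClient client = new CachingClient(name -> "value-of-" + name);
        Map<String, String> anticipated = new HashMap<>();
        anticipated.put("Pizza", "value-of-Pizza");
        client.prefetch(anticipated);

        client.get("Pizza");    // served from the prefetched cache
        client.get("Topping");  // cache miss, one remote call
        client.get("Topping");  // now cached
        System.out.println("remote calls: " + client.remoteCalls);
    }
}
```

Every cached entry costs client memory and every prefetch costs
bandwidth, which is the trade-off mentioned above.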

In Protege 4 we decided to avoid this complexity but the choice  
creates a different set of problems.  We have decided that the thin  
client approach will be provided by WebProtege, which will be ported  
to use the owl api.  We will assume that the Protege 4 client machine  
has sufficient resources to load the ontology.  This means that  
network traffic only need be concerned with managing change.  When the  
ontology on the server really is too large to handle on the client, we  
will depend on modularization technologies being developed at  
Manchester and elsewhere to help solve this problem.
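
As a rough illustration of why the traffic depends on the number of
changes rather than on the size of the ontology, here is a minimal
Java sketch.  It is not the Protege 4 design: the `Server` and `Client`
classes, the revision counter and the use of plain strings for axioms
are assumptions, and it only models additions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChangeSyncSketch {

    /** Hypothetical server: keeps an append-only log of changes made after the baseline. */
    static class Server {
        private final List<String> changeLog = new ArrayList<>();

        /** Records a change and returns the new revision number. */
        synchronized int commit(String change) {
            changeLog.add(change);
            return changeLog.size();
        }

        /** Returns only the changes a client at the given revision has not seen yet. */
        synchronized List<String> changesSince(int revision) {
            return new ArrayList<>(changeLog.subList(revision, changeLog.size()));
        }
    }

    /** Hypothetical client: holds a full local copy plus the last revision it has applied. */
    static class Client {
        final List<String> localAxioms;
        int revision = 0;

        Client(List<String> baseline) {
            this.localAxioms = new ArrayList<>(baseline); // loaded once, at first connect
        }

        /** Pulls and applies the delta; returns how many changes crossed the network. */
        int update(Server server) {
            List<String> delta = server.changesSince(revision);
            localAxioms.addAll(delta);
            revision += delta.size();
            return delta.size();
        }
    }

    public static void main(String[] args) {
        List<String> baseline = Arrays.asList("A SubClassOf B", "B SubClassOf C");
        Server server = new Server();
        Client client = new Client(baseline);

        server.commit("D SubClassOf A");
        System.out.println("changes transferred: " + client.update(server));
        System.out.println("local axioms: " + client.localAxioms.size());
    }
}
```

However large the baseline is, each `update` call only moves the
changes committed since the client's last known revision.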

> We will likely have several instances of the Protégé client API
> running on the same host, so this could be a problem for us.

I hadn't given this scenario a lot of thought.  Will the Protege 4
clients be in the same JVM?  If so then perhaps the baselines could be
loaded into OWL API database projects.  This would solve the memory
problem but would have a negative impact on loading the ontology during
the first connect and, to a lesser degree, on all access.  We are
starting work on the OWL API database backend now and the performance
measurements will be very important to see how this will work.

-Timothy


On Apr 24, 2009, at 3:18 AM, Miceli, Gino (FOEL) wrote:

> Timothy,
>
> Thanks a million for your fast and thorough response.
>
>> We are planning on having a Protege-4 version of web protege.
>
> WebProtégé looks promising, but we require custom forms to allow our
> non-technical users to perform data-entry.  I am currently looking
> into creating data-driven forms which may be configured directly in
> the ontology.  If we come up with something reusable I will encourage
> my manager to open-source it.
>
>> Protege 4 will support sandboxing of changes which I think will
>> turn out to be a much better model than transactions.
>
> I can see how this would be better; ontologies are quite different
> from relational databases and likely have a different transaction
> lifecycle.
>
>> The server definitely will run headless in the current
>> implementation.  I believe that client code can also run headless
>> but I will have to check this.
>
> Wouldn't a headless client be required by WebProtégé?  If not, am I
> using the API in a way it was not intended? (i.e. headless from a web
> app?)
>
>> You can't use the same session to connect twice to the same project
>> in Protege 3.4.
>
> After posting I discovered that this was in fact the problem.  When I
> use two sessions in two separate threads, the transactions seem to be
> correctly isolated.  Btw, it's encouraging to hear that you're also
> unit testing for these types of issues!
>
>> There is not much common ground between the two implementations.
>
> To go forward using 3.4, I'm wrapping all of our calls to the OWL API
> with a thin façade.  This way migrating to 4.0 shouldn't be a
> monumental task.
>
>> Even for really large ontologies, the changes being transferred
>> between clients will not grow so the network will not be overloaded.
>
> I read your client-server design wiki posting - when you say that a
> baseline must be downloaded to the client at startup, does this include
> all the instances/individuals as well, or just the OWL model?  Is that
> what's happening in 3.4 as well?  We will likely have several instances
> of the Protégé client API running on the same host, so this could be a
> problem for us.
>
>> I don't really want to commit but I am hoping that in 3-6 months we
>> will have something.
>
> Looking forward to it!  It seems that 4 will be a huge step forward.
> Since we need something now, we will likely go forward with 3.4, which
> seems to fit many of our requirements.
>
> Thanks again for your answers!
>
> -Gino
>
>
> -----Original Message-----
> From: p4-feedback-bounces at lists.stanford.edu
> [mailto:p4-feedback-bounces at lists.stanford.edu] On Behalf Of Timothy Redmond
> Sent: 23 April 2009 20:51
> To: Submit feedback for Protege 4.0 beta
> Subject: Re: [p4-feedback] Questions re the Future of 4.0
>
>
>
>> I am currently testing against 3.4 since I hope to use Protégé as a
>> backend for several web-based applications.  Ideally, I would have a
>> frontend which would present results on the web, and an
>> administrative backend for data entry.  I have noticed several
>> limitations with 3.4 in these areas, and was wondering what was
>> slated for 4.0.
>
> We are planning on having a Protege-4 version of web protege.  This is
> already funded but I am not sure of the start date for this project.
>
>> 1) Will concurrent transactions be supported?  Do I understand
>> correctly that in 3.4 they are synchronized/blocking operations?
>
> Protege 4 will support sandboxing of changes which I think will turn
> out to be a much better model than transactions.  This is a very
> natural approach with the owl api.  It will allow a user/process to
> make local copies of a change, employ a reasoner or other tools to
> evaluate these changes and then finally commit them to the server.
> Protege 4 will support a pluggable locking mechanism whereby different
> users/processes can lock different portions of the ontology for their
> work.  Two plugins that we will probably develop early are the NCI
> locking mechanism and Julian's locking mechanism.
>
> Protege 3.4 uses transactions but the approach taken there has some
> disadvantages.  First, the transaction model in Protege 3.4 is derived
> from the underlying database storage mechanism.  We have discovered
> that there is a mismatch between the type of transaction locking
> provided by the database and the desired locking for ontologies.  In
> practice concurrent transactions often get in each other's way.  In
> addition, making transactions work with ontologies proved extremely
> complex in Protege 3.4.   I hope we can avoid that type of complexity
> in Protege 4.
>
> Protege 4 will also handle concurrency more efficiently.  In Protege
> 3.4 the knowledge base is synchronized by a single coarse-grained
> lock.  In Protege 4, the owl api will use fine-grained locks to
> achieve thread-safety for multiple readers.  (Writers will still take
> a coarse-grained lock.)
>
>> 2) Are there plans to better isolate the GUI, server and client
>> components of Protégé?
>
> This is not planned in the immediate future for Protege 3.4.  The
> server definitely will run headless in the current implementation.  I
> believe that client code can also run headless but I will have to
> check this.  The client and the server are well isolated from one
> another.  But you are right there is some unfortunate linkage between
> the ontology model and the GUI.
>
>> 3) In 3.4, the project and remote project manager are singletons.
>> This means a client can only connect to one server project at a
>> time.  Will this limitation be removed in 4.0?
>
> First note that when you open a collaborative project, you are
> actually opening several server projects simultaneously - the main
> project, the annotations/changes project and the chat project.   On
> the other hand, a server can only serve up Protege once because the
> server object is a singleton.
>
> There is a GUI limitation that a Protege 3.4 client can only edit one
> main project at a time.  Protege 3.4 has always worked this way.  In
> Protege 4 this has never been the case.  From the start Protege 4
> could be used to edit as many ontologies simultaneously as you can
> load.
>
>> One problem that I think may be related to this is that in 3.4
>> transactions in a particular VM are not fully isolated.  In
>> particular, if I create two sessions in the same application, begin
>> a transaction in model A, and create a new individual w/o
>> committing, model B sees the new individual (but the properties are
>> null until A commits).  If A and B are created in different
>> classloaders/VMs, B does not see the new individual until A commits.
>
>
> Transactions are fully isolated in Protege 3.4 but it does depend on
> the backend.  If the server uses a file backend then the transaction
> isolation level is pretty low and there is no protection.  But with a
> database backend, the transaction isolation level can go up to what is
> supported by the database backend.  We run nightly junits that test
> that the isolation is enforced and these junits do run on a single
> jvm.  (Not for any special reason other than to simplify some already
> quite complex tests).
>
> There is another limitation that you might have bumped into.  You
> can't use the same session to connect twice to the same project in
> Protege 3.4.  The server only sends  updates for a project once for
> the session and these updates then get divided between the two remote
> projects.  Also the notion of a transaction is defined relative to a
> session.  There is a RemoteServer.cloneSession method that can help
> with this problem.  I mention this because you say you are having
> different experiences in the same vs. different jvms.
>
>> 4) To what degree will the 4.0 API be compatible with the 3.4 one?
>> In other words, if I code to 3.4 for now, how complex will the
>> migration path to 4.0 be?
>
> There is not much common ground between the two implementations.
> Perhaps if you can isolate the code that uses the Protege 3.4 owl api/
> owl api from the rest of your app this would help.  They are both
> Swing based so perhaps some of the gui code could survive.
>
>> 5) Will the client-server architecture be the same as 3.4? (i.e.
>> DB/File<-->Server<-(rmi)->Client<-->App)?
>
> The architecture will be very different.   But Protege 4 will probably
> support rmi and web-service based communication and will  also have a
> database backend.  Simply creating a client-server will not depend
> much on the database backend.  But for more features such as
> WebProtege the database back end will be very important.
>
> The Protege 3.4 architecture was based on implementing the Protege 3.4
> owl api over the wire.  This approach makes many demands on the
> network and requires sophisticated caching protocols to ensure that
> the client will be responsive even when the network is slow.  The
> Protege 4 architecture [1,2] is based on the idea of change
> management.  Most of the network communication will involve updates of
> changes made to the ontology between the server and different
> clients.  This mechanism will decouple the client from the server.  In
> an extreme case, a client could even go offline for an extended period
> of time and have his changes committed when he gets back.  (Like I can
> do with e-mail.)
>
>> 6) When will the above features be released?
>
> It is hard to say.  Work is starting now.  But one of the things that
> will have to happen first is the update of Protege 4 to the full set
> of OWL 2.0 features.   I don't really want to commit but I am hoping
> that in 3-6 months we will have something.
>
>> 7) Lastly, was there any load testing done on Protégé 4.0 (or 3.4
>> for that matter?)  I'm curious of what magnitude of data can be
>> reasonably supported in client-server mode.
>
>
> This question probably needs some refining.  In standalone mode, there
> has been quite a bit of load testing.  Tania recently loaded SNOMED
> into a database and Matthew has done several experiments with the owl
> api.  The owl api is much more lightweight than Protege 3.4 and
> Matthew has achieved some impressive results.  I don't have
> architecture/timing pairs handy but I wouldn't be surprised if these
> can easily be found on the internet (at least for the owl api).
>
> For the Protege 4 client server, this probably will give an accurate
> picture of the limits because the client will  be required to do the
> parsing and navigation of the ontology.  Even for really large
> ontologies, the changes being transferred between clients will not
> grow so the network will not be  overloaded.  In Protege 3.4 with its
> dependence on caches and interaction with the server for each call,
> the situation is more complex.  We have a user base that edits the NCI
> thesaurus (a moderately  large ontology)  using the client-server on a
> regular basis.
>
> -Timothy
>
>
> [1] http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-432/owled2008eu_submission_33.pdf
> [2] https://bmir-gforge.stanford.edu/gf/project/owleditor/wiki/?pagename=ClientServerDesignIdea
>
>
> On Apr 23, 2009, at 6:48 AM, Miceli, Gino (FOEL) wrote:
>
>> Dear Protégé Team,
>>
>> First of all, I would like to compliment you on a fine job of
>> Protégé so far.  I've been evaluating it to model and maintain some
>> of our knowledge bases, and the tools provided seem quite powerful
>> and well thought out.
>>
>> I am currently testing against 3.4 since I hope to use Protégé as a
>> backend for several web-based applications.  Ideally, I would have a
>> frontend which would present results on the web, and an
>> administrative backend for data entry.  I have noticed several
>> limitations with 3.4 in these areas, and was wondering what was
>> slated for 4.0.  In particular:
>>
>> 1) Will concurrent transactions be supported?  Do I understand
>> correctly that in 3.4 they are synchronized/blocking operations?
>> 2) Are there plans to better isolate the GUI, server and client
>> components of Protégé?
>> 3) In 3.4, the project and remote project manager are singletons.
>> This means a client can only connect to one server project at a
>> time.  Will this limitation be removed in 4.0?  One problem that I
>> think may be related to this is that in 3.4 transactions in a
>> particular VM are not fully isolated.  In particular, if I create
>> two sessions in the same application, begin a transaction in model
>> A, and create a new individual w/o committing, model B sees the new
>> individual (but the properties are null until A commits).  If A and
>> B are created in different classloaders/VMs, B does not see the new
>> individual until A commits.
>> 4) To what degree will the 4.0 API be compatible with the 3.4 one?
>> In other words, if I code to 3.4 for now, how complex will the
>> migration path to 4.0 be?
>> 5) Will the client-server architecture be the same as 3.4? (i.e.
>> DB/File<-->Server<-(rmi)->Client<-->App)?
>> 6) When will the above features be released?
>> 7) Lastly, was there any load testing done on Protégé 4.0 (or 3.4
>> for that matter?)  I'm curious of what magnitude of data can be
>> reasonably supported in client-server mode.
>>
>> Sorry for the flood of questions, but our group must decide in the
>> coming days if we can use Protégé to model our knowledge, and these
>> answers may quell concerns and allow us to go forward.
>>
>> Again, thanks for all your efforts and I look forward to learning
>> more soon!
>>
>> Best regards,
>>
>> --------------------------------------------
>> Gino Miceli
>> System Development Specialist
>> Food and Agriculture Organization of the UN
>> Forest Communication Service (FOEL)
>>
>> _______________________________________________
>> p4-feedback mailing list
>> p4-feedback at lists.stanford.edu
>> https://mailman.stanford.edu/mailman/listinfo/p4-feedback
>



