[protege-owl] Using Krextor for XML->Ontology extraction [Re: xml extraction]

Christoph LANGE ch.lange at jacobs-university.de
Tue Aug 11 16:45:58 PDT 2009


Dear Faraz, dear Jean-Marc, dear all,

  I only noticed this discussion
(http://www.nabble.com/xml-extraction-td24682925.html) later, but as my
Krextor library was mentioned, let me clarify this a bit:

Jean-Marc:
> There are 2 tools that I'm considering :
> - Gloze is part of Jena
> (http://jena.hpl.hp.com/juc2006/proceedings/battle/paper.pdf)
> - Krextor - http://www.semanticscripting.org/SFSW2009/short_2.pdf
> 
> Both use an XML Schema to "RDFify" an XML instance.

As far as I understand Gloze (I read that paper three years ago and have just
skimmed it again, so I might be wrong), it is based on XML Schema, which is a
matter of taste and has both positive and negative aspects: you have to provide
the schema, but once you have it, it makes some tasks easier.

Krextor is technically independent of XML Schema.  OK, conceptually, one
could regard it as an "XML Schema -> Ontology" translator, but what it
technically does is match elements of a concrete XML document (= an instance
of a schema) and generate concrete RDF resources (= instances of an ontology)
from them.
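To make the element-matching idea concrete, here is a minimal Python sketch of a hand-written XML->RDF mapping. It is not Krextor's XSLT API: the input format, the rule table, the `http://example.org/` base URI and the choice of schema.org and Dublin Core terms are all hypothetical, picked only to show that each rule matches a concrete element and emits concrete RDF triples in a vocabulary the mapping author chooses.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"
BASE = "http://example.org/"  # hypothetical base URI for generated resources


def lit(text):
    """Format a plain N-Triples literal, escaping backslashes, quotes and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def book_rule(elem):
    """Hypothetical rule: each <book id="..."> becomes a schema.org Book resource."""
    subject = f"<{BASE}book/{elem.get('id')}>"
    yield (subject, f"<{RDF_TYPE}>", "<https://schema.org/Book>")
    title = elem.findtext("title")
    if title is not None:
        yield (subject, f"<{DCT}title>", lit(title))
    for author in elem.findall("author"):
        yield (subject, f"<{DCT}creator>", lit(author.text or ""))


# Element name -> rule; elements without a rule produce no triples.
RULES = {"book": book_rule}


def extract(xml_text):
    """Walk the document in order and apply the matching rule to each element."""
    root = ET.fromstring(xml_text)
    triples = []
    for elem in root.iter():
        rule = RULES.get(elem.tag)
        if rule is not None:
            triples.extend(rule(elem))
    return triples


def to_ntriples(triples):
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


doc = """<library>
  <book id="b1"><title>First Title</title><author>A. Author</author></book>
  <book id="b2"><title>Second Title</title></book>
</library>"""

print(to_ntriples(extract(doc)))
```

Swapping the rule table for one that targets another ontology changes the output vocabulary without touching the traversal.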

Gloze strictly follows the XML Schema when generating RDF, basically treating
the XML Schema like an ontology.  This means that obtaining RDF takes little
more than running Gloze, but on the other hand, the resulting RDF may not suit
your preferred ontology.  Krextor is harder to use in the sense that you have
to implement your own XML->RDF mapping, but in return it gives you more
flexibility, as you can use any ontology as a vocabulary for the resulting
RDF.  The downside is that there is then no standard, straightforward way of
converting that RDF back into XML.
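The trade-off described above can be illustrated with a toy generic mapping in Python that derives every RDF name mechanically from element names. This is only loosely in the spirit of a schema-driven converter and is not Gloze's actual algorithm; the namespace is a placeholder. Because each predicate is just the element name under a fixed namespace, converting back to XML is mechanical, whereas with a hand-picked vocabulary such as the Dublin Core terms that inverse would have to be written by hand.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NS = "http://example.org/schema#"  # placeholder namespace for schema-derived names


def generic_to_rdf(xml_text):
    """Map a flat record: root tag -> class, each child tag -> property."""
    root = ET.fromstring(xml_text)
    subject = "_:record"  # a single blank node for the whole record
    triples = [(subject, RDF_TYPE, NS + root.tag)]
    for child in root:
        triples.append((subject, NS + child.tag, child.text or ""))
    return triples


def generic_to_xml(triples):
    """Invert the mapping; possible only because names were derived mechanically."""
    class_uri = next(o for _, p, o in triples if p == RDF_TYPE)
    root = ET.Element(class_uri[len(NS):])
    for _, p, o in triples:
        if p != RDF_TYPE and p.startswith(NS):
            ET.SubElement(root, p[len(NS):]).text = o
    return ET.tostring(root, encoding="unicode")


record = "<book><title>First Title</title><year>2009</year></book>"
triples = generic_to_rdf(record)
print(triples)
print(generic_to_xml(triples))
```

The sketch only handles flat records; nested elements would need more structural information, for example from a schema.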

> I had a look at already run examples in Gloze, it looks good :
> https://sourceforge.net/projects/jena/files/

Gloze hasn't been maintained for two years.  Krextor is actively maintained
at the moment, though admittedly by just one and a half people.  See
http://trac.kwarc.info/krextor for detailed project information.

> For Krextor, I'm not sure that there is some reusable library.

Krextor is designed as a reusable library.  There is a shell script frontend
for running it, but that's merely a convenience for quickly testing something.
Krextor is implemented in XSLT 2.0 and can thus be integrated with any
programming language that has access to an XSLT 2.0 processor (OK, there are
not many of those yet).  As I mainly work with the Java-based Saxon XSLT
processor, there is a tight integration with Java, provided by a convenient
Java wrapper around the Krextor core.  The homepage mentioned above contains
a lot of documentation.
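Because Krextor is plain XSLT 2.0, driving it from another language mostly means launching an XSLT 2.0 processor. The following minimal Python sketch assumes Saxon's standard `net.sf.saxon.Transform` command line (`-s:`, `-xsl:`, `-o:` and `name=value` stylesheet parameters). The jar name, the stylesheet path and the parameter name are placeholders, not Krextor's real file layout or options, so the Krextor documentation is the place to look for its actual entry points; from Java, the wrapper mentioned above is the more natural route.

```python
import subprocess


def saxon_command(saxon_jar, source, stylesheet, output, params=None):
    """Build a Saxon command line: java -cp JAR net.sf.saxon.Transform -s: -xsl: -o: [name=value ...]."""
    cmd = ["java", "-cp", saxon_jar, "net.sf.saxon.Transform",
           f"-s:{source}", f"-xsl:{stylesheet}", f"-o:{output}"]
    for name, value in (params or {}).items():
        cmd.append(f"{name}={value}")  # stylesheet parameters, passed as name=value
    return cmd


def run_transform(*args, **kwargs):
    """Run the transformation; raises CalledProcessError if Saxon exits with an error."""
    subprocess.run(saxon_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Placeholder paths: adjust to your Saxon jar and Krextor checkout.
    cmd = saxon_command("saxon9he.jar", "input.xml",
                        "krextor/hypothetical-entry.xsl", "output.rdf")
    print(" ".join(cmd))
```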

Hope that helps,

Christoph

-- 
Christoph Lange, Jacobs Univ. Bremen, http://kwarc.info/clange, Skype duke4701
