GO XML/RDF format and anatomy ontologies

stuart at stuart at
Wed Aug 13 03:46:55 PDT 2003

Continuing the XML vs flat file discussion, the GO XML/RDF format as
used in the termdb.xml (latest version go_200307-termdb.xml) is a
good option for anatomy ontologies: it is common to want to
specialise relations, e.g. to have 'descends' and
'descends-in-male', and defining this is straightforward in RDF. RDF has tool
support, e.g. the HP Jena toolkit, and is actually intended to be an
ontology language. Also, RDF could be used to store all the information
associated with a GO term (although, technically, RDF is rather
unconstrained, hence DAML+OIL and OWL are layered on RDFS, constraining what
you can say).
But in trying to extend the RDF element, I found that the XML go.dtd no longer
defines correct RDF (perhaps the standards have changed). The go.dtd also seems
to be incorrect, so it is not possible to validate the XML either. Some
problems are:
1. the dtd uses go:isa and go:is_a, plus go:part-of and go:part_of 
   which seem like typos. 
2. go:term has the attribute n_associations which produces incorrect RDF.
3. the URI references:
   go:term rdf:about="" 
   are notional, that is, there is no file at this URI.
These problems are easily fixed, allowing validation of
the XML and the use of checking in RDF. Tools like VRP from ICS-FORTH
are able to check for syntax and semantic errors such as loops in the class
hierarchy, loops in the property hierarchy, and domain and range violations,
and to make inferences using external RDF sources.
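
To illustrate the kind of semantic check meant by "loops in the class
hierarchy", here is a minimal Python sketch. It is not VRP's interface or
algorithm; it is a plain depth-first search over a child -> parents mapping,
and the function name and the term labels in the usage note are hypothetical.

```python
def find_cycle(parents):
    """Return one loop in a child -> parents mapping as a list of terms,
    or None if the hierarchy is acyclic.

    `parents` maps each term to the terms it is declared a subclass of.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []  # terms on the current depth-first path

    def visit(node):
        state[node] = IN_PROGRESS
        path.append(node)
        for parent in parents.get(node, ()):
            status = state.get(parent, UNSEEN)
            if status == IN_PROGRESS:
                # parent is already on the current path: a loop
                return path[path.index(parent):] + [parent]
            if status == UNSEEN:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in list(parents):
        if state.get(node, UNSEEN) == UNSEEN:
            found = visit(node)
            if found:
                return found
    return None
```

For example, find_cycle({"a": ["b"], "b": ["a"]}) reports the loop through
"a" and "b", while an acyclic mapping such as {"a": ["b"], "b": []} gives
None. The same function could be applied to a property hierarchy.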

The differences are not great:
 <go:term rdf:ID="&GOQ;0003673">

  <go:term rdf:about="" n_associations="0">

A small GO file in the slightly modified format is here:
The modified dtd plus an RDFS file that defines what the GO vocabulary
means can be seen here:
(Note that is_a and part_of are no longer in the go file;
they are defined in the go.rdfs file.)

RDF, and the RDF Schema that comes with it, can be used to define a
richer ontology for GO. For example, go:isa should really be replaced with
rdfs:subClassOf, lineage and part-of relations should be related by
rdfs:subPropertyOf, and so on. See
However, the syntactic problems mean that the current GO
XML/RDF format files need editing before a valid RDF file can be obtained.
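
As a rough sketch of what relating properties by rdfs:subPropertyOf buys,
the Python below treats RDF statements as plain (subject, predicate, object)
tuples and lets a query on a general relation also match statements made
with a specialised one. The ex: names and the two anatomy terms are
hypothetical, and this is not how any particular RDF toolkit implements
the inference.

```python
SUBPROP = "rdfs:subPropertyOf"

# Hypothetical statements: one schema triple and one data triple.
TRIPLES = [
    ("ex:descends-in-male", SUBPROP, "ex:descends"),
    ("ex:structureA", "ex:descends-in-male", "ex:structureB"),
]

def superproperties(prop, triples):
    """Return prop together with every property it specialises,
    following rdfs:subPropertyOf transitively."""
    seen = {prop}
    stack = [prop]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if p == SUBPROP and s == current and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen

def holds(subject, prop, obj, triples):
    """True if some statement links subject to obj by prop or by any
    property declared (directly or indirectly) a sub-property of prop."""
    return any(
        s == subject and o == obj and prop in superproperties(p, triples)
        for s, p, o in triples
        if p != SUBPROP
    )
```

With these statements, holds("ex:structureA", "ex:descends",
"ex:structureB", TRIPLES) is true even though only the more specific
'descends-in-male' relation was asserted.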

Two issues arise. First, to encourage the use of RDF and semantic analysis,
is it easier to generate a new RDFS termdb file than to correct the current
XML? This depends on the way current tools make use of the XML.
Finally, GO IDs such as GO:0003673 are not valid 'QNames' (qualified
names) in XML, as they contain a colon and colons are reserved as
namespace delimiters,
so completely migrating GO to the current recommended standards for
the 'Semantic Web' is in some conflict with the existing GO naming
convention. A simple solution would be to make the ID part of the term:
 <go:term rdf:ID="Gene_Ontology.0003673">      :the URI ref (a unique Qname)
     <go:id>GO:0003673</go:id>		       :the ID
     <go:name>Gene_Ontology</go:name>	       :the name
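
A minimal Python sketch of this renaming follows. It assumes the rdf:ID is
built as the term name, a dot, then the numeric part of the GO ID (my
reading of the example above). The go: namespace URI is a hypothetical
placeholder, the function names are invented for illustration, and the
name check is an ASCII-only simplification of the XML rule for
colon-free names.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
GO_NS = "http://example.org/go#"  # hypothetical placeholder namespace

# ASCII-only simplification of a colon-free XML name.
COLON_FREE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_colon_free_name(text):
    """True if text is usable as an rdf:ID under the simplified rule."""
    return COLON_FREE_NAME.fullmatch(text) is not None

def make_rdf_id(go_id, name):
    """Build an identifier such as 'Gene_Ontology.0003673'."""
    digits = go_id.split(":", 1)[1]
    safe = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if not re.match(r"[A-Za-z_]", safe):
        safe = "_" + safe  # a name may not start with a digit
    return f"{safe}.{digits}"

def build_term(go_id, name):
    """Return a go:term element carrying the original GO ID as content."""
    term = ET.Element(
        f"{{{GO_NS}}}term",
        {f"{{{RDF_NS}}}ID": make_rdf_id(go_id, name)},
    )
    ET.SubElement(term, f"{{{GO_NS}}}id").text = go_id
    ET.SubElement(term, f"{{{GO_NS}}}name").text = name
    return term
```

Calling build_term("GO:0003673", "Gene_Ontology") gives an element whose
rdf:ID is colon-free while the familiar GO:0003673 survives unchanged in
go:id, so existing tools that key on the GO ID can still find it.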

Neither the XML nor the RDF is easily read, so tools must be used. However,
I really wonder whether flat anatomy files of 6000-7000 lines are easily and
correctly read and modified either.
Thoughts, comments?

Dr Stuart Aitken
Artificial Intelligence Applications Institute
The University of Edinburgh
Appleton Tower, Room 3.08
Crichton St
Edinburgh EH8 9LE
United Kingdom

This message is from the GOFriends moderated mailing list.  A list of public
announcements and discussion of the Gene Ontology (GO) project.
Problems with the list?           E-mail: owner-gofriends at
Subscribing   send   "subscribe"   to   gofriends-request at
Unsubscribing send   "unsubscribe"  to  gofriends-request at
