GK release version 7.

Ewan Birney birney at
Thu Dec 18 06:11:11 PST 2003

Dear GO Friends,

I'd like to update you on the progress of GK (genomeknowledgebase, a
rather content-free name; we really are a "human pathwaybase", but that's
a little long-winded).

GK is a rich data model of a "reaction network" view of molecular
biology, with equal emphasis on molecular biology pathways (eg, signalling
pathways, molecular assemblies) and metabolic pathways (eg, glycolysis).

For example, here is an inner page describing the cyclin E-related events
in the cell cycle:


GK version 7 has 661 referenced human proteins which participate in 524
reactions, organised into 62 pathways. Some of these pathways are small
(eg, glucose uptake), but there are over 20 large pathways, for example
Cell Cycle, mRNA processing and nucleotide metabolism.

We have also extended GK to cover the other species in Ensembl using
ortholog tables generated by Ensembl. This gives us a total of 1,806
proteins that we have implicated in reactions (we clearly distinguish
computationally inferred events from curated events).
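To illustrate how such a projection can work, here is a minimal Python
sketch. It is not GK's code: it assumes the ortholog table is a plain
dictionary from source accession to target accession, and `Reaction`
and `project_reaction` are hypothetical names. A curated reaction is
copied into the target species only when every participant has an
ortholog (an assumed conservative rule), and the copy is marked as
inferred so it stays distinguishable from curated events.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Reaction:
    """Hypothetical reaction record: name, species and protein participants."""
    name: str
    species: str
    participants: Tuple[str, ...]
    evidence: str = "curated"            # "curated" or "inferred"
    inferred_from: Optional[str] = None  # name of the source reaction, if inferred


def project_reaction(reaction: Reaction, orthologs: Dict[str, str],
                     target_species: str) -> Optional[Reaction]:
    """Project a curated reaction onto target_species via an ortholog table.

    Returns None if any participant has no ortholog, so no partial
    reactions are created (an assumed rule, chosen for illustration).
    """
    mapped = []
    for protein in reaction.participants:
        if protein not in orthologs:
            return None
        mapped.append(orthologs[protein])
    return Reaction(
        name=reaction.name,
        species=target_species,
        participants=tuple(mapped),
        evidence="inferred",
        inferred_from=reaction.name,
    )


# Usage with placeholder accessions (not real identifiers):
human = Reaction("example association", "Homo sapiens", ("HS_A", "HS_B"))
mouse = project_reaction(human, {"HS_A": "MM_A", "HS_B": "MM_B"}, "Mus musculus")
print(mouse)
```

Returning None rather than a partially mapped reaction keeps the
"one species per reaction, all participants present" property intact.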

GK's data model has a particular emphasis on the actual molecular
assemblies that are known to participate in reactions. We have 371
different complex descriptions. The ability to track the different
molecular assemblies allows us to unambiguously distinguish complexes (eg,
pyruvate dehydrogenase) from isozymes (eg, the different hexokinases) and
allows clear assignment of molecular function to active complexes.
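The complex/isozyme distinction can be pictured as two different kinds
of entity: a complex needs all of its components present together,
while a set of isozymes is satisfied by any one member. The Python
sketch below is a simplified, hypothetical model (`Complex`,
`IsozymeSet` and `is_available` are illustrative names, not GK's
schema) showing why the two must not be conflated.

```python
from dataclasses import dataclass
from typing import AbstractSet, Tuple, Union


@dataclass(frozen=True)
class Protein:
    accession: str


@dataclass(frozen=True)
class Complex:
    """A defined assembly: every component must be present together."""
    name: str
    components: Tuple["Entity", ...]


@dataclass(frozen=True)
class IsozymeSet:
    """Interchangeable alternatives: any one member can fill the role."""
    name: str
    members: Tuple["Entity", ...]


Entity = Union[Protein, Complex, IsozymeSet]


def is_available(entity: Entity, present: AbstractSet[str]) -> bool:
    """True if `entity` can be formed from the protein accessions in `present`."""
    if isinstance(entity, Protein):
        return entity.accession in present
    if isinstance(entity, Complex):
        # A complex requires all of its parts.
        return all(is_available(c, present) for c in entity.components)
    if isinstance(entity, IsozymeSet):
        # An isozyme set requires only one of its members.
        return any(is_available(m, present) for m in entity.members)
    raise TypeError(f"unknown entity type: {type(entity).__name__}")
```

Because the two types nest, a complex may contain a set of
alternatives, and a molecular function can be attached to the
assembly that actually carries it.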

GK makes explicit use of GO in the assignment of molecular function to
physical entities and the assignment of compartments to reactions. There
is also coordination between "GK pathways" and GO "biological process",
but, like many people in the GO consortium, we find "biological process"
by far the most complex of the GO ontologies to apply precisely.

GK's data input is a controlled process in which curators interact with
experts. The experts outline the "core" knowledge of a pathway (we try to
ask experts to stay about one year behind the cutting edge, to ensure we
are storing predominantly correct and confident knowledge), which is then
processed into the database. Every reaction is either associated directly
with a PubMedID as evidence or labelled as "inferred" from a reaction in
another species, which must itself be described in full and referenced to
a PubMedID. This gives a clear, traceable route for the evidence of a
reaction, mimicking the common molecular biology practice of building a
fuller understanding of the biology from a "patchwork" of experiments on
different species. We *always* keep data from different species separate
(ie, for one reaction, all participants must come from the same species).
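The evidence rules just described can be expressed as a validation
pass. This is a Python sketch under assumed field names
(`participant_species`, `pubmed_ids`, `inferred_from` are hypothetical,
not GK's schema). It reports reactions whose participants span more than
one species, and reactions that have no PubMedID and are not properly
inferred from a literature-backed reaction in another species.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ReactionRecord:
    """Hypothetical flattened reaction record used only for this check."""
    reaction_id: str
    species: str
    participant_species: List[str]
    pubmed_ids: List[str] = field(default_factory=list)
    inferred_from: Optional[str] = None  # reaction_id of the source reaction


def evidence_problems(reactions: Dict[str, ReactionRecord]) -> List[str]:
    """Return readable problem descriptions; an empty list means all checks pass."""
    problems = []
    for rid, r in reactions.items():
        # Species separation: every participant must match the reaction's species.
        if any(s != r.species for s in r.participant_species):
            problems.append(f"{rid}: participants from more than one species")
        if r.pubmed_ids:
            continue  # directly supported by the literature
        if r.inferred_from is None:
            problems.append(f"{rid}: no PubMed ID and not marked as inferred")
            continue
        source = reactions.get(r.inferred_from)
        if source is None:
            problems.append(f"{rid}: inferred from unknown reaction {r.inferred_from}")
        elif source.species == r.species:
            problems.append(f"{rid}: inferred from a reaction in the same species")
        elif not source.pubmed_ids:
            problems.append(f"{rid}: source reaction {source.reaction_id} has no PubMed ID")
    return problems
```

Each problem names the reaction, so the output can go straight back to
a curator as a to-do list.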

GK's data model is similar in many respects to Cyc's (though there are
some differences), with an explicit data model for everything that we
store. Crucially, we tie every protein back to Swissprot/sptrembl (now
UniProt) accession numbers to provide clear identification of proteins.
We use the Protege data modelling tool for both schema design and data
input (we are working on more focused ways of entering data in the future).
The entire GK project can be downloaded as a Protege project from

In particular, the Protege .pins file is plain, human-readable text that
can be parsed relatively easily; eg, this describes the concrete complex
of phospho-SHC with the activated insulin receptor:

([GK_74685] of ConcreteComplex
        (_Protege_id "GK_74685")
        (name "phospho-SHC: activated insulin receptor")
        (_timestamp "20031209200901")
        (taxon [GK_48887])
        (DB_ID "74685")
        (compartment [GK_4254])
        (_partial "0")
        (_displayName "phospho-SHC: activated insulin receptor [integral to plasma membrane]")
        (created [GK_74677]))
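Because each instance is a parenthesised frame, a small tokenizer goes a
long way. The Python sketch below is written against the single example
above and is an assumption rather than a full grammar of the format: it
drops lines starting with ";" (assumed to be comments), handles quoted
strings without escape sequences, bracketed instance references and bare
symbols, and stores every slot as a list of values. Nested slot values
and other constructs the real files may contain are not handled.

```python
import re

# One token is "(", ")", a double-quoted string, a [reference] or a bare symbol.
TOKEN = re.compile(r'\(|\)|"[^"]*"|\[[^\]]*\]|[^\s()"\[\]]+')


def _value(tok):
    """Convert one value token: strings lose their quotes, [refs] become dicts."""
    if tok.startswith('"'):
        return tok[1:-1]
    if tok.startswith("["):
        return {"ref": tok[1:-1]}
    return tok


def parse_instances(text):
    """Parse frames of the form ([ID] of Class (slot value ...) ...).

    Returns a list of {"id", "class", "slots"} dicts; each slot maps to a
    list of values because a slot may hold several. Malformed input raises
    ValueError or IndexError.
    """
    # Assumption: lines starting with ';' are comments and can be dropped.
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith(";")]
    tokens = TOKEN.findall("\n".join(lines))
    pos, instances = 0, []
    while pos < len(tokens):
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        inst_id, keyword, cls = tokens[pos + 1:pos + 4]
        if keyword != "of":
            raise ValueError(f"expected 'of' after {inst_id}")
        pos += 4
        slots = {}
        while tokens[pos] != ")":          # loop over (slot value ...) groups
            if tokens[pos] != "(":
                raise ValueError(f"expected a slot at token {pos}")
            name, pos = tokens[pos + 1], pos + 2
            values = []
            while tokens[pos] != ")":
                values.append(_value(tokens[pos]))
                pos += 1
            slots[name] = values
            pos += 1                       # skip the slot's ")"
        pos += 1                           # skip the instance's ")"
        instances.append({"id": inst_id.strip("[]"), "class": cls, "slots": slots})
    return instances
```

Keeping references as {"ref": ...} rather than resolving them at parse
time lets a second pass join instances (eg, taxon, compartment) by id.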

We have successfully transformed this information into (for example) a
Prolog predicate structure to verify aspects of the data model, and we'd
be happy to share our processing infrastructure.
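One way to picture such a transformation is to emit one fact per slot
value. In this Python sketch the predicate names `instance/2` and
`slot/3` and the `ref/1` wrapper are assumptions for illustration; the
predicate structure GK actually uses is not described here. The input is
a frame already parsed into a dict of the form
{"id": ..., "class": ..., "slots": {name: [values]}}, where a value is
either a string or {"ref": other_id}.

```python
def prolog_atom(text):
    """Quote text as a Prolog atom, escaping backslashes and single quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_prolog_facts(instance):
    """Emit instance/2 and slot/3 facts (hypothetical predicates) for one frame.

    References become ref(Id) terms so they can be joined against other
    instance/2 facts when checking the data model.
    """
    inst = prolog_atom(instance["id"])
    facts = [f"instance({inst}, {prolog_atom(instance['class'])})."]
    for name, values in instance["slots"].items():
        for v in values:
            term = f"ref({prolog_atom(v['ref'])})" if isinstance(v, dict) else prolog_atom(v)
            facts.append(f"slot({inst}, {prolog_atom(name)}, {term}).")
    return facts
```

With facts in this shape, a model check such as "every taxon slot
points at an existing instance" becomes a short Prolog query.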

The genomeknowledgebase web site runs on a MySQL, Perl and Apache
architecture and is relatively easy to run in-house.

We have a large number of projects in the process of being entered,
including Apoptosis, downstream events of Insulin signalling and Xenobiotic

We welcome all collaborations and contributions; in particular 
we are interested in:

  (a) Biologist experts who would like to contribute a pathway

  (b) Other pathway databases who would like to share data with us

  (c) Bioinformaticians who would like to leverage GK 

  (d) Model organism databases who might be interested in using the GK 
framework of the web site, Protege tools and our editing protocols to 
enter pathways for model organisms.

GK as a whole can be reached at

gkb-dev at

You can also contact the two PIs, myself (birney at and Lincoln 
Stein (lstein at directly.

We hope GK is of use to many people and look forward to contributing more 
to the GO community.



Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney at>. 

This message is from the GOFriends moderated mailing list.  A list of public
announcements and discussion of the Gene Ontology (GO) project.
Problems with the list?           E-mail: owner-gofriends at
Subscribing   send   "subscribe"   to   gofriends-request at
Unsubscribing send   "unsubscribe"  to  gofriends-request at
