GK release version 7.
birney at ebi.ac.uk
Thu Dec 18 06:11:11 PST 2003
Dear Go Friends;
I'd like to update you on the progress of GK (genomeknowledgebase; a
rather content-free name; we really are "human pathwaybase" but that's a
little long-winded).
GK is a rich data model that models a "reaction network" view of
molecular biology, with an equal stress on molecular biology pathways
(eg, signalling pathways, molecular assemblies) and metabolic pathways.
For example, here is an inner page describing the cyclin E related events
in the cell cycle
GK version 7 has 661 referenced human proteins which participate in 524
reactions, organised into 62 pathways. Some of these pathways are small
(eg, glucose uptake); but there are over 20 large pathways, for example
Cell Cycle, mRNA processing and nucleotide metabolism.
We have also extended GK to cover the other species in Ensembl using
ortholog tables generated by Ensembl. This gives us a total of 1,806
proteins that we have implicated in reactions (we clearly distinguish
computationally inferred events from curated events).
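As an illustration only, ortholog-based projection of a curated event onto
another species might look like the sketch below. The Event class, the
project_event function and the rule that every participant needs an
ortholog are assumptions made for this example, not a description of GK's
actual pipeline. The point it shows is that a projected event carries an
explicit "inferred" flag, so it can never be confused with a curated one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    # Hypothetical, simplified stand-in for a GK reaction record.
    name: str
    species: str
    proteins: List[str]
    inferred: bool = False  # False = curated, True = computationally inferred


def project_event(event: Event, target_species: str,
                  orthologs: Dict[str, str]) -> Optional[Event]:
    """Map a curated event onto another species via an ortholog table.

    Assumption for this sketch: the event is projected only when every
    participating protein has an ortholog in the target species.
    """
    mapped = [orthologs.get(protein) for protein in event.proteins]
    if any(protein is None for protein in mapped):
        return None
    # The projected copy is always marked as inferred, never as curated.
    return Event(event.name, target_species, mapped, inferred=True)
```

The ortholog table here is a plain dictionary from a source-species protein
identifier to a target-species identifier; a real table from Ensembl would
need handling for one-to-many relationships, which this sketch ignores.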
GK's data model has a particular emphasis on the actual molecular
assemblies that are known to participate in reactions. We have 371
different complex descriptions. The ability to track the different
molecular assemblies allows us to unambiguously distinguish complexes (eg,
pyruvate dehydrogenase) from isozymes (eg, the different hexokinases) and
allows clear assignment of molecular function to active complexes.
GK makes explicit use of GO in the assignment of molecular function to
physical entities and the assignment of compartments to reactions. There
is also coordination of "GK pathways" and GO "biological process" but,
like many people in the GO consortium, we find that "biological process"
is by far the most complex GO ontology to apply precisely.
GK's data input is a controlled process of curators interacting with
experts. The experts outline the "core" knowledge of pathways (we try to
ask experts to stay about 1 year behind the cutting edge, to ensure we are
storing predominantly correct and confident knowledge) which is then
processed into the database. All reactions are either associated directly
with a PubMedID as evidence or are labelled as "inferred" from a reaction
in another species, which also has to be described in full and referenced
to a PubMedID. This allows a clear traceable route for the evidence of a
reaction, mimicking common molecular biology practice to have a
"patchwork" of experiments on different species contributing a fuller
understanding of the biology. We *always* keep different species data
separate (ie, for one reaction, all participants must have the same
species).
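The evidence rules described above can be written down as a small check.
This is a minimal sketch under stated assumptions: the Reaction class and
its field names are invented for the example and are not GK's schema. It
tests that all participants share the reaction's species, and that the
reaction either cites a PubMed ID directly or is inferred from a reaction
in another species that itself cites one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Reaction:
    # All field names are hypothetical, chosen for this sketch only.
    reaction_id: str
    species: str
    participant_species: List[str]
    pubmed_id: Optional[int] = None      # direct literature evidence
    inferred_from: Optional[str] = None  # id of a reaction in another species


def evidence_problems(reaction: Reaction,
                      by_id: Dict[str, Reaction]) -> List[str]:
    """Return the rule violations found; an empty list means none."""
    problems = []

    # Rule 1: species are never mixed within one reaction.
    if any(s != reaction.species for s in reaction.participant_species):
        problems.append("participants from more than one species")

    # Rule 2: direct PubMed evidence, or inference from a referenced
    # reaction in a different species.
    if reaction.pubmed_id is not None:
        return problems
    source = by_id.get(reaction.inferred_from) if reaction.inferred_from else None
    if source is None:
        problems.append("no PubMed ID and no source reaction")
    elif source.species == reaction.species:
        problems.append("source reaction is in the same species")
    elif source.pubmed_id is None:
        problems.append("source reaction has no PubMed ID")
    return problems
```

Returning a list of problems rather than a single boolean is a design
choice for the sketch: it lets a curator see every violated rule at once.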
GK's data model is similar in many respects to Cyc's (though there are
some differences), with an explicit data model for everything that we
store. Crucially, we tie every protein back to Swissprot/sptrembl (now
Uniprot) accession numbers to provide clear identification of proteins.
We use the Protege data modelling tool for both schema design and data
input (we are working on more focused ways for data input in the future).
The entire GK project can be downloaded as a Protege project from
In particular, the Protege .pins file is a plain text readable file which
can be relatively easily parsed; eg, this describes the concrete complex
of phospho-SHC with the activated insulin receptor:
([GK_74685] of ConcreteComplex
(name "phospho-SHC: activated insulin receptor")
(_displayName "phospho-SHC: activated insulin receptor [integral to plasma membrane]")
We have successfully transformed this information into (for example) a
prolog predicate structure to verify aspects of the data model, and we'd
be happy to share our processing infrastructure.
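To give a feel for how little code such a transformation needs, here is a
minimal sketch in Python that reads one instance in the form shown above
and prints Prolog-style facts. It is not the project's own processing
code. The predicate names instance/2 and slot/3 are made up for the
example, the parser only understands slots with a single quoted string
value, and the sample text closes the instance after the two slots quoted
in this message, although a real instance may carry more.

```python
import re
from typing import Dict, List, Tuple

# Matches "([GK_74685] of ConcreteComplex" at the start of an instance.
HEADER = re.compile(r'\(\[(?P<id>[^\]]+)\]\s+of\s+(?P<cls>\S+)')
# Matches simple slots of the form (slotName "quoted value").
SLOT = re.compile(r'\((?P<slot>\w+)\s+"(?P<value>[^"]*)"\)')

SAMPLE = '''([GK_74685] of ConcreteComplex
(name "phospho-SHC: activated insulin receptor")
(_displayName "phospho-SHC: activated insulin receptor [integral to plasma membrane]"))'''


def parse_instance(text: str) -> Tuple[str, str, Dict[str, str]]:
    """Return (instance id, class name, slot values) for one instance."""
    header = HEADER.search(text)
    if header is None:
        raise ValueError("no instance header found")
    slots = {m.group("slot"): m.group("value")
             for m in SLOT.finditer(text, header.end())}
    return header.group("id"), header.group("cls"), slots


def quote_atom(value: str) -> str:
    """Quote a string as a Prolog atom (backslashes and quotes doubled)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def to_prolog(instance_id: str, cls: str, slots: Dict[str, str]) -> List[str]:
    """Emit one instance/2 fact plus one slot/3 fact per slot."""
    facts = [f"instance({quote_atom(instance_id)}, {quote_atom(cls)})."]
    for slot, value in slots.items():
        facts.append(f"slot({quote_atom(instance_id)}, "
                     f"{quote_atom(slot)}, {quote_atom(value)}).")
    return facts


if __name__ == "__main__":
    for fact in to_prolog(*parse_instance(SAMPLE)):
        print(fact)
```

Every name is emitted as a quoted atom because slot names such as
_displayName begin with an underscore and would otherwise be read by
Prolog as variables.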
The genomeknowledgebase web site runs off a MySQL, Perl, Apache architecture
and is relatively easy to run in-house.
We have a large number of projects in the process of being entered,
including Apoptosis, downstream events of Insulin signalling and Xenobiotic
We welcome all collaborations and contributions; in particular
we are interested in:
(a) Biologist experts who would like to contribute a pathway
(b) Other pathway databases who would like to share data with us
(c) Bioinformaticians who would like to leverage GK
(d) Model organism databases who might be interested in using the GK
framework of the web site, Protege tools and our editing protocols to
enter pathways for model organisms.
GK as a whole can be reached at
gkb-dev at genomeknowledge.org
You can also contact the two PIs, myself (birney at ebi.ac.uk) and Lincoln
Stein (lstein at cshl.org) directly.
We hope GK is of use to many people and look forward to contributing more
to the GO community.
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney at ebi.ac.uk>.
This message is from the GOFriends moderated mailing list. A list of public
announcements and discussion of the Gene Ontology (GO) project.
Problems with the list? E-mail: owner-gofriends at geneontology.org
Subscribing: send "subscribe" to gofriends-request at geneontology.org
Unsubscribing: send "unsubscribe" to gofriends-request at geneontology.org