Phone call minutes
jclark at ebi.ac.uk
Mon Apr 2 03:42:35 PDT 2007
Thanks for the great minutes. I have added in the part where I took
minutes and pasted below. Would you like to put the whole lot up on the
Minutes for GO Outreach call of 3/30/07
We attempted to come to a finished product for the IEA flowchart(s). A
basic idea is that how one gets IEA annotations depends upon many
factors, including whether one can make use of UniProt, where the
sequences were submitted (EMBL/GenBank, etc.).
How to obtain records for one taxon ID from UniProt?
Sending everything “through” UniProt has limitations. UniProt does not
have everything. For example, the farm animals grouped in UniParc do not
have UniProt IDs, but IPI accession IDs.
UniProt does not have a great deal of prokaryotic products. TIGR may be
a better source for comparisons.
Things to emphasize
InterPro domain to GO mappings are meant to be broad. ISS via BLAST
helps you get more specific.
HAMAP to GO : a more manual GO mapping
New in UniProt: SPCL to GO (subcellular localization to GO)
Harold attempted to clarify the MGI IEA flow chart
During the night we download all the UniProt records for the mouse taxon
ID. Each UniProt record has a section that lists EMBL and GenBank
records (nucleic acid version).
If any of those IDs match any gene in the MGI database then we keep that
record. This record is attached to the marker. That means that we load
the Swiss-Prot IDs, and the two accessions are linked in a relational
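The matching step described above can be sketched roughly as follows. This is only an illustration, not MGI's actual loader: the function name, the dictionary layout, and all accession values are made up for the example, and it assumes the UniProt cross-references and the marker sequence IDs have already been parsed into plain Python sets.

```python
from collections import defaultdict


def link_records_to_markers(uniprot_xrefs, marker_seq_ids):
    """Attach UniProt records to markers via shared nucleotide accessions.

    uniprot_xrefs:  dict of UniProt accession -> set of EMBL/GenBank accessions
    marker_seq_ids: dict of marker ID -> set of nucleotide accessions
    Returns a dict of marker ID -> sorted list of UniProt accessions.
    (Hypothetical layout; a real loader would write to database tables.)
    """
    # Invert the marker table so each nucleotide accession points at its marker.
    acc_to_marker = {}
    for marker_id, accessions in marker_seq_ids.items():
        for acc in accessions:
            acc_to_marker[acc] = marker_id

    links = defaultdict(set)
    for uniprot_acc, xrefs in uniprot_xrefs.items():
        for acc in xrefs:
            marker_id = acc_to_marker.get(acc)
            if marker_id is not None:
                # Keep the record: it shares an accession with a known marker.
                links[marker_id].add(uniprot_acc)
    return {marker_id: sorted(accs) for marker_id, accs in links.items()}


# Made-up identifiers, for illustration only.
example_xrefs = {
    "P00001": {"AB000001", "AB000002"},
    "P00002": {"AB000099"},          # no marker carries this accession
}
example_markers = {"MGI:0000001": {"AB000001"}}
print(link_records_to_markers(example_xrefs, example_markers))
```

Records whose cross-references match no marker are simply dropped, which mirrors the "keep that record only if an ID matches" rule in the minutes.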
Each record also contains keywords. We don't load the keywords but we map
them to GO in house. The keyword2go mapping could be used but MGI makes
its own as it has particular needs.
We load the EC numbers and the domains into the database where they are
known to apply to a given gene product.
(Unless it is a TrEMBL record, in which case we'd only do an EC number.) Not
all domains are taken as they'd get odd results from a partially curated
e.g. s6kinase domain there were thousands of them.
Every night we load GO also.
Some very broad mapping terms may be filtered e.g. enzyme.
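A minimal sketch of the keyword translation with a filter for overly broad terms might look like this. It assumes a keyword-to-GO table already loaded as a dictionary; the function name, the table contents, and the choice of which GO IDs count as "too broad" are illustrative assumptions, not MGI's in-house mapping.

```python
def keywords_to_go(record_keywords, keyword2go, too_broad=frozenset()):
    """Translate a record's keywords to GO IDs with no manual inspection.

    record_keywords: iterable of keyword strings from one UniProt record
    keyword2go:      dict of keyword -> iterable of GO IDs (hypothetical table)
    too_broad:       GO IDs to drop because they are too general to be useful
    Returns a sorted list of (GO ID, evidence code) pairs.
    """
    go_ids = set()
    for keyword in record_keywords:
        # Keywords without an entry in the table are silently skipped.
        for go_id in keyword2go.get(keyword, ()):
            if go_id not in too_broad:
                go_ids.add(go_id)
    # Everything produced this way is an electronic annotation.
    return [(go_id, "IEA") for go_id in sorted(go_ids)]


# Illustrative table and filter; not the real keyword2go file.
example_table = {
    "Kinase": ["GO:0016301"],       # kinase activity
    "Nucleus": ["GO:0005634"],      # nucleus
    "Enzyme": ["GO:0003824"],       # catalytic activity (very broad)
}
example_filter = frozenset({"GO:0003824"})
print(keywords_to_go(["Kinase", "Enzyme", "Unmapped"], example_table, example_filter))
```

Filtering on the GO ID rather than on the keyword means a broad term is dropped no matter which keyword leads to it.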
Questions about “marker”; could change to gene; again specific to MGI
(and maybe others) because we have many seq_ids (nucleic acid and
protein) that are collected under the thing we call Marker.
(Originally, MGI was a chromosome mapping db, so marker was appropriate:
something one followed in crosses). Evelyn indicated that there will
be/are now manual rules applied to translating the keywords to GO
terms, and that perhaps the rules should be shared. However, MGI
strictly takes the keywords in the record and takes the GO term that the
keyword maps to in the translation table with NO inspection (done
nightly by HAL2000).
Think of ISS or IEA as a “suggestion”.
When using ISS, note the context within an organism when deciding whether
to take the GO terms of the organism that the ISS points to.
Michelle: TIGR does not make much use of the UniProt resource, but uses
PROSITE, Pfam, and TIGRE2GO mappings.
If one wanted to do the IEA on one's own, say for sequence that is
nowhere else other than at the user's site, then it would appear that the
best approach would be a domain scan using any of several tools, and
then using the IP2GO mappings.
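As a rough sketch of that do-it-yourself route: once a domain scan has produced InterPro IDs per sequence, the IP2GO mapping can be applied with a few lines of code. The line layout assumed here ("InterPro:IPRnnnnnn name > GO:term name ; GO:nnnnnnn", with "!" starting a comment line) should be checked against the mapping file actually downloaded; the function names, sequence IDs, and sample lines are made up for the example.

```python
from collections import defaultdict


def parse_ip2go(lines, prefix="InterPro:"):
    """Parse mapping lines into a dict of InterPro ID -> set of GO IDs.

    Assumed line layout (verify against the real file):
        InterPro:IPR000003 Some domain name > GO:some term name ; GO:0003677
    """
    mapping = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and header comments
        left, sep, right = line.partition(" > ")
        if not sep or not left.startswith(prefix):
            continue
        interpro_id = left.split()[0][len(prefix):]
        _, sep, go_id = right.rpartition(" ; ")
        if sep and go_id.startswith("GO:"):
            mapping[interpro_id].add(go_id)
    return dict(mapping)


def annotate_from_domains(domain_hits, ip2go):
    """Turn domain-scan hits into (sequence ID, GO ID, 'IEA') tuples.

    domain_hits: dict of sequence ID -> iterable of InterPro IDs
                 (the output of whatever scanning tool was used).
    """
    annotations = []
    for seq_id in sorted(domain_hits):
        go_ids = set()
        for interpro_id in domain_hits[seq_id]:
            go_ids |= ip2go.get(interpro_id, set())
        annotations.extend((seq_id, go_id, "IEA") for go_id in sorted(go_ids))
    return annotations


# Illustrative mapping lines and hits only.
sample_lines = [
    "! example header comment",
    "InterPro:IPR000719 Protein kinase domain > GO:protein kinase activity ; GO:0004672",
    "InterPro:IPR000719 Protein kinase domain > GO:ATP binding ; GO:0005524",
]
ip2go = parse_ip2go(sample_lines)
print(annotate_from_domains({"local_seq_1": ["IPR000719", "IPR999999"]}, ip2go))
```

Domains with no mapping entry contribute nothing, so a sequence may come back with no suggested terms at all; as noted above, the results are best treated as suggestions.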
Harold Drabkin wrote:
> Here is what I have so far, based on my scribbles. If you can remember
> anything please add.
Gene Ontology Consortium
EMBL-European Bioinformatics Institute