Phone call minutes

J Clark jclark at ebi.ac.uk
Mon Apr 2 03:42:35 PDT 2007


Hi Harold,

Thanks for the great minutes. I have added in the part where I took 
minutes and pasted below. Would you like to put the whole lot up on the 
wiki?

Thanks,

Jen

Minutes for GO Outreach call of 3/30/07

Rama Balakrishnan
Evelyn Camon
Jennifer Clark
Harold Drabkin*
Michelle Gwinn
Pascale Gaudet
Fiona McCarthy

We attempted to finalize the IEA flowchart(s). The basic idea is that 
how one obtains IEA annotations depends on many factors, including 
whether one can make use of UniProt, whether the sequences were 
submitted to EMBL/GenBank, etc.

How to obtain records for one taxon ID from UniProt?
Sending everything “through” UniProt has limitations. UniProt does not 
have everything. For example, the farm animals grouped in UniParc do 
not have UniProt IDs, but IPI accession IDs.
UniProt does not have a great deal of prokaryotic products. TIGR may be 
a better source for comparisons.

Things to emphasize
InterPro domain to GO mappings are meant to be broad. ISS via BLAST 
helps you get more specific.

HAMAP to GO : a more manual GO mapping

New in UniProt: SPCL to GO (subcellular localization to GO)

Harold attempted to clarify the MGI IEA flowchart

===================

Every night we download all the UniProt records for the mouse taxon 
ID. Each UniProt record has a section that lists EMBL and GenBank 
records (the nucleic acid versions).
If any of these IDs match any gene in the MGI database, then we keep 
that record. This record is attached to the marker. That means that we 
load the Swiss-Prot IDs, and the two accessions are linked in a 
relational database.
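
The matching step above can be sketched roughly as follows. This is 
only an illustration, not MGI's actual load code: the function name, the 
data shapes, and all IDs are hypothetical placeholders.

```python
from collections import defaultdict


def link_records_to_markers(uniprot_xrefs, marker_seq_ids):
    """Attach UniProt records to markers via shared nucleotide accessions.

    uniprot_xrefs:  dict of UniProt accession -> set of EMBL/GenBank
                    accessions listed in that record (assumed shape)
    marker_seq_ids: dict of marker ID -> set of nucleotide accessions
                    already associated with that marker (assumed shape)
    """
    # Invert the marker table so each nucleotide accession can be looked up.
    acc_to_markers = defaultdict(set)
    for marker_id, accessions in marker_seq_ids.items():
        for acc in accessions:
            acc_to_markers[acc].add(marker_id)

    # Keep a UniProt record only if one of its cross-references matches.
    links = defaultdict(set)
    for uniprot_acc, xrefs in uniprot_xrefs.items():
        for xref in xrefs:
            for marker_id in acc_to_markers.get(xref, ()):
                links[marker_id].add(uniprot_acc)
    return dict(links)


# Hypothetical example data; none of these IDs are real.
uniprot_xrefs = {
    "P_EXAMPLE_1": {"NUC_A", "NUC_B"},
    "P_EXAMPLE_2": {"NUC_Z"},  # no marker has NUC_Z, so this is dropped
}
marker_seq_ids = {"MARKER_1": {"NUC_A"}, "MARKER_2": {"NUC_C"}}

links = link_records_to_markers(uniprot_xrefs, marker_seq_ids)
print(links)
```

Records with no matching nucleotide accession simply never appear in 
the result, which mirrors the "keep that record" rule in the minutes.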

Each record also contains keywords. We don't load the keywords, but we 
map them to GO in house. The keyword2go mapping could be used, but MGI 
makes its own as it has particular needs.
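
A minimal sketch of a keyword-to-GO translation of this kind, assuming 
the translation table is held as a simple dictionary. The function name 
and the optional exclusion set are assumptions made for illustration; 
the two table entries are only a small sample, not MGI's table.

```python
def keywords_to_go(record_keywords, keyword2go, excluded_go_ids=frozenset()):
    """Translate a record's keywords to GO IDs by plain table lookup.

    No inspection is done: every keyword found in the table contributes
    its GO IDs, unless the GO ID is in the (optional) exclusion set.
    """
    go_ids = set()
    for keyword in record_keywords:
        for go_id in keyword2go.get(keyword, ()):
            if go_id not in excluded_go_ids:
                go_ids.add(go_id)
    return sorted(go_ids)


# Small illustrative translation table (keyword -> list of GO IDs).
keyword2go = {
    "ATP-binding": ["GO:0005524"],  # ATP binding
    "Kinase": ["GO:0016301"],       # kinase activity
}

annotations = keywords_to_go(["ATP-binding", "Kinase", "Unmapped"], keyword2go)
print(annotations)
```

Keywords that are absent from the table are skipped silently, and the 
exclusion set is one possible way to drop terms that are judged too 
broad to be useful.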

We load the EC numbers and the domains into the database where they are 
known to apply to a given gene product.
(Unless it is a TrEMBL record, in which case we'd only load an EC 
number.) Not all domains are taken, as we'd get odd results from a 
partially curated record,
e.g. for the s6kinase domain there were thousands of them.

Every night we load GO also.

Some very broad mapping terms may be filtered, e.g. enzyme.
=============

There were questions about the term “marker”; it could be changed to 
gene. Again, this is specific to MGI (and maybe others) because we have 
many seq_ids (nucleic acid and protein) that are collected under the 
thing we call Marker. (Originally, MGI was a chromosome mapping db, so 
marker was appropriate: something one followed in crosses.) Evelyn 
indicated that there will be/are now manual rules applied to 
translating the keywords to GO terms, and that perhaps the rules should 
be shared. However, MGI strictly takes the keywords in the record and 
takes the GO term that each keyword maps to in the translation table 
with NO inspection (done nightly by HAL2000).

Think of ISS or IEA as a “suggestion”.
When using ISS, consider the context within the organism when deciding 
whether to take the GO terms of the organism that the ISS points to.

Michelle: TIGR does not make much use of the UniProt resource, but uses 
PROSITE, Pfam, and TIGR2GO mappings.

If one wanted to do the IEA on one's own, say for sequence that is 
nowhere else other than at the user's site, then it would appear that 
the best approach would be a domain scan using any of several tools, 
and then using the InterPro2GO (IP2GO) mappings.
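
As a rough sketch of that last step, the following assumes the mapping 
file uses the usual external2go line layout 
("InterPro:IPRnnnnnn name > GO:term name ; GO:nnnnnnn", with comment 
lines starting with "!") and that the domain scan has already produced a 
list of InterPro IDs per sequence. The function names, the sample lines, 
and the sequence ID are illustrative assumptions.

```python
def parse_external2go(lines):
    """Parse external2go-style lines into {external ID: [GO IDs]}."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and header comments
        left, arrow, right = line.partition(" > ")
        _, semicolon, go_id = right.rpartition(";")
        if not arrow or not semicolon:
            continue  # not a mapping line in the assumed layout
        # "InterPro:IPR000003 Some name" -> "IPR000003"
        _, _, ext_id = left.split()[0].partition(":")
        mapping.setdefault(ext_id, []).append(go_id.strip())
    return mapping


def annotate_from_domains(domain_hits, ip2go):
    """Turn per-sequence InterPro hits into candidate IEA GO IDs."""
    result = {}
    for seq_id, interpro_ids in domain_hits.items():
        go_ids = {go for ipr in interpro_ids for go in ip2go.get(ipr, ())}
        result[seq_id] = sorted(go_ids)
    return result


# Illustrative sample lines in the assumed layout.
sample_lines = [
    "!illustrative sample, not a real release file",
    "InterPro:IPR000003 Retinoid X receptor > GO:DNA binding ; GO:0003677",
    "InterPro:IPR000003 Retinoid X receptor > GO:nucleus ; GO:0005634",
]
ip2go = parse_external2go(sample_lines)

# Hypothetical domain-scan output: sequence ID -> InterPro IDs found.
hits = {"my_seq_1": ["IPR000003", "IPR999999"]}
candidates = annotate_from_domains(hits, ip2go)
print(candidates)
```

The resulting GO IDs are best treated as suggestions carrying the IEA 
evidence code, in line with the remarks above about how broad the 
domain mappings are.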


==============================================

Harold Drabkin wrote:

> Here is what I have so far, based on my scribbles. If you can remember 
> anything please add.
> 
> Harold

-- 
Gene Ontology Consortium
EMBL-European Bioinformatics Institute



