
Uniprot file

Daniel Barrell dbarrell at ebi.ac.uk
Wed Jan 28 03:49:10 PST 2004


Hi Linda,

The large number in the UniProt file is because we add annotation to
this file indiscriminately, so there is duplication between automatic
and manual methods (including annotation from external groups with whom
our collaboration is not yet close enough, something we are always
trying to improve).

The species-specific files we provide are filtered to remove IEAs where
a human curator has also annotated the protein. We also only take MGI
annotation if there is none done by GOA/UniProt curators. For example,
compare the annotation in the two files for Q99LW6, YAF2_MOUSE.
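The two per-protein rules just described can be sketched in Python for anyone wanting to apply a similar filter to the full UniProt file. This is a minimal sketch, not the GOA production code: it assumes the 15-column GAF layout (DB_Object_ID in column 2, Evidence in column 7, Assigned_by in column 15), treats every evidence code other than IEA as manual, and uses a hypothetical set GOA_CURATORS for the Assigned_by values of GOA/UniProt curators. The TOY accessions are made up.

```python
# Column indices (0-based) in the 15-column GAF layout.
OBJECT_ID = 1     # DB_Object_ID, e.g. a UniProt accession
EVIDENCE = 6      # evidence code, e.g. IEA, IDA, IMP
ASSIGNED_BY = 14  # group that made the annotation

# Hypothetical Assigned_by values for GOA/UniProt curators; check the real file.
GOA_CURATORS = {"UniProt", "GOA"}


def filter_species_rows(rows):
    """Apply two per-protein rules to parsed GAF rows (lists of columns).

    1. Drop IEA rows for a protein that also has a manual (non-IEA) row.
    2. Drop MGI rows for a protein that GOA/UniProt curators annotated manually.
    """
    has_manual = set()
    has_goa_manual = set()
    for cols in rows:
        if cols[EVIDENCE] != "IEA":
            has_manual.add(cols[OBJECT_ID])
            if cols[ASSIGNED_BY] in GOA_CURATORS:
                has_goa_manual.add(cols[OBJECT_ID])

    kept = []
    for cols in rows:
        acc = cols[OBJECT_ID]
        if cols[EVIDENCE] == "IEA" and acc in has_manual:
            continue  # rule 1: a human annotation exists for this protein
        if cols[ASSIGNED_BY] == "MGI" and acc in has_goa_manual:
            continue  # rule 2: GOA/UniProt curators already annotated it
        kept.append(cols)
    return kept


def make_row(acc, go_id, evidence, assigned_by):
    """Build a toy 15-column GAF row with the unused columns left empty."""
    cols = [""] * 15
    cols[0], cols[OBJECT_ID], cols[4] = "UniProt", acc, go_id
    cols[EVIDENCE], cols[ASSIGNED_BY] = evidence, assigned_by
    return cols


TOY_ROWS = [
    make_row("TOY1", "GO:0005634", "IEA", "UniProt"),
    make_row("TOY1", "GO:0005634", "IDA", "UniProt"),
    make_row("TOY1", "GO:0003677", "IDA", "MGI"),
    make_row("TOY2", "GO:0005737", "IEA", "UniProt"),
    make_row("TOY2", "GO:0005737", "IMP", "MGI"),
    make_row("TOY3", "GO:0008150", "IEA", "UniProt"),
]

if __name__ == "__main__":
    for cols in filter_species_rows(TOY_ROWS):
        print(cols[OBJECT_ID], cols[EVIDENCE], cols[ASSIGNED_BY])
```

On the toy rows, TOY1 keeps only its UniProt IDA row, TOY2 keeps its MGI IMP row (no GOA curator annotation), and TOY3 keeps its IEA row because nothing manual exists for it.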

However, I have looked at the statistics for the UniProt file and found
that the script generating them is slightly incorrect. For example, it
has counted every UniProt accession whose entry contains 'MGI' anywhere,
not only those with MGI in the 'assigned_by' column. I will get this
fixed for the next release; apologies.
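The exact workings of the GOA statistics script are not shown here, so the following is only an illustration of the difference Daniel describes: counting proteins whose line mentions 'MGI' anywhere versus counting by the Assigned_by column. It assumes the 15-column, tab-separated GAF layout, and that an MGI identifier can turn up in another column such as With/From. The toy lines and the MGI:MGI:0000000 identifier are made up.

```python
from collections import defaultdict

OBJECT_ID = 1     # DB_Object_ID (UniProt accession), 0-based
WITH_FROM = 7     # With/From column, which may hold MGI identifiers
ASSIGNED_BY = 14  # Assigned_by, the 15th GAF column


def data_rows(lines):
    """Yield the tab-separated columns of each non-comment, non-blank line."""
    for line in lines:
        if line.strip() and not line.startswith("!"):
            yield line.rstrip("\n").split("\t")


def counts_by_source(lines):
    """Map each Assigned_by value to (associations, distinct proteins)."""
    n_assoc = defaultdict(int)
    proteins = defaultdict(set)
    for cols in data_rows(lines):
        if len(cols) <= ASSIGNED_BY:
            continue  # skip short or malformed lines
        source = cols[ASSIGNED_BY]
        n_assoc[source] += 1
        proteins[source].add(cols[OBJECT_ID])
    return {s: (n_assoc[s], len(proteins[s])) for s in n_assoc}


def naive_mgi_proteins(lines):
    """The over-count: proteins on any line mentioning 'MGI' in any column."""
    return {cols[OBJECT_ID] for cols in data_rows(lines)
            if any("MGI" in c for c in cols)}


def toy_line(acc, evidence, with_from, assigned_by):
    """Build one toy 15-column GAF line; empty strings fill unused columns."""
    cols = [""] * 15
    cols[0], cols[OBJECT_ID], cols[4] = "UniProt", acc, "GO:0005634"
    cols[6], cols[WITH_FROM], cols[ASSIGNED_BY] = evidence, with_from, assigned_by
    return "\t".join(cols) + "\n"


TOY_LINES = [
    "! toy header comment\n",
    toy_line("TOY1", "IDA", "", "MGI"),
    toy_line("TOY1", "IMP", "", "MGI"),
    toy_line("TOY2", "ISS", "MGI:MGI:0000000", "UniProt"),
    toy_line("TOY3", "IEA", "", "UniProt"),
]

if __name__ == "__main__":
    print(counts_by_source(TOY_LINES))
    print(sorted(naive_mgi_proteins(TOY_LINES)))
```

On the toy lines, counts_by_source reports MGI as 2 associations on 1 distinct protein, while naive_mgi_proteins also picks up TOY2, whose only mention of MGI is in its With/From column.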

Greetings from a snow-covered England,

Daniel

Hannick, Linda wrote:
> Hi Daniel,
> Please refer to the line in the GOA table "Current Composition of
> Uniprot GOA" (http://www.ebi.ac.uk/GOA/SPTR_release.html) that says:
> GO Annotation Source          Number of Associations   Number of Distinct Proteins
> Manual GO annotation by MGI   15065                    5945
> 
> Does this not mean that the number of distinct proteins in GOA that were
> manually annotated by MGI is 5945?  
> 
> Then, on the MGI GOA table
> (http://www.ebi.ac.uk/GOA/MOUSE_release.html):
> 
> GO Annotation Source          Number of Associations   Number of Distinct Proteins
> Manual GO annotation by MGI   6679                     3002
> 
> I do not understand why the number of distinct proteins annotated by MGI
> differs in these two sets.  What am I missing?
> Thanks,
> Linda
> 
> 
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Linda I. Hannick, Ph.D.
> Bioinformatics Analyst
> The Institute for Genomic Research
> 301-795-7857 voice
> 301-838-0228 fax
> 
> 
> 
> 
>>-----Original Message-----
>>From: Daniel Barrell [mailto:dbarrell at ebi.ac.uk] 
>>Sent: Friday, January 23, 2004 8:13 AM
>>To: David Hill
>>Cc: camon at ebi.ac.uk; Hannick, Linda; gofriends at genome.stanford.edu
>>Subject: Re: Uniprot file
>>
>>
>>As an aside, GOA also filters out external 'NOT' annotations, annotations
>>that can't be mapped to a UniProt entry, and manual annotations without
>>a valid PubMed reference. This will also affect the numbers.
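The three filters Daniel lists can be expressed as a small Python pass over an external group's rows. This is a minimal sketch under several assumptions: the 15-column GAF layout with '|'-separated Qualifier and DB:Reference fields, a hypothetical to_uniprot dictionary standing in for GOA's real mapping from the group's identifiers to UniProt accessions, and "valid PubMed reference" approximated as "PMID:" followed by digits (the sketch cannot check that the ID exists). All identifiers in the toy rows are made up.

```python
import re

# Column indices (0-based) in the 15-column GAF layout.
DB, OBJECT_ID, QUALIFIER, REFERENCE, EVIDENCE = 0, 1, 3, 5, 6

# Approximates "valid PubMed reference" as PMID: followed by digits.
PMID_RE = re.compile(r"PMID:\d+")


def has_pubmed(reference_field):
    """True if any '|'-separated reference looks like a PubMed ID."""
    return any(PMID_RE.fullmatch(ref) for ref in reference_field.split("|"))


def filter_external(rows, to_uniprot):
    """Keep external rows that pass the three filters, remapped to UniProt.

    rows: lists of GAF columns from an external group.
    to_uniprot: hypothetical dict from the group's object ID to a UniProt accession.
    """
    kept = []
    for cols in rows:
        if "NOT" in cols[QUALIFIER].split("|"):
            continue  # 'NOT' annotation
        acc = to_uniprot.get(cols[OBJECT_ID])
        if acc is None:
            continue  # cannot be mapped to a UniProt entry
        if cols[EVIDENCE] != "IEA" and not has_pubmed(cols[REFERENCE]):
            continue  # manual annotation without a PubMed reference
        new = list(cols)
        new[DB], new[OBJECT_ID] = "UniProt", acc
        kept.append(new)
    return kept


def ext_row(obj, qualifier, reference, evidence):
    """Build a toy 15-column external GAF row."""
    cols = [""] * 15
    cols[DB], cols[OBJECT_ID], cols[QUALIFIER] = "MGI", obj, qualifier
    cols[4], cols[REFERENCE], cols[EVIDENCE] = "GO:0005634", reference, evidence
    cols[14] = "MGI"
    return cols


TOY_MAP = {"MGI:TOY1": "TOY1", "MGI:TOY2": "TOY2"}
TOY_ROWS = [
    ext_row("MGI:TOY1", "", "PMID:123", "IDA"),           # kept
    ext_row("MGI:TOY1", "NOT", "PMID:123", "IDA"),        # NOT: dropped
    ext_row("MGI:TOY9", "", "PMID:123", "IDA"),           # unmapped: dropped
    ext_row("MGI:TOY2", "", "MGI:REF1", "IMP"),           # no PubMed ID: dropped
    ext_row("MGI:TOY2", "", "MGI:REF1|PMID:456", "IGI"),  # kept
]

if __name__ == "__main__":
    for cols in filter_external(TOY_ROWS, TOY_MAP):
        print(cols[DB], cols[OBJECT_ID], cols[REFERENCE])
```

Each filter removes rows before counting, which is why figures computed from the raw external file and from GOA can legitimately differ.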
>>
>>Linda, I'm a little confused about your numbers, but upon double
>>checking I think you have them back to front. GOA incorporated mappings
>>to 3002 MGI-annotated genes.
>>
>>Regards,
>>
>>Daniel
>>
>>
>>David Hill wrote:
>>
>>
>>>MGI updates our gene annotation files at the GO site weekly. On our
>>>own site, the database is updated daily. As of 1/22/04 we have 4890
>>>genes annotated by hand with at least one GO annotation.
>>>
>>>David
>>>
>>>At 11:49 AM 1/23/2004 +0000, camon at ebi.ac.uk wrote:
>>>
>>>
>>>>Hi Linda,
>>>>
>>>>The UniProt GOA dataset has integrated manual annotations from MGI,
>>>>RGD, FlyBase and SGD that do not have the ISS code.
>>>>
>>>>Perhaps this accounts for the differences with MGI.
>>>>Also, GOA is released monthly, so it may not be totally
>>>>in sync with MGI. Perhaps Judy and Harold
>>>>can report on how often MGI GO data is released.
>>>>We update monthly with whatever is new at that time.
>>>>The January GOA release is still pending.
>>>>
>>>>kind regards
>>>>Evelyn
>>>>
>>>>
>>>>>Hi,
>>>>>I need some guidance about the UNIPROT set. We would like to
>>>>>non-redundify (how's that for a verb!) the peptide set that we use
>>>>>as our comparison set. Studying the tables at GOA does not answer
>>>>>the following questions.
>>>>>
>>>>>It is not clear to me whether the GOA total of manual GO
>>>>>annotations (26433 proteins) contains all of the accessions in the
>>>>>manual GO annotations done by MGI, RGD and Proteome.
>>>>>
>>>>>Why is the number in "manual GO annotation by MGI" at GOA (5945
>>>>>proteins) different from the number in the MGI table (3002
>>>>>proteins)? If we download all of the manual GO associations (and
>>>>>sequences) that are in the GOA list, will there be manual GO
>>>>>associations at MGI that we will have missed? I think the question
>>>>>really is: are the UNIPROT GO associations kept in sync with the
>>>>>versions of MGI, etc.?
>>>>>
>>>>>Thanks,
>>>>>Linda
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: owner-gofriends at genome.stanford.edu
>>>>>>[mailto:owner-gofriends at genome.stanford.edu] On Behalf Of Daniel Barrell
>>>>>>Sent: Wednesday, January 21, 2004 6:37 AM
>>>>>>To: DERET Sophie
>>>>>>Cc: gofriends at genome.stanford.edu
>>>>>>Subject: Re: Uniprot file
>>>>>>
>>>>>>
>>>>>>
>>>>>>Hi Sophie,
>>>>>>
>>>>>>gene_association.goa_uniprot.gz contains all species from the
>>>>>>UniProt Knowledgebase (TrEMBL/Swiss-Prot/PIR), 61962 species (see
>>>>>>http://www.ebi.ac.uk/GOA/SPTR_release.html). You can filter this
>>>>>>file by taxon ID if necessary. The GOA project also provides the
>>>>>>following species-specific files:
>>>>>>
>>>>>>gene_association.goa_human
>>>>>>gene_association.goa_mouse
>>>>>>gene_association.goa_rat
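Daniel's suggestion to filter the all-species file by taxon ID can be sketched in Python. This is a minimal version, assuming the standard tab-separated GAF layout with the taxon in the 13th column written as "taxon:NNNN"; the function names and the output filename are illustrative, not part of any GOA tooling.

```python
import gzip

TAXON = 12  # Taxon column (13th in GAF), e.g. "taxon:10090" for mouse


def filter_by_taxon(lines, taxon_id):
    """Yield header comments plus the annotation lines for one NCBI taxon."""
    wanted = f"taxon:{taxon_id}"
    for line in lines:
        if line.startswith("!"):
            yield line  # keep the file's header comments
            continue
        cols = line.rstrip("\n").split("\t")
        # Later GAF versions may add a second, '|'-separated taxon; use the first.
        if len(cols) > TAXON and cols[TAXON].split("|")[0] == wanted:
            yield line


def filter_file(src_path, dst_path, taxon_id):
    """Stream a gzipped association file and write one species' lines."""
    with gzip.open(src_path, "rt") as src, open(dst_path, "w") as dst:
        dst.writelines(filter_by_taxon(src, taxon_id))
```

For example, filter_file("gene_association.goa_uniprot.gz", "goa_uniprot_mouse.txt", 10090) would write the header plus the mouse (NCBI taxon 10090) lines. Streaming line by line keeps memory use flat even for the all-species file. Note that this gives every UniProt annotation for the taxon, not the filtered content of the species-specific files.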
>>>>>>
>>>>>>Information on these can be found here:
>>>>>>
>>>>>>ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/README
>>>>>>
>>>>>>Other groups provide model organism database specific files here:
>>>>>>
>>>>>>http://www.geneontology.org/GO.current.annotations.shtml
>>>>>>
>>>>>>You will notice that the GOA files are also available from this 
>>>>>>link. With regard to your question about GO:0005504, could you
>>>>>>expand a little
>>>>>>on this?  I'm unsure what you mean.
>>>>>>
>>>>>>Regards,
>>>>>>
>>>>>>Daniel
>>>>>>
>>>>>>
>>>>>>
>>>>>>DERET Sophie wrote:
>>>>>>
>>>>>>
>>>>>>>Dear GO friends,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>I downloaded gene_association.goa_uniprot.gz file from 
>>>>>>>ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/.
>>>>>>>
>>>>>>>I have some questions about the contents of this file:
>>>>>>>
>>>>>>>- Is there only one file for all species (human, rat, pig...)?
>>>>>>>
>>>>>>>- In this file, I didn't find all of the GO IDs included in
>>>>>>>gene_association.goa_human (for example, GO:0005504). Why?
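Sophie's second question can at least be checked mechanically. This is a minimal sketch, assuming both files are uncompressed, tab-separated GAF text with the GO ID in the 5th column, and that both were downloaded from the same release (otherwise release timing alone can explain differences). It only lists GO IDs present in the species file but absent from the all-species file; it does not explain why they are missing. The function names are illustrative.

```python
GO_ID = 4  # GO ID column (5th in GAF), 0-based index


def go_ids(lines):
    """Collect the GO IDs used in a GAF file's annotation lines."""
    ids = set()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) > GO_ID:
            ids.add(cols[GO_ID].replace(" ", ""))  # tolerate "GO: 0005504"
    return ids


def missing_go_ids(species_lines, uniprot_lines):
    """GO IDs found in the species file but absent from the all-species file."""
    return sorted(go_ids(species_lines) - go_ids(uniprot_lines))
```

Passing open file objects works directly, for example missing_go_ids(open("gene_association.goa_human"), open("gene_association.goa_uniprot")) after decompressing both.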
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>Thanks a lot for your response.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>Sophie
>>>>>>>
>>>>>>
>>>>>>--
>>>>>>
>>>>>>Daniel Barrell
>>>>>>EMBL - The EBI
>>>>>>Wellcome Trust Genome Campus
>>>>>>Hinxton, Cambridge CB10 1SD
>>>>>>Phone: +44 (0)1223 492551
>>>>>>Email: dbarrell at ebi.ac.uk
>>>>>>
>>>>>>
>>>>>>
>>>>>>--
>>>>>>This message is from the GOFriends moderated mailing list. A list
>>>>>>of public announcements and discussion of the Gene Ontology (GO)
>>>>>>project.
>>>>>>Problems with the list?           E-mail: owner-gofriends at geneontology.org
>>>>>>Subscribing   send   "subscribe"   to   gofriends-request at geneontology.org
>>>>>>Unsubscribing send   "unsubscribe"  to  gofriends-request at geneontology.org
>>>>>>Web:          http://www.geneontology.org/
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>David Hill, Ph.D.
>>>Senior Scientific Curator
>>>Gene Expression Database/Gene Ontology Consortium
>>>Mouse Genome Informatics
>>>The Jackson Laboratory
>>>600 Main Street
>>>Bar Harbor, ME 04609
>>>
>>>(207)288-6430
>>>dph at informatics.jax.org
>>>
>>>http://www.informatics.jax.org/
>>>
>>
>>
>>
>>
> 
> 
> 





