Gene identifier synonym table standard and/or repository?
Gabriel Berriz
gberriz at hms.harvard.edu
Wed Feb 19 08:19:03 PST 2003
Thanks for all of the responses to our earlier question (Fritz's post)
about synonym tables!
We will go with column 11 in the GO gene association tables and support
those model organisms that make use of it. We may also supplement with
Ensembl (although see Ensembl NOTE below).
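A minimal sketch of turning column 11 into the simple tab-delimited synonym/unique-ID table described in Fritz's original question. It assumes a tab-delimited association file where "!" lines are comments, column 2 is DB_Object_ID, column 3 is DB_Object_Symbol, and column 11 holds zero or more "|"-separated synonyms. The function names `synonym_pairs` and `write_synonym_table` are hypothetical, for illustration only.

```python
from typing import Iterable, Iterator, Set, TextIO, Tuple

# Assumed layout (1-based column numbers in the comments):
#   column 2  -> DB_Object_ID      (index 1)
#   column 3  -> DB_Object_Symbol  (index 2)
#   column 11 -> DB_Object_Synonym (index 10), zero or more values joined by "|"
ID_COL, SYMBOL_COL, SYNONYM_COL = 1, 2, 10


def synonym_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield unique (name, DB_Object_ID) pairs from association-file lines.

    The gene symbol is yielded alongside the column-11 synonyms so that a
    user can look a gene up by either.
    """
    seen: Set[Tuple[str, str]] = set()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and "!" header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= SYNONYM_COL:
            continue  # row too short to carry column 11
        obj_id = cols[ID_COL]
        names = [cols[SYMBOL_COL]] + cols[SYNONYM_COL].split("|")
        for name in (n.strip() for n in names):
            # A gene appears once per annotation, so the same pair repeats.
            if name and (name, obj_id) not in seen:
                seen.add((name, obj_id))
                yield name, obj_id


def write_synonym_table(lines: Iterable[str], out: TextIO) -> None:
    """Write one "name<TAB>DB_Object_ID" pair per line to `out`."""
    for name, obj_id in synonym_pairs(lines):
        out.write(f"{name}\t{obj_id}\n")
```

For example, `write_synonym_table(open("gene_association.sgd"), sys.stdout)` would print the pairs for one file; the deduplication set is there because the denormalized file repeats each gene's synonyms on every annotation row.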
One comment on obtaining synonyms from the association tables rather than
from a stand-alone synonym file: the approach has shortcomings, including:
1) No synonyms for genes that are not annotated.
2) Since synonyms are stored in a "denormalized" way, there is potential
for inconsistencies between records for the same gene (although this is
less of an issue if the files are automatically generated from a normalized
database).
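Shortcoming 2 can at least be detected mechanically. The sketch below, assuming the same tab-delimited layout (column 2 is DB_Object_ID, column 11 is "|"-separated synonyms), groups rows by gene and reports genes whose rows disagree about their synonym set. Synonym order is ignored because sets are compared. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set


def inconsistent_synonyms(lines: Iterable[str]) -> Dict[str, Set[FrozenSet[str]]]:
    """Map each DB_Object_ID whose rows disagree on column 11 to the
    distinct synonym sets seen for it."""
    variants: Dict[str, Set[FrozenSet[str]]] = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= 10:
            continue  # row lacks column 11
        # Column 11 as an unordered set, so "a|b" and "b|a" count as equal.
        syns = frozenset(s.strip() for s in cols[10].split("|") if s.strip())
        variants[cols[1]].add(syns)
    # Keep only genes that were seen with more than one distinct set.
    return {obj_id: sets for obj_id, sets in variants.items() if len(sets) > 1}
```

An empty result means every gene's rows agree, which is what one would expect from files generated from a normalized database.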
FYI, the following association files (from
ftp.geneontology.org/pub/go/gene_associations/) do not use column 11:
gene_association.GeneDB_Pfalciparum
gene_association.GeneDB_Tbrucei
gene_association.GeneDB_tsetse
gene_association.compugen.Genbank
gene_association.compugen.Swissprot
gene_association.fb
gene_association.gramene_oryza
gene_association.zfin
These do:
gene_association.GeneDB_Spombe 890 synonyms; 3765 genes
gene_association.goa_human 19727 synonyms; 19727 genes
gene_association.goa_sptr 28397 synonyms; 566342 genes
gene_association.mgi 10080 synonyms; 9088 genes
gene_association.rgd 522 synonyms; 1424 genes
gene_association.sgd 6573 synonyms; 6905 genes
gene_association.tair 45327 synonyms; 18771 genes
gene_association.tigr_Tbrucei_chr2 2 synonyms; 289 genes
gene_association.tigr_ath 269 synonyms; 5749 genes
gene_association.tigr_shewanella 1233 synonyms; 3767 genes
gene_association.tigr_vibrio 1415 synonyms; 2924 genes
gene_association.vida 19 synonyms; 83 genes
gene_association.wb 1319 synonyms; 6833 genes
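The post does not say exactly how the counts above were computed. One plausible definition, stated here as an assumption, is the number of distinct (synonym, DB_Object_ID) pairs in column 11 and the number of distinct DB_Object_IDs in column 2; a file whose synonym count is 0 does not use column 11. The function name is hypothetical.

```python
from typing import Iterable, Set, Tuple


def count_synonyms_and_genes(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (distinct synonym/ID pairs in column 11, distinct DB_Object_IDs)."""
    pairs: Set[Tuple[str, str]] = set()
    genes: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= 10:
            continue  # row lacks column 11
        obj_id = cols[1]  # column 2: DB_Object_ID
        genes.add(obj_id)
        for syn in cols[10].split("|"):  # column 11: DB_Object_Synonym
            if syn.strip():
                pairs.add((syn.strip(), obj_id))
    return len(pairs), len(genes)
```

Counting pairs rather than bare strings keeps an alias that is shared by two genes from being counted only once.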
It would be a great help to have similar standardized lists of all
"annotatable" genes for each GO organism. In principle the association
tables could serve as the source of all "annotatable" genes if they always
included at least one annotation--possibly to attributes of type
"unknown"--for each annotatable gene id (or is this the case now?). As far
as I know, there is no easy way to determine whether this is the case for
any given association table.
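If a complete list of annotatable gene IDs did exist for an organism (it is hypothetical here, since the post notes no such standardized list is available), checking an association table against it would be a simple set difference on column 2. Any gene returned is one for which the table can supply no synonyms, which is shortcoming 1 above. The function name is hypothetical.

```python
from typing import Iterable, Set


def unannotated_genes(annotatable_ids: Iterable[str], lines: Iterable[str]) -> Set[str]:
    """Return IDs from a reference gene list that never appear in column 2."""
    annotated: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) > 1:
            annotated.add(cols[1])  # column 2: DB_Object_ID
    return set(annotatable_ids) - annotated
```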
Ensembl NOTE: Ensembl looks to be quite useful for us, and will get us a
more normalized table of synonyms, but we did some spot-checking in fly and
couldn't get an Ensembl list of synonyms to include full-length gene names
(e.g., Wingless, Kruppel) in addition to gene symbols (Wg, Kr). In both
human and fly we never saw more than one synonym for any given gene. Are
we doing something wrong?
Thanks again for all of your help!
Best Regards,
Gabriel Berriz
At 10:19 AM 2/13/2003 -0800, Suzanna Lewis wrote:
>Hi,
>
>I'm double-checking here that we are getting this loaded
>into the DB as well. They don't currently appear in AmiGO,
>nor can they be searched, but Brad and I are talking about
>how to do that.
>
>-S
>
>On Thursday, February 13, 2003, at 09:33 AM, Valerie Wood wrote:
>
>>
>>I utilize this column for S. pombe too. BTW, I use "|" to separate
>>multiple synonyms. Is this correct?
>>
>>On Thu, 13 Feb 2003, Tanya Berardini wrote:
>>
>>>
>>>In the TAIR gene_association file, column 11 is populated with
>>>synonyms/aliases for the annotated object. These may include BAC-based
>>>names from the genome sequencing phase, full names for the lettered
>>>abbreviations (e.g. EMF1 is embryonic flower 1), other aliases for that
>>>gene (e.g. ATROP4 = ROP4 = ATGP3 = ARAC5), Arabidopsis Genome Initiative
>>>(AGI) locus names (of the format ATxgXXXXX), and gene product names.
>>>
>>>Tanya
>>>
>>>
>>>On Thu, 13 Feb 2003, Suzanna Lewis wrote:
>>>
>>>>In the gene associations table the 11th column is listed
>>>>as DB_object_synonym. I believe that this column was
>>>>added especially to address this issue. It allows for
>>>>white space and has a cardinality of 0, 1, or >1. I think
>>>>this is more a problem of the organism databases not
>>>>having made the switch to providing this information
>>>>when the gene associations are submitted. Column 12
>>>>is the db object type (is it a gene, or a protein, or a .....)
>>>>and column 13 is the taxon. I think if these were being
>>>>populated it would perhaps help you.
>>>>
>>>>Any chance of this being put into practice, annotators?
>>>>
>>>>-S
>>>>
>>>>On Thursday, February 13, 2003, at 07:54 AM, Fritz Roth wrote:
>>>>
>>>>>Greetings GOphiles,
>>>>>
>>>>>We are working on some new software that uses GO annotation, and we
>>>>>would really like it to support all GO-annotated organisms. Our chief
>>>>>barrier to doing this is the lack of gene identifier synonym tables
>>>>>for each organism (so that users can enter gene names rather than
>>>>>being restricted to MOD IDs, e.g., SGD or MGI IDs).
>>>>>
>>>>>Is there an agreed GO Consortium standard for gene identifier synonym
>>>>>tables (it could be as simple as tab-delimited text with a
>>>>>synonym-uniqueID pair on each line)? If so, is there a repository for
>>>>>such files? Or is this a GMOD question?
>>>>>
>>>>>Thanks!
>>>>>Fritz Roth
>>>>>
>>>>>-------------------------------------------------
>>>>>Frederick P. Roth, Asst. Professor
>>>>>Harvard Medical School
>>>>>Dept. of Biological Chemistry and Molecular Pharmacology
>>>>>250 Longwood Avenue, SGMB-322, Boston, MA 02115
>>>>>(617) 432-3551 phone (617) 432-3557 FAX
>>>>>froth at hms.harvard.edu http://llama.med.harvard.edu
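Suzanna's point that populated columns 12 and 13 would help can be illustrated with a sketch that builds a per-gene synonym table restricted to one object type and one taxon. It assumes column 12 holds the object type (e.g. "gene" or "protein"), column 13 holds a taxon written as "taxon:NNNN" (possibly several values joined by "|"), and column 11 holds "|"-separated synonyms. The function name and the example taxon value are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def synonyms_by_type_and_taxon(
    lines: Iterable[str], taxon: str, object_type: str = "gene"
) -> Dict[str, Set[str]]:
    """Map DB_Object_ID -> column-11 synonyms for rows of one type and taxon.

    Matching genes with an empty column 11 map to an empty set.
    """
    table: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= 12:
            continue  # row lacks columns 12 and 13
        # Column 12: object type, compared case-insensitively.
        if cols[11].strip().lower() != object_type.lower():
            continue
        # Column 13: taxon, assumed "taxon:NNNN", possibly "|"-separated.
        if taxon not in (t.strip() for t in cols[12].split("|")):
            continue
        table[cols[1]].update(s.strip() for s in cols[10].split("|") if s.strip())
    return dict(table)
```

For example, `synonyms_by_type_and_taxon(lines, "taxon:1")` keeps only gene rows for that taxon, so a multi-organism file could be split into per-organism synonym tables.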
--
This message is from the GOFriends moderated mailing list. A list of public
announcements and discussion of the Gene Ontology (GO) project.
Problems with the list? E-mail: owner-gofriends at geneontology.org
To subscribe, send "subscribe" to gofriends-request at geneontology.org
To unsubscribe, send "unsubscribe" to gofriends-request at geneontology.org
Web: http://www.geneontology.org/