use of Uniprot accession Vs GenBank Accession in With column

Harold Drabkin hjd at informatics.jax.org
Wed Jan 31 14:29:11 PST 2007


For what it's worth, this doesn't just happen for protein-coding genes.

Most eukaryotic tRNAs are encoded by more than one gene. In mice and 
humans there are 7-8 initiator tRNA genes; the mature coding region is 
identical, while the flanking sequences differ. In yeast there are 4; 
again, the mature tRNA is identical but the flanking sequences differ.

The 4 yeast genes have different transcriptional efficiencies in vitro 
and possibly in vivo. Similarly, for the 3 human tRNA genes examined, in 
vitro and in vivo studies suggest that they are not all equally active 
(or, because the precursor sequences at the 5' and 3' ends are 
different, they may be processed with different efficiencies). The point 
is that one doesn't know what percentage of the tRNA came from each 
gene.

For a gene-centric db such as MGI, when we make GO annotations based on 
the function of the tRNA in initiation, etc., we don't know which 
gene(s) to pin them on. We would have to assume that all of the genes 
contribute equally to the tRNA pool. But that may not be the case at all 
(see above). At best, the genes could be annotated based on "potential" 
functions and processes as determined by assaying the tRNA.

hjd
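
The one-product, several-loci situation discussed in this thread can be illustrated with the UniProt GN lines for Q9SW34 that Emily quotes further down. The following is only a minimal sketch: the function name `parse_gn_lines` is hypothetical, and the parser assumes the simple `Key=Value;` layout seen in that excerpt (real GN lines can carry multiple values and other details that this does not handle).

```python
import re

# GN lines copied from the Q9SW34 record quoted in this thread.
GN_LINES = """\
GN   Name=SEC61G1; OrderedLocusNames=At4g24920; ORFNames=F13M23.60;
GN   and
GN   Name=SEC61G2; OrderedLocusNames=At5g50460; ORFNames=MBA10.1;
"""


def parse_gn_lines(text):
    """Return one dict per gene described in a block of GN lines.

    Hypothetical helper: assumes each gene is a run of 'Key=Value;'
    tokens and that genes are separated by a line reading 'GN   and'.
    """
    genes, current = [], {}
    for line in text.splitlines():
        if not line.startswith("GN"):
            continue
        body = line[2:].strip()
        if body == "and":  # separator between two genes
            if current:
                genes.append(current)
            current = {}
            continue
        for key, value in re.findall(r"(\w+)=([^;]+);", body):
            current[key] = value.strip()
    if current:
        genes.append(current)
    return genes


genes = parse_gn_lines(GN_LINES)
loci = [g["OrderedLocusNames"] for g in genes]
print(loci)  # two loci behind a single protein accession
```

A single accession in the with column therefore cannot say which of the two loci an experiment concerned; a gene-centric database would need the locus name (or a per-gene accession) to keep them apart.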


Pankaj Jaiswal wrote:
> I agree. Also, if you talk to a geneticist and/or an evolutionary 
> biologist who cares about polymorphism (at the nucleotide and protein 
> levels), this is a significant problem. So far, what I have heard from 
> them is that they expect to find (in the example given by Emily) two 
> protein entries, because two different gene loci encode the protein. 
> Another reason is that the two genes will have different full names 
> and symbols, e.g. SEC61G1 and SEC61G2.
>
> Therefore, even though the sequences are 100% identical, SEC61G1 and 
> SEC61G2 are different. Keeping them separate would allow each to carry 
> distinct expression profile, localization, or interaction data, if any.
>
> Also, let's say someone did the real experiments only with SEC61G1 and 
> not with SEC61G2. If I then have to generate a GO annotation based on 
> ISS to the Arabidopsis gene, the idea would be to use the 
> protein_accession_number of SEC61G1 and not SEC61G2, because only 
> SEC61G1 was characterized using real lab experiments.
>
> It would be wrong to put both the Arabidopsis gene products (SEC61G1 
> and SEC61G2) under one single Uniprot entry. However, GB/EMBL/DDBJ 
> will most likely maintain separate accessions for the proteins encoded 
> by these two genes.
>
> -Pankaj
>
>
>
> Gavin Sherlock wrote:
>> Interesting dilemma.
>>
>> Clearly the result is based on a protein, but what if there are two 
>> genes, A and B, whose DNA sequences may differ, but whose protein 
>> products have identical sequences.  What if one of them is expressed 
>> under some circumstances, allowing its product to interact with 
>> protein X, and the other is never expressed under circumstances that 
>> would allow its product to interact with protein X.  In this case, 
>> when annotating protein X, knowing the gene whose product it 
>> interacts with would be important.  Of course, I have no examples of 
>> this, and no reason to expect that they might exist, but it is a 
>> formal possibility, and there are certainly examples in the 
>> literature where synonymous changes can affect function.
>>
>> Cheers,
>> Gavin
>>
>> On Jan 31, 2007, at 8:34 AM, Emily Dimmer wrote:
>>
>>> Hi,
>>>
>>> Yes, this is true: there is only one UniProtKB record when two 
>>> proteins are from the same species and 100% identical.
>>> I thought this discussion started as a question about what type of 
>>> accession should be used in the 'with' column for IPI-evidenced 
>>> annotations ... if the proteins are identical and the experiment, 
>>> e.g. a protein binding assay, is done with the protein, how much of 
>>> a problem is this? Surely it's more correct and more meaningful to 
>>> the user to use a protein identifier.
>>>
>>> For an example of two genes encoding the same protein sequence see 
>>> Q9SW34 (S61G1_ARATH)  
>>> (http://www.ebi.uniprot.org/uniprot-srv/uniProtView.do?proteinId=S61G1_ARATH&pager.offset=null) 
>>>
>>> You can see the two gene name lines here:
>>>
>>> GN   Name=SEC61G1; OrderedLocusNames=At4g24920; ORFNames=F13M23.60;
>>> GN   and
>>> GN   Name=SEC61G2; OrderedLocusNames=At5g50460; ORFNames=MBA10.1;
>>>
>>> Emily
>>>
>>> Doug Howe wrote:
>>>
>>>> I seem to recall that identical proteins generated from distinct 
>>>> genes are represented by a single UniProt record.  If that is still 
>>>> true, isn't that a case where an EMBL accession would be better in 
>>>> the with field?
>>>> -Doug
>>>>
>>>> On Wed, 31 Jan 2007, Valerie Wood wrote:
>>>>
>>>>> Emily Dimmer wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Just a quick note: GenBank accessions are exactly the same as 
>>>>>> EMBL accessions. All EMBL accessions are cross-referenced in 
>>>>>> UniProt. Therefore, if you *did* want to find a UniProtKB 
>>>>>> accession, you should be able just to enter the GenBank 
>>>>>> accession into the UniProt website (or search via SRS etc...) and 
>>>>>> it will bring up the equivalent UniProt entry (I do realize that 
>>>>>> for some groups there is an issue of a UniProtKB accession not 
>>>>>> yet existing for an equivalent GenBank accession).
>>>>>>
>>>>>
>>>>> In the cases where there is no Uniprot ID, it may be a problem to 
>>>>> refer to the Genbank/EMBL accession number, as this will often be a 
>>>>> cosmid or contig and contain multiple CDSs. In these cases you 
>>>>> can't refer to the gene/protein uniquely with an EMBL ID.
>>>>>
>>>>> Presumably though, for the cases where there is no 
>>>>> Swiss-Prot/TrEMBL ID, the likelihood that you would be using this 
>>>>> as a dbxref in the with column for an ISS is very small (I have 
>>>>> never come across one). Can't we all agree to track down the 
>>>>> Uniprot ID (which is relatively straightforward), or, in cases 
>>>>> where there isn't one, contact Uniprot to work out why?
>>>>>
>>>>> Val
>>>>>
>>>>>> Cheers,
>>>>>> Emily
>>>>>>
>>>>>> Midori Harris wrote:
>>>>>>
>>>>>>> Actually, GB or GenBank would also be acceptable, because 
>>>>>>> they're listed as synonyms in GO.xrf_abbs (the filtering script 
>>>>>>> allows anything in the 'abbreviation' or 'synonym' fields).
>>>>>>>
>>>>>>> m
>>>>>>>
>>>>>>> On Tue, 30 Jan 2007, Karen Christie wrote:
>>>>>>>
>>>>>>>> Note that the abbreviation selected by GO for the IDs for 
>>>>>>>> GenBank, DDBJ, and EMBL is EMBL, so that's the namespace that 
>>>>>>>> needs to be used in the gene_association files for GO.
>>>>>>>>
>>>>>>>> -Karen
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, 30 Jan 2007, Pankaj Jaiswal wrote:
>>>>>>>>
>>>>>>>>> Got it. We will use the GB one.
>>>>>>>>>
>>>>>>>>> BTW GenBank ID is different than the GenBank Accession. 
>>>>>>>>> GenBank ID is the ID exclusive for the GenBank database entry. 
>>>>>>>>> One GB accession can have mappings to several GenBank IDs.
>>>>>>>>>
>>>>>>>>> Pankaj
>>>>>>>>>
>>>>>>>>> Karen Christie wrote:
>>>>>>>>>
>>>>>>>>>> Hi Pankaj,
>>>>>>>>>>
>>>>>>>>>> GenBank IDs are already allowed in the with column. The main 
>>>>>>>>>> requirement is that the abbreviation (or namespace) for the 
>>>>>>>>>> source of the ID be included in the GO.xrf_abbs file. There 
>>>>>>>>>> is already an entry for IDs coming from GenBank/DDBJ/EMBL, so 
>>>>>>>>>> these IDs are already permissible.
>>>>>>>>>>
>>>>>>>>>> -Karen
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> abbreviation: EMBL
>>>>>>>>>> database: International Nucleotide Sequence Database 
>>>>>>>>>> Collaboration, comprising EMBL-EBI International Nucleotide 
>>>>>>>>>> Sequence Data Library (EMBL-Bank), DNA DataBank of Japan 
>>>>>>>>>> (DDBJ), and NCBI GenBank
>>>>>>>>>> object: Sequence accession number
>>>>>>>>>> example_id: EMBL:AA816246
>>>>>>>>>> example_id: DDBJ:AA816246
>>>>>>>>>> example_id: GB:AA816246
>>>>>>>>>> synonym: DDBJ
>>>>>>>>>> synonym: GB
>>>>>>>>>> synonym: GenBank
>>>>>>>>>> generic_url: http://www.ebi.ac.uk/embl/
>>>>>>>>>> generic_url: http://www.ddbj.nig.ac.jp/
>>>>>>>>>> generic_url: http://www.ncbi.nlm.nih.gov/Genbank/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, 30 Jan 2007, Pankaj Jaiswal wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Everyone,
>>>>>>>>>>>
>>>>>>>>>>> I know it is an accepted SOP to include either the Uniprot 
>>>>>>>>>>> accession number or the individual database's own 
>>>>>>>>>>> gene/protein ID in the WITH column of the association tables.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> However, in practice it seems to be too much work to find 
>>>>>>>>>>> the Uniprot entry, because often DDBJ and GenBank do not 
>>>>>>>>>>> cross-reference each other using the Uniprot accession. 
>>>>>>>>>>> The best alternative is to use the GenBank accession 
>>>>>>>>>>> number, which I see almost all the databases, including 
>>>>>>>>>>> Uniprot, DDBJ, EMBL, PIR etc., use to cross-reference. It 
>>>>>>>>>>> is also the most suitable ID for finding the particular 
>>>>>>>>>>> nucleotide/protein accession that we are looking for 
>>>>>>>>>>> using the same query, no matter which db is queried.
>>>>>>>>>>>
>>>>>>>>>>> I hope you will consider my request to adopt the GenBank 
>>>>>>>>>>> accession number, unless there is a better option.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Pankaj
>>>>>>>>>>> --Pankaj Jaiswal
>>>>>>>>>>> G-15, Bradfield Hall
>>>>>>>>>>> Dept. of Plant Breeding and Genetics
>>>>>>>>>>> Cornell University
>>>>>>>>>>> Ithaca, NY-14853, USA
>>>>>>>>>>>
>>>>>>>>>>> Ph. +1-607-255-3103 / 4199
>>>>>>>>>>> fax: +1-607-255-6683
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> ----------------------------------------------------------------------------- 
>>>>>
>>>>> Valerie Wood                     Tel: 01223 496909
>>>>> S. pombe Genome Project          Fax: 01223 494919
>>>>> Wellcome Trust Sanger Institute  email: val at sanger.ac.uk
>>>>> Wellcome Trust Genome Campus     http://www.genedb.org/genedb/pombe
>>>>> Hinxton, Cambridge, CB10 1HH     http://www.sanger.ac.uk/Projects/S_pombe
>>>>>
>>>
>>>
>>> --************************************
>>>    Emily Dimmer
>>>    GOA and IntAct Database Curator
>>>    EMBL-EBI
>>>    Wellcome Trust Genome Campus
>>>    Hinxton
>>>    Cambridge CB10 1SD, U.K.
>>>    Tel:     +44 1223 494654
>>>    Fax:    +44 1223 494468
>>>    email:  edimmer at ebi.ac.uk
>>> ************************************
>>
>>
>
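
The prefix rule described in the thread (a with-column ID is acceptable if its prefix appears in the 'abbreviation' or 'synonym' fields of GO.xrf_abbs) can be sketched roughly as follows. This is not the actual GO filtering script: the function names are hypothetical, the stanza is abridged from the one Karen quotes, and exact, case-sensitive prefix matching is an assumption.

```python
# Abridged copy of the GO.xrf_abbs stanza quoted in this thread.
XRF_STANZA = """\
abbreviation: EMBL
object: Sequence accession number
example_id: EMBL:AA816246
example_id: DDBJ:AA816246
example_id: GB:AA816246
synonym: DDBJ
synonym: GB
synonym: GenBank
"""


def allowed_prefixes(stanza_text):
    """Collect every 'abbreviation' and 'synonym' value (hypothetical helper)."""
    prefixes = set()
    for line in stanza_text.splitlines():
        # Split on the first ':' only, so 'example_id: EMBL:AA816246'
        # yields the key 'example_id' and is ignored below.
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("abbreviation", "synonym"):
            prefixes.add(value.strip())
    return prefixes


def with_id_is_allowed(with_id, prefixes):
    """True if the ID has the form PREFIX:ACCESSION with a known prefix.

    Assumption: prefixes are compared exactly (case-sensitive).
    """
    prefix, sep, accession = with_id.partition(":")
    return bool(sep) and bool(accession) and prefix in prefixes


prefixes = allowed_prefixes(XRF_STANZA)
for candidate in ("EMBL:AA816246", "GB:AA816246", "AA816246"):
    print(candidate, with_id_is_allowed(candidate, prefixes))
```

Under these assumptions, `EMBL:`, `DDBJ:`, `GB:` and `GenBank:` prefixes all pass, while a bare accession with no prefix does not.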




More information about the go-discuss mailing list