Using GO annotation in a blast reflib

Suzanna Lewis suzi at fruitfly.bdgp.berkeley.edu
Wed Oct 4 14:39:09 PDT 2000


> From: Mark Waugh <mew at ncgr.org>
> To: go <gofriends at genome.stanford.edu>
> Cc: cjb at ncgr.org, mew at ncgr.org
> Subject: Using GO annotation in a blast reflib
> 
> Hello,
> 
> We are interested in augmenting the sequence definitions in standard
> Blast output with their corresponding GO annotations where available. 
> To do this, we can either create reflibs containing only those 
> sequences with GO annotations,.....

Yep, I did that exercise once myself, but just as a one-shot. We
have been ruminating and gradually approaching this over the last
year.

> ....or we can scan the HSPs resulting from a search
> against either NT or NR for GB identifiers that have corresponding GOids
> from the GO database. 

That would work too, -if- the GB identifiers were in the database.
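A minimal sketch of that second approach, assuming the hits carry NCBI-style pipe-delimited identifiers (e.g. "gi|...|gb|ACCESSION.version|") and that a GenBank-accession-to-GOid mapping has already been built somehow. The function name, the mapping, and the accessions below are made up for illustration; none of them come from the GO database.

```python
import re

# Assumption: definition lines use NCBI-style pipe-delimited identifiers,
# e.g. "gi|123|gb|AAA00001.1|". Other header styles need another pattern.
GB_ACCESSION = re.compile(r"\bgb\|([A-Za-z0-9_]+)(?:\.\d+)?\|")


def annotate_defline(defline, go_by_accession):
    """Append GO ids to a definition line when its GB accession is known.

    go_by_accession is a hypothetical dict: accession -> set of GO ids.
    Lines with no recognisable or no mapped accession are returned as-is.
    """
    match = GB_ACCESSION.search(defline)
    if match is None:
        return defline
    go_ids = go_by_accession.get(match.group(1))
    if not go_ids:
        return defline
    return f"{defline} [{', '.join(sorted(go_ids))}]"


# Made-up accession; the GO ids are only placeholders for the example.
example_map = {"AAA00001": {"GO:0003677", "GO:0005634"}}
print(annotate_defline("gi|123|gb|AAA00001.1| made-up protein", example_map))
```

The version suffix is dropped before the lookup, so the mapping only needs bare accessions.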

> In either case it would seem to be a fairly
> straightforward mapping between Genbank accessions and GOids. The
> problem is that, although there is a place for Genbank accession numbers
> in the table dbxref, there aren't any in the latest version of the
> database (we have a local version running in house and obtained the
> latest update on Tuesday from John Richter). 

Right, although really there needs to be an additional table in the
database to hold this information. The dbxref table is one half and the
gene_product table is the other, but there isn't a table specifically
to link a particular gene_product to a particular GB dbxref entry. All
we have right now are these three:

1. term_dbxref	where the definition came from
2. gene_product	where to find the complete model organism db entry
3. evidence	the supporting citation for an association between
		a term and a gene_product

It's easy to create a new table, say 'gene_product_seq',
something like this:

create table gene_product_seq (
	gene_product_id integer not null,
	dbxref_id	integer not null
);

or to add another dbxref_id column to the gene_product table (though that
mandates a 1:1 relationship, so I'll do the separate table).
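To illustrate how a separate link table could be used for the GB accession to GOid lookup, here is a rough sketch run against an in-memory SQLite database. Only the gene_product_seq table and its two columns come from the definition above. The columns on dbxref and gene_product, the 'association' table, the composite primary key, and all the rows are assumptions made up for the example, not the real GO schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for existing tables; column names are assumptions.
    create table dbxref (id integer primary key, xref_dbname text, xref_key text);
    create table gene_product (id integer primary key, symbol text);
    create table association (term_acc text, gene_product_id integer);
    -- The proposed link table. The composite key (an addition for this
    -- sketch) still allows one gene product to have many sequences.
    create table gene_product_seq (
        gene_product_id integer not null references gene_product(id),
        dbxref_id       integer not null references dbxref(id),
        primary key (gene_product_id, dbxref_id)
    );
""")

# Made-up rows: one gene product linked to two GenBank accessions.
conn.execute("insert into dbxref values (1, 'GB', 'AAA00001')")
conn.execute("insert into dbxref values (2, 'GB', 'AAA00002')")
conn.execute("insert into gene_product values (10, 'made_up_gene')")
conn.execute("insert into association values ('GO:0003677', 10)")
conn.execute("insert into gene_product_seq values (10, 1)")
conn.execute("insert into gene_product_seq values (10, 2)")


def go_ids_for_accession(conn, accession):
    """Return the sorted GO ids reachable from one GenBank accession."""
    rows = conn.execute(
        """
        select distinct a.term_acc
        from dbxref d
        join gene_product_seq s on s.dbxref_id = d.id
        join association a on a.gene_product_id = s.gene_product_id
        where d.xref_dbname = 'GB' and d.xref_key = ?
        order by a.term_acc
        """,
        (accession,),
    ).fetchall()
    return [row[0] for row in rows]


print(go_ids_for_accession(conn, "AAA00002"))
```

With the link table, both accessions resolve to the same gene product, which is the case a single extra dbxref_id column on gene_product could not represent.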

> We have experimented with
> the idea of using a webbot to obtain the GB accession numbers from the
> consortium members' individual web sites based on identifiers (the
> contributing group's internal accession numbers) parsed from the
> association files available on the GO website, and while this appears to
> work, it's pretty convoluted and we don't want to flood these sites with
> automated requests. This also introduces a potential problem of
> synchronization between the version of GO we are running and all of the
> ancillary files we are parsing to get the GB accessions.
> 
> Are there any plans in the near future to populate the dbxref table with
> IC accession numbers, and if not, can anyone suggest where we might
> obtain this information for each sequence represented in GO?
> 

OK, so the answer is yes. The fiddly bit is implementing the plan. We
already have (thanks Mike, Midori, et al.) a lovely file for yeast,
SGD_GO_assoc_prot (it's actually a FASTA file), that has the data needed
in its header lines.

I am pretty sure we can provide the same for the fly soon. I'd like to
wait until we have a handle on the second release of the fly so that
the data is as good as we can make it. And we're late in providing this
file to the GO repository, but it's on the job list.

Each of the above gives us a 1:1 correspondence: one gene product = one
protein. Mouse is a bit more difficult. Do you have a set of mouse
sequences already? Is it 1:M? Is this livable? If I remember correctly,
we can ultimately expect MGD to provide a file similar to the one yeast
has already provided. It might be better just to wait until MGD provides
this and we get updates from them.

> Thanks very much for your time.
> 
> Mark
> 
> 
> -- 
> Mark Waugh, Scientist 
> National Center for Genome Resources
> 2935 Rodeo Park Drive East
> Santa Fe, NM 87505, USA
> mew at ncgr.org http://www.ncgr.org	
> Ph: (505) 995-4446, (800) 450-4854 Fax: (505) 982-7690
> 
> --
> This message is from the GOFriends Mailing list.  A list of public
> announcements and discussion of the Gene Ontology (GO) project.
> Problems with the list?           E-mail: owner-gofriends at genome.stanford.edu
> Subscribing   send   "subscribe"   to   gofriends-request at genome.stanford.edu
> Unsubscribing send   "unsubscribe"  to  gofriends-request at genome.stanford.edu
> Web:          http://www.geneontology.org/
> 
