Quantifying Specificity of GO Terms

Stan Letovsky sletovsky at aol.com
Fri Apr 20 10:04:45 PDT 2007


This is a rather trivial answer, and may be irrelevant, depending on what
was meant by "specificity" in the original request. But when I have needed
to distinguish between very general and very specific terms in the past, I
have simply relied on the term frequency (or "annotation probability") of
each term in the genome of interest. Granted, this is
annotation/knowledge-dependent and species-dependent, but it was a pretty
useful method of distinguishing broad terms like "Metabolism" from specific
ones like "S-methyltransferase activity". You need to first complete the
transitive closure of the ISA hierarchy, so that every gene annotated to a
term also counts toward all of that term's ancestors; then term frequencies
never increase down any chain from the root. It's a more objective metric
than depth.
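A minimal sketch of the frequency approach described above, assuming the ontology is available as a plain dict mapping each term to its direct ISA parents, and the annotations as a dict mapping each gene to the set of terms it is directly annotated with. The toy terms and genes are hypothetical and do not reflect the real GO structure. Each gene's terms are propagated to all ancestors (the transitive closure), and a term's frequency is the fraction of annotated genes that carry it. The -log2 step at the end is a commonly used "information content" transform added for illustration; it is not part of the method described in the message.

```python
import math
from collections import defaultdict


def ancestors(term, parents):
    """Return the term plus all of its ancestors (transitive closure of ISA)."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def term_frequencies(parents, annotations):
    """Fraction of annotated genes carrying each term after upward propagation."""
    counts = defaultdict(int)
    for gene, terms in annotations.items():
        closure = set()
        for t in terms:
            closure |= ancestors(t, parents)
        for t in closure:  # each gene counts at most once per term
            counts[t] += 1
    n_genes = len(annotations)
    return {t: c / n_genes for t, c in counts.items()}


# Hypothetical toy hierarchy: term -> list of direct ISA parents.
parents = {
    "metabolism": [],
    "methylation": ["metabolism"],
    "protein methylation": ["methylation"],
    "lipid metabolism": ["metabolism"],
}
# Hypothetical direct annotations: gene -> set of terms.
annotations = {
    "geneA": {"protein methylation"},
    "geneB": {"methylation"},
    "geneC": {"lipid metabolism"},
    "geneD": {"metabolism"},
}

freq = term_frequencies(parents, annotations)
for term, p in sorted(freq.items(), key=lambda kv: -kv[1]):
    # log2(1/p): optional information-content score; rarer terms score higher.
    print(f"{term:22s} p={p:.2f} IC={math.log2(1 / p):.2f}")
```

With this toy data "metabolism" gets frequency 1.0 and "protein methylation" 0.25, and no child term ever has a higher frequency than its parent, which is the monotonicity property the message relies on.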

Many people doing function prediction with GO terms will report
precision/recall etc. without specifying whether they are predicting general
terms like "metabolism" or more specific ones. The former predictions are
not very useful. The rarer the term, the more interesting the prediction. 

Cheers, -Stan

-----Original Message-----
From: owner-gofriends at genome.stanford.edu
[mailto:owner-gofriends at genome.stanford.edu] On Behalf Of Sorin Draghici
Sent: Wednesday, April 18, 2007 2:41 PM
To: Stan Dong
Cc: Paul Shannon; tobias at sfsu.edu; gofriends at genome.stanford.edu
Subject: Re: Quantifying Specificity of GO Terms

Hi,

There seems to be a fair amount of confusion here. There are about 20
tools that can calculate a statistical significance value for a GO term
given a set of differentially expressed genes. This is a well-known
problem, first defined about 4-5 years ago; see for instance:
http://vortex.cs.wayne.edu/papers/genomics.pdf  and
http://vortex.cs.wayne.edu/papers/Onto-Express_V2_proof.pdf
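For readers unfamiliar with this class of tools: a common way to compute such a significance value, and the one the Bioconductor packages mentioned later in this thread use, is a one-sided hypergeometric test. The sketch below uses only the Python standard library; the function name and the numbers are illustrative. It asks how likely it is to see at least k genes annotated to a term in a list of n genes drawn from N reference genes, K of which carry the term. Real tools also correct for testing many terms at once, which this sketch does not do.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    k: genes in the list annotated to the term
    n: size of the gene list
    K: reference genes annotated to the term (after propagating to ancestors)
    N: size of the reference set
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)


# Toy numbers: 3 of 4 listed genes carry a term that annotates 5 of 20 genes.
p = hypergeom_upper_tail(k=3, n=4, K=5, N=20)
print(f"p = {p:.4f}")
```

As the message stresses, this measures enrichment of a term in a particular gene list; it says nothing about how specific the term itself is.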

GO-TermFinder, GOStats and the others mentioned in the recent emails are
all tools from the same category: tools that address the problem defined
above. If anybody is interested in this problem, 17 of these tools have
recently been reviewed in:
http://vortex.cs.wayne.edu/papers/Ontological_analysis.pdf. The GO tools
page includes pointers to many, if not all, such tools.

The question at hand here is how to quantify the specificity of a given 
term. This is independent of any experiment and any set of 
differentially regulated genes and has to do with the structure of the 
GO and the position of the given term in the DAG. For instance, 
"regulation of apoptosis through extracellular signals" is more specific 
than "regulation of apoptosis" or "apoptosis". The problem is how to 
numerically quantify this specificity. To my knowledge, there are no
tools of any kind that provide even a rough quantitative assessment of
this specificity. Any answers or thoughts on this issue would be very
valuable.
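As an illustration of what a purely structural, experiment-independent score could look like (not a tool or method proposed in this thread), one option is to count a term's descendants in the DAG: the fewer terms lie below it, the more specific it is. The sketch below normalizes that count as 1 - log(d + 1) / log(N), where d is the term's number of descendants and N the total number of terms, so a leaf scores 1 and a term above every other term scores 0. Unlike depth, the score does not depend on which path from the root is followed. The three-term hierarchy reuses the example terms above but is hypothetical; real GO parent links and relationship types are not modeled. Like depth, this score still reflects how finely each branch of the ontology happens to be subdivided.

```python
import math


def descendants(term, children):
    """All terms reachable below `term` via child links (excluding the term)."""
    seen = set()
    stack = list(children.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(children.get(t, ()))
    return seen


def structural_specificity(parents):
    """Score in [0, 1]: 1 for a leaf, 0 for a term above all others.

    `parents` must list every term as a key (roots map to []), and the
    ontology needs at least two terms.
    """
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    n_terms = len(parents)
    return {
        t: 1 - math.log(len(descendants(t, children)) + 1) / math.log(n_terms)
        for t in parents
    }


# Hypothetical toy DAG: term -> direct parent terms.
parents = {
    "apoptosis": [],
    "regulation of apoptosis": ["apoptosis"],
    "regulation of apoptosis through extracellular signals": ["regulation of apoptosis"],
}

spec = structural_specificity(parents)
for term, score in spec.items():
    print(f"{score:.2f}  {term}")
```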

Regards,

Sorin



Stan Dong wrote:
> Another tool is the GO-TermFinder by Gavin Sherlock. I believe there 
> is interest for Amigo to incorporate this tool.
>
>     http://search.cpan.org/dist/GO-TermFinder/
>
> SGD has been using it with great satisfaction from our users. You may 
> check the SGD page to get some sense of a use case.
>
>     http://db.yeastgenome.org/cgi-bin/GO/goTermFinder
>
> -Stan
>
> On Apr 17, 2007, at 9:36 PM, Paul Shannon wrote:
>
>> The Bioconductor project has, I believe, a fine solution to this
>> problem -- though please forgive me if I have misconstrued things. The
>> relevant packages (see below) use the hypergeometric distribution to
>> calculate a p-value for the enrichment of any GO node for the genes in
>> question. I typically map proteins to GeneIDs as the first step in my
>> analysis.
>>
>> If this sounds like it addresses your problem, you may wish to take a
>> look at
>>
>>    http://bioconductor.org/packages/1.9/bioc/html/GOstats.html   and
>>    http://bioconductor.org/packages/1.9/bioc/html/Category.html
>>
>> Each of these web pages contains a 'vignette' in a PDF file, which
>> makes for a good introduction to the methods.
>>
>> Though originally conceived in the context of microarrays, I use these
>> packages quite fruitfully with proteomics data.
>>
>>  - Paul
>>
>>
>>>> I am working on a project that involves curation of protein data that
>>>> includes GO terms, and it would be very helpful if I had some
>>>> numerical quantification of the specificity of each term.  It is
>>>> possible to manually examine each term to determine this specificity,
>>>> but because there is a large amount of data, I would like to automate
>>>> the process.  I understand that there is no reliable way to do this
>>>> simply using the level in the DAG hierarchy, but I am wondering if any
>>>> of you might have a work-around.
>>
>> -- 
>> This message is from the GOFriends moderated mailing list.  A list of
>> public announcements and discussion of the Gene Ontology (GO) project.
>> Problems with the list?           E-mail: owner-gofriends at geneontology.org
>> Subscribing   send   "subscribe"   to   gofriends-request at geneontology.org
>> Unsubscribing send   "unsubscribe"  to  gofriends-request at geneontology.org
>> Web:          http://www.geneontology.org/
>

-- 
Sorin Draghici, Ph.D. 


Director of the Bioinformatics Core, Karmanos Cancer Institute

Associate Professor     Tel: (313) 577-5484
Dept. of Computer Science   Fax: (313) 577-6868
Wayne State University
5143 Cass Ave, Room 431 State Hall, 
Detroit, MI, 48202
WWW: http://vortex.cs.wayne.edu/Sorin/ (personal)
WWW: http://vortex.cs.wayne.edu/Projects.html (lab)


Check out my recent book: Data Analysis Tools for Microarrays:
http://www.crcpress.com/shopping_cart/products/product_detail.asp?sku=C3154&parent_id=&pc=


