Quantifying Specificity of GO Terms

Nicholas Mitsakakis n.mitsakakis at utoronto.ca
Fri Apr 20 11:06:06 PDT 2007


Angel,

I think I came across your work on comparing correlation values from 
gene expression data and different semantic similarities. I wonder if 
you and your group tried using partial correlations instead. Let me know 
if you know anything about this.

Thanks,

Nicholas
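
For readers unfamiliar with the statistic mentioned above, here is a minimal sketch of a first-order partial correlation: the correlation between two expression profiles x and y after removing the linear effect of a third profile z. It illustrates the general formula only and is not taken from anyone's analysis in this thread; the function names and the toy profiles are made up for the example.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def partial_correlation(x, y, z):
    """Correlation of x and y controlling for z (first-order formula)."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy profiles (hypothetical): x and y each equal z plus an independent
# component, so all of their correlation comes from the shared z.
z = [1, 1, -1, -1]
x = [2, 0, 0, -2]   # z + [1, -1, 1, -1]
y = [2, 0, -2, 0]   # z + [1, -1, -1, 1]

raw = pearson(x, y)                     # 0.5
partial = partial_correlation(x, y, z)  # approximately 0

if __name__ == "__main__":
    print(raw, partial)
```

In this toy case the raw correlation is entirely explained by z, which is the kind of indirect association a partial correlation is meant to discount.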

Rubio, Angel wrote:

>Some years ago, my group compared the correlation between gene expression and different versions of semantic similarity. We found that the Resnik similarity measure (already used by Dr. Lord to compare sequence and function) outperformed other corpus-based measures in all three categories (BP, MF and CC).
>Indeed, in our case these other measures (Lin and Jiang) did not perform well at all.
>The Resnik similarity measure is easy to evaluate:
>
>Resnik(GeneProduct1, GeneProduct2) = -log(ni/nt)
>
>Where
>ni: number of gene products in the corpus annotated to the common ancestor of the annotations of the pair of gene products (it sounds like a tongue twister!).
>nt: total number of gene products in the corpus.
>
>I hope that this helps.
>
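
The Resnik formula quoted above can be sketched on a toy example. Several things here are assumptions rather than part of the original message: the ontology and corpus are invented (they are not real GO terms or annotations), a gene product annotated to a term is also counted for all of that term's ancestors, and when a pair shares several common ancestors the one with the smallest ni (the most informative one) is used.

```python
import math

# Hypothetical toy ontology: term -> list of parents (a DAG, not real GO).
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],  # a term with two parents
}

# Hypothetical corpus: gene product -> directly annotated terms.
ANNOTATIONS = {
    "g1": {"C"},
    "g2": {"D"},
    "g3": {"B"},
    "g4": {"A"},
}

def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def implied_terms(gene):
    """All terms a gene product is annotated to, directly or via ancestors."""
    return set().union(*(ancestors(t) for t in ANNOTATIONS[gene]))

def term_counts():
    """ni for every term: gene products annotated to it or to a descendant."""
    counts = {term: 0 for term in PARENTS}
    for gene in ANNOTATIONS:
        for term in implied_terms(gene):
            counts[term] += 1
    return counts

def resnik(gene1, gene2):
    """-log(ni/nt) for the most informative common ancestor of the pair."""
    counts = term_counts()
    n_t = len(ANNOTATIONS)
    common = implied_terms(gene1) & implied_terms(gene2)
    # log(nt/ni) is the same quantity as -log(ni/nt).
    return max(math.log(n_t / counts[term]) for term in common)

if __name__ == "__main__":
    for pair in [("g1", "g2"), ("g2", "g3"), ("g1", "g3")]:
        print(pair, resnik(*pair))
```

Pairs whose only shared ancestor is the root score zero, because every gene product in the corpus is annotated to the root.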
>-----Original Message-----
>From: owner-gofriends at genome.stanford.edu [mailto:owner-gofriends at genome.stanford.edu] On Behalf Of Phillip Lord
>Sent: Thursday, April 19, 2007 1:09 PM
>To: tobias at sfsu.edu
>Cc: gofriends at genome.stanford.edu
>Subject: Re: Quantifying Specificity of GO Terms
>
>  
>
>>>>>>"TS" == Tobias Sayre <tobias at sfsu.edu> writes:
>>>>>>            
>>>>>>
>
>  TS> Dear GO Friends,
>
>  TS> I am working on a project that involves curation of protein data
>  TS> that includes GO terms, and it would be very helpful if I had
>  TS> some numerical quantification of the specificity of each term.
>  TS> It is possible to manually examine each term to determine this
>  TS> specificity, but because there is a large amount of data, I
>  TS> would like to automate the process.  I understand that there is
>  TS> no reliable way to do this simply using the level in the DAG
>  TS> hierarchy, but I am wondering if any of you might have a
>  TS> work-around.
>
>
>There are basically two ways: information content or GO structure.
>
>Information content works fine, but depends on having a corpus. This
>exists for GO, of course, but it's hard to determine what corpus you
>should use. So, if you are comparing GO terms for proteins between
>human and yeast, should you use SGD? Or Swissprot?
>
>Structure based techniques are myriad and probably more common. They
>tend to be less computationally intensive, because they only need a
>structure, while information content needs a structure and a
>corpus. Some are based on "level", but this is not great. GO is a DAG
>and not a tree, and so doesn't really have levels. To my mind level
>based approaches are treating the DAG nature of GO as an embarrassment
>rather than a feature. Not all structure based techniques are level
>based though.
>
>
>If I may be so bold, and express my untried, unproven and generally
>dubious opinion here, my own feeling is that, in practice, it doesn't
>actually matter that much. Most measures of specificity give a result
>which looks sort of correct. The parts of GO with highest information
>content *tend* to be the "deepest" in terms of maximum level and vice
>versa.
>
>In all the papers I have read on specificity (or, more generally, on
>similarity, which is closely related and has attracted more papers),
>authors have tested against some gold standard, or applied the measure
>to a specific application. In my papers, I used sequence similarity,
>for instance. And as far as I can tell, there is no real clear
>winner; different authors showed that different measures were better
>for different things.
>
>Wow, talking about sitting on the fence!
>
>Phil
>
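
The two approaches discussed above, information content and structure, can be compared on a small example. This is only a sketch: the ontology and corpus are invented (not real GO data), and the function names are hypothetical. It shows why "level" is ambiguous in a DAG (a term can have a different shortest and longest path to the root) while information content gives one number per term but needs a corpus.

```python
import math

# Hypothetical toy DAG: term -> parents. "E" is reachable by a short path
# (root -> B -> E) and a longer one (root -> A -> C -> E).
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "E": ["C", "B"],
}

# Hypothetical corpus: gene product -> directly annotated terms.
ANNOTATIONS = {
    "g1": {"E"},
    "g2": {"C"},
    "g3": {"B"},
    "g4": {"A"},
    "g5": {"B"},
}

def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def depth_range(term):
    """(shortest, longest) path length from the term up to the root."""
    parents = PARENTS[term]
    if not parents:
        return (0, 0)
    ranges = [depth_range(p) for p in parents]
    return (1 + min(lo for lo, _ in ranges), 1 + max(hi for _, hi in ranges))

def information_content(term):
    """-log(ni/nt), counting gene products annotated to the term or below it."""
    n_t = len(ANNOTATIONS)
    n_i = sum(
        1 for terms in ANNOTATIONS.values()
        if any(term in ancestors(t) for t in terms)
    )
    if n_i == 0:
        # A term never used in the corpus has no finite information content.
        return math.inf
    return math.log(n_t / n_i)

if __name__ == "__main__":
    for term in PARENTS:
        print(term, depth_range(term), round(information_content(term), 3))
```

In this toy case the term with the highest information content is also the one with the greatest maximum depth, in line with the tendency described above, but the choice of corpus changes the information content values and leaves the depths untouched.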
>
>--
>This message is from the GOFriends moderated mailing list.  A list of public
>announcements and discussion of the Gene Ontology (GO) project.
>Problems with the list?           E-mail: owner-gofriends at geneontology.org
>Subscribing   send   "subscribe"   to   gofriends-request at geneontology.org
>Unsubscribing send   "unsubscribe"  to  gofriends-request at geneontology.org
>Web:          http://www.geneontology.org/
>
>


-- 
Nicholas Mitsakakis
PhD Candidate - Biostatistics
Department of Public Health Sciences
University of Toronto

