Quantifying Specificity of GO Terms

Rubio, Angel arubio at ceit.es
Fri Apr 20 00:34:21 PDT 2007


Some years ago, my group compared the correlation between gene expression and different versions of semantic similarity. We found that the Resnik similarity measure (already used by Dr. Lord to compare sequence and function) outperformed the other corpus-based measures in all three categories (BP, MF and CC).
Indeed, in our case the other measures (Lin and Jiang) did not perform well at all.
The Resnik similarity measure is easy to evaluate:

Resnik(GeneProduct1, GeneProduct2) = -log(ni/nt)

Where
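ni: number of gene products in the corpus annotated to the common ancestor of the annotations of the pair of gene products (it sounds like a tongue twister!). When the annotations share several common ancestors, the one with the smallest ni, i.e. the most informative one, is used.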
nt: total number of gene products.
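
To make the definition concrete, here is a minimal Python sketch on a toy DAG. Everything in it is an assumption for illustration: the term names, the PARENTS and ANNOTATIONS tables and the function names are hypothetical and are not real GO data. It also assumes that a gene product annotated to a term counts towards every ancestor of that term, and that the common ancestor with the smallest ni is the one used.

```python
import math

# Hypothetical toy DAG: term -> set of parent terms (not real GO data).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A", "B"},   # two parents, so this is a DAG and not a tree
    "D": {"A"},
}

# Hypothetical corpus: gene product -> directly annotated terms.
ANNOTATIONS = {
    "g1": {"C"},
    "g2": {"D"},
    "g3": {"B"},
    "g4": {"root"},
}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def covered_terms(gene_product):
    """All terms a gene product is annotated to, directly or via a descendant."""
    return set().union(*(ancestors(t) for t in ANNOTATIONS[gene_product]))


def term_counts():
    """ni for every term: gene products annotated to it or to a descendant."""
    counts = {term: 0 for term in PARENTS}
    for gene_product in ANNOTATIONS:
        for term in covered_terms(gene_product):
            counts[term] += 1
    return counts


def resnik(gp1, gp2):
    """Resnik similarity: -log(ni/nt) for the most informative common ancestor."""
    counts = term_counts()
    nt = len(ANNOTATIONS)
    common = covered_terms(gp1) & covered_terms(gp2)
    ni = min(counts[term] for term in common)  # smallest ni = most informative
    return -math.log(ni / nt)


print(resnik("g1", "g2"))  # common ancestors are A and root; A has ni = 2
print(resnik("g2", "g3"))  # only root is shared, so ni = nt
```

In this toy corpus g2 and g3 share only the root, so ni = nt and the similarity is zero, while g1 and g2 share the term A (ni = 2 of nt = 4) and score log 2.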

I hope this helps.

-----Original Message-----
From: owner-gofriends at genome.stanford.edu [mailto:owner-gofriends at genome.stanford.edu] On Behalf Of Phillip Lord
Sent: Thursday, April 19, 2007 1:09 PM
To: tobias at sfsu.edu
Cc: gofriends at genome.stanford.edu
Subject: Re: Quantifying Specificity of GO Terms

>>>>> "TS" == Tobias Sayre <tobias at sfsu.edu> writes:

  TS> Dear GO Friends,

  TS> I am working on a project that involves curation of protein data
  TS> that includes GO terms, and it would be very helpful if I had
  TS> some numerical quantification of the specificity of each term.
  TS> It is possible to manually examine each term to determine this
  TS> specificity, but because there is a large amount of data, I
  TS> would like to automate the process.  I understand that there is
  TS> no reliable way to do this simply using the level in the DAG
  TS> hierarchy, but I am wondering if any of you might have a
  TS> work-around.


There are basically two ways: information content or GO structure.

Information content works fine, but depends on having a corpus. This
exists for GO, of course, but it's hard to determine what corpus you
should use. So, if you are comparing GO terms for proteins between
human and yeast, should you use SGD? Or Swissprot?

Structure based techniques are myriad and probably more common. They
tend to be less computationally intensive, because they only need a
structure, while information content needs a structure and a
corpus. Some are based on "level", but this is not great. GO is a DAG
and not a tree, and so doesn't really have levels. To my mind level
based approaches are treating the DAG nature of GO as an embarrassment
rather than a feature. Not all structure based techniques are level
based though.
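
The point about levels can be illustrated with a small Python sketch. The toy DAG below is hypothetical (the term names and the PARENTS table are assumptions, not real GO data); it only shows that a term reachable by more than one path has no single well-defined level.

```python
from functools import lru_cache

# Hypothetical toy DAG: term -> tuple of parent terms (not real GO data).
PARENTS = {
    "root": (),
    "A": ("root",),
    "B": ("A",),
    "C": ("B", "root"),  # reachable via a long path and a short one
}


@lru_cache(maxsize=None)
def depth_range(term):
    """Return (shortest, longest) path length from the term up to the root."""
    parents = PARENTS[term]
    if not parents:
        return (0, 0)
    ranges = [depth_range(parent) for parent in parents]
    shortest = 1 + min(low for low, _ in ranges)
    longest = 1 + max(high for _, high in ranges)
    return (shortest, longest)


for term in PARENTS:
    print(term, depth_range(term))
```

Here the term C sits at level 1 by its shortest path and at level 3 by its longest, so any level-based measure first has to choose which of the two it means.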


If I may be so bold, and express my untried, unproven and generally
dubious opinion here, my own feeling is that, in practice, it doesn't
actually matter that much. Most measures of specificity give a result
which looks sort of correct. The parts of GO with highest information
content *tend* to be the "deepest" in terms of maximum level and vice
versa.

In all the papers I have read on specificity (or, more generally,
similarity, on which there have been more papers, and which is closely
related), authors have tested against some gold standard, or applied
to a specific application. In my papers, I used sequence similarity,
for instance. And as far as I can tell, there is no real clear
winner; different authors showed that different measures were better
for different things.

Wow, talking about sitting on the fence!

Phil


--
This message is from the GOFriends moderated mailing list.  A list of public
announcements and discussion of the Gene Ontology (GO) project.
Problems with the list?           E-mail: owner-gofriends at geneontology.org
Subscribing   send   "subscribe"   to   gofriends-request at geneontology.org
Unsubscribing send   "unsubscribe"  to  gofriends-request at geneontology.org
Web:          http://www.geneontology.org/

