Quantifying Specificity of GO Terms

Phillip Lord phillip.lord at
Thu Apr 19 04:08:36 PDT 2007

>>>>> "TS" == Tobias Sayre <tobias at> writes:

  TS> Dear GO Friends,

  TS> I am working on a project that involves curation of protein data
  TS> that includes GO terms, and it would be very helpful if I had
  TS> some numerical quantification of the specificity of each term.
  TS> It is possible to manually examine each term to determine this
  TS> specificity, but because there is a large amount of data, I
  TS> would like to automate the process.  I understand that there is
  TS> no reliable way to do this simply using the level in the DAG
  TS> hierarchy, but I am wondering if any of you might have a
  TS> work-around.

There are basically two ways: information content, or the structure of
GO itself.

Information content works fine, but depends on having a corpus. This
exists for GO, of course, but it's hard to determine what corpus you
should use. So, if you are comparing GO terms for proteins between
human and yeast, should you use SGD? Or Swissprot? 

Structure based techniques are myriad and probably more common. They
tend to be less computationally intensive, because they only need a
structure, while information content needs a structure and a
corpus. Some are based on "level", but this is not great. GO is a DAG
and not a tree, and so doesn't really have levels. To my mind, level
based approaches treat the DAG nature of GO as an embarrassment
rather than a feature. Not all structure based techniques are level
based though. 
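
A small sketch of both points, again on a made-up DAG with hypothetical
term names. The first function shows why "level" is ambiguous: a term can
sit at different depths depending on the path taken. The second is one
possible structure-only score based on descendant counts, 1 - log(d + 1) /
log(N), where d is the number of descendants and N the number of terms.
Take it as an illustration of a non-level measure rather than a
recommendation.

```python
import math

# Toy DAG: child -> set of parents. Hypothetical term names, not real GO IDs.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"A"},
    "C": {"root", "B"},  # reachable by a short path and a long path
    "D": {"C"},
}


def depths(term, parents=PARENTS):
    """Return (shortest, longest) path length from term up to the root."""
    if not parents[term]:
        return (0, 0)
    parent_depths = [depths(p, parents) for p in parents[term]]
    shortest = 1 + min(d[0] for d in parent_depths)
    longest = 1 + max(d[1] for d in parent_depths)
    return (shortest, longest)


def descendants(term, parents=PARENTS):
    """Return every term below the given term (excluding the term itself)."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)
    seen = set()
    stack = [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def structural_specificity(term, parents=PARENTS):
    """Score in [0, 1]: 0 for the root, 1 for a term with no descendants."""
    n_desc = len(descendants(term, parents))
    return 1.0 - math.log(n_desc + 1) / math.log(len(parents))


c_depths = depths("C")
scores = {term: structural_specificity(term) for term in PARENTS}
```

Here C has a shortest depth of 1 and a longest depth of 3, so a level
based score has to pick one arbitrarily, while the descendant based score
needs no such choice.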

If I may be so bold, and express my untried, unproven and generally
dubious opinion here, my own feeling is that, in practise, it doesn't
actually matter that much. Most measures of specificity give a result
which looks sort of correct. The parts of GO with highest information
content *tend* to be the "deepest" in terms of maximum level, and vice
versa.

In all the papers I have read on specificity (or, more generally, on
similarity, which is closely related and has attracted more papers),
authors have either tested against some gold standard or applied the
measure to a specific application. In my papers, I used sequence
similarity, for instance. And as far as I can tell, there is no clear
winner; different authors showed that different measures were better
for different things. 

Wow, talking about sitting on the fence!


This message is from the GOFriends moderated mailing list.  A list of public
announcements and discussion of the Gene Ontology (GO) project.
Problems with the list?           E-mail: owner-gofriends at
Subscribing   send   "subscribe"   to   gofriends-request at
Unsubscribing send   "unsubscribe"  to  gofriends-request at