Quantifying Specificity of GO Terms

Tobias Sayre tobias at
Thu Apr 19 19:35:14 PDT 2007

GO Friends,

Thank you so much for all of your answers to my question.  I will probably
use a technique that measures how often each GO term is used relative to
all GO terms.

Thanks again,


On 4/19/07, Phillip Lord <phillip.lord at> wrote:
> >>>>> "TS" == Tobias Sayre <tobias at> writes:
>   TS> Dear GO Friends,
>   TS> I am working on a project that involves curation of protein data
>   TS> that includes GO terms, and it would be very helpful if I had
>   TS> some numerical quantification of the specificity of each term.
>   TS> It is possible to manually examine each term to determine this
>   TS> specificity, but because there is a large amount of data, I
>   TS> would like to automate the process.  I understand that there is
>   TS> no reliable way to do this simply using the level in the DAG
>   TS> hierarchy, but I am wondering if any of you might have a
>   TS> work-around.
>
> There are basically two ways: information content or GO structure.
> Information content works fine, but depends on having a corpus. This
> exists for GO, of course, but it's hard to determine which corpus you
> should use. So, if you are comparing GO terms for proteins between
> human and yeast, should you use SGD? Or Swissprot?
>
> Structure based techniques are myriad and probably more common. They
> tend to be less computationally intensive, because they only need a
> structure, while information content needs a structure and a
> corpus. Some are based on "level", but this is not great. GO is a DAG
> and not a tree, and so doesn't really have levels. To my mind level
> based approaches are treating the DAG nature of GO as an embarrassment
> rather than a feature. Not all structure based techniques are level
> based though.
>
> If I may be so bold, and express my untried, unproven and generally
> dubious opinion here, my own feeling is that, in practice, it doesn't
> actually matter that much. Most measures of specificity give a result
> which looks sort of correct. The parts of GO with highest information
> content *tend* to be the "deepest" in terms of maximum level and vice
> versa.
>
> In all the papers I have read on specificity (or, more generally, on
> similarity, on which there have been more papers and which is closely
> related), authors have tested against some gold standard, or applied
> their measure to a specific application. In my papers, I used sequence
> similarity, for instance. And as far as I can tell, there is no real
> clear winner; different authors showed that different measures were
> better for different things.
>
> Wow, talking about sitting on the fence!
>
> Phil
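
For readers following the thread, here is a minimal sketch of the two ideas discussed above, not anyone's published method. It assumes a tiny made-up DAG and made-up annotation counts (the term names, `PARENTS`, `DIRECT`, and both function names are hypothetical, not real GO data or a real library API). Information content is taken here as the log2 of the inverse of the fraction of annotations that fall on a term or any of its descendants, which is one common formulation. The `depth` helper shows why "level" is ambiguous in a DAG: a term with two parents can be reached by paths of different length.

```python
import math
from collections import Counter
from functools import lru_cache

# Hypothetical toy DAG (child -> parents). "D" has two parents,
# so it has no single well-defined level.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["C", "B"],
}

# Hypothetical corpus: number of proteins annotated directly to each term.
DIRECT = Counter({"A": 2, "B": 3, "C": 1, "D": 2})


def ancestors(term):
    """Return every ancestor of `term`, excluding the term itself."""
    seen = set()
    stack = list(PARENTS[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content():
    """IC(t) = log2(total / annotations to t or any descendant of t).

    Terms that receive no annotations at all are left out, because
    their IC is undefined under this formulation.
    """
    total = sum(DIRECT.values())
    propagated = Counter()
    for term, n in DIRECT.items():
        # An annotation to a term also counts for all of its ancestors.
        for t in ancestors(term) | {term}:
            propagated[t] += n
    return {t: math.log2(total / c) for t, c in propagated.items()}


@lru_cache(maxsize=None)
def depth(term, pick=min):
    """Path length from the root, using the shortest (min) or longest (max) path."""
    if not PARENTS[term]:
        return 0
    return 1 + pick(depth(p, pick) for p in PARENTS[term])


if __name__ == "__main__":
    for term, value in sorted(information_content().items()):
        print(term, round(value, 3), depth(term, min), depth(term, max))
```

In this toy corpus the root gets an information content of 0 and "D" gets 2 bits, while "D" sits at depth 2 or 3 depending on which path is followed. The numbers depend entirely on the chosen corpus, which is the caveat raised above about SGD versus Swissprot.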

More information about the go-friends mailing list