how to define GO groups with a certain size boundary

Albert Vilella avilella at
Tue Nov 8 06:55:37 PST 2005

Hi all go-dev-elopers and gofriends,

> >> I am trying to split a large subset of the GO-annotated FlyBase
> >> genes into GO groups to use them as categories for my analyses.
> >> 
> >> But I would like to try to merge the groups that have too few
> >> genes into their upper GO level and split those that accumulate
> >> too many genes into their lower GO level.
> >> 
> >> From what I have seen in most published articles, this is the
> >> reverse of what most people (for example, in the microarray
> >> field) are doing: performing an analysis to obtain a list of
> >> genes, then looking for enrichment in the GO DB distribution of
> >> ids.
> >> 
> >> I would like to ask if anybody has come across this situation
> >> or has any suggestions about how to do this.
> >> 
> >> It is worth mentioning that this merge/split trick was used in the
> >> chimp genome paper ("Initial sequence of the chimpanzee genome
> >> and comparison with the human genome" (Nature)), although they did
> >> it "a posteriori", only in the categories that showed a significant
> >> difference in a given analysis.

On Tue, 08 Nov 2005 at 13:45 +0000, Jane Lomax wrote:

> Hi Albert - as far as I know there isn't an easy way to do this, but
> I'll explain the long way to do it. It may be worth emailing GO friends
> (gofriends at ) though, as there are often tools out there
> that we don't yet know about.
> So one way you could do it would be to create your own set of high-level
> categories (called a GO slim, see
> ), and then use this to sort
> your annotation set into those categories. You can create a GO slim in
> DAG-Edit ( ) - there are
> some instructions on doing this here:
> (I will get round to making proper documentation for doing this soon, I
> promise!). Then you can use the Perl script map2slim
> ( ) to 'bucket' your
> annotations into your categories - it may be easier for you to use a
> web-based implementation of this script e.g. Generic GO Term Mapper
> ( ).
> The difficult thing will be that you will have to keep adjusting your
> GO slim set and re-running the mapping script to see which categories have
> too many annotations until you get the correct balance. It would be good
> to have a tool to do this automatically.

Yes, from my humble biological-degree-not-graph-theory-guru
background, I understand that the tricky bit is to balance the
categories: split the large GO ids into their children and merge the
small GO ids into their parents.

From the small amount of investigation I have done, I believe that if
GO were a cyclic graph, this would be a challenging algorithm.

But with a DAG with the characteristics of GO, I bet there must be a
graph theory guru who can enlighten me on this issue with an
algorithm that does these adjustments.
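
To make the question concrete, here is a minimal sketch in Python of one
possible greedy adjustment. It is only an illustration under stated
assumptions, not an existing GO tool: the term labels, the `children`
and `direct` dictionaries and the function names are all hypothetical,
and the ontology is assumed to be already loaded as a plain
parent-to-children mapping. Starting from the root, a term with more
than `max_size` genes is split into its children; children with fewer
than `min_size` genes are merged back into a "rest" bucket named after
the parent.

```python
from collections import defaultdict


def genes_under(term, children, direct, cache):
    """All genes annotated to `term` or to any of its descendants."""
    if term not in cache:
        genes = set(direct.get(term, ()))
        for child in children.get(term, ()):
            genes |= genes_under(child, children, direct, cache)
        cache[term] = genes
    return cache[term]


def balance(root, children, direct, min_size, max_size):
    """Greedy top-down split. Returns {category label: set of genes}."""
    cache = {}
    categories = defaultdict(set)
    seen = set()
    stack = [root]
    while stack:
        term = stack.pop()
        if term in seen:  # in a DAG a term can be reached via several parents
            continue
        seen.add(term)
        genes = genes_under(term, children, direct, cache)
        kids = children.get(term, ())
        if len(genes) <= max_size or not kids:
            # Small enough, or a leaf that cannot be split any further.
            categories[term] |= genes
            continue
        # Too large: split. Genes annotated directly to this term, and
        # genes of children that are too small, go to a "rest" bucket.
        rest = set(direct.get(term, ()))
        for kid in kids:
            kid_genes = genes_under(kid, children, direct, cache)
            if len(kid_genes) < min_size:
                rest |= kid_genes
            else:
                stack.append(kid)
        if rest:
            categories[term + " (rest)"] |= rest
    return dict(categories)


# Toy ontology with made-up term and gene names.
children = {"root": ["A", "B"], "A": ["A1", "A2"], "B": []}
direct = {
    "A": {"g5"},
    "A1": {"g1", "g2", "g3"},
    "A2": {"g4"},
    "B": {"g6", "g7"},
}
for label, genes in sorted(balance("root", children, direct, 2, 3).items()):
    print(label, sorted(genes))
```

This sketch has known limitations: a gene can end up in more than one
category (genes carry several annotations and terms can have several
parents), and a "rest" bucket can itself fall outside the size
boundaries, so the result would still need checking by hand.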

Manually balancing the DAG is worth a try.

Any hints, anyone?



This message is from the GOFriends moderated mailing list.  A list of public
announcements and discussion of the Gene Ontology (GO) project.
Problems with the list?           E-mail: owner-gofriends at
Subscribing   send   "subscribe"   to   gofriends-request at
Unsubscribing send   "unsubscribe"  to  gofriends-request at
