[go-friends] Is there a standard format for GO term enrichment results?

Chris Mungall cjmungall at
Thu Aug 18 09:23:45 PDT 2011

On Aug 17, 2011, at 11:25 AM, Robinson, Peter wrote:

> Hi Chris,
> I assume that with the following you mean the study set (e.g., diff regulated genes) and the population set (e.g., all genes in microarray or genome)

Yes, that's right

> # Input token list + token type (e.g. symbol)
> # Background token list + token type (if provided)
> In that case, given the ontology and association files, it is possible to reconstruct the mapping and list of unmatched tokens and this does not need to be part of the output format explicitly:
> #Token-gene ID mapping (plus unmatched tokens)
> Or have I misunderstood what you mean, and the above is meant to apply to tools that also do extra work to match weird tokens to gene symbols?

Many tools do perform some additional ID mapping. I thought it might be useful to capture this, but this may be overspecifying things. As Casey pointed out on the biostar thread, we may want other kinds of inputs, and we might want to support methods such as GSEA.
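
To make the ID-mapping point concrete, here is a minimal sketch of what a tool might record if the mapping were part of the output. The function name, the lookup table, and the identifiers are all hypothetical; real tools may normalize case, resolve synonyms, or consult several databases.

```python
def map_tokens(tokens, symbol_to_gene_id):
    """Split input tokens into a token -> gene ID mapping and a list of unmatched tokens."""
    mapped = {}
    unmatched = []
    for token in tokens:
        gene_id = symbol_to_gene_id.get(token)
        if gene_id is None:
            unmatched.append(token)
        else:
            mapped[token] = gene_id
    return mapped, unmatched


# Hypothetical lookup table and input tokens, for illustration only.
lookup = {"geneA": "DB:0001", "geneB": "DB:0002"}
mapped, unmatched = map_tokens(["geneA", "geneX", "geneB"], lookup)
```

Reporting both `mapped` and `unmatched` would let a reader see which input tokens never reached the enrichment test.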

> #Algorithm parameters (cut-offs, algorithm selected, etc)
> => This is difficult given the wide range of algorithms. Does OBI have a vocabulary to cover everything? 

Probably not at the moment, but I don't see why it or some extension could not cover everything.

> => Probably it would be good to have a separate field for the multiple-testing correction used and the methods etc.

Yes, good point.
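
As a rough sketch of the header part of such a format, the inputs, the algorithm parameters, and the multiple-testing correction could each get their own field. Every field name and value below is an assumption made for illustration; nothing here is an agreed format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List, Optional


@dataclass
class RunMetadata:
    """Hypothetical header record for an enrichment run; field names are illustrative."""
    input_tokens: List[str]
    input_token_type: str
    algorithm: str
    # Kept separate from the free-form parameters, as suggested in the thread.
    multiple_testing_correction: str
    background_tokens: Optional[List[str]] = None
    parameters: Dict[str, float] = field(default_factory=dict)


meta = RunMetadata(
    input_tokens=["geneA", "geneB"],
    input_token_type="symbol",
    algorithm="example-algorithm",          # placeholder name, not a real tool
    multiple_testing_correction="Bonferroni",
    parameters={"p_value_cutoff": 0.05},
)

# Serialize to JSON so that different tools could emit the same structure.
serialized = json.dumps(asdict(meta), indent=2)
```

A controlled vocabulary, if one becomes available, could replace the free-text `algorithm` and `multiple_testing_correction` strings.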

> #List of results - for each result:
>    * term ID
>    * optional term metadata
>    * list of gene IDs (+ optional gene metadata)
>    * scoring metadata (p-vals, rank, etc)
> => Here, there is no absolute reason to include a list of genes for each term, as this can be reconstructed assuming that everything else in the file is correct and comprehensive.

If a versioned GAF is referenced then yes, everything can be reconstructed. However, it can be useful to report the gene IDs at this stage. It also highlights whether tools are doing different things to construct the list of genes annotated to a given term: some may not use the graph at all, and some may use certain relationships.
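
The difference between tools can be illustrated with a toy example. The sketch below assumes a made-up three-term ontology and two made-up genes, and it treats every parent link the same way; a real tool would read the ontology and a GAF and might follow only selected relationship types.

```python
def ancestors(term, parents):
    """Return all ancestors of a term, following parent links transitively."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_for_term(term, annotations, parents, use_graph):
    """Collect genes annotated to a term, optionally including descendants' genes."""
    genes = set()
    for gene, terms in annotations.items():
        for annotated in terms:
            if annotated == term or (use_graph and term in ancestors(annotated, parents)):
                genes.add(gene)
                break
    return genes


# Hypothetical ontology (child -> parents) and direct annotations (gene -> terms).
parents = {"T:child": ["T:parent"], "T:parent": ["T:root"]}
annotations = {"g1": ["T:child"], "g2": ["T:parent"]}

direct = genes_for_term("T:parent", annotations, parents, use_graph=False)
propagated = genes_for_term("T:parent", annotations, parents, use_graph=True)
```

Here `direct` contains only `g2`, while `propagated` also picks up `g1` through the child term. Listing the gene IDs per term in the output would make that kind of discrepancy visible without rerunning either tool.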

> In any case, this would be a useful thing to have and we would add such a feature to the ontologizer.

Great! I know you and Sebastian have already given this some thought, thanks.

> best wishes Peter
> PD Dr. med. Peter N. Robinson, MSc.
> Institut für Medizinische Genetik
> Charité - Universitätsmedizin Berlin
> Augustenburger Platz 1
> 13353 Berlin
> Germany
> +4930 450566042
> peter.robinson at
> ________________________________________
> From: go-friends-bounces at [go-friends-bounces at] on behalf of Chris Mungall [cjmungall at]
> Sent: Wednesday, 17 August 2011 20:10
> To: gofriends friends
> Subject: [go-friends] Is there a standard format for GO term enrichment results?
> Hi gofriends,
> Within the GO consortium we are considering providing a unified interface to a number of heterogeneous term enrichment tools. To facilitate this, we would need a standard output format and/or API for the tool results. I think standardizing on a format would be easier than standardizing on an API.
> I asked this question on the biostar forum:
> But I thought I would also ask here, as I know a number of tool providers are subscribed.
> Thanks
> Chris