
[Gofriends] matrices from terms

Gabriel Berriz gberriz at hms.harvard.edu
Tue Nov 18 06:23:00 PST 2008


On 2008.11.14 Fri, at 14:06, Chris Mungall wrote:
>
> The generally recommended way to do this is via the graph_path table
> in the GO database. You can query either a local installation, or a
> public mirror, via
> http://berkeleyop.org/goose. The documentation for this table is here:
>        http://www.geneontology.org/GO.database.schema.shtml#go-optimisations.table.graph-path
>
> However, at this time this table contains the transitive closure
> computed by considering all edges, ignoring edge labels (relations).
> Thus it does not satisfy your requirement that only the is_a relation
> is considered


Chris, couldn't one get the information Nicholas wants directly from  
the table term2term in the GO database?  Unlike the graph_path table,  
this table does provide edge labels.  One could use a recursive  
descendants function together with a children function that returns  
only is_a children (from term2term).  In Perl-ish it would look like  
this:

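sub descendants {
   my $node = shift;

   # By convention a node is included among its own descendants.
   my @descendants = ( $node );

   # children() is assumed to return only the is_a children of $node.
   for my $child ( children( $node ) ) {
     push @descendants, descendants( $child );
   }

   # May contain duplicates when a term is reachable along several paths.
   return @descendants;
}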

In English, the descendants of a node are the node itself plus the  
union of the descendants of its children.  (Here we follow the  
convention, also followed by the graph_path table, of including the  
node among its descendants.)

This is just a sketch.  As written, it returns a list of descendants  
that will generally contain duplicates, because a term with more than  
one is_a parent is reached once per path leading to it.  These  
duplicates should be removed.  Performance can also be improved  
significantly by memoizing results, so that no part of the graph is  
traversed more than once.
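
For what it's worth, here is one way the duplicate and repeated-traversal issues might be handled, written in Python rather than Perl so that it runs as-is.  It is only a sketch: it swaps the recursion for an iterative traversal with a "seen" set (not memoization in the strict sense, though it has the same effect of expanding each node at most once), and the children callable and the toy graph are placeholders invented for illustration, not GO data.

```python
from typing import Callable, Hashable, Iterable, Set


def descendants(node: Hashable,
                children: Callable[[Hashable], Iterable[Hashable]]) -> Set[Hashable]:
    """Return node plus everything reachable from it via children()."""
    seen = {node}          # the node counts among its own descendants
    stack = [node]         # nodes whose children still need expanding
    while stack:
        current = stack.pop()
        for child in children(current):
            if child not in seen:   # expand each node at most once
                seen.add(child)
                stack.append(child)
    return seen


# Hypothetical toy graph: D has two parents (B and C), which is the
# situation that produces duplicates in the plain recursive version.
toy_children = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
result = descendants("A", lambda n: toy_children.get(n, []))
```

Returning a set removes the duplicates for free, and the explicit stack avoids deep recursion on long is_a chains.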

The children function used by the descendants function would be  
implemented using something like the following SQL query:

SELECT tt.term2_id
FROM term2term tt
JOIN term t ON tt.relationship_type_id = t.id
WHERE t.acc = 'is_a'
  AND tt.term1_id = ?;
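
To make the idea concrete, a children function wrapping that query might look roughly like the following Python sketch.  Everything around the query is an assumption made for illustration: it uses an in-memory SQLite database as a stand-in for a real GO database installation, creates only the columns the query touches, and fills them with made-up rows.  Against a MySQL installation the placeholder syntax would likely differ from SQLite's "?", depending on the driver.

```python
import sqlite3

# Same join as the query above: keep only edges whose relationship
# type is the term with acc 'is_a'.
IS_A_CHILDREN_SQL = """
    SELECT tt.term2_id
    FROM term2term tt
    JOIN term t ON tt.relationship_type_id = t.id
    WHERE t.acc = 'is_a' AND tt.term1_id = ?
"""


def children(conn, term_id):
    """Return the ids of the is_a children of term_id."""
    rows = conn.execute(IS_A_CHILDREN_SQL, (term_id,)).fetchall()
    return [row[0] for row in rows]


# Hypothetical stand-in for the GO database: only the columns the
# query uses, with invented rows (one is_a edge, one part_of edge).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE term (id INTEGER PRIMARY KEY, acc TEXT);
    CREATE TABLE term2term (relationship_type_id INTEGER,
                            term1_id INTEGER, term2_id INTEGER);
    INSERT INTO term VALUES (1, 'is_a'), (2, 'part_of'),
                            (10, 'GO:demo_parent'),
                            (11, 'GO:demo_child_a'),
                            (12, 'GO:demo_child_b');
    INSERT INTO term2term VALUES (1, 10, 11), (2, 10, 12);
""")
is_a_kids = children(conn, 10)
```

In this toy data term 10 has two outgoing edges, but only the one labelled is_a survives the filter, which is the behaviour Nicholas asked for.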




Gabriel




=============================================================
Gabriel F. Berriz, PhD
Bioinformatics Developer
Roth Lab
Biological Chemistry and Molecular Pharmacology -- Harvard Medical  
School
Seeley G. Mudd Building 322B
Boston, MA 02115-5701
Telephone: 617.432.3555
Fax: 617.432.3557


