Problems with gene association files

Gabriel Berriz gberriz at hms.harvard.edu
Mon Jul 1 13:48:27 PDT 2002


Hi, Karen.  Thanks for the clarification.

>Because we all felt so strongly that the unknown terms are useful to
>represent that the research community does not know what a given gene
>does, we also came up with a procedure to standardize use of the 3
>unknown terms, so that all the groups should be doing something fairly
>consistent in their use of these terms. As a result of this
>discussion, the date field was added in order to provide a time
>context for annotations to any of the 3 unknown terms.

There must be a significant difference between the way SGD handles these 
attributions and the way MGD, FB, and WB handle them, because about 1/3 of 
all the SGD associations are to the unknown attribs, whereas for the other 
three organisms, the fraction is under 3%.  In fact, for FB, not a single 
association has been made to these attributes.  The total numbers of 
associations for these organisms are approximately 20K (SGD), 38K (MGD), 
21K (FB), and 23K (WB). (We have not looked at any of the other 
gene_association.xxx files.)
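
For anyone who wants to check these fractions, here is a rough sketch of the kind of count involved. It is not our actual script. It assumes a tab-delimited association file with the GO id in the fifth column and comment lines starting with "!", and the function name and the `unknown_ids` argument are made up for illustration (the caller supplies the three unknown GO ids).

```python
def unknown_fraction(lines, unknown_ids):
    """Return the fraction of association rows whose GO id is in unknown_ids.

    Assumes tab-delimited rows with the GO id in column 5 (index 4);
    lines starting with "!" are treated as comments and skipped.
    """
    total = 0
    unknown = 0
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # malformed row: ignore rather than guess
        total += 1
        if cols[4] in unknown_ids:
            unknown += 1
    return unknown / total if total else 0.0
```

Calling it once per gene_association file, with the same set of unknown ids each time, gives numbers that can be compared across organisms.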

>For SGD, we have a script that runs nightly to pick up both obsoleted
>or synonymous (deprecated) GOids. Several of us SGD curators get this
>email and are responsible for manually fixing them. These GOids
>disappear from the associations file within a couple days of being
>identified.
>
>These GOids are *not* automatically deleted or transferred because,
>particularly with obsoletes, there is not a computational way to
>reassign a correct GOid. Even with synonymous GOids, we cannot make
>the transfer computationally, because historically, an original GOid
>that gets split into two separate terms gets made synonymous with both
>new, non-equivalent, terms as part of the mechanism of GOid
>tracking. We have discussed that this is confusing, but until and
>unless we change this, it is impossible to computationally assign a
>correct new GOid for synonymous GOids.

If this is the case, we must be doing something wrong here, because we 
perform the appropriate clean-up automatically.  In the case of deprecated 
synonyms, our script generates the synonyms from the *.ontology 
files.  E.g., for the line

  %chaperone ; GO:0003754, GO:0003757, GO:0003758, GO:0003760, GO:0003761

the script sets GO:0003754 as the correct id, and GO:0003757, GO:0003758, 
GO:0003760, and GO:0003761 as deprecated synonyms.  Whenever an association 
is found that uses one of the deprecated synonyms, the script simply 
replaces it with the correct id.  Is this an incorrect interpretation of 
the ordering in this list?
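
To make the interpretation concrete, here is a minimal sketch of the replacement step as described above. It is an illustration, not our script. It assumes that the id list is the second ";"-separated field of a term line and that the first id in that list is the correct one, which is exactly the reading in question. The names `synonym_map` and `canonical_id` are made up.

```python
import re

GO_ID = re.compile(r"GO:\d{7}")

def synonym_map(ontology_lines):
    """Map each deprecated id to the first id listed on its term line."""
    replacements = {}
    for line in ontology_lines:
        fields = line.strip().split(";")
        if len(fields) < 2:
            continue
        ids = GO_ID.findall(fields[1])
        if not ids:
            continue
        primary, *secondary = ids
        for old_id in secondary:
            # Assumption: ids after the first are deprecated synonyms.
            replacements[old_id] = primary
    return replacements

def canonical_id(go_id, replacements):
    """Return the id to use in place of go_id (unchanged if not deprecated)."""
    return replacements.get(go_id, go_id)
```

With the chaperone line above, `canonical_id("GO:0003757", m)` gives GO:0003754. Note that if the same id appeared as a deprecated synonym on more than one term line, this sketch would silently keep whichever line came last, which is where the split-term case you describe would go wrong.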

For obsoleted attributes, again, we use the information in the *.ontology 
file to determine whether an attribute in an association is a descendant of 
one of the attributes called "obsolete" in one of the three main 
branches.  If it is, we discard the association.  It seems to me that the 
same approach could be used to automatically weed out the obsolete 
associations from a gene associations file.
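
A rough sketch of that filter follows, again as an illustration rather than our actual code. It assumes that nesting in the *.ontology file is given by leading whitespace, that term lines start with "$", "%" or "<", and that the obsolete parent can be recognized by a name starting with "obsolete". The function names are made up.

```python
import re

TERM_LINE = re.compile(r"^(\s*)[$%<]([^;]*);\s*(.*)$")
GO_ID = re.compile(r"GO:\d{7}")

def obsolete_ids(ontology_lines):
    """Collect ids of all terms nested under a term named "obsolete"."""
    obsolete = set()
    obsolete_depth = None  # indentation of the enclosing "obsolete" term
    for line in ontology_lines:
        m = TERM_LINE.match(line.rstrip("\n"))
        if not m:
            continue
        depth = len(m.group(1))
        name = m.group(2).strip()
        ids = GO_ID.findall(m.group(3).split(";")[0])
        # Leaving the obsolete subtree: back at or above its indentation.
        if obsolete_depth is not None and depth <= obsolete_depth:
            obsolete_depth = None
        if obsolete_depth is not None:
            obsolete.update(ids)
        elif name.startswith("obsolete"):
            obsolete_depth = depth
    return obsolete

def drop_obsolete(associations, obsolete):
    """Keep only (gene, go_id) pairs whose go_id is not obsolete."""
    return [(gene, go_id) for gene, go_id in associations
            if go_id not in obsolete]
```

The id of the "obsolete" term itself is not added to the set; only its descendants are, matching the description above.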

Gabriel


--
This message is from the GOFriends moderated mailing list.  A list of public
announcements and discussion of the Gene Ontology (GO) project.
Problems with the list?           E-mail: owner-gofriends at geneontology.org
Subscribing   send   "subscribe"   to   gofriends-request at geneontology.org
Unsubscribing send   "unsubscribe"  to  gofriends-request at geneontology.org
Web:          http://www.geneontology.org/
