Problems with gene association files
Gabriel Berriz
gberriz at hms.harvard.edu
Mon Jul 1 13:48:27 PDT 2002
Hi, Karen. Thanks for the clarification.
>Because we all felt so strongly that the unknown terms are useful to
>represent that the research community does not know what a given gene
>does, we also came up with a procedure to standardize use of the 3
>unknown terms, so that all the groups should be doing something fairly
>consistent in their use of these terms. As a result of this
>discussion, the date field was added in order to provide a time
>context for annotations to any of the 3 unknown terms.
There must be a significant difference between the way SGD handles these
attributions and the way MGD, FB, and WB handle them, because about 1/3 of
all the SGD associations are to the unknown attribs, whereas for the other
three organisms, the fraction is under 3%. In fact, for FB, not a single
association has been made to these attributes. The total numbers of
associations for these organisms are approximately 20K (SGD), 38K (MGD),
21K (FB), and 23K (WB). (We have not looked at any of the other
gene_association.xxx files.)
>For SGD, we have a script that runs nightly to pick up both obsoleted
>or synonymous (deprecated) GOids. Several of us SGD curators get this
>email and are responsible for manually fixing them. These GOids
>disappear from the associations file within a couple days of being
>identified.
>
>These GOids are *not* automatically deleted or transferred because,
>particularly with obsoletes, there is not a computational way to
>reassign a correct GOid. Even with synonymous GOids, we cannot make
>the transfer computationally, because historically, an original GOid
>that gets split into two separate terms gets made synonymous with both
>new, non-equivalent, terms as part of the mechanism of GOid
>tracking. We have discussed that this is confusing, but until and
>unless we change this, it is impossible to computationally assign a
>correct new GOid for synonymous GOids.
If this is the case, we must be doing something wrong here, because we
perform the appropriate clean-up automatically. In the case of deprecated
synonyms, our script generates the synonyms from the *.ontology
files. E.g., for the line
%chaperone ; GO:0003754, GO:0003757, GO:0003758, GO:0003760, GO:0003761
the script sets GO:0003754 as the correct id, and GO:0003757, GO:0003758,
GO:0003760, and GO:0003761 as deprecated synonyms. Whenever an association
is found that uses one of the deprecated synonyms, the script simply
replaces it with the correct id. Is this an incorrect interpretation of
the ordering in this list?
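A minimal sketch of the synonym replacement described above, in Python. It is an illustration only, not the script used here: the function names are hypothetical, and it assumes that the GO ids for a term sit in the second " ; "-separated field of the line, with the first id being the current one. Because a split term can be listed as a synonym of more than one new term (as Karen notes), the sketch keeps every candidate and replaces an id only when the mapping is unambiguous.

```python
import re

# Assumption: GO ids look like "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")


def synonym_map(ontology_lines):
    """Map each deprecated id to the set of primary ids it is listed under.

    Assumes the second ' ; '-separated field of a line holds the term's
    own ids, primary id first, deprecated synonyms after it.
    """
    mapping = {}
    for line in ontology_lines:
        fields = line.strip().split(" ; ")
        if len(fields) < 2:
            continue
        ids = GO_ID.findall(fields[1])
        if not ids:
            continue
        primary, deprecated = ids[0], ids[1:]
        for old in deprecated:
            mapping.setdefault(old, set()).add(primary)
    return mapping


def replace_deprecated(go_id, mapping):
    """Return the id to use in the association.

    Ids that are not deprecated come back unchanged. A deprecated id with
    exactly one primary is replaced by it. A deprecated id listed under
    several primaries returns None so it can be reviewed by hand.
    """
    primaries = mapping.get(go_id)
    if primaries is None:
        return go_id
    if len(primaries) == 1:
        return next(iter(primaries))
    return None
```

With the chaperone line above, replace_deprecated("GO:0003757", mapping) gives GO:0003754, while GO:0003754 itself is left alone.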
For obsoleted attributes, again, we use the information in the *.ontology
file to determine whether an attribute in an association is a descendant of
one of the attributes called "obsolete" in one of the three main
branches. If it is, we discard the association. It seems to me that the
same approach could be used to automatically weed out the obsolete
associations from a gene associations file.
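The obsolete filter could be sketched along these lines, again in Python and again only as an illustration. It assumes the parent/child relationships have already been read from the *.ontology files into a dict mapping each id to its direct children; that dict, the function names, and the (gene, GO id) pair layout for associations are all hypothetical.

```python
from collections import deque


def descendants(children, roots):
    """Collect every id reachable below the given roots.

    `children` maps an id to an iterable of its direct children
    (an assumed, pre-parsed view of the ontology file).
    """
    seen = set()
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def drop_obsolete(associations, children, obsolete_roots):
    """Keep only associations whose id is not an obsolete root or below one.

    `associations` is an iterable of (gene, go_id) pairs;
    `obsolete_roots` are the ids of the "obsolete" attributes in the
    three main branches.
    """
    obsolete = descendants(children, obsolete_roots) | set(obsolete_roots)
    return [(gene, go_id) for gene, go_id in associations
            if go_id not in obsolete]
```

The set of obsolete ids is computed once per run, so filtering the whole associations file is a single pass with one set lookup per line.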
Gabriel
--
This message is from the GOFriends moderated mailing list. A list of public
announcements and discussion of the Gene Ontology (GO) project.
Problems with the list? E-mail: owner-gofriends at geneontology.org
Subscribing send "subscribe" to gofriends-request at geneontology.org
Unsubscribing send "unsubscribe" to gofriends-request at geneontology.org
Web: http://www.geneontology.org/