[go-friends] [go] ANNOUNCEMENT: Retire gp2protein files

Suzi Aleksander suzia at stanford.edu
Fri Jan 17 15:58:07 PST 2020


Forwarding a message from Suzi L, which I think got caught in our mailing list filters...

________________________________
From: Suzanna Lewis <selewis at lbl.gov>
Sent: Friday, January 17, 2020 1:18 PM
To: Stacia R Engel <stacia at stanford.edu>
Cc: Pascale Gaudet <pascale.gaudet at sib.swiss>; go-consortium at lists.stanford.edu <go-consortium at mailman.stanford.edu>
Subject: Re: [go] [go-friends] ANNOUNCEMENT: Retire gp2protein files

Chiming in, which I do with some reluctance, but just to clarify a few things.

1. The gp2[annotatableObject] files came into existence prior to "RefGenome". They date back almost to the beginnings of GO. See 20011013_Chicago.pdf minutes. The (my) motivation for generating these was that very often there simply was no way to figure out which protein an annotation was referring to. This was needed largely for BLAST usage, since so many early tools wanted to BLAST against the GO protein sets. Annotation, then as now, needs to specify the entity being described as precisely as possible.

But that is just a historical correction, not directly pertinent today. However...

2. For any sort of "mapping" file there are by definition two parties involved (at a minimum). These are the two groups who are responsible for generating the respective sets of identifiers. In the gp2[annotatableObject] case these two parties are the MOD (aka Alliance) and UniProt.

3. What happened at the first QfO/RefGenome meeting was that UniProt stepped up to help with UniProt IDs (after realizing that their help was needed) and offered to maintain the files working in -conjunction- with the MOD (whatever the file name, it amounts to the same thing, which is information connecting MOD ids to UniProt ids). The point is that, regardless of where these are physically kept, both sides need to be involved in the file maintenance. It simply is impossible for one group alone to do it without consulting with the other side.

Consequently,

4. A utility/mapping file is still required by all parties (UniProt, QfO, GO, community, etc). And it is fine to "retire" gp2[annotatableObject] files from GO submission requirements, but only with the caveat that MOD representatives must remain cognizant of the files at UniProt and work with UniProt to ensure the accuracy of the mappings. Further, it would be helpful to the community to have links from the GO/Alliance site to where the mapping files can be found at UniProt.

(here's hoping I've been clear enough not to generate even more on this thread)

All the best, S


On Fri, Jan 17, 2020 at 2:17 PM Stacia R Engel <stacia at stanford.edu> wrote:
Hi Pascale,
thanks for this.

all i’m saying is simply that a gp2[annotatableObject] identifier mapping is a need in the Alliance, and that GOC seems uniquely qualified to provide this type of work product for the Alliance in a more efficient and comprehensive manner than any of the other workgroups.  and it doesn’t make sense for the Alliance to have to find a solution from scratch.  we’re already doing a lot of spinning of wheels when it comes to identifiers.  that’s all.  it’s just a suggestion tying a known need to a possible expert source.

within the Alliance, GOC are the premier experts at handling identifiers for annotatable objects, and as a founding member of the Alliance, it’s appropriate for GOC to provide this to the Alliance.

it’s just an idea.  just thinking out loud.

stacia


On Jan 17, 2020, at 10:59 AM, Pascale Gaudet <pascale.gaudet at sib.swiss> wrote:

Hi Stacia,

The gp2protein files were first requested by the RefGenome project, and have now been replaced by UniProt Reference Proteomes. I don't know of another use for the gp2protein file.

The files are not in our GO releases. GOA has a few of them on its ftp site:
chicken
geneid
human
pseudocap
refseq
unigene
uniprot

They are in the old GO SVN site: http://viewvc.geneontology.org/viewvc/GO-SVN/trunk/gp2protein/
The only one that is recent is RGD; the others date between 8 months and 7 years. As far as I know, all species we have those files for are in the UniProt Reference Proteomes set.

Given that this is so incomplete, and nobody has notified us, it seems unlikely that these files are used.

Thanks, Pascale



On 16 Jan 2020, at 18:41, Stacia R Engel <stacia at stanford.edu> wrote:

It seems this would be a useful work product for GOC, as a founding member, to provide for the Alliance.  Is this retirement a done deal?  If so, perhaps it’s worth a discussion about how to provide this info in a different way.  The topic of identifier mapping has come up often as a need within the Data Quartermasters working group.

stacia

On Jan 16, 2020, at 7:31 AM, Van Auken, Kimberly M. <vanauken at caltech.edu> wrote:


Dear GO Friends,

The GOC would like to retire the gp2protein files that map gene identifiers from databases, e.g. MGI or WB, to the canonical reference protein in UniProtKB.

These files have been generated by groups contributing annotations to the GOC.

We plan to retire these files in ~ one month's time, February 15th.

We encourage anyone who may still be using these files to contact us (help at geneontology.org) to discuss alternative ways of obtaining these mappings (e.g. gene product information (gpi) files).

You may also comment on the corresponding github ticket:

geneontology/go-site#1324<https://github.com/geneontology/go-site/issues/1324>

Thank you.

-- Kimberly Van Auken
-- Pascale Gaudet

_______________________________________________
go-friends mailing list
go-friends at lists.stanford.edu
https://mailman.stanford.edu/mailman/listinfo/go-friends



-----------------------------------------------------
Stacia R. Engel, Ph.D.
Principal Biocuration Scientist
Program Coordinator, Alliance of Genome Resources
www.alliancegenome.org<http://www.alliancegenome.org/>
Group Leader, Biocuration, Saccharomyces Genome Database
www.yeastgenome.org<http://www.yeastgenome.org/>
Department of Genetics
Stanford University
Palo Alto, CA 94304-5477 USA
stacia at stanford.edu
-----------------------------------------------------


_______________________________________________
go-consortium mailing list
go-consortium at lists.stanford.edu
https://mailman.stanford.edu/mailman/listinfo/go-consortium

