[go-friends] [go] ANNOUNCEMENT: Retire gp2protein files

Suzi Aleksander suzia at
Fri Jan 17 15:58:07 PST 2020

Forwarding a message from Suzi L, which I think got caught in our mailing list filters...

From: Suzanna Lewis <selewis at>
Sent: Friday, January 17, 2020 1:18 PM
To: Stacia R Engel <stacia at>
Cc: Pascale Gaudet <pascale.gaudet at>; go-consortium at <go-consortium at>
Subject: Re: [go] [go-friends] ANNOUNCEMENT: Retire gp2protein files

Chiming in, which I do with some reluctance, but just to clarify a few things.

1. The gp2[annotatableObject] files came into existence prior to "RefGenome". They date back almost to the beginnings of GO. See 20011013_Chicago.pdf minutes. The (my) motivation for generating these was that very often there simply was no way to figure out what protein an annotation was referring to. This was needed largely for BLAST usage, since so many early tools wanted to BLAST against the GO protein sets. Annotation, then as now, needs to specify the entity being described as precisely as possible.

But that is just a historical correction, not directly pertinent today. However...

2. For any sort of "mapping" file there are by definition two parties involved (at a minimum). These are the two groups who are responsible for generating the respective sets of identifiers. In the gp2[annotatableObject] case these two parties are the MOD (aka Alliance) and UniProt.

3. What happened at the first QfO/RefGenome meeting was that UniProt stepped up to help with UniProt IDs (after realizing that their help was needed) and offered to maintain the files working in -conjunction- with the MOD (whatever the file name, it amounts to the same thing, which is information connecting MOD ids to UniProt ids). The point is that, regardless of where these are physically kept, both sides need to be involved in the file maintenance. It simply is impossible for one group alone to do it without consulting with the other side.


4. A utility/mapping file is still required by all parties (UniProt, Q4O, GO, community, etc). And it is fine to "retire" gp2[annotatableObject] files from GO submission requirements, but only with the caveat that MOD representatives must remain cognizant of the files at UniProt and work with UniProt to ensure the accuracy of the mappings. Further, it would be helpful to the community to have links from the GO/Alliance site to where the mapping files can be found at UniProt.

(here's hoping I've been clear enough not to generate even more on this thread)

All the best, S

On Fri, Jan 17, 2020 at 2:17 PM Stacia R Engel <stacia at> wrote:
Hi Pascale,
thanks for this.

all i’m saying is simply that a gp2[annotatableObject] identifier mapping is a need in the Alliance, and that GOC seems uniquely qualified to provide this type of work product for the Alliance in a more efficient and comprehensive manner than any of the other workgroups.  and it doesn’t make sense for the Alliance to have to find a solution from scratch.  we’re already doing a lot of spinning of wheels when it comes to identifiers.  that’s all.  it’s just a suggestion tying a known need to a possible expert source.

within the Alliance, GOC are the premier experts at handling identifiers for annotatable objects, and as a founding member of the Alliance, it’s appropriate for GOC to provide this to the Alliance.

it’s just an idea.  just thinking out loud.


On Jan 17, 2020, at 10:59 AM, Pascale Gaudet <pascale.gaudet at> wrote:

Hi Stacia,

The gp2protein files were first requested by the RefGenome project, and have now been replaced by UniProt Reference Proteomes. I don't know of another use for the gp2protein file.

The files are not in our GO releases. GOA has a few of them on its ftp site:

They are in the old GO SVN site:
The only one that is recent is RGD; others date between 8 months and 7 years. As far as I know, all species we have those files for are in the UniProt Reference Proteomes set.

Given that this is so incomplete, and nobody has notified us, it seems unlikely that these files are used.

Thanks, Pascale

On 16 Jan 2020, at 18:41, Stacia R Engel <stacia at> wrote:

It seems this would be a useful work product for GOC, as a founding member, to provide for the Alliance.  Is this retirement a done deal?  If so, perhaps it’s worth a discussion about how to provide this info in a different way.  The topic of identifier mapping has come up often as a need within the Data Quartermasters working group.


On Jan 16, 2020, at 7:31 AM, Van Auken, Kimberly M. <vanauken at> wrote:

Dear GO Friends,

The GOC would like to retire the gp2protein files that map gene identifiers from databases, e.g. MGI or WB, to the canonical reference protein in UniProtKB.

These files have been generated by groups contributing annotations to the GOC.

We plan to retire these files in ~ one month's time, February 15th.

We encourage anyone who may still be using these files to contact us (help at) to discuss alternative ways of obtaining these mappings (e.g. gene product information (gpi) files).

You may also comment on the corresponding github ticket:


Thank you.

-- Kimberly Van Auken
-- Pascale Gaudet

go-friends mailing list
go-friends at

Stacia R. Engel, Ph.D.
Principal Biocuration Scientist
Program Coordinator, Alliance of Genome Resources
Group Leader, Biocuration, Saccharomyces Genome Database
Department of Genetics
Stanford University
Palo Alto, CA 94304-5477 USA
stacia at

go-consortium mailing list
go-consortium at