Rolf Apweiler apweiler at
Sat Apr 6 04:53:10 PST 2002

Hi Xinghua,

> We have tried different methods to associate the GO term with PROSITE
> patterns.  The result is interesting and promising. However, most of
> the time, the GO term tends to be more general than the annotation of
> the PROSITE pattern.  We need some "gold standard" to evaluate our result.
> Alternatively, we can make our system available on the web and have
> friends at geneontology evaluate the results.  Eventually, if the system works
> well, it can be part of GO.  I hope this will be of interest?

I believe you are wasting your time. This mapping is already done and is
constantly updated. I have just sent you the mappings. As Wolfgang already

> > PROSITE is part of InterPro, see for details.
> > Nicky Mulder started to map InterPro entries to GO terms. As all PROSITE
> > patterns have exactly one InterPro accession, her mapping can be
> > translated easily from InterPro -> GO to PROSITE -> GO. If you have
> > difficulties doing that, mail again, we surely can help with that.
> >
> > Similarly, searching for PROSITE patterns is part of searching for
> > InterPro entries. You can use this online or download and install at
> > your site. "InterProScan" has an option to do the mapping to GO terms.
> > Input is an amino acid sequence, output are GO terms.
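
The translation Wolfgang describes amounts to composing two lookups. A minimal
Python sketch, assuming both mappings are already loaded into dictionaries; the
function name and all accessions below are made up for illustration and are
not real PROSITE, InterPro or GO identifiers:

```python
def prosite_to_go(prosite_to_interpro, interpro_to_go):
    """Compose PROSITE -> InterPro with InterPro -> GO.

    Assumes each PROSITE pattern has exactly one InterPro accession.
    Patterns whose InterPro entry has no GO mapping are left out.
    """
    result = {}
    for prosite_ac, interpro_ac in prosite_to_interpro.items():
        go_terms = interpro_to_go.get(interpro_ac)
        if go_terms:
            result[prosite_ac] = sorted(go_terms)
    return result


# Hypothetical sample data (placeholder accessions only).
prosite_to_interpro = {
    "PS99901": "IPR999901",
    "PS99902": "IPR999902",  # this InterPro entry has no GO mapping
    "PS99903": "IPR999903",
}
interpro_to_go = {
    "IPR999901": {"GO:9999901", "GO:9999902"},
    "IPR999903": {"GO:9999903"},
}

mapping = prosite_to_go(prosite_to_interpro, interpro_to_go)
for prosite_ac, go_terms in mapping.items():
    print(prosite_ac, go_terms)
```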

If you want to know more about InterPro2Go (and thus also PROSITE2GO), here it

InterPro (Apweiler et al., 2001) is an integrated documentation resource for
protein families, domains and sites, developed initially as a means of
rationalising the complementary efforts of the PROSITE (Falquet et al., 2002),
PRINTS (Attwood et al., 2002), Pfam (Bateman et al., 2002) and ProDom (Corpet
et al., 2000) databases. The project has now been extended to include SMART
(Letunic et al., 2002) and TIGRFAMs (Haft et al., 2001).

InterPro entries provide annotation describing a set of related proteins, some
of which may have identical molecular functions, be involved in the same
processes, and perform their function in the same cellular locations. Mapping
InterPro entries to GO terms thus provides an automatic means of assigning GO
terms to the corresponding proteins. GO terms were assigned to InterPro
entries by manually inspecting the abstract of each entry and the annotation
of the proteins in its match list, and then mapping the appropriate GO terms,
at any level, that apply to the whole protein and not necessarily only to the
domain described. The associated GO terms should also apply to all proteins
with true hits to all signatures in the InterPro entry. For each associated
term, the name and GO accession number are given; these are visible in
InterPro entries, with links to the EBI QuickGO browser. In this way, all
proteins belonging to InterPro entries mapped to GO terms can be automatically
mapped to these GO terms. An additional advantage is that multifunctional
proteins can be mapped to multiple GO terms through associations with more
than one matched InterPro entry.
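
As a rough illustration of the automatic step described above, the following
Python sketch collects, for each protein, the union of the GO terms attached to
the InterPro entries it matches. The data layout, the function name and all
identifiers are assumptions made for the example, not the actual InterPro
pipeline:

```python
def annotate_proteins(protein_matches, interpro_to_go):
    """Assign each protein the union of GO terms of its matched entries."""
    annotations = {}
    for protein, entries in protein_matches.items():
        terms = set()
        for entry in entries:
            # Entries without a GO mapping contribute nothing.
            terms |= interpro_to_go.get(entry, set())
        annotations[protein] = terms
    return annotations


# Hypothetical sample data (placeholder identifiers only).
interpro_to_go = {
    "IPR999901": {"GO:9999901"},
    "IPR999902": {"GO:9999902"},
}
protein_matches = {
    "P_MULTI": ["IPR999901", "IPR999902"],  # multifunctional protein
    "P_SINGLE": ["IPR999901"],
    "P_NONE": ["IPR999903"],                # entry has no GO mapping
}

annotations = annotate_proteins(protein_matches, interpro_to_go)
for protein, terms in annotations.items():
    print(protein, sorted(terms))
```

Here the protein matching two entries ends up with two GO terms, which is the
multifunctional case mentioned above.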

Some entries could be mapped to very deep level (specific) GO terms, while
entries describing wider families or common domains could only be mapped to
higher level terms or could not be mapped at all. In many cases where there is
a parent/child relationship in InterPro, a protein can be mapped to a high
level term through the parent entry as well as to a specific term through a
more specific child entry.
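
The parent/child case can be sketched as a walk up the entry hierarchy. This
Python example assumes that a match to a child entry also counts as a match to
its parent, and that each entry has at most one parent; the `parent_of` table,
the function name and all identifiers are hypothetical:

```python
def terms_with_ancestors(entry, parent_of, interpro_to_go):
    """Collect (entry, GO term) pairs for an entry and all its ancestors."""
    collected = []
    seen = set()
    while entry is not None and entry not in seen:
        seen.add(entry)  # guards against accidental cycles in the table
        for go_id in interpro_to_go.get(entry, []):
            collected.append((entry, go_id))
        entry = parent_of.get(entry)
    return collected


# Hypothetical sample data: IPR999902 is a child of IPR999901.
parent_of = {"IPR999902": "IPR999901"}
interpro_to_go = {
    "IPR999901": ["GO:9999901"],  # high-level term on the parent entry
    "IPR999902": ["GO:9999902"],  # specific term on the child entry
}

pairs = terms_with_ancestors("IPR999902", parent_of, interpro_to_go)
for entry, go_id in pairs:
    print(entry, go_id)
```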

The integrity of the InterPro to GO mappings is maintained by running regular
sanity checks on the data. The checks include searching for mappings from
secondary or deleted InterPro accession numbers, and mappings to obsolete or
non-existent GO terms. The reports are manually checked and corrected.
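
The kind of check described above could look roughly like the following Python
sketch. It assumes the current and secondary InterPro accessions and the
current and obsolete GO identifiers are available as sets; the function name
and all identifiers are invented for the example, and the real checks may work
differently:

```python
def check_mappings(mappings, current_interpro, secondary_interpro,
                   current_go, obsolete_go):
    """Return a report of (InterPro accession, GO id, problem) tuples."""
    problems = []
    for interpro_ac, go_id in mappings:
        if interpro_ac in secondary_interpro:
            problems.append((interpro_ac, go_id, "secondary InterPro accession"))
        elif interpro_ac not in current_interpro:
            problems.append((interpro_ac, go_id, "deleted or unknown InterPro accession"))
        if go_id in obsolete_go:
            problems.append((interpro_ac, go_id, "obsolete GO term"))
        elif go_id not in current_go:
            problems.append((interpro_ac, go_id, "non-existent GO term"))
    return problems


# Hypothetical sample data (placeholder identifiers only).
mappings = [
    ("IPR999901", "GO:9999901"),  # fine
    ("IPR999902", "GO:9999901"),  # secondary accession
    ("IPR999903", "GO:9999902"),  # deleted entry, obsolete term
    ("IPR999901", "GO:9999903"),  # term does not exist
]
report = check_mappings(
    mappings,
    current_interpro={"IPR999901"},
    secondary_interpro={"IPR999902"},
    current_go={"GO:9999901"},
    obsolete_go={"GO:9999902"},
)
for line in report:
    print(line)
```

The report is only a list of suspects; as noted above, the corrections
themselves are made by hand.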

The data is available directly from the InterPro database or through an
InterPro-to-GO flat file available from the EBI ftp site. This file lists the
InterPro entries and corresponding GO terms. A protein-to-GO file is also
produced, which maps proteins to InterPro entries to GO. The results are used
for GO association files and for the Proteome Analysis pages. The data is also
available via the EBI SRS server. It is possible to search a sequence against
InterPro using InterProScan, and
then link the results to the appropriate GO term through the InterPro-to-GO
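
For reading the flat file, here is a small Python sketch. The line layout is
an assumption, since this message does not describe it: one mapping per line
in the form "InterPro:<accession> <entry name> > GO:<term name> ; <GO id>",
with comment lines starting with "!". The sample lines use placeholder
identifiers, so please check the layout against the real file before relying
on it:

```python
import re

# Assumed line layout (not confirmed by this message):
#   InterPro:IPR999901 Entry name > GO:term name ; GO:9999901
LINE_RE = re.compile(
    r"^InterPro:(IPR\d{6})\s+(.*?)\s+>\s+GO:(.*?)\s+;\s+(GO:\d{7})$"
)


def parse_interpro2go(lines):
    """Build a dict from InterPro accession to a list of GO ids."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        interpro_ac, _entry_name, _term_name, go_id = match.groups()
        mapping.setdefault(interpro_ac, []).append(go_id)
    return mapping


# Hypothetical sample content (placeholder identifiers only).
sample = [
    "! example header comment",
    "InterPro:IPR999901 Example family A > GO:example function ; GO:9999901",
    "InterPro:IPR999901 Example family A > GO:example process ; GO:9999902",
    "InterPro:IPR999902 Example domain B > GO:example component ; GO:9999903",
]
parsed = parse_interpro2go(sample)
for interpro_ac, go_ids in parsed.items():
    print(interpro_ac, go_ids)
```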



