Curator prediction of NOT kinase activity
Valerie Wood
val at sanger.ac.uk
Wed Mar 8 07:38:27 PST 2006
> it is also possible that they use a modified
> catalytic mechanism that does not require these
> residues. This has been shown for the Wnk
> family, where Lys13 is thought to replace Lys30
> in adenosine triphosphate (ATP) binding (25)."
>
> In other words, nobody knows what they do, so we annotated to
> 'unknown' since we were uncomfortable with 'protein kinase activity.'
>
In that case, because it is impossible for a curator to know whether
some family members have evolved to use catalytic residues other than
the ones the curator is familiar with, it is probably best to
reserve NOT for experimental evidence codes.
I am quite happy to see protein kinase family members annotated to
protein kinase activity 'with' ISS, even if they lack residues which are
presumed to be required (provided they fulfill the other criteria for
family membership when clustered). Presumably anybody doing an
experiment would test this, and a NOT would be added if the protein was
shown not to have the activity.
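
As a rough illustration of the policy suggested above (NOT only with
experimental evidence codes, positive ISS annotations left alone), a check
over annotation records might look like the following Python sketch. The
Annotation class, the accessions and the exact set of codes treated as
experimental are assumptions made for this example, not part of any GO tool.

```python
from dataclasses import dataclass

# Assumption: the evidence codes treated as experimental for this check.
EXPERIMENTAL_CODES = {"IDA", "IMP", "IGI", "IPI", "IEP"}


@dataclass(frozen=True)
class Annotation:
    """Hypothetical, simplified annotation record."""
    gene_product: str
    go_id: str
    evidence: str
    qualifier: str = ""  # "NOT" or empty


def not_policy_violations(annotations):
    """Return NOT annotations that rest on a non-experimental evidence code."""
    return [
        a for a in annotations
        if a.qualifier == "NOT" and a.evidence not in EXPERIMENTAL_CODES
    ]


# Made-up accessions; GO:0004672 is protein kinase activity.
annotations = [
    Annotation("P00001", "GO:0004672", "ISS"),         # positive ISS: allowed
    Annotation("P00002", "GO:0004672", "ISS", "NOT"),  # NOT + ISS: flagged
    Annotation("P00003", "GO:0004672", "IDA", "NOT"),  # NOT + experiment: allowed
]

violations = not_policy_violations(annotations)
print([a.gene_product for a in violations])  # ['P00002']
```

Under this sketch, a curator's sequence-based NOT would be reported for
review rather than silently accepted.
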
Karen, I can see your reasoning for annotating to function unknown,
but if somebody wanted to retrieve all proteins which were likely to be
kinases based on sequence similarity, wouldn't they also want to
retrieve these gene products? After all, this is a reasonable
inference. Using 'unknown' sometimes means these proteins will not come
to the attention of people who may be interested in them as potential
protein kinases.
Val
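
To illustrate the retrieval point, here is a minimal Python sketch of a
query for gene products that are likely kinases on the basis of sequence
similarity. The record layout, the accessions and the 'function-unknown'
placeholder term are hypothetical; only GO:0004672 (protein kinase
activity) is taken from the thread.

```python
PROTEIN_KINASE_ACTIVITY = "GO:0004672"
UNKNOWN_FUNCTION = "function-unknown"  # placeholder label, not a real GO ID

# Hypothetical records: (gene product, term, evidence code, qualifier)
RECORDS = [
    ("P00001", PROTEIN_KINASE_ACTIVITY, "IDA", ""),     # experimentally shown
    ("P00002", PROTEIN_KINASE_ACTIVITY, "ISS", ""),     # family member, by similarity
    ("P00003", UNKNOWN_FUNCTION, "ND", ""),             # family member filed as unknown
    ("P00004", PROTEIN_KINASE_ACTIVITY, "IDA", "NOT"),  # experimentally shown inactive
]


def likely_kinases(records, evidence_codes=("ISS",)):
    """Gene products with a positive kinase annotation under the given codes."""
    # Anything carrying a NOT kinase annotation is excluded outright.
    negated = {
        gp for gp, term, _ev, qual in records
        if term == PROTEIN_KINASE_ACTIVITY and qual == "NOT"
    }
    hits = {
        gp for gp, term, ev, qual in records
        if term == PROTEIN_KINASE_ACTIVITY
        and ev in evidence_codes
        and qual != "NOT"
        and gp not in negated
    }
    return sorted(hits)


print(likely_kinases(RECORDS))  # ['P00002']
```

In this toy data P00003 never appears in the result, whichever evidence
codes are requested, which is the cost of annotating to 'unknown'.
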
> Karen
>
> At 07:54 AM 3/8/2006, Alexander D. Diehl wrote:
>
> > Hi,
> >
> > Sorry to be so late on this discussion, but I would like to suggest
> > that perhaps the annotation is inappropriate on the grounds that NOT
> > annotations in general are very tricky assertions, and are best
> > based on actual experimental evidence, which itself is very
> > dependent on the experimental conditions used. There's a universe
> > of proteins that I could provide NOT kinase activity for, but of
> > course most people would not confuse them with kinases. The
> > question here is more whether the original algorithms for predicting
> > kinase activity need to be refined so that proteins lacking the
> > correct residues in the active site are not flagged as kinases in
> > the first place. Such refinement, of course, ought best to be based
> > on experimental evidence (wet science) that feeds the computational
> > algorithm development, and that's where expert input is needed.
> >
> > I thus vote against any annotation at all in this case.
> >
> > -- Alex
> >
> >
> > Evelyn Camon wrote:
> >
> >> > what were the objections to a GO_REF, this would be unambiguous
> >>
> >>
> >> Hi,
> >>
> >> The issue of using the GO_REF vs extension of the evidence codes is on
> >> the GO Consortium meeting agenda.
> >>
> >> Arguments against GO_REF include:
> >>
> >> I don't think the users will read about GO_REF (is that our problem?)
> >>
> >> I don't think the users will even see GO_REF in most tools,
> >> microarray, we are also limited in UniProt ffl to 4 fields of GO
> >> information
> >>
> >> Users can filter on GO_REF plus evidence code but are we simply
> >> avoiding extending the number of useful GO evidence codes, for
> >> manual codes I can see that extending the codes might slow down
> >> curation but for more granular IEA codes, they are created
> >> electronically so no extra effort from curator required.
> >>
> >> Biology is complex. Although I don't like to see information lost
> >> I think we further complicate what was a simple annotation process
> >> and output.
> >>
> >> Arguments in favour of GO_REF include:
> >>
> >> many different techniques, not practical (or is it?) to create new
> >> evidence codes, helps to disambiguate annotations
> >>
> >> I am really not wishing to start a new thread here...can we leave
> >> GO_REF and codes to Consortium meeting, I am still collecting
> >> ideas..you could reply to me directly if you wish me to collect
> >> further 'for' and 'against' examples for discussion.
> >>
> >> cheers
> >> Evelyn
> >>
> >>
> >> >
> >> >
> >> > So all annotations which use
> >> >
> >> > NOT with the ISS evidence code and GO_REF:xxx and a dbxref to an alignment
> >> > mean that:
> >> >
> >> > The curator or an expert has looked at the alignment of this sequence
> >> > to the associated database entry, protein family or HMM, and on the
> >> > basis of the absence of critical residues has inferred that this family
> >> > member is unlikely to possess the associated activity.
> >> >
> >> > Or words to that effect.
> >> >
> >> > David Hill wrote:
> >> >
> >> >> I can see your point, but that would mean that every database should
> >> >> have an internal ISS reference. We have one, but it is only for
> >> >> orthology and subsequent inheritance of GO terms. Ours doesn't address
> >> >> this issue of a NOT. I think in either case, a user might be confused.
> >> >> If I were a naive user and I saw the annotation as you describe, I might
> >> >> think the NOT was a mistake because the proteins were so similar. If the
> >> >> protein in the with field is taken from the paper that discusses the
> >> >> critical residues, then maybe it would be less confusing. I'm not sure.
> >> >> I think we could generate confusion both ways.
> >> >>
> >> >> David
> >> >>
> >> >> Pascale Gaudet wrote:
> >> >>
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > Is this allowed? I thought that the reference had to directly relate
> >> >> > to the annotation. In this case I would have used our standard
> >> >> > 'dictybase curators ISS' reference, because that's how the annotation
> >> >> > was made.
> >> >> >
> >> >> > If I was to see the annotation as you describe it, I might be tempted
> >> >> > to go and look at the reference, and I would be very confused because
> >> >> > it doesn't talk about the protein at all.
> >> >> >
> >> >> > Pascale
> >> >> >
> >> >> >
> >> >> > At 07:49 AM 3/8/2006 -0500, David Hill wrote:
> >> >> >
> >> >> >
> >> >> >> This is a bit out of the ordinary, but what about an ISS evidence
> >> >> >> code with an active kinase and then a reference to a paper that
> >> >> >> identifies the critical residues for kinase activity?
> >> >> >>
> >> >> >> David
> >> >> >>
> >> >> >> Midori Harris wrote:
> >> >> >>
> >> >> >>
> >> >> >> > Seems to me it would be a valuable part of the story, but not
> >> >> >> > necessarily the whole thing. It would tell you what the important
> >> >> >> > residues are, but would miss out the part about observing that those
> >> >> >> > residues are altered/absent in this particular protein. Also, citing
> >> >> >> > only the important-residue reference could give the impression that
> >> >> >> > that paper (or whatever it is) actually states that protein XYZ
> >> >> >> > doesn't have the activity -- which I assume is not the case.
> >> >> >> >
> >> >> >> > m
> >> >> >> >
> >> >> >> > On Wed, 8 Mar 2006, jyoti khadake wrote:
> >> >> >> >
> >> >> >> > > Hi,
> >> >> >> > >
> >> >> >> > > In this particular instance would the reference which identifies
> >> >> >> > > residues important for the kinase activity in members of the family
> >> >> >> > > be the appropriate reference?
> >> >> >> > >
> >> >> >> > > JK
> >> >> >> > >
> >> >> >> > > Midori Harris wrote:
> >> >> >> > >
> >> >> >> > >> The reference has to identify the source of the information. In
> >> >> >> > >> this case, it comes from what the curator knows, and from the work
> >> >> >> > >> she did examining the protein sequence. So I don't think the protein
> >> >> >> > >> ID would suffice, because it would capture nothing of the curator's
> >> >> >> > >> involvement. The advantage of a GO_REF is that we could include
> >> >> >> > >> everything the curator did, and make it unambiguous ... but it's not
> >> >> >> > >> for me to decide whether that advantage outweighs the problems (btw,
> >> >> >> > >> what are the arguments against a GO_REF?)
> >> >> >> > >>
> >> >> >> > >> m
> >> >> >> > >>
> >> >> >> > >> On Wed, 8 Mar 2006, Emily Dimmer wrote:
> >> >> >> > >>
> >> >> >> > >> > So if using the ISS code with these kinds of annotations, what
> >> >> >> > >> > reference information should be provided? Should the reference
> >> >> >> > >> > field refer back to the protein's identifier? Or to a specific
> >> >> >> > >> > GO_REF (which isn't ideal)
> >> >> >> > >> > e.g.
> >> >> >> > >> > UniProt P12345 GO:0004672 UniProt:P12345 ISS F protein taxon:9606 20060308 UniProt
> >> >> >> > >> >
> >> >> >> > >> > Midori Harris wrote:
> >> >> >> > >> >
> >> >> >> > >> > > The documentation for ISS says that it can be used for predicted
> >> >> >> > >> > > or observed sequence features, and that in such cases the 'with'
> >> >> >> > >> > > field can be left blank. If we choose to regard altered 'active'
> >> >> >> > >> > > site residues as features -- which seems reasonable -- ISS will
> >> >> >> > >> > > work.
> >> >> >> > >> > >
> >> >> >> > >> > > Also, using IC would not solve the reference problem, so you
> >> >> >> > >> > > would still have to either (a) make a GO_REF entry or (b) think
> >> >> >> > >> > > of something else to use as the reference.
> >> >> >> > >> > >
> >> >> >> > >> > > m
> >> >> >> > >> > >
> >> >> >> > >> > > On Wed, 8 Mar 2006, Evelyn Camon wrote:
> >> >> >> > >> > >
> >> >> >> > >> > >> ok..so sequence similar to what?? the sequence/domain for the
> >> >> >> > >> > >> active kinase??? or could we have Inferred by Curator from
> >> >> >> > >> > >> Sequence (ICS??)..hmmm
> >> >> >> > >> > >>
> >> >> >> > >> > >> Ev
> >> >> >> > >> > >>
> >> >> >> > >> > >> Valerie Wood wrote:
> >> >> >> > >> > >>
> >> >> >> > >> > >> > I think I prefer ISS, because this is essentially a judgement
> >> >> >> > >> > >> > which has been made by assessing the sequence.....
> >> >> >> > >> > >> >
> >> >> >> > >> > >> > Evelyn Camon wrote:
> >> >> >> > >> > >> >
> >> >> >> > >> > >> >> Hi,
> >> >> >> > >> > >> >>
> >> >> >> > >> > >> >> I'm not keen on the GO_REF idea I'm afraid...could we propose
> >> >> >> > >> > >> >> that IC could be used without GO ID on these odd occasions...not
> >> >> >> > >> > >> >> sure what publication you would use though...
> >> >> >> > >> > >> >>
> >> >> >> > >> > >> >> Ev
> >> >> >> > >> > >> >>
> >> >> >> > >> > >> >> Sandra Orchard wrote:
> >> >> >> > >> > >> >>
> >> >> >> > >> > >> >> > Most kinase recognition patterns are HMMs which can only
> >> >> >> > >> > >> >> > predict a domain but will not tell you if it is active or not.
> >> >> >> > >> > >> >> > The kinases in these examples were hit by the HMMs. The only
> >> >> >> > >> > >> >> > method which will give any indication of activity is a ProSite
> >> >> >> > >> > >> >> > pattern, which specifically says a particular residue needs to
> >> >> >> > >> > >> >> > be in a particular position. The HMMs are correct in that these
> >> >> >> > >> > >> >> > are part of the kinase family, but are inactive members of it;
> >> >> >> > >> > >> >> > they are not false positives in that sense. This is true for
> >> >> >> > >> > >> >> > many different classes of enzyme.
> >> >> >> > >> > >> >> >
> >> >> >> > >> > >> >> > And I do not remove enzyme InterPro2GO annotation just because
> >> >> >> > >> > >> >> > a family contains a few inactive members - all the big enzyme
> >> >> >> > >> > >> >> > families do and they can only really be recognised by manual
> >> >> >> > >> > >> >> > annotation.
> >> >> >> > >> > >> >> >
> >> >> >> > >> > >> >> > Sandra
> >> >> >> > >> > >> >> >
> >> >> >> > >> > >> >> > Valerie Wood wrote:
> >> >> >> > >> > >> >> >
> >> >> >> > >> > >> >> > > Hi Emily,
> >> >> >> > >> > >> >> > >
> >> >> >> > >> > >> >> > > A few comments which may be relevant:
> >> >> >> > >> > >> >> > >
> >> >> >> > >> > >> >> > > Out of interest, which protein kinase family is this (i.e.
> >> >> >> > >> > >> >> > > which InterPro domain)? Is it a family where some (but not
> >> >> >> > >> > >> >> > > all) members are protein kinases, in which case the mapping
> >> >> >> > >> > >> >> > > should be removed?
> >> >> >> > >> > >> >> > >
> >> >> >> > >> > >> >> > > Alternatively, if this appears to be a spurious hit, instead
> >> >> >> > >> > >> >> > > of adding a NOT annotation, you can get spurious matches
> >> >> >> > >> > >> >> > > suppressed by InterPro as false positives (I often do this
> >> >> >> > >> > >> >> > > for S. pombe).
> >> >> >> > >> > >> >> > >
> >> >> >> > >> > >> >> > > Or, could it be a sequencing or gene prediction error?
> >> >> >> > >> > >> >> > >
> >> >> >> > >> > >> >> > > Val
> >> >> >> > >> > >> >> > >
> >> >> >> > >> > >> >> > >
> >> >> >> > >> > >> >> > > Midori Harris wrote:
> >> >> >> > >> > >> >> > >
> >> >> >> > >> > >> >> > >> Hi,
> >> >> >> > >> > >> >> > >>
> >> >> >> > >> > >> >> > >> I think there's no doubt whatsoever that this information
> >> >> >> > >> > >> >> > >> should be captured. The question is what to put for
> >> >> >> > >> > >> >> > >> reference and evidence. The best evidence code is probably
> >> >> >> > >> > >> >> > >> TAS, although one could possibly also make a case for ISS
> >> >> >> > >> > >> >> > >> (note that IC is restricted to inferences from other GO
> >> >> >> > >> > >> >> > >> annotations, so isn't suitable).
> >> >> >> > >> > >> >> > >>
> >> >> >> > >> > >> >> > >> For a reference, one possibility is to add an item to the
> >> >> >> > >> > >> >> > >> GO_REF collection; then there would be an ID to plug into
> >> >> >> > >> > >> >> > >> the file.
> >> >> >> > >> > >> >> > >>
> >> >> >> > >> > >> >> > >> m
> >> >> >> > >> > >> >> > >>
> >> >> >> > >> > >> >> > >> On Wed, 8 Mar 2006, Emily Dimmer wrote:
> >> >> >> > >> > >> >> > >>
> >> >> >> > >> > >> >> > >> > Hi,
> >> >> >> > >> > >> >> > >> >
> >> >> >> > >> > >> >> > >> > One of our annotators, who is an expert on protein
> >> >> >> > >> > >> >> > >> > kinases, has looked at the sequence of a putative protein
> >> >> >> > >> > >> >> > >> > kinase and, from noticing a couple of amino acid changes
> >> >> >> > >> > >> >> > >> > at its active site, has predicted that it does not
> >> >> >> > >> > >> >> > >> > possess any kinase activity - she did not use any
> >> >> >> > >> > >> >> > >> > software and there is no published work on this protein.
> >> >> >> > >> > >> >> > >> > Do you think this type of annotation should be
> >> >> >> > >> > >> >> > >> > represented in GO (we feel this annotation is of high
> >> >> >> > >> > >> >> > >> > quality and adds valuable information to a protein which
> >> >> >> > >> > >> >> > >> > has not yet been characterized), and if so how should
> >> >> >> > >> > >> >> > >> > this annotation be shown?
> >> >> >> > >> > >> >> > >> >
> >> >> >> > >> > >> >> > >> > Thanks,
> >> >> >> > >> > >> >> > >> > Emily
