[go-friends] NIH RFI on Strategic Plan for Data Science: Database vs Knowledge base

Chris Mungall cjmungall at lbl.gov
Wed Mar 7 19:44:33 PST 2018


Yes, Val made a similar point that "functional annotation" may also be
taken to mean ENCODE-type experimental data, which in their raw form would
not be considered knowledge bases.

Even allowing for some ambiguities around this term, the overall
distinction they attempt to draw doesn't seem to make sense. The text is
here:

"""


*Databases and Knowledgebases: What’s the Difference?*

*Databases are data repositories that store, organize, validate, and make
accessible the core data related to a particular system or systems. For
example, the core data for a model organism database might include genome,
transcriptome, and protein sequences and functional annotations of gene
products.*

*Knowledgebases accumulate, organize, and link growing bodies of
information related to core datasets. A knowledgebase may contain
information about expression patterns, splicing variants, localization, and
protein-protein interaction and pathway networks related to an organism or
set of organisms. Knowledgebases typically require significant curation
beyond the quality assurance/quality control and annotation needed for
databases.*
"""

As others have mentioned in this thread, we believe MODs fall into the KB
category, but here a MOD is used as an example of a database. The attempt to
define the distinction in terms of "core data" just leaves open the problem
of what core data is...

On Tue, Mar 6, 2018 at 1:10 PM, Guy Plunkett III <guy.plunkett at wisc.edu>
wrote:

> I agree with Jim: *curation* is a critical aspect of the distinction.
> Part of the confusion is the phrase "functional annotation," which in NCBI
> parlance often means "the result of an automated pipeline," like a
> bacterial genome annotated via RAST or their own PGAAP. Any post-pipeline
> curation presumably lifts the database contents to knowledge base level.
>
> Dr. Guy Plunkett III
> Senior Scientist Emeritus, UW-Madison
> Senior Scientist, DNASTAR, Inc.
> http://www.genome.wisc.edu/information/gplunkett.html
>
>
>
>
>
> On Mar 6, 2018, at 2:24 PM, Jim Hu <jimhu at tamu.edu> wrote:
>
> My reading of that is less alarming, but I agree that better wording would
> be important for GO and similar projects.
>
> The full text of the box at the bottom of page 10 of the report:
>
>
> Databases and Knowledgebases: What’s the Difference?
>
> Databases are data repositories that store, organize, validate, and make
> accessible the core data related to a particular system or systems. For
> example, the core data for a model organism database might include genome,
> transcriptome, and protein sequences and functional annotations of gene
> products.
>
> Knowledgebases accumulate, organize, and link growing bodies of
> information related to core datasets. A knowledgebase may contain
> information about expression patterns, splicing variants, localization, and
> protein-protein interaction and pathway networks related to an organism or
> set of organisms. Knowledgebases typically require significant curation
> beyond the quality assurance/quality control and annotation needed for
> databases.
>
>
> I think we would agree with the part that says that curation is part of
> the difference. I’m reading the functional annotation in databases part as
> those cases where the functional annotations in a less curated database are
> automatically sucked in from a curated knowledgebase. Even the protein
> sequences come from curation of the structural annotation.
>
> Jim
>
>
>
> On Mar 6, 2018, at 1:56 PM, Chris Mungall <cjmungall at lbl.gov> wrote:
>
> The NIH has put out an RFI together with a draft strategic plan:
> https://grants.nih.gov/grants/guide/notice-files/NOT-OD-18-134.html
>
> I want to draw people's attention to p10 of the report
>
> *"NIH will distinguish between databases and knowledgebases (see text box
> “Databases and Knowledgebases: What’s the Difference?”) and will support
> each separately from one another"*
>
> OK, this is interesting. But caution advised, these are two pretty squishy
> terms that are used differently by different communities. For those of us
> with an AI background, "databases" are typically closer to the raw data,
> are curated at the level of metadata rather than data, whereas "knowledge
> bases" contain curated generalizations of the data. GO is a classic
> knowledge base (or Knowledge Graph, now that Google has made that trendy).
> However it's historically been called a "database" since that is the term
> the community normally uses.
>
> Anyway, the distinction that the NIH makes in the report (box at bottom of
> p10 of the report) doesn't make any sense to me:
>
>    - an example of what might be in a database is *"functional
>    annotations of gene products"*
>    - an example of what might be in a knowledgebase is *"protein-protein
>    interaction networks"*
>
> To me this is precisely reversed. PPI networks are often raw data, e.g.
> coIP. A functional annotation is as absolutely paradigmatic a case of
> knowledge as you could wish for.
>
> Normally I save terminological minutiae such as "what's the difference
> between an ontology and terminology" to the bar or to the filing cabinet
> marked Pointless Discussions We Used To Have In The Early Days of GO.
> However, if the NIH is going to make important funding decisions based on a
> difference between "Database" and "Knowledge Base", it's crucial that we
> educate them. This is important for GO (and for other knowledge
> databases/repositories/resources/whatever you want to call them). Given
> that functional annotation is explicitly called out in the draft report, I
> think this calls for a specific response from the entire GO community.
> _______________________________________________
> go-friends mailing list
> go-friends at lists.stanford.edu
> https://mailman.stanford.edu/mailman/listinfo/go-friends
>
>
> =====================================
> Jim Hu
> Professor
> Dept. of Biochemistry and Biophysics
> 2128 TAMU
> Texas A&M Univ.
> College Station, TX 77843-2128
> 979-862-4054
>