[go-friends] NIH RFI on Strategic Plan for Data Science: Database vs Knowledge base

Jim Hu jimhu at
Tue Mar 6 12:24:34 PST 2018

My reading of that is less alarming, but I agree that better wording would be important for GO and similar projects.

The full text of the box at the bottom of page 10 of the report:

> Databases and Knowledgebases: What’s the Difference?
> Databases are data repositories that store, organize, validate, and make accessible the core data related to a particular system or systems. For example, the core data for a model organism database might include genome, transcriptome, and protein sequences and functional annotations of gene products.
> Knowledgebases accumulate, organize, and link growing bodies of information related to core datasets. A knowledgebase may contain information about expression patterns, splicing variants, localization, and protein-protein interaction and pathway networks related to an organism or set of organisms. Knowledgebases typically require significant curation beyond the quality assurance/quality control and annotation needed for databases.

I think we would agree with the part that says that curation is part of the difference. I’m reading the "functional annotations" in databases part as referring to those cases where the functional annotations in a less curated database are automatically pulled in from a curated knowledgebase. Even the protein sequences come from curation of the structural annotation.


> On Mar 6, 2018, at 1:56 PM, Chris Mungall <cjmungall at> wrote:
> The NIH has put out an RFI together with a draft strategic plan:
> <>
> I want to draw people's attention to p10 of the report
> "NIH will distinguish between databases and knowledgebases (see text box “Databases and Knowledgebases: What’s the Difference?”) and will support each separately from one another"
> OK, this is interesting. But caution is advised: these are two pretty squishy terms that are used differently by different communities. For those of us with an AI background, "databases" are typically closer to the raw data and are curated at the level of metadata rather than data, whereas "knowledge bases" contain curated generalizations of the data. GO is a classic knowledge base (or Knowledge Graph, now that Google has made that trendy). However, it has historically been called a "database" since that is the term the community normally uses.
> Anyway, the distinction that the NIH makes in the report (box at bottom of p10 of the report) doesn't make any sense to me:
> an example of what might be in a database is "functional annotations of gene products"
> an example of what might be in a knowledgebase is "protein-protein interaction networks"
> To me this is precisely reversed. PPI networks are often raw data, e.g. coIP. A functional annotation is as absolutely paradigmatic a case of knowledge as you could wish for.
> Normally I save terminological minutiae such as "what's the difference between an ontology and terminology" to the bar or to the filing cabinet marked Pointless Discussions We Used To Have In The Early Days of GO. However, if the NIH is going to make important funding decisions based on a difference between "Database" and "Knowledge Base", it's crucial that we educate them. This is important for GO (and for other knowledge databases/repositories/resources/whatever you want to call them). Given that functional annotation is explicitly called out in the draft report, I think this calls for a specific response from the entire GO community.
> _______________________________________________
> go-friends mailing list
> go-friends at

Jim Hu
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
