[liberationtech] new quarter-billion daily-updated global geocoded event dataset now available (GDELT)

kalev leetaru kalev.leetaru5 at
Mon Jul 8 06:42:48 PDT 2013

Yosem, thought your mailing list might find this of interest as well.

Looking at John Beieler's map of global protests in 2013 in GDELT,
I was struck by how much protest activity there is in India compared with
the surrounding region and was concerned that this might simply be an
artifact of the greater prevalence of English-language media in that
country.  However, as I looked at the map, I was reminded of the NASA Night
Lights satellite imagery that I used to compare the landscape of
georeferenced tweets to in my last paper.  If you look closely at that NASA
imagery (which basically maps the presence of electric lights at night as a
proxy for the widespread availability of electricity in a given location),
you'll notice that India similarly stands out starkly from its surroundings
and in fact looks quite similar to the protest map.  (Here is the latest
2012 version of that NASA imagery.)

Using BigQuery this afternoon I generated a master histogram of every
single location at which an event has ever been reported in GDELT
1979-present (amazingly, this took only about 6 seconds despite scanning a
quarter-billion records) and made a map with a dot for each location having
one or more events.  I didn't size the dots in this version or do anything
else to separate locations with many events from those with just a few
(that's for a future version).
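
As a rough illustration of the histogram step (a minimal sketch, not the
actual BigQuery query used above), the same grouping can be written in a few
lines of Python. The field names ActionGeo_Lat and ActionGeo_Long and the
sample rows are assumptions made for the example.

```python
from collections import Counter


def location_histogram(events):
    """Count events per (lat, lon) pair; rows without coordinates are skipped."""
    counts = Counter()
    for row in events:
        # Field names are assumed for illustration.
        lat = row.get("ActionGeo_Lat")
        lon = row.get("ActionGeo_Long")
        if lat in (None, "") or lon in (None, ""):
            continue
        counts[(float(lat), float(lon))] += 1
    return counts


# Made-up sample rows standing in for GDELT event records.
sample_events = [
    {"ActionGeo_Lat": "28.6", "ActionGeo_Long": "77.2"},
    {"ActionGeo_Lat": "28.6", "ActionGeo_Long": "77.2"},
    {"ActionGeo_Lat": "-1.28", "ActionGeo_Long": "36.82"},
    {"ActionGeo_Lat": "", "ActionGeo_Long": ""},
]

hist = location_histogram(sample_events)

# One unsized dot per location with one or more events.
dots = sorted(hist)
```

Here hist keeps the per-location counts, so sizing the dots later would only
need the values already collected.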

I then overlaid that display (a dot for every location worldwide
containing one or more GDELT events of any type or actor between 1979 and
2013, in red) on top of the NASA Night Lights imagery (in blue) to explore
the spatial overlap between areas with widespread availability of electricity
(population areas most likely to have substantial news coverage) and the
geography of GDELT's global coverage. Each red dot indicates one or more
events at that location: the dots are not sized, so a single point could
represent one event or one million events at that location over the
1979-2013 period.
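
The overlay here is visual, but the same comparison could be put in numbers.
The sketch below is one hypothetical way to do it, assuming both layers have
been reduced to (lat, lon) points: snap each point to a grid cell, then
report the share of lit cells that contain an event and list the lit cells
that do not. The function names, the one-degree cell size, and the sample
points are all assumptions made for the example.

```python
import math


def to_cell(lat, lon, size=1.0):
    """Snap a coordinate to the grid cell (of `size` degrees) containing it."""
    return (math.floor(lat / size), math.floor(lon / size))


def coverage(event_points, lit_points, size=1.0):
    """Return (share of lit cells with an event, lit cells with no event)."""
    event_cells = {to_cell(lat, lon, size) for lat, lon in event_points}
    lit_cells = {to_cell(lat, lon, size) for lat, lon in lit_points}
    if not lit_cells:
        return 0.0, set()
    covered = lit_cells & event_cells
    dead_spots = lit_cells - event_cells
    return len(covered) / len(lit_cells), dead_spots


# Made-up points: event locations (red) and night-light locations (blue).
event_points = [(28.6, 77.2), (-1.28, 36.82), (60.5, 100.3)]
lit_points = [(28.9, 77.7), (-1.9, 36.1), (40.7, -74.0)]

share, dead_spots = coverage(event_points, lit_points)
```

A non-empty dead_spots set would flag lit areas with no events, which is the
kind of gap the map is being checked for.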

You can see the final map here:

It is clear from this map that GDELT has very strong global coverage,
covering far beyond the areas of greatest electrical availability and
reaching deep into surrounding regions. GDELT's emphasis on Africa and
Latin America is also clearly visible, as is the fact that the only
regions with low GDELT representation are those with little electricity or
human habitation.  Even with the long tendril of population in Russia that
connects east with west, you can see a dense scattering of GDELT events all
around it, stretching far out into the rural areas.

So, this is great news: it suggests that wherever there is power, and thus
likely civilization, GDELT has events, and there aren't any obvious dead
spots where there is lots of civilization but no GDELT events.

Just thought you'd find this of interest!


On Fri, Jun 21, 2013 at 8:04 PM, Yosem Companys <companys at> wrote:

> From: kalev leetaru <kalev.leetaru5 at>
> Hi everyone, I wanted to let you all know about a new global database of
> events from across the world stretching back to 1979 and updated every 24
> hours, all georeferenced to the city level, that I think could be of great
> interest to many of you in terms of situational awareness, tracking ongoing
> humanitarian situations or disasters, and for exploring long-range trends
> in areas of concern or focus.  I mentioned this dataset in an email a few
> months ago, but I wanted to let you all know that the data is now up and
> available for download!
> We are excited to announce the official release of the Global Database of
> Events, Language, and Tone (GDELT), a new database of nearly a
> quarter-billion global social-political events in the CAMEO taxonomy of
> over 300 categories from riots and protests to diplomatic exchanges and
> peace appeals, covering all countries 1979-present.   Each morning a daily
> update is posted containing 30,000 to 100,000 new events from the previous
> day, making this the first daily-updated event database available for open
> research.  Special emphasis has been placed on enhanced coverage of Africa
> and Latin America, producing one of the first cross-national datasets for
> South America and the most extensive database for Africa.  The standard
> CAMEO actor taxonomy has been enriched with new Religious and Ethnic actor
> attributes and all events are now georeferenced to the city level globally.
> A second version of GDELT to be released late this fall makes use of
> several billion pages of newly available digitized material to extend the
> database back to 1800 and will feature the new CAMEO 2.0 taxonomy that
> covers an array of new categories, ranging from disease to human rights to
> political transitions.  In addition, an array of new emotional and thematic
> indicators will be made available that measure the prevalence and views
> towards a wide array of topics, from education and women’s rights to
> constitutionalism and views towards government, down to the city level
> globally.
> The vision of GDELT is to construct a catalog of human societal-scale
> behavior and beliefs across all countries of the world over the last two
> centuries down to the city level globally, to make all of this data freely
> available for open research, and to provide daily updates to create the
> first "realtime social sciences earth observatory."  All data is therefore
> made available for open research of any kind, and an assortment of
> tutorials, documentation, and quick-start guides are provided on the GDELT
> website:
> Sincerely,
> Kalev Leetaru, Phil Schrodt, and Patrick Brandt