Introducing HXL hashtags for humanitarian data

UN OCHA aims to make it easier for responders to share critical operational data during humanitarian crises. The Humanitarian Exchange Language (HXL) initiative has two components: community agreement on data standards, and a technical infrastructure for automating simple peer-to-peer data exchange.

On 23 August 2007, Chris Messina posted this message to Twitter:

how do you feel about using # (pound) for groups. As in #barcamp [msg]?

Messina didn’t invent the idea of using simple tags to identify discussion topics, but his tweet started a chain of events that led to Twitter officially supporting hashtags in 2009, and it marked the moment when information tagging broke out from small, specialised technical communities to seize the public’s imagination. Seven years later, hashtags have become one of the main ways we connect online, not only around events (#barcamp), but also around sports teams (#realmadrid), places (#nairobi), politics (#IndiaPoli), social advocacy (#CARCrisis), special interests (#knitting), and even shared jokes (#CatsSaveTigers).

Hashtags and spreadsheets

The members of the HXL Working Group have focused on stripping layer after layer of complexity from the proposed Humanitarian Exchange Language (HXL) standard, until we finally asked ourselves whether labelling information in shared data could be as simple as tagging topics is in social media posts. We realized that the core problem — helping machines understand how to classify information — is the same in both cases. Consider these two (made-up) tweets:

When I can’t be in Manchester, los Blancos’ll do for me.
Tengo muchas ganas de ver Real esta noche.

A sophisticated natural-language system might (or might not) be able to guess that these Tweets are both about the Spanish football team Real Madrid, but adding hashtags makes it obvious and simple:

When I can’t be in Manchester, los Blancos’ll do for me. #realmadrid
Tengo muchas ganas de ver #RealMadrid esta noche.

Coordinating the response to a humanitarian crisis might seem a long way from tweeting about a football game, but we run into the same kind of problem. Consider these spreadsheet excerpts:

 

Activity | Sector | Organisation | Country | Admin level 1
Train teachers/other educational personnel in life skills and psycho-social support | Education | UNICEF | Mali | Gao
Monitoring internal and cross-border movements of people (disaggregated by sex and age data), including the return movements of IDPs and refugees, in partnership with the government | Protection | UNHCR | Mali | Segou

 

Implementor | Region | Cluster | Project
Agronomes et Vétérinaires Sans Frontières | Gao | WASH | Reinvigorate / put in place the structures for managements of water points / network
OXFAM | Gao | WASH | Drinking water supply for the populations affected site

These two spreadsheets describe the same kind of information, but — like the sample tweets earlier — do not use the same words to describe the topics. So why not add a hashtag to each column?

 

Activity | Sector | Organisation | Country | Admin level 1
#activity | #sector | #org | #country | #adm1
Train teachers/other educational personnel in life skills and psycho-social support | Education | UNICEF | Mali | Gao
Monitoring internal and cross-border movements of people (disaggregated by sex and age data), including the return movements of IDPs and refugees, in partnership with the government | Protection | UNHCR | Mali | Segou

 

Implementor | Region | Cluster | Project
#org | #adm1 | #sector | #activity
Agronomes et Vétérinaires Sans Frontières | Gao | WASH | Reinvigorate / put in place the structures for managements of water points / network
OXFAM | Gao | WASH | Drinking water supply for the populations affected site

With this small change, it’s now obvious how the columns in the two spreadsheets are related to each other, even though they use different language and appear in a different order. Just as Twitter or Google+ can gather related postings that use the same hashtags, HXL-aware software can merge data from multiple sources, provide visualisations, validate, analyse, and summarise information.
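To make the idea concrete, here is a minimal sketch in Python of how software might line up the two spreadsheets by their hashtag rows. It is not an official HXL tool: the function name `read_hxl` is our own invention, the spreadsheets are assumed to be saved as CSV, and the activity descriptions are shortened for readability.

```python
import csv
import io

# Two CSV exports with different headers and column order (activity text shortened).
SHEET_A = """Activity,Sector,Organisation,Country,Admin level 1
#activity,#sector,#org,#country,#adm1
Train teachers,Education,UNICEF,Mali,Gao
Monitor movements of people,Protection,UNHCR,Mali,Segou
"""

SHEET_B = """Implementor,Region,Cluster,Project
#org,#adm1,#sector,#activity
Agronomes et Vétérinaires Sans Frontières,Gao,WASH,Manage water points
OXFAM,Gao,WASH,Drinking water supply
"""


def read_hxl(text):
    """Return the data rows as dicts keyed by hashtag (hypothetical helper).

    The hashtag row is taken to be the first row whose non-empty cells
    all start with '#'. Everything above it is treated as free-form headers.
    """
    rows = list(csv.reader(io.StringIO(text)))
    for i, row in enumerate(rows):
        cells = [c.strip() for c in row]
        if any(cells) and all(c.startswith("#") for c in cells if c):
            return [dict(zip(cells, data)) for data in rows[i + 1:]]
    raise ValueError("no hashtag row found")


# Rows from both sources now share the same keys, whatever the original headers said.
merged = read_hxl(SHEET_A) + read_hxl(SHEET_B)
orgs = [row["#org"] for row in merged]
```

After merging, every row can be queried by `#org`, `#sector`, `#adm1`, or `#activity`. A column that only one source provides (here `#country`) is simply missing from the other source's rows, so `row.get("#country")` is the safer way to read it.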

The current list of standard HXL hashtags appears in the HXL tag dictionary. The HXL Working Group has also defined some special conventions for applying tags to different types of data, described in the HXL tagging conventions.

What can you do with tagged data?

If we all start tagging our humanitarian data spreadsheets, what happens next?

On the simplest level, we can take a spreadsheet of humanitarian data from anywhere — without knowing anything about it — and extract information from it. For example, a software application can note that four different values, “Education”, “Food Security”, “Protection”, and “Water Sanitation & Hygiene” appear in the column tagged “#sector”, and because of the tag, the application can know that those values are the names of humanitarian sectors. Without any user intervention, the application can count how often each sector appears in the data and generate a chart, like the one illustrated above.
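As a rough illustration of that counting step, the sketch below tallies the values in a “#sector” column and prints a crude text chart. The rows are invented sample data, and `count_by_tag` is a hypothetical helper name, not part of any HXL library.

```python
from collections import Counter

# Invented sample rows, already keyed by hashtag (only the #sector column is shown).
rows = [
    {"#sector": "Education"},
    {"#sector": "Protection"},
    {"#sector": "Education"},
    {"#sector": "Water Sanitation & Hygiene"},
    {"#sector": "Food Security"},
    {"#sector": "Education"},
]


def count_by_tag(rows, tag):
    """Count how often each non-empty value appears in the column with this hashtag."""
    return Counter(row[tag] for row in rows if row.get(tag))


sector_counts = count_by_tag(rows, "#sector")

# A very simple text "chart": one asterisk per occurrence.
for sector, n in sector_counts.most_common():
    print(f"{sector:<28} {'*' * n} ({n})")
```

The application needs no knowledge of the original column header. The hashtag alone tells it which column holds sector names.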

A HXL-aware application can also use some hashtags (e.g. those for geography) to filter other ones. For example, assume that a spreadsheet about refugee camps in Haiti has one column tagged “#adm3” (administrative level 3), one column tagged “#adm4” (administrative level 4), and one column tagged “#loctype” (location type). A HXL-aware application can use the #adm3 and #adm4 columns to filter the dataset, and then count only the number of different camp-like locations in the 2eme Varreux of Cité Soleil, as in the accompanying illustration.
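A filter like that could look something like the following sketch. The rows, the location-type values, and the helper name `filter_rows` are all made up for illustration. Only the three hashtags come from the example above.

```python
from collections import Counter

# Invented sample rows keyed by hashtag; place and location-type values are illustrative only.
rows = [
    {"#adm3": "Cité Soleil", "#adm4": "2eme Varreux", "#loctype": "Camp"},
    {"#adm3": "Cité Soleil", "#adm4": "2eme Varreux", "#loctype": "Camp"},
    {"#adm3": "Cité Soleil", "#adm4": "2eme Varreux", "#loctype": "Collective centre"},
    {"#adm3": "Cité Soleil", "#adm4": "1ere Varreux", "#loctype": "Camp"},
    {"#adm3": "Delmas", "#adm4": "Example section", "#loctype": "Camp"},
]


def filter_rows(rows, criteria):
    """Keep only rows whose values match every hashtag/value pair in criteria."""
    return [
        row for row in rows
        if all(row.get(tag) == value for tag, value in criteria.items())
    ]


# Use the geographic hashtags to narrow the data, then count by location type.
selected = filter_rows(rows, {"#adm3": "Cité Soleil", "#adm4": "2eme Varreux"})
loctype_counts = Counter(row["#loctype"] for row in selected)
```

Because the filter works on hashtags rather than column headers, the same few lines would apply to any dataset that tags its geography columns the same way.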

Of course, all of these analytics could be custom-written for different situations, but the benefit of HXL is that by adding intelligence to spreadsheets using tags, you get some of this analysis for free, across the whole humanitarian community.

To see these examples and many more live, please visit the #HXL showcase (we’ll be adding more humanitarian datasets, visualisations, and analysis every week). The next step, and perhaps the most valuable one, will be demonstrating how HXL tags allow data from multiple sources to be combined into a single common operating picture to help coordinate a crisis response. Tags alone will take us part of the way there, but the next section describes some of the new challenges that the HXL standards community will be addressing in 2015 and beyond.

Beyond tags

Adding simple hashtags to humanitarian data will bring huge benefits, but there will still be challenges for merging data from different sources. For example, if one spreadsheet has the value “WASH” under the column tagged “#sector”, and another spreadsheet has the value “Water Sanitation & Hygiene” under the same column, how can a HXL-aware application know that those are the same sectors when it merges the data? Are “United Nations Children’s Fund” and “UNICEF” the same organisation? Are “Ivory Coast” and “Côte d’Ivoire” different places? The answers to these questions are often obvious to humans, but not so to software.

HXL also defines hashtags for columns that contain unique, machine-readable codes: for example, while “#sector” refers to the name of a sector or cluster, “#sector_id” refers to a unique code for a sector or cluster. However, someone, somewhere, needs to define those codes, and the humanitarian community has to agree on their use.
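To show why shared codes matter, here is a small sketch of how an application might attach a “#sector_id” to each row using a lookup table of known names. The codes (“WSH”, “EDU”) and the alias table are invented purely for illustration. No agreed code list is implied, and `add_sector_id` is a hypothetical helper.

```python
# Hypothetical lookup table: lower-cased sector names mapped to made-up codes.
SECTOR_ALIASES = {
    "wash": "WSH",
    "water sanitation & hygiene": "WSH",
    "education": "EDU",
}


def add_sector_id(row):
    """Return a copy of the row with a #sector_id, or None if the name is unknown."""
    name = row.get("#sector", "").strip().lower()
    return {**row, "#sector_id": SECTOR_ALIASES.get(name)}


rows = [
    {"#org": "OXFAM", "#sector": "WASH"},
    {"#org": "UNICEF", "#sector": "Water Sanitation & Hygiene"},
    {"#org": "UNHCR", "#sector": "Protection"},
]

coded = [add_sector_id(row) for row in rows]
```

The first two rows end up with the same code even though their sector names differ, so they can be merged safely. The third gets no code at all, which is exactly the gap that an agreed, community-maintained code list would need to fill.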

There are initiatives outside the HXL community working on some of these problems. For example, the International Aid Transparency Initiative (IATI) maintains a large set of code lists and identifiers for development aid, the BRIDGE project aims to create a global registry of identifiers for aid organisations, and OCHA’s Common Operational Datasets (CODs) include geographical codes down to a very local level for many countries. In future years, the HXL community will work with these organisations (and many others) to get agreement on the common codes, identifiers, and taxonomies needed for fully-automated data sharing at a detailed level.

But that’s the future. For now, we are working with agencies such as UNHCR and IOM to introduce HXL hashtags into their data, and will soon provide more tools to help the larger humanitarian community create, manage, analyse, and visualise HXL-tagged data. If you’d like to experiment with tagging your own humanitarian data, please get in touch!

The HXL initiative is funded by the Humanitarian Innovation Fund.
