About World Bank Linked Data

What is this?

The data collected from The World Bank comprises World Bank Indicators, World Bank Finances, World Bank Projects and Operations, and World Bank Climate Change data. It was collected by making requests to World Bank API endpoints using the XML format preference.

The data can be used to derive statistical information that’s consumable by humans in the form of charts (see also the Tools section), or compared to statistics from other organizations.

The purpose of the World Bank Linked Data here is to allow consumers and publishers to merge this data with their own, or to link to it for more information.

The datasets are updated monthly to reflect changes in The World Bank’s API output. The last update was made on 2012-08-10.

Who is behind this?

The World Bank Linked Dataspace was created by Sarven Capadisli during his master’s thesis at Linked Data Research Centre, Digital Enterprise Research Institute (DERI), National University of Ireland, Galway. It is one of the case studies in Statistical Linked Dataspaces.

The work was supported by the LATC and LOD2 projects.

Process

The original XML files were transformed to RDF/XML using XSLT 2.0. Saxon’s command-line XSLT and XQuery Processor tool was used for the transformations, and employed as part of Bash scripts to iterate through all the files in the datasets.

In order to import the data into the RDF store efficiently, the rapper RDF parser utility was used to first reserialize each RDF/XML file as N-Triples, appending the output to a single file at run-time before importing.

Apache Jena’s TDB storage system and Fuseki are used to run the SPARQL server. The HTML pages are generated by the Linked Data Pages framework, where Moriarty, Paget, and ARC2 do the heavy lifting.
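The transformation and reserialization steps described above can be sketched as follows. This is only an illustration in Python, not the project's actual Bash scripts: the jar name saxon9he.jar, the stylesheet name, and the directory layout are assumptions, and the code only relies on Saxon's -s:, -xsl:, and -o: options and rapper's -i and -o options.

```python
import subprocess
from pathlib import Path


def saxon_command(xml_file, stylesheet, rdf_file, saxon_jar="saxon9he.jar"):
    """Build the Saxon call that transforms one XML file to RDF/XML.

    The jar name is an assumption; adjust it to the local Saxon install.
    """
    return ["java", "-jar", saxon_jar,
            f"-s:{xml_file}", f"-xsl:{stylesheet}", f"-o:{rdf_file}"]


def rapper_command(rdf_file):
    """Build the rapper call that writes N-Triples for one RDF/XML file to stdout."""
    return ["rapper", "-i", "rdfxml", "-o", "ntriples", str(rdf_file)]


def plan(xml_files, stylesheet, out_dir):
    """Return (saxon, rapper) command pairs, one pair per source XML file."""
    steps = []
    for xml_file in xml_files:
        rdf_file = f"{out_dir}/{Path(xml_file).stem}.rdf"
        steps.append((saxon_command(xml_file, stylesheet, rdf_file),
                      rapper_command(rdf_file)))
    return steps


def run(steps, ntriples_file):
    """Execute the plan, appending every N-Triples result to a single file."""
    with open(ntriples_file, "ab") as out:
        for saxon, rapper in steps:
            subprocess.run(saxon, check=True)
            subprocess.run(rapper, stdout=out, check=True)
```

Calling run(plan(files, "worldbank.xsl", "rdf"), "import.nt") would leave one N-Triples file ready for a bulk import; the file names here are placeholders.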

SPARQL Endpoint

A public SPARQL endpoint is available, which accepts SPARQL 1.1 queries.
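As a rough illustration of querying the endpoint, the sketch below prepares an HTTP request for a SPARQL 1.1 aggregate query that counts observations per Data Cube dataset. The endpoint address http://worldbank.270a.info/sparql is an assumption (the text above does not state it), and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint location; check the site for the actual address.
ENDPOINT = "http://worldbank.270a.info/sparql"

# COUNT and GROUP BY are SPARQL 1.1 features.
QUERY = """
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?dataset (COUNT(?obs) AS ?observations)
WHERE { ?obs a qb:Observation ; qb:dataSet ?dataset . }
GROUP BY ?dataset
"""


def build_request(query, endpoint=ENDPOINT):
    """Prepare a GET request following the SPARQL protocol's 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})
```

Passing the result of build_request(QUERY) to urllib.request.urlopen would execute the query and return the results as JSON, provided the endpoint is reachable.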

About the datasets

There is a VoID file which contains metadata for the datasets. It includes, but is not limited to: the locations of the RDF data dumps, the named graphs used in the SPARQL endpoint, the vocabularies used, and the dataset sizes. Statistics for the VoID file are generated using LODStats. The data dumps are available either as individual RDF/XML files or in compressed gzip format.

RDF/XML, Turtle, and JSON serialization formats are supported for the resolvable URIs on this site. However, the resources in the dataset use generic URIs, i.e., they don't carry a file extension for the serialization format.
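Since the URIs carry no extension, a client would typically pick a serialization through HTTP content negotiation. The sketch below assumes the site honours the standard Accept header with the registered media types for RDF/XML and Turtle; the media type for the JSON serialization is not stated above, so it is left out.

```python
from urllib.request import Request

# Registered media types; that the site negotiates on them is an assumption.
MEDIA_TYPES = {
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
}


def negotiate(uri, fmt):
    """Prepare a request for a generic URI, asking for one serialization."""
    return Request(uri, headers={"Accept": MEDIA_TYPES[fmt]})
```

For example, negotiate("http://worldbank.270a.info/classification/country/CA", "turtle") asks for the Turtle description of Canada without changing the URI itself.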

The World Bank group is working hard to make their data available through their APIs. Sometimes things change. I'm playing safe with the Projects and Operations dataset, because its API is in testing, i.e., while all the projects are listed, I'm holding back on some unstable parts of the data. Therefore, the data dumps and what’s on this site are what I consider to be safe enough. The size of each dataset is given as a rounded number of triples. See the VoID file for exact numbers.

Dataset | Completeness | # of triples
World Bank Climate Change | Complete | 78 million
World Bank Finances | Complete | 8 million
World Bank Projects and Operations | Incomplete | 1 million
World Bank Indicators | Complete | 87 million

The World Bank Metadata consists of 280k triples. There are also triples added which link from the datasets to their observations.

Decisions on source data

Herein is a list of some of the limitations and inconsistencies in the original data which introduced an extra layer of problems. In order to arrive at a proper and useful Linked Data representation, some of these problems were solved either with a script or by manual curation, and others were brought to the World Bank team’s attention for investigation.

Missing units

The World Bank Indicators statistics consist of a collection of various indicators in different measurement units. At the time of this writing, these units are only provided in the source data as part of the indicator name string, as opposed to an explicit XML node.
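To illustrate the problem, a consumer who needs the unit would have to recover it from the name. The sketch below is a hypothetical heuristic, not something the dataset itself does: it treats a trailing parenthetical in the indicator name as the unit, which will misfire on names whose final parenthetical is not a unit.

```python
import re

# A name followed by one trailing "(...)" group with no nested parentheses.
_TRAILING_PARENS = re.compile(r"^(?P<label>.*\S)\s*\((?P<unit>[^()]*)\)\s*$")


def split_unit(indicator_name):
    """Return (label, unit); unit is None when no trailing parenthetical exists."""
    match = _TRAILING_PARENS.match(indicator_name)
    if match is None:
        return indicator_name.strip(), None
    return match.group("label"), match.group("unit").strip()
```

With this heuristic, "GDP per capita (current US$)" splits into the label "GDP per capita" and the unit "current US$", while "Population, total" is returned unchanged with no unit.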

Excluding data

Missing values

Some of the observations in the World Bank Indicators dataset do not have measured data. The nodes for the values were present in the API response, but they contained no numerical values. Hence, in order to keep the RDFized version of the dataset lean, these observations were excluded in the data transformation phase.
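The exclusion rule can be sketched as below. The project applied it in XSLT; this Python version is only an illustration, and the sample document, including its values, is made up to resemble the shape of an API response.

```python
import xml.etree.ElementTree as ET

WB_NS = {"wb": "http://www.worldbank.org"}

# Made-up sample: the second observation has an empty value node.
SAMPLE = """<wb:data xmlns:wb="http://www.worldbank.org">
  <wb:data>
    <wb:indicator id="SP.POP.TOTL">Population, total</wb:indicator>
    <wb:country id="CA">Canada</wb:country>
    <wb:date>2010</wb:date>
    <wb:value>123</wb:value>
  </wb:data>
  <wb:data>
    <wb:indicator id="SP.POP.TOTL">Population, total</wb:indicator>
    <wb:country id="CA">Canada</wb:country>
    <wb:date>2011</wb:date>
    <wb:value/>
  </wb:data>
</wb:data>"""


def observations_with_values(xml_text):
    """Keep only the observations whose value node holds some text."""
    root = ET.fromstring(xml_text)
    kept = []
    for obs in root.findall("wb:data", WB_NS):
        value = (obs.findtext("wb:value", default="", namespaces=WB_NS) or "").strip()
        if not value:
            continue  # value node present but empty: skip the observation
        kept.append({
            "indicator": obs.find("wb:indicator", WB_NS).get("id"),
            "country": obs.find("wb:country", WB_NS).get("id"),
            "date": obs.findtext("wb:date", default="", namespaces=WB_NS),
            "value": value,
        })
    return kept
```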

Aggregated data

While the World Bank API provides an endpoint to pull aggregated data over the WDI, these particular calls were left out of the data collection process since the atomic parts of the data were collected from individual calls.

MRV values

Most recent values (MRV) were incorrectly introduced to non-date API calls in WDI. These observation nodes were excluded in the transformation phase since the data already contained observations with corresponding reference periods.

Naming patterns

Different naming patterns were identified across World Bank datasets. Some of these are as follows:

Region names

Region names as used in the WDI and WBF datasets differed: although they essentially conveyed the same meaning, the labels did not match exactly. In order to have URIs for the region labels in the WBF observations, and to simplify the linking process, unique region names from the WBF observations were added to the region resources in WDI as alternative labels. During the XSLT process, these alternative labels were matched with the labels in the observations themselves to arrive at their canonical representations.
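The label matching can be pictured as a lookup from every known label to one canonical region URI. The sketch below is a Python illustration of that idea rather than the XSLT that was actually used; the region identifier R1 and both labels are hypothetical placeholders.

```python
# Hypothetical region resource with its preferred and alternative labels.
REGIONS = {
    "http://worldbank.270a.info/classification/region/R1": {
        "prefLabel": "East Asia & Pacific",
        "altLabels": ["EAST ASIA AND PACIFIC"],
    },
}


def build_index(regions):
    """Map every preferred and alternative label to its region URI."""
    index = {}
    for uri, labels in regions.items():
        index[labels["prefLabel"]] = uri
        for alt in labels["altLabels"]:
            index[alt] = uri
    return index


def canonical_region(label, index):
    """Return the canonical URI for a label as found in an observation, or None."""
    return index.get(label.strip())
```

Any label that is missing from the index returns None, which makes unmatched region names easy to collect and review by hand.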

Credit and Loan names

Based on a private discussion with the WFI team, it was determined that the vocabulary terms Credit Status and Loan Status, as well as Credit Number and Loan Number, were used interchangeably. Thus, the canonical representation for the Linked Data URI pattern uses one of each pair: Loan Status and Loan Number.

Missing countries

Some country codes were identified in the WDI observations that were not defined in the WDI country code list. These were later added to the original data.

Data modeling

The data is primarily composed of observations (e.g., indicators, financial loans, climate data for countries) modeled using the RDF Data Cube vocabulary. There are also code lists for classifications like countries, currencies, projects, loan numbers, global circulation models and so on.

Data interlinking

Interlinks are created with the LInk discovery framework for MEtric Spaces (LIMES). Some of the concepts in the code lists were manually matched to DBpedia with corresponding skos:exactMatch or skos:closeMatch links. There are interlinks to: Transparency International, BFS, IMF, ECB, FAO, UIS, and FRB Linked Dataspaces, as well as DBpedia, NASA, Eurostat, Geonames, IATA, and Humanitarian Response. See also: 270a.info.

Additional interlinking was done by adding links to resources with corresponding homepages on the World Bank site, as well as links to referenced documents.

Data enrichment

A code list for currencies was created based on a currency and funds code list to represent the SDMX attributes for the amount measurements in the World Bank Finances datasets. Each currency was also linked to each country which officially uses it.

Given that some of the codes in the World Bank country code list do not represent countries, e.g., 1W representing World, only the resources that represent a real country were given an additional rdf:type of dbo:Country.

Vocabularies

Besides RDF, RDFS, XSD, and OWL, the most common vocabularies in these datasets are: RDF Data Cube for modeling statistical observations, SDMX for statistical codes, British reference periods (Year, Gregorian Interval), SKOS, and DC Terms. Where appropriate, new properties and classifications were created to represent World Bank Linked Data. The URI patterns section gives a further breakdown of this.

Properties that are semantically the same yet syntactically different in the source data were collapsed into a single namespace in order to have a canonical name across the datasets, as well as to minimize the number of vocabulary terms.

In the case of country codes, it should be noted that ISO 3166-1 alpha-2 is used as the primary representation for countries. For example, the URI http://worldbank.270a.info/classification/country/CA identifies the country Canada in the datasets. It contains a skos:exactMatch to http://worldbank.270a.info/classification/country/CAN and vice versa, with the 2-letter and 3-letter codes given as skos:notation values.

Blank nodes

By and large, the datasets do not contain blank nodes (bnodes), with the exception of unavoidable ones in the Projects and Operations code list. Given the nature of the WBPO API response at the time, the decision was not to create arbitrary URIs which may or may not need to exist at a later date.

Normalization

Data was only altered by removing white-space at the start and end of text content. Some of the dates in the data were transformed into equivalent representations in ISO 8601 format.
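A minimal sketch of these two normalization steps is shown below. The source date format is not stated above, so the month/day/year pattern used as the default is purely an assumption for illustration.

```python
from datetime import datetime


def normalise_text(text):
    """Remove white-space at the start and end of text content only."""
    return text.strip()


def to_iso_date(text, source_format="%m/%d/%Y"):
    """Re-express a date in ISO 8601 (YYYY-MM-DD).

    The default source_format is an assumed example, not the format
    used by the World Bank data.
    """
    return datetime.strptime(text.strip(), source_format).date().isoformat()
```

Inner white-space is deliberately left untouched, in line with the statement that the data was not otherwise altered.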

Data provenance

As part of data enrichment, triples pertaining to provenance were added in order to provide some extra metadata about the data. For the datasets and the observations in these datasets, they address the following information:

Provenance in World Bank Linked Datasets

Type of provenance | World Bank
Defining source | rdfs:isDefinedBy
License | dcterms:license
Source location | dcterms:source
Related resource | dcterms:hasPart, dcterms:isPartOf
Creator of the data | dcterms:creator
Publisher of the data | dcterms:publisher
Creation date | dcterms:created
Issued date | dcterms:issued
Modified date | dcterms:modified

URI patterns

Classifications
http://worldbank.270a.info/classification/{id}, where id is one of: country, income-level, indicator, lending-type, region, source, topic, project, currency, loan-type, loan-status, variable, global-circulation-model, scenario, basin.
Properties
http://worldbank.270a.info/property/{id}, similar to the above; there are too many to list here.
Data Cube datasets
http://worldbank.270a.info/dataset/{id}, where id is one of: world-bank-indicators, world-bank-finances, world-bank-climates.
Named graphs in RDF store
http://worldbank.270a.info/graph/{id}, where id is one of: meta, world-bank-indicators, world-bank-finances, world-bank-climates, world-bank-projects-and-operations.
World Bank Indicators
http://worldbank.270a.info/dataset/world-bank-indicators/{id}/{country}/{year}, where id is an indicator code, country is a country code, and year is in YYYY format.
World Bank Finances
http://worldbank.270a.info/dataset/world-bank-finances/{id}/{rowid}, where id is a financial dataset code and rowid is a positive integer.
World Bank Climate Change
http://worldbank.270a.info/dataset/world-bank-climates/{id}/{various patterns separated by slash}, where id is a climate change dataset code.
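The patterns above lend themselves to small helper functions. The sketch below is one possible way to build such URIs on the client side; the function names are made up for this example, and the codes passed in are assumed to be URI-safe already.

```python
BASE = "http://worldbank.270a.info"


def classification_uri(kind, code):
    """URI of one concept in a code list, e.g. kind='country', code='CA'."""
    return f"{BASE}/classification/{kind}/{code}"


def indicator_observation_uri(indicator, country, year):
    """URI of a World Bank Indicators observation; year must be YYYY."""
    year = str(year)
    if len(year) != 4 or not year.isdigit():
        raise ValueError("year must be given as YYYY")
    return f"{BASE}/dataset/world-bank-indicators/{indicator}/{country}/{year}"


def finance_observation_uri(dataset_code, row_id):
    """URI of a World Bank Finances observation; row_id must be a positive integer."""
    row_id = int(row_id)
    if row_id < 1:
        raise ValueError("rowid must be a positive integer")
    return f"{BASE}/dataset/world-bank-finances/{dataset_code}/{row_id}"
```

For instance, indicator_observation_uri("SP.POP.TOTL", "CA", 2010) yields http://worldbank.270a.info/dataset/world-bank-indicators/SP.POP.TOTL/CA/2010.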

Notes

The alternate formats listed at the bottom of the HTML page for a given resource are currently generated from a SPARQL query. They may contain additional triples, such as labels for the vocabulary terms, that are not in the RDF dumps, so you should keep the difference in mind.

Source Code

The code which retrieves the World Bank data, transforms it to RDF serializations, and imports it into the TDB triple store can be found on GitHub: csarven/worldbank-linkeddata. It is released under the Apache License 2.0.

Terms of use

The material on this site is not endorsed by The World Bank. The data on this site comes with no warranty. Hence, I am not responsible if chaos ensues on any level at any point in time, in any universe, in any dimension, in any anything. My responsibility is to make sure that the data here is represented using the Linked Data design principles. Unless stated otherwise, the data is not altered during the transformation from The World Bank data source. You shall assume that this data does contain errors, and you shall resort to the data provided by the original content provider if you have doubts about its validity. Make sure to check with The World Bank’s Terms and Restrictions as well, and everything else that they say. If you spot errors in the data here, let’s fix them. If I’ve done something else wrong, please inform me so I can make it right. If you agree with this paragraph, you may use this data.

Data License

With the exception of World Bank’s own licensing, the Linked Data version of this data is licensed under CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.

Datasets from The World Bank

The data in this table comes from a SPARQL query.

Name | Source
Commercial Credit Exposure by Counterparty Rating | https://finances.worldbank.org/api/views/p65j-3upu/rows.xml
Contributions to Financial Intermediary Funds | https://finances.worldbank.org/api/views/536v-dxib/rows.xml
Financial Intermediary Funds Cash Transfers | https://finances.worldbank.org/api/views/h4s8-nwev/rows.xml
Financial Intermediary Funds Commitments | https://finances.worldbank.org/api/views/fie8-6fxn/rows.xml
Financial intermediary Funds Funding Decisions | https://finances.worldbank.org/api/views/ax5s-vav5/rows.xml
IBRD Balance Sheet FY2011 | https://finances.worldbank.org/api/views/4i57-byta/rows.xml
IBRD Balance Sheet, FY2010 | https://finances.worldbank.org/api/views/e8yz-96c6/rows.xml
IBRD Statement Of Income, FY2011 | https://finances.worldbank.org/api/views/eycy-ub35/rows.xml
IBRD Statement Of Loans - Historical Data | https://finances.worldbank.org/api/views/zucq-nrc3/rows.xml
IBRD Statement of Cash Flows, FY2011 | https://finances.worldbank.org/api/views/zyqx-8e4a/rows.xml
IBRD Statement of Cash Flows, FY2010 | https://finances.worldbank.org/api/views/xs8h-cwh5/rows.xml
IBRD Statement of Income, FY2010 | https://finances.worldbank.org/api/views/pyda-ktbg/rows.xml
IBRD Statement of Loans - Latest Available Snapshot | https://finances.worldbank.org/api/views/sfv5-tf7p/rows.xml
IBRD Statement of Subscriptions to Capital Stock and Voting Power as of June 30, 2011 | https://finances.worldbank.org/api/views/rcx4-r7xj/rows.xml
IBRD and IDA Operational Summary | https://finances.worldbank.org/api/views/jeqz-f7mn/rows.xml
IBRD/IDA/IFC Trust Funds - Annual Cash Contributions and Disbursements | https://finances.worldbank.org/api/views/iww5-3sst/rows.xml
IDA Balance Sheet, FY2010 | https://finances.worldbank.org/api/views/s3ey-mkx3/rows.xml
IDA Balance Sheet, FY2011 | https://finances.worldbank.org/api/views/ri54-wt6e/rows.xml
IDA Statement Of Cash Flows FY2011 | https://finances.worldbank.org/api/views/i7za-uwi5/rows.xml
IDA Statement Of Credits and Grants - Historical Data | https://finances.worldbank.org/api/views/tdwh-3krx/rows.xml
IDA Statement Of Income FY2011 | https://finances.worldbank.org/api/views/wphw-pasx/rows.xml
IDA Statement of Cash Flows, FY2010 | https://finances.worldbank.org/api/views/9pv4-rtrm/rows.xml
IDA Statement of Credits and Grants - Latest Available Snapshot | https://finances.worldbank.org/api/views/ebmi-69yj/rows.xml
IDA Statement of Income, FY2010 | https://finances.worldbank.org/api/views/kmwd-f4rk/rows.xml
IDA Statement of Voting Power and Subscriptions and Contributions as of June 30, 2011 | https://finances.worldbank.org/api/views/v84d-dq44/rows.xml
Paid In Contributions to IBRD/IDA/IFC Trust Funds based on FY of Receipt | https://finances.worldbank.org/api/views/nh5z-5qch/rows.xml
Recipient-executed Grants - Commitments and Disbursements | https://finances.worldbank.org/api/views/h9ga-h5eb/rows.xml
The World Bank Components Of VPU Budget | https://finances.worldbank.org/api/views/csrh-vv7b/rows.xml
Total Contributions to IBRD/IDA/IFC Trust Funds - Summary based on FY of Agreement | https://finances.worldbank.org/api/views/m54j-ersw/rows.xml
World Bank Expenditures By Program | https://finances.worldbank.org/api/views/hcqu-nmwb/rows.xml
World Bank Program Budget | https://finances.worldbank.org/api/views/gprm-cvxz/rows.xml
World Bank Climate Change | http://climatedataapi.worldbank.org/climateweb/rest/
World Bank Indicators | http://api.worldbank.org/indicators?format=xml

Application

The WBLD application presents the data in the form of chart visualizations. A custom API is built to pull the necessary data for the application. The parameters for the API are:

  • indicator, which accepts a single indicator code (skos:notation of the indicator URI)
  • country, which accepts multiple country codes (skos:notation of the country URI)
  • year, which accepts a year in YYYY format

The indicator parameter is required, as one of the dimensions in the observation needs to be known. The other required dimension is either country or year.

Two API calls are made because of the modular design approach: the first call gets the metadata about the indicator, whereas the second call collects either all of the observations for the countries with that indicator, or all of the observations for a given reference period with that indicator. The response data from the API is requested in JSON format in order to pass it on to the JavaScript library which handles the visualizations.
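The parameter rules can be summarised in a small validation sketch. The function name is made up for this example, and joining multiple country codes with commas is an assumption, since the text above does not say how they are passed.

```python
from urllib.parse import urlencode


def build_query(indicator, countries=None, year=None):
    """Validate the API parameters and return them as a query string.

    indicator is always required; at least one of countries or year
    must be given as the second dimension.
    """
    if not indicator:
        raise ValueError("indicator is required")
    if not countries and year is None:
        raise ValueError("either country or year is required")

    params = [("indicator", indicator)]
    if countries:
        # Assumption: multiple country codes are comma-separated.
        params.append(("country", ",".join(countries)))
    if year is not None:
        year = str(year)
        if len(year) != 4 or not year.isdigit():
            raise ValueError("year must be given as YYYY")
        params.append(("year", year))
    return urlencode(params)
```

The resulting string would be appended to the address of the application's API, which is not given above.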

Chart visualizations

The Tools section on the site uses Google Charts Tools to create the visualizations.

Visualizing World Bank Indicators

Depending on the user selections, and the corresponding API call, two possible charts are generated:

Motion chart

It consists of three different views: a bubble chart, a bar chart, and a line chart. This chart is intended for observation values in countries, over a time period, for an indicator. Unique colours are assigned to each country so that they are easy to tell apart. The reference period runs on the x-axis, whereas the measured values run on the y-axis.

Geo chart

It consists of a world map view where countries are separated by their official borders. This chart is used to view observation values for a time period for all the countries in the world. The legend takes the form of a colour spectrum from the lowest to the highest measured values. The corresponding colours are assigned to each country on the map.

Page notice
  • Last updated on 2014-06-25.