WPS8284


Policy Research Working Paper                     8284




                        Poverty from Space
             Using High-Resolution Satellite Imagery
              for Estimating Economic Well-Being

                                 Ryan Engstrom
                                 Jonathan Hersh
                                 David Newhouse




Poverty and Equity Global Practice Group
December 2017


  Abstract
  Can features extracted from high spatial resolution satellite imagery accurately estimate poverty and economic
  well-being? This paper investigates this question by extracting object and texture features from satellite images
  of Sri Lanka, which are used to estimate poverty rates and average log consumption for 1,291 administrative units
  (Grama Niladhari divisions). The features that were extracted include the number and density of buildings,
  prevalence of shadows, number of cars, density and length of roads, type of agriculture, roof material, and a suite
  of texture and spectral features calculated using a nonoverlapping box approach. A simple linear regression model,
  using only these inputs as explanatory variables, explains nearly 60 percent of the variation in poverty headcount
  rates and average log consumption. In comparison, models built using night-time lights explain only 15 percent of
  the variation in poverty or income. The predictions remain accurate when restricting the sample to poorer Grama
  Niladhari divisions. Two sample applications, extrapolating predictions into adjacent areas and estimating local
  area poverty using an artificially reduced census, confirm the out-of-sample predictive capabilities.




  This paper is a product of the Poverty and Equity Global Practice Group. It is part of a larger effort by the World Bank to
  provide open access to its research and make a contribution to development policy discussions around the world. Policy
  Research Working Papers are also posted on the Web at http://econ.worldbank.org. The authors may be contacted at
  dnewhouse@worldbank.org.




          The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
          issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
          names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
          of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
          its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.


                                                        Produced by the Research Support Team
                Poverty from Space: Using High-Resolution Satellite Imagery for
                              Estimating Economic Well-Being1
                                               Ryan Engstrom2
                                               Jonathan Hersh3
                                              David Newhouse4




    Keywords: poverty estimation, satellite imagery, machine learning
    JEL classification: I32, C50




1
  This project benefited greatly from discussions with Sarah Antos, Ana Areias, Marianne Baxter, Sam Bazzi, Azer
Bestavros, Jacob Bien, Kristen Butcher, John Byers, Pedro Conceição, Francisco Ferreira, Ray Fisman, Alex Guzey,
Klaus-Peter Hellwig, Kristen Himelein, Selim Jahan, Matthew Kahn, Tariq Khokhar, Hannes Mueller, Trevor
Monroe, Dilip Mookherjee, Pierre Perron, Hashem Pesaran, Bruno Sánchez-Andrade Nuño, Kiwako Sakamoto,
Jacob Shapiro, David Shor, Benjamin Stewart, Andrew Whitby, Nat Wilcox, Nobuo Yoshida and seminar
participants at Boston University, Chapman University, University of Southern California, Princeton University,
UNDP, The World Bank, and the Department of Census and Statistics of Sri Lanka. All remaining errors in this
paper are the sole responsibility of the authors. Sarah Antos, Benjamin Stewart, and Andrew Copenhaver
provided assistance with texture feature classification. Object imagery classification was assisted by James
Crawford, Jeff Stein, and Nitin Panjwani at Orbital Insight, and Nick Hubing, Jacqlyn Ducharme, and Chris Lowe at
Land Info, who also oversaw imagery pre-processing. Hafiz Zainudeen helped validate roof classifications in
Colombo. Colleen Ditmars and her team at DigitalGlobe facilitated imagery acquisition, Dung Doan and Dilhanie
Deepawansa developed and shared the census-based poverty estimates, and we thank Dr. Amare Satharasinghe for
authorizing the use of the Sri Lankan census data. Liang Xu provided research assistance. Zubair Bhatti, Benu
Bidani, Christina Malmberg-Calvo, Adarsh Desai, Nelly Obias, Dhusynanth Raju, Martin Rama, and Ana Revenga
provided additional support and encouragement. The authors gratefully acknowledge financial support from the
Strategic Research Program and World Bank Big Data for Innovation Challenge Grant, and the Hariri Institute at
Boston University. The views expressed here do not necessarily reflect the views of the World Bank Group or its
executive board, and should not be interpreted as such.
2
  rengstro@gwu.edu Department of Geography, George Washington University, 1922 F Street, Washington DC
3
  hersh@chapman.edu Argyros School of Business, Chapman University, 1 University Dr., Orange, CA
4
  dnewhouse@worldbank.org, Poverty and Equity Global Practice, World Bank, 1818 H Street, Washington DC
1   Introduction
Despite the best efforts of national statistics offices and the international development
community, local area estimates of poverty and economic welfare remain rare. Between 2002
and 2011, as many as 57 countries conducted zero or only one survey capable of producing
poverty statistics, and data are scarcest in the poorest countries (Serajuddin et al., 2015). But
even in countries where data are collected regularly, household surveys are typically too small to
produce reliable estimates below the district level. Generating welfare estimates for smaller areas
requires both a household welfare survey and contemporaneous census data, and the latter are
typically available once per decade at best. Furthermore, safety concerns may prohibit survey
data collection in many conflict areas altogether.

Satellite imagery has generated considerable enthusiasm as a potential supplement to household
data that can help fill these severe data gaps. In recent years, private companies such as
DigitalGlobe and Airbus have rapidly expanded the coverage and availability of high spatial
resolution imagery (HSRI), driving down commercial prices. Planet (formerly Planet Labs)
currently operates more satellites than any organization other than the U.S. and Russian
governments, and recently launched 88 Dove satellites that will allow daily coverage of the
entire globe at a resolution of 3 to 5 m per pixel.
Continued technological advances will increasingly allow social scientists to benefit from this type
of imagery, which has been utilized intensively by the intelligence and military communities for
decades.

This paper investigates the ability of object and texture features derived from HSRI to estimate
and predict poverty rates at local levels. Our study area covers 3,500 square kilometers in Sri
Lanka, which contains 1,291 villages (Grama Niladhari (GN) divisions). For each village, we
extract both object and texture features to use as explanatory variables in poverty prediction
models. The object features extracted include the number of cars, the number and size of
buildings, the type of farmland (plantation or paddy), the type of roofs, the share of shadow pixels
(a proxy for building height), and road extent and road material; these are complemented by
textural measures. The features are identified using a combination of deep learning-based
Convolutional Neural Networks (CNN) and classification of spectral and textural characteristics.
These satellite-derived features are then matched to estimates of household per capita
consumption imputed into the 2011 Census for the 1,291 GN divisions.

We investigate five main questions: 1) To what extent can variation in GN economic well-being
(poverty rates defined at the 10th and 40th percentiles of national per capita consumption, and
average GN consumption) be explained by high spatial resolution features? 2) Which features are
most strongly correlated with these measures of well-being? 3) Do these features predict equally
well in poor and rich GNs? In urban and rural GNs? 4) Can these models predict into
geographically adjacent areas? 5) Are predictions robust to the use of a smaller sample of training
data?

We find that: i) satellite features are highly predictive of economic well-being and explain about
60 percent of the variation in both village average consumption and estimated poverty headcount
rates; ii) built-up area and roof type strongly correlate with welfare, car counts and building height
are strong correlates in urban areas, and the share of paved roads and agricultural type are strong
correlates in rural areas; iii) accuracy declines only slightly in the poorest decile of villages
(average consumption of $4.67 per day), and models are less accurate in urban areas than in rural
ones; iv) predicting into adjacent areas produces less accurate poverty measures, but the rank
correlation between true and predicted rates is moderately high; and v) using a 1 percent sample
of the census-based ground truth, designed to mimic the sampling strategy of the Household
Income and Expenditure Survey, has little impact on the accuracy of the predictions.

This paper contributes to a growing literature exploring how remotely sensed data may be used
to assess welfare. Traditionally, the most popular remotely sensed measure for economic
applications has been night-time lights (NTL), which measure the intensity of light captured
passively by satellite. Strong correlations between NTL and GDP appear at the country level
(Henderson et al., 2009; Pinkovskiy and Sala-i-Martin, 2016), although within a country NTL
appears more strongly correlated with population density than with welfare. The relationship
between lights and wages or other measures of income appears weak (Mellander et al., 2013),
casting doubt on the reliability of NTL as a proxy for small area estimates of welfare. Additionally,
NTL is ill-suited for identifying variation in welfare within small areas because of its low spatial
resolution. Even the most advanced NTL sensor, the Visible Infrared Imaging Radiometer Suite
(VIIRS), has a spatial resolution at nadir of approximately 1.0 km².⁵ Indeed, we find that NTL
captures only 15 percent of the variation in poverty or income in the same area where high
resolution spatial features capture 60 percent of the variation.

Daytime imagery has recently emerged as a practical source of information on welfare, in large
part due to new developments in computer vision algorithms. Advances in deep learning, such as
Convolutional Neural Networks (CNN) (Krizhevsky, Sutskever, and Hinton, 2012), make it possible
to classify objects such as cars, buildings, roads, crops, and roof types algorithmically. These
objects may be more strongly correlated with local income and wealth than NTL. Furthermore,
textural and spectral algorithms provide a simpler alternative to analyzing HSRI that does not rely
on object classification (Graesser et al. 2012, Engstrom et al. 2015, Sandborn and Engstrom 2016).
In this approach, the spatial and spectral variations in imagery are calculated over a neighborhood
of pixels to characterize the local scale spatial pattern of the objects observed in the imagery. These
measures, which we refer to as “texture” or “spectral” measures, capture information about an area
that may not be clear from object recognition alone.
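The neighborhood idea can be illustrated with a deliberately simple texture statistic. The sketch below is not the paper's feature code: it computes the standard deviation of pixel values within non-overlapping square blocks (the block size is a hypothetical parameter). This is far simpler than measures such as Pantex, Gabor filters, or histograms of oriented gradients, but it shows how a per-pixel image is reduced to one number per neighborhood.

```python
import numpy as np


def block_texture(band: np.ndarray, block: int) -> np.ndarray:
    """Standard deviation of pixel values in non-overlapping block x block windows.

    Rows/columns that do not fill a complete block are cropped.
    """
    rows, cols = band.shape[0] // block, band.shape[1] // block
    # Reshape so that axes 1 and 3 index the pixels inside each block.
    tiles = band[: rows * block, : cols * block].reshape(rows, block, cols, block)
    return tiles.std(axis=(1, 3))


# Toy single-band image: a uniform surface vs. a high-frequency pattern
flat = np.full((64, 64), 100.0)
busy = flat.copy()
busy[::2, ::2] = 200.0  # regular bright pixels, loosely mimicking dense small roofs

flat_texture = block_texture(flat, 8)  # one value per 8 x 8 block
busy_texture = block_texture(busy, 8)
print(flat_texture.shape, flat_texture.max(), busy_texture.min())
```

Uniform blocks get a texture value of zero, while the patterned blocks get a positive value, even though both images have similar average brightness.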

This paper also contributes to a literature exploring how supervised learning techniques from
machine learning may be applied to unstructured data to reveal information about human welfare
(Athey, 2017). Glaeser, Kominers, Luca, and Naik (2015) apply texture-based machine vision
classification to images captured from Google Street View, trained on subjective ratings of the
images' perceived safety. They estimate a support vector machine model and show that the fitted
model can reliably predict block-level income in New York City. Jean et al. (2016) employ an
innovative transfer learning approach, in which a set of 4,096 unstructured features is extracted
from the penultimate layer of a convolutional neural network trained on Google Earth daytime
imagery to predict NTL luminosity. These 4,096 features are then used, via ridge regression to
prevent overfitting, to predict the average per capita consumption of enumeration areas (villages)
taken from Living Standards Measurement Study surveys.
The resulting model predicts well and explains an average of 46 percent of the variation in
village per capita consumption, out of sample, across the four countries in which it was trained.

5
    Pixel size can vary depending on the angle of the satellite relative to the ground site.

While this innovative use of daytime imagery substantially improves on the use of night-time
lights alone, it raises two issues for poverty measurement. First, it is not clear that the transfer
learning method generalizes well to other contexts where population density is low. Extensions of
this approach to Haiti and Nepal (Head et al., 2017) show declines in predictive power, suggesting
that the NTL step in the transfer learning process may be ill-suited for poor, low-density areas.
Second, the transfer learning method is not necessarily optimal for predicting welfare in very poor
areas. When the top two quintiles are excluded from their sample, restricting the sample to those
below twice the international poverty line, the R² falls precipitously, to about 0.12. This illustrates
the challenges this method faces in distinguishing welfare among the poorest of the poor, who in
the African context most likely live in relative darkness.6
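The ridge regression step that Jean et al. (2016) use to map thousands of image features to consumption can be illustrated with the textbook closed-form estimator beta = (X'X + lam*I)^(-1) X'y on centered data, so that the intercept is not penalized. This is a generic sketch on simulated data, not their code; the penalty `lam` is a hypothetical value that would normally be chosen by cross-validation. It also shows why ridge suits this setting: with more features than villages, ordinary least squares has no unique solution, while the penalized system remains solvable.

```python
import numpy as np


def ridge_fit(X: np.ndarray, y: np.ndarray, lam: float):
    """Closed-form ridge regression; centering leaves the intercept unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    # Solve (Xc'Xc + lam * I) beta = Xc'yc rather than forming the inverse explicitly
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def ridge_predict(X: np.ndarray, intercept: float, beta: np.ndarray) -> np.ndarray:
    return intercept + X @ beta


rng = np.random.default_rng(0)
n_villages, n_features = 50, 200  # more features than observations
X = rng.normal(size=(n_villages, n_features))
y = X[:, 0] + rng.normal(scale=0.5, size=n_villages)  # simulated outcome

for lam in (1.0, 100.0):
    intercept, beta = ridge_fit(X, y, lam)
    print(lam, np.linalg.norm(beta))  # coefficients shrink as lam grows
```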

This study uses features derived from HSRI that are based either on recognizable objects or on
“texture” algorithms developed for computer vision applications. This method offers several
advantages for the estimation of poverty rates. First,
it eliminates reliance on NTL, which is a coarse measure of welfare, to identify relevant features
for model development. Second, it provides a more transparent understanding of the underlying
factors that explain geographic variation in welfare in different contexts. Third, features developed
from HSRI, such as roads and the extent of built-up area, are useful for policy analysis in other
areas, such as transport and urban planning. Finally, a feature-based approach can easily be
extended to alternative welfare indicators, such as headcount poverty rates measured at different
thresholds.

The paper proceeds as follows: Section 2 summarizes how the data were created and presents brief
summary statistics. Section 3 presents the statistical methodology. Section 4 examines the
predictive power of high resolution satellite features (HRSF) for estimating poverty in small areas
at the village level. Section 5 examines out-of-sample performance using two applications to
estimating local area economic well-being. Section 6 concludes.


2    Data Description
Our analysis is restricted to a sample area of approximately 3,500 km² in Sri Lanka. National
coverage was not feasible due to the high cost and partial availability of high-resolution imagery;
however, these data are rapidly becoming more available and less expensive as companies such as
Planet and DigitalGlobe expand their archives and launch newer, more precise satellites with
more frequent revisit rates. We sampled Divisional Secretariat (DS) divisions conditional on HSRI
being available, drawing areas from the urban, rural, and estate sectors.7 According to the 2012
census, the population of Sri Lanka by sector is rural (77.4%), urban (18.2%), and estate (4.4%)
(Sri Lanka Department of Census and Statistics, 2012). Population by sector in our sample is rural
(45.9%), urban (46.2%), and estate (7.8%).

6
  It is not straightforward to generate estimates of poverty headcount rates from predictions of mean consumption at
the village level, since poverty rates depend on the dispersion of welfare within each village as well as its mean.
7
 Sri Lanka classifies sectors as urban, rural, or estate. The estate sector refers to plantation areas of more than 20
acres with 10 or more resident laborers. Except for sample stratification, the estate sector is grouped with the rural
sector.

2.1     Details on Satellite Imagery
The satellite imagery consists of 55 unique “scenes” purchased from DigitalGlobe, covering our
sample area. Each “scene” is an individual image captured by a particular sensor at a particular
time. Images were acquired by three different sensors: WorldView-2, GeoEye-1, and QuickBird-2.
These sensors have spatial resolutions of 0.46 m, 0.41 m, and 0.61 m, respectively, in the
panchromatic band and 1.84 m, 1.65 m, and 2.4 m, respectively, in the multispectral bands.
Pre-processing of the imagery included pan-sharpening, orthorectification, and image mosaicking.
Most imagery was captured in 2011 or 2012, although some imagery from 2010 was also used.




               Figure 1: Coverage Area of High Resolution Satellite Imagery
                            Notes: Sample area shown highlighted in white.


2.2     Details on Poverty Data
Ideally, village poverty and consumption statistics would be generated directly from the 2012/13
Household Income and Expenditure Survey (HIES), a detailed survey that measures the
consumption patterns of 25,000 households on approximately 400 consumption items. However,
the survey contains an average of only 8.4 households per GN division in the 47 sampled DS
divisions, which is insufficient to generate reliable poverty estimates at the GN division level
without supplementary data. We therefore use the most common method for imputing welfare
estimates into census data (Elbers, Lanjouw, and Lanjouw, 2003), applied to the 2011 Census of
Population and Housing; this is the same method used to generate official poverty estimates at
the DS division level (Department of Census and Statistics and World Bank, 2015). For each
household in the census, per capita consumption was estimated from models developed with the
HIES, using household indicators that are common to both the Census and the HIES.8 We derive
GN headcount poverty rates using the standard Foster-Greer-Thorbecke method (Foster et al.,
1984) for two poverty lines: poverty line 1 at the 10th percentile of the national per capita
consumption distribution, and poverty line 2 at the 40th percentile. These are equivalent to $3.00
and $5.13 per day, respectively, in 2011 PPP terms, compared with the international extreme
poverty line of $1.90 per day in 2011 PPP.
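The headcount calculation can be sketched as follows. The Foster-Greer-Thorbecke index for a poverty line z is P_alpha = (1/N) * sum over poor i of ((z - y_i)/z)^alpha, and alpha = 0 gives the headcount ratio. The snippet uses simulated lognormal consumption in place of the census imputations; the distribution parameters and village sizes are hypothetical, and household weights are ignored. Its second half illustrates the point made in footnote 6: two villages with the same mean consumption can have very different headcount rates when within-village dispersion differs.

```python
import numpy as np


def fgt(consumption, z, alpha=0):
    """Foster-Greer-Thorbecke index P_alpha for poverty line z.

    alpha=0 gives the headcount ratio, alpha=1 the poverty gap index.
    """
    y = np.asarray(consumption, dtype=float)
    poor = y < z
    gap = np.where(poor, (z - y) / z, 0.0)
    # Only the poor contribute; np.where avoids counting 0**0 == 1 for the non-poor.
    return float(np.mean(np.where(poor, gap ** alpha, 0.0)))


rng = np.random.default_rng(42)

# Simulated national per capita consumption (hypothetical parameters, not census data)
national = rng.lognormal(mean=9.2, sigma=0.4, size=100_000)
z1, z2 = np.percentile(national, [10, 40])  # poverty lines 1 and 2


def simulate_village(mean_cons, sigma, n=2_000):
    """Lognormal village whose expected mean consumption equals mean_cons."""
    mu = np.log(mean_cons) - sigma ** 2 / 2
    return rng.lognormal(mean=mu, sigma=sigma, size=n)


tight = simulate_village(12_000, sigma=0.2)  # low within-village inequality
wide = simulate_village(12_000, sigma=0.6)   # same mean, high inequality

for name, v in [("tight", tight), ("wide", wide)]:
    print(name, round(v.mean()), fgt(v, z1), fgt(v, z2))
```

Both simulated villages have mean consumption close to 12,000, yet the more unequal one has a much higher headcount at either line, which is why mean consumption alone does not pin down the poverty rate.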

Imputing welfare into the census requires an assumption of spatial homogeneity within small areas.
This assumption “may severely underestimate the variance of the error in predicting welfare
estimates at the local level in the likely presence of small-area heterogeneity in the conditional
distribution of expenditure or income” (Tarozzi and Deaton, 2009). To test the extent of spatial
heterogeneity in practice, small area estimates of poverty have been compared to census-based
measures in Mexico and Brazil, which each collect income information in their census.
Considerable spatial heterogeneity is present in Mexico.9 In contrast, Elbers et al. (2009) find
significantly less in Minas Gerais, Brazil. The effect of spatial heterogeneity on the results
presented below is unclear. We are not aware of any empirical estimate of the extent to which the
spatial heterogeneity assumption leads to biased poverty headcount estimates at the local level. To
the extent any additional noise in the poverty estimates due to uncaptured heterogeneity in the
coefficients is independent across neighboring households within a GN, this noise would be
significantly reduced after averaging over a large number of households.
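The averaging argument can be checked with a small simulation. If each household's imputed consumption carries independent noise with standard deviation sigma, the noise in a GN average over n households has standard deviation sigma/sqrt(n); an error component shared by all households in the GN, by contrast, does not average away. All parameter values below (sigma, n, and the size of the shared error) are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.3         # hypothetical s.d. of household-level noise (log consumption)
n_households = 400  # hypothetical number of census households in a GN division
n_reps = 5_000      # simulated replications of the same GN

# Independent noise for every household in every replication
noise = rng.normal(0.0, sigma, size=(n_reps, n_households))
gn_noise = noise.mean(axis=1)  # error left in the GN average

# Add an error component shared by all households in the GN (hypothetical s.d. 0.1)
shared = rng.normal(0.0, 0.1, size=(n_reps, 1))
gn_noise_shared = (noise + shared).mean(axis=1)

print("independent:", gn_noise.std(), "analytic:", sigma / np.sqrt(n_households))
print("with shared component:", gn_noise_shared.std())
```

The independent component shrinks to roughly 0.015 at the GN level, while the shared component stays near 0.1 no matter how many households are averaged, which is why the independence condition stated above matters.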

2.3      Comparison of GN Poverty Rates and Mean NTL Reflectance
A simple visual comparison between mean NTL and village poverty rates illustrates why NTL
provides limited information on sub-national welfare. Figure 2 presents a panel of three images
for the Divisional Secretariat of Seethawaka: mean raw NTL (left), poverty rates at the poverty line
set at the 10th percentile of national per capita consumption (middle), and log of mean population
density (right). Comparing the left and middle panels, there is only a weak association between
villages with low NTL reflectance and those with high poverty. Because of overglow (Henderson et
al., 2012), light from bright areas spills into neighboring pixels, so poor villages adjacent to wealthy
ones may be misclassified as non-poor.10 While NTL tracks
the general contours of poverty for the DS – lower poverty areas in the Northwest and higher
poverty areas in the Southeast – this coarse association is only of limited use for public policy
applications such as poverty targeting or budget allocations.

NTL appears to give a more accurate approximation of the population density of the underlying
GN divisions, which is consistent with Mellander et al. (2013). Comparing the right and left panels
shows a strong association between high NTL areas and areas with high population density. We
take this to suggest that the information content of NTL related to human welfare is limited. While
lights at night may indicate gross associations, they are a highly imperfect measure of welfare. We
therefore investigate whether the much richer set of information contained in HSRI daytime
imagery translates into more accurate welfare predictions.

8
    One hundred simulations of consumption were estimated in the 2011 census using the PovMap 2.0 software.
9
  Simulations indicate that in 10 percent of municipalities, the coverage rate of the estimated poverty rate is less than
50 percent. In other words, in these 10 percent of municipalities, confidence intervals from simulations that estimate
headcount rates exclude the true poverty rate in more than half the simulations.
10
   Abrahams (2015) describes a method to correct for overglow, but this method has yet to be widely adopted.

  Figure 2: Comparison of Mean Night Time Lights (NTL), Poverty Rate, and Mean Population
                              Density, Seethawaka, Sri Lanka

2.4     Feature Extraction from High-Resolution Satellite Imagery
The derived high-resolution spatial features fall into seven broad categories: (1) Agricultural Land;
(2) Cars; (3) Building Density and Vegetation; (4) Shadows (building height proxy); (5) Roads and
Transportation; (6) Roof Type; and (7) Textural and Spectral Characteristics. In addition to the
satellite features, we use two geographic attributes of the GN division: whether it is
administratively classified as an urban area, and its area (entered in logs, in square meters). Table 1
presents summary statistics for these variables.

Deep learning-based object classification was used to classify the share of the GN division that
is built-up (i.e., consists of buildings), the number of cars in the GN, the share of pixels in the GN
identified as shadow (a proxy for building height), and crop type. The classification method is
similar to Krizhevsky, Sutskever, and Hinton (2012), which uses convolutional neural networks
(CNN) to build object predictions from raw imagery. Roof type, paved and unpaved roads of
different widths, and railroads were classified with Trimble eCognition and Erdas Imagine
software, using a combination of support vector machines and visual identification. Classifier
accuracy is greater than 90 percent for all of the objects recognized. Details on the extraction and
classification process, including an example ROC curve for buildings, are provided in the online
appendix.
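Accuracy and ROC summaries of this kind can be computed from validation labels. The sketch below is a generic illustration with toy labels and scores, not the paper's validation data: accuracy compares thresholded predictions with ground truth, and the ROC curve traces true and false positive rates as the classification threshold is lowered, with the area under the curve (AUC) computed by the trapezoidal rule. For simplicity it gives no special treatment to tied scores.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Share of validation cases whose predicted class matches ground truth."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def roc_points(y_true, scores):
    """False and true positive rates as the threshold sweeps from high to low scores."""
    y_true = np.asarray(y_true, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = y_true[order]
    tpr = np.concatenate([[0.0], np.cumsum(hits) / hits.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(~hits) / (~hits).sum()])
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


# Toy validation set: 1 = building, 0 = not building
labels = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])  # classifier confidence that each case is a building

print(accuracy(labels, scores >= 0.5))   # 0.75
print(auc(*roc_points(labels, scores)))  # 0.75
```

Accuracy depends on the chosen threshold (0.5 here), whereas the AUC summarizes performance across all thresholds, which is why an ROC curve is a useful complement to a single accuracy figure.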




2.4.1 Object Classification Details
The agricultural land variables consist of the fraction of GN agriculture identified as paddy (rice
cultivation) or plantation (cash crops such as tea). These sum to 100 percent for GNs with
agricultural land, so the excluded category in subsequent regressions is GN divisions with no
agricultural land. We also calculated the fraction of total GN area that is paddy, plantation, or any
agriculture. Figure 3 shows an example of a developed area building classification, with the raw
image shown on the left and the CNN classification shown on the right. In the classified panel,
true positives are highlighted in green and false positives in red. Figure 4 shows a sample car
classification. Cars that are positively identified are shown circled in blue. False negatives are
most prevalent where tree canopy masks the pixels.




                 Figure 3: Example Developed Area (Buildings) Classification
   Notes: The image shows the raw (left) and classified (right) output of the developed area building classifier
   applied to raw satellite imagery. Areas in green are true positive building classifications. Areas in red are
   false positives: areas erroneously classified as buildings.




                                 Figure 4: Example Car Classification
                    Notes: Cars identified by the convolutional neural network shown in blue.
Table 1: Grama Niladhari Summary Statistics  
                                                                  Mean          Sd       Min        Max 
          Economic Well‐Being                                                                             
     Avg Consumption in Rs                                      10274.2  3052.7        4881.9     21077 
     Avg Log Consumption                                            9.19      0.28        8.49      9.96 
     Rel. Pov. Rate at 10% Nat. Cons.                            0.0903      0.066     0.0023       0.39 
     Rel. Pov. Rate at 40% Nat. Cons.                              0.332      0.16      0.035         0.8 
          Geographic Descriptors                                                                          
     log Area (square meters)                                      14.73      1.01        12.1         18 
     = 1 if urban                                                  0.304      0.46           0          1 
     province==[1] Western                                         0.587      0.49           0          1 
     province==[3] Southern                                        0.255      0.44           0          1 
     province==[6] North‐Western                                 0.0643       0.25           0          1 
     province==[7] North‐Central                                 0.0155       0.12           0          1 
     province==[8] UVA                                           0.0782       0.27           0          1 
          Agricultural Land                                                                               
     % of GN area that is agriculture                               16.8      0.15           0         94 
     % of GN agriculture that is paddy                              44.4      37.5           0       100 
     % of GN agriculture that is plantation                        46.38      37.8           0       100 
     % of Total GN area that is paddy                              8.629      10.9           0      74.7 
     % of Total GN area that is plantation                         8.168        11           0      94.1 
          Cars                                                                                            
     log number of cars                                            3.123      1.44           0        8.3 
     Total cars divided by total road length                    0.00556       0.01           0      0.17 
     Total cars divided by total GN Area                               0  0.00007            0  0.00093 
          Building Density and Vegetation                                                                 
     % of area with buildings                                      7.817      6.82       0.13       33.9 
     % shadows (building height) covering valid area               6.509      6.01        0.31      34.9 
     Vegetation Index (NDVI), mean, scale 64                       0.427      0.21           0      0.86 
     Vegetation Index (NDVI), mean, scale 8                        0.566      0.24           0      0.99 
          Shadows                                                                                         
     ln shadow pixels (building height)                            12.96      1.04        7.31      17.6 
     ln Number of Buildings                                         6.90      0.92           0        9.3 
          Road variables                                                                                  
     log of Sum of length of roads                                 9.445      0.94       1.47       13.1 
     fraction of roads paved                                        38.3      28.7           0       100 
     ln length airport roads                                       0.013      0.33           0      9.25 
     ln length railroads                                           1.098      2.67           0      10.8 
          Roof type                                                                                       
     Fraction of total roofs that are clay                          36.5        22           0       100 
     Fraction of total roofs that are aluminum                     14.08      7.06           0      71.9 
     Fraction of total roofs are asbestos                          7.766      11.3           0      71.2 
          Textural and spectral characteristics                                                           
     Pantex (human settlements), mean                              0.627      0.54       0.02       2.94 
     Histogram of Oriented Gradients (scale 64m), mean           3509.4  2070.3         129.1     10381 
     Linear Binary Pattern Moments (scale 32m), mean                49.5        1.1       18.1      49.5 
     Line support regions (scale 8m), mean                      0.00836      0.004     ‐2E‐07      0.035 
     Gabor filter (scale 64m), mean                                0.469      0.28      0.014         1.3 
     Fourier transform, mean                                       84.34      17.8        4.51     113.4 
     SURF (scale 16m), mean                                        12.06      7.77        0.13      31.6 
  
     Observations                                             1291 
  

Three car-related variables were calculated – the log total number of cars in a GN, total cars divided
by total road length, and cars per square kilometer of the GN. The average GN division in our
sample contains 50 cars. However, there is wide dispersion, as the 99th percentile of the car count
distribution is equal to 577 cars and the maximum value is 4,000 cars. On the left side of the
distribution, 136 of 1,291 GNs contain no cars. Because the distribution is right-skewed, we log-transform
the car count, using a transformation that remains smooth and well defined for GNs with zero or few cars.11
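The three car variables can be sketched in a few lines of Python. The wording of footnote 11 ("the log of the sum of the car count and the square root of the car count plus one") admits more than one reading. The sketch assumes ln(c + sqrt(c + 1)), which equals zero for a GN with no cars and behaves like ln(c) for large counts. The inverse hyperbolic sine, ln(c + sqrt(c^2 + 1)), available as `math.asinh`, is a common alternative with the same motivation. The function names and the units for road length (meters) and area (square kilometers) are hypothetical.

```python
import math


def log_cars(count: float) -> float:
    """Smooth log transform of a GN car count.

    Assumed reading of footnote 11: ln(count + sqrt(count + 1)).
    Returns 0.0 for a GN with no cars, so zero counts stay defined.
    """
    if count < 0:
        raise ValueError("car count must be non-negative")
    return math.log(count + math.sqrt(count + 1.0))


def car_features(count: float, road_length_m: float, area_km2: float) -> dict:
    """Log cars, cars per unit of road length, and cars per square kilometer.

    The units (meters of road, square kilometers of GN area) are illustrative assumptions.
    """
    return {
        "log_cars": log_cars(count),
        "cars_per_road_length": count / road_length_m if road_length_m > 0 else 0.0,
        "cars_per_km2": count / area_km2 if area_km2 > 0 else 0.0,
    }


if __name__ == "__main__":
    # Counts quoted in the text: zero cars, the sample average (50), the 99th percentile (577).
    for c in (0, 50, 577):
        print(c, round(log_cars(c), 3), round(math.asinh(c), 3))
    print(car_features(50, road_length_m=12_000.0, area_km2=1.5))
```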

Building density variables include the fraction of an area covered by built-up area and the number
of roofs identified. Built-up area captures any human settlements (buildings, homes, etc.)
regardless of use or condition. These are grouped with two measures of the Normalized Difference
Vegetation Index (NDVI). Although NDVI is technically a spectral characteristic, the presence of
vegetation in urban areas indicates land within the urban environment that is not built up, such as
parks, trees, or lawns. In rural areas it also indicates undeveloped land, and, depending on the
timing of the image acquisition, its values can help describe variations in agricultural type and
productivity. The fifth category consists of two indicators that capture the shadows of buildings:
the log of the number of pixels classified as shadow and the fraction of valid area in a GN covered
by shadows. The shadow variables use the angle of the sun relative to each building, and the
shadows the building casts, to detect shadow pixels, which serve as a proxy for building height.12
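NDVI itself is the standard ratio (NIR - Red) / (NIR + Red). Below is a minimal NumPy sketch of the NDVI inputs. It assumes co-registered near-infrared and red reflectance arrays and averages over non-overlapping square boxes of 8 and 64 pixels to mirror the two scales in table 1. How the paper aggregates box values to a GN is not described here. The function names, and the choice to drop edge pixels that do not fill a whole box, are assumptions.

```python
import numpy as np


def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Per-pixel Normalized Difference Vegetation Index, (NIR - Red) / (NIR + Red)."""
    nir = nir.astype(float)
    red = red.astype(float)
    denom = nir + red
    out = np.zeros_like(denom)
    # Pixels with zero total reflectance are left at 0 instead of dividing by zero.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def block_means(img: np.ndarray, block: int) -> np.ndarray:
    """Mean over non-overlapping block x block boxes; incomplete edge boxes are dropped."""
    h, w = img.shape
    h_trim, w_trim = h - h % block, w - w % block
    trimmed = img[:h_trim, :w_trim]
    return trimmed.reshape(h_trim // block, block, w_trim // block, block).mean(axis=(1, 3))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nir = rng.uniform(0.2, 0.6, size=(256, 256))  # synthetic reflectances
    red = rng.uniform(0.05, 0.3, size=(256, 256))
    v = ndvi(nir, red)
    print(block_means(v, 8).shape, block_means(v, 64).shape)  # (32, 32) (4, 4)
```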

The road variables we calculate are the log of total road length, the fraction of roads that are
paved, and the log lengths of airport runways and of railroads identified. For roof type, we
calculate the fraction of roofs in a village that are clay, aluminum, or asbestos, with the omitted
category being roofs that are identified as none of the above, the vast majority of which are gray
cement roofs. Roof type can be identified through remote sensing using hyperspectral imaging or,
more generally, reflectance measured in several contiguous spectral bands. Different roof materials
exhibit different spectral properties, particularly in bands outside the visible spectrum. The roofs
in our sample are clay (36.5%), aluminum (14.08%), asbestos (7.8%), or gray concrete (41.6%).
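The road and roof variables reduce to simple ratios. The sketch below assumes per-GN inputs of paved and unpaved road length and counts of classified roofs. This input structure is hypothetical. The omitted roof category is whatever share is not clay, aluminum, or asbestos. Applied to the sample averages quoted above, the omitted share is 100 - 36.5 - 14.08 - 7.8 = 41.62, or about 41.6 percent.

```python
import math

ROOF_TYPES = ("clay", "aluminum", "asbestos")  # omitted category: everything else


def road_features(paved_m: float, unpaved_m: float) -> dict:
    """Log of total road length and the percent of road length that is paved."""
    total = paved_m + unpaved_m
    if total <= 0:
        # Assumption: a GN with no detected roads gets zeros rather than log(0).
        return {"log_road_length": 0.0, "pct_paved": 0.0}
    return {"log_road_length": math.log(total), "pct_paved": 100.0 * paved_m / total}


def roof_shares(counts: dict) -> dict:
    """Percent of roofs of each included type, plus the omitted (mostly gray cement) share."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no roofs classified in this GN")
    shares = {t: 100.0 * counts.get(t, 0) / total for t in ROOF_TYPES}
    shares["omitted"] = 100.0 - sum(shares.values())
    return shares


if __name__ == "__main__":
    print(road_features(paved_m=3_000.0, unpaved_m=7_000.0))
    print(roof_shares({"clay": 120, "aluminum": 40, "asbestos": 25, "gray_cement": 140}))
```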

2.4.2 Details of Textural and Spectral Features
We calculate seven separate types of spectral and textural features: Fourier transform, Gabor
filter, Histogram of Oriented Gradients (HoG), Local Binary Pattern Moments (LBPM), Line support
regions (LSR), Pantex, and Speeded-Up Robust Features (SURF). These are often used in machine
vision problems to decompose an image. They are intended to capture aspects of a neighborhood that
are not easily identified directly, including characteristics associated with slums, such as many
irregular building lines or high density. These features may be considered outputs from a
dimension reduction technique, in that they are reduced-dimensionality descriptions of complex 2-D
satellite images.
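Two simplified stand-ins can illustrate the box-based texture features, assuming a single-band (grayscale) image. The first is the mean magnitude of the 2-D discrete Fourier transform within each non-overlapping box. The second is a gradient-magnitude-weighted histogram of orientations, which is the core ingredient of HoG (full HoG adds cell and block normalization). These are not the paper's implementations, which are documented in its online appendix. The 16-pixel box and 9 orientation bins are hypothetical parameters.

```python
import numpy as np


def iter_boxes(img: np.ndarray, box: int):
    """Yield non-overlapping box x box patches; partial boxes at the edges are skipped."""
    h, w = img.shape
    for r in range(0, h - box + 1, box):
        for c in range(0, w - box + 1, box):
            yield img[r:r + box, c:c + box]


def fourier_mean(patch: np.ndarray) -> float:
    """Mean magnitude of the 2-D DFT after removing the patch mean (zero-frequency term ~ 0)."""
    centered = patch.astype(float) - patch.mean()
    return float(np.abs(np.fft.fft2(centered)).mean())


def orientation_histogram(patch: np.ndarray, n_bins: int = 9) -> np.ndarray:
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees), summing to 1."""
    gy, gx = np.gradient(patch.astype(float))  # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, 180.0), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist


def box_features(img: np.ndarray, box: int = 16) -> np.ndarray:
    """One row per box: [Fourier mean, orientation histogram bins...]."""
    rows = [np.concatenate(([fourier_mean(p)], orientation_histogram(p)))
            for p in iter_boxes(img, box)]
    return np.array(rows)


if __name__ == "__main__":
    img = np.random.default_rng(0).normal(size=(128, 128))
    feats = box_features(img, box=16)
    print(feats.shape)         # 64 boxes x (1 + 9) features
    print(feats.mean(axis=0))  # one possible GN-level summary: the mean over boxes
```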

Because these measures may be novel to readers without backgrounds in remote sensing, further
description may be helpful. We consider Pantex here to be a measure of human settlements. It is a
spatial similarity index, where each cell is compared to adjacent cells in all directions. Forests will
have a low Pantex level, since cells in all directions have similar contrast, as will cells with straight
roads. Cities dense with many buildings will have high Pantex values. HoG captures “local
intensity gradients or edge directions” (Dalal and Triggs, 2005) and, in this context, captures the
intensity of lines of development or agriculture. Local Binary Pattern Moments (LBPM) capture local
spatial patterns and gray-scale contrast. SURF detects local features that characterize grid
patterns, and thus measures the orderliness of building development; disorderly development is
typically associated with slums. Areas with right angles, corners, or regular grid patterns will have
larger SURF values relative to areas with chaotic or irregular spacing. For more detail on imagery
and the feature extraction process, we refer the reader to the online appendix.

11
  The log car variable is calculated as the log of the sum of the car count and the square root of the car count plus
one.
12
     Valid area refers to areas at the foot of buildings where shadows may appear.
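The sketch below makes the LBPM description concrete. It computes the basic 8-neighbor local binary pattern code for each interior pixel: each neighbor at least as bright as the center sets one bit. As a result, flat regions map to 255 and isolated bright spots map to 0. It then summarizes the codes by their mean and standard deviation. Treating these two statistics as the "moments" is an assumption for illustration. The exact LBPM definition used in the paper is in its online appendix.

```python
import numpy as np

# The 8 neighbors, ordered clockwise from the top-left; neighbor k sets bit k.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img) -> np.ndarray:
    """8-neighbor local binary pattern code (0-255) for every interior pixel."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(OFFSETS):
        neighbor = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        # Add 2**bit wherever the neighbor is at least as bright as the center pixel.
        codes += (neighbor >= center).astype(np.int64) * (1 << bit)
    return codes


def lbp_moments(img) -> tuple:
    """Mean and standard deviation of the LBP codes (assumed summary statistics)."""
    codes = lbp_codes(img).astype(float)
    return float(codes.mean()), float(codes.std())


if __name__ == "__main__":
    patch = np.random.default_rng(0).integers(0, 255, size=(32, 32))
    print(lbp_moments(patch))
```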


3    Statistical Methodology

We estimate linear regressions in which the dependent variable is the estimated poverty rate or mean
log welfare (per capita household consumption) at the level of the GN division, derived from the
census. Because these are linear models, the fitted poverty rates are not constrained to lie between
zero and one, but this is a minor issue in the sample.13 Standard errors are clustered at the level of
the DS division and are robust to heteroscedasticity.
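A minimal NumPy sketch of this estimation step follows. It computes OLS coefficients with a cluster-robust (sandwich) variance, in which the regression scores are summed within each cluster, here a DS division. The finite-sample factor G/(G-1) * (n-1)/(n-k) is a common convention. It is an assumption here, since the text does not state which adjustment was used. Function and variable names are hypothetical.

```python
import numpy as np


def ols_cluster_se(X, y, clusters):
    """OLS coefficients with cluster-robust standard errors.

    X must already contain a column of ones for the intercept; clusters holds one
    group label (e.g., DS division) per row.
    """
    X, y, clusters = np.asarray(X, float), np.asarray(y, float), np.asarray(clusters)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    groups = np.unique(clusters)
    meat = np.zeros((k, k))
    for g in groups:
        in_g = clusters == g
        score = X[in_g].T @ resid[in_g]  # sum over the cluster of x_i * e_i
        meat += np.outer(score, score)
    G = len(groups)
    adj = G / (G - 1) * (n - 1) / (n - k)  # assumed finite-sample adjustment
    vcov = adj * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(vcov))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ds = np.repeat(np.arange(40), 10)  # 40 hypothetical DS divisions, 10 GNs each
    x = rng.normal(size=ds.size)
    y = 0.3 - 0.05 * x + rng.normal(scale=0.05, size=40)[ds] + rng.normal(scale=0.05, size=ds.size)
    beta, se = ols_cluster_se(np.column_stack([np.ones(ds.size), x]), y, ds)
    print(beta, se)
```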

Given the large set of available covariates, the choice of variables is not obvious. Estimating a
model with the full set of candidate variables in table 1 would likely produce predictions that are
overfit, in the sense that they perform much better in-sample than out-of-sample (Athey and Imbens,
2015). One attractive method for variable selection among a large set of covariates is Lasso
regularization. Lasso is a regularized regression that estimates a regression model with an added
constraint that enforces parsimony (Tibshirani, 1996). The motivation for the shrinkage estimator
is that, by shrinking coefficients and reducing the number of parameters in the model, one accepts
some additional bias in exchange for lower variance.

Our baseline model is a “Post-Lasso” estimator (Belloni and Chernozhukov, 2013). This two-step
estimator first estimates a Lasso model over the full set of candidate variables, followed by an OLS
model over the set of variables with non-zero coefficients in the Lasso step. The model we estimate
in the Lasso step is defined as



$$
\hat{\beta}^{\text{lasso}} = \underset{\beta_0,\,\beta}{\arg\min} \; \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i'\beta \right)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \tag{1}
$$

where $y_i$ is the poverty rate in GN $i$, $x_i$ is the vector of $p$ candidate covariates, and
$\lambda \geq 0$ is a parameter that penalizes the absolute values of the coefficients. At the
extreme, full relaxation of the penalization factor, that is, setting $\lambda$ to zero, yields
unconstrained OLS estimates. Thus as $\lambda \to 0$, $\hat{\beta}^{\text{lasso}} \to \hat{\beta}^{\text{OLS}}$.
As $\lambda \to \infty$, the penalty increases and $\hat{\beta}^{\text{lasso}}$ converges to the zero
vector. Lasso regressions are useful as a variable selection methodology because the $\ell_1$
penalty shrinks coefficients exactly to zero when the corresponding variables do not help reduce
the sum of squared errors. However, the Lasso simultaneously “shrinks” the magnitude of
coefficients towards zero, even for those that remain non-zero (Varian, 2014). By re-estimating an
OLS model on the selected variables in the second stage, we remove this shrinkage bias from the
coefficient estimates. To choose the appropriate value of $\lambda$, we apply 10-fold
cross-validation and choose the value of $\lambda$ that minimizes root-mean-squared error (RMSE)
across folds. GLM versions of the model, which ensure that predicted values lie between zero and
one, do not change the results qualitatively and are available on request.

13
  In 6 of the 1,291 GN divisions in the sample, the predicted 10 percent poverty rate is negative, with a minimum
prediction of -1 percent. The predicted 40 percent poverty rate is positive in all GN divisions. As a robustness check,
we estimated binomial regressions and obtained similar results, which are available upon request.
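The Post-Lasso procedure in equation (1) can be sketched with NumPy alone. In practice one would more likely use a packaged solver such as scikit-learn's `LassoCV`. The coordinate-descent version below only illustrates the mechanics. It standardizes the covariates, chooses λ by 10-fold cross-validated RMSE over a grid, and then refits unpenalized OLS on the variables with non-zero Lasso coefficients. The solver minimizes (1/2n) * SSR + λ * Σ|β_j|, which rescales equation (1) by a constant, so its λ values are not on the same scale. The λ grid, tolerance, and function names are assumptions.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Solution of the one-dimensional Lasso problem."""
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-8):
    """Coordinate-descent Lasso on standardized X and centered y.

    Minimizes (1/2n) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # residuals at b = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue
            old = b[j]
            rho = X[:, j] @ r / n + col_sq[j] * old
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)  # keep residuals in sync with b
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b


def _fit_predict(X_tr, y_tr, X_te, lam):
    """Standardize with training statistics, fit the Lasso, predict on X_te."""
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
    sd[sd == 0] = 1.0
    ybar = y_tr.mean()
    b = lasso_cd((X_tr - mu) / sd, y_tr - ybar, lam)
    return ((X_te - mu) / sd) @ b + ybar, b


def cv_rmse(X, y, lam, k=10, seed=0):
    """Pooled root-mean-squared error of out-of-fold Lasso predictions."""
    folds = np.random.default_rng(seed).permutation(len(y)) % k
    sq_err = np.empty(len(y))
    for f in range(k):
        tr, te = folds != f, folds == f
        pred, _ = _fit_predict(X[tr], y[tr], X[te], lam)
        sq_err[te] = (y[te] - pred) ** 2
    return float(np.sqrt(sq_err.mean()))


def post_lasso(X, y, lambdas, k=10):
    """Pick lambda by k-fold CV, then refit OLS on the selected columns."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    rmses = [cv_rmse(X, y, lam, k) for lam in lambdas]
    best = lambdas[int(np.argmin(rmses))]
    _, b = _fit_predict(X, y, X, best)
    selected = np.flatnonzero(b != 0)
    Z = np.column_stack([np.ones(len(y)), X[:, selected]])
    coef = np.linalg.lstsq(Z, y, rcond=None)[0]  # intercept first, then selected columns
    return best, selected, coef
```

Because the second stage is plain OLS, the standard errors for the selected variables can then be computed as in any OLS model, for example with clustering by DS division.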

Inferential standard errors are typically absent from Lasso models. Appealing to the Oracle property
of the Lasso estimator (Fan and Li, 2001), we use the standard errors from the second-stage OLS
model for inference. The Oracle property ensures that inference in the second stage, using the
reduced set of variables selected in the first stage, is consistent with inference from a
single-stage estimation that uses only the variables present in the true data-generating process
(Belloni and Chernozhukov, 2013).


4   Results

Table 2 presents the estimates from the main specification for the full sample. The first two
columns show the model where GN poverty is defined at the lower poverty line, the next two
present the higher poverty line models, and the last two present models with average GN log
consumption as the dependent variable. Many extracted satellite features have high explanatory
power, including agriculture type, length of roads and fraction of roads paved, number and density
of buildings, NDVI, roof type, shadows (building height proxy), and two spatial features, LBPM and
the Fourier transform. The models explain a large share of the variation in poverty, with in-sample
R-squared values between 0.608 and 0.618. Out-of-sample R-squared, estimated using ten-fold
cross-validation, varies between 0.588 and 0.605. Because the out-of-sample fit is only slightly
below the in-sample fit, we conclude that the models are not likely to be overfit to the data.
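The out-of-sample R-squared can be computed as one minus the ratio of the sum of squared out-of-fold prediction errors to the total sum of squares. Below is a minimal NumPy sketch for an OLS model with a fixed set of regressors, assuming random fold assignment. The sketch holds the selected variables fixed and does not repeat the Lasso step inside each fold. Whether the paper does so is not stated here, and the function names are hypothetical.

```python
import numpy as np


def r2_in_sample(X, y) -> float:
    """R-squared of an OLS fit of y on a constant and X."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    Z = np.column_stack([np.ones(len(y)), X])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    dev = y - y.mean()
    return float(1.0 - resid @ resid / (dev @ dev))


def r2_kfold(X, y, k=10, seed=0) -> float:
    """Out-of-sample R-squared: each observation is predicted by a model
    estimated on the other k - 1 folds."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % k
    Z = np.column_stack([np.ones(n), X])
    pred = np.empty(n)
    for f in range(k):
        tr, te = folds != f, folds == f
        beta = np.linalg.lstsq(Z[tr], y[tr], rcond=None)[0]
        pred[te] = Z[te] @ beta
    sse = ((y - pred) ** 2).sum()
    sst = ((y - y.mean()) ** 2).sum()
    return float(1.0 - sse / sst)
```

A small gap between `r2_in_sample` and `r2_kfold`, as in table 2, is the pattern one would expect from a model that is not badly overfit.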

The results suggest that a simple linear model that includes only the geographic size of the GN
division, whether it is urban, and remotely sensed information explains 61 percent of the variation
across GNs in headcount poverty rates. Figure 5 plots predicted against true average GN
consumption, with colors assigned by the province in which the GN is located. A LOWESS smoothing
line is shown with its associated confidence interval. A perfect model would have predictions
exactly on the 45-degree line. While there is noise, the predictions tend to straddle the 45-degree
line, indicating a high degree of agreement between the predicted and true welfare values. However,
the model tends to under-predict welfare for wealthier GNs.




4.1     Marginal Effects of Satellite Features
While the primary objective of this exercise is to obtain accurate predictions, the model
coefficients also shed light on the nature and magnitude of the conditional correlations between
imagery features and poverty. The coefficients may be difficult to interpret for two reasons. First,
the independent variables are often measured in different units. Second, in some cases multiple
independent variables are based on the same underlying feature. In these cases, it is meaningless
to evaluate the conditional correlation of one variable while holding the others constant.




 Table 2: Prediction of Local Area Poverty Rates Using High‐Res Spatial Features 
                                              Lower Poverty Rate               Higher Poverty Rate            Average Log Per Capita 
                                                (10% Nat. Inc.)                   (40% Nat. Inc.)                  Consumption 
                                                  Coef           t               Coef         t               Coef           t 
 log Area (square meters)                           0.020*          [2.52]        0.0093          [0.60]      ‐0.0079           [‐0.31] 
 = 1 if urban                                       ‐0.023         [‐1.80]        ‐0.037         [‐1.06]        0.08            [1.18]
 % of GN area that is agriculture                 ‐0.00025         [‐1.04]      ‐0.00017         [‐0.27]                             
 % of GN agriculture that is paddy              ‐0.00033**  [‐2.97]           ‐0.00087**         [‐2.97]    0.0014**             [2.92] 
 % of GN agriculture that is plantation         ‐0.00021**  [‐2.84]            ‐0.00059*         [‐2.66]     0.0012**            [2.72] 
 % of Total GN area that is paddy                 ‐0.00019         [‐0.58]      ‐0.00083         [‐1.10]     0.0016*             [2.10] 
 Total cars divided by total road length             ‐0.31         [‐1.17]                                                           
 Total cars divided by total GN Area                  29.6          [0.54]                                                           
 log number of cars                                ‐0.0059         [‐0.89]        ‐0.015         [‐1.39]        0.024            [1.60]
 log sum of length of roads                      ‐0.020***         [‐3.64]       ‐0.027*         [‐2.32]       0.033             [1.67] 
 fraction of roads paved                       ‐0.00035***  [‐4.24]           ‐0.00079**         [‐3.24]     0.0014**            [3.06]
 ln length airport roads                           ‐0.0051         [‐1.45]                                     0.022             [1.52] 
 ln length railroads                               0.00098          [1.31]                                    ‐0.0046           [‐1.26] 
 % of area with buildings                         ‐0.0027*         [‐2.31]      ‐0.0093*         [‐2.34]       0.020*            [2.56] 
 log of Total count of buildings in GN           ‐0.0090**         [‐2.71]       ‐0.019*         [‐2.05]        0.029            [1.70] 
 Vegetation Index (NDVI), mean, scale 64            0.061*          [2.20]        0.14**          [2.94]      ‐0.21**           [‐2.93] 
 Vegetation Index (NDVI), mean, scale 8           ‐0.064**         [‐2.80]                                                           
 % shadows (building height)                       0.0022*          [2.04]       0.0064*          [2.18]      ‐0.013*           [‐2.27] 
 ln shadow pixels (building height)                 0.016*          [2.51]        0.039*          [2.64]       ‐0.047           [‐1.95]
 Fraction of total roofs that are clay           0.00077**          [3.35]      0.0017**          [3.25]    ‐0.0027**           [‐3.15] 
 Fraction of total roofs that are aluminum      0.00091***          [3.63]      0.0022**          [3.15]    ‐0.0040**           [‐3.15] 
 Fraction of total roofs that are asbestos        ‐0.00033         [‐1.08]                                                           
 Local Binary Pattern Moments (scale 32m)         0.0021**          [2.91]     0.0090***          [5.53]    ‐0.017***           [‐5.92] 
 Line support regions (scale 8m), mean               ‐0.66         [‐0.87]                                                           
 Gabor filter (scale 64m), mean                     ‐0.052         [‐1.53]                                                           
 Fourier transform, mean                          0.0017**          [3.42]                                                           
 SURF (scale 16m), mean                            ‐0.0014         [‐0.94]        ‐0.001         [‐0.59]       0.0034            [1.06]
 Constant                                          ‐0.32**         [‐3.03]         ‐0.31         [‐1.43]     10.1***            [29.9]
 Observations                                               1291                        1291                           1291 
 R‐sq                                                       0.610                       0.618                          0.608 
 R‐sq Adj.                                                  0.602                       0.613                          0.602 
 Out‐of‐Sample R‐sq                                         0.588                       0.605                          0.594 
 Mean Absolute Error                                        0.032                       0.078                          0.139 
 Notes: Unit of observation is Grama Niladhari (GN) division. Variables were selected using Lasso regularization from the 
 candidate set of variables shown in table 1. * p<0.05, ** p<0.01, *** p<0.001 
  


To understand the magnitudes of the coefficients, we group independent variables and consider
the marginal effect of a one standard deviation increase of the underlying satellite feature.14 Table
3 presents these marginal effects. For some feature groups, the reported marginal effects reflect a
combination of multiple underlying indicators, while for others they reflect single variables, as
indicated in the rightmost column.
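Each marginal effect in table 3 is the coefficient times the standard deviation of its variable, summed across variables for combined groups. The sketch below uses coefficients from the lower poverty line model in table 2 and standard deviations from table 1. For these single-variable rows it reproduces table 3 to within rounding. For example, 0.00077 × 22 ≈ 1.7 percentage points for clay roofs, and the Fourier transform gives 3.0 rather than 3.1 because the published coefficient is rounded. The hypothetical `sign` argument flips variables for which a one standard deviation decrease is considered, as in footnote 14.

```python
# Lower poverty line model: (coefficient from table 2, standard deviation from table 1).
FEATURES = {
    "fraction_roofs_clay": (0.00077, 22.0),
    "fraction_roofs_aluminum": (0.00091, 7.06),
    "log_road_length": (-0.020, 0.94),
    "fraction_roads_paved": (-0.00035, 28.7),
    "fourier_mean": (0.0017, 17.8),
    "surf_mean": (-0.0014, 7.77),
}


def marginal_effect(names, sign=None) -> float:
    """Change in the predicted poverty rate when every listed feature moves by one
    standard deviation: sum of sign_j * beta_j * sd_j (sign defaults to +1)."""
    sign = sign or {}
    return sum(sign.get(name, 1) * FEATURES[name][0] * FEATURES[name][1] for name in names)


if __name__ == "__main__":
    for name in FEATURES:
        print(f"{name:24s} {100 * marginal_effect([name]):+.1f} pp")
```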

The size of the GN, in square kilometers, is more strongly correlated with the poverty headcount at
the lower line than with the headcount at the higher line or with average consumption. This
suggests that households in the bottom decile are disproportionately found in larger GN divisions.
The presence of agricultural land is weakly and negatively associated with poverty, controlling for
other characteristics of the GN, although the result is not statistically significant. Of the
indicators related to the distribution of paddy vs. plantation land, the Lasso procedure selected
three for the 10 and 40 percent poverty incidence models, and two for the log consumption model.15
The results indicate a discernible but fairly weak negative relationship between the presence of
paddy agricultural land and poverty, which is consistent with the socioeconomically disadvantaged
nature of the tea plantation sector in Sri Lanka.

 Table 3: Marginal Effects of One Standard Deviation Change  
                           Lower Poverty     Higher            Average Log Per 
                             Rate (10%     Poverty Rate            Capita 
                             Nat. Inc.)   (40% Nat. Inc.)       Consumption                          Variables 
 Area                      2.1 pp *           0.9 pp               ‐0.008          Area   
 Urban                     ‐1.0 pp            ‐1.7 pp              0.037           Urban Dummy  
 Agricultural land         ‐0.4 pp            ‐0.3 pp                              % of GN area that is agriculture 
 Agricultural type         ‐0.6 pp *          ‐1.9 pp **           0.026 *          Combined: Ag % paddy, Ag % plantation (‐), 
                                                                                   % area paddy 
 Cars                      ‐1.2 pp                                                 Combined: cars divided by road length, cars 
                                                                                   divided by Area, log cars 
                                              ‐2.2 pp              0.035           log cars 
 Road variables            ‐1.9 pp ***        ‐2.5 pp *            0.031           log sum of length  
                           ‐1.0 pp ***        ‐2.3 pp **           0.040 **        Fraction paved  
                           ‐0.2 pp                                 0.007           log length of airport runway 
                           0.3 pp                                  ‐0.012          log sum of railroads 
 Building Density          ‐2.7 pp **         ‐8.1 pp **           0.162  **       % of area with buildings and log of total count 
                                                                                   of buildings in GN combined  
 Vegetation                ‐0.2 pp            2.9 pp **            ‐0.044 **       Combined: NDVI, scale 64m, NDVI, scale 8m 
 Shadows                   3.0 pp ***         7.9 pp ***           ‐0.128 ***      Combined: % shadows (building height) and 
                                                                                   ln shadow pixels  
 Roofs                     1.7 pp  **         3.8 pp **            ‐0.06  **       Fraction of roofs clay 
                           0.6 pp ***         1.6 pp **            ‐0.028 **       Fraction of roofs aluminum 
                           ‐0.4 pp                                                 Fraction of roofs asbestos  
 Spatial Features          0.2 pp **          1.0 pp ***           ‐0.019 ***      Local Binary Pattern Moments 
                           ‐0.3 pp                                                 Line support regions 
                           3.1 pp                                                  Fourier transform  
                           ‐1.5 pp                                                 Gabor filter 
                           ‐1.1 pp **         ‐0.8 pp              0.026           SURF 




14
     Except for percent of GN agriculture that is plantation, for which a one standard deviation decrease is considered.

15
   Since an increase in paddy land implies a reduction in agricultural land, for those GNs with agricultural land, the
latter is subtracted instead of added when calculating the marginal effect.

Compared with land type, the association between poverty and cars is mildly stronger. A one
standard deviation increase in log cars (to an average value of 3.1) is associated with a 2.2
percentage point decline in poverty at the higher poverty line, and a 0.035 increase in predicted
log per capita consumption. A one standard deviation increase in all three car-related variables is
associated with a 1.2 percentage point decline in poverty at the lower poverty line. Road
characteristics are moderately associated with local poverty rates. Length of roads, fraction of
roads paved, and runway length are negatively associated with poverty, though only the first two
are statistically significant, while GNs with more railways are slightly poorer. A one standard
deviation increase in total road length is associated with a 1.9 percentage point decline in
poverty at the lower poverty line, a 2.5 percentage point decline at the higher line, and a 0.031
increase in log consumption. The magnitudes of the marginal effects for the fraction of roads that
are paved are broadly similar, though a one standard deviation increase is associated with a
weaker 1 percentage point decline at the lower poverty line.




    Figure 5: Model diagnostic plot of predicted against true average GN consumption


Measures of building density are strongly associated with log welfare and poverty. A one standard
deviation increase in these two variables is associated with a 2.7 percentage point decline at the
lower poverty line, an 8.1 percentage point decline at the higher poverty line, and a 0.16 increase
in log consumption. Vegetation is moderately associated with poverty. A one standard deviation
reduction in vegetation is associated with a 2.9 percentage point reduction in poverty at the
higher poverty line, and a 0.04 increase in mean log per capita consumption, which is comparable to
the effects of cars or of the fraction of roads that are paved. For the lower poverty line model,
both NDVI measures are selected. The higher poverty line and log welfare models only include NDVI
calculated over blocks of 64 pixels, suggesting that very high spatial resolution imagery may not
be critical for generating informative measures of NDVI for prediction.

Two measures of shadows, a proxy for building height, are selected: the share of valid area covered
by shadows, and the log number of shadow pixels. A one standard deviation increase in both
measures is associated with a 3 percentage point increase in poverty at the lower poverty line, an
8 percentage point increase in poverty at the higher one, and a 0.13 decrease in mean log per capita
consumption. For roof type, the Lasso procedure selects the fractions of roofs classified as clay
and aluminum in all three models, and also includes the fraction classified as asbestos in the
lower poverty line model. The signs on clay and aluminum in the poverty regressions are positive,
suggesting that these are generally inferior compared to the omitted category of grey concrete.
This appears to be consistent with an analysis in Kenya that documents that roofs with greater
luminosity, like aluminum, are associated with lower levels of poverty (Suri et al., 2015). The
marginal effects of a one standard deviation increase in clay and aluminum roofs are, respectively,
1.7 and 0.6 percentage points in the lower poverty line model, and -0.06 and -0.03 for mean log per
capita consumption. These magnitudes are larger than those for roads and vegetation, but
considerably smaller than those for building density and shadows.

Of the texture variables, five out of seven are selected for the 10 percent model (LBPM, LSR,
Gabor, Fourier, and SURF). Of these, only LBPM and SURF are selected for the 40 percent and log
per capita consumption models. In general, the estimated marginal effects for these variables are
modest. The main exception is the mean of the Fourier transform, which is positively associated
with poverty in the lower poverty line model, with a statistically significant coefficient in
Table 2. A one standard deviation increase in SURF is associated with a decline of about one
percentage point in the lower poverty line model and a 0.03 increase in log per capita consumption.
This is consistent with wealthier areas being laid out in a more orderly way, with more “right
angles” in housing layouts.

Figure 6 presents a map showing the true welfare measures in the left panel, against the predicted
welfare measures in the right, for a particular DS division, Seethawaka. The top panel shows
predicted welfare from the OLS model against actual welfare. The model is able to distinguish the
poorer eastern areas from the richer western ones. Even poor GNs adjacent to richer ones can be
distinguished; although the smallest GNs are less than half a mile across, the HRSF model captures
the variation in average consumption with considerable accuracy. The middle panel shows predicted
and true poverty rates defined at the lower poverty line. Again, the model approximates the true
poverty rates with considerable accuracy. The lower poverty regions in the south and northeast are
replicated in the predicted values. The model tends to under-predict poverty in the lowest poverty
areas in the mid-west, suggesting that two-step or zero-inflated Poisson models may perform better.




Figure 6: Predicted Versus True Welfare Measures, Average Consumption (top),
                  10% Poverty (middle), 40% Poverty (bottom)

In sum, predictive models based on an urban indicator, the size of the GN, and a host of features
derived from satellite imagery predict poverty rates and mean log per capita consumption
remarkably well. Greater numbers of cars are associated with lower poverty, although the
relationship is not statistically significant. A denser road network and a larger share of paved
roads are also associated with lower poverty. The indicators most strongly associated with poverty
are building density and shadows. Shadows are positively associated with poverty, which suggests
they are capturing variation in tree cover that is inversely related to building density.
Consistent with this, areas characterized by more and lusher vegetation tend to be poorer. Clay and
aluminum roofs, compared to grey roofs, are associated with greater levels of poverty. Of the
spatial features, SURF exhibits a fairly strong association with poverty at the lower poverty line,
suggesting that neighborhoods laid out in a more orderly way tend to be less poor. The following
sections consider the robustness of these main findings.

4.2 Decomposition of Satellite Feature Explanatory Power
The results presented above indicate that features derived from satellite imagery explain a large
portion of the variation in village income or poverty, and that associations are particularly strong
for measures of building density and shadows. However, these results do not address the question
of which indicators account for the model’s predictive power. To address this issue, we decompose
the R-squared using a Shapley decomposition (Shorrocks, 2013; Huettner and Sunder, 2012; Israeli,
2007). This procedure calculates the marginal R-squared of a set of explanatory variables as the
amount by which the R-squared declines when removing that set from the set of candidate variables.
In other words, for a model with $K$ sets of explanatory variables, the procedure estimates $2^K$
models and averages, with Shapley weights, the marginal R-squared obtained for each set of
independent variables across all estimated models. This ensures that each set's contribution to the
R-squared is independent of the order in which it appears in the model.
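The decomposition can be sketched as follows. With $K$ groups, the R-squared of every one of the $2^K$ subsets of groups is computed once and cached. Each group's share is the Shapley-weighted sum, with weight $|S|!(K-|S|-1)!/K!$, of the increase in R-squared from adding it to each subset $S$ that excludes it. By construction the shares sum to the full-model R-squared, so dividing by it gives percentage shares like those in table 4. This is a generic NumPy sketch, not the implementation used for table 4. Group names and data are hypothetical.

```python
import itertools
import math

import numpy as np


def r2(X: np.ndarray, y: np.ndarray) -> float:
    """R-squared of OLS of y on a constant and the columns of X (0 for no columns)."""
    if X.shape[1] == 0:
        return 0.0
    Z = np.column_stack([np.ones(len(y)), X])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    dev = y - y.mean()
    return float(1.0 - resid @ resid / (dev @ dev))


def shapley_r2(X: np.ndarray, y: np.ndarray, groups: dict) -> dict:
    """Shapley share of R-squared for each named group of columns."""
    names = list(groups)
    K = len(names)
    cache = {}

    def value(subset) -> float:
        key = frozenset(subset)
        if key not in cache:  # each of the 2^K subsets is estimated once
            cols = sorted(c for g in key for c in groups[g])
            cache[key] = r2(X[:, cols], y)
        return cache[key]

    shares = {}
    for g in names:
        others = [h for h in names if h != g]
        total = 0.0
        for size in range(K):
            weight = math.factorial(size) * math.factorial(K - size - 1) / math.factorial(K)
            for S in itertools.combinations(others, size):
                total += weight * (value(S + (g,)) - value(S))
        shares[g] = total
    return shares


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5))
    y = X @ np.array([1.0, 0.5, 0.0, -0.8, 0.2]) + rng.normal(size=300)
    groups = {"density": [0, 1], "roads": [2], "roofs": [3, 4]}  # hypothetical groups
    shares = shapley_r2(X, y, groups)
    full = r2(X, y)
    for g, s in shares.items():
        print(f"{g:8s} {100 * s / full:5.1f} percent of R-squared")
```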

Table 4 presents the R-squared decomposition. The results confirm that measures of building density
(built-up area, number of buildings, shadow pixels, and to a lesser extent vegetation) are powerful
contributors to predictive power. Collectively, these three sets of variables account for 39 to 45
percent of the model’s explanatory power. However, a number of other variables are moderately
important. GN area, urban classification, road characteristics, roof type, and the texture
variables each account for 8 to 12 percent of the explained variation. The car and agricultural
variables account for a bit less than that, between 5 and 7 percent each. In short, while broad
measures of building density explain a large share of the variation, virtually all sets of
indicators contribute substantial predictive power to the model.




Table 4: Shapley Decomposition of Share of Variance Explained (R-squared) by High Resolution Spatial
Feature Subgroup (percent of R-squared)

                                         Lower Poverty Rate               Higher Poverty Rate           Average Log Per Capita
                                           (10% Nat. Inc.)                 (40% Nat. Inc.)                  Consumption

 Area                                              10.4                            8.3                              8.4 
 Urban                                             9.4                             9.7                              10.8 
 Agricultural land                                 0.9                             1.0                                 
 Paddy land                                        3.8                             4.6                              4.1 
 Cars                                              7.3                             5.6                              4.6 
 Building density                                  14.8                            19.5                             22.5 
 Vegetation                                        8.0                             6.2                              4.4 
 Shadows                                           14.4                            14.1                            14.0 
 Road variables                                    9.4                             7.7                              9.8 
 Roof Type                                         10.4                            8.3                              8.4 
 Texture variables                                  9.4                            9.7                              10.8 
 Observations                                     1291                            1291                             1291 
 R‐sq                                             0.610                           0.618                            0.608 
Notes: Agricultural variables include fraction agriculture plantation, fraction agriculture paddy, and fraction of GN area that is 
plantation.    Car  variables  include  log  of  car  count,  and  cars  per  total  road  length.  Building  density  variables  include  log  of 
developed area, shadow count (building height proxy), fraction of GN developed, fraction covered by shadow, NDVI at scales 64 
and 8. Road variables include log of unpaved road length, log of paved roads narrower than 5m, log of paved roads 5m+, log of 
airport roads, log of railroad length, and fraction of roads paved. Roof variables include count of roofs by type: clay, aluminum, 
asbestos, grey cement, and fraction of roofs of same type. Texture variables include Fourier series, Gabor, histogram of oriented 
gradients, Local Binary Pattern Moments mean and standard deviation, line support regions, and SURF.  


4.3     Comparisons to Night Time Lights
How does the predictive power of indicators derived from daytime imagery compare with that of night
time lights? To shed light on this, Table 5 presents OLS models covering the same sample area,
using night time lights as the independent variable. The first three columns present the poverty and
per capita consumption models. Aggregate night time lights are positively correlated with welfare
and negatively correlated with poverty; however, their total explanatory power is low: R² values
for the three regressions are between 0.109 and 0.147, with performance lowest for the 10 percent
headcount measure and highest for log consumption per capita. Adding higher order polynomials,
up to a quartic, only increases R² to 0.15. Models built using high resolution satellite indicators
capture around four times as much variation in poverty or welfare as NTL. Columns 4-6 of Table
5 show estimates that include DS division fixed effects. Night time lights are no longer significant
in any of these specifications, indicating that within DS divisions, NTL is only weakly correlated
with welfare.
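The contrast between the pooled columns and the fixed-effects columns of Table 5 can be illustrated with the within transformation, which demeans both variables inside each DS division before running OLS. The sketch below is an illustration on synthetic data, not the estimation code behind Table 5: it assumes, purely for exposition, that NTL and welfare share only a division-level component, so the pooled R² is high while the within-division R² is close to zero. The function names are hypothetical.

```python
import numpy as np


def demean_by_group(x, groups):
    """Within transformation: subtract each group's mean from its members."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(x)
    for g in np.unique(groups):
        mask = groups == g
        out[mask] = x[mask] - x[mask].mean()
    return out


def ols_r2(y, x):
    """R-squared from OLS of y on a constant and a single regressor x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centered = y - y.mean()
    return float(1.0 - resid @ resid / (centered @ centered))


def within_r2(y, x, groups):
    """'R-sq within': R-squared after demeaning y and x by group (fixed effects)."""
    return ols_r2(demean_by_group(y, groups), demean_by_group(x, groups))


# Hypothetical data: 50 DS divisions of 20 GNs each; NTL and welfare share
# only a division-level component (an assumption made for illustration).
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(50), 20)
division_level = rng.normal(size=50)[groups]
ntl = division_level + rng.normal(scale=0.5, size=groups.size)
welfare = division_level + rng.normal(scale=0.5, size=groups.size)

pooled = ols_r2(welfare, ntl)              # driven by between-division differences
within = within_r2(welfare, ntl, groups)   # close to zero by construction
```

In this stylized setup all of the association is between divisions, which is one way a pooled NTL coefficient can be significant while the fixed-effects coefficient is not.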

Given the prevalence, ease of use, and familiarity of night time lights, one might also ask how
much explanatory power night time lights add to the indicators extracted from daytime imagery.
Table 6 answers that question by adding night time lights to the Shapley decomposition. The night
time lights category includes the average, squared, cubed, and standard deviation of NTL.
According to the decomposition, the night time lights variables account for between 7.6 and 12.1
percent of the explained variance in per capita consumption or poverty, meaning that roughly 90
percent of the explained variation in poverty or income is captured by the high resolution satellite
features. Moreover, adding night time lights increases the overall R² of the regression only
marginally, by between 0.011 and 0.024 (compare Tables 4 and 6). In this context, NTL is not a
particularly accurate proxy for poverty and welfare, and adds very little explanatory power to the
set of available daytime indicators.
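Tables 4 and 6 report each variable group's share of the model R². A common way to compute such shares is the Shapley (Shorrocks) decomposition: each group's value is its marginal contribution to R², averaged over every order in which the groups could be added, and the values sum to the full R². The sketch below assumes OLS with R² as the value function and uses synthetic data with hypothetical group names; it is not the authors' code. It fits one regression per subset of groups (2^k in total), which stays cheap for about a dozen groups.

```python
import itertools
import math

import numpy as np


def r2(y, X):
    """R-squared of OLS of y on a constant plus the columns of X (X may have no columns)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    return float(1.0 - resid @ resid / (centered @ centered))


def shapley_r2(y, X, groups):
    """Shapley value of each group of regressors, with R-squared as the value function.

    groups maps a group name to the list of column indices of X in that group.
    """
    names = list(groups)
    k = len(names)
    cache = {}

    def r2_of(subset):
        key = frozenset(subset)
        if key not in cache:
            cols = sorted(c for name in key for c in groups[name])
            cache[key] = r2(y, X[:, cols])
        return cache[key]

    values = {}
    for name in names:
        others = [other for other in names if other != name]
        total = 0.0
        for size in range(k):
            # Probability that exactly this subset precedes `name` in a random ordering.
            weight = math.factorial(size) * math.factorial(k - size - 1) / math.factorial(k)
            for subset in itertools.combinations(others, size):
                total += weight * (r2_of(subset + (name,)) - r2_of(subset))
        values[name] = total
    return values


rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = X @ np.array([1.0, 0.5, 0.5, 0.0]) + rng.normal(size=500)
groups = {"buildings": [0], "roads": [1, 2], "night_lights": [3]}
values = shapley_r2(y, X, groups)
shares_pct = {name: 100 * v / r2(y, X) for name, v in values.items()}  # as in Tables 4 and 6
```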


    Table 5: Model Estimates, Night Lights on Poverty/Average GN Consumption
                                    (1)           (2)           (3)            (4)           (5)           (6)
                                    Lower         Higher        Average Log    Lower         Higher        Average Log
                                    Poverty Rate  Poverty Rate  Per Capita     Poverty Rate  Poverty Rate  Per Capita
                                    (10% Nat.     (40% Nat.     Consumption    (10% Nat.     (40% Nat.     Consumption
                                    Inc.)         Inc.)                        Inc.)         Inc.)
     Night Lights 2012              -0.583***     -1.546**      2.922**        -0.0383       -0.0898       0.186
                                    (-3.53)       (-3.38)       (3.32)         (-0.79)       (-0.67)       (0.64)
     Observations                   1291          1291          1291           1291          1291          1291
     R-sq                           0.109         0.131         0.147          0.000868      0.000842      0.00103
     R-sq Adj.                      0.108         0.130         0.146          0.0000932     0.0000671     0.000258
     R-sq within                                                               0.000868      0.000842      0.00103
     R-sq between                                                              0.372         0.448         0.527
     R-sq overall                                                              0.109         0.131         0.147
     Divisional Secretariat FEs     No            No            No             Yes           Yes           Yes
    Unit of observation is Grama Niladhari (GN) Division. All models include a regression constant which is omitted from the table.
    * p < 0.05, ** p < 0.01, *** p < 0.001
 
    Table 6: Shapley Decomposition By High Resolution Spatial Feature Subgroup and Night Time Lights
                                       Lower Poverty Rate    Higher Poverty Rate    Average Log Per Capita
                                       (10% Nat. Inc.)       (40% Nat. Inc.)        Consumption
    Area                               10.2                  8.1                    8.0
    Urban                              8.7                   8.7                    9.5
    Agricultural land                  0.9                   1.0                    3.3
    Paddy land                         3.3                   3.8
    Cars                               6.7                   5.1                    4.0
    Buildings                          13.0                  16.7                   19.0
    Vegetation                         8.0                   6.0                    4.1
    Shadows                            12.1                  13.0                   10.6
    Road variables                     8.0                   8.0                    8.5
    Roof Type                          13.0                  12.0                   11.7
    Texture variables                  8.5                   7.1                    8.9
    Night time lights variables        7.6                   10.6                   12.1
    Observations                       1291                  1291                   1291
    R-sq                               0.621                 0.636                  0.632
Notes: Night time lights category includes the following transformations of night time lights: average, squared,
cubed, and standard deviation. Variable groupings are identical to those in Table 4.

4.4   Urban and Rural Linear Models
How does the relationship between indicators and welfare differ in urban and rural areas? Table 7
shows estimates from models fit separately for the 393 urban and 898 rural villages, based
on Sri Lanka’s official definition of urban and rural areas.16 Variables were again selected through
Lasso estimation. The urban model selects fewer variables: 13 of the candidate variables are
selected in the urban model versus 16 in the rural model. Compared with the full-sample model, the
R-squared is slightly higher in rural areas (0.656) and considerably lower in urban areas (0.446).17
In the urban model, log number of cars, built-up development, and shadow coverage are important.
In the rural model, agricultural variables, roof type, shadow pixels, NDVI, Pantex and LBPM are
important. The association between cars and welfare is significantly stronger in urban areas. In
addition, the association between NDVI and welfare is strongly negative in rural areas, as rural
areas with more vegetation and less built-up area are poorer. The coefficient on NDVI in urban
areas, meanwhile, is positive and not statistically significant, suggesting that, if anything,
wealthier urban GNs are characterized by a greater prevalence of lush vegetation.
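The paper does not state which Lasso implementation or penalty was used, so the following is only an illustration of how Lasso selects a subset of candidate variables. It is a minimal sketch of cyclic coordinate descent on standardized features with a fixed, hypothetical penalty `lam` (in practice the penalty is usually chosen by cross-validation), run on synthetic data where only two of ten candidate features matter.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the Lasso proximal step)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso coefficients by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y_c - Xs b||^2 + lam * ||b||_1, where Xs is X standardized
    to mean 0 and variance 1 and y_c is y centered. Returns b on the standardized scale.
    """
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    resid = y - y.mean()   # residual when all coefficients are zero
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Least-squares coefficient for column j holding the other coefficients fixed.
            rho = Xs[:, j] @ resid / n + b[j]
            new_bj = soft_threshold(rho, lam)
            resid -= Xs[:, j] * (new_bj - b[j])
            b[j] = new_bj
    return b


rng = np.random.default_rng(2)
X = rng.normal(size=(400, 10))   # 10 hypothetical candidate satellite features
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=400)
b = lasso_cd(X, y, lam=0.3)
selected = np.flatnonzero(b)     # indices of the variables kept by the Lasso
```

A larger `lam` keeps fewer variables; fitting separate urban and rural models with the same penalty can therefore select different numbers of variables, as in Table 7.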


 Table 7: Marginal Effects of One Standard Deviation Change for Urban and Rural Models  
                      Urban        Rural       Variables  
 Area                                      ‐0.032           Area   
 Agricultural land                         0.018 *          % of GN area that is agriculture 
 Paddy land                0.045                            Combined Paddy and plantation 
                                           0.026 **         % of GN agriculture that is paddy, % of GN agriculture that is 
                                                            plantation (‐) 
 Cars                      0.093 ***       0.029 ***        Log car count   
 Road variables                            0.030 *          log sum of length  
                           0.041           0.029 ***        Fraction paved  
                                           0.011 ***        Log length of airport runway 
                           ‐0.02                            Log sum of railroads 
 Building density          0.186 ***                        Both building density variables 
                                           0.038 ***        log of Total count of buildings in GN 
 Vegetation                0.041           ‐0.060 ***       NDVI mean, scale 64 
 Shadows                   ‐0.107 **                        % shadows  
                                           ‐0.061 ***       ln shadow pixels 
 Roofs                     0.036           ‐0.084 ***       Fraction of total roofs that are clay 
                           ‐0.022          ‐0.037 ***       Fraction of total roofs that are aluminum  
                                           ‐0.021 *         Fraction of total roofs that are asbestos   
 Spectral and Texture                      ‐0.018           Linear Binary Pattern Moments 
                           ‐0.006                           Line support regions 
                           ‐0.058                           Fourier transform  
                                           0.075 ***        Pantex 
 Observations              393             898               
 R‐sq                      0.446           0.656             
 R‐sq Adj.                 0.427           0.650             
 Out‐of‐Sample R‐sq        0.412           0.641             
 Mean Absolute Error       0.145           0.113             
Notes: The table gives the estimated marginal effect of a one standard deviation change in the variable or variables listed in the
rightmost column. For example, the combined marginal effect of a one standard deviation change in all three cars variables on the
10 percent relative poverty rate is a reduction of 1.2 percentage points. Variables excluded from the 40 percent poverty and log
consumption models, as shown in Table 2, are also excluded when calculating marginal effects for those dependent variables. For
agricultural land, % of GN that is plantation is subtracted from the sum of % GN agriculture that is paddy and % total GN area that
is paddy. * p<0.05, ** p<0.01, *** p<0.001
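The marginal effects in Table 7 are arithmetic on the regression coefficients. A minimal sketch, assuming a linear model in which each listed variable is shifted by one of its own standard deviations and the resulting effects are summed. The optional `signs` argument is a hypothetical way to mirror the (-) convention in the table, where one variable (the plantation share) enters the combined effect with a negative sign; the function name and inputs are illustrative only.

```python
import numpy as np


def sd_marginal_effect(coefs, X, signs=None):
    """Combined effect of moving each listed regressor by one standard deviation:
    sum over k of sign_k * beta_k * sd(x_k).

    coefs: OLS coefficients for the listed variables.
    X: matrix whose columns are those variables (one row per GN).
    signs: +1 or -1 per variable; -1 moves that variable down by one SD.
    """
    coefs = np.asarray(coefs, dtype=float)
    sds = np.asarray(X, dtype=float).std(axis=0, ddof=1)
    signs = np.ones_like(coefs) if signs is None else np.asarray(signs, dtype=float)
    return float(np.sum(signs * coefs * sds))


X = np.array([[0.0, 0.0], [2.0, 4.0]])       # toy data: SDs are sqrt(2) and 2 * sqrt(2)
effect = sd_marginal_effect([1.0, 0.5], X)   # sqrt(2) + 0.5 * 2 * sqrt(2) = 2 * sqrt(2)
```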


16
   This definition is based on administrative units and has not been updated in many years. As a result, some areas
officially classified as rural have urban characteristics.
17
   This might be due to the presence of de facto urban GNs in the rural sample. In addition, the consumption module in the
HIES could capture consumption better in rural than in urban areas.

Table 8: Model Performance for Prediction of Average log per Capita Consumption at Different Points 
in the Welfare Distribution  
                                Bottom 20%         Bottom 40%         Bottom 60%         Bottom 80%         Full Sample 
 Observations                   259                517                775                1033               1291 
 R‐sq                           0.551              0.454              0.474              0.509              0.608 
 Adjusted R‐sq                  0.52               0.436              0.461              0.5                0.602 
 Out-of-sample R-sq             0.487              0.425              0.447              0.475              0.595
 Mean Absolute Error            0.064              0.0774             0.0909             0.115              0.139 
 Mean log p.c. income           8.83               8.95               9.00               9.09               9.16 
 Standard deviation             0.11               0.13               0.15               0.20               0.28 
Notes: Table reports model performance statistics for the national model for different subsamples of the bottom portion of the 
GN Division welfare distribution. The dependent variable is average predicted log GN per capita consumption. The rightmost 
column is identical to the results reported in the right column of Table 2.  




 Table 9: MLE Estimation Correcting for Spatial Autoregression 
                                                                                Average Log Per Capita Consumption 
                                                                                              coef         t 
 log Area (square meters)                                                                ‐0.046***         [‐4.01] 
 = 1 if urban                                                                              0.048+          [1.96] 
 % of GN area that is agriculture                                                         0.00022          [0.42] 
 % of GN agriculture that is paddy                                                       0.00046+          [1.74] 
 % of GN agriculture that is plantation                                                  0.00076**         [3.09] 
 % of Total GN area that is paddy                                                          0.00057         [0.79] 
 Total cars divided by total road length                                                     ‐0.93         [‐1.20] 
 Total cars divided by total GN Area                                                        401.4*         [2.28] 
 log number of cars                                                                       0.020***         [3.57] 
 % of area with buildings                                                                0.0083***         [4.19] 
 log of Total count of buildings in GN                                                      0.012          [1.23] 
 Vegetation Index (NDVI), mean, scale 64                                                     0.071         [1.54] 
 Vegetation Index (NDVI), mean, scale 8                                                     ‐0.042         [‐0.67] 
 log of Sum of length of roads                                                             0.029**         [2.70] 
 fraction of roads paved                                                                 0.0012***         [6.00] 
 ln length airport roads                                                                   0.0052          [1.50] 
 ln length railroads                                                                      ‐0.00092         [‐0.48] 
 Fraction of total roofs that are clay                                                  ‐0.0025***         [‐5.83] 
 Fraction of total roofs that are aluminum                                              ‐0.0034***         [‐4.92] 
 Fraction of total roofs that are asbestos                                                 0.0014*         [2.26]
 Linear Binary Pattern Moments (scale 32m), mean                                        ‐0.0080***         [‐3.38] 
 Line support regions (scale 8m), mean                                                       ‐1.25         [‐0.71] 
 Gabor filter (scale 64m) mean                                                              ‐0.053         [‐0.92] 
 Fourier transform, mean                                                                ‐0.0030***         [‐3.61] 
 SURF (scale 16m), mean                                                                   0.0052*          [2.24] 
 Constant                                                                                  9.74***         [51.6] 
 Observations                                                                                    1287 
 Notes: Standard errors have been corrected according to Conley (1999, 2008), with model estimation via GMM. + p<0.10, * 
 p<0.05, ** p<0.01, *** p<0.001 



 Figure 8: Average out of sample R-squared and Average GN welfare, by subsample of GNs  



4.5      Correcting for Spatial Autoregression
One unaddressed concern is whether spatial autocorrelation or spatial heterogeneity leads the
standard errors to be underestimated. Spatial autocorrelation can arise from geographic spillovers
or interactions (Anselin, 2013), and with village-level observations it is plausible that poverty in
one GN is influenced by conditions in neighboring GNs. A Moran’s I test for the presence of such
disturbances, following Anselin (1996), rejects the null hypothesis of no spatial autocorrelation.
To correct for it, we explicitly model the spatial autoregressive (SAR) process and allow for SAR
disturbances, a so-called SARAR model, estimated via generalized spatial two-stage least squares
(GS2SLS) as described in Drukker et al. (2013). The results presented in Table 9 show that after
correcting for spatial autocorrelation, most high-resolution spatial features remain significant
predictors of local area poverty. Although some spatial autocorrelation is present, it is not
sufficient to alter the joint significance of the spatial variables.
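The statistic behind the Moran’s I test can be sketched in a few lines. This is an illustration under stated assumptions, not the test as implemented in the paper: the binary contiguity weights and the permutation p-value are stand-ins chosen for simplicity, whereas the paper's test may be applied to regression residuals with analytical inference. The data are synthetic and the function names hypothetical.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), where z are deviations from the
    mean and S0 is the sum of all weights. weights must have a zero diagonal."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    W = np.asarray(weights, dtype=float)
    return float((z.size / W.sum()) * (z @ W @ z) / (z @ z))


def morans_i_permutation_pvalue(values, weights, n_perm=999, seed=0):
    """One-sided p-value for positive autocorrelation: share of random relabelings of
    the values whose Moran's I is at least the observed one."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    observed = morans_i(values, weights)
    hits = sum(morans_i(rng.permutation(values), weights) >= observed for _ in range(n_perm))
    return (1 + hits) / (n_perm + 1)


# Hypothetical example: 100 GNs on a line, each bordering its immediate neighbors.
n = 100
W = np.zeros((n, n))
W[np.arange(n - 1), np.arange(1, n)] = 1.0
W[np.arange(1, n), np.arange(n - 1)] = 1.0
smooth = np.sin(np.linspace(0, 3 * np.pi, n))   # strongly spatially clustered values
```

Values near +1 indicate that neighboring GNs have similar values; under no spatial autocorrelation the statistic is close to -1/(n-1).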

4.6    Using an Alternative Measure of Simulated Welfare

The results so far have demonstrated that indicators derived from satellite imagery are strongly
predictive of variation in welfare and poverty, measured using welfare imputed into the 2011
census. Imputing into the 2011 census is necessary because the analysis is carried out at the GN
division level, and the HIES survey alone does not contain sufficient data to accurately estimate
welfare at that level. The baseline method uses, as the dependent variable, the average welfare or
poverty rate, taken across both the households in the village and the 100 simulations of predicted
residuals. This average is then regressed on the satellite features. Because the dependent
variable is an average taken over 100 simulations, it is a measure of expected poverty and
welfare across both simulations and GN households, and it incorporates the full distribution of
potential outcomes. Averaging over the 100 simulations per household also reduces the variance
of the stochastic component of welfare by a factor of 100, which raises the explanatory power of
the satellite indicators.

An alternative would be to compare satellite-based predictions against simulated poverty and
welfare. This would entail regressing the estimated poverty rate of the GN for each of the 100
simulations on the independent variables, and then averaging the resulting R-squared statistics
across the 100 simulations. In each case, the dependent variable is a single simulated value of
welfare rather than an average across simulations. This provides a lower bound on the
explanatory power the satellite indicators would have if the census included consumption data.
If the census did collect consumption data, the portion of actual consumption not explained by
the prediction model could be correlated with the satellite indicators. In the simulations, by
contrast, the unexplained portion of the prediction is constructed to be pure noise that cannot be
explained by the imagery indicators.
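Why averaging over simulations raises R² can be reproduced with a toy simulation. The data-generating process below is a stylized assumption, not the census imputation model: each GN's welfare is driven by one "satellite" feature plus household-level predicted variation, and each of the 100 simulations adds an independent normal residual to every household. The sketch compares the R² using expected welfare (averaged over households and simulations) with the average R² across single simulations.

```python
import numpy as np


def r2_simple(y, x):
    """R-squared of OLS of y on a constant and one regressor (the squared correlation)."""
    return float(np.corrcoef(y, x)[0, 1] ** 2)


rng = np.random.default_rng(3)
n_gn, n_hh, n_sims = 300, 10, 100

feature = rng.normal(size=n_gn)   # stands in for the satellite-based prediction of each GN
# Predicted household welfare: GN level plus household-specific predicted variation.
hh_pred = feature[:, None] + rng.normal(scale=0.5, size=(n_gn, n_hh))
# Each simulation adds an independent draw of the prediction residual to every household.
draws = hh_pred[:, :, None] + rng.normal(scale=1.0, size=(n_gn, n_hh, n_sims))

gn_by_sim = draws.mean(axis=1)      # GN average welfare in each simulation, shape (n_gn, n_sims)
expected = gn_by_sim.mean(axis=1)   # averaged over households and simulations

r2_expected = r2_simple(expected, feature)
r2_simulated = float(np.mean([r2_simple(gn_by_sim[:, s], feature) for s in range(n_sims)]))
```

Averaging the 100 independent residual draws divides their variance by 100, so the residual noise nearly vanishes from the expected-welfare measure but remains in each single-simulation measure.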

The top row of Table 10 reports the in-sample R² when using expected welfare as the dependent
variable, which is identical to the results reported in Table 2. The bottom row reports the average
R² when using the simulated welfare approach. Using each individual simulation of welfare as
the dependent variable reduces the in-sample R² for the base regression reported in Table 2 from
0.610 to 0.406 when predicting the 10 percent poverty rate, from 0.618 to 0.496 when predicting
the 40 percent poverty rate, and from 0.608 to 0.515 when predicting log per capita consumption.
Relying on single draws of the residual from the census prediction regression, rather than the
average over all 100 draws, reduces accuracy most at the lower poverty threshold. This is
because a greater share of households have predicted welfare above the 10 percent threshold
than above the 40 percent threshold, so the residuals from the predicted welfare function play a
larger role in determining poverty at the lower threshold. Overall, the independent variables
continue to predict much of the variation in welfare and poverty across GNs, even when the GN
estimate of welfare and poverty is based on a single draw from the residual of the prediction
regression.


 Table 10: R² of predicted poverty and welfare under an alternative measure of welfare and poverty
                                         Lower Poverty Rate    Higher Poverty Rate    Average Log Per Capita
                                         (10% Nat. Inc.)       (40% Nat. Inc.)        Consumption

 R² using expected welfare               0.610                 0.618                  0.608
 Average R² using simulated welfare      0.406                 0.496                  0.515
 Notes: In-sample R² reported. Unit of observation is Grama Niladhari (GN) division. Independent variables are identical
 to those used in Table 2. Expected welfare refers to the average poverty rate or the average log per capita consumption
 averaged across both GN households and one hundred simulations. The second row reports the average of R² statistics
 from separate regressions using each of the 100 simulations, averaged across GN households, as the dependent
 variable.



4.7     Do High Resolution Satellite Features Explain the Poverty Gap?

The poverty gap is a useful supplement to the headcount rate for understanding poverty because
it takes the depth of poverty into account. The poverty gap, or FGT1 metric, measures poverty
depth by considering how far the poor are from a given poverty line.18 We compute the average
poverty gap for each village and use this measure as the dependent variable in a regression where
the right-hand side includes the size of the GN, a dummy indicating urban classification, and the
features created from high resolution satellite imagery. We again consider poverty lines defined
at the 10th and 40th percentiles of national consumption per capita. Table 11 presents the results
estimated via OLS. Each coefficient gives the change in the GN's poverty gap (the average
shortfall from the poverty line, expressed as a fraction of the line) associated with a one-unit
change in the corresponding variable. As was the case for headcount rates, high resolution
features explain the poverty gap well, with adjusted R² values between 0.588 and 0.609. Not
surprisingly, building density and shadow variables are also strong correlates of the poverty gap.
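The FGT1 definition in footnote 18 translates directly into code. A minimal sketch, assuming unweighted individual-level consumption for a single GN; an applied version would use household or population weights, compute the index separately for each GN, and set `z` at the 10th or 40th percentile of national consumption (for example with `np.percentile`). The function name is hypothetical.

```python
import numpy as np


def fgt(consumption, z, alpha=1):
    """Foster-Greer-Thorbecke index for poverty line z.

    Each poor person (y < z) contributes ((z - y) / z) ** alpha, everyone else 0,
    and the sum is divided by the full population. alpha=0 gives the headcount
    rate and alpha=1 the poverty gap (FGT1).
    """
    y = np.asarray(consumption, dtype=float)
    poor = y < z
    contributions = np.zeros_like(y)
    contributions[poor] = ((z - y[poor]) / z) ** alpha
    return float(contributions.mean())


consumption = np.array([50.0, 80.0, 100.0, 200.0])   # hypothetical individual consumption
gap = fgt(consumption, z=100.0)                       # (0.5 + 0.2) / 4 = 0.175
headcount = fgt(consumption, z=100.0, alpha=0)        # 2 / 4 = 0.5
```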


 Table 11: Estimating Poverty Gap Using High Res Features 
                                                Poverty Gap (FGT1 ‐ 10%)        Poverty Gap (FGT1 ‐ 40%) 
                                                    coef            t               coef             t 
 log Area (square km)                              0.0060**         [2.84]            0.0063             [1.02] 
 = 1 if urban                                       ‐0.0063        [‐2.00]            ‐0.013            [‐1.05] 
 % of GN area that is agriculture                 ‐0.000081        [‐1.29]          ‐0.00018            [‐0.76] 
 % of GN agriculture that is paddy               ‐0.000087**       [‐3.24]        ‐0.00033**            [‐3.10] 
 % of GN agriculture that is plantation          ‐0.000053**       [‐2.91]         ‐0.00021*            [‐2.63] 
 % of Total GN area that is paddy                  ‐2.3E‐05        [‐0.29]          ‐0.00025            [‐0.88] 
 Total cars divided by total road length             ‐0.09         [‐1.32]                                   
 Total cars divided by total GN Area                  9.55          [0.72]                                   
 log number of cars                                 ‐0.0014        [‐0.83]           ‐0.0058            [‐1.24] 
 log of Sum of length of roads                    ‐0.0049**        [‐2.97]           ‐0.011*            [‐2.48] 
 fraction of roads paved                         ‐0.000077**       [‐3.37]         ‐0.00023*            [‐2.67] 
 ln length airport roads                           ‐0.00027        [‐0.89]                                   
 ln length railroads                                0.00026         [1.35]                                   
 % of area with buildings                         ‐0.00062*        [‐2.16]          ‐0.0028*            [‐2.04] 
 % shadows (building height) covering valid area   0.00053         [1.76]          0.0017               [1.54]
 ln shadow pixels (building height)                 0.0037*         [2.19]          0.016*               [2.68] 
 Fraction of total roofs that are clay            0.00020**         [2.96]        0.00070**              [3.12] 
 Fraction of total roofs that are aluminum        0.00024**         [3.31]        0.00084**              [3.19] 
 Fraction of total roofs that are asbestos         ‐9.1E‐05        [‐1.14]
 log of Total count of buildings in GN             ‐0.0022*        [‐2.62]         ‐0.0073*             [‐2.09] 
 Vegetation Index (NDVI), mean, scale 64             0.017*         [2.33]          0.056**              [2.88] 
 Vegetation Index (NDVI), mean, scale 8            ‐0.019**        [‐2.95]                                   
 Linear Binary Pattern Moments (scale 32m)         0.00048*         [2.55]        0.0029***              [4.87] 
 Line support regions (scale 8m), mean                ‐0.27        [‐1.39]                                   



18
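 We calculate for our sample the FGT1 metric (Foster, Greer, and Thorbecke, 1984), which is defined as
$FGT_1 = \frac{1}{N}\sum_{i=1}^{N} \frac{z - y_i}{z}\,\mathbf{1}(y_i < z)$, where $N$ is the number of individuals, $y_i$ is
individual $i$’s income, $z$ is the poverty threshold, and the indicator $\mathbf{1}(y_i < z)$ restricts the sum to the poor.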

5     Out-of-Sample Performance with Two Applications

5.1     Estimating Poverty Using a Reduced Census Training

A key motivation for this analysis is to understand how HRSF complements traditional surveys in
generating small area estimates. The standard small area estimation technique used to model
poverty combines a smaller household survey with a population census. Conducting a population
census is expensive, but it is needed to cover the full population of a country. Can satellite
features be combined with a smaller household survey alone to produce sufficiently precise small
area estimates? To assess this, we examine whether the predictive power of satellite imagery
remains when it is calibrated using a census extract of approximately the size of the Household
Income and Expenditure Survey, rather than the full census.

We produce an alternative version of each of the three dependent variables (average log per
capita consumption and the two GN poverty rates) using artificial subsamples of the census
intended to mimic the size of a household survey. This involves sampling 20% of GNs, and 5% of
the households within each sampled GN, from the predicted welfare measure imputed into the
household-level census data. We use these artificial samples to build a model of poverty or
consumption using HRSF, which produces estimates of poverty that can then be compared to
estimates based on the full census data. This sheds light on the trade-off between reducing the
size of the training sample, which saves considerable money, and the resulting loss of precision in
the estimated poverty rates.
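The resampling step can be sketched as a two-stage draw. This is a minimal illustration on a synthetic census, assuming simple random sampling at both stages, an OLS model fit on sampled GN averages, and the squared correlation as the R-squared between predicted and full-census GN values; the paper's exact sampling scheme and model may differ, and all names here are hypothetical.

```python
import numpy as np


def two_stage_sample(hh_gn, gn_frac=0.20, hh_frac=0.05, seed=0):
    """Select gn_frac of GNs, then hh_frac of households within each selected GN.

    hh_gn holds each household's GN id. Returns the selected GN ids and a boolean
    mask over households. At least one household is kept in every selected GN.
    """
    rng = np.random.default_rng(seed)
    gns = np.unique(hh_gn)
    chosen = rng.choice(gns, size=max(1, round(gn_frac * gns.size)), replace=False)
    mask = np.zeros(hh_gn.size, dtype=bool)
    for g in chosen:
        members = np.flatnonzero(hh_gn == g)
        k = max(1, round(hh_frac * members.size))
        mask[rng.choice(members, size=k, replace=False)] = True
    return chosen, mask


# Synthetic census: 500 GNs with 200 households each and a known GN welfare level.
rng = np.random.default_rng(4)
n_gn, hh_per_gn = 500, 200
features = rng.normal(size=(n_gn, 3))                       # hypothetical satellite features
gn_level = 9.0 + features @ np.array([0.3, -0.2, 0.1])
hh_gn = np.repeat(np.arange(n_gn), hh_per_gn)
hh_welfare = gn_level[hh_gn] + rng.normal(scale=0.5, size=hh_gn.size)
census_gn = np.bincount(hh_gn, weights=hh_welfare) / np.bincount(hh_gn)   # full-census GN means

chosen, mask = two_stage_sample(hh_gn)
sums = np.bincount(hh_gn[mask], weights=hh_welfare[mask], minlength=n_gn)
counts = np.bincount(hh_gn[mask], minlength=n_gn)
survey_gn = sums[chosen] / counts[chosen]                   # survey-style estimates for sampled GNs

X = np.column_stack([np.ones(n_gn), features])
beta, *_ = np.linalg.lstsq(X[chosen], survey_gn, rcond=None)
pred = X @ beta
held_out = np.setdiff1d(np.arange(n_gn), chosen)
r2_out = float(np.corrcoef(pred[held_out], census_gn[held_out])[0, 1] ** 2)
```

Repeating the exercise with different `hh_frac` values traces out how accuracy changes as fewer households are sampled.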

Figure 9 plots the results of the simulation exercise: R-squared values between predicted and
true welfare, both in-sample (GNs within the subsample) and out-of-sample (GNs excluded from
the subsample), and mean absolute error. Average R-squared values between predicted and true
values do not decline significantly when the model is trained on 20 percent of GNs and 5 percent
of households within those GNs. R-squared values decline modestly as the sample shrinks, but
remain well above 0.5 even with artificially many fewer households. The second panel of Figure 9
shows the same exercise using mean absolute error, and confirms that there is negligible
difference when far fewer households are used to train the model. These results suggest that it is
possible to generate reasonably reliable estimates of economic well-being by combining
household survey data with features extracted from satellite imagery.

5.2 Poverty Estimation via Geographic Extrapolation

Another major motivation for using satellite imagery is to extrapolate poverty estimates into
areas where survey data on economic well-being do not exist. While many poor countries are
unable to conduct regular surveys, other countries collect welfare data regularly but omit
selected regions due to political turmoil, violence, animosity toward the central government, or
prohibitive expense. For example, from 2002 through 2009/10, Sri Lanka’s HIES did not cover
certain districts in the Northern and Eastern parts of the country due to civil conflict, and
Pakistan’s HIES excludes the Federally Administered Tribal Areas and Jammu and Kashmir.


       Figure 9: Model explanatory power and error with artificially reduced sample
       size. (20% of GNs sampled to estimate model, Households sampled as shown.)




To assess how well a model “travels” to a different geographic area, we fit a series of models,
each of which excludes a single Divisional Secretariat (DS), a larger administrative area, and is
then used to predict into that excluded area. This is a form of “leave-one-out cross-validation”
(LOOCV), a common method used to assess out-of-sample performance (Gentle et al., 2012). We
estimate both linear models and random forest models19 to test whether more flexible model
specifications perform better out-of-sample.
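The leave-one-DS-out procedure can be sketched as follows. This is a minimal illustration using OLS only, on synthetic data with hypothetical names; the random forest variant would replace the least-squares fit with a forest regressor (for example scikit-learn's `RandomForestRegressor`), which is not shown here.

```python
import numpy as np


def leave_one_group_out_predictions(X, y, groups):
    """Predict each group (e.g., a DS division) from an OLS model fit on all other groups."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    A = np.column_stack([np.ones(y.size), X])
    pred = np.empty(y.size)
    for g in np.unique(groups):
        held_out = groups == g
        beta, *_ = np.linalg.lstsq(A[~held_out], y[~held_out], rcond=None)
        pred[held_out] = A[held_out] @ beta
    return pred


rng = np.random.default_rng(5)
groups = np.repeat(np.arange(40), 30)   # 40 hypothetical DS divisions of 30 GNs each
X = rng.normal(size=(groups.size, 4))
y = 9.0 + X @ np.array([0.3, -0.2, 0.1, 0.0]) + rng.normal(scale=0.1, size=groups.size)
pred = leave_one_group_out_predictions(X, y, groups)
```

Holding out a whole DS, rather than random GNs, prevents the model from borrowing information from neighboring GNs in the same division, which is why errors are larger than under random cross-validation.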

Our approach differs from the standard case, in which a single observation is left out at a time,
in that for each estimation we exclude, or “leave out”, an entire Divisional Secretariat (DS), an
administrative sub-unit at the level immediately below the district. Table 12 shows model
performance at predicting into adjacent areas, comparing normalized root mean squared error,
normalized mean absolute error, and the rank correlation between predicted and true welfare
measures, using both random forest and linear models to fit HRSF models. The adjacent prediction
error rates are larger than those obtained when predicting randomly out of sample using
cross-validation. Normalized error rates divide average error by the average value of welfare;
the error rates can therefore be interpreted as fractions of average welfare. Using linear models
to estimate and predict into adjacent areas, the mean absolute error is 2.4% of average log per
capita consumption, 45% of the average poverty rate at the lower poverty line, and 29% of the
average poverty rate at the higher poverty line. The error rates are lower when using random
forests: the mean absolute error declines to 1.5% of average log per capita consumption, 38% of
the average poverty rate at the lower poverty line, and 26% of the average poverty rate at the
higher poverty line.

¹⁹ For each random forest model, we use 1,000 decision trees, sampling 13 of the predictors with
replacement.
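The normalized error rates in Table 12 can be computed as below. This is a minimal sketch that assumes the normalizing “average value of welfare” is the mean of the true values over all held-out GN divisions and that this mean is positive; the function name `normalized_errors` is hypothetical.

```python
import math


def normalized_errors(y_true, y_pred):
    """Return (NRMSE, NMAE): RMSE and MAE divided by the mean of y_true,
    so each can be read as a fraction of average welfare."""
    n = len(y_true)
    mean_y = sum(y_true) / n  # assumed positive (a poverty rate or mean log consumption)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return rmse / mean_y, mae / mean_y
```

Read this way, an NMAE of 0.4544 (linear models, lower poverty line) means the mean absolute error is about 45% of the average poverty rate. Because squared errors weight large misses more heavily, NRMSE is always at least as large as NMAE, which is consistent with every column of Table 12.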

While these error rates imply that adjacent predictions are too imprecise to serve as official
statistics, they may be sufficient for rank ordering villages by poverty or income. The rank
correlation between the predicted and true values, measured by Spearman’s ρ, is between 0.67 and
0.70 for the linear models and between 0.74 and 0.76 for the random forest models. We conclude
that HRSF cannot yet be used to predict accurately into adjacent areas for official statistics,
but the accuracy may be acceptable for targeting or other applications, and is likely to improve
as better machine learning methods are employed.
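Spearman’s ρ is the Pearson correlation between the ranks of the predicted and true values, so it rewards getting the ordering of villages right even when the predicted levels are off. A minimal stdlib sketch follows, with tied values assigned the average of their ranks; the helper names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman_rho(y_true, y_pred):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(y_true), average_ranks(y_pred))
```

For example, `spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])` swaps two adjacent pairs and gives 0.8. `scipy.stats.spearmanr` computes the same statistic, also using average ranks for ties.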


Table 12: Model Performance Predicting Into Adjacent Areas

                                                                    Dependent Variable
                                                Average Log Per     Lower Poverty Rate   Higher Poverty Rate
                                                Capita Consumption  (10% Nat. Inc.)      (40% Nat. Inc.)
Linear Models
  Normalized Root Mean Squared Error (NRMSE)    0.0836              0.9225               0.5376
  Normalized Mean Absolute Error (NMAE)         0.0242              0.4544               0.2923
  Rank Correlation, Predicted and True Values   0.6983              0.6707               0.6772
Random Forest Models
  Normalized Root Mean Squared Error (NRMSE)    0.02008             0.5454               0.3373
  Normalized Mean Absolute Error (NMAE)         0.01537             0.3827               0.2561
  Rank Correlation, Predicted and True Values   0.7608              0.7512               0.7423




6 Conclusion

Traditionally, because surveys large enough to provide accurate statistics for small areas are
prohibitively expensive, small area poverty estimates have been generated by pairing a welfare
survey with a census or inter-census survey. Census and inter-census data are expensive to collect
and are therefore produced relatively infrequently. The data are also usually disseminated with a
significant lag, making it difficult to rapidly assess changes in local living standards. The
results above show that indicators derived from high spatial resolution imagery, when paired with
survey data, generate accurate predictions of local-level poverty and welfare, and that the
conditional correlations are by and large of sensible signs and magnitudes. Furthermore, models
based on specific features accurately predict mean per capita consumption throughout the welfare
distribution. While the welfare consequences of more frequent measures of poverty and inequality
are unknown, they may be large given the many applications of frequent local measures of economic
well-being, ranging from impact evaluation to budget allocation and social transfers.

How well do indicators derived from satellite imagery predict poverty, and which indicators are
most important? We investigate these questions using a sample of 1,291 villages in Sri Lanka,
linking measures of economic well-being with features derived from high resolution satellite
imagery. The results indicate that the correlation between satellite-derived indicators and
economic well-being is remarkably strong. Simple linear models explain 60 to 61 percent of the
variation in poverty and average log per capita consumption. In both rural and urban areas,
variables measuring building density, built-up area, and shadows are the strongest predictors of
variation in poverty. As expected, the extent and lushness of vegetation are negatively correlated
with welfare in rural areas and mildly positively correlated with incomes in urban areas,
suggesting that trees and other vegetation are a luxury in urban areas.

While these results are very encouraging, additional analysis suggests caution when extrapolating
predictions into geographically adjacent areas. The normalized mean absolute errors range from
about a quarter to one-half of the average poverty rate, depending on the incidence of poverty. A
likely impediment to extrapolation is geographic heterogeneity in the relationship between
indicators and welfare. Models that learn from geographic heterogeneity, such as ensemble methods
or deep learning methods, are likely to improve performance on this task, as the gains from random
forests in Table 12 suggest. Another factor is the difference in the dates on which the satellite
images were taken, which can add noise to the independent variables across geographic regions.
This could affect particular indicators such as car counts, which can vary greatly with the day of
the week on which the imagery was obtained. Measures of agriculture also exhibit considerable
seasonal variation, which may confound extrapolation to adjacent areas. This suggests that some
indicators may contribute disproportionately to bias when extrapolating across space, and that
the image date is an important consideration for spatial extrapolation using satellite-based
indicators. We suspect this problem will diminish as satellite revisit rates improve. Planet,
which has 190 imagery satellites in orbit, already claims daily revisit rates for all of the
Earth’s landmass, sometimes with revisit rates as frequent as every hour.

These findings raise a host of questions for further work, and contribute to an ongoing discussion
regarding the use of predictive methods in public policy (Athey, 2017). The most immediate of
these is whether satellite indicators can substitute for census data in different contexts and for
different indicators. Does the strong correlation between satellite-based indicators and economic
well-being extend to wage income measured directly from an expenditure survey? Second, it is
important to better understand the extent to which these results generalize to different social and
ecological environments, such as Africa, the Middle East, and other parts of Asia. There is no
guarantee that the predictive power of building density, shadows, and other features documented
above will hold in all environments.



A second line of research could explore whether changes in satellite imagery can be used to
forecast changes in economic well-being across space and time. Poverty surveys are typically
collected every three years, and the most recent global estimates are produced with a three-year
lag. The ability to “now-cast” measures of economic well-being by combining frequently updated
satellite imagery with the most recent survey-based measures of poverty therefore has great
potential. Further research could also identify the best way to predict into adjacent areas not
covered by surveys. Overall, the inevitable increase in the availability of imagery and feature
identification algorithms, together with the encouraging results from this study, implies that
satellite imagery will become an increasingly valuable tool to help governments and stakeholders
better understand the spatial nature of poverty.




References
Abrahams, A., Lozano-Gracia, N., & Oram, C. (2015). Correcting overglow in night-time lights data.
Unpublished manuscript. Accessed at: https://sites.google.com/site/alexeiabrahams2

Afzal, M., Hersh, J., & Newhouse, D. (2016). Building a better model: Variable selection to predict
poverty in Pakistan and Sri Lanka. Mimeo, World Bank.

Anselin, L. (2013). Spatial econometrics: Methods and models (Vol. 4). Springer Science & Business
Media.

Anselin, L., et al. (1996). Simple diagnostic tests for spatial dependence. Regional Science and
Urban Economics, 26(1), 77-104.

Athey, S. (2017). Beyond prediction: Using big data for policy problems. Science, 355(6324).

Athey, S., & Imbens, G. (2015). Machine learning methods for estimating heterogeneous causal effects.
arXiv preprint arXiv:1504.01132.

Bay, H., Tuytelaars, T., & Van Gool, L. (2006). SURF: Speeded up robust features. Lecture Notes in
Computer Science, 3951, 404-417.

Belloni, A., & Chernozhukov, V. (2013). Least squares after model selection in high-dimensional
sparse models. Bernoulli, 19(2).

Besley, T., & Ghatak, M. (2006). Public goods and economic development. In Understanding Poverty
(pp. 285-302). Oxford: Oxford University Press.

Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Computer
Vision and Pattern Recognition (CVPR) (pp. 886-893). San Diego, CA.

Department of Census and Statistics and World Bank. (2015). The spatial distribution of poverty in
Sri Lanka. Available at: http://www.statistics.gov.lk/poverty/SpatialDistributionOfPoverty2012_13.pdf

Donaldson, D., & Storeygard, A. (forthcoming). Big grids: Applications of remote sensing in
economics. Journal of Economic Perspectives.

Drukker, D. M., Prucha, I. R., & Raciborski, R. (2011). Maximum likelihood and generalized spatial
two-stage least-squares estimators for a spatial-autoregressive model with spatial-autoregressive
disturbances. University of Maryland, Department of Economics.

Elbers, C., Lanjouw, J. O., & Lanjouw, P. (2003). Micro-level estimation of poverty and inequality.
Econometrica, 71(1), 355-364.

Elbers, C., Lanjouw, P. F., & Leite, P. G. (2008). Brazil within Brazil: Testing the poverty map
methodology in Minas Gerais. World Bank Policy Research Working Paper Series.

Elvidge, C. D., Baugh, K. E., Kihn, E. A., Kroehl, H. W., & Davis, E. R. (1997). Mapping city lights
with nighttime data from the DMSP Operational Linescan System. Photogrammetric Engineering and
Remote Sensing, 63(6), 727-734.

Engstrom, R., Ashcroft, E., Jewell, H., & Rain, D. (2011, April). Using remotely sensed data to map
variability in health and wealth indicators in Accra, Ghana. In Joint Urban Remote Sensing Event
(JURSE), 2011 (pp. 145-148). IEEE.

Engstrom, R., Sandborn, A., Yu, Q., Burgdorfer, J., Stow, D., Weeks, J., & Graesser, J. (2015).
Mapping slums using spatial features in Accra, Ghana. In Joint Urban Remote Sensing Event (JURSE)
Proceedings, Lausanne, Switzerland. doi:10.1109/JURSE.2015.7120494

Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle
properties. Journal of the American Statistical Association, 96(456), 1348-1360.

Foster, J., Greer, J., & Thorbecke, E. (1984). A class of decomposable poverty measures.
Econometrica, 52(3), 761-766.

Gabor, D. (1946). Theory of communication. Journal of the Optical Society of America-A, 2(2),
1455-1471.

Gentle, J. E., Härdle, W. K., & Mori, Y. (Eds.). (2012). Handbook of computational statistics:
Concepts and methods. Springer Science & Business Media.

Glaeser, E. L., Kominers, S. D., Luca, M., & Naik, N. (2015). Big data and big cities: The promises
and limitations of improved measures of urban life (No. w21778). National Bureau of Economic
Research.

Graesser, J., Cheriyadat, A., Vatsavai, R. R., Chandola, V., Long, J., & Bright, E. (2012). Image
based characterization of formal and informal neighborhoods in an urban landscape. IEEE Journal of
Selected Topics in Applied Earth Observations and Remote Sensing, 5(4), 1164-1176.

Henderson, J. V., Storeygard, A., & Weil, D. N. (2012). Measuring economic growth from outer space.
The American Economic Review, 102(2), 994-1028.

Huettner, F., & Sunder, M. (2012). Axiomatic arguments for decomposing goodness of fit according to
Shapley and Owen values. Electronic Journal of Statistics, 6, 1239-1250.

Israeli, O. (2007). A Shapley-based decomposition of the R-square of a linear regression. The Journal
of Economic Inequality, 5(2), 199-212.

Jean, N., Burke, M., Xie, M., Davis, W. M., Lobell, D. B., & Ermon, S. (2016). Combining satellite
imagery and machine learning to predict poverty. Science, 353(6301), 790-794.

Kleinberg, J., Ludwig, J., Mullainathan, S., & Obermeyer, Z. (2015). Prediction policy problems. The
American Economic Review, 105(5), 491-495.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep
convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).

Marx, B., Stoker, T. M., & Suri, T. (2013). The political economy of ethnicity and property rights in
slums: Evidence from Kenya.

Michalopoulos, S. (2012). The origins of ethnolinguistic diversity. The American Economic Review,
102(4), 1508.

Mooney, D. F., Larson, J. A., Roberts, R. K., & English, B. C. (2009, January). Economics of the
variable rate technology investment decision for agricultural sprayers. In Southern Agricultural
Economics Association Annual Meeting, Atlanta, Georgia.

Mullahy, J. (1998). Much ado about two: Reconsidering retransformation and the two-part model in
health econometrics. Journal of Health Economics, 17(3), 247-281.

Mullainathan, S. (2014, August). Bugbears or legitimate threats? (Social) scientists’ criticisms of
machine learning. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining (p. 4). ACM.

Newhouse, D., Suarez Becerra, P., & Doan, D. (2016). Sri Lanka poverty and welfare: Recent progress
and remaining challenges. Washington, DC: World Bank.
https://www.openknowledge.worldbank.org/handle/10986/23794 License: CC BY 3.0 IGO.

Nunn, N., & Puga, D. (2012). Ruggedness: The blessing of bad geography in Africa. Review of Economics
and Statistics, 94(1), 20-36.

Pesaresi, M., Gerhardinger, A., & Kayitakire, F. (2008). A robust built-up area presence index by
anisotropic rotation-invariant textural measure. IEEE Journal of Selected Topics in Applied Earth
Observations and Remote Sensing, 1(3), 180-192.

Pinkovskiy, M., & Sala-i-Martin, X. (2016). Lights, camera… income! Illuminating the national
accounts-household surveys debate. The Quarterly Journal of Economics, 131(2), 579-631.

Sandborn, A., & Engstrom, R. (in press). Determining the relationship between census data and
spatial features derived from high resolution imagery in Accra, Ghana. IEEE Journal of Selected
Topics in Applied Earth Observations and Remote Sensing (JSTARS), Special Issue on Urban Remote
Sensing.

Serajuddin, U., Uematsu, H., Wieser, C., Yoshida, N., & Dabalen, A. (2015). Data deprivation: Another
deprivation to end (World Bank Policy Research Working Paper No. 7252).

Shorrocks, A. F. (2013). Decomposition procedures for distributional analysis: A unified framework
based on the Shapley value. Journal of Economic Inequality, 1-28.

Smith, S. W. (1997). The scientist and engineer’s guide to digital signal processing. San Diego, CA:
California Technical Publishing.

Tucker, C. J. (1979). Red and photographic infrared linear combinations for monitoring vegetation.
Remote Sensing of Environment, 8, 127-150.

Varian, H. R. (2014). Big data: New tricks for econometrics. The Journal of Economic Perspectives,
28(2), 3-27.

Wang, L., & He, D. (1990). Texture classification using texture spectrum. Pattern Recognition, 23(8),
905-910.

Watmough, G. R., Atkinson, P. M., Saikia, A., & Hutton, C. W. (2016). Understanding the evidence base
for poverty–environment relationships using remotely sensed satellite data: An example from Assam,
India. World Development, 78, 188-203.

Wong, T. H., Mansor, S. B., Mispan, M. R., Ahmad, N., & Sulaiman, W. N. A. (2003, May). Feature
extraction based on object oriented analysis. In Proceedings of ATC 2003 Conference (Vol. 2021).

Yu, W. P., Chu, G. W., & Chung, M. J. (1999). A robust line extraction method by unsupervised line
clustering. Pattern Recognition, 32(4), 529-546.