WPS8284

Policy Research Working Paper 8284

Poverty from Space

Using High-Resolution Satellite Imagery for Estimating Economic Well-Being

Ryan Engstrom
Jonathan Hersh
David Newhouse

Poverty and Equity Global Practice Group
December 2017

Abstract

Can features extracted from high spatial resolution satellite imagery accurately estimate poverty and economic well-being? This paper investigates this question by extracting object and texture features from satellite images of Sri Lanka, which are used to estimate poverty rates and average log consumption for 1,291 administrative units (Grama Niladhari divisions). The features that were extracted include the number and density of buildings, prevalence of shadows, number of cars, density and length of roads, type of agriculture, roof material, and a suite of texture and spectral features calculated using a nonoverlapping box approach. A simple linear regression model, using only these inputs as explanatory variables, explains nearly 60 percent of poverty headcount rates and average log consumption. In comparison, models built using night-time lights explain only 15 percent of the variation in poverty or income. The predictions remain accurate when restricting the sample to poorer Grama Niladhari divisions. Two sample applications, extrapolating predictions into adjacent areas and estimating local area poverty using an artificially reduced census, confirm the out-of-sample predictive capabilities.

This paper is a product of the Poverty and Equity Global Practice Group. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org. The authors may be contacted at dnewhouse@worldbank.org.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent. Produced by the Research Support Team Poverty from Space: Using High-Resolution Satellite Imagery for Estimating Economic Well-Being1 Ryan Engstrom2 Jonathan Hersh3 David Newhouse4 Keywords: poverty estimation, satellite imagery, machine learning JEL classification: I32, C50 1 This project benefited greatly from discussions with Sarah Antos, Ana Areias, Marianne Baxter, Sam Bazzi, Azer Bestavros, Jacob Bien, Kristen Butcher, John Byers, Pedro Conceição, Francisco Ferreira, Ray Fisman, Alex Guzey, Klaus-Peter Hellwig, Kristen Himelein, Selim Jahan, Matthew Kahn, Tariq Khokhar, Hannes Mueller, Trevor Monroe, Dilip Mookherjee, Pierre Perron, Hashem Pesaran, Bruno Sánchez-Andrade Nuño, Kiwako Sakamoto, Jacob Shapiro, David Shor, Benjamin Stewart, Andrew Whitby, Nat Wilcox, Nobuo Yoshida and seminar participants at Boston University, Chapman University, University of Southern California, Princeton University, UNDP, The World Bank, and the Department of Census and Statistics of Sri Lanka. All remaining errors in this paper remain the sole responsibility of the authors. Sarah Antos, Benjamin Stewart, and Andrew Copenhaver provided assistance with texture feature classification. 
Object imagery classification was assisted by James Crawford, Jeff Stein, and Nitin Panjwani at Orbital Insight, and Nick Hubing, Jacqlyn Ducharme, and Chris Lowe at Land Info, who also oversaw imagery pre-processing. Hafiz Zainudeen helped validate roof classifications in Colombo. Colleen Ditmars and her team at DigitalGlobe facilitated imagery acquisition, Dung Doan and Dilhanie Deepawansa developed and shared the census-based poverty estimates, and we thank Dr. Amare Satharasinghe for authorizing the use of the Sri Lankan census data. Liang Xu provided research assistance. Zubair Bhatti, Benu Bidani, Christina Malmberg-Calvo, Adarsh Desai, Nelly Obias, Dhusynanth Raju, Martin Rama, and Ana Revenga provided additional support and encouragement. The authors gratefully acknowledge financial support from the Strategic Research Program and World Bank Big Data for Innovation Challenge Grant, and the Hariri Institute at Boston University. The views expressed here do not necessarily reflect the views of the World Bank Group or its executive board, and should not be interpreted as such. 2 rengstro@gwu.edu Department of Geography, George Washington University, 1922 F Street, Washington DC 3 hersh@chapman.edu Argyros School of Business, Chapman University, 1 University Dr., Orange, CA 4 dnewhouse@worldbank.org, Poverty and Equity Global Practice, World Bank, 1818 H Street, Washington DC 1 Introduction Despite the best efforts of national statistics offices and the international development community, local area estimates of poverty and economic welfare remain rare. Between 2002 and 2011, as many as 57 countries conducted zero or only one survey capable of producing poverty statistics, and data are scarcest in the poorest countries (Serajuddin et al., 2015). But even in countries where data are collected regularly, household surveys are typically too small to produce reliable estimates below the district level. 
Generating welfare estimates for smaller areas requires both a household welfare survey and contemporaneous census data, and the latter are typically available once per decade at best. Furthermore, safety concerns may prohibit survey data collection in many conflict areas altogether. Satellite imagery has generated considerable enthusiasm as a potential supplement to household data that can help fill these severe data gaps. In recent years, private companies such as DigitalGlobe and Airbus have rapidly expanded the coverage and availability of high spatial resolution imagery (HSRI), driving down commercial prices. Planet (formerly Planetlabs) currently operates more satellites than any organization other than the U.S. and Russian governments, and just recently, successfully launched 88 dove satellites that will allow for coverage of the entire globe with imagery resolution of 3 to 5 m per pixel on a daily basis. Continued technological advances will increasingly allow social scientists to benefit from this type of imagery, which has been utilized intensively by the intelligence and military communities for decades. This paper investigates the ability of object and texture features derived from HSRI (High Spatial Resolution Imagery) to estimate and predict poverty rates at local levels. The area of our study covers 3,500 square kilometers in Sri Lanka, which contain 1,291 villages (Grama Niladhari (GN) divisions). For each village, we extract both object and texture features to use as explanatory variables in poverty prediction models. Object features extracted include the number of cars, number and size of buildings, type of farmland (plantation or paddy), the type of roofs, the share of shadow pixels (building height proxy), road extent and road material, along with textural measures. These features are identified using a combination of deep learning-based Convolutional Neural Networks (CNN) and classification of spectral and textural characteristics. 
These satellite-derived features were then matched to household estimates of per capita consumption imputed into the 2011 Census for the 1,291 GN divisions. We investigate five main questions: 1) To what extent can variation in GN economic well-being (poverty rates defined at the 10th and 40th percentiles of national income, and average GN consumption) be explained by high spatial-resolution features? 2) Which features are most strongly correlated with these measures of well-being? 3) Do these features predict equally well in poor and rich GNs? In urban and rural GNs? 4) Can these models predict into geographically adjacent areas? and 5) Are predictions robust to the use of a smaller sample of training data?

We find that: i) Satellite features are highly predictive of economic well-being and explain about 60 percent of the variation in both village average consumption and estimated poverty headcount rates; ii) Built-up area and roof type strongly correlate with welfare. Car counts and building height are strong correlates in urban areas, while the share of paved roads and agricultural type are strong correlates in rural areas; iii) Accuracy declines only slightly in the poorest decile of villages (average consumption of $4.67 per day). Models are less accurate in urban areas than rural ones; iv) Predicting into adjacent areas produces less accurate poverty measures, but the agreement between the rankings of true and predicted rates is moderately high; and v) Using a 1 percent sample of the census-based ground truth, designed to mimic the sampling strategy of the Household Income and Expenditure Survey, has little impact on the accuracy of the prediction.

This paper contributes to a growing literature exploring how remotely sensed data may be used to assess welfare. Traditionally, the most popular remotely sensed measure for economic applications has been night-time lights (NTL), which measures the intensity of light captured passively by satellite.
Strong correlations between NTL and GDP appear at the country level (Henderson et al., 2009, Pinkovskiy and Sala-i-Martin, 2016), although within a country NTL appears more strongly correlated with density than welfare. The relationship between lights and wages or other measures of income appears weak (Mellander et al., 2013), casting doubt on its reliability as a proxy for small area estimates of welfare. Additionally, NTL is ill-suited for identifying variation in welfare within small areas because of its low spatial resolution. Even the most advanced NTL satellite sensor, the Visible Infrared Imaging Radiometer Suite (VIIRS), has a spatial resolution at nadir of approximately 1.0 km2 (see footnote 5). Indeed, we find that NTL captures only 15 percent of the variation in poverty or income in the same area where high resolution spatial features capture 60 percent of the variation.

Daytime imagery has recently emerged as a practical source of information on welfare, in large part due to new developments in computer vision algorithms. Advances in Deep Learning such as Convolutional Neural Networks (CNN) have the capability to algorithmically classify objects such as cars, building area, roads, crops and roof type (Krizhevsky, Sutskever, and Hinton, 2012). These objects may be more strongly correlated with local income and wealth than NTL. Furthermore, textural and spectral algorithms provide a simpler alternative to analyzing HSRI that does not rely on object classification (Graesser et al. 2012, Engstrom et al. 2015, Sandborn and Engstrom 2016). In this approach, the spatial and spectral variations in imagery are calculated over a neighborhood of pixels to characterize the local scale spatial pattern of the objects observed in the imagery. These measures, which we refer to as “texture” or “spectral” measures, capture information about an area that may not be clear from object recognition alone.
This paper also contributes to a literature exploring how supervised learning techniques from machine learning may be applied to unstructured data to reveal information about human welfare (Athey, 2017). Glaeser, Kominers, Luca, and Naik (2015) apply texture-based machine vision classification to images that are captured from Google Street View, trained using subjective ratings of the images on the basis of the perceived safety. They estimate a support vector machine model and show the fitted model can reliably predict block level income in New York City. Jean et al. (2016) employ an innovative transfer learning approach, in which a set of 4,096 unstructured features are extracted from the penultimate layer of a convolutional neural network that uses Google Earth daytime imagery to predict the luminosity of NTL. These 4,096 features are then used to predict the average per capita consumption of enumeration areas (villages), taken from living standard measurement surveys using ridge regression to prevent overfitting. The resulting model predicts well and explains an average of 46 percent of the variation in village per capita consumption, out of sample, across the four countries in which it was trained.

5 Pixel size can vary depending on the angle of the satellite relative to the ground site.

While this innovative use of daytime imagery substantially improves on the use of night time lights alone, there are two issues with its applicability to poverty measurement. First, it is not clear that the transfer learning method generalizes well to other contexts where population density is low. Extensions of this approach in Haiti and Nepal (Head et al., 2017) show declines in predictive power, suggesting the NTL step in the transfer learning process may be ill suited for poor, low-density areas. Second, the transfer learning method is not necessarily optimal for predicting very poor areas.
When the top two quintiles are excluded from their sample, restricting the sample to those below twice the international poverty line, the R-squared falls precipitously, to about 0.12. This illustrates the challenges this method faces in distinguishing welfare among the poorest of the poor, who in the African context most likely live in relative dark.6

This study utilizes imagery features that are based either on recognizable objects or “texture” algorithms developed for computer vision applications, derived from High Spatial Resolution Imagery (HSRI). This method offers several advantages for the estimation of poverty rates. First, it eliminates reliance on NTL, which is a coarse measure of welfare, to identify relevant features for model development. Second, it provides a more transparent understanding of the underlying factors that explain geographic variation in welfare in different contexts. Third, features developed from HSRI, such as roads and the extent of built-up area, are useful for policy analysis in other areas, such as transport and urban planning. Finally, a feature-based approach can easily be extended to alternative welfare indicators, such as headcount poverty rates measured at different thresholds.

The paper proceeds as follows: Section 2 summarizes how the data were created and presents brief summary statistics. Section 3 presents the statistical methodology. Section 4 examines the predictive power of high resolution satellite features (HRSF) to estimate poverty in small areas at the village level. Section 5 examines out-of-sample performance using two applications in estimating local area economic well-being. Section 6 concludes.

2 Data Description

Our analysis is restricted to a sample area of approximately 3,500 km2 in Sri Lanka.
National coverage was not feasible due to the high cost and partial availability of high-resolution imagery; however, these data are rapidly becoming more available and less expensive as companies such as Planet and DigitalGlobe expand their archives and launch newer, more precise satellites with more frequent revisit rates. We sampled DS divisions conditional on HSRI being available, drawing areas from urban, rural, and estate sectors.7 According to the 2012 census, population by sector in Sri Lanka is rural (77.4%), urban (18.2%) and estate (4.4%) (Sri Lanka Department of Census and Statistics, 2012). Population by sector in our sample is rural (45.9%), urban (46.2%) and estate (7.8%).

6 It is not straightforward to generate estimates of poverty headcount rates from predictions of mean consumption at the village level, since poverty rates depend on the dispersion of welfare within each village as well as its mean.

7 Sri Lanka classifies sectors as urban, rural, or estate. The estate sector refers to plantation areas of more than 20 acres with 10 or more residential laborers. Except for sample stratification, the estate sector is grouped with the rural sector.

2.1 Details on Satellite Imagery

The satellite imagery consists of 55 unique “scenes” purchased from DigitalGlobe, covering areas specified in our sample area. Each “scene” is an individual image captured by a particular sensor at a particular time. Images were acquired by three different sensors: Worldview 2, GeoEye 1, and Quickbird 2. These sensors have a spatial resolution of 0.46m2, 0.41m2, and 0.61m2, respectively, in the panchromatic band and 1.84m2, 1.65m2, and 2.4m2, respectively, in the multi-spectral bands. Pre-processing of imagery included pan-sharpening, ortho-rectification, and image mosaicking. Most imagery was captured in either 2011 or 2012, although some imagery from 2010 was also used.
Figure 1: Coverage Area of High Resolution Satellite Imagery
Notes: Sample area shown highlighted in white.

2.2 Details on Poverty Data

Ideally, village poverty and consumption statistics would be generated directly from the 2012/13 Household Income and Expenditure Survey (HIES), a detailed survey that measures the consumption patterns of 25,000 households on approximately 400 consumption items. The survey contains an average of 8.4 households per GN division in the 47 sampled DS divisions, making the HIES insufficient to generate consistent poverty estimates at the GN division level without supplementary data. We therefore draw on the most common method to impute welfare estimates (Elbers, Lanjouw, and Lanjouw, 2003) into the 2011 Census of Population and Housing, which is identical to the method used to generate official poverty estimates at the DS division level (Department of Census and Statistics and World Bank, 2015). For each household in the census, per capita consumption was estimated based on models developed from the HIES, using household indicators that are common to both the Census and the HIES.8 We derive GN headcount poverty rates using the standard Foster-Greer-Thorbecke method (Foster et al., 1984), for two poverty lines: poverty line 1 at the 10th percentile of the national per capita consumption distribution, and poverty line 2 at the 40th percentile. These lines are equivalent to $3.00 and $5.13 per day, respectively, in 2011 PPP terms, which compares to an extreme poverty line in 2011 prices of $1.90 per day.

Imputing welfare into the census requires an assumption of spatial homogeneity within small areas. This assumption “may severely underestimate the variance of the error in predicting welfare estimates at the local level in the likely presence of small-area heterogeneity in the conditional distribution of expenditure or income” (Tarozzi and Deaton, 2009).
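The headcount calculation described in Section 2.2 can be illustrated with a short sketch. The function below computes the Foster-Greer-Thorbecke index for a vector of per capita consumption values; with alpha = 0 it reduces to the headcount rate. The function name `fgt_index` and the five consumption values are hypothetical and chosen only for illustration; the $3.00 line is the lower poverty line reported in the text. In the paper the inputs are consumption values imputed into the census over repeated simulations, which this sketch does not attempt to reproduce.

```python
import numpy as np


def fgt_index(consumption, poverty_line, alpha=0):
    """Foster-Greer-Thorbecke index; alpha=0 gives the headcount rate."""
    c = np.asarray(consumption, dtype=float)
    poor = c < poverty_line
    # Normalized shortfall below the line, zero for the non-poor.
    gap = np.clip((poverty_line - c) / poverty_line, 0.0, None)
    return float(np.mean(np.where(poor, gap ** alpha, 0.0)))


# Hypothetical per capita consumption (dollars per day) for one GN division.
gn_consumption = [2.0, 3.5, 4.0, 6.0, 10.0]
headcount = fgt_index(gn_consumption, poverty_line=3.00, alpha=0)
poverty_gap = fgt_index(gn_consumption, poverty_line=3.00, alpha=1)
print(headcount, poverty_gap)
```

Only one of the five hypothetical households falls below the line, so the headcount is one fifth; the alpha = 1 variant additionally weights that household by its proportional shortfall.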
To test the extent of spatial heterogeneity in practice, small area estimates of poverty have been compared to census-based measures in Mexico and Brazil, which each collect income information in their census. Considerable spatial heterogeneity is present in Mexico.9 In contrast, Elbers et al (2009) find significantly less in Minas Gerais, Brazil. The effect of spatial heterogeneity on the results presented below is unclear. We are not aware of any empirical estimate of the extent to which the spatial heterogeneity assumption leads to biased poverty headcount estimates at the local level. To the extent any additional noise in the poverty estimates due to uncaptured heterogeneity in the coefficients is independent across neighboring households within a GN, this noise would be significantly reduced after averaging over a large number of households. 2.3 Comparison of GN Poverty Rates and Mean NTL Reflectance A simple visual comparison between mean NTL and village poverty rates illustrates why NTL provides limited information on sub-national welfare. Figure 2 presents a panel of three images for the Divisional Secretariat of Seethawaka: mean raw NTL (left), poverty rates derived from the 10% national income threshold (middle), and log of mean population density (right). Comparing the left and middle panels, there is only a small association between villages that have low NTL reflectance and those that are high in poverty. Problems of overglow (Henderson et al., 2012) mean that poor villages adjacent to wealthy ones will be misclassified as non-poor.10 While NTL tracks the general contours of poverty for the DS – lower poverty areas in the Northwest and higher poverty areas in the Southeast – this coarse association is only of limited use for public policy applications such as poverty targeting or budget allocations. NTL appears to give a more accurate approximation of the population density of the underlying GN divisions, which is consistent with Mellander et al. 
(2013). Comparing the right and left panels shows a strong association between high NTL areas and areas with a high population density. We take this to suggest that the information content contained within NTL related to human welfare is limited. While lights at night may indicate gross associations, it is a highly imperfect measure of welfare. We therefore investigate whether the much richer set of information contained in HSRI daytime imagery translates into more accurate welfare predictions.

Figure 2: Comparison of Mean Night Time Lights (NTL), Poverty Rate, and Mean Population Density, Seethawaka, Sri Lanka

8 One hundred simulations of consumption were estimated in the 2011 census using the PovMap 2.0 software.

9 Simulations indicate that in 10 percent of municipalities, the coverage rate of the estimated poverty rate is less than 50 percent. In other words, in these 10 percent of municipalities, confidence intervals from simulations that estimate headcount rates exclude the true poverty rate in more than half the simulations.

10 Abrahams (2015) describes a method to correct for overglow, but this method has yet to be widely adopted.

2.4. Feature Extraction from High-Resolution Satellites

The derived high-resolution spatial features fall into seven broad categories: (1) Agricultural Land; (2) Cars; (3) Building Density and Vegetation; (4) Shadows (building height proxy); (5) Road and Transportation; (6) Roof Type; and (7) Textural and Spectral characteristics. In addition to the satellite features, we use two geographic attributes of the GN division: whether it is administratively classified as an urban area, and its area in square kilometers. Table 1 presents summary statistics for these variables. Deep learning-based object classification was used for classifying the share of the GN division that is built-up (i.e.
consists of buildings), the number of cars in the GN, the share of pixels in the GN that were identified as shadow pixels (a proxy for building heights), and crop type. The classification method used is similar to Krizhevsky, Sutskever, and Hinton (2012), which utilizes convolutional neural networks (CNN) to build object predictions from raw imagery. Roof type, paved and unpaved roads of different widths, and railroads were classified using a combination of Trimble eCognition and Erdas Imagine software, utilizing a combination of support vector machines and visual identification. Classifier accuracy is greater than 90 percent for all of the objects recognized. Details on the extraction and classification process are provided in the online appendix, which includes an example ROC curve for buildings.

2.4.1 Object Classification Details

The agricultural land variables consist of the fraction of GN agriculture identified as paddy (rice cultivation) or plantation (cash crops such as tea). These sum to 100 percent for GNs with agricultural land, so the excluded category in subsequent regressions is GN divisions with no agricultural land. We also calculated the fraction of total GN area that is either paddy, plantation, or any agriculture. Figure 3 shows an example of a developed area building classification, with raw image shown at the top and CNN classification accuracy shown below. On the bottom panel, true positives are highlighted green, with false positives highlighted red. Figure 4 shows a sample car classification. Cars that are positively identified are shown circled in blue. False negatives are most prevalent where there is considerable tree masking of pixels.

Figure 3: Example Developed Area (Buildings) Classification
Notes: The image shows the raw (left) and classified (right) output of the developed area building classifier applied to raw satellite imagery. Areas shown in green are true positive building classifications.
Areas in red are false positives, that is, areas erroneously classified as buildings.

Figure 4: Example Car Classification
Notes: Cars identified by the convolutional neural network shown in blue.

Table 1: Grama Niladhari Summary Statistics

                                                          Mean        Sd       Min       Max
Economic Well-Being
  Avg Consumption in Rs                                10274.2    3052.7    4881.9     21077
  Avg Log Consumption                                     9.19      0.28      8.49      9.96
  Rel. Pov. Rate at 10% Nat. Cons.                      0.0903     0.066    0.0023      0.39
  Rel. Pov. Rate at 40% Nat. Cons.                       0.332      0.16     0.035       0.8
Geographic Descriptors
  log Area (square meters)                               14.73      1.01      12.1        18
  = 1 if urban                                           0.304      0.46         0         1
  Province: [1] Western                                  0.587      0.49         0         1
  Province: [3] Southern                                 0.255      0.44         0         1
  Province: [6] North-Western                           0.0643      0.25         0         1
  Province: [7] North-Central                           0.0155      0.12         0         1
  Province: [8] Uva                                     0.0782      0.27         0         1
Agricultural Land
  % of GN area that is agriculture                        16.8      0.15         0        94
  % of GN agriculture that is paddy                       44.4      37.5         0       100
  % of GN agriculture that is plantation                 46.38      37.8         0       100
  % of total GN area that is paddy                       8.629      10.9         0      74.7
  % of total GN area that is plantation                  8.168        11         0      94.1
Cars
  log number of cars                                     3.123      1.44         0       8.3
  Total cars divided by total road length              0.00556      0.01         0      0.17
  Total cars divided by total GN area                        0   0.00007         0   0.00093
Building Density and Vegetation
  % of area with buildings                               7.817      6.82      0.13      33.9
  % shadows (building height) covering valid area        6.509      6.01      0.31      34.9
  Vegetation Index (NDVI), mean, scale 64                0.427      0.21         0      0.86
  Vegetation Index (NDVI), mean, scale 8                 0.566      0.24         0      0.99
Shadows
  ln shadow pixels (building height)                     12.96      1.04      7.31      17.6
  ln Number of Buildings                                  6.90      0.92         0       9.3
Road variables
  log of sum of length of roads                          9.445      0.94      1.47      13.1
  Fraction of roads paved                                 38.3      28.7         0       100
  ln length airport roads                                0.013      0.33         0      9.25
  ln length railroads                                    1.098      2.67         0      10.8
Roof type
  Fraction of total roofs that are clay                   36.5        22         0       100
  Fraction of total roofs that are aluminum              14.08      7.06         0      71.9
  Fraction of total roofs that are asbestos              7.766      11.3         0      71.2
Textural and spectral characteristics
  Pantex (human settlements), mean                       0.627      0.54      0.02      2.94
  Histogram of Oriented Gradients (scale 64m), mean     3509.4    2070.3     129.1     10381
  Local Binary Pattern Moments (scale 32m), mean          49.5       1.1      18.1      49.5
  Line support regions (scale 8m), mean                0.00836     0.004    -2E-07     0.035
  Gabor filter (scale 64m), mean                         0.469      0.28     0.014       1.3
  Fourier transform, mean                                84.34      17.8      4.51     113.4
  SURF (scale 16m), mean                                 12.06      7.77      0.13      31.6

Observations                                              1291

Three car-related variables were calculated: the log total number of cars in a GN, total cars divided by total road length, and cars per square kilometer of the GN. The average GN division in our sample contains 50 cars. However, there is wide dispersion, as the 99th percentile of the car count distribution is equal to 577 cars and the maximum value is 4,000 cars. On the left side of the distribution, 136 of 1,291 GNs contain no cars. Because the distribution is skewed, we take the log of the car count, while imposing a smooth function for GNs with zero or few cars.11

Building density variables include the fraction of an area covered by built-up area and the number of roofs identified. Built-up area captures any human settlements (buildings, homes, etc.), regardless of use or condition. These are grouped with two measures of the Normalized Difference Vegetation Index (NDVI). Although technically a spectral characteristic, the presence of vegetation in urban areas indicates development such as parks, trees, or lawns (i.e., area that is not built up) within the urban environment.
In the rural environment it also indicates undeveloped areas, and the values can aid in describing variations in agricultural type and productivity, depending on the timing of the image acquisition.

The fourth category consists of two indicators that capture shadows of buildings: the log of the number of pixels classified as shadow and the fraction of shadows in a GN. The shadow variables use the angle of the sun as it shines on a building, and the shadows it casts, to estimate the presence of shadows.12 The road variables we calculate are the log of total road length, the fraction of roads that are paved, the length of airport runway, and the length of railroad identified. For roof type, we calculate the fraction of roofs in a village that are clay, aluminum, or asbestos, with the omitted category being roofs that are identified as none of the above, the vast majority being gray cement roofs. Roof type can be identified through remote sensing by using hyperspectral imaging, or using reflectance from several contiguous spectral bands. Different roof materials exhibit different spectral properties, particularly in the sub-visible bands of the spectrum. The roofs in our sample are clay (36.5%), aluminum (14.08%), asbestos (7.8%), or gray concrete (41.6%).

2.4.2 Details of Textural and Spectral Features

We calculate six separate types of spectral and textural features: Fourier transform, Gabor filter, Histogram of Oriented Gradients (HoG), Line support regions (LSR), Pantex, and Speeded-Up Robust Features (SURF). These are often used in machine vision problems to decompose an image. They are intended to capture aspects of a neighborhood that are not so easily identified directly, including the presence of characteristics associated with slums such as many irregular building lines or high density. These features may be considered outputs from a dimension reduction technique, in that they are reduced dimensionality descriptions of complex 2-D satellite imagery.
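The box-based aggregation behind scale-specific features such as "Vegetation Index (NDVI), mean, scale 64" in Table 1 can be sketched as follows. This is an assumption-laden illustration, not the authors' pipeline, which is documented in their online appendix: the function names `ndvi` and `box_means`, the toy band arrays, and the box size in pixels are all hypothetical. The only fixed ingredient is the standard NDVI definition, (NIR - Red) / (NIR + Red), computed per pixel and then averaged over non-overlapping boxes.

```python
import numpy as np


def ndvi(red, nir):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); zero where both bands are zero."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    out = np.zeros_like(denom)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def box_means(values, box):
    """Mean of `values` within non-overlapping box x box pixel windows.

    Rows and columns that do not fill a complete box are dropped.
    """
    h, w = values.shape
    h_trim, w_trim = h - h % box, w - w % box
    blocks = values[:h_trim, :w_trim].reshape(h_trim // box, box, w_trim // box, box)
    return blocks.mean(axis=(1, 3))


# Toy 4x4 scene: vegetated left half (high NIR), bare right half.
red = np.full((4, 4), 0.1)
nir = red.copy()
nir[:, :2] = 0.3

boxes = box_means(ndvi(red, nir), box=2)   # one value per 2x2 box
gn_mean_ndvi = boxes.mean()                # area-level mean of the box values
print(boxes, gn_mean_ndvi)
```

The same two-step pattern (compute a statistic per box, then summarize the boxes that fall inside an administrative unit) would apply to other texture measures, with the per-box statistic swapped out.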
Because these measures may be novel to readers without backgrounds in remote sensing, further description may be helpful. We consider Pantex here to be a measure of human settlements. It is a spatial similarity index, where each cell is compared to adjacent cells in all directions. Forests will have a low Pantex level, since cells in all directions have similar contrast, as will cells with straight roads. Cities dense with many buildings will have high Pantex values. HoG captures “local intensity gradients or edge directions” (Dalal and Triggs, 2005) and in the context here captures intensity of lines of development or agriculture. Local binary patterns (LBPM) captures local spatial patterns and gray scale contrast. SURF detects local features used for characterizing grid patterns, and measures orderliness of building development, the opposite of which is typically referred to as a slum. Areas with right angles, corners, or areas with regular grid patterns, will have larger SURF values relative to areas with chaotic or irregular spacing. For more detail on imagery and the feature extraction process, we refer the reader to the online appendix.

11 The log car variable is calculated as the log of the sum of the car count and the square root of the car count plus one.

12 Valid area refers to areas at the foot of buildings where shadows may appear.

3 Statistical Methodology

We estimate linear regressions in which the dependent variable is estimated poverty or log welfare (per capita household consumption) at the level of the GN divisions, derived from the census. Since these are linear models, the fitted values are not constrained to lie between zero and one, but this is a minor issue in the sample.13 The error term is assumed to be clustered at the level of the DS division, and the standard errors are robust to heteroscedasticity. Given the list of available covariates, variable choice is not obvious.
Estimating a model with the full set of candidate variables in table 1 would likely produce predictions that are overfit, in the sense that they perform much better in-sample than out-of-sample (Athey and Imbens, 2015). One attractive method for variable selection among a large set of covariates is Lasso regularization. Lasso is a regularized regression that estimates a regression model with an added constraint that enforces parsimony (Tibshirani, 1996). The motivation for the shrinkage estimator is that by shrinking the parameters of the model, one accepts some bias in exchange for lower variance. Our baseline model is a “Post-Lasso” estimator (Belloni and Chernozhukov, 2013). This two-step estimator first estimates a Lasso model over the full set of candidate variables, followed by an OLS model over the set of variables with non-zero coefficients from the Lasso step. The model we estimate in the Lasso step is defined as

$$\hat{\beta}_{Lasso} = \arg\min_{\beta} \sum_{i=1}^{N} \left( y_i - x_i'\beta \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \qquad (1)$$

where the poverty rate in GN $i$ is given by $y_i$, $x_i$ is the vector of its $p$ candidate variables, and $\lambda \geq 0$ is a parameter that penalizes the absolute values of the coefficients. At the extreme, full relaxation of the penalization factor, that is, setting $\lambda$ to zero, yields unconstrained OLS estimates. Thus as $\lambda \to 0$, $\hat{\beta}_{Lasso} \to \hat{\beta}_{OLS}$. As $\lambda \to \infty$, the penalty increases and $\hat{\beta}_{Lasso}$ converges to the zero vector. Lasso regressions are useful as a variable selection methodology because the sharp $\ell_1$ penalty shrinks coefficients exactly to zero if the corresponding variables do not help decrease the sum of squared errors. This creates a type of variable selection. However, the Lasso simultaneously “shrinks” the magnitude of coefficients towards zero, even for those that remain non-zero (Varian, 2014).

13 In 6 of the 1,291 GN divisions in the sample, the predicted 10 percent poverty rate is negative, with a minimum prediction of -1 percent. The predicted 40 percent poverty rate is positive in all GN divisions. As a robustness check, we estimated binomial regressions and obtained similar results, which are available upon request.
Thus, by subsequently estimating an OLS model in the second stage, we ensure the coefficient estimates are unbiased. To choose the appropriate value of $\lambda$, we apply 10-fold cross validation and choose the value of $\lambda$ that minimizes root-mean-squared error (RMSE) across folds. GLM versions of the model, which ensure that predicted values lie between zero and one, do not change the results qualitatively and are available by request. Inferential standard errors are typically absent from Lasso models. Because of the Oracle property of the Lasso estimator (Fan and Li, 2001), we use the standard errors from the OLS model in the second stage as our measures of population inference. The Oracle property ensures that inference in the second stage using the reduced set of variables selected in the first stage is consistent with inference were we to use a single-stage estimation strategy using only the variables present in the true data-generating process (Belloni and Chernozhukov, 2013).

4 Results

Table 2 presents the estimates from the main specification for the full sample. The first two columns show the model where GN poverty is defined at the lower poverty rate, the next two present the higher poverty rate model, and the last two present the model with average GN consumption as the dependent variable. Many extracted satellite features have high explanatory power, including agriculture type, length of roads and fraction of roads paved, number and density of buildings, NDVI, roof type, shadows (building height proxy), and two spatial features, LBPM and the Fourier transform. The models explain a large share of the variation in poverty, as summarized by in-sample R-squared values between 0.608 and 0.618. Out-of-sample R-squared, estimated using ten-fold cross-validation, varies between 0.588 and 0.605. We conclude from the results that the models are not likely to be overfit to the data.
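The two-step Post-Lasso procedure described in Section 3 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the estimation code used in the paper: the data are synthetic, the penalty `lam` is fixed by hand instead of being chosen by 10-fold cross validation, the objective is rescaled by 1/(2n) relative to equation (1), and the helper names (`lasso_cd`, `ols`, `post_lasso`) are hypothetical.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n) * sum (y_i - x_i'b)^2 + lam * sum |b_j|.

    This matches equation (1) up to a rescaling of the penalty. No intercept
    is fit, so the toy data below are generated with mean zero.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def ols(X, y):
    """OLS via the normal equations, solved by Gauss-Jordan elimination."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[r][p] / A[r][r] for r in range(p)]


def post_lasso(X, y, lam):
    """Step 1: Lasso selects variables. Step 2: OLS on the selected ones."""
    beta_lasso = lasso_cd(X, y, lam)
    selected = [j for j, b in enumerate(beta_lasso) if b != 0.0]
    X_sel = [[row[j] for j in selected] for row in X]
    beta_ols = ols(X_sel, y) if selected else []
    return selected, beta_lasso, beta_ols


# Toy data (hypothetical): only the first two of four features matter.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [2.0 * r[0] - 1.0 * r[1] + rng.gauss(0, 0.1) for r in X]

selected, beta_lasso, beta_ols = post_lasso(X, y, lam=0.3)
print("selected:", selected)
print("lasso:", [round(b, 3) for b in beta_lasso])
print("post-lasso OLS:", [round(b, 3) for b in beta_ols])
```

In this toy setting the Lasso coefficients on the retained features are shrunk towards zero, while the second-stage OLS refit recovers values close to the ones used to generate the data, which is the motivation for the second step.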
The results suggest that a simple linear model that includes only the geographic size of the GN division, whether it is urban, and remotely sensed information explains 61 percent of the variation across GNs in headcount poverty rates. Figure 6 plots predicted against true average GN consumption, with colors assigned by the province in which the GN is located. A LOWESS smoothing line is shown with its associated confidence interval. A perfect model would have predictions exactly on the 45-degree line. While there is noise, the predictions tend to straddle the 45-degree line, indicating a high degree of agreement between the predicted and true welfare values. However, the model has a tendency to under-predict for wealthier GNs.

4.1 Marginal Effects of Satellite Features

While the primary objective of this exercise is to obtain accurate predictions, the model coefficients also shed light on the nature and magnitude of the conditional correlations between imagery features and poverty. The coefficients may be difficult to interpret for two reasons. First, the independent variables are often measured in different units. Second, in some cases multiple independent variables are based on the same underlying feature. In these cases, it is meaningless to evaluate the conditional correlation of one variable while holding the others constant.

Table 2: Prediction of Local Area Poverty Rates Using High-Res Spatial Features

Columns: (10%) Lower Poverty Rate (10% Nat. Inc.); (40%) Higher Poverty Rate (40% Nat. Inc.); (Cons.) Average Log Per Capita Consumption.

| Variable | 10% coef | 10% t | 40% coef | 40% t | Cons. coef | Cons. t |
|---|---|---|---|---|---|---|
| log Area (square meters) | 0.020* | [2.52] | 0.0093 | [0.60] | -0.0079 | [-0.31] |
| = 1 if urban | -0.023 | [-1.80] | -0.037 | [-1.06] | 0.08 | [1.18] |
| % of GN area that is agriculture | -0.00025 | [-1.04] | -0.00017 | [-0.27] | | |
| % of GN agriculture that is paddy | -0.00033** | [-2.97] | -0.00087** | [-2.97] | 0.0014** | [2.92] |
| % of GN agriculture that is plantation | -0.00021** | [-2.84] | -0.00059* | [-2.66] | 0.0012** | [2.72] |
| % of Total GN area that is paddy | -0.00019 | [-0.58] | -0.00083 | [-1.10] | 0.0016* | [2.10] |
| Total cars divided by total road length | -0.31 | [-1.17] | | | | |
| Total cars divided by total GN Area | 29.6 | [0.54] | | | | |
| log number of cars | -0.0059 | [-0.89] | -0.015 | [-1.39] | 0.024 | [1.60] |
| log sum of length of roads | -0.020*** | [-3.64] | -0.027* | [-2.32] | 0.033 | [1.67] |
| fraction of roads paved | -0.00035*** | [-4.24] | -0.00079** | [-3.24] | 0.0014** | [3.06] |
| ln length airport roads | -0.0051 | [-1.45] | | | 0.022 | [1.52] |
| ln length railroads | 0.00098 | [1.31] | | | -0.0046 | [-1.26] |
| % of area with buildings | -0.0027* | [-2.31] | -0.0093* | [-2.34] | 0.020* | [2.56] |
| log of Total count of buildings in GN | -0.0090** | [-2.71] | -0.019* | [-2.05] | 0.029 | [1.70] |
| Vegetation Index (NDVI), mean, scale 64 | 0.061* | [2.20] | 0.14** | [2.94] | -0.21** | [-2.93] |
| Vegetation Index (NDVI), mean, scale 8 | -0.064** | [-2.80] | | | | |
| % shadows (building height) | 0.0022* | [2.04] | 0.0064* | [2.18] | -0.013* | [-2.27] |
| ln shadow pixels (building height) | 0.016* | [2.51] | 0.039* | [2.64] | -0.047 | [-1.95] |
| Fraction of total roofs that are clay | 0.00077** | [3.35] | 0.0017** | [3.25] | -0.0027** | [-3.15] |
| Fraction of total roofs that are aluminum | 0.00091*** | [3.63] | 0.0022** | [3.15] | -0.0040** | [-3.15] |
| Fraction of total roofs that are asbestos | -0.00033 | [-1.08] | | | | |
| Local Binary Pattern Moments (scale 32m) | 0.0021** | [2.91] | 0.0090*** | [5.53] | -0.017*** | [-5.92] |
| Line support regions (scale 8m), mean | -0.66 | [-0.87] | | | | |
| Gabor filter (scale 64m), mean | -0.052 | [-1.53] | | | | |
| Fourier transform, mean | 0.0017** | [3.42] | | | | |
| SURF (scale 16m), mean | -0.0014 | [-0.94] | -0.001 | [-0.59] | 0.0034 | [1.06] |
| Constant | -0.32** | [-3.03] | -0.31 | [-1.43] | 10.1*** | [29.9] |
| Observations | 1291 | | 1291 | | 1291 | |
| R-sq | 0.610 | | 0.618 | | 0.608 | |
| R-sq Adj. | 0.602 | | 0.613 | | 0.602 | |
| Out-of-Sample R-sq | 0.588 | | 0.605 | | 0.594 | |
| Mean Absolute Error | 0.032 | | 0.078 | | 0.139 | |

Notes: Unit of observation is Grama Niladhari (GN) division. Variables were selected using Lasso regularization from the candidate set of variables shown in table 1. * p<0.05, ** p<0.01, *** p<0.001

To understand the magnitudes of the coefficients, we group independent variables and consider the marginal effect of a one standard deviation increase of the underlying satellite feature.14 Table 3 presents these marginal effects. For some features, the reported marginal effects reflect a combination of multiple underlying indicators, while for others they reflect single variables, as indicated in the rightmost column. The size of the GN, in square kilometers, is more strongly correlated with the lower poverty headcount than with the higher headcount or average consumption. This suggests that households in the bottom decile are disproportionately found in larger GN divisions. The presence of agricultural land is weakly and negatively associated with poverty, controlling for other characteristics of the GN, although the result is not statistically significant. Of the indicators related to the distribution of paddy vs. plantation land, the Lasso procedure selected three of the indicators for the 10 and 40 percent poverty incidence models, and two for the log consumption model.15 The results indicate a discernible but fairly weak negative relationship between the presence of paddy agricultural land and poverty, which is consistent with the socioeconomically disadvantaged nature of the tea plantation sector in Sri Lanka.
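The grouped marginal effects described above can be sketched as a sum of coefficient-times-standard-deviation terms over the variables in a group. This is a minimal illustration under stated assumptions: the coefficients are taken from the lower poverty rate column of Table 2, but the standard deviations are hypothetical placeholders (the summary statistics are not reproduced in this section), and the function name `combined_marginal_effect` is ours. The result is therefore not meant to reproduce the figures in Table 3.

```python
def combined_marginal_effect(coefs, sds, directions=None):
    """Sum of coef_j * direction_j * sd_j over the variables in a group.

    direction_j is +1 for a one-SD increase and -1 for a one-SD decrease
    (as used for the plantation share, see footnote 14).
    """
    if directions is None:
        directions = {name: 1 for name in coefs}
    return sum(coefs[name] * directions[name] * sds[name] for name in coefs)


# Coefficients from the lower poverty rate column of Table 2.
building_coefs = {
    "pct_area_buildings": -0.0027,
    "log_building_count": -0.0090,
}
# Hypothetical standard deviations, chosen only for illustration.
building_sds = {
    "pct_area_buildings": 4.0,
    "log_building_count": 1.0,
}

effect = combined_marginal_effect(building_coefs, building_sds)
print(f"{100 * effect:.2f} percentage points")
```

Passing a `directions` mapping with -1 for a variable reverses its contribution, which is how a one standard deviation decrease in one indicator can be combined with increases in the others.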
Table 3: Marginal Effects of One Standard Deviation Change

| Feature | Lower Poverty Rate (10% Nat. Inc.) | Higher Poverty Rate (40% Nat. Inc.) | Average Log Per Capita Consumption | Variables |
|---|---|---|---|---|
| Area | 2.1 pp * | 0.9 pp | -0.008 | Area |
| Urban | -1.0 pp | -1.7 pp | 0.037 | Urban dummy |
| Agricultural land | -0.4 pp | -0.3 pp | | % of GN area that is agriculture |
| Agricultural type | -0.6 pp * | -1.9 pp ** | 0.026 * | Combined: Ag % paddy, Ag % plantation (-), % area paddy |
| Cars | -1.2 pp | | | Combined: cars divided by road length, cars divided by area, log cars |
| | | -2.2 pp | 0.035 | log cars |
| Road variables | -1.9 pp *** | -2.5 pp * | 0.031 | log sum of length |
| | -1.0 pp *** | -2.3 pp ** | 0.040 ** | Fraction paved |
| | -0.2 pp | | 0.007 | log length of airport runway |
| | 0.3 pp | | -0.012 | log sum of railroads |
| Building density | -2.7 pp ** | -8.1 pp ** | 0.162 ** | % of area with buildings and log of total count of buildings in GN combined |
| Vegetation | -0.2 pp | 2.9 pp ** | -0.044 ** | Combined: NDVI, scale 64m, NDVI, scale 8m |
| Shadows | 3.0 pp *** | 7.9 pp *** | -0.128 *** | Combined: % shadows (building height) and ln shadow pixels |
| Roofs | 1.7 pp ** | 3.8 pp ** | -0.06 ** | Fraction of roofs clay |
| | 0.6 pp *** | 1.6 pp ** | -0.028 ** | Fraction of roofs aluminum |
| | -0.4 pp | | | Fraction of roofs asbestos |
| Spatial features | 0.2 pp ** | 1.0 pp *** | -0.019 *** | Local Binary Pattern Moments |
| | -0.3 pp | | | Line support regions |
| | 3.1 pp | | | Fourier transform |
| | -1.5 pp | | | Gabor filter |
| | -1.1 pp ** | -0.8 pp | 0.026 | SURF |

14 Except for percent of GN agriculture that is plantation, for which a one standard deviation decrease is considered.
15 Since an increase in paddy land implies a reduction in agricultural land, for those GNs with agricultural land, the latter is subtracted instead of added when calculating the marginal effect.

Compared with land type, the association between poverty and cars is mildly stronger.
A one standard deviation increase in log cars (to an average value of 3.1) is associated with a 2.2 percentage point decline in poverty at the higher poverty line, and a 0.035 increase in predicted log per capita consumption. A one standard deviation increase in all three car-related variables is associated with a 1.2 percentage point decline in poverty at the lower poverty rate. Road characteristics are moderately associated with local poverty rates. Length of roads, fraction of roads paved, and runways are negatively associated with poverty, though only the first two are statistically significant, while GNs with more railways are poorer. A one standard deviation change in total length of road is associated with a 1.9 percentage point decline in poverty at the lower poverty line, a 2.5 percentage point decline where poverty is defined at the higher line, and a 0.031 increase in log consumption. The magnitudes of the marginal effects for the fraction of roads that are paved are broadly similar, though a one standard deviation increase is associated with a weaker 1 percentage point decline at the lower poverty line.

Figure 5: Model diagnostic plot of predicted against true average GN consumption

Measures of building density are strongly associated with log welfare and poverty. A one standard deviation increase in these two variables is associated with a 2.7 percentage point decline at the lower poverty rate, an 8.1 percentage point decline at the higher poverty rate, and a 0.16 increase in log consumption. Vegetation is moderately associated with poverty. A one standard deviation reduction in vegetation is associated with a 2.9 percentage point reduction at the higher poverty line, and a 0.04 increase in mean log per capita consumption, which is comparable to cars or the fraction of roads that are paved.
For the lower poverty line model, both NDVI measures are selected. The higher poverty line and log welfare models only include NDVI calculated over blocks of 64 pixels, suggesting that very high spatial resolution imagery may not be critical for generating informative measures of NDVI for prediction. Two measures of shadows, a proxy for building height, are selected: the share of valid area covered by shadows, and the log number of shadow pixels. A one standard deviation increase in both measures is associated with a 3 percentage point increase in poverty at the lower poverty line, an 8 percentage point increase in poverty at the higher one, and a 0.13 decrease in mean log per capita consumption. For roof type, the Lasso procedure selects the fractions of roofs classified as clay and as aluminum for all three models, and includes the fraction classified as asbestos for the lower poverty line model. The signs on clay and aluminum in the poverty regressions are positive, suggesting that these are generally inferior compared to the omitted category of grey concrete. This appears to be consistent with an analysis in Kenya that documents that roofs with greater luminosity, like aluminum, are associated with lower levels of poverty (Suri et al., 2015). The marginal effects of a one standard deviation increase in clay and aluminum roofs are, respectively, 1.7 and 0.6 percentage points for the lower poverty line model, and 0.06 and 0.03 in absolute value for mean log per capita consumption. These magnitudes are stronger than those for roads and vegetation, but considerably smaller than those for building density and shadows. Of the texture variables, five out of seven are selected for the 10 percent model (LBPM, LSR, Gabor, Fourier, and SURF). Of these, only LBPM and SURF are selected for the 40 percent and log per capita consumption models. In general, the estimated marginal effects for these variables are modest.
The main exception is the mean of the Fourier transform, which is positively associated with poverty in the lower poverty line model, though the coefficient is not statistically significant. A one standard deviation increase in SURF is associated with a one percentage point decline in the lower poverty line model and a 0.03 increase in log per capita consumption. This is consistent with wealthier areas being laid out in a more orderly way, with more “right angles” in housing layouts.

Figure 7 presents a map showing the true welfare measures in the left panel, against the predicted welfare measures in the right, for a particular DS division, Seethawaka. The top panel shows predicted welfare from the OLS model against actual welfare. The model is able to distinguish the poorer eastern areas from the richer western ones. Even poor GNs adjacent to richer ones can be distinguished; although the smallest GNs are less than a half mile across, the HRSF model is able to distinguish with considerable accuracy the variation in average consumption. The middle panel shows predicted and true poverty rates defined at the lower poverty line. Again, the predicted model approximates the true poverty rates with considerable accuracy. The lower poverty regions in the south and northeast are replicated in the predicted values. The model tends to under-predict poverty in the lowest poverty areas in the mid-west, suggesting that two-step or zero-inflated Poisson models may perform better.

Figure 6: Predicted Versus True Welfare Measures, Average Consumption (top), 10% Poverty (middle), 40% Poverty (bottom)

In sum, predictive models based on an urban indicator, the size of the GN, and a host of features derived from satellite imagery predict poverty rates and mean log per capita consumption remarkably well.
Greater numbers of cars are associated with lower poverty, although the relationship is not statistically significant; a denser road network and a larger share of paved roads are also associated with lower poverty. The indicators most strongly associated with poverty are building density and shadows. Shadows are positively associated with poverty, which suggests they are capturing variation in tree cover that is inversely related to building density. Consistent with this, areas characterized by more and lusher vegetation tend to be poorer. Clay and aluminum roofs, compared to grey roofs, are associated with greater levels of poverty. Of the spatial features, SURF exhibits a fairly strong association with poverty at the lower poverty line, suggesting that neighborhoods laid out in a more orderly way tend to be less poor. The following sections consider the robustness of these main findings.

4.2 Decomposition of Satellite Feature Explanatory Power

The results presented above indicate that features derived from satellite imagery explain a large portion of the variation in village income or poverty, and that associations are particularly strong for measures of building density and shadows. However, these results do not address the question of which indicators account for the model’s predictive power. To address this issue, we decompose the $R^2$ using a Shapley decomposition (Shorrocks, 2013; Huettner and Sunder, 2012; Israeli, 2007). This procedure calculates the marginal $R^2$ of a set of explanatory variables as the amount by which $R^2$ declines when removing that set from the set of candidate variables. In other words, for a model with $k$ sets of explanatory variables, the procedure will estimate $2^k$ models and average the marginal $R^2$ obtained for each set of independent variables across all estimated models. This ensures that the variable’s contribution to $R^2$ is independent of the order in which it appears in the model. Table 4 presents the decomposition.
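The averaging step of the Shapley decomposition described above can be sketched as follows. This is an illustration under stated assumptions, not the code behind Table 4: the $R^2$ value of every submodel is supplied through a lookup table of hypothetical numbers instead of being estimated from data, the three group names are placeholders, and the function name `shapley_r2` is ours. The marginal $R^2$ is computed as the gain from adding a group to a submodel, which is the same quantity as the decline from removing it.

```python
from itertools import combinations
from math import factorial


def shapley_r2(groups, r2_of):
    """Shapley share of R-squared for each group of explanatory variables.

    r2_of(frozenset_of_groups) must return the R-squared of the model that
    includes exactly those groups (0.0 for the empty model).
    """
    k = len(groups)
    shares = {}
    for g in groups:
        others = [h for h in groups if h != g]
        total = 0.0
        for size in range(k):
            # Weight of a submodel of this size in the Shapley average.
            weight = factorial(size) * factorial(k - size - 1) / factorial(k)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (r2_of(s | {g}) - r2_of(s))
        shares[g] = total
    return shares


# Hypothetical R-squared values for all 2^3 = 8 models on three groups.
toy_r2 = {
    frozenset(): 0.00,
    frozenset({"buildings"}): 0.40,
    frozenset({"roads"}): 0.25,
    frozenset({"roofs"}): 0.20,
    frozenset({"buildings", "roads"}): 0.50,
    frozenset({"buildings", "roofs"}): 0.48,
    frozenset({"roads", "roofs"}): 0.35,
    frozenset({"buildings", "roads", "roofs"}): 0.55,
}

shares = shapley_r2(["buildings", "roads", "roofs"], toy_r2.__getitem__)
for name, value in shares.items():
    print(f"{name}: {value:.4f}")
```

By construction the shares add up to the $R^2$ of the full model, so dividing each share by that total gives percentage contributions of the kind reported in a decomposition table.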
The results confirm that measures of building density (built-up area, number of buildings, and shadow pixels) and, to a lesser extent, vegetation are powerful contributors to predictive power. Collectively, these three sets of variables account for 39 to 45 percent of the model’s explanatory power. However, a number of other variables are moderately important. GN area, urban classification, road characteristics, roof type, and the texture variables each explain 8 to 12 percent of the variation. The car and agricultural variables explain a bit less than that, between 5 and 7 percent each. In short, while broad measures of building density explain a large share of the variation, virtually all sets of indicators contribute substantial predictive power to the model.

Table 4: Shapley Decomposition of Share of Variance Explained ($R^2$) by High Resolution Spatial Feature Subgroup

| Subgroup | Lower Poverty Rate (10% Nat. Inc.) | Higher Poverty Rate (40% Nat. Inc.) | Average Log Per Capita Consumption |
|---|---|---|---|
| Area | 10.4 | 8.3 | 8.4 |
| Urban | 9.4 | 9.7 | 10.8 |
| Agricultural land | 0.9 | 1.0 | |
| Paddy land | 3.8 | 4.6 | 4.1 |
| Cars | 7.3 | 5.6 | 4.6 |
| Building density | 14.8 | 19.5 | 22.5 |
| Vegetation | 8.0 | 6.2 | 4.4 |
| Shadows | 14.4 | 14.1 | 14.0 |
| Road variables | 9.4 | 7.7 | 9.8 |
| Roof type | 10.4 | 8.3 | 8.4 |
| Texture variables | 9.4 | 9.7 | 10.8 |
| Observations | 1291 | 1291 | 1291 |
| R-sq | 0.610 | 0.618 | 0.608 |

Notes: Agricultural variables include fraction agriculture plantation, fraction agriculture paddy, and fraction of GN area that is plantation. Car variables include log of car count, and cars per total road length. Building density variables include log of developed area, shadow count (building height proxy), fraction of GN developed, fraction covered by shadow, NDVI at scales 64 and 8. Road variables include log of unpaved road length, log of paved roads narrower than 5m, log of paved roads 5m+, log of airport roads, log of railroad length, and fraction of roads paved. Roof variables include count of roofs by type: clay, aluminum, asbestos, grey cement, and fraction of roofs of same type. Texture variables include Fourier series, Gabor, histogram of oriented gradients, Local Binary Pattern Moments mean and standard deviation, line support regions, and SURF.

4.3 Comparisons to Night Time Lights

How does the predictive power of indicators derived from daytime imagery compare with night time lights (NTL)? To shed light on this, Table 5 presents OLS models covering the same sample area using night time lights as the independent variable. The first three columns present poverty and per capita consumption models. Aggregate night time lights is positively correlated with welfare and negatively correlated with poverty; however, the total explanatory power is low. R-squared values for the three regressions are between 0.109 and 0.147, with performance lowest for the 10 percent headcount measure and highest for log consumption per capita. Adding higher order polynomials up to a quartic only increases it to 0.15. Models built using high resolution satellite indicators capture around four times as much variation in poverty or welfare as NTL. Columns 4-6 of Table 5 show estimates that include DS division fixed effects. Night time lights is no longer significant in any of the specifications, indicating that within DS divisions, NTL is weakly correlated with welfare. Given the prevalence, ease of use, and familiarity of night time lights, one might also ask how much more explanatory power night time lights provide in addition to the indicators extracted from daytime imagery. Table 6 answers that question by adding night time lights to the Shapley decomposition. The night time lights category includes average, squared, cubed, and average standard deviation of NTL.
The night time lights variables explain between 7 and 12 percent of the variance in per capita consumption or poverty according to the decomposition, meaning that roughly 90 percent of the explained variation in poverty or income is captured through the high resolution satellite indicators. Furthermore, adding night time lights only marginally increases the overall R-squared of the regression, by about 0.01. In this context, NTL is not a particularly accurate proxy for poverty and welfare, and adds very little explanatory power to the set of available daytime indicators.

Table 5: Model Estimates, Night Lights on Poverty/Average GN Consumption

| | Lower Poverty Rate (10% Nat. Inc.) | Higher Poverty Rate (40% Nat. Inc.) | Average Log Per Capita Consumption | Lower Poverty Rate (10% Nat. Inc.) | Higher Poverty Rate (40% Nat. Inc.) | Average Log Per Capita Consumption |
|---|---|---|---|---|---|---|
| Night Lights 2012 | -0.583*** | -1.546** | 2.922** | -0.0383 | -0.0898 | 0.186 |
| | (-3.53) | (-3.38) | (3.32) | (-0.79) | (-0.67) | (0.64) |
| Observations | 1291 | 1291 | 1291 | 1291 | 1291 | 1291 |
| R-sq | 0.109 | 0.131 | 0.147 | 0.000868 | 0.000842 | 0.00103 |
| R-sq Adj. | 0.108 | 0.130 | 0.146 | 0.0000932 | 0.0000671 | 0.000258 |
| R-sq within | | | | 0.000868 | 0.000842 | 0.00103 |
| R-sq between | | | | 0.372 | 0.448 | 0.527 |
| R-sq overall | | | | 0.109 | 0.131 | 0.147 |
| Divisional Secretariat FEs | No | No | No | Yes | Yes | Yes |

Notes: Unit of observation is Grama Niladhari (GN) division. All models include a regression constant which is omitted from the table. * p<0.05, ** p<0.01, *** p<0.001

Table 6: Shapley Decomposition by High Resolution Spatial Feature Subgroup and Night Time Lights

| Subgroup | Lower Poverty Rate (10% Nat. Inc.) | Higher Poverty Rate (40% Nat. Inc.) | Average Log Per Capita Consumption |
|---|---|---|---|
| Area | 10.2 | 8.1 | 8.0 |
| Urban | 8.7 | 8.7 | 9.5 |
| Agricultural land | 0.9 | 1.0 | 3.3 |
| Paddy land | 3.3 | 3.8 | |
| Cars | 6.7 | 5.1 | 4.0 |
| Buildings | 13.0 | 16.7 | 19.0 |
| Vegetation | 8.0 | 6.0 | 4.1 |
| Shadows | 12.1 | 13.0 | 10.6 |
| Road variables | 8.0 | 8.0 | 8.5 |
| Roof type | 13.0 | 12.0 | 11.7 |
| Texture variables | 8.5 | 7.1 | 8.9 |
| Night time lights variables | 7.6 | 10.6 | 12.1 |
| Observations | 1291 | 1291 | 1291 |
| R-sq | 0.621 | 0.636 | 0.632 |

Notes: Night time lights category includes the following transformations of night time lights: average, squared, cubed, and standard deviation. Variable groupings are identical to those in table 5.

4.4 Urban and Rural Linear Models

How does the relationship between indicators and welfare differ in urban and rural areas? Table 7 shows estimates from models fit separately for the 393 urban villages and the 898 rural ones, based on Sri Lanka’s official definition of urban and rural areas.16 Variables were again selected through Lasso estimation. The urban model selects fewer variables: 13 of the candidate variables are selected in the urban model versus 16 in the rural model. R-squared values are slightly higher in rural areas (0.656) and significantly lower in urban areas (0.445).17 For the urban model, log number of cars, built-up development, and shadow pixels are important. In the rural model, agricultural variables, roof type, shadow pixels, NDVI, Pantex, and LBPM are important. The association between cars and welfare is significantly stronger in urban areas. In addition, the association between NDVI and welfare is strongly negative in rural areas, as rural areas with more vegetation and less built-up area are poorer. The coefficient on NDVI in urban areas, meanwhile, is positive and not statistically significant, suggesting that if anything wealthier urban GNs are characterized by a greater prevalence of lush vegetation.
Table 7: Marginal Effects of One Standard Deviation Change for Urban and Rural Models

| Feature | Urban | Rural | Variables |
|---|---|---|---|
| Area | | -0.032 | Area |
| Agricultural land | | 0.018 * | % of GN area that is agriculture |
| Paddy land | 0.045 | | Combined paddy and plantation |
| | | 0.026 ** | % of GN agriculture that is paddy, % of GN agriculture that is plantation (-) |
| Cars | 0.093 *** | 0.029 *** | Log car count |
| Road variables | | 0.030 * | log sum of length |
| | 0.041 | 0.029 *** | Fraction paved |
| | | 0.011 *** | Log length of airport runway |
| | -0.02 | | Log sum of railroads |
| Building density | 0.186 *** | | Both building density variables |
| | | 0.038 *** | log of Total count of buildings in GN |
| Vegetation | 0.041 | -0.060 *** | NDVI mean, scale 64 |
| Shadows | -0.107 ** | | % shadows |
| | | -0.061 *** | ln shadow pixels |
| Roofs | 0.036 | -0.084 *** | Fraction of total roofs that are clay |
| | -0.022 | -0.037 *** | Fraction of total roofs that are aluminum |
| | | -0.021 * | Fraction of total roofs that are asbestos |
| Spectral and texture | | -0.018 | Local Binary Pattern Moments |
| | -0.006 | | Line support regions |
| | -0.058 | | Fourier transform |
| | | 0.075 *** | Pantex |
| Observations | 393 | 898 | |
| R-sq | 0.446 | 0.656 | |
| R-sq Adj. | 0.427 | 0.650 | |
| Out-of-Sample R-sq | 0.412 | 0.641 | |
| Mean Absolute Error | 0.145 | 0.113 | |

Notes: Table gives the estimated marginal effect of a one standard deviation change in the variable or variables listed in the right column. For example, the combined marginal effect of a one standard deviation in all three cars variables on the 10 percent relative poverty rate is a reduction of 1.2 percentage points. Variables excluded from 40 percent poverty and log consumption models, as shown in Table 2, are also excluded when calculating marginal effects for those dependent variables. For agricultural land, % of GN that is plantation is subtracted from the sum of % GN agriculture that is paddy and % total GN area that is paddy. * p<0.05, ** p<0.01, *** p<0.001

16 This definition is based on administrative units and has not been updated in many years. As a result, some areas officially classified as rural have urban characteristics.
17 This might be due to the presence of de-facto urban GNs in the rural sample. It might also reflect the nature of the consumption module in the HIES, which could better capture consumption in rural than in urban areas.

Table 8: Model Performance for Prediction of Average Log per Capita Consumption at Different Points in the Welfare Distribution

| | Bottom 20% | Bottom 40% | Bottom 60% | Bottom 80% | Full Sample |
|---|---|---|---|---|---|
| Observations | 259 | 517 | 775 | 1033 | 1291 |
| R-sq | 0.551 | 0.454 | 0.474 | 0.509 | 0.608 |
| Adjusted R-sq | 0.52 | 0.436 | 0.461 | 0.5 | 0.602 |
| Out-of-sample R-sq | 0.487 | 0.425 | 0.447 | 0.475 | 0.595 |
| Mean Absolute Error | 0.064 | 0.0774 | 0.0909 | 0.115 | 0.139 |
| Mean log p.c. income | 8.83 | 8.95 | 9.00 | 9.09 | 9.16 |
| Standard deviation | 0.11 | 0.13 | 0.15 | 0.20 | 0.28 |

Notes: Table reports model performance statistics for the national model for different subsamples of the bottom portion of the GN Division welfare distribution. The dependent variable is average predicted log GN per capita consumption. The rightmost column is identical to the results reported in the right column of Table 2.
Table 9: MLE Estimation Correcting for Spatial Autoregression

Dependent variable: Average Log Per Capita Consumption

| Variable | coef | t |
|---|---|---|
| log Area (square meters) | -0.046*** | [-4.01] |
| = 1 if urban | 0.048+ | [1.96] |
| % of GN area that is agriculture | 0.00022 | [0.42] |
| % of GN agriculture that is paddy | 0.00046+ | [1.74] |
| % of GN agriculture that is plantation | 0.00076** | [3.09] |
| % of Total GN area that is paddy | 0.00057 | [0.79] |
| Total cars divided by total road length | -0.93 | [-1.20] |
| Total cars divided by total GN Area | 401.4* | [2.28] |
| log number of cars | 0.020*** | [3.57] |
| % of area with buildings | 0.0083*** | [4.19] |
| log of Total count of buildings in GN | 0.012 | [1.23] |
| Vegetation Index (NDVI), mean, scale 64 | 0.071 | [1.54] |
| Vegetation Index (NDVI), mean, scale 8 | -0.042 | [-0.67] |
| log of sum of length of roads | 0.029** | [2.70] |
| fraction of roads paved | 0.0012*** | [6.00] |
| ln length airport roads | 0.0052 | [1.50] |
| ln length railroads | -0.00092 | [-0.48] |
| Fraction of total roofs that are clay | -0.0025*** | [-5.83] |
| Fraction of total roofs that are aluminum | -0.0034*** | [-4.92] |
| Fraction of total roofs that are asbestos | 0.0014* | [2.26] |
| Local Binary Pattern Moments (scale 32m), mean | -0.0080*** | [-3.38] |
| Line support regions (scale 8m), mean | -1.25 | [-0.71] |
| Gabor filter (scale 64m), mean | -0.053 | [-0.92] |
| Fourier transform, mean | -0.0030*** | [-3.61] |
| SURF (scale 16m), mean | 0.0052* | [2.24] |
| Constant | 9.74*** | [51.6] |
| Observations | 1287 | |

Notes: Standard errors have been corrected according to Conley (1999, 2008), with model estimation via GMM. + p<0.10, * p<0.05, ** p<0.01, *** p<0.001

Figure 8: Average out of sample R-squared and Average GN welfare, by subsample of GNs

4.5 Correcting for Spatial Autoregression

One unaddressed concern is whether the presence of either spatial autocorrelation or spatial heterogeneity leads the standard errors to be underestimated.
Spatial autocorrelation can occur in the presence of geographic spillovers or interactions (Anselin, 2013), and considering the village-level observations one could develop plausible stories by which poverty is influenced by this mechanism. A Moran’s I test for the presence of such disturbances according to Anselin (1996) rejects the null hypothesis that there is no spatial autocorrelation present. To correct for the spatial autocorrelation, we explicitly model the spatial autoregression (SAR) process and allow for SAR disturbances, a so-called SARAR model. This is implemented via generalized spatial two-stage least squares (GS2SLS) as shown in Drukker et al. (2013). The results presented in table 9 show that after correcting for spatial autocorrelation, most high-resolution spatial features remain significant predictors of local area poverty. Although there is some presence of autocorrelation, it is not sufficient to alter the joint significance of the spatial variables.

4.6 Using an Alternative Measure of Simulated Welfare

The results so far have demonstrated that indicators derived from satellite imagery are strongly predictive of variation in welfare and poverty, measured using welfare imputed into the 2011 census. Imputing into the 2011 census is necessary because the analysis is carried out at the GN division level, and the HIES survey alone does not contain sufficient data to accurately estimate welfare at that level. The baseline method uses, as the dependent variable, the average welfare or poverty rates, taken across both households in the village and the 100 simulations of predicted residuals. This average is then regressed on various satellite features. Because the dependent variable is an average taken over 100 simulations, it is a measure of expected poverty and welfare across both simulations and GN households. It incorporates the full distribution of potential outcomes into the measure of poverty and welfare.
Averaging over the 100 simulations per household also reduces the variance of the stochastic component of welfare by a factor of 100, which raises the explanatory power of the satellite indicators.

An alternative would be to compare satellite-based predictions against simulated poverty and welfare. This would entail regressing the estimated poverty rate of the GN for each of the 100 simulations against the independent variables, and then averaging the resulting R-squared statistics across the 100 simulations. In each case, the dependent variable is a single simulated value of welfare rather than an average across simulations. This provides a lower bound estimate of the explanatory power of the satellite indicators, were the census to include consumption data. If actual consumption data were collected in the census, the unexplained portion of actual consumption might be correlated with the satellite indicators. In the simulated data, by contrast, the unexplained portion of the prediction is constructed to be pure noise that cannot be explained by the imagery indicators.

The top row of Table 10 reports the in-sample R2 when using expected welfare as the dependent variable, which is identical to the results reported in Table 2. The bottom row reports the average R2 when using the simulated welfare approach. Using each individual simulation of welfare as the dependent variable reduces the in-sample R2 for the base regression reported in Table 2 from 0.610 to 0.406 when predicting the 10 percent poverty rate, from 0.618 to 0.496 when predicting the 40 percent poverty rate, and from 0.608 to 0.515 when predicting log per capita consumption. Incorporating all 100 draws of the residual from the census prediction regression has a stronger impact on the accuracy of the resulting poverty measure at the lower poverty threshold.
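The difference between the two approaches can be illustrated with a small simulation. The sketch below is not the authors' procedure: it assumes a single hypothetical satellite feature, 300 hypothetical units, and pure-noise residuals with unit variance, and it works directly at the unit level rather than averaging over households. It compares the R2 obtained from regressing the simulation average on the feature with the average of the R2 values from 100 separate single-draw regressions.

```python
import random

def r_squared(x, y):
    """R-squared of a one-regressor OLS fit of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)

rng = random.Random(0)          # fixed seed so the sketch is reproducible
N_UNITS, N_SIMS = 300, 100      # hypothetical sizes, chosen for illustration

# One hypothetical satellite feature per unit.
feature = [rng.random() for _ in range(N_UNITS)]

# sims[s][g]: simulated welfare of unit g in draw s = signal + pure noise.
sims = [[2.0 * f + rng.gauss(0.0, 1.0) for f in feature]
        for _ in range(N_SIMS)]

# Expected welfare: average over the 100 draws for each unit.
expected = [sum(sims[s][g] for s in range(N_SIMS)) / N_SIMS
            for g in range(N_UNITS)]

r2_expected = r_squared(feature, expected)
r2_simulated = sum(r_squared(feature, draw) for draw in sims) / N_SIMS

print(f"R2 using expected welfare:          {r2_expected:.3f}")
print(f"Average R2 using simulated welfare: {r2_simulated:.3f}")
```

Because averaging shrinks the noise variance by roughly the number of draws, the first figure is much larger than the second in this toy setting, which mirrors the direction of the gap reported in Table 10.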
The lower threshold is more affected because a greater share of households have predicted welfare that falls above the 10 percent threshold than above the 40 percent threshold, meaning that the residuals from the predicted welfare function play a larger role in determining poverty at the lower threshold. Overall, the independent variables continue to predict much of the variation across GNs in welfare and poverty, even when the GN estimate of welfare and poverty is based on one draw from the residual of the prediction regression.

Table 10: R2 of predicted poverty and welfare under an alternative measure of welfare and poverty

                                       Lower Poverty Rate   Higher Poverty Rate   Average Log Per Capita
                                       (10% Nat. Inc.)      (40% Nat. Inc.)       Consumption
R2 using expected welfare              0.610                0.608                 0.618
Average R2 using simulated welfare     0.406                0.496                 0.515

Notes: In-sample R2 reported. Unit of observation is Grama Niladhari (GN) division. Independent variables are identical to those used in Table 2. Expected welfare refers to the average poverty rate or the average log per capita consumption averaged across both GN households and one hundred simulations. The second row reports the average of R2 statistics from separate regressions using each of the 100 simulations, averaged across GN households, as the dependent variable.

4.7 Do High Resolution Satellite Features Explain the Poverty Gap?

The poverty gap is a useful supplement to the headcount rate for understanding poverty because it takes the depth of poverty into account. The poverty gap metric measures poverty depth by considering how far the poor are from a given poverty line.18 We compute the average poverty gap for each village, and use this measure as a dependent variable in a regression where the right-hand side includes the size of the GN, a dummy indicating urban classification, and the features created from high resolution satellite imagery.
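As a hedged illustration of the poverty gap measure described above (FGT_1 of Foster, Greer and Thorbecke, 1984), the sketch below computes the index for a list of incomes and then by village. The function names, the (gn_id, income) record layout, and the toy numbers are assumptions made for illustration; they are not taken from the paper's data or code.

```python
from collections import defaultdict

def poverty_gap(incomes, z):
    """FGT_1: average normalized shortfall (z - y) / z, counting
    individuals at or above the poverty line z as zero shortfall."""
    if not incomes:
        raise ValueError("incomes must not be empty")
    shortfalls = [(z - y) / z for y in incomes if y < z]
    return sum(shortfalls) / len(incomes)

def poverty_gap_by_gn(records, z):
    """Group (gn_id, income) records by GN and compute FGT_1 for each."""
    by_gn = defaultdict(list)
    for gn_id, income in records:
        by_gn[gn_id].append(income)
    return {gn_id: poverty_gap(ys, z) for gn_id, ys in by_gn.items()}

# Toy data: hypothetical incomes for two hypothetical GN divisions.
records = [
    ("GN1", 50), ("GN1", 80), ("GN1", 120), ("GN1", 200),
    ("GN2", 150), ("GN2", 90),
]
gaps = poverty_gap_by_gn(records, z=100)
for gn_id, gap in sorted(gaps.items()):
    print(gn_id, round(gap, 3))
```

In the toy data, GN1 has two people below the line with normalized shortfalls of 0.5 and 0.2, spread over four people, so its poverty gap is 0.7 / 4 = 0.175. In practice the poverty line z would be set at the 10th or 40th percentile of national per capita consumption.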
We consider again poverty lines defined at the 10th and 40th percentiles of national consumption per capita. Table 11 presents the results estimated via OLS. The coefficients can be interpreted as a unit change in the distance between the poverty gap and the poverty line for the average village. As was the case for headcount rates, high resolution features explain the poverty gap well, with adjusted R2 values between 0.588 and 0.609. Not surprisingly, building density and shadow variables are also strong correlates of the poverty gap.

Table 11: Estimating Poverty Gap Using High Res Features

                                                  Poverty Gap (FGT1 - 10%)    Poverty Gap (FGT1 - 40%)
Variable                                          coef           t            coef           t
log Area (square km)                              0.0060**       [2.84]       0.0063         [1.02]
= 1 if urban                                      -0.0063        [-2.00]      -0.013         [-1.05]
% of GN area that is agriculture                  -0.000081      [-1.29]      -0.00018       [-0.76]
% of GN agriculture that is paddy                 -0.000087**    [-3.24]      -0.00033**     [-3.10]
% of GN agriculture that is plantation            -0.000053**    [-2.91]      -0.00021*      [-2.63]
% of Total GN area that is paddy                  -2.3E-05       [-0.29]      -0.00025       [-0.88]
Total cars divided by total road length           -0.09          [-1.32]
Total cars divided by total GN Area               9.55           [0.72]
log number of cars                                -0.0014        [-0.83]      -0.0058        [-1.24]
log of Sum of length of roads                     -0.0049**      [-2.97]      -0.011*        [-2.48]
fraction of roads paved                           -0.000077**    [-3.37]      -0.00023*      [-2.67]
ln length airport roads                           -0.00027       [-0.89]
ln length railroads                               0.00026        [1.35]
% of area with buildings                          -0.00062*      [-2.16]      -0.0028*       [-2.04]
% shadows (building height) covering valid area   0.00053        [1.76]       0.0017         [1.54]
ln shadow pixels (building height)                0.0037*        [2.19]       0.016*         [2.68]
Fraction of total roofs that are clay             0.00020**      [2.96]       0.00070**      [3.12]
Fraction of total roofs that are aluminum         0.00024**      [3.31]       0.00084**      [3.19]
Fraction of total roofs are asbestos              -9.1E-05       [-1.14]
log of Total count of buildings in GN             -0.0022*       [-2.62]      -0.0073*       [-2.09]
Vegetation Index (NDVI), mean, scale 64           0.017*         [2.33]       0.056**        [2.88]
Vegetation Index (NDVI), mean, scale 8            -0.019**       [-2.95]
Linear Binary Pattern Moments (scale 32m)         0.00048*       [2.55]       0.0029***      [4.87]
Line support regions (scale 8m), mean             -0.27          [-1.39]

18 We calculate for our sample the FGT_1 metric (Foster, Greer and Thorbecke, 1984), which is defined as $FGT_1 = \frac{1}{N} \sum_{i=1}^{H} \frac{z - y_i}{z}$, where $y_i$ is an individual’s income, $z$ is the poverty threshold, $N$ is the total number of individuals, and the sum runs over the $H$ individuals whose income falls below $z$.

5 Out-of-Sample Performance with Two Applications

5.1 Estimating Poverty Using a Reduced Census Training

A key motivation for this analysis is to understand how HRSF complement traditional surveys to generate small area estimates. The standard small area estimation technique used to model poverty combines a smaller household survey with a population census. Conducting a population census is expensive, but is needed to cover the full range of individuals within a country. Can satellite features be combined with a smaller household survey alone to produce sufficiently precise small area estimates?

To assess this, we examine whether the predictive power of satellite imagery remains when it is calibrated using a census extract of approximately the size of the Household Income and Expenditure Survey, rather than a full census. We produce an alternative version of each of the three dependent variables (either per capita consumption or GN poverty rate) using artificial subsamples of the census intended to mimic the size of a household survey. This involves sampling 20% of GNs, and 5% of the actual households in each sampled GN, from the predicted welfare measure imputed into the household-level census data. We use these artificial samples to build a model of poverty or consumption using HRSF, which produces estimates of poverty that can then be compared to actual estimates based on the full census data.
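The two-stage subsampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `reduced_census_sample`, the dictionary layout mapping each GN to a list of household welfare values, and the synthetic census are all hypothetical.

```python
import random

def reduced_census_sample(households_by_gn, gn_share=0.20, hh_share=0.05, seed=0):
    """Draw a share of GNs, then a share of households within each drawn GN.

    households_by_gn maps a GN identifier to a list of household welfare
    values (hypothetical layout). Returns the same structure, restricted
    to the sampled GNs and households.
    """
    rng = random.Random(seed)
    gn_ids = sorted(households_by_gn)
    n_gn = min(len(gn_ids), max(1, round(gn_share * len(gn_ids))))
    sample = {}
    for gn in rng.sample(gn_ids, n_gn):
        households = households_by_gn[gn]
        n_hh = min(len(households), max(1, round(hh_share * len(households))))
        sample[gn] = rng.sample(households, n_hh)
    return sample

def gn_means(households_by_gn):
    """Average welfare per GN, skipping GNs with no households."""
    return {gn: sum(v) / len(v) for gn, v in households_by_gn.items() if v}

# Synthetic census: 50 hypothetical GNs with 200 households each.
rng = random.Random(1)
census = {f"GN{g:02d}": [rng.gauss(9.5, 0.5) for _ in range(200)]
          for g in range(50)}

subsample = reduced_census_sample(census)
sample_means = gn_means(subsample)   # would serve as training targets
full_means = gn_means(census)        # full-census benchmark for comparison
print(len(subsample), "GNs sampled;",
      len(next(iter(subsample.values()))), "households per sampled GN")
```

A model would then be fit to `sample_means` and the satellite features of the sampled GNs, and its predictions compared with `full_means` both for sampled GNs (in-sample) and for the remaining GNs (out-of-sample).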
This exercise sheds light on the trade-off between reducing the size of the training sample, which saves considerable money, and the resulting loss of precision of the estimated poverty rates.

Figure 9 plots the results of the simulation exercise, showing R-squared values between predicted welfare rates and true welfare rates, both in-sample (GNs within the subsample) and out-of-sample (GNs excluded from the subsample), and mean absolute error. Average R-squared values between predicted and true values do not deteriorate significantly when using the sample consisting of 20 percent of GNs and 5 percent of households within those GNs. R-squared values decline modestly with the smaller sample size, but remain well above 0.5 even when far fewer households are sampled. The second panel of Figure 9 shows the same exercise with mean absolute error used as a metric, and the results confirm there is negligible difference when using many fewer households to train the model. These results suggest that it is possible to generate reasonably reliable estimates of economic well-being by combining household survey data with features extracted from satellite imagery.

5.2 Poverty Estimation via Geographic Extrapolation

Another major motivation for using satellite imagery is to extrapolate poverty estimates into areas where survey data on economic well-being do not exist. While many poor countries are unable to conduct regular surveys, several other countries collect welfare data regularly but omit selected regions, due to political turmoil, violence, animosity towards the central government, or prohibitive expense. For example, from 2002 through 2009/10, Sri Lanka’s HIES failed to cover certain districts in the North and Eastern parts of the country due to civil conflict, and Pakistan’s HIES excludes the Federally Administered Tribal Areas, Jammu and Kashmir.

Figure 9: Model explanatory power and error with artificially reduced sample size.
(20% of GNs sampled to estimate model; households sampled as shown.)

To assess how well a model “travels” to a different geographic area, we fit a series of models, where in each model we exclude a single Divisional Secretariat (DS), a larger administrative area, from the model, and use the estimated model to predict into that excluded area. This is a form of “leave-one-out cross-validation” (LOOCV), a common method used to infer statistical out-of-sample performance (Gentle et al., 2012). We estimate both linear models and random forest models19 to predict out of sample to determine if more flexible model specifications perform better out-of-sample.

19 For each random forest model, we use 1,000 decision trees, sampling 13 of the predictors with replacement.

Our approach differs from the standard case in that for each estimation we exclude, or “leave out”, an entire Divisional Secretariat (DS), an administrative sub-unit at the level immediately below the district. Table 12 shows model performance at predicting into adjacent areas, comparing normalized root mean squared error, normalized mean absolute error, and the correlation between predicted and true welfare rates using both random forest and linear models to fit HRSF models. The adjacent prediction error rates are larger than when predicting randomly out of sample using cross-validation. Normalized error rates divide average error by the average value of welfare; therefore, the error rates can be interpreted as fractions of average welfare. Mean absolute error is estimated at 2.5% of log household consumption, 45% of the average poverty rate at the lower poverty line, and 30% of the average poverty rate at the higher poverty line using linear models to estimate and predict into adjacent areas. The error rates are lower when using random forests to estimate and predict into adjacent areas.
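The leave-one-area-out procedure and the normalized error metrics can be sketched as below. This is a simplified illustration, not the paper's implementation: it assumes a single hypothetical predictor fit by ordinary least squares (the paper uses many HRSF predictors and also random forests), and the DS labels and numbers are made up.

```python
from math import sqrt

def fit_ols(x, y):
    """Slope and intercept of a one-regressor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def leave_one_group_out(groups, x, y):
    """Predict each unit from a model fit on all other groups (e.g. DS areas)."""
    preds = [None] * len(y)
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        slope, intercept = fit_ols([x[i] for i in train], [y[i] for i in train])
        for i, g in enumerate(groups):
            if g == held_out:
                preds[i] = intercept + slope * x[i]
    return preds

def normalized_errors(y_true, y_pred):
    """Return (NMAE, NRMSE): MAE and RMSE divided by the mean of y_true.

    Assumes the mean of y_true is nonzero."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return mae / mean_y, rmse / mean_y

# Toy data: six hypothetical GNs in three hypothetical DS areas.
groups = ["DS_A", "DS_A", "DS_B", "DS_B", "DS_C", "DS_C"]
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
welfare = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]

preds = leave_one_group_out(groups, feature, welfare)
nmae, nrmse = normalized_errors(welfare, preds)
print(f"NMAE = {nmae:.4f}, NRMSE = {nrmse:.4f}")
```

Holding out whole areas rather than random units is the design choice that makes this a test of geographic extrapolation: no unit from the held-out area contributes to the model that predicts it.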
When predicting using random forest models, the mean absolute error declines to 1.5% of log household consumption, 38% of the average poverty rate at the lower poverty line, and 25% of the average poverty rate at the higher poverty line. While these error rates imply adjacent predictions will be too imprecise for producing welfare measures intended as official statistics, they may be sufficient for generating rank ordering of villages by poverty or income. The rank correlation between the predicted and the true values results in a Spearman’s ρ estimated at between 0.67 and 0.7 for the linear models, and between 0.74 and 0.76 using the random forest models. We conclude from these results that HRSF cannot yet be used to predict accurately into adjacent areas for official statistics, but the accuracy may be acceptable for targeting or other applications, and is likely to improve as better machine learning methods are employed.

Table 12: Model Performance Predicting Into Adjacent Areas

                                                Average Log Per      Lower Poverty Rate   Higher Poverty Rate
Dependent Variable                              Capita Consumption   (10% Nat. Inc.)      (40% Nat. Inc.)
Linear Models
Normalized Root Mean Squared Error (NRMSE)      0.0836               0.9225               0.5376
Normalized Mean Absolute Error (NMAE)           0.0242               0.4544               0.2923
Correlation, Predicted and True Poverty Rates   0.6983               0.6707               0.6772
Random Forest Models
Normalized Root Mean Squared Error (NRMSE)      0.02008              0.5454               0.3373
Normalized Mean Absolute Error (NMAE)           0.01537              0.3827               0.2561
Correlation, Predicted and True Poverty Rates   0.7608               0.7512               0.7423

6 Conclusion

Traditionally, given the prohibitive cost of conducting surveys sufficiently large to provide accurate statistics for small areas, generating small area poverty estimates requires pairing a welfare survey with a census or inter-census survey. Census and inter-census data are expensive to collect and therefore produced relatively infrequently.
The data are also usually disseminated with a significant lag, making it difficult to rapidly assess changes in local living standards. The results above show that indicators derived from high spatial resolution imagery, when paired with survey data, generate accurate predictions of local-level poverty and welfare, and that by and large the conditional correlations are of sensible signs and magnitudes. Furthermore, predictions based on specific features accurately predict mean per capita consumption throughout the welfare distribution. While the welfare consequences of more frequent measures of poverty and inequality are unknown, they may be large given the many applications of frequent local measures of economic well-being, ranging from impact evaluation to budget allocation to social transfers.

How well do indicators derived from satellite imagery predict poverty and which indicators are most important? We investigate these questions using a sample of 1,291 villages in Sri Lanka, linking measures of economic well-being with features derived from high resolution satellite imagery. The results indicate that the correlation between satellite derived indicators and economic well-being is remarkably strong. Simple linear models explain 60 to 61 percent of the variation in poverty and average log per capita consumption. In both rural and urban areas, variables measuring building density, built-up area, and shadows are the strongest predictors of variation in poverty. As expected, the extent and lushness of vegetation are negatively correlated with welfare in rural areas, and mildly positively correlated with incomes in urban areas, suggesting that trees and other vegetation are a luxury in urban areas.

While these results are very encouraging, additional analysis suggests caution when extrapolating predictions into geographically adjacent areas. The normalized error rates range from a quarter to one-half of the poverty rates, depending on the incidence of poverty.
The likely impediment to extrapolation is geographic heterogeneity in the relationship between indicators and welfare. Using models that learn from geographic heterogeneity, such as ensemble methods or deep learning methods, will likely improve performance for this task. Another factor is the difference in the times at which the satellite images were taken, which can contribute to noise in the independent variables across geographic regions. This could impact particular indicators such as car counts, which can vary greatly according to the day of the week the imagery was obtained. Measures of agriculture also exhibit considerable seasonal variation, which may also confound extrapolation to adjacent areas. This suggests that some indicators may particularly contribute to bias when extrapolating across space, and that the date of the image is an important consideration for spatial extrapolation using satellite-based indicators. We suspect this will improve as the revisit rates of satellites improve. Planet, which has 190 imagery satellites in orbit, already claims daily revisit rates for all the earth’s land mass, sometimes giving revisit rates as frequently as every hour.

These findings raise a host of questions for further work, and contribute to an ongoing discussion regarding the use of predictive methods in public policy (Athey, 2017). The most immediate of these is whether satellite indicators can substitute for census data in different contexts and for different indicators. Does the strong correlation between satellite-based indicators and economic well-being extend to wage income measured directly from an expenditure survey? Second, it is important to better understand the extent to which these results generalize to different social and ecological environments, such as Africa, the Middle East, and other parts of Asia.
There is no guarantee that the predictive power of building density, shadows, and other features documented above will hold in all environments.

A second line of research could explore whether changes in satellite imagery could be used to forecast changes in economic well-being across space and time. Poverty surveys are typically collected every three years and the most recent global estimates are produced with a three-year lag. Therefore, the ability to “now-cast” measures of economic well-being by combining frequently updated satellite imagery with the most recent survey-based measures of poverty has great potential. Finally, additional research can shed light on the best way to predict into adjacent areas not covered by surveys. Overall, the inevitable increase in the availability of imagery and feature identification algorithms, in conjunction with the encouraging results from this study, implies that satellite imagery will become an increasingly valuable tool to help governments and stakeholders better understand the spatial nature of poverty.

References

Abrahams, A., Lozano-Gracia, N., & Oram, C. (2015). Correcting Overglow in Night-time Lights Data. Unpublished manuscript. Accessed at: https://sites.google.com/site/alexeiabrahams2.

Afzal, M., Hersh, J., and Newhouse, D. (2016). “Building a better model: Variable Selection to Predict Poverty in Pakistan and Sri Lanka”. Mimeo, World Bank.

Athey, S. (2017). Beyond prediction: Using big data for policy problems. Science, 355(6324).

Athey, S., & Imbens, G. (2015). Machine Learning Methods for Estimating Heterogeneous Causal Effects. arXiv preprint arXiv:1504.01132.

Anselin, Luc. Spatial econometrics: methods and models. Vol. 4. Springer Science & Business Media, 2013.

Anselin, Luc, et al. "Simple diagnostic tests for spatial dependence." Regional science and urban economics 26.1 (1996): 77-104.

H. Bay, T. Tuytelaars, and L. V. Gool. SURF: Speeded Up Robust Features.
Lecture Notes in Computer Science, 3951:404–417, 2006.

Belloni, Alexandre and Chernozhukov, V. (2013). “Least squares after model selection in high-dimensional sparse models” Bernoulli. 19(2).

Besley, T., & Ghatak, M. (2006). “Public goods and economic development”. Understanding Poverty. (pp. 285-302). Oxford: Oxford University Press.

N. Dalal, and B. Triggs, “Histograms of oriented gradients for human detection,” in Computer Vision and Pattern Recognition (CVPR), San Diego, CA, 2005, pp. 886-893.

Department of Census and Statistics and World Bank, 2015 “The Spatial Distribution of Poverty in Sri Lanka”, available at: http://www.statistics.gov.lk/poverty/SpatialDistributionOfPoverty2012_13.pdf

Donaldson D., and Storeygard A. “Big Grids: Applications of Remote Sensing in Economics”, forthcoming, JEP.

Drukker, David M., Ingmar R. Prucha, and Rafal Raciborski. "Maximum likelihood and generalized spatial two-stage least-squares estimators for a spatial-autoregressive model with spatial-autoregressive disturbances." University of Maryland, Department of Economics (2011).

Elbers, C., Lanjouw, J. O., & Lanjouw, P. (2003). Micro–level estimation of poverty and inequality. Econometrica, 71(1), 355-364.

Elbers, Chris, Peter F. Lanjouw, and Phillippe G. Leite. "Brazil within Brazil: Testing the poverty map methodology in Minas Gerais." World Bank Policy Research Working Paper Series, Vol (2008).

Elvidge, C. D., Baugh, K. E., Kihn, E. A., Kroehl, H. W., & Davis, E. R. (1997). Mapping city lights with nighttime data from the DMSP Operational Linescan System. Photogrammetric Engineering and Remote Sensing, 63(6), 727-734.

Engstrom, R., Ashcroft, E., Jewell, H., & Rain, D. (2011, April). Using remotely sensed data to map variability in health and wealth indicators in Accra, Ghana. In Urban Remote Sensing Event (JURSE), 2011 Joint (pp. 145-148). IEEE.

Engstrom, R., Sandborn, A., Yu, Q. Burgdorfer, J., Stow, D., Weeks, J., and Graesser, J.
(2015) Mapping Slums Using Spatial Features in Accra, Ghana. Joint Urban and Remote Sensing Event Proceedings (JURSE), Lausanne, Switzerland, 10.1109/JURSE.2015.7120494

Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348-1360.

Foster, James; Joel Greer; Erik Thorbecke (1984). "A class of decomposable poverty measures". Econometrica, 52(3), 761–766.

S. W. Smith, The scientist and engineer’s guide to digital signal processing. San Diego, CA: California Technical Publishing, 1997.

Gentle, J. E., Härdle, W. K., & Mori, Y. (Eds.). (2012). Handbook of computational statistics: concepts and methods. Springer Science & Business Media.

J. Graesser, A. Cheriyadat, R. R. Vatsavai, V. Chandola, J. Long, and E. Bright, “Image based characterization of formal and informal neighborhoods in an urban landscape,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 5, no. 4, pp. 1164-1176, Jul, 2012.

Henderson, J. V., Storeygard, A., & Weil, D. N. (2012). Measuring economic growth from outer space. The American Economic Review, 102(2), 994-1028.

Kleinberg, J., Ludwig, J., Mullainathan, S., & Obermeyer, Z. (2015). Prediction policy problems. The American Economic Review, 105(5), 491-495.

Gabor, D. (1946). Theory of Communication. Journal of the Optical Society of America-A, 2 (2), 1455-1471.

Glaeser, E. L., Kominers, S. D., Luca, M., & Naik, N. (2015). Big Data and Big Cities: The Promises and Limitations of Improved Measures of Urban Life (No. w21778). National Bureau of Economic Research.

Huettner, Frank, and Marco Sunder. "Axiomatic arguments for decomposing goodness of fit according to Shapley and Owen values." Electronic Journal of Statistics 6 (2012): 1239-1250.

Israeli, Osnat. "A Shapley-based decomposition of the R-square of a linear regression." The Journal of Economic Inequality 5.2 (2007): 199-212.

Krizhevsky, A., Sutskever, I., & Hinton, G. E.
(2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).

Michalopoulos, S. (2012). The origins of ethnolinguistic diversity. The American Economic Review, 102(4), 1508.

Mooney, D. F., Larson, J. A., Roberts, R. K., & English, B. C. (2009). Economics of the variable rate technology investment decision for agricultural sprayers. In Southern agricultural economics association annual meeting, Atlanta, Georgia, January.

Mullahy, J. (1998). Much ado about two: reconsidering retransformation and the two-part model in health econometrics. Journal of health economics, 17(3), 247-281.

Mullainathan, S. (2014, August). Bugbears or legitimate threats?: (social) scientists' criticisms of machine learning?. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 4-4). ACM.

Newhouse, David; Suarez Becerra, Pablo; Doan, Dung. 2016. Sri Lanka Poverty and Welfare: Recent Progress and Remaining Challenges. World Bank, Washington, DC. © World Bank. https://www.openknowledge.worldbank.org/handle/10986/23794 License: CC BY 3.0 IGO.

Nunn, N., & Puga, D. (2012). Ruggedness: The blessing of bad geography in Africa. Review of Economics and Statistics, 94(1), 20-36.

M. Pesaresi, A. Gerhardinger, and F. Kayitakire, “A robust built-up area presence index by anisotropic rotation-invariant textural measure,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 1, no. 3, pp. 180-192, Oct, 2008.

Pinkovskiy, Maxim, and Xavier Sala-i-Martin. "Lights, Camera… Income! Illuminating the National Accounts-Household Surveys Debate." The Quarterly Journal of Economics 131.2 (2016): 579-631.

Sandborn, A. and Engstrom, R (In Press) Determining the Relationship Between Census Data and Spatial Features Derived From High Resolution Imagery in Accra, Ghana.
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS) Special Issue on Urban Remote Sensing

Serajuddin, U., Uematsu, H., Wieser, C., Yoshida, N., & Dabalen, A. (2015). Data deprivation: another deprivation to end. World Bank Policy Research Working Paper, (7252).

Shorrocks, Anthony F. "Decomposition procedures for distributional analysis: a unified framework based on the Shapley value." Journal of Economic Inequality (2013): 1-28.

Tucker CJ (1979). Red and photographic infrared linear combinations for monitoring vegetation. Remote Sensing of Environment 8: 127-150.

Watmough, G. R., Atkinson, P. M., Saikia, A., & Hutton, C. W. (2016). Understanding the Evidence Base for Poverty–Environment Relationships using Remotely Sensed Satellite Data: An Example from Assam, India. World Development, 78, 188-203.

Jean, N., Burke, M., Xie, M., Davis, W. M., Lobell, D. B., & Ermon, S. (2016). Combining satellite imagery and machine learning to predict poverty. Science, 353(6301), 790-794.

Marx, B., Stoker, T. M., & Suri, T. (2013). The Political Economy of Ethnicity and Property Rights in Slums: Evidence from Kenya.

Varian, H. R. (2014). Big data: New tricks for econometrics. The Journal of Economic Perspectives, 28(2), 3-27.

W. P. Yu, G. W. Chu, and M. J. Chung, “A robust line extraction method by unsupervised line clustering,” Pattern Recognition, vol. 32, no. 4, pp. 529-546, Apr, 1999.

L. Wang, and D. He, “Texture classification using texture spectrum,” Pattern Recognition, vol. 23, no. 8, pp. 905-910, 1990.

Wong, T. H., Mansor, S. B., Mispan, M. R., Ahmad, N., & Sulaiman, W. N. A. (2003, May). Feature extraction based on object oriented analysis. In Proceedings of ATC 2003 Conference (Vol. 2021).