Poverty Mapper
Harnessing the power of satellite data to map poverty.
The European Space Agency. Sentinel-2 Satellite.
About the Project
“…data is often most scarce in the areas where it is most desperately needed.”
Haishan Fu, Director of the World Bank’s Development Data Group
Poverty Mapper is a capstone project for the UC Berkeley Master of Information and Data Science program. The project uses deep learning on satellite data to predict poverty prevalence within and across five Asian countries: Bangladesh, Nepal, the Philippines, Tajikistan, and Timor Leste.
Our mission is to help international development NGOs make better decisions about how to allocate resources by filling gaps in poverty data.
Billions of dollars are spent on poverty reduction every year, but NGOs often lack timely information about where the poorest people live. Poverty data has historically been gathered through national household surveys, which are costly and time-consuming to conduct. A 2015 UN study estimates that conducting a survey in a single country costs approximately US$1.6 million.
The map below shows that only 16% of low- and middle-income countries have a recent Demographic and Health Survey (DHS) available; DHS is the largest and longest-running survey data program. Organizations like the World Bank also share poverty data, but it is usually reported at the regional or sub-regional level and is often based on projections because recent data is lacking.
Map data from DHS API accessed 11/26/2021
Open-source satellite imagery offers unique advantages:
- An existing data source
- Available in near real time
- Global coverage (including hard-to-reach areas)
- Usable for local-level prediction
mikemacmarketing / photo on flickr, CC BY 2.0, via Wikimedia Commons
Methods
Deep learning models are used to learn patterns in satellite imagery that are associated with the relative wealth of a geographic area. The method is applied to five Asian countries: Bangladesh, Nepal, the Philippines, Tajikistan, and Timor Leste.
The region and countries were selected because recent survey data is available for model evaluation. This project builds on existing research that has used satellite data to predict poverty across countries in Sub-Saharan Africa (Yeh et al., 2020; Lee & Braithwaite, 2020).
A global, cloud-free composite prepared by the European Commission Joint Research Centre is the project's primary data source. The composite has a spatial resolution of 10 meters, and atmospheric corrections have been applied to improve image clarity.
Nationally representative Demographic and Health Surveys provide the ground truth data for model training. We use all datasets available for Central, South, and Southeast Asia from the previous five years.
NASA’s Gridded Population of the World allows us to identify populated areas. We use 2020 population estimates which are provided at a resolution of approximately 1 square kilometer.
Satellite image from European Space Agency; Household Survey image by S. Phommavong.
We derive the International Wealth Index (IWI) from the Demographic and Health Surveys. The index is comparable across countries and is calculated from household-level information on:
- Consumer durables (e.g., TV, fridge, phone, bike, car)
- Access to two public services (water and electricity)
- Housing characteristics (number of sleeping rooms, quality of floor material and toilet)
The index provides a score between 0 and 100 that indicates the asset wealth of a household. After calculating the wealth index for each household, we find the median asset wealth for each survey area.
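The per-survey-area aggregation step can be sketched as follows. This is a minimal illustration, not the project's code: it assumes household-level IWI scores have already been computed, and the input format (pairs of a hypothetical survey-area ID and an IWI score) is an assumption.

```python
from collections import defaultdict
from statistics import median


def median_iwi_by_cluster(households):
    """Return {cluster_id: median IWI} from (cluster_id, iwi) pairs.

    `cluster_id` is a hypothetical name for the survey-area identifier;
    household IWI scores are assumed to be precomputed on the 0-100 scale.
    """
    scores = defaultdict(list)
    for cluster_id, iwi in households:
        if not 0 <= iwi <= 100:
            raise ValueError(f"IWI must be in [0, 100], got {iwi}")
        scores[cluster_id].append(iwi)
    # Median per survey area, as described above.
    return {cid: median(vals) for cid, vals in scores.items()}


households = [(101, 22.5), (101, 40.0), (101, 31.0), (102, 70.0), (102, 64.0)]
print(median_iwi_by_cluster(households))  # {101: 31.0, 102: 67.0}
```

The median is less sensitive than the mean to a single unusually wealthy or poor household in a small survey cluster.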
Dataset Generation Process
- Create tiles: Tile the satellite image composite into 224×224-pixel tiles, each covering an area of about 5 sq km.
- Verify geography: Check whether each tile lies within its respective UTM zone (Universal Transverse Mercator, a map projection system that divides the world into 60 zones) and overlaps one of the project countries (Bangladesh, Nepal, the Philippines, Tajikistan, or Timor Leste).
- Check population density: Check whether each tile is in a populated area, defined as 50 or more people.
- Convert to PNG: Convert each image to RGB PNG format.
- Assign label: Assign each image a label by taking a distance-weighted average of the wealth index of survey areas within a 5-kilometer radius. This radius was chosen because the center coordinates of each group of survey households are displaced for privacy by up to 2 km in urban areas and 5 km in rural areas.
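Two parts of this process can be sketched in Python. The tile footprint follows directly from the composite's 10-meter resolution: 224 pixels × 10 m = 2.24 km per side, or about 5.02 sq km per tile. For labeling, the exact weighting function is not specified, so the sketch below assumes inverse-distance weights over haversine distances; the function names and the (lat, lon, median IWI) input format are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def tile_area_km2(tile_px=224, resolution_m=10.0):
    """Ground area of a square tile: (pixels per side * meters per pixel)^2."""
    side_km = tile_px * resolution_m / 1000.0
    return side_km ** 2


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def weighted_label(tile_center, clusters, radius_km=5.0, eps_km=0.01):
    """Inverse-distance-weighted mean IWI of survey areas within `radius_km`.

    ASSUMPTION: inverse-distance weighting; the project's exact scheme may differ.
    `clusters` is a list of (lat, lon, median_iwi). Returns None when no survey
    area falls inside the radius, i.e. the tile would stay unlabeled.
    """
    lat, lon = tile_center
    num = den = 0.0
    for c_lat, c_lon, iwi in clusters:
        d = haversine_km(lat, lon, c_lat, c_lon)
        if d <= radius_km:
            w = 1.0 / max(d, eps_km)  # clamp to avoid division by zero at d == 0
            num += w * iwi
            den += w
    return num / den if den > 0 else None


print(round(tile_area_km2(), 2))  # 5.02
```

Survey areas closer to the tile center pull the label harder, which reflects the greater uncertainty about displaced coordinates farther away.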
Dataset Characteristics
In the compiled dataset, the Philippines has the most satellite images because it has the largest land area of the five countries, while Bangladesh has the highest population density.
The histogram below also shows differences in the wealth index distribution by country. Tajikistan and the Philippines have more images at the higher end of the wealth index, while Bangladesh and Nepal have more images at the lower end.
Labeled images are fed into a type of deep learning model called a convolutional neural network (CNN). The model learns patterns in these images and can then make predictions on new, unseen images. Our modeling approach involves:
- Transfer Learning
- Domain Adaptation
- Image Augmentation
- Resampling
Transfer learning applies what a model has learned on one task to a different task. We used models pretrained on the ImageNet database and adapted them to predict the wealth index from satellite imagery. This approach is often more effective than training a model from scratch because it leverages features learned from a very large dataset, and it also saves time and compute resources.

Domain adaptation means taking a model trained in one domain and applying it to another. In our case, the domains are different country contexts.

For image augmentation, each satellite image was transformed into eight orientations (combinations of flips and 90-degree rotations) during training to reduce overfitting. To address class imbalance, the training data was resampled so that the classes were balanced.
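The augmentation and resampling steps can be illustrated with a small, dependency-free sketch. It treats an image as a 2-D list of pixel values and generates the eight orientations of a square tile (four 90-degree rotations, each with and without a mirror flip). How the classes were balanced is not stated; random oversampling of the minority class is assumed here, and all function names are hypothetical.

```python
import random
from collections import Counter


def rotate90(img):
    """Rotate a 2-D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror a 2-D grid left to right."""
    return [row[::-1] for row in img]


def dihedral_variants(img):
    """All eight orientations of a square tile: 4 rotations x (unflipped, flipped)."""
    variants = []
    current = img
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


def oversample(samples, labels, seed=0):
    """ASSUMPTION: random oversampling. Duplicate minority-class samples
    until every class matches the size of the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in xs + extra)
    rng.shuffle(out)
    return out


print(len(dihedral_variants([[1, 2], [3, 4]])))  # 8
balanced = oversample(["a", "b", "c", "d"], [0, 0, 0, 1])
print(Counter(y for _, y in balanced))  # Counter({0: 3, 1: 3})
```

In a PyTorch pipeline the same orientations can be produced on tensors with `torch.rot90` and `torch.flip`. Because overhead imagery has no natural "up", all eight orientations are plausible views of the same place, which is why this augmentation suits satellite tiles.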
We used an AWS-based machine learning pipeline and the PyTorch framework for modeling.
Results
We trained two types of models: within country and across country. Within country models use data from a single country only. Across country models are trained on data from four countries and then validated and tested on the held-out fifth country.
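A held-out-country split can be sketched as follows. How the held-out country's data is divided between validation and test is not specified; the 50/50 random split, the record format, and the function name are assumptions for illustration.

```python
import random


def across_country_split(records, held_out, val_fraction=0.5, seed=0):
    """Train on every country except `held_out`; split the held-out country
    into validation and test sets.

    `records` is a list of dicts with a "country" key (hypothetical format).
    ASSUMPTION: `val_fraction` and the random split are illustrative only.
    """
    train = [r for r in records if r["country"] != held_out]
    held = [r for r in records if r["country"] == held_out]
    if not held:
        raise ValueError(f"no records for held-out country {held_out!r}")
    rng = random.Random(seed)
    rng.shuffle(held)
    n_val = int(len(held) * val_fraction)
    return train, held[:n_val], held[n_val:]


countries = ["Bangladesh", "Nepal", "Philippines", "Tajikistan", "Timor Leste"]
records = [{"country": c, "tile_id": i} for c in countries for i in range(4)]
for country in countries:
    train, val, test = across_country_split(records, country)
    print(country, len(train), len(val), len(test))  # each line ends: 16 2 2
```

Keeping every record from the held-out country out of training is what makes the test score a measure of generalization to a new country context.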
The wealth index is binned into classes, within each country for within country models and across all countries for across country models. We initially binned the wealth index into five quantiles. However, because model performance was weak, we simplified the task to a binary prediction problem: is an image in the bottom 20% of the wealth index distribution?
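The binary labeling step can be sketched with Python's `statistics.quantiles`, whose first cut point at `n=5` is the 20th percentile. Computing the threshold on pooled data rather than per country changes the labels, and a relatively wealthy country can end up with no bottom-20% images at all. The wealth index values below are made up, and the threshold method (the standard library's default exclusive interpolation) is an assumption.

```python
from statistics import quantiles


def bottom20_threshold(values):
    """20th percentile of the wealth index: the first of the four quintile cut points."""
    return quantiles(values, n=5)[0]


def binary_labels(values, threshold):
    """Label 1 if a value is in the bottom 20% (at or below the threshold), else 0."""
    return [1 if v <= threshold else 0 for v in values]


# Hypothetical median-IWI values for two countries.
country_a = [10, 20, 30, 40, 50]
country_b = [60, 70, 80, 90, 95]

# Within-country binning: each country gets its own threshold.
within_b = binary_labels(country_b, bottom20_threshold(country_b))
# Across-country binning: a single threshold computed from the pooled data.
pooled_threshold = bottom20_threshold(country_a + country_b)
across_b = binary_labels(country_b, pooled_threshold)

print(within_b)          # [1, 0, 0, 0, 0]
print(pooled_threshold)  # 22.0
print(across_b)          # [0, 0, 0, 0, 0]
```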
Model Training
The best results were achieved by fine-tuning a pretrained ResNet-50 model. Models were tuned by varying the number of epochs, learning rate, step size, and gamma. The learning rate is a key hyperparameter that determines how much the network's weights are adjusted in response to the gradient of the loss. During training, the learning rate is multiplied by gamma once every step size epochs.
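The step-decay rule has a simple closed form: after `epoch` epochs, the learning rate is `initial_lr * gamma ** (epoch // step_size)`. This is the rule implemented by PyTorch's `torch.optim.lr_scheduler.StepLR` (parameters `step_size` and `gamma`) when `scheduler.step()` is called once per epoch. The sketch below computes it directly; the hyperparameter values in the example are hypothetical, not the project's tuned settings.

```python
def step_lr(initial_lr, gamma, step_size, epoch):
    """Learning rate at a 0-indexed epoch under step decay.

    The rate is multiplied by `gamma` once every `step_size` epochs, matching
    the closed form of PyTorch's StepLR when scheduler.step() runs once per epoch.
    """
    if step_size <= 0:
        raise ValueError("step_size must be a positive integer")
    return initial_lr * gamma ** (epoch // step_size)


# Hypothetical settings: start at 0.01 and divide by 10 every 2 epochs.
for epoch in range(6):
    print(epoch, step_lr(0.01, 0.1, 2, epoch))
```

In PyTorch itself, the same schedule would come from `StepLR(optimizer, step_size=2, gamma=0.1)`. Smaller late-stage learning rates let a fine-tuned network settle without overwriting the pretrained ImageNet features too aggressively.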
Evaluation Metric
The primary evaluation metric is the F1 score. Commonly used in binary classification, the F1 score is the harmonic mean of precision and recall, which makes it useful when both precision and recall matter for the prediction task. For each model type and country, the selected model is the one with the highest validation F1 score at any epoch.
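The metric and the epoch-selection rule can be sketched in plain Python. Here the positive class (label 1) is taken to be "bottom 20% of the wealth index": precision is the share of areas predicted to be in the bottom 20% that truly are, recall is the share of truly bottom-20% areas the model finds, and F1 = 2PR / (P + R). Function names are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one positive class (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def best_epoch(val_f1_by_epoch):
    """Index of the epoch with the highest validation F1 (earliest wins ties)."""
    return max(range(len(val_f1_by_epoch)), key=lambda i: val_f1_by_epoch[i])


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred))  # precision, recall, and F1 are all 2/3 here
print(best_epoch([0.41, 0.55, 0.52]))       # 1
```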
Within Country Results – Best Performance
The within country models performed better than the across country models. The Bangladesh model had the best overall performance (F1 score of 0.57). Timor Leste has less data than the other countries, and overfitting appears to have affected its results. The within country models correctly identify 25 to 50% of areas in the bottom 20% of the wealth index (recall). When a model predicts that an area is in the bottom 20%, it is correct 56 to 68% of the time (precision). While there is some signal, these models are not accurate enough for NGOs to use for decision making.
We also compared F1 scores for urban and rural areas but did not see a clear pattern across countries. However, for all countries except one, F1 scores were higher when the values used to calculate an image label fell within a smaller range (i.e., the survey areas around the image had more similar wealth indices).
Across Country Results – Lagged in Performance
The across country models lagged in performance. The large drop in F1 score between validation and test indicates a significant overfitting problem in the Bangladesh and Timor Leste models. Precision and recall also vary widely across models. There are no results for Tajikistan because, when the wealth index is binned across all countries, every Tajikistan image falls in the upper 80% of the distribution.
Similar Geography Experiments
To test whether domain adaptation (the ability of a model to generalize from one country context to another) was affecting our across country results, we ran similar geography experiments, grouping countries into two sub-regions: Central/South Asia and Southeast Asia. Across country prediction within similar geographies actually performed worse. This may be because restricting training to similar geographies reduces the size of the training datasets.
Example Images
These images illustrate both the difficulty of the prediction task and the challenges of error analysis: it is not easy for the human eye to distinguish between the bottom 20% and upper 80% classes. Future work could attempt to isolate the image elements that drive model predictions.
Poverty Map
User Guide
The asset-based International Wealth Index is used as a proxy for poverty. The map shows areas predicted to be in the bottom 20% or upper 80% of each country's wealth index distribution. See the Methods section for more details.
The predictions on this map were made using deep learning on satellite imagery, with a separate model trained for each country. See the Methods section for more details.
Accuracy varies by country: the models correctly identify 25 to 50% of areas in the bottom 20% of the wealth index, and when a model predicts that an area is in the bottom 20%, it is correct 56 to 68% of the time. See the Results section for more details.
This map is intended as a proof of concept and should not be used for decision making purposes.