Highest-Resolution Income Map of the US
March, 2019
I couldn't find a high-resolution map of household income for the whole United States, so I decided to make one. This map uses the highest-resolution data available from the US Census 5-year American Community Survey (ACS), which goes down to the block group level (each block group contains about 1000 people).
A full-resolution version is available here. I also have an interactive version, but due to a bug in Mapbox (or maybe some corrupt geometry from TIGER/Line), some of the block groups don't render.
I'm guessing that I couldn't find this map anywhere else because it was surprisingly difficult to make. Lower-resolution Census data is easy to access, but getting the raw block-group-level estimates means going through different, quite cumbersome channels. It also requires downloading and processing a large amount of data, including many parameters that have nothing to do with income. Here are the instructions I wrote down in case I needed to get the data again:
I got the data itself from the Census's FTP server, in particular at this rather hard-to-find path: https://www2.census.gov/programs-surveys/acs/summary_file/2017/data/5_year_entire_sf/
There is some documentation in this PDF as well: https://www2.census.gov/programs-surveys/acs/summary_file/2017/documentation/tech_docs/2017_SummaryFile_Tech_Doc.pdf
The way the data is stored is confusing, to say the least. Here was the basic workflow:
- Every property has a "sequence number" which tells you which file it is stored in. Use an Excel document from here to find the sequence number for your property. For median household income, it is 59.
- Somewhere (I forget where I downloaded it), there is a file called seq59.xlsx which shows which properties correspond to which columns of file 59. Median household income is in position 177, so column 177 of file 59 holds the median household incomes.
- In the Tracts_Block_Groups_Only folder, there is a folder for each state. Each of those folders contains files of this form: e20175ma0059000.txt and m20175ma0059000.txt (note the first letter). The files that start with "e" are the estimates and the ones that start with "m" are the margins of error. There are also "g" files, which become important later. The 59 in the middle of "e20175ma0059000.txt" means that this is the file for sequence number 59 (the one we want); there are files for the other sequence numbers too. We can now extract the 177th column of every row of file 59 for every state.
- It gets more complicated if you want to associate this data with geography. In the folder for each state, there is a file of the form "g20175ma.csv" that ties each row to a geography. From a file called "2017_SFGeoFileTemplate.xls" (I also forget where I got it), you learn that the 5th element of each row of the "g" file is the "logical record number" and the 5th from last is the "geo id". Use these to build a mapping from logical record numbers to geo ids.
- seq59.xlsx also tells you that the 6th element of each row of file 59 is the logical record number. So for each row you now have the median household income and the logical record number, and from the logical record number you can get the geo id (which identifies the block group). I should also mention that, as far as I know, the ACS is published only down to the block group level, not the block level, for privacy reasons.
- Finally, you can get shapefiles for every block group in each state using the instructions in https://www2.census.gov/geo/tiger/GENZ2017/2017_file_name_def.pdf. These are labeled with geo ids, but in a slightly different format. After writing some simple code to conform the geo ids, you can finally connect the median household incomes to logical record numbers, the logical record numbers to geo ids, and the geo ids to polygons from the shapefiles!
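The joining steps in the list above can be sketched in Python. This is a minimal illustrative sketch, not the code used to make the map. It takes the column positions described above at face value and additionally assumes that logical record numbers may be zero-padded (so they are compared as integers), that block group geo ids in the "g" file start with "15000US" (summary level 150, while tracts share the same files), and that the shapefiles' geo id is the part after "US". All function and constant names are hypothetical.

```python
import csv

# Positions from the workflow above (1-based): median household income is
# column 177 of sequence file 59, the logical record number (LOGRECNO) is the
# 6th column of that file, and in the "g" file LOGRECNO is the 5th column and
# the geo id is the 5th from last.
INCOME_COL = 177
EST_LOGRECNO_COL = 6
GEO_LOGRECNO_COL = 5
GEO_GEOID_FROM_END = 5
# Assumption: block group geo ids in the "g" file start with this prefix.
BLOCK_GROUP_PREFIX = "15000US"


def summary_file_name(kind, state, seq, year=2017):
    """Build names like e20175ma0059000.txt ("e" = estimates, "m" = margins)."""
    return f"{kind}{year}5{state.lower()}{seq:04d}000.txt"


def logrecno_to_geoid(geo_lines):
    """Map logical record number -> geo id from the rows of a "g" CSV file."""
    mapping = {}
    for row in csv.reader(geo_lines):
        if not row:
            continue  # skip blank lines
        # int() normalizes zero-padded values such as "0000123".
        mapping[int(row[GEO_LOGRECNO_COL - 1])] = row[-GEO_GEOID_FROM_END]
    return mapping


def logrecno_to_income(est_lines):
    """Map logical record number -> median household income from file 59."""
    incomes = {}
    for row in csv.reader(est_lines):
        if not row:
            continue
        try:
            incomes[int(row[EST_LOGRECNO_COL - 1])] = int(row[INCOME_COL - 1])
        except ValueError:
            # Non-integer cells (e.g. blanks or "." placeholders) are skipped.
            continue
    return incomes


def conform_geoid(summary_geoid):
    """'15000US250010101001' -> '250010101001' (assumed shapefile format)."""
    _, sep, rest = summary_geoid.partition("US")
    return rest if sep else summary_geoid


def block_group_incomes(geo_lines, est_lines):
    """Join incomes to geo ids, keep block groups only, key by conformed id."""
    geoids = logrecno_to_geoid(geo_lines)
    result = {}
    for logrecno, income in logrecno_to_income(est_lines).items():
        geoid = geoids.get(logrecno, "")
        if geoid.startswith(BLOCK_GROUP_PREFIX):
            result[conform_geoid(geoid)] = income
    return result
```

On real data you would pass open file handles instead of lists of strings, e.g. `open(summary_file_name("e", "ma", 59), newline="")` and `open("g20175ma.csv", newline="", encoding="latin-1")` (the latin-1 encoding is an assumption, in case place names in the geography file are not valid UTF-8), then look up each polygon's geo id in the returned dictionary. Using `csv.reader` rather than splitting on commas matters for the "g" file, whose quoted name fields can contain commas.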