Using the tigris
package, get Census Tiger shapefiles for census geographies. Tigris will return the shapefile in the sf
, or simple features, format.
us_geo <- tigris::states(class = "sf")
The “Simple Features” (sf
) data structure can easily be viewed and manipulated as a rectangular data frame, before visualizing. As an historical note – an sf
predecessor – the sp
data structure uses @data
slots to hold data. We’ll focus on the sf
package. Below are two methods of viewing the structure of the downloaded shapefiles.
class(us_geo)
[1] "sf" "data.frame"
glimpse(us_geo)
Rows: 56
Columns: 15
$ REGION <chr> "3", "3", "2", "2", "3", "1", "4", "1", "3", "1", "~
$ DIVISION <chr> "5", "5", "3", "4", "5", "1", "8", "1", "5", "1", "~
$ STATEFP <chr> "54", "12", "17", "27", "24", "44", "16", "33", "37~
$ STATENS <chr> "01779805", "00294478", "01779784", "00662849", "01~
$ GEOID <chr> "54", "12", "17", "27", "24", "44", "16", "33", "37~
$ STUSPS <chr> "WV", "FL", "IL", "MN", "MD", "RI", "ID", "NH", "NC~
$ NAME <chr> "West Virginia", "Florida", "Illinois", "Minnesota"~
$ LSAD <chr> "00", "00", "00", "00", "00", "00", "00", "00", "00~
$ MTFCC <chr> "G4000", "G4000", "G4000", "G4000", "G4000", "G4000~
$ FUNCSTAT <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "~
$ ALAND <dbl> 62266231560, 138947364717, 143779863817, 2062300654~
$ AWATER <dbl> 489271086, 31362872853, 6215723896, 18942261495, 69~
$ INTPTLAT <chr> "+38.6472854", "+28.4574302", "+40.1028754", "+46.3~
$ INTPTLON <chr> "-080.6183274", "-082.4091477", "-089.1526108", "-0~
$ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-81.74725 3..., MULTIP~
And as noted, here’s the data frame view. Notice the geometry (polygon shape) in the far-right column of the data frame.
as_tibble(us_geo)
# A tibble: 56 x 15
REGION DIVISION STATEFP STATENS GEOID STUSPS NAME LSAD MTFCC
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 3 5 54 01779805 54 WV West Vir~ 00 G4000
2 3 5 12 00294478 12 FL Florida 00 G4000
3 2 3 17 01779784 17 IL Illinois 00 G4000
4 2 4 27 00662849 27 MN Minnesota 00 G4000
5 3 5 24 01714934 24 MD Maryland 00 G4000
6 1 1 44 01219835 44 RI Rhode Is~ 00 G4000
7 4 8 16 01779783 16 ID Idaho 00 G4000
8 1 1 33 01779794 33 NH New Hamp~ 00 G4000
9 3 5 37 01027616 37 NC North Ca~ 00 G4000
10 1 1 50 01779802 50 VT Vermont 00 G4000
# ... with 46 more rows, and 6 more variables: FUNCSTAT <chr>,
# ALAND <dbl>, AWATER <dbl>, INTPTLAT <chr>, INTPTLON <chr>,
# geometry <MULTIPOLYGON [°]>
If you want to see a very quick view of your mapping data, you can plot the geometry data with the plot
function. In this case we use the sf::st_geometry()
function to plot only the geometry. You can quickly generate a faceted map by excluding the st_geometry
function: e.g. plot(us_geo)
but that will consume computation cycles (i.e. wait time). Therefore, I recommend trying the smaller layer for now.
Note: Census geography for the USA will span the globe in part becuase Region 9 includes a multitude of pacific islands. Later we will limit to simply the “lower 48” states.
plot(st_geometry(us_geo))
I’ve already downloaded and stored some data from the Bureau of Labor Statistics. Those data are stored in an excel file in the data
directory of the repository: data/OES_Report.xlsx
. The goal is to attach this data to the previously downloaded shapefiles.
But you may be interested in how I gathered the data. below are some summary notes documenting my steps of gathering the data from the Bureau of Labor Statistics.
https://data.bls.gov/oes/#/occGeo/One%20occupation%20for%20multiple%20geographical%20areas
One occupation for multiple geographical areas
Mental Health and Substance Abuse Social Workers
State
All States in this list
Annual Mean wage
Read the Data in with the RStudio “Import Dataset” wizard available in the Environment tab. This will generate the code below and ensure the import
Salary4Helpers <-
read_excel("data/OES_Report.xlsx",
col_types = c("text", "numeric"),
skip = 4)
Salary4Helpers
# A tibble: 58 x 2
`Area Name` `Annual mean wage(2)`
<chr> <dbl>
1 Alabama(0100000) 38080
2 Alaska(0200000) 46050
3 Arizona(0400000) 34560
4 Arkansas(0500000) 40480
5 California(0600000) 64320
6 Colorado(0800000) 43480
7 Connecticut(0900000) 60450
8 Delaware(1000000) 54450
9 District of Columbia(1100000) 57210
10 Florida(1200000) 43410
# ... with 48 more rows
Before we join the BLS data to the shapefile we need to transform the structure of the downloaded BLS data
Use the dplyr::left_join()
function or the base-R merge()
function to append BLS data to the previously loaded shape object
HelperShapeObject <- left_join(us_geo, BlsWage_ToJoin, by = c("NAME" = "State"))
as_tibble(HelperShapeObject)
# A tibble: 56 x 16
REGION DIVISION STATEFP STATENS GEOID STUSPS NAME LSAD MTFCC
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 3 5 54 01779805 54 WV West Vir~ 00 G4000
2 3 5 12 00294478 12 FL Florida 00 G4000
3 2 3 17 01779784 17 IL Illinois 00 G4000
4 2 4 27 00662849 27 MN Minnesota 00 G4000
5 3 5 24 01714934 24 MD Maryland 00 G4000
6 1 1 44 01219835 44 RI Rhode Is~ 00 G4000
7 4 8 16 01779783 16 ID Idaho 00 G4000
8 1 1 33 01779794 33 NH New Hamp~ 00 G4000
9 3 5 37 01027616 37 NC North Ca~ 00 G4000
10 1 1 50 01779802 50 VT Vermont 00 G4000
# ... with 46 more rows, and 7 more variables: FUNCSTAT <chr>,
# ALAND <dbl>, AWATER <dbl>, INTPTLAT <chr>, INTPTLON <chr>,
# wages <dbl>, geometry <MULTIPOLYGON [°]>
qtm(HelperShapeObject, fill = "wages")
Filter to only the contiguous 50 states + D.C. Note that tigris::shift_geometry()
will shift and resize Alaska and Hawaii.
contiguous_states <- HelperShapeObject %>%
filter(REGION != 9) %>%
shift_geometry()
tm_shape(contiguous_states) +
tm_polygons("wages", id = "Name")
Mark likes the USA_Contiguous_Albers_Equal_Area_Conic_USGS_version projection for the continental US. EPSG:5070
contiguous_states %>%
st_transform(5070) %>%
tm_shape() +
tm_polygons("wages", id = "Name")
tm_shape(contiguous_states, projection = 5070) +
tm_polygons("wages", id = "Name")
tmap
syntax and functionstm_shape(contiguous_states, projection = 5070) +
tm_borders(col = "black", alpha = 0.4) +
tm_fill(col = "REGION", alpha = 0.6) +
tm_layout(title = "Regions of the USA",
attr.color = "navy",
title.position = c("center", "top"),
title.bg.color = "lightblue")
This session inspired by https://www.computerworld.com/article/3175623/data-analytics/mapping-in-r-just-got-a-whole-lot-easier.html
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. Source code is available at https://github.com/libjohn/mapping-with-R, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".