Thematic Mapping with tmap

library(tidyverse)       # Tidyverse for Tidy Data
library(readxl)
library(tmap)            # Thematic Mapping
library(tmaptools) 
library(tigris)          # Get Census Geography Poloygons
library(sf)

Shapefiles as sf

Using the tigris package, get Census Tiger shapefiles for census geographies. Tigris will return the shapefile in the sf, or simple features, format.

us_geo <- tigris::states(class = "sf") 

Data Structure

The “Simple Features” (sf) data structure can easily be viewed and manipulated as a rectangular data frame, before visualizing. As an historical note – an sf predecessor – the sp data structure uses @data slots to hold data. We’ll focus on the sf package. Below are two methods of viewing the structure of the downloaded shapefiles.

class(us_geo)
[1] "sf"         "data.frame"
glimpse(us_geo)
Rows: 56
Columns: 15
$ REGION   <chr> "3", "3", "2", "2", "3", "1", "4", "1", "3", "1", "~
$ DIVISION <chr> "5", "5", "3", "4", "5", "1", "8", "1", "5", "1", "~
$ STATEFP  <chr> "54", "12", "17", "27", "24", "44", "16", "33", "37~
$ STATENS  <chr> "01779805", "00294478", "01779784", "00662849", "01~
$ GEOID    <chr> "54", "12", "17", "27", "24", "44", "16", "33", "37~
$ STUSPS   <chr> "WV", "FL", "IL", "MN", "MD", "RI", "ID", "NH", "NC~
$ NAME     <chr> "West Virginia", "Florida", "Illinois", "Minnesota"~
$ LSAD     <chr> "00", "00", "00", "00", "00", "00", "00", "00", "00~
$ MTFCC    <chr> "G4000", "G4000", "G4000", "G4000", "G4000", "G4000~
$ FUNCSTAT <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "~
$ ALAND    <dbl> 62266231560, 138947364717, 143779863817, 2062300654~
$ AWATER   <dbl> 489271086, 31362872853, 6215723896, 18942261495, 69~
$ INTPTLAT <chr> "+38.6472854", "+28.4574302", "+40.1028754", "+46.3~
$ INTPTLON <chr> "-080.6183274", "-082.4091477", "-089.1526108", "-0~
$ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-81.74725 3..., MULTIP~

sf as data frame

And as noted, here’s the data frame view. Notice the geometry (polygon shape) in the far-right column of the data frame.

as_tibble(us_geo)
# A tibble: 56 x 15
   REGION DIVISION STATEFP STATENS  GEOID STUSPS NAME      LSAD  MTFCC
   <chr>  <chr>    <chr>   <chr>    <chr> <chr>  <chr>     <chr> <chr>
 1 3      5        54      01779805 54    WV     West Vir~ 00    G4000
 2 3      5        12      00294478 12    FL     Florida   00    G4000
 3 2      3        17      01779784 17    IL     Illinois  00    G4000
 4 2      4        27      00662849 27    MN     Minnesota 00    G4000
 5 3      5        24      01714934 24    MD     Maryland  00    G4000
 6 1      1        44      01219835 44    RI     Rhode Is~ 00    G4000
 7 4      8        16      01779783 16    ID     Idaho     00    G4000
 8 1      1        33      01779794 33    NH     New Hamp~ 00    G4000
 9 3      5        37      01027616 37    NC     North Ca~ 00    G4000
10 1      1        50      01779802 50    VT     Vermont   00    G4000
# ... with 46 more rows, and 6 more variables: FUNCSTAT <chr>,
#   ALAND <dbl>, AWATER <dbl>, INTPTLAT <chr>, INTPTLON <chr>,
#   geometry <MULTIPOLYGON [°]>

Quick Plotting

If you want to see a very quick view of your mapping data, you can plot the geometry data with the plot function. In this case we use the sf::st_geometry() function to plot only the geometry. You can quickly generate a faceted map by excluding the st_geometry function: e.g. plot(us_geo) but that will consume computation cycles (i.e. wait time). Therefore, I recommend trying the smaller layer for now.

Note: Census geography for the USA will span the globe in part becuase Region 9 includes a multitude of pacific islands. Later we will limit to simply the “lower 48” states.

plot(st_geometry(us_geo))

Get BLS data

I’ve already downloaded and stored some data from the Bureau of Labor Statistics. Those data are stored in an excel file in the data directory of the repository: data/OES_Report.xlsx. The goal is to attach this data to the previously downloaded shapefiles.

But you may be interested in how I gathered the data. below are some summary notes documenting my steps of gathering the data from the Bureau of Labor Statistics.

https://data.bls.gov/oes/#/occGeo/One%20occupation%20for%20multiple%20geographical%20areas

Salary4Helpers <- 
  read_excel("data/OES_Report.xlsx",
             col_types = c("text", "numeric"), 
             skip = 4)

Salary4Helpers
# A tibble: 58 x 2
   `Area Name`                   `Annual mean wage(2)`
   <chr>                                         <dbl>
 1 Alabama(0100000)                              38080
 2 Alaska(0200000)                               46050
 3 Arizona(0400000)                              34560
 4 Arkansas(0500000)                             40480
 5 California(0600000)                           64320
 6 Colorado(0800000)                             43480
 7 Connecticut(0900000)                          60450
 8 Delaware(1000000)                             54450
 9 District of Columbia(1100000)                 57210
10 Florida(1200000)                              43410
# ... with 48 more rows

Wrangle the data

Before we join the BLS data to the shapefile we need to transform the structure of the downloaded BLS data

BlsWage_ToJoin <- Salary4Helpers %>% 
  rename(Area = "Area Name") %>% 
  rename(wages = "Annual mean wage(2)") %>% 
  mutate(State = gsub("\\(\\d{7}\\)", "", Area)) %>% 
  filter(wages != "NA_character_") %>% 
  select(State, wages)

#BlsWage_ToJoin

Join data

Use the dplyr::left_join() function or the base-R merge() function to append BLS data to the previously loaded shape object

HelperShapeObject <- left_join(us_geo, BlsWage_ToJoin, by = c("NAME" = "State"))

as_tibble(HelperShapeObject)
# A tibble: 56 x 16
   REGION DIVISION STATEFP STATENS  GEOID STUSPS NAME      LSAD  MTFCC
   <chr>  <chr>    <chr>   <chr>    <chr> <chr>  <chr>     <chr> <chr>
 1 3      5        54      01779805 54    WV     West Vir~ 00    G4000
 2 3      5        12      00294478 12    FL     Florida   00    G4000
 3 2      3        17      01779784 17    IL     Illinois  00    G4000
 4 2      4        27      00662849 27    MN     Minnesota 00    G4000
 5 3      5        24      01714934 24    MD     Maryland  00    G4000
 6 1      1        44      01219835 44    RI     Rhode Is~ 00    G4000
 7 4      8        16      01779783 16    ID     Idaho     00    G4000
 8 1      1        33      01779794 33    NH     New Hamp~ 00    G4000
 9 3      5        37      01027616 37    NC     North Ca~ 00    G4000
10 1      1        50      01779802 50    VT     Vermont   00    G4000
# ... with 46 more rows, and 7 more variables: FUNCSTAT <chr>,
#   ALAND <dbl>, AWATER <dbl>, INTPTLAT <chr>, INTPTLON <chr>,
#   wages <dbl>, geometry <MULTIPOLYGON [°]>

Quick Thematic Map

qtm(HelperShapeObject, fill = "wages")

50 states

Filter to only the contiguous 50 states + D.C. Note that tigris::shift_geometry() will shift and resize Alaska and Hawaii.

contiguous_states <- HelperShapeObject %>% 
  filter(REGION != 9) %>% 
  shift_geometry()

Make Choropleth

tm_shape(contiguous_states) +
  tm_polygons("wages", id = "Name")

Projection

Mark likes the USA_Contiguous_Albers_Equal_Area_Conic_USGS_version projection for the continental US. EPSG:5070

contiguous_states %>% 
  st_transform(5070) %>% 
  tm_shape() +
  tm_polygons("wages", id = "Name") 

Alternative Syntax

tm_shape(contiguous_states, projection = 5070) +
  tm_polygons("wages", id = "Name")

Explore tmap syntax and functions

tm_shape(contiguous_states, projection = 5070) +
  tm_borders(col = "black", alpha = 0.4) +
  tm_fill(col = "REGION", alpha = 0.6) +
  tm_layout(title = "Regions of the USA", 
                  attr.color = "navy", 
                  title.position = c("center", "top"), 
                  title.bg.color = "lightblue")

End Notes

This session inspired by https://www.computerworld.com/article/3175623/data-analytics/mapping-in-r-just-got-a-whole-lot-easier.html

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. Source code is available at https://github.com/libjohn/mapping-with-R, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".