library(tidyverse)
library(readxl)
library(tigris) # Get Census Geography Poloygons
library(sf)
library(tidycensus)Faceted Thematic Mapping
Shapefiles as sf
Using the tigris package get Census Tiger shapefiles for census geographies. Tigris will return the shapefile in the sf, or simple features, format.
us_geo <- tigris::states(class = "sf", cb = TRUE) %>% #cb is a more generalized, less detailed file.
shift_geometry() %>%
filter(as.numeric(GEOID) < 60)Get BLS data
I’ve already downloaded and stored some data from the Bureau of Labor Statistics. Those data are stored in an excel file in the data directory of the github repository: data/OES_Report.xlsx. The goal is to attach this data to the previously downloaded shapefiles.
But you may be interested in how I gathered the data. below are some summary notes documenting my steps of gathering the data from the Bureau of Labor Statistics.
https://data.bls.gov/oes/#/occGeo/One%20occupation%20for%20multiple%20geographical%20areas
One occupation for multiple geographical areas
Mental Health and Substance Abuse Social Workers (Also, Secondary School Teacher, Waiter, Legislator, and Paralegals)
State
All States in this list
Annual Mean wage
- Excel
Import the data into R
my_xl_files <- fs::dir_ls(path = "data", glob = "*.xlsx")
my_df <- my_xl_files %>%
map_dfr(read_excel,
col_types = c("text", "numeric"),
skip = 4,
.id = "sheet")
state_names <- us_geo %>%
select(NAME) %>%
st_drop_geometry() Wrangle the data
Before we join the BLS data to the shapefile us_geo we need to transform BLS data
my_df <- my_df %>%
rename(area = "Area Name",
wages = "Annual mean wage(2)",
type = sheet) %>%
mutate(State = str_extract(area, '.*(?=\\()')) %>%
mutate(type = str_extract(type, "(?<=data/OES_)\\w+"))Missing data
Some of the BLS data are missing making it hard to visualize. As a remedy to this problem we will wrangle the data by preserving shape geometry even for states without any wage data.
missing_states_legislators <- state_names %>%
anti_join(my_df %>% filter(type == "legislator"),
by = c("NAME" = "State")) %>%
mutate(type = "legislator") %>%
rename(State = NAME)
missing_states_legislatorsState <chr> | type <chr> | |||
|---|---|---|---|---|
| South Dakota | legislator | |||
| Kentucky | legislator | |||
| Vermont | legislator | |||
| District of Columbia | legislator | |||
| Maine | legislator | |||
| North Carolina | legislator |
my_df <- my_df %>%
bind_rows(missing_states_legislators)Join data
Using the dplyr::left_join, merge BLS data with the us_geo geometry, i.e. the shapefile object
my_df <- us_geo %>%
left_join(my_df, by = c("NAME" = "State"))Get census data – tidycensus
Identify and pick census variables
If you’re not sure what 2015 ACS Census data variables you need, you’ll want to investigate the variables.
variables_census <- load_variables(2015, "acs5", cache = TRUE)I’m using Median income.
- B01003_001E = Total Population
- B06011_001E = Median income in the past 12 months
Note: I realize mean and median are not the same measure. This is a demonstration of procedure, not a recommendation for research practice and data comparison. Of course, you will be more rigorous with your own research.
Get ACS median income
Now that I know the Census ACS variable, I can use the tidycensus::get_acs() function to gather the mean income variable for each state, along with the geometry (i.e. shapefiles).
us_pop <-
get_acs(geography = "state",
variables = "B06011_001E",
geometry = TRUE) %>%
shift_geometry()Getting data from the 2017-2021 5-year ACS
Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
us_pop <- us_pop %>%
mutate(type = "USA Median Income") %>%
rename(wages = estimate) Append Census data
Now combine the tidycensus data and geometry (i.e. us_pop variable data & shapefiles) with the BLS data and previously associate shapefiles gatheried via tigris::states()
my_df <- bind_rows(my_df, us_pop)More munging
Make the category variable a categorical factor with levels. This will improve the order of the facets when displayed.
display_levels <- c("USA Median Income", "Legislator",
"Paralegals", "Substance Abuse Counselor",
"Teacher", "Waiters")
my_df <- my_df %>%
mutate(category = case_when(
type == "USA Median Income" ~ "USA Median Income",
type == "legislator" ~ "Legislator",
type == "paralegals" ~ "Paralegals",
type == "Report" ~ "Substance Abuse Counselor",
type == "secondary_school_teacher" ~ "Teacher",
type == "waiters" ~ "Waiters"
)) %>%
mutate(category = factor(category, display_levels))Display map
my_df %>%
ggplot(aes(fill = wages, color = wages)) +
geom_sf() +
coord_sf(crs = 5070, datum = NA) +
scale_fill_viridis_c(labels = c("$20K", "$55K", "$87K"), breaks = c(20000, 55000, 87000)) +
scale_color_viridis_c(labels = c("$20K", "$55K", "$87K"), breaks = c(20000, 55000, 87000)) +
facet_wrap(~ category,
nrow = 3, ncol = 2) +
theme(legend.position = "top") +
labs(title = "2015 Mean USA Wages by Professions",
subtitle = "A comparison of BLS mean income with Census median income",
color = "", fill = "",
caption = "Source: BLS & Census") Save map
ggsave("facet_map.svg", width = 8, height = 9, units = "in")