library(tidyverse)
library(readxl)
library(tigris) # Get Census Geography Poloygons
library(sf)
library(tidycensus)
Faceted Thematic Mapping
Shapefiles as sf
Using the tigris
package get Census Tiger shapefiles for census geographies. Tigris will return the shapefile in the sf
, or simple features, format.
<- tigris::states(class = "sf", cb = TRUE) %>% #cb is a more generalized, less detailed file.
us_geo shift_geometry() %>%
filter(as.numeric(GEOID) < 60)
Get BLS data
I’ve already downloaded and stored some data from the Bureau of Labor Statistics. Those data are stored in an excel file in the data
directory of the github repository: data/OES_Report.xlsx
. The goal is to attach this data to the previously downloaded shapefiles.
But you may be interested in how I gathered the data. below are some summary notes documenting my steps of gathering the data from the Bureau of Labor Statistics.
https://data.bls.gov/oes/#/occGeo/One%20occupation%20for%20multiple%20geographical%20areas
One occupation for multiple geographical areas
Mental Health and Substance Abuse Social Workers (Also, Secondary School Teacher, Waiter, Legislator, and Paralegals)
State
All States in this list
Annual Mean wage
- Excel
Import the data into R
<- fs::dir_ls(path = "data", glob = "*.xlsx")
my_xl_files
<- my_xl_files %>%
my_df map_dfr(read_excel,
col_types = c("text", "numeric"),
skip = 4,
.id = "sheet")
<- us_geo %>%
state_names select(NAME) %>%
st_drop_geometry()
Wrangle the data
Before we join the BLS data to the shapefile us_geo
we need to transform BLS data
<- my_df %>%
my_df rename(area = "Area Name",
wages = "Annual mean wage(2)",
type = sheet) %>%
mutate(State = str_extract(area, '.*(?=\\()')) %>%
mutate(type = str_extract(type, "(?<=data/OES_)\\w+"))
Missing data
Some of the BLS data are missing making it hard to visualize. As a remedy to this problem we will wrangle the data by preserving shape geometry even for states without any wage data.
<- state_names %>%
missing_states_legislators anti_join(my_df %>% filter(type == "legislator"),
by = c("NAME" = "State")) %>%
mutate(type = "legislator") %>%
rename(State = NAME)
missing_states_legislators
<- my_df %>%
my_df bind_rows(missing_states_legislators)
Join data
Using the dplyr::left_join
, merge BLS data with the us_geo
geometry, i.e. the shapefile object
<- us_geo %>%
my_df left_join(my_df, by = c("NAME" = "State"))
Get census data – tidycensus
Identify and pick census variables
If you’re not sure what 2015 ACS Census data variables you need, you’ll want to investigate the variables.
<- load_variables(2015, "acs5", cache = TRUE) variables_census
I’m using Median income.
- B01003_001E = Total Population
- B06011_001E = Median income in the past 12 months
Note: I realize mean and median are not the same measure. This is a demonstration of procedure, not a recommendation for research practice and data comparison. Of course, you will be more rigorous with your own research.
Get ACS median income
Now that I know the Census ACS variable, I can use the tidycensus::get_acs()
function to gather the mean income variable for each state, along with the geometry (i.e. shapefiles).
<-
us_pop get_acs(geography = "state",
variables = "B06011_001E",
geometry = TRUE) %>%
shift_geometry()
Getting data from the 2017-2021 5-year ACS
Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
<- us_pop %>%
us_pop mutate(type = "USA Median Income") %>%
rename(wages = estimate)
Append Census data
Now combine the tidycensus data and geometry (i.e. us_pop
variable data & shapefiles) with the BLS data and previously associate shapefiles gatheried via tigris::states()
<- bind_rows(my_df, us_pop) my_df
More munging
Make the category variable a categorical factor with levels. This will improve the order of the facets when displayed.
<- c("USA Median Income", "Legislator",
display_levels "Paralegals", "Substance Abuse Counselor",
"Teacher", "Waiters")
<- my_df %>%
my_df mutate(category = case_when(
== "USA Median Income" ~ "USA Median Income",
type == "legislator" ~ "Legislator",
type == "paralegals" ~ "Paralegals",
type == "Report" ~ "Substance Abuse Counselor",
type == "secondary_school_teacher" ~ "Teacher",
type == "waiters" ~ "Waiters"
type %>%
)) mutate(category = factor(category, display_levels))
Display map
%>%
my_df ggplot(aes(fill = wages, color = wages)) +
geom_sf() +
coord_sf(crs = 5070, datum = NA) +
scale_fill_viridis_c(labels = c("$20K", "$55K", "$87K"), breaks = c(20000, 55000, 87000)) +
scale_color_viridis_c(labels = c("$20K", "$55K", "$87K"), breaks = c(20000, 55000, 87000)) +
facet_wrap(~ category,
nrow = 3, ncol = 2) +
theme(legend.position = "top") +
labs(title = "2015 Mean USA Wages by Professions",
subtitle = "A comparison of BLS mean income with Census median income",
color = "", fill = "",
caption = "Source: BLS & Census")
Save map
ggsave("facet_map.svg", width = 8, height = 9, units = "in")