This tutorial demonstrates how to do a drive time analysis in R. Drive time analyses can answer what areas lie within a specified driving time from a certain point, and what are the demographics of the people inside and outside that area.
The tutorial is adapted from code used in the analysis for In Appalachia and the Mississippi Delta, Millions Face Long Drives to Stroke Care. The story was a collaboration between KHN and InvestigateTV and published in May, 2021.
This is a simplified analysis based on that larger project. It looks at how many Mississippi state residents live within 45-minutes of a stroke center and how many live further away.
The basic process we’ll follow is:
Get a dataset of locations of interest
Retrieve a drive-time isochrone for each point
Calculate how much each Census block group in our area of interest overlaps with the isochrones
Get population data for each block group
Calculate how many people live within the isochrones and how many people live outside, using our block group data
IMPORTANT NOTE: the isochrones used in this analysis were calculated several weeks ago for the story published in early May. Road closures and other changes will affect the drive time results. If you try to replicate this and run the isochrone code here you will likely get slightly different results. This is to be expected.
This analysis requires intermediate R skills and some knowledge of geospatial operations. It also requires a HERE API key and a U.S. Census Bureau API key for data retrieval.
It requires the following packages. Install them if you haven’t already
install.packages(c("dplyr", "tidyr", "sf", "leaflet", "tigris", "ggplot2", "hereR", "censusapi"))
We’ll get stroke center locations from the published KHN analysis. You can read how we collected the locations in that repo.
library(dplyr)
library(tidyr)
library(sf)
library(leaflet)
library(tigris)
library(ggplot2)
library(hereR)
library(censusapi)
stroke_centers <- read.csv("https://raw.githubusercontent.com/khnews/2021-delta-appalachia-stroke-access/main/data/stroke-centers.csv", colClasses = c("fips_county" = "character", "zip" = "character"))
head(stroke_centers)
## hospital_id name certifier certification_type
## 1 10735007 SHELBY BAPTIST MEDICAL CENTER JC Primary
## 2 335211 BAPTIST MEDICAL CENTER - PRINCETON JC Primary
## 3 11935233 UNIVERSITY OF ALABAMA HOSPITAL JC Comprehensive
## 4 13535243 GRANDVIEW MEDICAL CENTER JC Primary
## 5 535259 SELECT SPECIALTY HOSPITAL JC Primary
## 6 11036301 SOUTHEAST ALABAMA MEDICAL CENTER DNV Comprehensive
## address city state zip fips_county
## 1 1000 FIRST STREET, NORTH ALABASTER AL 35007 01117
## 2 701 PRINCETON AVENUE, S.W. BIRMINGHAM AL 35211 01073
## 3 619 SOUTH 19TH STREET BIRMINGHAM AL 35233 01073
## 4 3690 GRANDVIEW PARKWAY BIRMINGHAM AL 35243 01073
## 5 2010 BROOKWOOD MEDICAL CENTER DRIVE BIRMINGHAM AL 35259 01073
## 6 1108 ROSS CLARK CIRCLE DOTHAN AL 36301 01069
## latitude longitude
## 1 33.25236 -86.81304
## 2 33.49801 -86.84719
## 3 33.50560 -86.80204
## 4 33.43293 -86.71744
## 5 33.46373 -86.77625
## 6 31.21630 -85.36363
# Make centers an sf object
stroke_centers <- st_as_sf(stroke_centers, coords = c("longitude", "latitude")) %>%
st_set_crs(4326)
The stroke centers dataset includes locations in the Mississippi Delta and Appalachia and surrounding states. For this tutorial we’ll just focus on those in and near Mississippi, but map the full dataset to see what it includes.
leaflet() %>%
setView(
-86.7667415602124, 36.188530779087586,
zoom = 5) %>%
addProviderTiles("CartoDB.Voyager") %>%
addCircles(
data = stroke_centers,
radius = 6,
color = "darkgreen",
opacity = 0.6,
popup = paste0(stroke_centers$name, "<br />",
stroke_centers$city, ", ", stroke_centers$state))
st_buffer
and st_intersects
We don’t need all of these locations for out Mississippi anaysis.
We’ll want to filter the dataset down to just stroke centers in and near Mississippi - anywhere you could possibly reach in a 45-minute drive from any point in the state.
To choose those relevant locations, we’ll create an 100 mile buffer from the state border. (100 miles is a lot for 45 minutes even if driving in a perfectly straight line, but let’s be safe.)
We’ll get high-quality state boundaries from the US Census Bureau using the tigris
package, select just Mississippi, and then expand the boundary by 100 miles in every direction.
Important note: for computations in sf we should use a planar projection, not a lat/long projection like we’d use for making Leaflet maps. We’ll use projection ESPG 2163.
states <- states()
ms_boundary <- states %>% filter(STUSPS == "MS")
ms_boundary <- ms_boundary %>% st_transform(2163)
Now make a buffer using the st_buffer()
function. Google tells me that 100 miles = 160,934 meters.
ms_buffer <- st_buffer(ms_boundary, dist = 160934)
# Show the buffer to make sure it makes sense
stroke_centers <- stroke_centers %>% st_transform(2163)
ggplot() +
geom_sf(data = ms_boundary, color = "blue") +
geom_sf(data = ms_buffer, fill = NA, color = "red") +
geom_sf(data = stroke_centers) +
theme_void()
Then filter the stroke centers to include just those within our 100 mile state buffer.
centers_ms <- stroke_centers %>% filter(st_intersects(geometry, ms_buffer, sparse = FALSE))
# Do the states make sense for a buffer around MS?
table(centers_ms$state)
##
## AL AR FL LA MS TN
## 12 2 4 17 13 10
# Now let's plot those on a leaflet map to see where they are
# Project back first - can't put projected coordinates on a leaflet map
centers_ms_map <- centers_ms %>% st_transform(4326)
leaflet() %>%
addProviderTiles("CartoDB.Voyager") %>%
addCircles(
data = centers_ms_map,
radius = 6,
color = "darkgreen",
opacity = 0.6)
hereR
packageNow that we have our set of stroke centers, we’ll get the 45-minute drive time radius for each one using the hereR package. This package wraps the HERE Technologies API. You’ll need a developer key to use it. This small project fits well within the freemium tier, as of May 2021.
We’ll use hereR’s isoline()
function to get the isochrones.
Once in a while you might have a server issue or some other error. I like to wrap the API function in a tryCatch()
function so that if any one point fails the rest still go forward. If you’re new to error handling, Hadley Wickham’s primer is helpful.
Then, if any points did fail, I loop through the list that failed to try them again. If you’re using a larger dataset I highly recommend having a setup like this so that you’re not making a bunch of API hits only to hit an error and abort your whole script.
tryLocation <- function(location) {
out <- tryCatch({
temp <- isoline(
poi = location,
# range is in seconds - we want 45 minutes, so multiply by 60
range = 45 * 60,
range_type = "time",
transport_mode = "car",
url_only = F,
optimize = "quality",
traffic = F,
aggregate = F
)
temp <- temp %>%
mutate(hospital_id = point_id)
return(temp)},
error = function(cond) {
message(paste("Hospital ID failed: ", point_id))
message(paste(cond))
# Choose a return value in case of error
return(NULL)},
warning = function(cond) {
message(paste("Hospital ID caused a warning:", point_id))
message(paste(cond))
# Choose a return value in case of warning
return(NULL)
})
return(out)
}
# Loop over points to make isochrones file
# Using a timer to avoid rate limit errors (Status 429)
isochrones <- NULL
error_rows <- NULL
for (i in 1:nrow(centers_ms)) {
print(i)
# Get isochrones for that point
# Lately the rate limiting has been more aggressive, so do an aggressive second delay
# In the past I've used 0.15 seconds without issue but not right now
Sys.sleep(0.4)
# Filter to ith point
point_temp <- centers_ms %>% filter(row_number() == i)
point_id <- point_temp$hospital_id
isochrones_temp <- tryLocation(point_temp)
# If the point errored out save it
if (is.null(isochrones_temp)) {
error_rows <- bind_rows(error_rows, point_temp)
} else {
isochrones <- bind_rows(isochrones, isochrones_temp)
}
}
rm(isochrones_temp, point_temp)
Now join information like hospital name and city from the centers_ms dataframe to the isochrones for later use.
# Add characteristics to isochrones file, save out
# Want to join a plain data frame to the isochrones, not an sf object
centers_df <- as.data.frame(centers_ms) %>%
select(-geometry)
isochrones <- left_join(isochrones, centers_df, by = "hospital_id")
isochrones <- isochrones %>%
mutate(drive_time = range/60) %>%
select(-range, -departure, -arrival, -id, -rank) %>%
select(hospital_id, drive_time, everything()) %>%
arrange(state, city, hospital_id, drive_time)
Map the isochrones to make sure they make sense. We should see one isochrone per point.
leaflet() %>%
addProviderTiles("CartoDB.Voyager") %>%
addPolygons(
data = isochrones,
fillColor = "forestgreen",
fillOpacity = 0.5,
stroke = FALSE)
To calculate how many people live within and outside of the drive time isochrones, we’ll need to identify the percent of each Census block group that lies within the isochrones.
Note: you might see some simplified analyses that look at block group centroids instead of calculating the overlap. If you’re working with a massive dataset that type of approach can make sense. But there’s no need to simplify so much here.
The block group shapefile is from the 2019 ACS via NHGIS. This file is clipped to shorelines but not interior bodies of water like lakes. It was clipped to water in the original project using the USGS NHD files. See that project for full details of the water clipping, which is not reproduced here.
Block group boundaries do not follow lake or other inland water boundaries. So if a lake takes up 50% of the block group, that makes a big difference for drive time calculations. Exclude water, since people don’t live in those lakes and ponds — unless you’re in an area with a lot of people in houseboats.
First, make our block groups and isochrones planar and then valid for computation (fixing any minor shape issues) using st_make_valid()
.
# Read in MS block group shapefile, clipped to water
bg_shp <- st_read("data/shp/ms-block-groups.shp")
## Reading layer `ms-block-groups' from data source `/Users/hannahrecht/Documents/r-drive-time-analysis-tutorial/data/shp/ms-block-groups.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2161 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -91.65501 ymin: 30.1741 xmax: -88.09844 ymax: 34.9961
## Geodetic CRS: WGS 84
# Make files planar and valid, needed for sf computation
bg_shp <- bg_shp %>% st_transform(2163)
bg_shp <- st_make_valid(bg_shp)
isochrones <- isochrones %>% st_transform(2163)
isochrones <- st_make_valid(isochrones)
For computation, we don’t care about individual isochrones, many of which overlap. Instead we just want to know how a block group fits in with any of the isochrones period, not each individual one. So combine them into one single feature with st_union()
and then make into a nice sf object with st_sf()
.
isochrones_joined <- st_union(isochrones)
isochrones_joined <- st_sf(iso_id = 1, geometry = isochrones_joined) %>%
st_transform(2163)
Make another quick reference map to make sure you have just one single isochrones feature. Remember before you could see all the overlapping shapes - now it should look like one.
isochrones_joined_map <- isochrones_joined %>% st_transform(4326)
leaflet() %>%
addProviderTiles("CartoDB.Voyager") %>%
addPolygons(
data = isochrones_joined_map,
fillColor = "forestgreen",
fillOpacity = 0.5,
stroke = FALSE)
Now calculate the percent overlap between each block group and the isochrones.
# Calculate area in all block groups
bg_shp <- mutate(bg_shp, bg_area = st_area(bg_shp))
# Calculate intersection - will take some minutes
intersect <- st_intersection(bg_shp, isochrones_joined) %>%
mutate(intersect_area = st_area(.)) %>%
select(GEOID, intersect_area) %>%
st_drop_geometry()
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
# Merge intersection area by geoid
bg_shp <- left_join(bg_shp, intersect, by = "GEOID")
# Calculate overlap percent between block groups and isochrones
bg_shp <- bg_shp %>%
# If missing it's because it didn't overlap, so 0
mutate(intersect_area = ifelse(is.na(intersect_area), 0, intersect_area),
overlap = as.numeric(intersect_area/bg_area))
# Summary of the overlap percents
summary(bg_shp$overlap)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 1.0000 0.6311 1.0000 1.0000
Now map the percent overlap with isochrones to make sure they make sense. Block groups that are fully within the isochrones (100%) should be darkest purple while those that are fully outside (0%) are white. Those one the edges should be in between. This is too much data for Rmarkdown to render easily so I’ve run it separately and saved a screenshot. Definitely do not run this on national block groups in R, it will crash!
bg_ms <- bg_shp %>% st_transform(4326)
pal <- colorNumeric("Purples", domain = bg_ms$overlap)
# All stroke centers
leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
setView(
-86.7667415602124, 36.188530779087586,
zoom = 5) %>%
addPolygons(
data = bg_ms ,
fillColor = ~pal(bg_ms$overlap),
fillOpacity = 1,
weight = 0.5,
smoothFactor = 0.2,
stroke = TRUE,
color = "black") %>%
# Isochrone outlines
addPolygons(
data = isochrones_joined_map,
fill = TRUE,
stroke = TRUE,
fillColor = "yellow",
fillOpacity = 0.05,
color = "red",
weight = 1.5) %>%
addLegend(pal = pal,
values = bg_ms$overlap,
position = "bottomright",
title = "Intersection with isochrones",
opacity = 1)
We’ll get the population in each block group using the censusapi library. If you’re interested in stratifying by stroke center type or race, see the original project code. Here we’ll just look at overall population within 45-minutes of any stroke center.
acs_raw <- getCensus(
name = "acs/acs5",
vintage = 2019,
vars = c("NAME", "B01001_001E"),
region = "block group:*",
regionin = "state:28&in=county:*&in=tract:*")
demographics_bg <- acs_raw %>%
rename(name = NAME,
population = B01001_001E,
fips_state = state) %>%
mutate(
fips_block_group = paste0(fips_state, county, tract, block_group),
) %>%
arrange(fips_state) %>%
select(fips_block_group, name, population)
head(demographics_bg)
## fips_block_group
## 1 280490105006
## 2 280470015023
## 3 280750011013
## 4 280119507021
## 5 280719503021
## 6 280490003022
## name population
## 1 Block Group 6, Census Tract 105, Hinds County, Mississippi 755
## 2 Block Group 3, Census Tract 15.02, Harrison County, Mississippi 1925
## 3 Block Group 3, Census Tract 11.01, Lauderdale County, Mississippi 560
## 4 Block Group 1, Census Tract 9507.02, Bolivar County, Mississippi 1954
## 5 Block Group 1, Census Tract 9503.02, Lafayette County, Mississippi 2017
## 6 Block Group 2, Census Tract 3.02, Hinds County, Mississippi 739
And finally! Multiply the population of each block group by its overlap percent to calculate population within and not within 45-minutes of a stroke center.
First, make a flat non-sf dataframe with the overlap information and join to population. Then multiply and summarize. This is the easiest part of the project.
bg_overlap <- bg_shp %>% select(geoid = GEOID, overlap) %>%
st_drop_geometry()
bg_overlap <- as.data.frame(bg_overlap)
# Join data
bg <- left_join(bg_overlap, demographics_bg, by = c("geoid" = "fips_block_group"))
# Calculate!
state_sums <- bg %>% select(geoid, overlap, population) %>%
summarize(within_total = sum(population * overlap),
population_total = sum(population)) %>%
mutate(within_total_pct = within_total/population_total) %>%
ungroup()
state_sums
## within_total population_total within_total_pct
## 1 2041932 2984418 0.6841979
And now we know: 68% of Mississippi residents live within 45 minutes of a stroke center, while the remaining 32% live further away.