censusapi is a lightweight package that helps you
retrieve data from the U.S. Census Bureau’s 1,600 API
endpoints using one simple function, getCensus().
Additional functions provide information about what datasets are
available and how to use them.
This package returns the data as-is with the original variable names created by the Census Bureau and any quirks inherent in the data. Each dataset is a little different. Some are documented thoroughly, others only sparsely. Variable names sometimes change from year to year. This package can’t overcome those challenges, but it tries to make it easier to get the data for use in your analysis. Make sure to thoroughly read the documentation for your dataset and see below for how to get help with Census data.
API key setup
censusapi recommends but does not require using an API
key from the U.S. Census Bureau. The Census Bureau may limit the number
of requests made by your IP address if you do not use an API key.
You can sign up online to receive a key, which will be sent to your provided email address.
If you save the key with the name CENSUS_KEY or CENSUS_API_KEY in your .Renviron file,
censusapi will use it by default without any extra work on
your part.
To save your API key, within R, run:
# Check to see if you already have a CENSUS_KEY or CENSUS_API_KEY saved
# If so, no further action is needed
get_api_key()
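# If not, set your key for the current session (the key must be quoted).
# To keep it across sessions, add the line CENSUS_KEY=PASTEYOURKEYHERE to ~/.Renviron
Sys.setenv(CENSUS_KEY = "PASTEYOURKEYHERE")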
# Reload .Renviron
readRenviron("~/.Renviron")
# Check to see that the expected key is output in your R console
get_api_key()
In some instances you might not want to put your key in your
.Renviron - for example, if you’re on a shared school computer. You can
always choose to manually set key = "PASTEYOURKEYHERE" as
an argument in getCensus() if you prefer.
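Sys.setenv() only affects the current R session, so one way to make the key persist is to append a CENSUS_KEY line to the Renviron file yourself. The following is a minimal base R sketch of that idea. save_census_key() is a hypothetical helper, not part of censusapi, and the demonstration writes to a temporary file so your real ~/.Renviron is left alone.

```r
# Hypothetical helper (not part of censusapi): append a CENSUS_KEY entry
# to an Renviron-style file unless one is already present.
save_census_key <- function(key, path = "~/.Renviron") {
  path <- path.expand(path)
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  if (any(grepl("^CENSUS_KEY=", existing))) {
    message("CENSUS_KEY already present in ", path)
    return(invisible(FALSE))
  }
  writeLines(c(existing, paste0("CENSUS_KEY=", key)), path)
  invisible(TRUE)
}

# Demonstrate against a temporary file rather than the real ~/.Renviron
demo_path <- tempfile(fileext = ".Renviron")
save_census_key("PASTEYOURKEYHERE", path = demo_path)
readLines(demo_path)
```

After writing to the real file, readRenviron("~/.Renviron") or a restart of R would make the key visible to the session.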
Basic usage
The main function in censusapi is getCensus(), which makes an API call to a given endpoint
and returns a data frame with results. Each API has slightly different
parameters, but a few arguments are used by nearly all of them:
- name: the programmatic name of the endpoint as defined by the Census, like “acs/acs5” or “timeseries/bds/firms”
- vintage: the survey year, required for aggregate or microdata APIs
- vars: a list of variables to retrieve
- region: the geography level to retrieve, such as state or county, required for nearly all endpoints
Some APIs have additional required or optional arguments, like
time for some timeseries datasets. Check the specific documentation
for your API and explore its metadata with listCensusMetadata()
to see what options are allowed.
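To make these arguments less abstract, the sketch below shows roughly how they map onto a raw Census API request URL, using the get, for, in, and time query parameters. build_census_url() is a hypothetical illustration, not a censusapi function. The real getCensus() also handles the API key, URL encoding, and parsing, so treat this as a mental model rather than a description of its internals.

```r
# Hypothetical illustration only; getCensus() builds and sends the request for you.
build_census_url <- function(name, vars, region, regionin = NULL,
                             vintage = NULL, time = NULL) {
  # Aggregate and microdata endpoints put the vintage before the name
  path <- if (is.null(vintage)) name else paste(vintage, name, sep = "/")
  # c() drops NULL entries, so unused arguments vanish from the query
  query <- c(get = paste(vars, collapse = ","),
             `for` = region, `in` = regionin, time = time)
  # No URL encoding is done here, so keep the values simple
  paste0("https://api.census.gov/data/", path, "?",
         paste(names(query), query, sep = "=", collapse = "&"))
}

acs_url <- build_census_url(
  name = "acs/acs5",
  vintage = 2022,
  vars = c("NAME", "B19013_001E"),
  region = "state:*")
acs_url
```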
Let’s walk through an example getting uninsured rates using the Small Area Health Insurance Estimates API, which provides detailed annual state-level and county-level estimates of health insurance rates for people below age 65.
Choosing variables
censusapi includes a metadata function called listCensusMetadata()
to get information about an API’s
variable and geography options. Let’s see what variables are available
in the SAHIE API:
library(censusapi)
sahie_vars <- listCensusMetadata(
name = "timeseries/healthins/sahie",
type = "variables")
# See the full list of variables
sahie_vars$name
#> [1] "for" "in" "time" "NIPR_LB90" "NIPR_PT"
#> [6] "AGECAT" "GEOID" "NIC_PT" "STATE" "RACE_DESC"
#> [11] "YEAR" "IPRCAT" "PCTIC_UB90" "NIPR_MOE" "PCTUI_LB90"
#> [16] "NIC_MOE" "US" "COUNTY" "PCTUI_MOE" "NUI_UB90"
#> [21] "NIC_UB90" "NUI_MOE" "SEXCAT" "PCTUI_PT" "PCTIC_LB90"
#> [26] "PCTUI_UB90" "NUI_PT" "STABREV" "AGE_DESC" "NAME"
#> [31] "NIC_LB90" "PCTIC_PT" "PCTIC_MOE" "IPR_DESC" "NUI_LB90"
#> [36] "NIPR_UB90" "GEOCAT" "SEX_DESC" "RACECAT"
# Full info on the first several variables
head(sahie_vars)
name | label | concept | predicateType | group | limit | predicateOnly | required |
---|---|---|---|---|---|---|---|
for | Census API FIPS ‘for’ clause | Census API Geography Specification | fips-for | N/A | 0 | TRUE | NA |
in | Census API FIPS ‘in’ clause | Census API Geography Specification | fips-in | N/A | 0 | TRUE | NA |
time | ISO-8601 Date/Time value | Census API Date/Time Specification | datetime | N/A | 0 | TRUE | true |
NIPR_LB90 | Number in Demographic Group for Selected Income Range, Lower Bound for 90% Confidence Interval | NA | int | N/A | 0 | NA | NA |
NIPR_PT | Number in Demographic Group for Selected Income Range, Estimate | NA | int | N/A | 0 | NA | NA |
AGECAT | Age Category | NA | string | N/A | 0 | NA | default displayed |
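The first three rows (for, in, time) are request parameters rather than data columns, which is what predicateOnly = TRUE indicates. As a small sketch, the code below separates them from the data variables. It uses a stand-in data frame copied from the rows printed above so it runs without an API call, and it assumes predicateOnly comes back as a logical column.

```r
# Stand-in for head(sahie_vars), copied from the table above (labels omitted)
sahie_vars_demo <- data.frame(
  name = c("for", "in", "time", "NIPR_LB90", "NIPR_PT", "AGECAT"),
  predicateType = c("fips-for", "fips-in", "datetime", "int", "int", "string"),
  predicateOnly = c(TRUE, TRUE, TRUE, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Keep rows that are not predicate-only; %in% treats NA as "not TRUE"
data_vars <- sahie_vars_demo[!(sahie_vars_demo$predicateOnly %in% TRUE), ]
data_vars$name
```

The same subsetting should work on the full sahie_vars data frame if the column types match this assumption.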
Choosing regions
We can also use listCensusMetadata() to see which
geographic levels are available.
listCensusMetadata(
name = "timeseries/healthins/sahie",
type = "geographies")
name | geoLevelId | limit | referenceDate | requires | wildcard | optionalWithWCFor |
---|---|---|---|---|---|---|
us | 010 | 1 | 2015-01-01 | NULL | NULL | NA |
county | 050 | 3142 | 2015-01-01 | state | state | state |
state | 040 | 52 | 2015-01-01 | NULL | NULL | NA |
This API has three geographic levels: us, county, and state. County data can be queried
for all counties nationally or within a specific state.
Making a censusapi call
First, using getCensus(), let’s get the percent (PCTUI_PT) and number (NUI_PT) of people who
are uninsured, using the wildcard star (*) to retrieve data for all
counties.
sahie_counties <- getCensus(
name = "timeseries/healthins/sahie",
vars = c("NAME", "PCTUI_PT", "NUI_PT"),
region = "county:*",
time = 2021)
head(sahie_counties)
time | state | county | NAME | PCTUI_PT | NUI_PT |
---|---|---|---|---|---|
2021 | 01 | 001 | Autauga County, AL | 10.0 | 4912 |
2021 | 01 | 003 | Baldwin County, AL | 11.0 | 20432 |
2021 | 01 | 005 | Barbour County, AL | 12.7 | 2150 |
2021 | 01 | 007 | Bibb County, AL | 11.4 | 1905 |
2021 | 01 | 009 | Blount County, AL | 12.8 | 6145 |
2021 | 01 | 011 | Bullock County, AL | 12.2 | 824 |
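A common next step is to combine the state and county codes into a five-digit county FIPS code and rank the results. The sketch below does this on a stand-in data frame copied from the six rows printed above, so it runs offline. It assumes the codes are character strings with leading zeros, as displayed, and that PCTUI_PT is numeric.

```r
# Stand-in for head(sahie_counties), copied from the table above
sahie_demo <- data.frame(
  state = rep("01", 6),
  county = c("001", "003", "005", "007", "009", "011"),
  NAME = c("Autauga County, AL", "Baldwin County, AL", "Barbour County, AL",
           "Bibb County, AL", "Blount County, AL", "Bullock County, AL"),
  PCTUI_PT = c(10.0, 11.0, 12.7, 11.4, 12.8, 12.2),
  NUI_PT = c(4912, 20432, 2150, 1905, 6145, 824),
  stringsAsFactors = FALSE
)

# Five-digit county FIPS code: two-digit state code + three-digit county code
sahie_demo$fips <- paste0(sahie_demo$state, sahie_demo$county)

# Rank counties from highest to lowest uninsured rate
ranked <- sahie_demo[order(-sahie_demo$PCTUI_PT), c("fips", "NAME", "PCTUI_PT")]
head(ranked, n = 3)
```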
We can also get data on detailed income and demographic groups from
the SAHIE. We’ll use region to specify county-level results
and regionin to filter to Virginia, state code 51. We’ll
get uninsured rates by income group, IPRCAT.
sahie_virginia <- getCensus(
name = "timeseries/healthins/sahie",
vars = c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"),
region = "county:*",
regionin = "state:51",
time = 2021)
head(sahie_virginia, n = 6L)
time | state | county | NAME | IPRCAT | IPR_DESC | PCTUI_PT |
---|---|---|---|---|---|---|
2021 | 51 | 001 | Accomack County, VA | 0 | All Incomes | 13.4 |
2021 | 51 | 001 | Accomack County, VA | 1 | <= 200% of Poverty | 17.1 |
2021 | 51 | 001 | Accomack County, VA | 2 | <= 250% of Poverty | 16.8 |
2021 | 51 | 001 | Accomack County, VA | 3 | <= 138% of Poverty | 17.4 |
2021 | 51 | 001 | Accomack County, VA | 4 | <= 400% of Poverty | 15.6 |
2021 | 51 | 001 | Accomack County, VA | 5 | 138% to 400% of Poverty | 14.5 |
Because the SAHIE API is a timeseries dataset, as indicated in its
name, we can get multiple years of data at once by
changing time = YYYY to time = "from YYYY to YYYY", or get every year through the latest
available using time = "from YYYY". Let’s get that data for
DeKalb County, Georgia using county FIPS code 089 and state FIPS code
13. You can look up FIPS codes on the Census Bureau website.
sahie_years <- getCensus(
name = "timeseries/healthins/sahie",
vars = c("NAME", "PCTUI_PT"),
region = "county:089",
regionin = "state:13",
time = "from 2006")
sahie_years
time | state | county | NAME | PCTUI_PT |
---|---|---|---|---|
2006 | 13 | 089 | DeKalb County, GA | 19.0 |
2007 | 13 | 089 | DeKalb County, GA | 17.2 |
2008 | 13 | 089 | DeKalb County, GA | 22.5 |
2009 | 13 | 089 | DeKalb County, GA | 22.9 |
2010 | 13 | 089 | DeKalb County, GA | 25.8 |
2011 | 13 | 089 | DeKalb County, GA | 23.9 |
2012 | 13 | 089 | DeKalb County, GA | 21.7 |
2013 | 13 | 089 | DeKalb County, GA | 22.1 |
2014 | 13 | 089 | DeKalb County, GA | 19.4 |
2015 | 13 | 089 | DeKalb County, GA | 16.9 |
2016 | 13 | 089 | DeKalb County, GA | 15.3 |
2017 | 13 | 089 | DeKalb County, GA | 15.9 |
2018 | 13 | 089 | DeKalb County, GA | 17.1 |
2019 | 13 | 089 | DeKalb County, GA | 16.9 |
2020 | 13 | 089 | DeKalb County, GA | 14.0 |
2021 | 13 | 089 | DeKalb County, GA | 14.2 |
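If you build these requests programmatically, the time formats described above are easy to generate. A minimal sketch follows. census_time_range() is a hypothetical helper, not part of censusapi. It only formats the string, which you would then pass as the time argument.

```r
# Hypothetical helper: format a year range the way the timeseries examples do
census_time_range <- function(from, to = NULL) {
  if (is.null(to)) {
    sprintf("from %d", as.integer(from))
  } else {
    sprintf("from %d to %d", as.integer(from), as.integer(to))
  }
}

census_time_range(2006)        # open-ended: every year from 2006 onward
census_time_range(2010, 2015)  # closed range
```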
We can also filter the data by income group using the
IPRCAT variable. See the possible values of
IPRCAT using listCensusMetadata().
IPRCAT = 3 represents <=138% of the federal poverty
line. That is the threshold for Medicaid
eligibility in states that have expanded it under the Affordable
Care Act.
listCensusMetadata(
name = "timeseries/healthins/sahie",
type = "values",
variable = "IPRCAT")
code | label |
---|---|
0 | All Incomes |
1 | Less than or Equal to 200% of Poverty |
2 | Less than or Equal to 250% of Poverty |
3 | Less than or Equal to 138% of Poverty |
4 | Less than or Equal to 400% of Poverty |
5 | 138% to 400% Poverty |
Getting this data for Los Angeles County (FIPS code 06037), we can see the dramatic decrease in the uninsured rate in this income group after California expanded Medicaid.
sahie_138 <- getCensus(
name = "timeseries/healthins/sahie",
vars = c("NAME", "PCTUI_PT", "NUI_PT"),
region = "county:037",
regionin = "state:06",
IPRCAT = 3,
time = "from 2010")
sahie_138
time | state | county | NAME | PCTUI_PT | NUI_PT | IPRCAT |
---|---|---|---|---|---|---|
2010 | 06 | 037 | Los Angeles County, CA | 37.4 | 894385 | 3 |
2011 | 06 | 037 | Los Angeles County, CA | 35.1 | 867577 | 3 |
2012 | 06 | 037 | Los Angeles County, CA | 34.4 | 865516 | 3 |
2013 | 06 | 037 | Los Angeles County, CA | 33.0 | 818978 | 3 |
2014 | 06 | 037 | Los Angeles County, CA | 24.9 | 607542 | 3 |
2015 | 06 | 037 | Los Angeles County, CA | 17.8 | 402977 | 3 |
2016 | 06 | 037 | Los Angeles County, CA | 15.4 | 329251 | 3 |
2017 | 06 | 037 | Los Angeles County, CA | 14.3 | 281842 | 3 |
2018 | 06 | 037 | Los Angeles County, CA | 13.9 | 255520 | 3 |
2019 | 06 | 037 | Los Angeles County, CA | 15.1 | 254740 | 3 |
2020 | 06 | 037 | Los Angeles County, CA | 14.4 | 230380 | 3 |
2021 | 06 | 037 | Los Angeles County, CA | 15.1 | 249186 | 3 |
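To put numbers on that decrease, you can compute the year-over-year change in percentage points. The sketch below uses a stand-in data frame with four of the rows printed above so it runs offline. The as.numeric() call is a precaution in case the column arrives as character.

```r
# Stand-in for part of sahie_138, copied from the table above
la_demo <- data.frame(
  time = c("2013", "2014", "2015", "2016"),
  PCTUI_PT = c(33.0, 24.9, 17.8, 15.4),
  stringsAsFactors = FALSE
)

# Year-over-year change in percentage points; the first year has no prior value
rates <- as.numeric(la_demo$PCTUI_PT)
la_demo$change <- c(NA, round(diff(rates), 1))
la_demo
```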
Finding your API
What if you don’t already know your dataset’s name? To
see a current table of every available endpoint, use
listCensusApis(). This data frame includes useful
information for making your API call, including the dataset’s name,
vintage if applicable, description, and title.
apis <- listCensusApis()
colnames(apis)
#> [1] "title" "name" "vintage" "type" "temporal"
#> [6] "spatial" "url" "modified" "description" "contact"
You can also get information on a subset of datasets using the
optional name and/or vintage parameters. For
example, get information about 2020 Decennial Census datasets.
dec_apis <- listCensusApis(name = "dec", vintage = 2020)
dec_apis[, 1:6]
title | name | vintage | type | temporal | spatial |
---|---|---|---|---|---|
Decennial Census: 118th Congressional District Summary File | dec/cd118 | 2020 | Aggregate | 2020/2020 | US |
Decennial Census of Island Areas: American Samoa Detailed Crosstabulations | dec/crosstabas | 2020 | Aggregate | 2020/2020 | American Samoa |
Decennial Census of Island Areas: Guam Detailed Crosstabulations | dec/crosstabgu | 2020 | Aggregate | 2020/2020 | Guam |
Decennial Census of Island Areas: Commonwealth of the Northern Mariana Islands Detailed Crosstabulations | dec/crosstabmp | 2020 | Aggregate | 2020/2020 | Northern Mariana Islands |
Decennial Census of Island Areas: U.S. Virgin Islands Detailed Crosstabulations | dec/crosstabvi | 2020 | Aggregate | 2020/2020 | U.S. Virgin Islands |
Decennial Census: Detailed Demographic and Housing Characteristics File A | dec/ddhca | 2020 | Aggregate | 2020/2020 | United States |
Decennial Census: Demographic and Housing Characteristics | dec/dhc | 2020 | Aggregate | 2020/2020 | United States |
Decennial Census of Island Areas: American Samoa Demographic and Housing Characteristics | dec/dhcas | 2020 | Aggregate | 2020/2020 | American Samoa |
Decennial Census of Island Areas: Guam Demographic and Housing Characteristics | dec/dhcgu | 2020 | Aggregate | 2020/2020 | Guam |
Decennial Census of Island Areas: Commonwealth of the Northern Mariana Islands Demographic and Housing Characteristics | dec/dhcmp | 2020 | Aggregate | 2020/2020 | Commonwealth of the Northern Mariana Islands |
Decennial Census of Island Areas: U.S. Virgin Islands Demographic and Housing Characteristics | dec/dhcvi | 2020 | Aggregate | 2020/2020 | U.S. Virgin Islands |
Decennial Census: Demographic Profile | dec/dp | 2020 | Aggregate | 2020/2020 | United States |
Decennial Census of Island Areas: American Samoa Demographic Profile | dec/dpas | 2020 | Aggregate | 2020/2020 | United States |
Decennial Census of Island Areas: Guam Demographic Profile | dec/dpgu | 2020 | Aggregate | 2020/2020 | United States |
2020 Commonwealth of the Northern Mariana Islands Demographic Profile | dec/dpmp | 2020 | Aggregate | 2020/2020 | United States |
Decennial Census of Island Areas: U.S. Virgin Islands Demographic Profile | dec/dpvi | 2020 | Aggregate | 2020/2020 | United States |
Decennial Census: Decennial Post-Enumeration Survey | dec/pes | 2020 | Aggregate | 2020/2020 | US |
Decennial Census: Redistricting Data (PL 94-171) | dec/pl | 2020 | Aggregate | 2020/2020 | United States |
Decennial Census: Decennial Self-Response Rate | dec/responserate | 2020 | Aggregate | NA | NA |
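Because the result is an ordinary data frame, you can also search it by keyword. The following sketch filters on the title column. find_apis() is a hypothetical helper, and the stand-in data frame copies three rows from the table above so the example runs without an API call.

```r
# Stand-in for a few rows of dec_apis, copied from the table above
apis_demo <- data.frame(
  title = c("Decennial Census: Demographic and Housing Characteristics",
            "Decennial Census: Demographic Profile",
            "Decennial Census: Redistricting Data (PL 94-171)"),
  name = c("dec/dhc", "dec/dp", "dec/pl"),
  vintage = c(2020, 2020, 2020),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive keyword search on dataset titles
find_apis <- function(apis, pattern) {
  hits <- grepl(pattern, apis$title, ignore.case = TRUE)
  apis[hits, c("title", "name", "vintage")]
}

redistricting <- find_apis(apis_demo, "redistricting")
redistricting$name
```

The same call should work on the full apis data frame returned by listCensusApis().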
Dataset types
There are three types of datasets included in the Census Bureau API
universe: aggregate, microdata, and timeseries. These type names were
defined by the Census Bureau and are included as a column in
listCensusApis().
table(apis$type)
#>
#> Aggregate Microdata Timeseries
#> 624 895 81
Most users will work with summary data, either aggregate or timeseries. Summary data contains pre-calculated numbers or percentages for a given statistic — like the number of children in a state or the median household income. The examples below and in the broader list of censusapi examples use summary data.
Aggregate datasets, like the American Community Survey or Decennial
Census, include data for only one time period (a vintage),
usually one year. Datasets like the American Community Survey contain
thousands of these pre-computed variables.
Timeseries datasets, including the Small Area Income and Poverty Estimates, the Quarterly Workforce Indicators, and International Trade statistics, allow users to query data over time in a single API call.
Microdata contains the individual-level responses for a survey for
use in custom analysis. One row represents one person. Only advanced
analysts will want to use microdata. Learn more about what microdata is
and how to use it with censusapi in Accessing microdata.
Variable groups
For some surveys, including the American Community Survey and
Decennial Census, you can get many related variables at once using a
variable group. These groups are defined by the Census
Bureau. In some other data tools, like data.census.gov, this concept
is referred to as a table.
Some groups have several dozen variables, others just have a few. As
an example, we’ll use the American Community Survey to get the estimate,
margin of error, and annotations for median household income in the past
12 months for Census places (cities, towns, etc.) in Alabama using group
B19013.
First, see descriptions of the variables in group B19013:
group_B19013 <- listCensusMetadata(
name = "acs/acs5",
vintage = 2022,
type = "variables",
group = "B19013")
group_B19013
name | label | concept | predicateType | group | limit | predicateOnly | universe |
---|---|---|---|---|---|---|---|
B19013_001MA | Annotation of Margin of Error!!Median household income in the past 12 months (in 2022 inflation-adjusted dollars) | Median Household Income in the Past 12 Months (in 2022 Inflation-Adjusted Dollars) | string | B19013 | 0 | TRUE | Households |
B19013_001EA | Annotation of Estimate!!Median household income in the past 12 months (in 2022 inflation-adjusted dollars) | Median Household Income in the Past 12 Months (in 2022 Inflation-Adjusted Dollars) | string | B19013 | 0 | TRUE | Households |
B19013_001E | Estimate!!Median household income in the past 12 months (in 2022 inflation-adjusted dollars) | Median Household Income in the Past 12 Months (in 2022 Inflation-Adjusted Dollars) | int | B19013 | 0 | TRUE | Households |
B19013_001M | Margin of Error!!Median household income in the past 12 months (in 2022 inflation-adjusted dollars) | Median Household Income in the Past 12 Months (in 2022 Inflation-Adjusted Dollars) | int | B19013 | 0 | TRUE | Households |
Now, retrieve the data using vars = "group(B19013)". You
could alternatively list each variable manually as
vars = c("NAME", "B19013_001E", "B19013_001EA", "B19013_001M", "B19013_001MA"),
but using the group is much easier.
acs_income_group <- getCensus(
name = "acs/acs5",
vintage = 2022,
vars = "group(B19013)",
region = "place:*",
regionin = "state:01")
head(acs_income_group)
state | place | B19013_001E | B19013_001EA | B19013_001M | B19013_001MA | GEO_ID | NAME |
---|---|---|---|---|---|---|---|
01 | 00100 | 29263 | NA | 2846 | NA | 1600000US0100100 | Abanda CDP, Alabama |
01 | 00124 | 35147 | NA | 15376 | NA | 1600000US0100124 | Abbeville city, Alabama |
01 | 00460 | 58631 | NA | 13426 | NA | 1600000US0100460 | Adamsville city, Alabama |
01 | 00484 | 47188 | NA | 6288 | NA | 1600000US0100484 | Addison town, Alabama |
01 | 00676 | 53929 | NA | 35679 | NA | 1600000US0100676 | Akron town, Alabama |
01 | 00820 | 89423 | NA | 6760 | NA | 1600000US0100820 | Alabaster city, Alabama |
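Having the estimate and the margin of error side by side makes it easy to see which estimates are noisy. As a rough sketch, the code below computes the margin of error as a share of the estimate. The stand-in data frame copies four rows from the table above, and the moe_share column name is only illustrative.

```r
# Stand-in for part of acs_income_group, copied from the table above
acs_demo <- data.frame(
  NAME = c("Abanda CDP, Alabama", "Abbeville city, Alabama",
           "Akron town, Alabama", "Alabaster city, Alabama"),
  B19013_001E = c(29263, 35147, 53929, 89423),  # estimate
  B19013_001M = c(2846, 15376, 35679, 6760),    # margin of error
  stringsAsFactors = FALSE
)

# Margin of error relative to the estimate; larger values mean noisier estimates
acs_demo$moe_share <- round(acs_demo$B19013_001M / acs_demo$B19013_001E, 2)
acs_demo[order(-acs_demo$moe_share), c("NAME", "moe_share")]
```

Small places such as Akron town have a margin of error that is a large fraction of the estimate, so their figures deserve extra caution.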
Advanced geographies
Some geographies, particularly Census tracts and blocks, need to be
specified within larger geographies like states and counties. This
varies by API endpoint, so make sure to read the documentation for your
specific API and run listCensusMetadata(type = "geographies") to see the
available options.
Tract-level data from the 2010 Decennial Census can only be requested
from one state at a time. In this example, we use the built-in
fips list of state FIPS
codes to request tract-level data from each state and join the results into a
single data frame.
tracts <- NULL
for (f in fips) {
  # Build the regionin value for this state, e.g. "state:01"
  stateget <- paste0("state:", f)
  temp <- getCensus(
    name = "dec/sf1",
    vintage = 2010,
    vars = "P001001",
    region = "tract:*",
    regionin = stateget)
  # Append this state's tracts to the running result
  tracts <- rbind(tracts, temp)
}
# How many tracts are present?
nrow(tracts)
#> [1] 73057
head(tracts)
state | county | tract | P001001 |
---|---|---|---|
01 | 001 | 020100 | 1912 |
01 | 001 | 020500 | 10766 |
01 | 001 | 020300 | 3373 |
01 | 001 | 020400 | 4386 |
01 | 001 | 020200 | 2170 |
01 | 001 | 020600 | 3668 |
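The same one-request-per-state pattern can be written with lapply() and a single rbind, which avoids growing a data frame inside the loop. This is a sketch under stated assumptions: fetch_by_state() and fake_fetch() are hypothetical names, and the fake fetcher stands in for getCensus() so the example runs offline.

```r
# Hypothetical helper: call fetch_one() once per state and stack the results
fetch_by_state <- function(states, fetch_one) {
  pieces <- lapply(states, function(s) fetch_one(paste0("state:", s)))
  do.call(rbind, pieces)
}

# Offline stand-in for a getCensus() call; returns two made-up tracts per state
fake_fetch <- function(regionin) {
  data.frame(state = sub("state:", "", regionin, fixed = TRUE),
             tract = c("000100", "000200"),
             P001001 = c(100, 200),
             stringsAsFactors = FALSE)
}

demo <- fetch_by_state(c("01", "02"), fake_fetch)
nrow(demo)

# With censusapi loaded and network access, the real call could look like:
# tracts <- fetch_by_state(fips, function(r) getCensus(
#   name = "dec/sf1", vintage = 2010, vars = "P001001",
#   region = "tract:*", regionin = r))
```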
The regionin argument of getCensus() can
also be used with a string of nested geographies, as shown below.
The 2010 Decennial Census summary file 1 requires you to specify a
state and county to retrieve block-level data. Use region
to request block-level data, and regionin to specify the
desired state and county (and, as in this example, a tract).
data2010 <- getCensus(
name = "dec/sf1",
vintage = 2010,
vars = "P001001",
region = "block:*",
regionin = "state:36+county:027+tract:010000")
head(data2010)
state | county | tract | block | P001001 |
---|---|---|---|---|
36 | 027 | 010000 | 1000 | 31 |
36 | 027 | 010000 | 1011 | 17 |
36 | 027 | 010000 | 1028 | 41 |
36 | 027 | 010000 | 1001 | 0 |
36 | 027 | 010000 | 1031 | 0 |
36 | 027 | 010000 | 1002 | 4 |
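When the nesting gets deep, it can help to build the regionin string from named pieces instead of typing it by hand. A minimal sketch, where nest_geography() is a hypothetical helper that joins level:code pairs with "+" in the order given:

```r
# Hypothetical helper: turn named codes into a nested regionin string
nest_geography <- function(...) {
  parts <- c(...)
  paste(names(parts), parts, sep = ":", collapse = "+")
}

# Codes are passed as strings so leading zeros are preserved
nest_geography(state = "36", county = "027", tract = "010000")
```

The result matches the regionin value used in the block-level example above.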
For many more examples, frequently asked questions, troubleshooting, and advanced topics, check out all of the articles.