This package provides basic support for the Census Bureau’s new microdata APIs, using the same getCensus() function used for summary data. Getting the data with getCensus() is easy. Using it responsibly takes some homework.
About microdata
Microdata contains individual-level responses: one row per person. It is a vital tool for custom analysis, but with great power comes great responsibility. You must weight the individual-level responses appropriately. You’ll often need to work with household relationships, and you’ll need to handle responses that aren’t in the universe of the question (for example, removing children from an analysis of college graduation rates).
If you’re new to working with microdata, you’ll need to do some reading before diving in. Here are some resources from the Census Bureau:
- What is microdata and why should I use it? (video and transcript)
- Census Microdata API User Guide (pdf)
- Microdata API documentation
As with all other endpoints, censusapi retrieves the data so that you can perform your own analysis using your methodology of choice. If you’re looking for an interactive microdata analysis tool, try the data.census.gov microdata interactive tool or the IPUMS online data analysis tool.
Once you’ve learned how to use microdata and gained an understanding of weighting, getting the data using censusapi is simple.
Getting microdata with censusapi
As an example, we’ll get data from the 2020 Current Population Survey Voting Supplement. This survey asks people if they voted, how, and when, and includes useful demographic data.
See the available variables:
voting_vars <- listCensusMetadata(
name = "cps/voting/nov",
vintage = 2020,
type = "variables")
head(voting_vars)
name | label | concept | predicateType | group | limit | predicateOnly | suggested_weight | is_weight |
---|---|---|---|---|---|---|---|---|
for | Census API FIPS ‘for’ clause | Census API Geography Specification | fips-for | N/A | 0 | TRUE | NA | NA |
in | Census API FIPS ‘in’ clause | Census API Geography Specification | fips-in | N/A | 0 | TRUE | NA | NA |
ucgid | Uniform Census Geography Identifier clause | Census API Geography Specification | ucgid | N/A | 0 | TRUE | NA | NA |
PEEDUCA | Demographics-highest level of school completed | NA | int | N/A | 0 | NA | PWSSWGT | NA |
PUBUS1 | Labor Force-unpaid work in family business/farm,y/n | NA | int | N/A | 0 | NA | PWCMPWGT | NA |
PRCOW1 | Indus.&Occ.-(main job)class of worker-recode | NA | int | N/A | 0 | NA | PWCMPWGT | NA |
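The suggested_weight column in this output is one way to see which weight a variable appears to be paired with. As a minimal sketch, the snippet below rebuilds three of the printed rows by hand so that it runs without an API key, then looks up the weight for a variable. With real data you would subset voting_vars directly. The names vars_sample and weight_lookup are made up for this illustration.

```r
# Hand-copied subset of the metadata printed above; in practice use voting_vars.
vars_sample <- data.frame(
  name = c("PEEDUCA", "PUBUS1", "PRCOW1"),
  suggested_weight = c("PWSSWGT", "PWCMPWGT", "PWCMPWGT"),
  stringsAsFactors = FALSE
)

# Named vector: variable name -> suggested weight variable.
weight_lookup <- setNames(vars_sample$suggested_weight, vars_sample$name)

# Look up the suggested weight for one variable.
weight_lookup[["PEEDUCA"]]
```

Checking the suggested weight before requesting data helps avoid mixing variables that call for different weights in one estimate.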
From the CPS Voting Supplement, get data on method of voting in New York state using PES5 (Vote in person or by mail?) and PESEX (gender), along with the appropriate weighting variable, PWSSWGT. We’ll only get data for people with a response of 1 (yes) to PES1 (Did you vote?).
cps_voting <- getCensus(
name = "cps/voting/nov",
vintage = 2020,
vars = c("PES5", "PESEX", "PWSSWGT"),
region = "state:36",
PES1 = 1)
head(cps_voting)
state | PES5 | PESEX | PWSSWGT | PES1 |
---|---|---|---|---|
36 | 1 | 1 | 4571.216 | 1 |
36 | 1 | 2 | 4806.369 | 1 |
36 | 1 | 2 | 3440.301 | 1 |
36 | -3 | 1 | 5204.566 | 1 |
36 | -3 | 2 | 4993.819 | 1 |
36 | 1 | 2 | 4602.958 | 1 |
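To illustrate what weighting means mechanically, here is a minimal sketch that sums the person weight (PWSSWGT) within each PES5 response code instead of counting rows. It uses only the six rows printed above, typed in by hand, so the result is not a real estimate; with real data you would run the same steps on the full cps_voting data frame. The object names are hypothetical, and whether to keep or drop negative codes such as -3 is a methodology choice this sketch does not make for you.

```r
# First six rows of cps_voting as printed above, typed in by hand.
sample_rows <- data.frame(
  PES5 = c(1, 1, 1, -3, -3, 1),
  PESEX = c(1, 2, 2, 1, 2, 2),
  PWSSWGT = c(4571.216, 4806.369, 3440.301, 5204.566, 4993.819, 4602.958)
)

# Coerce defensively in case the columns arrive as text.
sample_rows$PES5 <- as.numeric(sample_rows$PES5)
sample_rows$PWSSWGT <- as.numeric(sample_rows$PWSSWGT)

# Sum the person weights within each response code, rather than counting rows.
weighted_totals <- tapply(sample_rows$PWSSWGT, sample_rows$PES5, sum)

# Share of the weighted total for each response code.
weighted_shares <- weighted_totals / sum(weighted_totals)
weighted_shares
```

This covers only point estimates. Standard errors for survey data need additional methodology that is outside the scope of this sketch.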
Making a data dictionary
Most microdata variables are encoded, which means that your data will have a lot of numbers instead of text labels.
A data dictionary, which includes the definitions and labels for every variable in the dataset, is helpful. Calling listCensusMetadata() with include_values = TRUE returns a data dictionary with one row for each variable-label pair. That means if there are 30 codes for a given variable, it will have 30 rows in the data dictionary. Variables that don’t have value labels in the metadata will have only one row.
voting_dict <- listCensusMetadata(
name = "cps/voting/nov",
vintage = 2020,
type = "variables",
include_values = TRUE)
head(voting_dict)
name | label | concept | predicateType | group | limit | predicateOnly | suggested_weight | is_weight | values_code | values_label |
---|---|---|---|---|---|---|---|---|---|---|
for | Census API FIPS ‘for’ clause | Census API Geography Specification | fips-for | N/A | 0 | TRUE | NA | NA | NA | NA |
in | Census API FIPS ‘in’ clause | Census API Geography Specification | fips-in | N/A | 0 | TRUE | NA | NA | NA | NA |
ucgid | Uniform Census Geography Identifier clause | Census API Geography Specification | ucgid | N/A | 0 | TRUE | NA | NA | NA | NA |
PEEDUCA | Demographics-highest level of school completed | NA | int | N/A | 0 | NA | PWSSWGT | NA | 46 | DOCTORATE DEGREE(EX:PhD,EdD) |
PEEDUCA | Demographics-highest level of school completed | NA | int | N/A | 0 | NA | PWSSWGT | NA | 33 | 5th Or 6th Grade |
PEEDUCA | Demographics-highest level of school completed | NA | int | N/A | 0 | NA | PWSSWGT | NA | 44 | MASTER’S DEGREE(EX:MA,MS,MEng,MEd,MSW) |
You can also look up the meaning of those codes for a single variable using the same function, listCensusMetadata(). Here are the values of PES5, the variable for “Vote in person or by mail?”
PES5_values <- listCensusMetadata(
name = "cps/voting/nov",
vintage = 2020,
type = "values",
variable = "PES5")
PES5_values
code | label |
---|---|
2 | By Mail |
-2 | Don’t Know |
1 | In person |
-1 | Not in Universe |
-9 | No Response |
-3 | Refused |
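Once you have the value labels, you can attach them to the encoded responses. The following is a minimal sketch using base R's match(), with the label table and a few PES5 responses copied by hand from the output above so that it runs without an API key. With real data you would use PES5_values and cps_voting instead. The names pes5_labels, responses, and PES5_label are hypothetical. Codes are compared as character on the assumption that the two sources may store them as different types.

```r
# Value labels for PES5, copied from the output above.
pes5_labels <- data.frame(
  code = c("2", "-2", "1", "-1", "-9", "-3"),
  label = c("By Mail", "Don’t Know", "In person",
            "Not in Universe", "No Response", "Refused"),
  stringsAsFactors = FALSE
)

# A few PES5 responses as they appeared in the first rows of cps_voting.
responses <- data.frame(PES5 = c(1, 1, 1, -3, -3, 1))

# Compare as character so numeric and text codes line up.
idx <- match(as.character(responses$PES5), pes5_labels$code)
responses$PES5_label <- pes5_labels$label[idx]
responses
```

Any code that has no entry in the label table comes back as NA, which makes unlabelled values easy to spot.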
Other ways to access microdata
The Census Bureau microdata APIs are helpful for working with a limited set of just-released datasets. But they’re not your only option. Some other ways to get microdata are:
- Retrieve standardized, cleaned microdata from IPUMS and import it with the ipumsr package. IPUMS is widely used in research when the data needed is not brand new. I highly recommend that you check out IPUMS’ cleaned microdata files as well as its historic geographic data. These standardized files are generally released months to a year after the raw microdata becomes available directly from the Census Bureau.
- Download complete bulk files from the Census Bureau’s FTP (File Transfer Protocol) site. This is helpful if you need a large number of variables. You might run into size limitations when getting many variables through the APIs.
- Retrieve American Community Survey microdata via the Census APIs with tidycensus, which has helpful functions for working with those endpoints.