Getting started with censusapi

censusapi is a lightweight package that helps you retrieve data from the U.S. Census Bureau’s 1,600 API endpoints using one simple function, getCensus(). Additional functions provide information about what datasets are available and how to use them.

This package returns the data as-is, with the original variable names created by the Census Bureau and any quirks inherent in the data. Each dataset is a little different. Some are documented thoroughly; others have sparse documentation. Variable names sometimes change from year to year. This package can’t overcome those challenges, but it tries to make it easier to get the data for use in your analysis. Make sure to read the documentation for your dataset thoroughly and see below for how to get help with Census data.

API key setup

censusapi recommends but does not require using an API key from the U.S. Census Bureau. The Census Bureau may limit the number of requests made by your IP address if you do not use an API key.

You can sign up online to receive a key, which will be sent to your provided email address.

If you save the key with the name CENSUS_KEY or CENSUS_API_KEY in your .Renviron file, censusapi will use it by default without any extra work on your part.

To save your API key, within R, run:

# Check to see if you already have a CENSUS_KEY or CENSUS_API_KEY saved
# If so, no further action is needed
get_api_key()

# If not, set your key for the current R session (the key must be quoted)
Sys.setenv(CENSUS_KEY = "PASTEYOURKEYHERE")

# To keep the key across sessions, add the line CENSUS_KEY=PASTEYOURKEYHERE
# to your .Renviron file, then reload it
readRenviron("~/.Renviron")

# Check to see that the expected key is output in your R console
get_api_key()

In some instances you might not want to put your key in your .Renviron file, for example if you’re on a shared school computer. In that case you can set key = "PASTEYOURKEYHERE" manually as an argument in getCensus().
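The lookup of the two variable names can be made explicit with a small helper. This is only a sketch: resolve_census_key() is a hypothetical name, not a censusapi function, and checking CENSUS_KEY before CENSUS_API_KEY is an assumption rather than documented package behavior.

```r
# Hypothetical helper (not part of censusapi): return the first Census API
# key found in the environment, or NULL if neither variable is set.
# Assumption: CENSUS_KEY is checked before CENSUS_API_KEY.
resolve_census_key <- function() {
    for (var_name in c("CENSUS_KEY", "CENSUS_API_KEY")) {
        value <- Sys.getenv(var_name, unset = "")
        if (nzchar(value)) {
            return(value)
        }
    }
    NULL
}

key <- resolve_census_key()
if (is.null(key)) {
    message("No key found; pass key = \"PASTEYOURKEYHERE\" to getCensus() instead.")
}
```

A check like this can be handy at the top of a script that will run on more than one machine.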

Basic usage

The main function in censusapi is getCensus(), which makes an API call to a given endpoint and returns a data frame with results. Each API has slightly different parameters, but a few arguments are needed for nearly every call:

name: the programmatic name of the endpoint as defined by the Census, like “acs/acs5” or “timeseries/bds/firms”
vintage: the survey year, required for aggregate or microdata APIs
vars: a list of variables to retrieve
region: the geography level to retrieve, such as state or county, required for nearly all endpoints

Some APIs have additional required or optional arguments, like time for some timeseries datasets. Check the specific documentation for your API and explore its metadata with listCensusMetadata() to see what options are allowed.
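To see how these arguments fit together, it can help to picture the request URL that a call corresponds to. The sketch below is an illustration only: build_census_url() is a hypothetical helper, not how censusapi is implemented, and it leaves out URL encoding and the API key.

```r
# Hypothetical helper (not part of censusapi): assemble a Census Data API
# URL from getCensus()-style arguments. No URL encoding, no API key.
build_census_url <- function(name, vars, region, vintage = NULL,
                             regionin = NULL, ...) {
    base <- "https://api.census.gov/data"
    # Timeseries endpoints have no vintage; other endpoints are prefixed by it
    path <- if (is.null(vintage)) name else paste(vintage, name, sep = "/")
    # NULL entries (e.g. a missing regionin) are dropped by c()
    query <- c(
        get = paste(vars, collapse = ","),
        `for` = region,
        `in` = regionin,
        unlist(list(...))
    )
    paste0(base, "/", path, "?",
           paste(names(query), query, sep = "=", collapse = "&"))
}

build_census_url(
    name = "timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT"),
    region = "county:*",
    time = 2021)
#> [1] "https://api.census.gov/data/timeseries/healthins/sahie?get=NAME,PCTUI_PT&for=county:*&time=2021"
```

Extra arguments such as time are passed through as additional query parameters, which mirrors how the endpoint-specific options described above are supplied.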

Let’s walk through an example getting uninsured rates using the Small Area Health Insurance Estimates API, which provides detailed annual state-level and county-level estimates of health insurance rates for people below age 65.

Choosing variables

censusapi includes a metadata function called listCensusMetadata() to get information about an API’s variable and geography options. Let’s see what variables are available in the SAHIE API:

library(censusapi)

sahie_vars <- listCensusMetadata(
    name = "timeseries/healthins/sahie", 
    type = "variables")

# See the full list of variables
sahie_vars$name

#>  [1] "for"        "in"         "time"       "NIPR_LB90"  "NIPR_PT"   
#>  [6] "AGECAT"     "GEOID"      "NIC_PT"     "STATE"      "RACE_DESC" 
#> [11] "YEAR"       "IPRCAT"     "PCTIC_UB90" "NIPR_MOE"   "PCTUI_LB90"
#> [16] "NIC_MOE"    "US"         "COUNTY"     "PCTUI_MOE"  "NUI_UB90"  
#> [21] "NIC_UB90"   "NUI_MOE"    "SEXCAT"     "PCTUI_PT"   "PCTIC_LB90"
#> [26] "PCTUI_UB90" "NUI_PT"     "STABREV"    "AGE_DESC"   "NAME"      
#> [31] "NIC_LB90"   "PCTIC_PT"   "PCTIC_MOE"  "IPR_DESC"   "NUI_LB90"  
#> [36] "NIPR_UB90"  "GEOCAT"     "SEX_DESC"   "RACECAT"

# Full info on the first several variables
head(sahie_vars)

name	label	concept	predicateType	group	predicateOnly	required
for	Census API FIPS ‘for’ clause	Census API Geography Specification	fips-for	N/A	TRUE	NA
in	Census API FIPS ‘in’ clause	Census API Geography Specification	fips-in	N/A	TRUE	NA
time	ISO-8601 Date/Time value	Census API Date/Time Specification	datetime	N/A	TRUE	true
NIPR_LB90	Number in Demographic Group for Selected Income Range, Lower Bound for 90% Confidence Interval	NA	int	N/A	NA	NA
NIPR_PT	Number in Demographic Group for Selected Income Range, Estimate	NA	int	N/A	NA	NA
AGECAT	Age Category	NA	string	N/A	NA	default displayed
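The metadata comes back as an ordinary data frame, so base R subsetting works on it. The sketch below rebuilds three columns of the six rows shown above by hand, so it runs without an API call, and drops the entries flagged as predicate-only (for, in, and time). The object name vars_meta is made up for this example.

```r
# Hand-copied subset of the metadata shown above (three columns only)
vars_meta <- data.frame(
    name = c("for", "in", "time", "NIPR_LB90", "NIPR_PT", "AGECAT"),
    predicateType = c("fips-for", "fips-in", "datetime", "int", "int", "string"),
    predicateOnly = c(TRUE, TRUE, TRUE, NA, NA, NA),
    stringsAsFactors = FALSE
)

# Keep rows where predicateOnly is not TRUE (NA counts as "not TRUE")
data_vars <- vars_meta[!(vars_meta$predicateOnly %in% TRUE), ]
data_vars$name
#> [1] "NIPR_LB90" "NIPR_PT"   "AGECAT"
```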

Choosing regions

We can also use listCensusMetadata() to see which geographic levels are available.

listCensusMetadata(
    name = "timeseries/healthins/sahie", 
    type = "geographies")

name	geoLevelId	limit	referenceDate	requires	wildcard	optionalWithWCFor
us	010	1	2015-01-01	NULL	NULL	NA
county	050	3142	2015-01-01	state	state	state
state	040	52	2015-01-01	NULL	NULL	NA

This API has three geographic levels: us, county, and state. County data can be queried for all counties nationally or within a specific state.

Making a censusapi call

First, using getCensus(), let’s get the percent (PCTUI_PT) and number (NUI_PT) of people who are uninsured, using the wildcard star (*) to retrieve data for all counties.

sahie_counties <- getCensus(
    name = "timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT", "NUI_PT"), 
    region = "county:*", 
    time = 2021)
head(sahie_counties)

time	state	county	NAME	PCTUI_PT	NUI_PT
2021	01	001	Autauga County, AL	10.0	4912
2021	01	003	Baldwin County, AL	11.0	20432
2021	01	005	Barbour County, AL	12.7	2150
2021	01	007	Bibb County, AL	11.4	1905
2021	01	009	Blount County, AL	12.8	6145
2021	01	011	Bullock County, AL	12.2	824
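A common next step is to combine the state and county columns into a five-digit county FIPS code and rank the results. The sketch below uses three of the rows above, typed in by hand so it runs offline. It assumes the state and county columns are character strings with leading zeros, as displayed; sahie_sample and fips are names invented for this example.

```r
# Three rows copied from the output above
sahie_sample <- data.frame(
    state = c("01", "01", "01"),
    county = c("001", "003", "005"),
    NAME = c("Autauga County, AL", "Baldwin County, AL", "Barbour County, AL"),
    PCTUI_PT = c(10.0, 11.0, 12.7),
    stringsAsFactors = FALSE
)

# Five-digit county FIPS code: two-digit state code + three-digit county code
sahie_sample$fips <- paste0(sahie_sample$state, sahie_sample$county)

# Sort from highest to lowest uninsured rate
ranked <- sahie_sample[order(sahie_sample$PCTUI_PT, decreasing = TRUE), ]
ranked[, c("fips", "NAME", "PCTUI_PT")]
```

Keeping the codes as character strings preserves the leading zeros, which matters when joining to other county-level data.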

We can also get data on detailed income and demographic groups from the SAHIE. We’ll use region to specify county-level results and regionin to filter to Virginia, state code 51. We’ll get uninsured rates by income group, IPRCAT.

sahie_virginia <- getCensus(
    name = "timeseries/healthins/sahie",
    vars = c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"), 
    region = "county:*", 
    regionin = "state:51", 
    time = 2021)
head(sahie_virginia, n = 12L)

time	state	county	NAME	IPRCAT	IPR_DESC	PCTUI_PT
2021	51	001	Accomack County, VA	0	All Incomes	13.4
2021	51	001	Accomack County, VA	1	<= 200% of Poverty	17.1
2021	51	001	Accomack County, VA	2	<= 250% of Poverty	16.8
2021	51	001	Accomack County, VA	3	<= 138% of Poverty	17.4
2021	51	001	Accomack County, VA	4	<= 400% of Poverty	15.6
2021	51	001	Accomack County, VA	5	138% to 400% of Poverty	14.5

Because the SAHIE API is a timeseries dataset, as indicated in its name, we can get multiple years of data at once by changing time = YYYY to time = "from YYYY to YYYY", or get everything through the latest available year using time = "from YYYY". Let’s get that data for DeKalb County, Georgia using county FIPS code 089 and state FIPS code 13. You can look up FIPS codes on the Census Bureau website.

sahie_years <- getCensus(
    name = "timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT"), 
    region = "county:089", 
    regionin = "state:13",
    time = "from 2006")
sahie_years

time	state	county	NAME	PCTUI_PT
2006	13	089	DeKalb County, GA	19.0
2007	13	089	DeKalb County, GA	17.2
2008	13	089	DeKalb County, GA	22.5
2009	13	089	DeKalb County, GA	22.9
2010	13	089	DeKalb County, GA	25.8
2011	13	089	DeKalb County, GA	23.9
2012	13	089	DeKalb County, GA	21.7
2013	13	089	DeKalb County, GA	22.1
2014	13	089	DeKalb County, GA	19.4
2015	13	089	DeKalb County, GA	16.9
2016	13	089	DeKalb County, GA	15.3
2017	13	089	DeKalb County, GA	15.9
2018	13	089	DeKalb County, GA	17.1
2019	13	089	DeKalb County, GA	16.9
2020	13	089	DeKalb County, GA	14.0
2021	13	089	DeKalb County, GA	14.2
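The two time formats used with this endpoint can be generated from numeric years with a tiny helper. This is a sketch; time_range() is a hypothetical convenience function, not part of censusapi.

```r
# Hypothetical helper: build the time string for a timeseries query.
# With only `from`, the range is open-ended (through the latest year).
time_range <- function(from, to = NULL) {
    if (is.null(to)) {
        paste("from", from)
    } else {
        paste("from", from, "to", to)
    }
}

time_range(2006)
#> [1] "from 2006"
time_range(2006, 2010)
#> [1] "from 2006 to 2010"
```

The result can be passed directly as the time argument, for example time = time_range(2006).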

We can also filter the data by income group using the IPRCAT variable. See the possible values of IPRCAT using listCensusMetadata().

IPRCAT = 3 represents <=138% of the federal poverty line. That is the threshold for Medicaid eligibility in states that have expanded it under the Affordable Care Act.

listCensusMetadata(
    name = "timeseries/healthins/sahie",
    type = "values",
    variable = "IPRCAT")

code	label
0	All Incomes
1	Less than or Equal to 200% of Poverty
2	Less than or Equal to 250% of Poverty
3	Less than or Equal to 138% of Poverty
4	Less than or Equal to 400% of Poverty
5	138% to 400% Poverty

Getting this data for Los Angeles County (FIPS code 06037), we can see the dramatic decrease in the uninsured rate in this income group after California expanded Medicaid.

sahie_138 <- getCensus(
    name = "timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT", "NUI_PT"), 
    region = "county:037", 
    regionin = "state:06", 
    IPRCAT = 3,
    time = "from 2010")
sahie_138

time	state	county	NAME	PCTUI_PT	NUI_PT	IPRCAT
2010	06	037	Los Angeles County, CA	37.4	894385	3
2011	06	037	Los Angeles County, CA	35.1	867577	3
2012	06	037	Los Angeles County, CA	34.4	865516	3
2013	06	037	Los Angeles County, CA	33.0	818978	3
2014	06	037	Los Angeles County, CA	24.9	607542	3
2015	06	037	Los Angeles County, CA	17.8	402977	3
2016	06	037	Los Angeles County, CA	15.4	329251	3
2017	06	037	Los Angeles County, CA	14.3	281842	3
2018	06	037	Los Angeles County, CA	13.9	255520	3
2019	06	037	Los Angeles County, CA	15.1	254740	3
2020	06	037	Los Angeles County, CA	14.4	230380	3
2021	06	037	Los Angeles County, CA	15.1	249186	3
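To put a number on that decrease, we can compute the year-over-year change in the uninsured rate. The sketch below re-enters the PCTUI_PT values from the table above by hand so that it runs without an API call; the la data frame and the change column are names made up for this example.

```r
# Uninsured rates for Los Angeles County (IPRCAT = 3), copied from above
la <- data.frame(
    time = 2010:2021,
    PCTUI_PT = c(37.4, 35.1, 34.4, 33.0, 24.9, 17.8,
                 15.4, 14.3, 13.9, 15.1, 14.4, 15.1)
)

# Change from the previous year, in percentage points (NA for the first year)
la$change <- c(NA, diff(la$PCTUI_PT))

# Year with the largest single-year drop
biggest_drop <- la[which.min(la$change), ]
biggest_drop$time
#> [1] 2014
round(biggest_drop$change, 1)
#> [1] -8.1
```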

Finding your API

What if you don’t already know your dataset’s name? To see a current table of every available endpoint, use listCensusApis(). This data frame includes useful information for making your API call, including the dataset’s name, vintage if applicable, description, and title.

apis <- listCensusApis()
colnames(apis)

#>  [1] "title"       "name"        "vintage"     "type"        "temporal"   
#>  [6] "spatial"     "url"         "modified"    "description" "contact"

You can also get information on a subset of datasets using the optional name and/or vintage parameters. For example, get information about 2020 Decennial Census datasets.

dec_apis <- listCensusApis(name = "dec", vintage = 2020)
dec_apis[, 1:6]

title	name	vintage	type	temporal	spatial
Decennial Census: 118th Congressional District Summary File	dec/cd118	2020	Aggregate	2020/2020	US
Decennial Census of Island Areas: American Samoa Detailed Crosstabulations	dec/crosstabas	2020	Aggregate	2020/2020	American Samoa
Decennial Census of Island Areas: Guam Detailed Crosstabulations	dec/crosstabgu	2020	Aggregate	2020/2020	Guam
Decennial Census of Island Areas: Commonwealth of the Northern Mariana Islands Detailed Crosstabulations	dec/crosstabmp	2020	Aggregate	2020/2020	Northern Mariana Islands
Decennial Census of Island Areas: U.S. Virgin Islands Detailed Crosstabulations	dec/crosstabvi	2020	Aggregate	2020/2020	U.S. Virgin Islands
Decennial Census: Detailed Demographic and Housing Characteristics File A	dec/ddhca	2020	Aggregate	2020/2020	United States
Decennial Census: Demographic and Housing Characteristics	dec/dhc	2020	Aggregate	2020/2020	United States
Decennial Census of Island Areas: American Samoa Demographic and Housing Characteristics	dec/dhcas	2020	Aggregate	2020/2020	American Samoa
Decennial Census of Island Areas: Guam Demographic and Housing Characteristics	dec/dhcgu	2020	Aggregate	2020/2020	Guam
Decennial Census of Island Areas: Commonwealth of the Northern Mariana Islands Demographic and Housing Characteristics	dec/dhcmp	2020	Aggregate	2020/2020	Commonwealth of the Northern Mariana Islands
Decennial Census of Island Areas: U.S. Virgin Islands Demographic and Housing Characteristics	dec/dhcvi	2020	Aggregate	2020/2020	U.S. Virgin Islands
Decennial Census: Demographic Profile	dec/dp	2020	Aggregate	2020/2020	United States
Decennial Census of Island Areas: American Samoa Demographic Profile	dec/dpas	2020	Aggregate	2020/2020	United States
Decennial Census of Island Areas: Guam Demographic Profile	dec/dpgu	2020	Aggregate	2020/2020	United States
2020 Commonwealth of the Northern Mariana Islands Demographic Profile	dec/dpmp	2020	Aggregate	2020/2020	United States
Decennial Census of Island Areas: U.S. Virgin Islands Demographic Profile	dec/dpvi	2020	Aggregate	2020/2020	United States
Decennial Census: Decennial Post-Enumeration Survey	dec/pes	2020	Aggregate	2020/2020	US
Decennial Census: Redistricting Data (PL 94-171)	dec/pl	2020	Aggregate	2020/2020	United States
Decennial Census: Decennial Self-Response Rate	dec/responserate	2020	Aggregate	NA	NA
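Because the result is a plain data frame, you can also search it by title with grepl(). The sketch below works on three rows copied by hand from the table above, so it runs offline; with a live result you would search the data frame returned by listCensusApis() in the same way. The name apis_sample is invented for this example.

```r
# Three rows copied from the table above
apis_sample <- data.frame(
    title = c("Decennial Census: Demographic and Housing Characteristics",
              "Decennial Census: Demographic Profile",
              "Decennial Census: Redistricting Data (PL 94-171)"),
    name = c("dec/dhc", "dec/dp", "dec/pl"),
    vintage = c(2020, 2020, 2020),
    stringsAsFactors = FALSE
)

# Case-insensitive search of the title column
matches <- apis_sample[grepl("redistricting", apis_sample$title,
                             ignore.case = TRUE), ]
matches$name
#> [1] "dec/pl"
```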

Dataset types

There are three types of datasets included in the Census Bureau API universe: aggregate, microdata, and timeseries. These type names were defined by the Census Bureau and are included as a column in listCensusApis().

table(apis$type)

#> 
#>  Aggregate  Microdata Timeseries 
#>        624        895         81

Most users will work with summary data, either aggregate or timeseries. Summary data contains pre-calculated numbers or percentages for a given statistic — like the number of children in a state or the median household income. The examples below and in the broader list of censusapi examples use summary data.

Aggregate datasets, like the American Community Survey or Decennial Census, include data for only one time period (a vintage), usually one year. Datasets like the American Community Survey contain thousands of these pre-computed variables.

Timeseries datasets, including the Small Area Income and Poverty Estimates, the Quarterly Workforce Indicators, and International Trade statistics, allow users to query data over time in a single API call.

Microdata contains the individual-level responses for a survey for use in custom analysis. One row represents one person. Only advanced analysts will want to use microdata. Learn more about what microdata is and how to use it with censusapi in Accessing microdata.

Variable groups

For some surveys, including the American Community Survey and Decennial Census, you can get many related variables at once using a variable group. These groups are defined by the Census Bureau. In some other data tools, like data.census.gov, this concept is referred to as a table.

Some groups have several dozen variables, others just have a few. As an example, we’ll use the American Community Survey to get the estimate, margin of error, and annotations for median household income in the past 12 months for Census places (cities, towns, etc.) in Alabama using group B19013.

First, see descriptions of the variables in group B19013:

group_B19013 <- listCensusMetadata(
    name = "acs/acs5",
    vintage = 2022,
    type = "variables",
    group = "B19013")
group_B19013

name	label	concept	predicateType	group	predicateOnly	universe
B19013_001MA	Annotation of Margin of Error!!Median household income in the past 12 months (in 2022 inflation-adjusted dollars)	Median Household Income in the Past 12 Months (in 2022 Inflation-Adjusted Dollars)	string	B19013	TRUE	Households
B19013_001EA	Annotation of Estimate!!Median household income in the past 12 months (in 2022 inflation-adjusted dollars)	Median Household Income in the Past 12 Months (in 2022 Inflation-Adjusted Dollars)	string	B19013	TRUE	Households
B19013_001E	Estimate!!Median household income in the past 12 months (in 2022 inflation-adjusted dollars)	Median Household Income in the Past 12 Months (in 2022 Inflation-Adjusted Dollars)	int	B19013	TRUE	Households
B19013_001M	Margin of Error!!Median household income in the past 12 months (in 2022 inflation-adjusted dollars)	Median Household Income in the Past 12 Months (in 2022 Inflation-Adjusted Dollars)	int	B19013	TRUE	Households

Now, retrieve the data using vars = "group(B19013)". You could alternatively manually list each variable as vars = c("NAME", "B19013_001E", "B19013_001EA", "B19013_001M", "B19013_001MA"), but using the groups is much easier.

acs_income_group <- getCensus(
    name = "acs/acs5", 
    vintage = 2022, 
    vars = "group(B19013)", 
    region = "place:*", 
    regionin = "state:01")
head(acs_income_group)

state	place	B19013_001E	B19013_001EA	B19013_001M	B19013_001MA	GEO_ID	NAME
01	00100	29263	NA	2846	NA	1600000US0100100	Abanda CDP, Alabama
01	00124	35147	NA	15376	NA	1600000US0100124	Abbeville city, Alabama
01	00460	58631	NA	13426	NA	1600000US0100460	Adamsville city, Alabama
01	00484	47188	NA	6288	NA	1600000US0100484	Addison town, Alabama
01	00676	53929	NA	35679	NA	1600000US0100676	Akron town, Alabama
01	00820	89423	NA	6760	NA	1600000US0100820	Alabaster city, Alabama
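The estimate and margin of error columns are most useful together. As a sketch, the code below turns them into lower and upper bounds and a coefficient of variation, using three rows typed in from the output above. It relies on the fact that published ACS margins of error correspond to a 90 percent confidence level, so the standard error is the margin divided by 1.645. The column names lower, upper, and cv_pct are made up for this example.

```r
# Three rows copied from the output above
income <- data.frame(
    NAME = c("Abanda CDP, Alabama", "Akron town, Alabama",
             "Alabaster city, Alabama"),
    B19013_001E = c(29263, 53929, 89423),  # estimate
    B19013_001M = c(2846, 35679, 6760),    # margin of error (90% level)
    stringsAsFactors = FALSE
)

# Bounds of the 90 percent confidence interval
income$lower <- income$B19013_001E - income$B19013_001M
income$upper <- income$B19013_001E + income$B19013_001M

# Coefficient of variation: standard error as a percent of the estimate
income$cv_pct <- round(
    100 * (income$B19013_001M / 1.645) / income$B19013_001E, 1)

# Least reliable estimates first
income[order(income$cv_pct, decreasing = TRUE),
       c("NAME", "lower", "upper", "cv_pct")]
```

Small places such as Akron town can have very wide intervals, so it is worth checking the margin of error before comparing estimates.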

Advanced geographies

Some geographies, particularly Census tracts and blocks, need to be specified within larger geographies like states and counties. This varies by API endpoint, so make sure to read the documentation for your specific API and run listCensusMetadata(type = "geographies") to see the available options.

Tract-level data from the 2010 Decennial Census can only be requested from one state at a time. In this example, we use the built-in fips list of state FIPS codes to request tract-level data from each state and join the results into a single data frame.

tracts <- NULL
for (f in fips) {
    stateget <- paste("state:", f, sep="")
    temp <- getCensus(
        name = "dec/sf1",
        vintage = 2010,
        vars = "P001001",
        region = "tract:*",
        regionin = stateget)
    tracts <- rbind(tracts, temp)
}
# How many tracts are present?
nrow(tracts)

#> [1] 73057

head(tracts)

state	county	tract	P001001
01	001	020100	1912
01	001	020500	10766
01	001	020300	3373
01	001	020400	4386
01	001	020200	2170
01	001	020600	3668
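Growing a data frame with rbind() inside a loop works, but a common alternative is to collect the pieces in a list and bind them once at the end. The sketch below shows that pattern offline. fetch_state_tracts() is a hypothetical stand-in that returns placeholder rows, not real Census data; in practice its body would be the getCensus() call from the loop above.

```r
# Stand-in for the getCensus() call so this sketch runs without the API.
# The rows it returns are placeholders, not real tract data.
fetch_state_tracts <- function(state_code) {
    data.frame(
        state = state_code,
        tract = c("000100", "000200"),
        stringsAsFactors = FALSE
    )
}

# Three state FIPS codes for illustration
state_codes <- c("01", "02", "04")

# One data frame per state, then a single rbind over the whole list
per_state <- lapply(state_codes, fetch_state_tracts)
tracts_sketch <- do.call(rbind, per_state)
nrow(tracts_sketch)
#> [1] 6
```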

The regionin argument of getCensus() can also be used with a string of nested geographies, as shown below.

The 2010 Decennial Census summary file 1 requires you to specify a state and county to retrieve block-level data. Use region to request block-level data and regionin to specify the desired state and county; the example below also narrows the request to a single tract.

data2010 <- getCensus(
    name = "dec/sf1",
    vintage = 2010,
    vars = "P001001", 
    region = "block:*",
    regionin = "state:36+county:027+tract:010000")
head(data2010)

state	county	tract	block	P001001
36	027	010000	1000	31
36	027	010000	1011	17
36	027	010000	1028	41
36	027	010000	1001	0
36	027	010000	1031	0
36	027	010000	1002	4
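If you build nested regionin strings often, a small helper can assemble them from named values. This is a sketch; nest_geography() is a hypothetical function, not part of censusapi, and it simply joins level:code pairs with + in the order given.

```r
# Hypothetical helper: turn named geography codes into a regionin string.
# Argument order matters: list the largest geography first.
nest_geography <- function(...) {
    parts <- c(...)
    paste(paste0(names(parts), ":", parts), collapse = "+")
}

nest_geography(state = "36", county = "027", tract = "010000")
#> [1] "state:36+county:027+tract:010000"
```

Pass the codes as character strings so that leading zeros are preserved.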

For many more examples, frequently asked questions, troubleshooting, and advanced topics, check out all of the articles.