censusapi is a lightweight package that helps you retrieve data from the U.S. Census Bureau’s 1,600 API endpoints using one simple function, getCensus(). Additional functions provide information about what datasets are available and how to use them.

This package returns the data as-is with the original variable names created by the Census Bureau and any quirks inherent in the data. Each dataset is a little different. Some are documented thoroughly, others have documentation that is sparse. Sometimes variable names change each year. This package can’t overcome those challenges, but tries to make it easier to get the data for use in your analysis. Make sure to thoroughly read the documentation for your dataset and see below for how to get help with Census data.

API key setup

censusapi recommends but does not require using an API key from the U.S. Census Bureau. The Census Bureau may limit the number of requests made by your IP address if you do not use an API key.

You can sign up online to receive a key, which will be sent to your provided email address.

If you save the key with the name CENSUS_KEY or CENSUS_API_KEY in your Renviron file, censusapi will use it by default without any extra work on your part.

To save your API key, within R, run:

# Check to see if you already have a CENSUS_KEY or CENSUS_API_KEY saved
# If so, no further action is needed
get_api_key()

# If not, set your key for the current R session
# (to keep it across sessions, add the line CENSUS_KEY=PASTEYOURKEYHERE to ~/.Renviron)
Sys.setenv(CENSUS_KEY = "PASTEYOURKEYHERE")

# Reload .Renviron
readRenviron("~/.Renviron")

# Check to see that the expected key is output in your R console
get_api_key()

In some instances you might not want to put your key in your .Renviron - for example, if you’re on a shared school computer. You can always choose to manually set key = "PASTEYOURKEYHERE" as an argument in getCensus() if you prefer.
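
The lookup described above can be sketched in base R. This is only an illustration of the idea, not censusapi's internal code: resolve_census_key() is a hypothetical helper, and the only details taken from the text are the two environment variable names.

```r
# Hypothetical helper, not part of censusapi: return the first non-empty key
# found under either environment variable name, or NULL if neither is set.
resolve_census_key <- function() {
    for (var in c("CENSUS_KEY", "CENSUS_API_KEY")) {
        key <- Sys.getenv(var, unset = "")
        if (nzchar(key)) {
            return(key)
        }
    }
    NULL
}

key <- resolve_census_key()
if (is.null(key)) {
    message("No Census API key found in the environment")
}
```

A non-NULL result could then be passed explicitly as the key argument of getCensus().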

Basic usage

The main function in censusapi is getCensus(), which makes an API call to a given endpoint and returns a data frame with results. Each API has slightly different parameters, but most calls use these arguments:

  • name: the programmatic name of the endpoint as defined by the Census, like “acs/acs5” or “timeseries/bds/firms”
  • vintage: the survey year, required for aggregate or microdata APIs
  • vars: a list of variables to retrieve
  • region: the geography level to retrieve, such as state or county, required for nearly all endpoints

Some APIs have additional required or optional arguments, like time for some timeseries datasets. Check the specific documentation for your API and explore its metadata with listCensusMetadata() to see what options are allowed.
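
To see how these arguments relate to the underlying web request, here is a rough sketch that assembles a Census API URL by hand. It assumes the Census Bureau's public URL layout (https://api.census.gov/data, with get, for, and in query parameters); build_census_url() is a hypothetical helper, and censusapi may construct its requests differently.

```r
# Hypothetical helper, not part of censusapi: assemble a Census API request
# URL from getCensus()-style arguments. No URL encoding is applied.
build_census_url <- function(name, vars, region, vintage = NULL,
                             regionin = NULL, ...) {
    # Aggregate and microdata endpoints put the vintage before the name;
    # timeseries endpoints have no vintage, so NULL is simply dropped.
    path <- paste(c("https://api.census.gov/data", vintage, name),
                  collapse = "/")
    query <- c(get = paste(vars, collapse = ","), `for` = region)
    if (!is.null(regionin)) {
        query <- c(query, `in` = regionin)
    }
    # Extra predicates such as time = 2021 are appended as given.
    extra <- vapply(list(...), as.character, character(1))
    query <- c(query, extra)
    paste0(path, "?", paste(names(query), query, sep = "=", collapse = "&"))
}

sahie_url <- build_census_url(
    name = "timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT"),
    region = "county:*",
    time = 2021)
print(sahie_url)
```

Seeing the URL can help when comparing a getCensus() call against the examples in a dataset's own API documentation.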

Let’s walk through an example getting uninsured rates using the Small Area Health Insurance Estimates API, which provides detailed annual state-level and county-level estimates of health insurance rates for people below age 65.

Choosing variables

censusapi includes a metadata function called listCensusMetadata() to get information about an API’s variable and geography options. Let’s see what variables are available in the SAHIE API:

library(censusapi)

sahie_vars <- listCensusMetadata(
    name = "timeseries/healthins/sahie", 
    type = "variables")

# See the full list of variables
sahie_vars$name
#>  [1] "for"        "in"         "time"       "NIPR_LB90"  "NIPR_PT"   
#>  [6] "AGECAT"     "GEOID"      "NIC_PT"     "STATE"      "RACE_DESC" 
#> [11] "YEAR"       "IPRCAT"     "PCTIC_UB90" "NIPR_MOE"   "PCTUI_LB90"
#> [16] "NIC_MOE"    "US"         "COUNTY"     "PCTUI_MOE"  "NUI_UB90"  
#> [21] "NIC_UB90"   "NUI_MOE"    "SEXCAT"     "PCTUI_PT"   "PCTIC_LB90"
#> [26] "PCTUI_UB90" "NUI_PT"     "STABREV"    "AGE_DESC"   "NAME"      
#> [31] "NIC_LB90"   "PCTIC_PT"   "PCTIC_MOE"  "IPR_DESC"   "NUI_LB90"  
#> [36] "NIPR_UB90"  "GEOCAT"     "SEX_DESC"   "RACECAT"
# Full info on the first several variables
head(sahie_vars)
name label concept predicateType group limit predicateOnly required
for Census API FIPS ‘for’ clause Census API Geography Specification fips-for N/A 0 TRUE NA
in Census API FIPS ‘in’ clause Census API Geography Specification fips-in N/A 0 TRUE NA
time ISO-8601 Date/Time value Census API Date/Time Specification datetime N/A 0 TRUE true
NIPR_LB90 Number in Demographic Group for Selected Income Range, Lower Bound for 90% Confidence Interval NA int N/A 0 NA NA
NIPR_PT Number in Demographic Group for Selected Income Range, Estimate NA int N/A 0 NA NA
AGECAT Age Category NA string N/A 0 NA default displayed
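
Because the metadata comes back as an ordinary data frame, it can be searched like one. The sketch below filters variable labels by keyword. To stay runnable offline it uses a small stand-in built from a few of the rows printed above; search_vars() is a hypothetical helper, and with a live connection the same function could be applied to sahie_vars itself.

```r
# Stand-in for sahie_vars, copied from rows printed above so the example runs
# offline; the real data frame has more rows and columns.
sahie_vars_sample <- data.frame(
    name = c("for", "in", "time", "NIPR_PT", "AGECAT"),
    label = c("Census API FIPS 'for' clause",
              "Census API FIPS 'in' clause",
              "ISO-8601 Date/Time value",
              "Number in Demographic Group for Selected Income Range, Estimate",
              "Age Category"),
    predicateType = c("fips-for", "fips-in", "datetime", "int", "string"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive keyword search over variable labels.
search_vars <- function(vars, pattern) {
    hits <- grepl(pattern, vars$label, ignore.case = TRUE)
    vars[hits, c("name", "label")]
}

income_vars <- search_vars(sahie_vars_sample, "income range")
print(income_vars)
```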

Choosing regions

We can also use listCensusMetadata() to see which geographic levels are available.

listCensusMetadata(
    name = "timeseries/healthins/sahie", 
    type = "geographies")
name    geoLevelId  limit  referenceDate  requires  wildcard  optionalWithWCFor
us      010         1      2015-01-01     NULL      NULL      NA
county  050         3142   2015-01-01     state     state     state
state   040         52     2015-01-01     NULL      NULL      NA

This API has three geographic levels: us, county, and state. County data can be queried for all counties nationally or within a specific state.

Making a censusapi call

First, using getCensus(), let’s get the percent (PCTUI_PT) and number (NUI_PT) of people who are uninsured, using the wildcard star (*) to retrieve data for all counties.

sahie_counties <- getCensus(
    name = "timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT", "NUI_PT"), 
    region = "county:*", 
    time = 2021)
head(sahie_counties)
time  state  county  NAME                PCTUI_PT  NUI_PT
2021  01     001     Autauga County, AL  10.0      4912
2021  01     003     Baldwin County, AL  11.0      20432
2021  01     005     Barbour County, AL  12.7      2150
2021  01     007     Bibb County, AL     11.4      1905
2021  01     009     Blount County, AL   12.8      6145
2021  01     011     Bullock County, AL  12.2      824

We can also get data on detailed income and demographic groups from the SAHIE. We’ll use region to specify county-level results and regionin to filter to Virginia, state code 51. We’ll get uninsured rates by income group, IPRCAT.

sahie_virginia <- getCensus(
    name = "timeseries/healthins/sahie",
    vars = c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"), 
    region = "county:*", 
    regionin = "state:51", 
    time = 2021)
head(sahie_virginia, n = 6L)
time  state  county  NAME                 IPRCAT  IPR_DESC                 PCTUI_PT
2021  51     001     Accomack County, VA  0       All Incomes              13.4
2021  51     001     Accomack County, VA  1       <= 200% of Poverty       17.1
2021  51     001     Accomack County, VA  2       <= 250% of Poverty       16.8
2021  51     001     Accomack County, VA  3       <= 138% of Poverty       17.4
2021  51     001     Accomack County, VA  4       <= 400% of Poverty       15.6
2021  51     001     Accomack County, VA  5       138% to 400% of Poverty  14.5

Because the SAHIE API is a timeseries dataset, as indicated in its name, we can get multiple years of data at once by changing time = YYYY to time = "from YYYY to YYYY", or get everything through the latest available year using time = "from YYYY". Let’s get that data for DeKalb County, Georgia using county FIPS code 089 and state FIPS code 13. You can look up FIPS codes on the Census Bureau website.

sahie_years <- getCensus(
    name = "timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT"), 
    region = "county:089", 
    regionin = "state:13",
    time = "from 2006")
sahie_years
time  state  county  NAME               PCTUI_PT
2006  13     089     DeKalb County, GA  19.0
2007  13     089     DeKalb County, GA  17.2
2008  13     089     DeKalb County, GA  22.5
2009  13     089     DeKalb County, GA  22.9
2010  13     089     DeKalb County, GA  25.8
2011  13     089     DeKalb County, GA  23.9
2012  13     089     DeKalb County, GA  21.7
2013  13     089     DeKalb County, GA  22.1
2014  13     089     DeKalb County, GA  19.4
2015  13     089     DeKalb County, GA  16.9
2016  13     089     DeKalb County, GA  15.3
2017  13     089     DeKalb County, GA  15.9
2018  13     089     DeKalb County, GA  17.1
2019  13     089     DeKalb County, GA  16.9
2020  13     089     DeKalb County, GA  14.0
2021  13     089     DeKalb County, GA  14.2
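
Once a timeseries result is in a data frame, year-over-year changes take only a few lines of base R. The sketch below copies the DeKalb County values from the output above so it runs without an API call. It assumes time and PCTUI_PT are numeric; if a column comes back as character, convert it with as.numeric() first.

```r
# DeKalb County values copied from the output above.
dekalb <- data.frame(
    time = 2006:2021,
    PCTUI_PT = c(19.0, 17.2, 22.5, 22.9, 25.8, 23.9, 21.7, 22.1,
                 19.4, 16.9, 15.3, 15.9, 17.1, 16.9, 14.0, 14.2)
)

# Year-over-year change in percentage points. The first year has no
# predecessor, so its change is NA.
dekalb$change <- c(NA, round(diff(dekalb$PCTUI_PT), 1))

# Year with the largest single-year decline (which.min() skips the NA).
largest_drop <- dekalb$time[which.min(dekalb$change)]
print(dekalb)
print(largest_drop)
```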

We can also filter the data by income group using the IPRCAT variable. See the possible values of IPRCAT using listCensusMetadata().

IPRCAT = 3 represents <=138% of the federal poverty line. That is the threshold for Medicaid eligibility in states that have expanded it under the Affordable Care Act.

listCensusMetadata(
    name = "timeseries/healthins/sahie",
    type = "values",
    variable = "IPRCAT")
code  label
0     All Incomes
1     Less than or Equal to 200% of Poverty
2     Less than or Equal to 250% of Poverty
3     Less than or Equal to 138% of Poverty
4     Less than or Equal to 400% of Poverty
5     138% to 400% Poverty

Getting this data for Los Angeles County (FIPS code 06037), we can see the dramatic decrease in the uninsured rate in this income group after California expanded Medicaid.

sahie_138 <- getCensus(
    name = "timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT", "NUI_PT"), 
    region = "county:037", 
    regionin = "state:06", 
    IPRCAT = 3,
    time = "from 2010")
sahie_138
time  state  county  NAME                    PCTUI_PT  NUI_PT  IPRCAT
2010  06     037     Los Angeles County, CA  37.4      894385  3
2011  06     037     Los Angeles County, CA  35.1      867577  3
2012  06     037     Los Angeles County, CA  34.4      865516  3
2013  06     037     Los Angeles County, CA  33.0      818978  3
2014  06     037     Los Angeles County, CA  24.9      607542  3
2015  06     037     Los Angeles County, CA  17.8      402977  3
2016  06     037     Los Angeles County, CA  15.4      329251  3
2017  06     037     Los Angeles County, CA  14.3      281842  3
2018  06     037     Los Angeles County, CA  13.9      255520  3
2019  06     037     Los Angeles County, CA  15.1      254740  3
2020  06     037     Los Angeles County, CA  14.4      230380  3
2021  06     037     Los Angeles County, CA  15.1      249186  3
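
The result above carries the IPRCAT code but not its description. One way to attach the label is a lookup against the values table returned by listCensusMetadata(). The sketch below copies both tables from the output shown earlier (only the first two Los Angeles rows) so it runs offline; the IPRCAT_label column name is an arbitrary choice for this example.

```r
# IPRCAT codes and labels copied from the values table above.
iprcat_labels <- data.frame(
    code = 0:5,
    label = c("All Incomes",
              "Less than or Equal to 200% of Poverty",
              "Less than or Equal to 250% of Poverty",
              "Less than or Equal to 138% of Poverty",
              "Less than or Equal to 400% of Poverty",
              "138% to 400% Poverty"),
    stringsAsFactors = FALSE
)

# First two rows of the Los Angeles County result above.
la_sample <- data.frame(
    time = c(2010, 2011),
    PCTUI_PT = c(37.4, 35.1),
    IPRCAT = c(3, 3)
)

# match() finds the row of the lookup table for each code, keeping the
# original row order of la_sample.
la_sample$IPRCAT_label <-
    iprcat_labels$label[match(la_sample$IPRCAT, iprcat_labels$code)]
print(la_sample)
```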

Finding your API

What if you don’t already know your dataset’s name? To see a current table of every available endpoint, use listCensusApis(). This data frame includes useful information for making your API call, including the dataset’s name, vintage if applicable, description, and title.

apis <- listCensusApis()
colnames(apis)
#>  [1] "title"       "name"        "vintage"     "type"        "temporal"   
#>  [6] "spatial"     "url"         "modified"    "description" "contact"

You can also get information on a subset of datasets using the optional name and/or vintage parameters. For example, get information about 2020 Decennial Census datasets.

dec_apis <- listCensusApis(name = "dec", vintage = 2020)
dec_apis[, 1:6]
title name vintage type temporal spatial
Decennial Census: 118th Congressional District Summary File dec/cd118 2020 Aggregate 2020/2020 US
Decennial Census of Island Areas: American Samoa Detailed Crosstabulations dec/crosstabas 2020 Aggregate 2020/2020 American Samoa
Decennial Census of Island Areas: Guam Detailed Crosstabulations dec/crosstabgu 2020 Aggregate 2020/2020 Guam
Decennial Census of Island Areas: Commonwealth of the Northern Mariana Islands Detailed Crosstabulations dec/crosstabmp 2020 Aggregate 2020/2020 Northern Mariana Islands
Decennial Census of Island Areas: U.S. Virgin Islands Detailed Crosstabulations dec/crosstabvi 2020 Aggregate 2020/2020 U.S. Virgin Islands
Decennial Census: Detailed Demographic and Housing Characteristics File A dec/ddhca 2020 Aggregate 2020/2020 United States
Decennial Census: Demographic and Housing Characteristics dec/dhc 2020 Aggregate 2020/2020 United States
Decennial Census of Island Areas: American Samoa Demographic and Housing Characteristics dec/dhcas 2020 Aggregate 2020/2020 American Samoa
Decennial Census of Island Areas: Guam Demographic and Housing Characteristics dec/dhcgu 2020 Aggregate 2020/2020 Guam
Decennial Census of Island Areas: Commonwealth of the Northern Mariana Islands Demographic and Housing Characteristics dec/dhcmp 2020 Aggregate 2020/2020 Commonwealth of the Northern Mariana Islands
Decennial Census of Island Areas: U.S. Virgin Islands Demographic and Housing Characteristics dec/dhcvi 2020 Aggregate 2020/2020 U.S. Virgin Islands
Decennial Census: Demographic Profile dec/dp 2020 Aggregate 2020/2020 United States
Decennial Census of Island Areas: American Samoa Demographic Profile dec/dpas 2020 Aggregate 2020/2020 United States
Decennial Census of Island Areas: Guam Demographic Profile dec/dpgu 2020 Aggregate 2020/2020 United States
2020 Commonwealth of the Northern Mariana Islands Demographic Profile dec/dpmp 2020 Aggregate 2020/2020 United States
Decennial Census of Island Areas: U.S. Virgin Islands Demographic Profile dec/dpvi 2020 Aggregate 2020/2020 United States
Decennial Census: Decennial Post-Enumeration Survey dec/pes 2020 Aggregate 2020/2020 US
Decennial Census: Redistricting Data (PL 94-171) dec/pl 2020 Aggregate 2020/2020 United States
Decennial Census: Decennial Self-Response Rate dec/responserate 2020 Aggregate NA NA

Dataset types

There are three types of datasets included in the Census Bureau API universe: aggregate, microdata, and timeseries. These type names were defined by the Census Bureau and are included as a column in listCensusApis().

table(apis$type)
#> 
#>  Aggregate  Microdata Timeseries 
#>        624        895         81

Most users will work with summary data, either aggregate or timeseries. Summary data contains pre-calculated numbers or percentages for a given statistic — like the number of children in a state or the median household income. The examples below and in the broader list of censusapi examples use summary data.

Aggregate datasets, like the American Community Survey or Decennial Census, include data for only one time period (a vintage), usually one year. Datasets like the American Community Survey contain thousands of these pre-computed variables.

Timeseries datasets, including the Small Area Income and Poverty Estimates, the Quarterly Workforce Indicators, and International Trade statistics, allow users to query data over time in a single API call.

Microdata contains the individual-level responses for a survey for use in custom analysis. One row represents one person. Only advanced analysts will want to use microdata. Learn more about what microdata is and how to use it with censusapi in Accessing microdata.
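
The dataset type also tells you whether a vintage is needed, since vintage is required for aggregate and microdata endpoints but not for timeseries ones. A minimal sketch of that rule follows; needs_vintage() is a hypothetical helper, not a censusapi function.

```r
# Hypothetical helper: TRUE for dataset types that need a vintage,
# based on the type names used by listCensusApis().
needs_vintage <- function(type) {
    tolower(type) %in% c("aggregate", "microdata")
}

vintage_needed <- needs_vintage(c("Aggregate", "Microdata", "Timeseries"))
print(vintage_needed)
```

Applied to the type column of the listCensusApis() result, this gives a quick flag for which endpoints need a vintage argument in getCensus().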

Variable groups

For some surveys, including the American Community Survey and Decennial Census, you can get many related variables at once using a variable group. These groups are defined by the Census Bureau. In some other data tools, like data.census.gov, this concept is referred to as a table.

Some groups have several dozen variables, others just have a few. As an example, we’ll use the American Community Survey to get the estimate, margin of error and annotations for median household income in the past 12 months for Census places (cities, towns, etc) in Alabama using group B19013.

First, see descriptions of the variables in group B19013:

group_B19013 <- listCensusMetadata(
    name = "acs/acs5",
    vintage = 2022,
    type = "variables",
    group = "B19013")
group_B19013
name label concept predicateType group limit predicateOnly universe
B19013_001MA Annotation of Margin of Error!!Median household income in the past 12 months (in 2022 inflation-adjusted dollars) Median Household Income in the Past 12 Months (in 2022 Inflation-Adjusted Dollars) string B19013 0 TRUE Households
B19013_001EA Annotation of Estimate!!Median household income in the past 12 months (in 2022 inflation-adjusted dollars) Median Household Income in the Past 12 Months (in 2022 Inflation-Adjusted Dollars) string B19013 0 TRUE Households
B19013_001E Estimate!!Median household income in the past 12 months (in 2022 inflation-adjusted dollars) Median Household Income in the Past 12 Months (in 2022 Inflation-Adjusted Dollars) int B19013 0 TRUE Households
B19013_001M Margin of Error!!Median household income in the past 12 months (in 2022 inflation-adjusted dollars) Median Household Income in the Past 12 Months (in 2022 Inflation-Adjusted Dollars) int B19013 0 TRUE Households

Now, retrieve the data using vars = "group(B19013)". You could alternatively manually list each variable as vars = c("NAME", "B19013_001E", "B19013_001EA", "B19013_001M", "B19013_001MA"), but using the groups is much easier.

acs_income_group <- getCensus(
    name = "acs/acs5", 
    vintage = 2022, 
    vars = "group(B19013)", 
    region = "place:*", 
    regionin = "state:01")
head(acs_income_group)
state  place  B19013_001E  B19013_001EA  B19013_001M  B19013_001MA  GEO_ID            NAME
01     00100  29263        NA            2846         NA            1600000US0100100  Abanda CDP, Alabama
01     00124  35147        NA            15376        NA            1600000US0100124  Abbeville city, Alabama
01     00460  58631        NA            13426        NA            1600000US0100460  Adamsville city, Alabama
01     00484  47188        NA            6288         NA            1600000US0100484  Addison town, Alabama
01     00676  53929        NA            35679        NA            1600000US0100676  Akron town, Alabama
01     00820  89423        NA            6760         NA            1600000US0100820  Alabaster city, Alabama
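
The estimate and margin of error columns are most useful together. ACS margins of error are published at the 90 percent confidence level, so the estimate plus or minus the margin gives an approximate 90 percent interval. The sketch below copies three rows from the output above so it runs offline; the lower, upper, and moe_share column names are choices made for this example.

```r
# Three rows copied from the output above.
income <- data.frame(
    NAME = c("Abanda CDP, Alabama", "Abbeville city, Alabama",
             "Akron town, Alabama"),
    B19013_001E = c(29263, 35147, 53929),
    B19013_001M = c(2846, 15376, 35679),
    stringsAsFactors = FALSE
)

# Approximate 90 percent interval around each estimate.
income$lower <- income$B19013_001E - income$B19013_001M
income$upper <- income$B19013_001E + income$B19013_001M

# Margin of error as a percentage of the estimate; large values flag
# estimates for small places that should be read with caution.
income$moe_share <- round(100 * income$B19013_001M / income$B19013_001E, 1)
print(income)
```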

Advanced geographies

Some geographies, particularly Census tracts and blocks, need to be specified within larger geographies like states and counties. This varies by API endpoint, so make sure to read the documentation for your specific API and run listCensusMetadata(type = "geographies") to see the available options.

Tract-level data from the 2010 Decennial Census can only be requested from one state at a time. In this example, we use the built-in fips list of state FIPS codes to request tract-level data from each state and combine the results into a single data frame.

tracts <- NULL
for (f in fips) {
    stateget <- paste("state:", f, sep="")
    temp <- getCensus(
        name = "dec/sf1",
        vintage = 2010,
        vars = "P001001",
        region = "tract:*",
        regionin = stateget)
    tracts <- rbind(tracts, temp)
}
# How many tracts are present?
nrow(tracts)
#> [1] 73057
head(tracts)
state  county  tract   P001001
01     001     020100  1912
01     001     020500  10766
01     001     020300  3373
01     001     020400  4386
01     001     020200  2170
01     001     020600  3668
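
The same per-state pattern can also be written with lapply() and a single rbind() at the end, which avoids growing the data frame inside the loop. The sketch below is one possible arrangement, not the package's recommended method: collect_states() is a hypothetical helper, and fake_fetch() is an offline stand-in whose rows are made-up placeholders, not Census data.

```r
# Hypothetical helper, not part of censusapi: call fetch_one() once per state
# code and stack the results into a single data frame.
collect_states <- function(state_codes, fetch_one) {
    pieces <- lapply(state_codes, function(code) {
        fetch_one(paste0("state:", code))
    })
    do.call(rbind, pieces)
}

# Offline stand-in for the API call. The rows it returns are made-up
# placeholders, not Census data.
fake_fetch <- function(regionin) {
    data.frame(
        state = sub("state:", "", regionin, fixed = TRUE),
        tract = c("000100", "000200"),
        P001001 = c(0, 0),
        stringsAsFactors = FALSE
    )
}

demo <- collect_states(c("01", "02"), fake_fetch)
print(demo)
```

With censusapi loaded and network access, fetch_one could instead be a small function that passes its regionin argument to the getCensus() call shown above, and state_codes could be the fips list.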

The regionin argument of getCensus() can also be used with a string of nested geographies, as shown below.

The 2010 Decennial Census Summary File 1 requires you to specify at least a state and county to retrieve block-level data. Use region to request block-level data and regionin to specify the state and county; the example below also narrows the request to a single tract.

data2010 <- getCensus(
    name = "dec/sf1",
    vintage = 2010,
    vars = "P001001", 
    region = "block:*",
    regionin = "state:36+county:027+tract:010000")
head(data2010)
state  county  tract   block  P001001
36     027     010000  1000   31
36     027     010000  1011   17
36     027     010000  1028   41
36     027     010000  1001   0
36     027     010000  1031   0
36     027     010000  1002   4
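
If you build nested geographies often, the regionin string can be generated rather than typed. A minimal sketch follows; nest_geo() is a hypothetical helper, not a censusapi function. Codes are passed as strings so that leading zeros are preserved.

```r
# Hypothetical helper: join named geography codes into a nested regionin
# string, in the order given.
nest_geo <- function(...) {
    parts <- c(...)
    paste(paste0(names(parts), ":", parts), collapse = "+")
}

nested <- nest_geo(state = "36", county = "027", tract = "010000")
print(nested)
#> [1] "state:36+county:027+tract:010000"
```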

For many more examples, frequently asked questions, troubleshooting, and advanced topics, check out all of the articles.