censusapi is a wrapper for the United States Census Bureau’s APIs. As of 2017 over 200 Census API endpoints are available, including Decennial Census, American Community Survey, Poverty Statistics, and Population Estimates APIs. This package is designed to let you get data from all of those APIs using the same main function—getCensus—and the same syntax for each dataset.

censusapi generally uses the APIs’ original parameter names so that users can easily transition between Census’s documentation and examples and this package. It also includes metadata functions to return data frames of available APIs, variables, and geographies.

API key setup

To use the Census APIs, sign up for an API key. Then, if you’re on a non-shared computer, add your Census API key to your .Renviron profile and call it CENSUS_KEY. censusapi will use it by default without any extra work on your part. Within R, run:

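# Set the key for the current R session (the value must be a quoted string).
# To keep it across sessions, add the line CENSUS_KEY=YOURKEYHERE to ~/.Renviron.
Sys.setenv(CENSUS_KEY = "YOURKEYHERE")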
# Reload .Renviron
readRenviron("~/.Renviron")
# Check to see that the expected key is output in your R console
Sys.getenv("CENSUS_KEY")

In some instances you might not want to put your key in your .Renviron - for example, if you’re on a shared school computer. You can always choose to specify your key within getCensus instead.
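For the shared-computer case, here is a minimal sketch of passing the key per call instead. The helper census_key_or_stop is hypothetical (it is not part of censusapi), and the key argument name in the commented call is assumed from the package documentation.

```r
# Hypothetical helper: return the key stored in an environment variable,
# or stop with a clear message if none is set.
census_key_or_stop <- function(env_var = "CENSUS_KEY") {
  key <- Sys.getenv(env_var, unset = "")
  if (!nzchar(key)) {
    stop("No Census API key found in ", env_var, call. = FALSE)
  }
  key
}

# Not run here: supply the key for a single call rather than via .Renviron.
# The `key` argument name is an assumption; check ?getCensus.
# getCensus(name = "timeseries/healthins/sahie",
#     vars = c("NAME", "PCTUI_PT"),
#     region = "us:*",
#     time = 2016,
#     key = "YOURKEYHERE")
```

Failing early with a readable message is usually easier to debug than an API error returned after the request has been sent.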

Finding your API

To get started, load the censusapi library.

library(censusapi)

The Census APIs have over 200 endpoints, covering dozens of different datasets.

To see a current table of every available endpoint, run listCensusApis:

apis <- listCensusApis()
View(apis)

This returns useful information about each endpoint, including name, which you’ll need to make your API call.
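With over 200 rows, it can help to search the table rather than scroll through it. The sketch below assumes only the name column mentioned above; find_apis is a hypothetical helper, and toy_apis is a small stand-in for the real listCensusApis() output.

```r
# Hypothetical helper: keep the rows whose `name` matches a pattern.
find_apis <- function(apis, pattern) {
  apis[grepl(pattern, apis$name, ignore.case = TRUE), , drop = FALSE]
}

# Toy stand-in for listCensusApis(); only the `name` column is modeled.
toy_apis <- data.frame(
  name = c("acs/acs5", "dec/sf1", "timeseries/healthins/sahie"),
  stringsAsFactors = FALSE
)

timeseries_apis <- find_apis(toy_apis, "timeseries")
```

With the real table, find_apis(apis, "sahie") would be used the same way.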

Using getCensus

The main function in censusapi is getCensus, which makes an API call to a given Census API and returns a data frame of results. Each API has slightly different parameters, but most calls use the same few core arguments:

  • name: the name of the API as defined by the Census, like “acs5” or “timeseries/bds/firms”
  • vintage: the dataset year, generally required for non-timeseries APIs
  • vars: the list of variable names to get
  • region: the geography level to return, like state or county

Some APIs have additional required or optional arguments, like time, monthly, or period. Check the specific documentation for your API to see what options are allowed.

Let’s walk through an example getting uninsured rates by income group using the Small Area Health Insurance Estimates API, which provides detailed annual state-level and county-level estimates of health insurance rates.

Choosing variables

censusapi includes a metadata function called listCensusMetadata to get information about an API’s variable options and geography options. Let’s see what variables are available in the SAHIE API:

sahie_vars <- listCensusMetadata(name = "timeseries/healthins/sahie", 
    type = "variables")
head(sahie_vars)
name      label                                                            concept              predicateType  group  limit  required
AGE_DESC  Age Category Description                                         Demographic ID       int            N/A    0      NA
NUI_LB90  Number Uninsured, Lower Bound for 90% Confidence Interval        Uncertainty Measure  int            N/A    0      NA
STATE     State FIPS Code                                                  Geographic ID        int            N/A    0      NA
NIC_MOE   Number Insured, Margin of Error                                  Uncertainty Measure  int            N/A    0      NA
NIPR_PT   Number in Demographic Group for Selected Income Range, Estimate  Estimate             int            N/A    0      NA
RACECAT   Race Category                                                    Demographic ID       int            N/A    0      default displayed

We’ll use a few of these variables to get uninsured rates by income group:

  • IPRCAT: Income Poverty Ratio Category
  • IPR_DESC: Income Poverty Ratio Category Description
  • PCTUI_PT: Percent Uninsured in Demographic Group for Selected Income Range, Estimate
  • NAME: Name of the geography returned (e.g. state or county name)
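Before making the API call, it can be worth confirming that each chosen variable name actually appears in the metadata. This is a small sketch using a toy data frame in place of sahie_vars; missing_vars is a hypothetical helper, not a censusapi function.

```r
# Hypothetical helper: return the requested names absent from the metadata.
missing_vars <- function(metadata, wanted) {
  setdiff(wanted, metadata$name)
}

# Toy stand-in for sahie_vars, using names and labels shown above.
toy_vars <- data.frame(
  name = c("IPRCAT", "IPR_DESC", "PCTUI_PT", "NIC_MOE"),
  label = c("Income Poverty Ratio Category",
            "Income Poverty Ratio Category Description",
            "Percent Uninsured in Demographic Group for Selected Income Range, Estimate",
            "Number Insured, Margin of Error"),
  stringsAsFactors = FALSE
)

ok <- missing_vars(toy_vars, c("IPRCAT", "IPR_DESC", "PCTUI_PT"))
typo <- missing_vars(toy_vars, c("IPRCAT", "PCTUI"))
```

An empty result means every requested name was found; a non-empty one lists the names to recheck.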

Choosing regions

We can also use listCensusMetadata to see which geographic levels we can get data for using the SAHIE API.

listCensusMetadata(name = "timeseries/healthins/sahie", 
    type = "geography")
name    geoLevelId  referenceDate  requires  wildcard  optionalWithWCFor
us      010         2015-01-01     NULL      NULL      NA
county  050         2015-01-01     state     state     state
state   040         2015-01-01     NULL      NULL      NA

This API has three geographic levels: us, county within states, and state.

First, using getCensus, let’s get uninsured rate by income group at the national level for 2016.

getCensus(name = "timeseries/healthins/sahie",
    vars = c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"), 
    region = "us:*", 
    time = 2016)
time  us  NAME           IPRCAT  IPR_DESC                 PCTUI_PT
2016  1   United States  0       All Incomes              10.0
2016  1   United States  1       <= 200% of Poverty       17.0
2016  1   United States  2       <= 250% of Poverty       16.3
2016  1   United States  3       <= 138% of Poverty       17.4
2016  1   United States  4       <= 400% of Poverty       14.0
2016  1   United States  5       138% to 400% of Poverty  12.1

We can also get this data at the state level for every state by changing region to "state:*":

sahie_states <- getCensus(name = "timeseries/healthins/sahie",
    vars = c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"), 
    region = "state:*", 
    time = 2016)
head(sahie_states)
time  state  NAME     IPRCAT  IPR_DESC                 PCTUI_PT
2016  01     Alabama  0       All Incomes              10.8
2016  01     Alabama  1       <= 200% of Poverty       18.1
2016  01     Alabama  2       <= 250% of Poverty       17.0
2016  01     Alabama  3       <= 138% of Poverty       19.2
2016  01     Alabama  4       <= 400% of Poverty       14.2
2016  01     Alabama  5       138% to 400% of Poverty  11.0

Finally, we can get county-level data. The geography metadata showed that we can choose to get county-level data within states. We’ll use region to specify county-level results and regionin to request data for Alabama and Alaska.

sahie_counties <- getCensus(name = "timeseries/healthins/sahie",
    vars = c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"), 
    region = "county:*", 
    regionin = "state:1,2", 
    time = 2016)
head(sahie_counties, n=12L)
time  state  county  NAME                IPRCAT  IPR_DESC                 PCTUI_PT
2016  01     001     Autauga County, AL  0       All Incomes              8.5
2016  01     001     Autauga County, AL  1       <= 200% of Poverty       15.9
2016  01     001     Autauga County, AL  2       <= 250% of Poverty       14.7
2016  01     001     Autauga County, AL  3       <= 138% of Poverty       17.2
2016  01     001     Autauga County, AL  4       <= 400% of Poverty       11.5
2016  01     001     Autauga County, AL  5       138% to 400% of Poverty  9.0
2016  01     003     Baldwin County, AL  0       All Incomes              10.7
2016  01     003     Baldwin County, AL  1       <= 200% of Poverty       20.0
2016  01     003     Baldwin County, AL  2       <= 250% of Poverty       18.4
2016  01     003     Baldwin County, AL  3       <= 138% of Poverty       21.2
2016  01     003     Baldwin County, AL  4       <= 400% of Poverty       14.9
2016  01     003     Baldwin County, AL  5       138% to 400% of Poverty  11.8
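When requesting several states at once, the regionin string can be built from a vector of FIPS codes instead of typed by hand. The sketch below is a hypothetical helper, not part of censusapi. It zero-pads codes to two digits on the assumption that the API accepts padded codes in a comma-separated list, as the examples in this document use both "state:1,2" and "state:02".

```r
# Hypothetical helper: build a "state:01,02" style string from FIPS codes.
state_region <- function(codes) {
  padded <- sprintf("%02d", as.integer(codes))
  paste0("state:", paste(padded, collapse = ","))
}

al_ak <- state_region(c(1, 2))
```

The result can then be passed as regionin = al_ak in the county-level call.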

Because the SAHIE API is a timeseries (as indicated in its name), we can get multiple years of data at once using the time argument.

sahie_years <- getCensus(name = "timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT"), 
    region = "state:1", 
    time = "from 2006 to 2016")
head(sahie_years)
time state NAME PCTUI_PT
2006 01 Alabama 15.7
2007 01 Alabama 14.6
2008 01 Alabama 15.3
2009 01 Alabama 15.8
2010 01 Alabama 16.9
2011 01 Alabama 16.6
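Once several years are in one data frame, a year-over-year change is a short step away. The sketch below uses a toy data frame holding the six Alabama values printed above rather than a live API result; add_yearly_change is a hypothetical helper.

```r
# Hypothetical helper: add the change in PCTUI_PT from the previous year.
add_yearly_change <- function(df) {
  df <- df[order(df$time), , drop = FALSE]
  df$change <- c(NA, diff(df$PCTUI_PT))
  df
}

# Toy copy of the head(sahie_years) output shown above.
sahie_toy <- data.frame(
  time = 2006:2011,
  PCTUI_PT = c(15.7, 14.6, 15.3, 15.8, 16.9, 16.6)
)

sahie_change <- add_yearly_change(sahie_toy)
```

The first row has no previous year, so its change is NA.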

American Community Survey annotations and variable groups

The American Community Survey (ACS) APIs include estimates (variable names ending in “E”), annotations, margins of error, and statistical significance, depending on the data set. Read more on ACS variable types and annotation symbol meanings on the Census website.

You can retrieve these annotation variables manually, by specifying a list of variables. We’ll get the estimate, margin of error and annotations for median household income in the past 12 months for Census tracts in Alaska.

acs_income <- getCensus(name = "acs/acs5",
    vintage = 2016, 
    vars = c("NAME", "B19013_001E", "B19013_001EA", "B19013_001M", "B19013_001MA"), 
    region = "tract:*", 
    regionin = "state:02")
head(acs_income)
state  county  tract   NAME                                                B19013_001E  B19013_001EA  B19013_001M  B19013_001MA
02     013     000100  Census Tract 1, Aleutians East Borough, Alaska      65926        NA            2430         NA
02     016     000100  Census Tract 1, Aleutians West Census Area, Alaska  59167        NA            4680         NA
02     016     000200  Census Tract 2, Aleutians West Census Area, Alaska  92083        NA            4791         NA
02     020     000101  Census Tract 1.01, Anchorage Municipality, Alaska   101420       NA            15802        NA
02     020     000102  Census Tract 1.02, Anchorage Municipality, Alaska   76690        NA            14441        NA
02     020     000201  Census Tract 2.01, Anchorage Municipality, Alaska   93636        NA            17769        NA

You can also retrieve estimates and annotations for a whole group of variables in one command. Here’s the group call for that same table, B19013.

acs_income_group <- getCensus(name = "acs/acs5", 
    vintage = 2016, 
    vars = c("NAME", "group(B19013)"), 
    region = "tract:*", 
    regionin = "state:02")
head(acs_income_group)
state  county  tract   NAME                                                B19013_001E  B19013_001M  B19013_001M_1  B19013_001EA  B19013_001MA
02     013     000100  Census Tract 1, Aleutians East Borough, Alaska      65926        2430         2430           NA            NA
02     016     000100  Census Tract 1, Aleutians West Census Area, Alaska  59167        4680         4680           NA            NA
02     016     000200  Census Tract 2, Aleutians West Census Area, Alaska  92083        4791         4791           NA            NA
02     020     000101  Census Tract 1.01, Anchorage Municipality, Alaska   101420       15802        15802          NA            NA
02     020     000102  Census Tract 1.02, Anchorage Municipality, Alaska   76690        14441        14441          NA            NA
02     020     000201  Census Tract 2.01, Anchorage Municipality, Alaska   93636        17769        17769          NA            NA
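In the output above, B19013_001M_1 repeats the values of B19013_001M. A minimal sketch for dropping such repeats follows. It assumes the "_1" suffix marks a repeated column, which is an inference from this output rather than documented behavior, and it only drops a column after confirming its values are identical to the unsuffixed one. drop_repeated_cols is a hypothetical helper.

```r
# Hypothetical helper: drop "<name>_1" columns that exactly repeat "<name>".
drop_repeated_cols <- function(df) {
  suffixed <- grep("_1$", names(df), value = TRUE)
  for (col in suffixed) {
    base <- sub("_1$", "", col)
    if (base %in% names(df) && identical(df[[base]], df[[col]])) {
      df[[col]] <- NULL
    }
  }
  df
}

# Toy copy of the first two rows of the group output shown above.
toy_group <- data.frame(
  B19013_001E = c(65926, 59167),
  B19013_001M = c(2430, 4680),
  B19013_001M_1 = c(2430, 4680),
  B19013_001EA = c(NA, NA),
  B19013_001MA = c(NA, NA)
)

cleaned <- drop_repeated_cols(toy_group)
```

Matching on the suffix rather than on identical contents matters here: the two annotation columns are both all NA, so a content-only comparison would wrongly treat one of them as a duplicate.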

Some variable groups contain many related variables and their associated annotations. As an example, we’ll get table B17020, poverty status by age.

acs_poverty_group <- getCensus(name = "acs/acs5",
    vintage = 2016, 
    vars = c("NAME", "group(B17020)"), 
    region = "tract:*",
    regionin = "state:02")
# List column names
colnames(acs_poverty_group)
#>  [1] "state"         "county"        "tract"         "NAME"         
#>  [5] "B17020_001E"   "B17020_001M"   "B17020_002E"   "B17020_002M"  
#>  [9] "B17020_003E"   "B17020_003M"   "B17020_004E"   "B17020_004M"  
#> [13] "B17020_005E"   "B17020_005M"   "B17020_006E"   "B17020_006M"  
#> [17] "B17020_007E"   "B17020_007M"   "B17020_008E"   "B17020_008M"  
#> [21] "B17020_009E"   "B17020_009M"   "B17020_010E"   "B17020_010M"  
#> [25] "B17020_011E"   "B17020_011M"   "B17020_012E"   "B17020_012M"  
#> [29] "B17020_013E"   "B17020_013M"   "B17020_014E"   "B17020_014M"  
#> [33] "B17020_015E"   "B17020_015M"   "B17020_016E"   "B17020_016M"  
#> [37] "B17020_017E"   "B17020_017M"   "B17020_001M_1" "B17020_001EA" 
#> [41] "B17020_001MA"  "B17020_002M_1" "B17020_002EA"  "B17020_002MA" 
#> [45] "B17020_003M_1" "B17020_003EA"  "B17020_003MA"  "B17020_004M_1"
#> [49] "B17020_004EA"  "B17020_004MA"  "B17020_005M_1" "B17020_005EA" 
#> [53] "B17020_005MA"  "B17020_006M_1" "B17020_006EA"  "B17020_006MA" 
#> [57] "B17020_007M_1" "B17020_007EA"  "B17020_007MA"  "B17020_008M_1"
#> [61] "B17020_008EA"  "B17020_008MA"  "B17020_009M_1" "B17020_009EA" 
#> [65] "B17020_009MA"  "B17020_010M_1" "B17020_010EA"  "B17020_010MA" 
#> [69] "B17020_011M_1" "B17020_011EA"  "B17020_011MA"  "B17020_012M_1"
#> [73] "B17020_012EA"  "B17020_012MA"  "B17020_013M_1" "B17020_013EA" 
#> [77] "B17020_013MA"  "B17020_014M_1" "B17020_014EA"  "B17020_014MA" 
#> [81] "B17020_015M_1" "B17020_015EA"  "B17020_015MA"  "B17020_016M_1"
#> [85] "B17020_016EA"  "B17020_016MA"  "B17020_017M_1" "B17020_017EA" 
#> [89] "B17020_017MA"
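With this many columns, it can help to sort the names by type before working with them. The sketch below relies on the suffix convention described above (E for estimates, M for margins of error, EA and MA for annotations) and assumes ACS variable names end in an underscore, three digits, and the suffix. classify_acs_cols is a hypothetical helper.

```r
# Hypothetical helper: group ACS column names by their suffix.
classify_acs_cols <- function(cols) {
  list(
    estimates   = grep("_[0-9]{3}E$", cols, value = TRUE),
    margins     = grep("_[0-9]{3}M$", cols, value = TRUE),
    annotations = grep("_[0-9]{3}(EA|MA)$", cols, value = TRUE)
  )
}

# A few of the column names printed above.
toy_cols <- c("state", "NAME", "B17020_001E", "B17020_001M",
              "B17020_001M_1", "B17020_001EA", "B17020_001MA")

col_types <- classify_acs_cols(toy_cols)
```

With the real data, acs_poverty_group[, classify_acs_cols(colnames(acs_poverty_group))$estimates] would keep only the estimate columns.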

Advanced geographies

Some geographies, particularly Census tracts and blocks, need to be specified within larger geographies like states and counties. This varies by API endpoint, so make sure to read the documentation for your specific API and run listCensusMetadata to see the available geographies.

You may want to get data for many geographies that require a parent geography. For example, tract-level data from the 1990 Decennial Census can only be requested from one state at a time.

In this example, we use the built-in fips list of state FIPS codes to request tract-level data from each state and combine the results into a single data frame.

fips
#>  [1]  1  2  4  5  6  8  9 10 11 12 13 15 16 17 18 19 20 21 22 23 24 25 26
#> [24] 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 44 45 46 47 48 49 50
#> [47] 51 53 54 55 56
tracts <- NULL
for (f in fips) {
    stateget <- paste("state:", f, sep="")
    temp <- getCensus(name = "sf3",
        vintage = 1990,
        vars = c("P0070001", "P0070002", "P114A001"),
        region = "tract:*",
        regionin = stateget)
    tracts <- rbind(tracts, temp)
}
head(tracts)
state county tract P0070001 P0070002 P114A001
01 001 020100 944 917 11663
01 001 020200 917 1060 8555
01 001 020300 1451 1518 11782
01 001 020400 2166 2223 15323
01 001 020500 1604 1582 14522
01 001 020600 1784 1661 10630
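The loop above grows the data frame with rbind on every iteration. An alternative sketch collects each state's result in a list and binds once at the end. Here bind_by_state and fetch_1990_tracts are hypothetical names, and a fake fetch function stands in for the API so the pattern can be tried offline.

```r
# Hypothetical helper: call `fetch` once per state code, then bind the results.
bind_by_state <- function(state_codes, fetch) {
  pieces <- lapply(state_codes, function(f) fetch(paste0("state:", f)))
  do.call(rbind, pieces)
}

# Real fetch function (defined but not called here; needs censusapi and a key).
fetch_1990_tracts <- function(regionin) {
  censusapi::getCensus(name = "sf3",
      vintage = 1990,
      vars = c("P0070001", "P0070002", "P114A001"),
      region = "tract:*",
      regionin = regionin)
}

# Fake fetch returning one row per state, for illustration only.
fake_fetch <- function(regionin) {
  data.frame(regionin = regionin, value = 1, stringsAsFactors = FALSE)
}

combined <- bind_by_state(c(1, 2, 4), fake_fetch)
```

With the package loaded and a key set, bind_by_state(fips, fetch_1990_tracts) would be the equivalent of the loop.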

The regionin argument of getCensus can also be used with a string of nested geographies, as shown below.

The 2010 Decennial Census summary file 1 requires you to specify a state and county to retrieve block-level data. Use region to request block-level data, and regionin to specify the desired state and county.

data2010 <- getCensus(name = "dec/sf1",
    vintage = 2010,
    vars = "P001001", 
    region = "block:*",
    regionin = "state:36+county:027")
head(data2010)
state county tract block P001001
36 027 010000 1000 31
36 027 010000 1011 17
36 027 010000 1028 41
36 027 010000 1001 0
36 027 010000 1031 0
36 027 010000 1002 4

For the 2000 Decennial Census summary file 1, tract is also required to retrieve block-level data. This example requests data for all blocks within Census tract 010000 in county 027 of state 36.

data2000 <- getCensus(name = "sf1",
    vintage = 2000,
    vars = "P001001", 
    region = "block:*",
    regionin = "state:36+county:027+tract:010000")
head(data2000)
state county tract block P001001
36 027 010000 1000 18
36 027 010000 1001 26
36 027 010000 1002 59
36 027 010000 1003 67
36 027 010000 1004 52
36 027 010000 1005 116
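The nested regionin strings used in these two examples follow a simple pattern: level:code pairs joined by "+". As a sketch, a small hypothetical helper (not part of censusapi) can assemble them from named arguments, listed from the largest geography to the smallest.

```r
# Hypothetical helper: join named geography codes into a nested regionin string.
nested_regionin <- function(...) {
  parts <- c(...)
  paste(names(parts), parts, sep = ":", collapse = "+")
}

county_level <- nested_regionin(state = "36", county = "027")
tract_level <- nested_regionin(state = "36", county = "027", tract = "010000")
```

Passing the codes as strings keeps leading zeros such as "027" intact.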

Disclaimer

This product uses the Census Bureau Data API but is not endorsed or certified by the Census Bureau.