censusapi is a wrapper for the United States Census Bureau’s APIs. As of 2017, over 200 Census API endpoints are available, including Decennial Census, American Community Survey, Poverty Statistics, and Population Estimates APIs. This package is designed to let you get data from all of those APIs using the same main function, getCensus, and the same syntax for each dataset.

censusapi generally uses the APIs’ original parameter names so that users can easily transition between Census’s documentation and examples and this package. It also includes metadata functions to return data frames of available APIs, variables, and geographies.

API key setup

To use the Census APIs, sign up for an API key. Then, if you’re on a non-shared computer, add your Census API key to your .Renviron profile and call it CENSUS_KEY. censusapi will use it by default without any extra work on your part. Within R, run:

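# Add key to .Renviron (replace YOURKEYHERE with your key; run this only once)
cat("CENSUS_KEY=YOURKEYHERE\n", file = "~/.Renviron", append = TRUE)
# Reload .Renviron
readRenviron("~/.Renviron")
# Check to see that the expected key is output in your R console
Sys.getenv("CENSUS_KEY")
# Alternatively, set the key for the current R session only
# Sys.setenv(CENSUS_KEY = "YOURKEYHERE")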

In some instances you might not want to put your key in your .Renviron, for example if you’re on a shared school computer. You can always pass your key to getCensus directly with its key argument instead.
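If you pass keys explicitly, a small helper can keep that choice in one place. The sketch below is hypothetical (resolve_census_key is not part of censusapi): it prefers a key you supply directly, otherwise falls back to the CENSUS_KEY environment variable, and stops with a clear message if neither is set. Its result would then go to getCensus's key argument.

```r
# Hypothetical helper (not part of censusapi): choose the key to pass to
# getCensus's `key` argument. An explicitly supplied key wins; otherwise
# fall back to the CENSUS_KEY environment variable.
resolve_census_key <- function(explicit = NULL) {
    if (!is.null(explicit) && nzchar(explicit)) {
        return(explicit)
    }
    from_env <- Sys.getenv("CENSUS_KEY")
    if (!nzchar(from_env)) {
        stop("No Census API key found: set CENSUS_KEY or pass a key explicitly")
    }
    from_env
}

# Example (requires a valid key and network access, so left commented out):
# getCensus(name = "timeseries/healthins/sahie", vars = "NAME",
#     region = "us:*", time = 2017,
#     key = resolve_census_key("YOURKEYHERE"))
```

Keeping the fallback logic in one function means the same script works on your own machine (where CENSUS_KEY is set) and on a shared one (where you pass the key in).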

Finding your API

To get started, load the censusapi library.

library(censusapi)

The Census APIs have over 200 endpoints, covering dozens of different datasets.

To see a current table of every available endpoint, run listCensusApis:

apis <- listCensusApis()
View(apis)

This returns useful information about each endpoint, including its name, which you’ll need to make your API call.

Using getCensus

The main function in censusapi is getCensus, which makes an API call to a given Census API and returns a data frame of results. Each API has slightly different parameters, but most calls use a few core arguments:

  • name: the name of the API as defined by the Census Bureau, like “acs/acs5” or “timeseries/bds/firms”
  • vintage: the dataset year, generally required for non-timeseries APIs
  • vars: the list of variable names to get
  • region: the geography level to return, like state or county

Some APIs have additional required or optional arguments, like time, monthly, or period. Check the specific documentation for your API to see what options are allowed.
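To see how these arguments fit together, it helps to know that getCensus works by sending HTTP requests to the Census Data API. The sketch below assembles such a URL under stated assumptions: census_url is a hypothetical helper, not censusapi's internal code, and it assumes the public endpoint layout https://api.census.gov/data/[vintage]/[name], with vars sent as get, region as for, regionin as in, and any other named argument (such as time) sent as its own query parameter. It is not necessarily byte-for-byte what censusapi sends.

```r
# Hypothetical helper (not part of censusapi): assemble a Census Data API URL
# from getCensus-style arguments. Assumes the endpoint layout
# https://api.census.gov/data/[vintage/]name?get=...&for=...&in=...
census_url <- function(name, vintage = NULL, vars, region = NULL,
                       regionin = NULL, key = NULL, ...) {
    # Non-timeseries APIs put the vintage (year) in the path before the name;
    # c() silently drops a NULL vintage for timeseries APIs
    path <- paste(c("https://api.census.gov/data", vintage, name), collapse = "/")
    # Core query parameters; NULL entries are dropped by c()
    params <- c(get = paste(vars, collapse = ","),
                `for` = region, `in` = regionin, key = key)
    # Any other named arguments (time, IPRCAT, AGECAT, ...) become parameters too
    extra <- list(...)
    if (length(extra) > 0) {
        params <- c(params, vapply(extra, as.character, character(1)))
    }
    query <- paste(names(params), params, sep = "=", collapse = "&")
    # Percent-encode characters such as spaces (e.g. in "from 2006 to 2017")
    utils::URLencode(paste0(path, "?", query))
}

census_url("timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT"), region = "state:*", time = 2017)
# returns "https://api.census.gov/data/timeseries/healthins/sahie?get=NAME,PCTUI_PT&for=state:*&time=2017"
```

Reading a URL like this side by side with your R call can make it easier to match getCensus arguments to the parameter names used in the Census Bureau’s own documentation.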

Let’s walk through an example getting uninsured rates by income group using the Small Area Health Insurance Estimates API, which provides detailed annual state-level and county-level estimates of health insurance rates.

Choosing variables

censusapi includes a metadata function called listCensusMetadata to get information about an API’s variable options and geography options. Let’s see what variables are available in the SAHIE API:

sahie_vars <- listCensusMetadata(name = "timeseries/healthins/sahie", 
    type = "variables")
head(sahie_vars)
name | label | concept | predicateType | group | limit | required
AGE_DESC | Age Category Description | Demographic ID | int | N/A | 0 | NA
NUI_LB90 | Number Uninsured, Lower Bound for 90% Confidence Interval | Uncertainty Measure | int | N/A | 0 | NA
STATE | State FIPS Code | Geographic ID | int | N/A | 0 | NA
NIC_MOE | Number Insured, Margin of Error | Uncertainty Measure | int | N/A | 0 | NA
NIPR_PT | Number in Demographic Group for Selected Income Range, Estimate | Estimate | int | N/A | 0 | NA
RACECAT | Race Category | Demographic ID | int | N/A | 4 | default displayed

We’ll use a few of these variables to get uninsured rates by income group:

  • IPRCAT: Income Poverty Ratio Category
  • IPR_DESC: Income Poverty Ratio Category Description
  • PCTUI_PT: Percent Uninsured in Demographic Group for Selected Income Range, Estimate
  • NAME: Name of the geography returned (e.g. state or county name)

Choosing regions

We can also use listCensusMetadata to see which geographic levels we can get data for using the SAHIE API.

listCensusMetadata(name = "timeseries/healthins/sahie", 
    type = "geography")
name geoLevelId limit referenceDate requires wildcard optionalWithWCFor
us 010 1 2015-01-01 NULL NULL NA
county 050 3142 2015-01-01 state state state
state 040 52 2015-01-01 NULL NULL NA

This API has three geographic levels: us, county within states, and state.

First, using getCensus, let’s get uninsured rates by income group at the national level for 2017.

getCensus(name = "timeseries/healthins/sahie",
    vars = c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"), 
    region = "us:*", 
    time = 2017)
time us NAME IPRCAT IPR_DESC PCTUI_PT
2017 1 United States 0 All Incomes 10.2
2017 1 United States 1 <= 200% of Poverty 17.2
2017 1 United States 2 <= 250% of Poverty 16.5
2017 1 United States 3 <= 138% of Poverty 17.4
2017 1 United States 4 <= 400% of Poverty 14.2
2017 1 United States 5 138% to 400% of Poverty 12.6

We can also get this data at the state level for every state by changing region to "state:*":

sahie_states <- getCensus(name = "timeseries/healthins/sahie",
    vars = c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"), 
    region = "state:*", 
    time = 2017)
head(sahie_states)
time state NAME IPRCAT IPR_DESC PCTUI_PT
2017 01 Alabama 0 All Incomes 11.0
2017 02 Alaska 0 All Incomes 14.8
2017 04 Arizona 0 All Incomes 12.1
2017 05 Arkansas 0 All Incomes 9.3
2017 06 California 0 All Incomes 8.2
2017 08 Colorado 0 All Incomes 8.7

Finally, we can get county-level data. The geography metadata showed that we can choose to get county-level data within states. We’ll use region to specify county-level results and regionin to request data for Alabama and Alaska.

sahie_counties <- getCensus(name = "timeseries/healthins/sahie",
    vars = c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"), 
    region = "county:*", 
    regionin = "state:01,02", 
    time = 2017)
head(sahie_counties, n=12L)
time state county NAME IPRCAT IPR_DESC PCTUI_PT
2017 01 003 Baldwin County, AL 0 All Incomes 11.3
2017 01 001 Autauga County, AL 0 All Incomes 8.7
2017 01 015 Calhoun County, AL 0 All Incomes 11.9
2017 01 005 Barbour County, AL 0 All Incomes 12.2
2017 01 007 Bibb County, AL 0 All Incomes 10.2
2017 01 009 Blount County, AL 0 All Incomes 13.4
2017 01 011 Bullock County, AL 0 All Incomes 11.4
2017 01 013 Butler County, AL 0 All Incomes 11.2
2017 01 027 Clay County, AL 0 All Incomes 13.9
2017 01 017 Chambers County, AL 0 All Incomes 11.9
2017 01 019 Cherokee County, AL 0 All Incomes 11.2
2017 01 021 Chilton County, AL 0 All Incomes 13.8
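The regionin string above is just a geography level, a colon, and comma-separated FIPS codes. If you keep codes in a vector, a small hypothetical helper (geo_codes, not part of censusapi) can build these strings and zero-pad numeric codes; the default width of 2 assumes state codes.

```r
# Hypothetical helper (not part of censusapi): build a "level:codes" string
# such as "state:01,02", zero-padding numeric codes to `width` digits.
geo_codes <- function(level, codes, width = 2) {
    if (is.numeric(codes)) {
        codes <- formatC(as.integer(codes), width = width, flag = "0")
    }
    paste0(level, ":", paste(codes, collapse = ","))
}

geo_codes("state", c(1, 2))         # "state:01,02"
geo_codes("county", "*")            # "county:*"
geo_codes("county", 3, width = 3)   # "county:003"
```

The result can be passed as region or regionin, for example regionin = geo_codes("state", c(1, 2)).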

Because the SAHIE API is a timeseries (as indicated in its name), we can get multiple years of data at once using the time argument.

sahie_years <- getCensus(name = "timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT"), 
    region = "state:01", 
    time = "from 2006 to 2017")
head(sahie_years)
time state NAME PCTUI_PT
2006 01 Alabama 15.7
2007 01 Alabama 14.6
2008 01 Alabama 15.3
2009 01 Alabama 15.8
2010 01 Alabama 16.9
2011 01 Alabama 16.6

Advanced topics

This package allows access to the full range of the U.S. Census Bureau’s APIs. Where the API allows it, you can specify complicated geographies or filter based on a range of parameters. Each API is a little different, so be sure to read the documentation for the specific API that you’re using. Also see more examples in the example masterlist.

Miscellaneous parameters

Some of the APIs allow complex calls, including specifying a country FIPS code or age. The most commonly used parameters, including time, date, and sic, are included as built-in options in getCensus, but you can also specify other parameters yourself. (Note: this generally does not apply to the popular American Community Survey and Decennial Census APIs.)

In the SAHIE API, we can filter data by the categorical variables AGECAT (age group), IPRCAT (income group), RACECAT (race) and SEXCAT (sex), in addition to geography and time. More information on those variables is available in the online documentation.

Here’s how to get the uninsured rate (PCTUI_PT) for non-elderly adults (AGECAT = 1) with incomes of 138 to 400% of the poverty line (IPRCAT = 5), by race (RACECAT) and state.

sahie_nonelderly <- getCensus(name = "timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT", "IPR_DESC", "AGE_DESC", "RACECAT", "RACE_DESC"), 
    region = "state:*", 
    time = 2017,
    IPRCAT = 5,
    AGECAT = 1)
head(sahie_nonelderly)
time state NAME PCTUI_PT IPR_DESC AGE_DESC RACECAT RACE_DESC IPRCAT AGECAT
2017 01 Alabama 14.6 138% to 400% of Poverty 18 to 64 years 0 All Races 5 1
2017 02 Alaska 24.3 138% to 400% of Poverty 18 to 64 years 0 All Races 5 1
2017 04 Arizona 16.6 138% to 400% of Poverty 18 to 64 years 0 All Races 5 1
2017 05 Arkansas 12.4 138% to 400% of Poverty 18 to 64 years 0 All Races 5 1
2017 06 California 13.6 138% to 400% of Poverty 18 to 64 years 0 All Races 5 1
2017 08 Colorado 14.6 138% to 400% of Poverty 18 to 64 years 0 All Races 5 1

Note: data by race is only returned where the population is large enough, so some states will not have rows for some race groups. Here’s another example, getting national data for percent uninsured (PCTUI_PT) and number uninsured (NUI_PT), along with the associated margins of error, for non-elderly adults (AGECAT = 1) by race group and income group for 2006 through 2017.

sahie_nonelderly_annual <- getCensus(name = "timeseries/healthins/sahie",
    vars = c("NAME", "PCTUI_PT", "PCTUI_MOE", "NUI_PT", "NUI_MOE", "IPRCAT", "IPR_DESC", "AGE_DESC", "RACECAT", "RACE_DESC"), 
    region = "us:*", 
    time = "from 2006 to 2017",
    AGECAT = 1)
head(sahie_nonelderly_annual)
time us NAME PCTUI_PT PCTUI_MOE NUI_PT NUI_MOE IPRCAT IPR_DESC AGE_DESC RACECAT RACE_DESC AGECAT
2006 1 United States 19.5 0.3 36363986 549708 0 All Incomes 18 to 64 years 0 All Races 1
2006 1 United States 39.5 0.7 19368368 440544 1 <= 200% of Poverty 18 to 64 years 0 All Races 1
2006 1 United States 36.5 0.6 23595529 455801 2 <= 250% of Poverty 18 to 64 years 0 All Races 1
2006 1 United States 13.7 0.3 17094552 364661 0 All Incomes 18 to 64 years 1 White alone, not Hispanic 1
2006 1 United States 32.0 0.8 7846458 277188 1 <= 200% of Poverty 18 to 64 years 1 White alone, not Hispanic 1
2006 1 United States 28.5 0.7 9614431 291480 2 <= 250% of Poverty 18 to 64 years 1 White alone, not Hispanic 1
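Because some state and race combinations can be missing from results like sahie_nonelderly, it can be useful to list the gaps explicitly. The sketch below uses a hypothetical helper (missing_combos, not part of censusapi) that builds every possible state and RACECAT pairing with expand.grid and keeps the pairs that do not appear in the data. It assumes the result has state and RACECAT columns, as in the state-level output shown earlier.

```r
# Hypothetical helper (not part of censusapi): return the state/RACECAT
# pairs that are absent from a getCensus result.
missing_combos <- function(df, states, races) {
    # Every possible pairing of the requested states and race categories
    all_pairs <- expand.grid(state = states, RACECAT = races,
                             stringsAsFactors = FALSE)
    # Compare pairs as "state race" keys so column types need not match exactly
    present <- paste(df$state, df$RACECAT)
    wanted <- paste(all_pairs$state, all_pairs$RACECAT)
    all_pairs[!wanted %in% present, , drop = FALSE]
}

# Example with a result fetched earlier (network access required):
# missing_combos(sahie_nonelderly,
#     states = unique(sahie_nonelderly$state),
#     races = unique(sahie_nonelderly$RACECAT))
```

An empty data frame means every state has a row for every race category present in the result.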

Other APIs can be filtered too. For example, the International Data Base population projections APIs allow you to get data by age and country.

See what variables the IDB 1-year API allows:

listCensusMetadata(name = "timeseries/idb/1year", 
    type = "variables")
name | label | concept | predicateType | group | limit | required
AREA_KM2 | Area in square kilometers | Geographic Characteristics | int | N/A | 0 | NA
FIPS | FIPS country/area code | Geographic Characteristics | string | N/A | 0 | NA
NAME | Country or area name | Geographic Characteristics | string | N/A | 0 | NA
AGE | Single year of age from 0-100+ | Age and Sex | int | N/A | 0 | true
SEX | Sex | Age and Sex | int | N/A | 0 | default displayed
POP | Total mid-year population | Total Midyear Population | int | N/A | 0 | NA
YR | Year | Required variable | int | N/A | 0 | NA

Here’s a simple call getting projected population by age for all countries in 2050.

pop_2050 <- getCensus(name = "timeseries/idb/1year",
    vars = c("FIPS", "NAME", "AGE", "POP"),
    time = 2050)
head(pop_2050)
time FIPS NAME AGE POP
2050 AA Aruba 0 1554
2050 AA Aruba 1 1554
2050 AA Aruba 2 1551
2050 AA Aruba 3 1554
2050 AA Aruba 4 1550
2050 AA Aruba 5 1553

But we can make a much more specific call by specifying FIPS and AGE to get just the population projections for teenagers in Portugal.

pop_portugal <- getCensus(name = "timeseries/idb/1year",
    vars = c("NAME", "POP"),
    time = 2050,
    FIPS = "PO",
    AGE = "13:19")
pop_portugal
time NAME POP FIPS AGE
2050 Portugal 82014 PO 13
2050 Portugal 82573 PO 14
2050 Portugal 83083 PO 15
2050 Portugal 83540 PO 16
2050 Portugal 83812 PO 17
2050 Portugal 83919 PO 18
2050 Portugal 83880 PO 19

The Quarterly Workforce Indicators APIs allow even more specific calls. Here’s one example:

qwi <- getCensus(name = "timeseries/qwi/sa",
    region = "state:02",
    vars = c("Emp", "sex"),
    year = 2012,
    quarter = 1,
    agegrp = "A07",
    ownercode = "A05",
    firmsize = 1,
    seasonadj = "U",
    industry = 21)
qwi
Emp sex year quarter agegrp ownercode firmsize seasonadj industry state
61 0 2012 1 A07 A05 1 U 21 02
55 1 2012 1 A07 A05 1 U 21 02
6 2 2012 1 A07 A05 1 U 21 02

Variable groups

For some surveys, particularly the American Community Survey and Decennial Census, you can get many related variables at once using a group, defined by the Census Bureau. In some other data tools, like the now-retired American FactFinder, this idea is referred to as a table.

The American Community Survey (ACS) APIs include estimates (variable names ending in “E”), annotations, margins of error, and statistical significance, depending on the data set. Read more on ACS variable types and annotation symbol meanings on the Census website.

You can retrieve these annotation variables manually, by specifying a list of variables. We’ll get the estimate, margin of error and annotations for median household income in the past 12 months for Census tracts in Alaska.

acs_income <- getCensus(name = "acs/acs5",
    vintage = 2017, 
    vars = c("NAME", "B19013_001E", "B19013_001EA", "B19013_001M", "B19013_001MA"), 
    region = "tract:*", 
    regionin = "state:02")
head(acs_income)
state county tract NAME B19013_001E B19013_001EA B19013_001M B19013_001MA
02 261 000300 Census Tract 3, Valdez-Cordova Census Area, Alaska 89000 NA 20435 NA
02 122 000600 Census Tract 6, Kenai Peninsula Borough, Alaska 58125 NA 5725 NA
02 122 001100 Census Tract 11, Kenai Peninsula Borough, Alaska 69028 NA 5941 NA
02 261 000100 Census Tract 1, Valdez-Cordova Census Area, Alaska 49076 NA 7165 NA
02 122 000200 Census Tract 2, Kenai Peninsula Borough, Alaska 57694 NA 6526 NA
02 122 000800 Census Tract 8, Kenai Peninsula Borough, Alaska 50904 NA 3723 NA
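Because these suffixes follow a fixed pattern (E for the estimate, EA for its annotation, M for the margin of error, and MA for its annotation), the variable vector above can be generated rather than typed. The helper below is a hypothetical convenience, not a censusapi function, and it assumes every variable ID you pass has all four variants in the dataset you query.

```r
# Hypothetical helper (not part of censusapi): expand ACS variable IDs into
# estimate, estimate annotation, margin of error, and MOE annotation names,
# keeping the four variants of each ID together.
acs_with_annotations <- function(ids) {
    suffixes <- c("E", "EA", "M", "MA")
    # outer() gives one row per ID; transposing before flattening keeps
    # each ID's four names adjacent
    as.vector(t(outer(ids, suffixes, paste0)))
}

acs_with_annotations("B19013_001")
# returns "B19013_001E" "B19013_001EA" "B19013_001M" "B19013_001MA"
```

With it, the earlier request could use vars = c("NAME", acs_with_annotations("B19013_001")).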

You can also retrieve the estimates, margins of error, and annotations for a whole group of variables in one command. Here’s the group call for that same table, B19013.

# See descriptions of the variables in group B19013
group_B19013 <- listCensusMetadata(name = "acs/acs5",
    vintage = 2017,
    type = "variables",
    group = "B19013")
group_B19013
name | label | concept | predicateType | group | limit | predicateOnly
B19013_001E | Estimate!!Median household income in the past 12 months (in 2017 inflation-adjusted dollars) | MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS) | int | B19013 | 0 | TRUE
B19013_001M | Margin of Error!!Median household income in the past 12 months (in 2017 inflation-adjusted dollars) | MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS) | int | B19013 | 0 | TRUE
B19013_001EA | Annotation of Estimate!!Median household income in the past 12 months (in 2017 inflation-adjusted dollars) | NA | string | B19013 | 0 | TRUE
B19013_001MA | Annotation of Margin of Error!!Median household income in the past 12 months (in 2017 inflation-adjusted dollars) | NA | string | B19013 | 0 | TRUE

acs_income_group <- getCensus(name = "acs/acs5", 
    vintage = 2017, 
    vars = c("NAME", "group(B19013)"), 
    region = "tract:*", 
    regionin = "state:02")
#> Warning in responseFormat(raw): NAs introduced by coercion
head(acs_income_group)
state county tract NAME GEO_ID B19013_001E B19013_001M NAME_1 B19013_001EA B19013_001MA
02 261 000300 Census Tract 3, Valdez-Cordova Census Area, Alaska 1400000US02261000300 89000 20435 NA NA NA
02 122 000600 Census Tract 6, Kenai Peninsula Borough, Alaska 1400000US02122000600 58125 5725 NA NA NA
02 122 001100 Census Tract 11, Kenai Peninsula Borough, Alaska 1400000US02122001100 69028 5941 NA NA NA
02 261 000100 Census Tract 1, Valdez-Cordova Census Area, Alaska 1400000US02261000100 49076 7165 NA NA NA
02 122 000200 Census Tract 2, Kenai Peninsula Borough, Alaska 1400000US02122000200 57694 6526 NA NA NA
02 122 000800 Census Tract 8, Kenai Peninsula Borough, Alaska 1400000US02122000800 50904 3723 NA NA NA

Some variable groups contain many related variables and their associated annotations. As an example, we’ll get the list of variables included in group B17020, poverty status by age.

group_B17020 <- listCensusMetadata(name = "acs/acs5",
    vintage = 2017,
    type = "variables",
    group = "B17020")
head(group_B17020)
name | label | concept | predicateType | group | limit | predicateOnly
B17020_002M | Margin of Error!!Total!!Income in the past 12 months below poverty level | POVERTY STATUS IN THE PAST 12 MONTHS BY AGE | int | B17020 | 0 | TRUE
B17020_002E | Estimate!!Total!!Income in the past 12 months below poverty level | POVERTY STATUS IN THE PAST 12 MONTHS BY AGE | int | B17020 | 0 | TRUE
B17020_001M | Margin of Error!!Total | POVERTY STATUS IN THE PAST 12 MONTHS BY AGE | int | B17020 | 0 | TRUE
B17020_001E | Estimate!!Total | POVERTY STATUS IN THE PAST 12 MONTHS BY AGE | int | B17020 | 0 | TRUE
B17020_004M | Margin of Error!!Total!!Income in the past 12 months below poverty level!!6 to 11 years | POVERTY STATUS IN THE PAST 12 MONTHS BY AGE | int | B17020 | 0 | TRUE
B17020_004E | Estimate!!Total!!Income in the past 12 months below poverty level!!6 to 11 years | POVERTY STATUS IN THE PAST 12 MONTHS BY AGE | int | B17020 | 0 | TRUE
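Labels in these groups encode a hierarchy with !! separators, and the rows are not returned in variable order (B17020_002M comes before B17020_001E above). The sketch below splits each label into its levels and reports the depth and the most specific level. split_acs_label is a hypothetical helper, not part of censusapi, and the example labels are copied from the output above.

```r
# Hypothetical helper (not part of censusapi): split "!!"-separated ACS labels
# into levels. `labels` is a character vector named by variable ID.
split_acs_label <- function(labels) {
    pieces <- strsplit(as.character(labels), "!!", fixed = TRUE)
    data.frame(
        name = names(labels),
        depth = lengths(pieces),  # number of levels in each label
        leaf = vapply(pieces, function(x) x[length(x)], character(1),
                      USE.NAMES = FALSE),  # most specific level
        stringsAsFactors = FALSE
    )
}

labels <- c(
    B17020_001E = "Estimate!!Total",
    B17020_002E = "Estimate!!Total!!Income in the past 12 months below poverty level",
    B17020_004E = "Estimate!!Total!!Income in the past 12 months below poverty level!!6 to 11 years"
)
split_acs_label(labels)
```

On the full metadata you could first sort with group_B17020[order(group_B17020$name), ] and then call split_acs_label(setNames(group_B17020$label, group_B17020$name)).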

Advanced geographies

Some geographies, particularly Census tracts and blocks, need to be specified within larger geographies like states and counties. This varies by API endpoint, so make sure to read the documentation for your specific API and run listCensusMetadata to see the available geographies.

You may want to get data for many geographies that require a parent geography. For example, tract-level data from the 1990 Decennial Census can only be requested from one state at a time.

In this example, we use censusapi’s built-in fips vector of state FIPS codes (the 50 states plus DC) to request tract-level data from each state and combine the results into a single data frame.

fips
#>  [1] "01" "02" "04" "05" "06" "08" "09" "10" "11" "12" "13" "15" "16" "17"
#> [15] "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31"
#> [29] "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42" "44" "45" "46"
#> [43] "47" "48" "49" "50" "51" "53" "54" "55" "56"
# Request tract-level data one state at a time, then stack the results
tracts <- lapply(fips, function(f) {
    getCensus(name = "sf3",
        vintage = 1990,
        vars = c("P0070001", "P0070002", "P114A001"),
        region = "tract:*",
        regionin = paste0("state:", f))
})
tracts <- do.call(rbind, tracts)
head(tracts)
state county tract P0070001 P0070002 P114A001
01 001 020100 944 917 11663
01 001 020200 917 1060 8555
01 001 020300 1451 1518 11782
01 001 020400 2166 2223 15323
01 001 020500 1604 1582 14522
01 001 020600 1784 1661 10630

The regionin argument of getCensus can also be used with a string of nested geographies, as shown below.

The 2010 Decennial Census summary file 1 requires you to specify a state and county to retrieve block-level data. Use region to request block-level data and regionin to specify the parent geographies, joined with +; this example narrows the request to a single tract within that state and county.

data2010 <- getCensus(name = "dec/sf1",
    vintage = 2010,
    vars = "P001001", 
    region = "block:*",
    regionin = "state:36+county:027+tract:010000")
head(data2010)
state county tract block P001001
36 027 010000 1000 31
36 027 010000 1011 17
36 027 010000 1028 41
36 027 010000 1001 0
36 027 010000 1031 0
36 027 010000 1002 4
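When the parent geographies are stored in variables, the nested regionin string can be assembled programmatically. This is a minimal sketch with a hypothetical helper (nest_regionin, not part of censusapi) that joins named codes, in the order given, into the level:code+level:code form used above.

```r
# Hypothetical helper (not part of censusapi): join named parent geographies,
# in the order supplied, into a nested regionin string.
nest_regionin <- function(...) {
    parts <- c(...)  # named character vector, e.g. c(state = "36")
    paste(paste0(names(parts), ":", parts), collapse = "+")
}

nest_regionin(state = "36", county = "027", tract = "010000")
# returns "state:36+county:027+tract:010000"
```

Order matters here: list the largest geography first, as in the example call.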

Troubleshooting

The Census APIs include hundreds of endpoints and dozens of datasets, each of which works a little differently. The Census Bureau also makes frequent updates, which unfortunately are not always announced in advance. If you’re getting an error message or unexpected results, here are some things to check.

Variables

Use listCensusMetadata(type = "variables") on your API to see the table of available variables.

  • Occasionally the variable names will change with data updates or API updates. The names may be different from year to year.
  • The Census APIs are case-sensitive, which means that if the variable name you want is uppercase you’ll need to write it uppercase in your request. Most of the APIs use uppercase variable names, but some use lowercase and some even use sentence case.
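A quick way to catch both problems before calling the API is to compare your requested names against the metadata. The hypothetical check_vars helper below (not part of censusapi) assumes a data frame with a name column, like the one listCensusMetadata(type = "variables") returns, and suggests an available name that differs only in letter case.

```r
# Hypothetical checker (not part of censusapi): report requested variables
# missing from a metadata table, with a case-insensitive suggestion if any.
check_vars <- function(wanted, metadata) {
    available <- as.character(metadata$name)
    missing <- setdiff(wanted, available)
    # NA when no available name matches ignoring case
    suggestion <- available[match(tolower(missing), tolower(available))]
    data.frame(requested = missing, suggestion = suggestion,
               stringsAsFactors = FALSE)
}

# Example with metadata fetched earlier (network access required):
# check_vars(c("NAME", "pctui_pt"), sahie_vars)
```

A zero-row result means every requested name was found exactly as written.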

Geographies

Use listCensusMetadata(type = "geography") on your API to check which geographies you can use.

  • Each API has its own list of valid geographies and they occasionally change as the Census Bureau makes updates. If a previously available geography isn’t available anymore, email cnmp.developers.list@census.gov detailing the issue.
  • If you’re specifying a region by FIPS code, for example state:01, make sure to use the full code, padded with 0s if necessary. The APIs did not always enforce this (previously, state:1 usually worked), but now they do. See the Census reference files for valid FIPS codes.
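To spot unpadded codes before they reach the API, you can check each code's length against the usual width for its level. The sketch below is hypothetical (unpadded_codes is not part of censusapi) and only knows four levels, with assumed widths of 2 digits for state, 3 for county, 6 for tract, and 4 for block, matching the codes in the examples above; other levels are skipped.

```r
# Assumed code widths for a few common geography levels
expected_width <- c(state = 2, county = 3, tract = 6, block = 4)

# Hypothetical checker (not part of censusapi): return "level:code" entries in
# a region or regionin string whose codes are shorter than expected.
unpadded_codes <- function(region) {
    # Nested strings like "state:36+county:027" are split at "+"
    parts <- strsplit(region, "+", fixed = TRUE)[[1]]
    bad <- character(0)
    for (part in parts) {
        level <- sub(":.*$", "", part)
        codes <- strsplit(sub("^[^:]*:", "", part), ",", fixed = TRUE)[[1]]
        width <- expected_width[level]
        if (is.na(width)) next  # level not in the lookup table: skip it
        short <- codes[codes != "*" & nchar(codes) < width]
        if (length(short) > 0) bad <- c(bad, paste0(level, ":", short))
    }
    bad
}

unpadded_codes("state:1")                          # "state:1"
unpadded_codes("state:36+county:027+tract:010000") # character(0)
```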

Unexpected errors

Occasionally you might get the general error message "There was an error while running your query. We've logged the error and we'll correct it ASAP. Sorry for the inconvenience." This comes from the Census Bureau and could be caused by any number of problems, including server issues. Try rerunning your API call. If that doesn’t work and you are requesting a large amount of data, try reducing the amount that you’re requesting, for example getting only one state at a time. If you’re still having trouble, email cnmp.developers.list@census.gov. Include in your email the raw API call that’s provided in your getCensus error message (not your R code) so that they can try to help.
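Since rerunning is often enough for these server-side errors, you might wrap calls in a small retry loop. The sketch below is a hypothetical wrapper (with_retries is not part of censusapi) that reruns a request function a few times, waiting a little longer after each failure, and then raises the last error message so the raw API call it contains is still visible.

```r
# Hypothetical wrapper (not part of censusapi): call `request` up to `attempts`
# times, sleeping wait, 2 * wait, ... seconds between failures, and re-raise
# the last error message if every attempt fails.
with_retries <- function(request, attempts = 3, wait = 2) {
    stopifnot(attempts >= 1)
    for (i in seq_len(attempts)) {
        result <- tryCatch(request(), error = function(e) e)
        if (!inherits(result, "error")) {
            return(result)
        }
        if (i < attempts) {
            Sys.sleep(wait * i)
        }
    }
    stop(conditionMessage(result), call. = FALSE)
}

# Example (network access required, so left commented out):
# sahie_states <- with_retries(function() getCensus(
#     name = "timeseries/healthins/sahie",
#     vars = c("NAME", "PCTUI_PT"), region = "state:*", time = 2017))
```

Passing a function rather than the call itself delays evaluation, so each attempt sends a fresh request.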

Other ways to get help

Additional resources

Disclaimer

This product uses the Census Bureau Data API but is not endorsed or certified by the Census Bureau.