Wednesday, September 25, 2019

Downloading Census Micro Data: IPUMS or Census.gov

My current book project reviews studies that use the American Community Survey (ACS) micro data in a way that sheds relevant light on contemporary social controversies, illustrates best statistical practices, or both.

Most of the studies I include in the book use data from a database at the University of Minnesota called "IPUMS" (which stands for the Integrated Public Use Microdata Series). It is also possible, at least for some years, to download the ACS data directly from the Census Bureau's web page. Which option is preferable? In this post I provide some answers to this question and share a data file illustrating all the variables available when the ACS data are downloaded directly from the Census Bureau.

IPUMS is the favored source of Census micro data for most researchers because it is much more user friendly. An IPUMS user simply selects variables and years and downloads a single file, whereas a Census.gov user must, at a minimum, download two files (one with person records and one with household records) and merge them together. A user who wants data from all states must also append two data files, because the Census Bureau splits the nationwide ACS files in two due to their large size.
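
To make the extra work concrete, here is a minimal Python sketch of the append-and-merge step for files downloaded from Census.gov. It assumes the person and household files share the SERIALNO household identifier; the other column names and the tiny in-memory "files" are invented for illustration and stand in for the real downloads, which are far larger.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of dicts (one dict per record)."""
    return list(csv.DictReader(io.StringIO(text)))


def append_parts(*parts):
    """Stack several lists of records (e.g. the two nationwide halves)."""
    rows = []
    for part in parts:
        rows.extend(part)
    return rows


def merge_person_household(persons, households, key="SERIALNO"):
    """Attach each person's household variables via the shared key."""
    by_key = {h[key]: h for h in households}
    merged = []
    for p in persons:
        h = by_key.get(p[key])
        if h is None:
            continue  # person with no matching household record
        row = dict(h)   # household variables first
        row.update(p)   # then person variables
        merged.append(row)
    return merged


# Toy stand-ins for the two person files and two household files
# (made-up values, not real ACS records).
persons_a = "SERIALNO,SPORDER,AGEP\n1,1,34\n1,2,36\n"
persons_b = "SERIALNO,SPORDER,AGEP\n2,1,61\n"
households_a = "SERIALNO,NP\n1,2\n"
households_b = "SERIALNO,NP\n2,1\n"

persons = append_parts(read_rows(persons_a), read_rows(persons_b))
households = append_parts(read_rows(households_a), read_rows(households_b))
merged = merge_person_household(persons, households)

for row in merged:
    print(row)
```

With real downloads, the same two steps apply, only with the actual CSV files read from disk. IPUMS performs the equivalent work before the extract is delivered.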

Because IPUMS is so much easier to use, I always get the data from IPUMS, but it can be useful to see the variables as produced by the Census Bureau. Here I provide links to various files that I have used in my teaching. First is a data file that I downloaded from the Census Bureau's website, which contains all of the person and household variables available in the 2015 ACS for one area in California (known as a "Public-Use Microdata Area," or PUMA; this one is PUMA 068511, an area near San Jose, CA). I also include the Codebook, which describes the meaning and coding of each of the variables in the XLSX file. Finally, it can be helpful to consult the Questionnaire to see precisely how the survey questions were phrased. By studying these files and documents, a researcher interested in using the ACS can see all the variables contained in the survey.

Keep in mind that IPUMS renames variables and sometimes recodes them. IPUMS also adds variables (such as DENSITY and an indicator for same-sex married couples). Thus there are big advantages to using IPUMS. I have heard from researchers using the Current Population Survey (CPS) that some variables are not available from IPUMS and can be obtained only by downloading the data directly (an example was the occupational licensing variable), but most of the time IPUMS is the preferable source.

