Friday, July 5, 2019

Reproducing Costa and Kahn (2011) in R

I just produced this R script that reproduces the results presented in the 2011 paper, "Electricity Consumption and Durable Housing: Understanding Cohort Effects" by Dora Costa and Matt Kahn.



The authors did their analysis in Stata, but I wanted to reproduce it using R. My R script uses the data supplied by the authors, available at the journal webpage here, which is why I'm calling it a reproduction rather than a replication.

A few of the R programming techniques that are illustrated in my R script are:
  • opening Stata data files in R
  • merging data from two sources
  • recoding variables
  • estimating clustered standard errors for individual-level cross-sectional data 
On this last point, I found Matthieu Gomez's web page, R for Stata Users, helpful in finding an R package (lfe) whose felm function estimates clustered standard errors in the same way as Stata's areg command, which is what Costa and Kahn used.
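
For readers who want a feel for these four techniques without opening the full script, here is a minimal sketch. It is not the code from my script: the data are simulated, and every variable name (hh_id, zip, county, year_built, kwh, wt, summer_temp) is made up for illustration. It assumes the haven and lfe packages are installed, and it writes a toy Stata file to a temporary location only so that there is something to read back in.

```r
# Sketch only: all variable names and data below are hypothetical.
library(haven)  # read_dta() / write_dta() for Stata .dta files
library(lfe)    # felm() for absorbed fixed effects and clustered SEs

set.seed(1)

# 1. Opening a Stata data file: write a toy .dta file, then read it back.
households <- data.frame(
  hh_id      = 1:200,
  zip        = rep(1:20, each = 10),
  year_built = sample(1950:2005, 200, replace = TRUE),
  kwh        = rlnorm(200, meanlog = 9, sdlog = 0.3),
  wt         = runif(200, 0.5, 2)
)
dta_path <- tempfile(fileext = ".dta")
write_dta(households, dta_path)
hh <- as.data.frame(read_dta(dta_path))

# 2. Merging data from a second source on a shared key (zip).
zips <- data.frame(
  zip         = 1:20,
  county      = rep(1:10, each = 2),
  summer_temp = rnorm(20, mean = 75, sd = 5)
)
dat <- merge(hh, zips, by = "zip")

# 3. Recoding: bin year built into cohorts, and make the ids factors.
dat$cohort <- cut(dat$year_built,
                  breaks = c(1949, 1969, 1989, 2005),
                  labels = c("1950-69", "1970-89", "1990-2005"))
dat$zip    <- factor(dat$zip)
dat$county <- factor(dat$county)

# 4. Weighted regression with zip fixed effects absorbed and standard
#    errors clustered by county. The felm formula has four parts:
#    outcome ~ covariates | absorbed factors | IV (0 = none) | cluster
fit <- felm(log(kwh) ~ cohort | zip | 0 | county,
            data = dat, weights = dat$wt)
summary(fit)
```

In Stata terms, the last step is roughly what areg does with absorb(zip) and standard errors clustered on county. The actual model in the paper uses different variables and many more controls.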

I also got the code (the Stata do file) supplied by the authors to run after a couple of minor modifications, and it produces the same results as in the paper, as does my reproduction in R. The only difference between their analysis in Stata and my analysis in R is that, for some reason I couldn't pinpoint, Stata drops two observations when using the weighting option, while R does not. This is a trivial difference, as N=139,343 in Stata and N=139,345 in R, but it does illustrate one way an analyst's choice of statistical package could in principle affect the results.
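
When a weighted model's sample size differs between packages, one generic first check is whether any rows have missing or non-positive weights, since software can differ in how it handles them. I have not established that this explains the two observations here; the sketch below only shows the kind of check I mean, using base R and made-up data in which wt stands in for whatever the weight variable is.

```r
# Hypothetical data; `wt` is a placeholder for the analysis weight.
dat <- data.frame(
  y  = c(2.1, 3.4, 1.9, 4.0, 2.8, 3.1),
  x  = c(1, 2, 1, 3, 2, 2),
  wt = c(1.2, 0, 0.8, NA, 1.0, 1.5)
)

# Flag rows whose weight is missing, zero, or negative.
suspect <- is.na(dat$wt) | dat$wt <= 0
table(suspect)
dat[suspect, ]

# Compare the rows supplied with the rows the fitted model actually kept.
fit <- lm(y ~ x, data = dat, weights = wt)
n_supplied <- nrow(dat)
n_used     <- nrow(model.frame(fit))
c(supplied = n_supplied, used = n_used)
```

Running the same comparison on the real data, and listing the flagged rows, would show whether the dropped observations share anything unusual in their weights.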







