Saturday, March 4, 2017

Using a spreadsheet to calculate the distance between two points

This article describes how to use my spreadsheet tool to calculate the distance between two points.



We will use data from the US Census Bureau's 2010 Gazetteer files.  To follow allow with this demonstration, click this link and then click Census Tracts (the fifth option) and then select California.  (Or just click this link to be taken directly to the California data.)  Save the file to your hard drive.  (You might have to right click on the page and select "save as".)  Then open up MS Excel and select "Open" (or CNTL+O) and open this file.  (You may have to change the selection from "All Excel Files" to "All Files" to view the TXT file you saved.)

After you select the census_tracts_list_06.txt file, the Excel Import Wizard will open.  Select "Delimited" at step 1, "Tab" at step 2, and then "Finish". You will now see something like this:



Each row of this file contains data from one California Census tract.  For this example we will only be using data from San Francisco and you can download the Gazetteer data for these tracts here

Column C contains population, column G contains land area, and columns I and J contains the latitude and longitude for the point near the center of the tract.

San Francisco city hall is located at the following geocode: 37.779260 -122.419265.  You can verify this by typing San Francisco city hall into the Google Earth search bar, and noting the geocode that it returns after centering the map on the point.

We will calculate the distance from each SF Census Tract to City Hall.  To do this, download my "Distance Calculator" by clicking here.  (This distance calculator contains some instructions and references in the notes calculator, if you are interested to learn how it works.)

To use the calculator, replace the data in columns D and E of the calculator with the geocodes from the Gazeetter file (columns I and J).  Then replace the msa_center lat and lon (columns G and H of the distance calculator) with the geocode for SF city hall, as shown below.  You can ignore columns F, I, J, K, L and M.  The distance between the two points is shown in the last column N.  The last step you need to do is apply the calculations to all the tracts (initially it calculates them only for the first dozen or so.)  To do this, simply left-click on the little black square on the bottom right corner of the highlighted cells, and drag it down until you reach the last cell of data.


 You can now copy the distance from the calculator and paste it into the SF_tracts file.  (Make sure when you paste you right click and select paste special then choose values.)  This will look as shown below:



Note I have also added a population density variable in column L, which is produced by dividing the population by land area (the syntax for this calculation is shown in the formula bar of the image above.)

Now that we've added distance from each tract to city hall, and we have calculated pop density for each tract, we can create a scatter plot, showing density on the y-axis and distance on the x-axis.  Note there is a negative relationship between these variables as indicated by the negative slope on the trend line.  The tracts with the highest population density are near city hall, and the density tends to decline as tracts are farther away.  This is a common pattern we observe across city, but urban economists are interested in comparing how steep these "density gradients" are across cities.  Often density gradients are estimated by first taking the natural log of density and distance, and regressing log density on log distance.



Click here to read an article I wrote on determining the central point of U.S. cities.  In that article I provide a link to a data file containing geocodes for the central city of each US metropolitan area.

No comments:

Post a Comment