Data and Code

W-NOMINATE and the Constitutional Convention of 1787

In my paper “Ideology and Participation at the Constitutional Convention of 1787,” I discuss the replication of W-NOMINATE scores for delegates at the Federal Convention of 1787. Delegate W-NOMINATE scores were originally computed by Jac C. Heckelman and Keith Dougherty in “A Spatial Analysis of Delegate Voting at the Constitutional Convention” (2013), based on work outlined here, here, and here, using the W-NOMINATE Program developed by Keith Poole and Howard Rosenthal (for more on W-NOMINATE, see The W-NOMINATE Program). However, Heckelman and Dougherty do not provide replication code, and NOMINATE is finicky to implement. I have written a walk-through on how to compute the delegate W-NOMINATE scores, which is available for download here.

The U.S. Appeals Courts Database

In “The Beliefs and Behaviors of Appellate Court Judges,” Doug M. Johnson and I use the Songer U.S. Appeals Courts Database to generate judge-case observations for our analysis. “Creating Judge-Case Observations from the Songer Court of Appeals Database” details how this is done. “CoAJudgeCaseCreation.R” provides an R file to execute the data transformation.
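For readers unfamiliar with this kind of transformation, the general idea of turning one-row-per-case data into one-row-per-judge-per-case data can be sketched as below. This is only an illustration in Python, not the contents of “CoAJudgeCaseCreation.R”; the column names (case_id, judge1, vote1, and so on) and the sample rows are hypothetical and do not come from the Songer codebook.

```python
# Hypothetical case-level rows: one row per case, with one judge/vote
# column pair per panel seat. Names and values are invented for illustration.
cases = [
    {"case_id": "A1", "judge1": 101, "vote1": 1,
     "judge2": 102, "vote2": 1, "judge3": 103, "vote3": 0},
    {"case_id": "B2", "judge1": 102, "vote1": 0,
     "judge2": 104, "vote2": 0, "judge3": None, "vote3": None},
]


def to_judge_case(rows, seats=(1, 2, 3)):
    """Reshape one-row-per-case records into one-row-per-judge-per-case records."""
    out = []
    for row in rows:
        for seat in seats:
            judge = row.get(f"judge{seat}")
            if judge is None:
                # Skip empty seats (for example, a missing judge code).
                continue
            out.append({
                "case_id": row["case_id"],
                "judge_id": judge,
                "seat": seat,
                "vote": row.get(f"vote{seat}"),
            })
    return out


judge_case = to_judge_case(cases)
for record in judge_case:
    print(record)
```

Each case contributes one output row per seated judge, so case-level covariates can later be merged back on case_id and judge-level covariates on judge_id.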

Miscellaneous Work

Recently I took a quick break from my research to play around with some fun data. I wanted to look at the relationship between 311 calls in the City of New York, the weather, and economic performance. The purpose of the project is to see if 311 call volume can be predicted. If call volume and complaint types can be accurately predicted, there are several potential applications. The most immediate is for the City itself. If the City knows to expect a rush of calls or, even better, a specific type of call, it can better staff its call center and prepare response teams. Private companies might also be able to improve their targeted marketing. For example, I found that rodent complaints spike at regular intervals (see the plot titled “311 Complaint Types Across 2013” here). By finding what predicts these spikes, companies that provide extermination services, or that sell products to address rodent problems, can increase their marketing efforts precisely when demand is higher, optimizing their marketing budgets. These spikes might also be a good time for pet adoption agencies to increase their efforts, as more people might be interested in adopting cats out of shelters to deal with mice and rats.

I gathered data from three sources. The first is all 311 calls in New York City in 2013, available for download at NYC OpenData (specifically here). This is a 124 MB .csv file. Looking through the 311 data, I noticed that the most common call type concerned heating. Thus I thought it important to collect weather data. I downloaded daily weather reports for New York City from 2013 via NOAA’s National Centers for Environmental Information, which provided a 0.54 MB .csv file. Finally, I thought that economic performance might influence how people feel about their current situation. If the economy is doing well, perhaps they will be slightly less cranky or less easily annoyed and therefore less likely to call 311. To capture economic conditions, I use daily information on the performance of the Dow Jones Industrial Average (DJIA). To get this data, I use Quandl’s native R package and the data described here. To visually assess the three data sources, please see my figure here.

Once I had all of my data, I subset it to only Manhattan-based 311 calls. I then created daily counts of 311 calls to use as my dependent variable. Next, I fit a basic regression model using the change in the day’s DJIA, the day’s DJIA volume, the minimum observed temperature, and the amount of precipitation. While I found no effect for change in DJIA or DJIA volume, I did find that lower temperatures significantly increased the number of 311 calls. Serial autocorrelation is of course an issue in these time series data, and I checked the robustness of my results accordingly. Although I found no effect for the economic indicator I used, I believe the project is still worth pursuing for the following reasons. First, I did find a significant predictor of 311 call volume. Second, I did not disaggregate complaint types in my analysis; heating complaints are the single most common type of 311 complaint, and it is likely that my model results are being driven by them. More nuanced investigation might reveal that different classes of complaints are predicted by different identifiable events and circumstances.
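The two mechanical steps in this workflow, building daily counts and regressing them on a predictor, can be sketched as follows. This is a deliberately simplified illustration in Python and not the R code used for the project: the records and temperatures are invented, the field layout is hypothetical, and the regression uses a single predictor (minimum temperature) rather than the four described above. It also does nothing about serial autocorrelation.

```python
from collections import Counter

# Hypothetical 311 records as (date, borough) pairs. Real data would be
# read from the NYC OpenData .csv file; these values are invented.
calls = [
    ("2013-01-01", "MANHATTAN"), ("2013-01-01", "MANHATTAN"),
    ("2013-01-01", "BROOKLYN"), ("2013-01-01", "MANHATTAN"),
    ("2013-01-02", "MANHATTAN"), ("2013-01-02", "MANHATTAN"),
    ("2013-01-03", "MANHATTAN"), ("2013-01-03", "QUEENS"),
]

# Hypothetical daily minimum temperatures (degrees Fahrenheit), keyed by date.
min_temp = {"2013-01-01": 20.0, "2013-01-02": 30.0, "2013-01-03": 40.0}

# Step 1: subset to Manhattan and count calls per day (the dependent variable).
daily_counts = Counter(date for date, borough in calls if borough == "MANHATTAN")


def simple_ols(x, y):
    """Return (intercept, slope) of a one-predictor least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Step 2: align counts with the predictor by date and fit the line.
dates = sorted(daily_counts)
x = [min_temp[d] for d in dates]
y = [daily_counts[d] for d in dates]
intercept, slope = simple_ols(x, y)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

A negative slope in this toy data mirrors the direction of the finding above (colder days, more calls), but the numbers themselves are meaningless. Note that days with no Manhattan calls would be absent from the counter and would need to be filled in as zeros in a real analysis.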

R code for this project, along with code for the other projects mentioned here, can be found on my GitHub at github.com/DavidAGelman.