"THE LUNAR EFFECT" AND ITS CONTRIBUTION TO CRIME IN CHICAGO
Northwestern: EECS 349 Machine Learning

BACKGROUND
Goal - Use Machine Learning to Better Predict Crime
Everyone has intuitions about crime. 'It happens at night.' 'Alleys are dangerous.' 'Homicide is rampant in Chicago.' Ian and Ben decided to separate fact from fiction and dig into crime data right here in Chicago. The goal of our project was to see if we could use past crime data to predict future crimes.

THE "LUNAR EFFECT"
According to an article titled "Full moon and crime" published in the British Medical Journal (1984), "incidences of crimes committed on full moon days was much higher than on all other days." This report was based on crime data collected from three different police stations in India over 4 years. Even more interesting, the same phenomenon is widely believed by ER doctors and police officers alike here in the US, where it is often referred to as the "Lunar Effect".
THE DATASET
Sources of information and preprocessing
The majority of the data used for analysis was taken directly from the City of Chicago website. In 2012, Mayor Rahm Emanuel issued the "Open Data Executive Order", which ensured the creation of an open online data portal - https://data.cityofchicago.org/. This data portal contains millions of reported crimes going back to 2001*. It is important to note that some of the details of the crimes are hidden (identities, age, gender, race, etc.). Also, some of the data is intentionally altered. For example, the GPS location of each crime is always changed, but still remains on the same city block. There's no telling how this alteration is done or how random it is.
We programmatically added an additional parameter related to the moon, "full moon" or "not full moon", to determine whether it increased prediction accuracy. The code we used to compute the lunar phase was based on open-source code by John Walker (http://www.fourmilab.ch/).
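To make the "full moon" flag concrete, here is a minimal Python sketch of one way such a label could be derived. It is not John Walker's algorithm and not the code used in the project: it assumes a simple mean-cycle approximation (a fixed synodic month counted from a reference new moon on 2000-01-06 18:14 UTC), and the function names and the one-day tolerance are hypothetical choices made for illustration. A mean-cycle estimate can be off by several hours compared with a proper astronomical calculation.

```python
from datetime import datetime, timezone

# Mean length of a synodic month (new moon to new moon), in days.
SYNODIC_MONTH_DAYS = 29.530588853
# Reference new moon used as the epoch (assumption of this sketch).
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)


def moon_age_days(when: datetime) -> float:
    """Approximate days since the most recent new moon.

    `when` must be timezone-aware; subtracting a naive datetime
    from the aware reference raises TypeError.
    """
    elapsed = (when - REFERENCE_NEW_MOON).total_seconds() / 86400.0
    return elapsed % SYNODIC_MONTH_DAYS


def moon_label(when: datetime, tolerance_days: float = 1.0) -> str:
    """Return 'Full Moon' when `when` is within the tolerance of mid-cycle."""
    full_moon_age = SYNODIC_MONTH_DAYS / 2.0
    if abs(moon_age_days(when) - full_moon_age) <= tolerance_days:
        return "Full Moon"
    return "Not Full Moon"


if __name__ == "__main__":
    print(moon_label(datetime(2018, 1, 2, 12, tzinfo=timezone.utc)))
    print(moon_label(datetime(2018, 1, 17, 12, tzinfo=timezone.utc)))
```

The tolerance decides how many days per cycle count as "full moon", so it directly controls how rare the positive label is in the data set.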
Overall, the parameters we used in the analysis are as follows:
-Day of Week: Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday
-Time of Day: Morning, Afternoon, Evening, and Night
-Location Description: Street, School, etc
-Latitude
-Longitude
-Moon: Full Moon or Not Full Moon
CLASSIFICATION
-Illinois Uniform Crime Reporting (IUCR) code: 401 unique codes describing the type of crime, such as theft < $500 or homicide - first degree murder
*For our analysis we used only 2017 data, for two reasons: we found that the rise and fall of crime follows an annual cadence, and providing any more data to Weka kept the program from completing execution.
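As an illustration of how the day-of-week and time-of-day parameters could be derived from a crime record's timestamp, here is a short Python sketch. The hour boundaries for Morning, Afternoon, Evening, and Night are assumptions (the project page does not state them), the timestamp format is assumed to look like "06/03/2018 11:30:00 PM", and the function names are hypothetical.

```python
from datetime import datetime

# Day names indexed by datetime.weekday() (Monday is 0).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def time_of_day(hour: int) -> str:
    """Map an hour (0-23) to one of four buckets; boundaries are assumed."""
    if 6 <= hour < 12:
        return "Morning"
    if 12 <= hour < 18:
        return "Afternoon"
    if 18 <= hour < 22:
        return "Evening"
    return "Night"


def date_features(raw: str, fmt: str = "%m/%d/%Y %I:%M:%S %p") -> dict:
    """Turn a raw timestamp string into the two nominal date features."""
    when = datetime.strptime(raw, fmt)
    return {
        "day_of_week": DAY_NAMES[when.weekday()],
        "time_of_day": time_of_day(when.hour),
    }


if __name__ == "__main__":
    print(date_features("06/03/2018 11:30:00 PM"))
```

Bucketing the hour keeps the attribute nominal with only four values, at the cost of hiding any variation within a bucket.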
ANALYSIS METHODS
We tested the predictive capabilities of seven different classifiers on the same data set and report the 10-fold cross-validation accuracy below. The type of crime was the class we were trying to predict. We used all the crime data from 2017 (~10,000 examples) to train the models and then tested the predictive accuracy using 2018 crime data (as recent as June 3, 2018).
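For readers unfamiliar with 10-fold cross-validation, the following Python sketch shows the idea in plain code. It is not Weka's implementation: the fold assignment, the function names, and the toy lookup classifier (which is not one of the seven classifiers tested) are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def train_lookup(rows, labels):
    """Toy classifier: predict the most common label seen for an exact
    feature tuple, falling back to the overall majority label."""
    overall = Counter(labels).most_common(1)[0][0]
    by_row = defaultdict(Counter)
    for row, label in zip(rows, labels):
        by_row[row][label] += 1
    table = {row: c.most_common(1)[0][0] for row, c in by_row.items()}
    return lambda row: table.get(row, overall)


def k_fold_accuracy(rows, labels, train_fn, k=10, seed=0):
    """Average accuracy over k folds; each example is held out exactly once."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        predict = train_fn([rows[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct += sum(predict(rows[i]) == labels[i] for i in held_out)
    return correct / len(rows)


if __name__ == "__main__":
    # Made-up toy data, not taken from the Chicago data set.
    rows = [("Night", "Street")] * 50 + [("Morning", "School")] * 50
    labels = ["A"] * 50 + ["B"] * 50
    print(k_fold_accuracy(rows, labels, train_lookup))
```

Any function that takes training rows and labels and returns a predictor can be passed as `train_fn`, so different classifiers can be scored on identical folds.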
Additionally, we tried grouping all the less common classifications into a single "other" category in an attempt to boost predictive accuracy. However, we found that the baseline (ZeroR, which always predicts the most common class) would then beat all other prediction methods simply by predicting "other". So ultimately we returned to using the full set of crime classifications, recognizing that we may never get an extremely predictive classifier.
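The effect of the "other" category on the ZeroR baseline can be seen in a few lines of Python. This is an illustrative sketch with made-up toy labels, not the project's data or Weka's code; the function names and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def zero_r_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def group_rare(labels, min_count, other="other"):
    """Replace every label seen fewer than min_count times with `other`."""
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else other for lab in labels]


if __name__ == "__main__":
    # Toy distribution: one common class and seven rarer ones.
    labels = ["A"] * 30 + [c for c in "BCDEFGH" for _ in range(10)]
    print(zero_r_accuracy(labels))                  # 0.3
    print(zero_r_accuracy(group_rare(labels, 20)))  # 0.7
```

Once the rare classes are merged, "other" becomes the largest class, so the baseline's accuracy rises without any real predictive skill being gained.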
Once we found that the decision tree method was the most accurate, we rebuilt this particular model using data that included the "full moon" or "not full moon" parameter to see if it could increase the prediction accuracy. Contrary to the referenced article, we found that crime was no more predictable knowing the phase of the moon.


CONCLUSION
Wrapping Everything Up
In conclusion, crime prediction is a difficult task. Even if we were able to get our crime prediction classifier up to 99.99% accuracy, we'd still only be able to guess the type of crime given that we already know a crime happened. In other words, since every example we trained on was a type of crime, every result we predict must also be a type of crime. A more sophisticated classifier would be able to take in the time, day of the week, etc., and predict "no crime".
Furthermore, although the analysis showed no predictive gain from knowing whether it is a full moon, that doesn't necessarily disprove the relationship. There are confounding variables to consider that were not obtainable for this project. For example, more police officers may be on patrol during full moons than on other days, and thus fewer individuals would be willing to commit a crime.
