UD Crime AI - HenHacks 2024
3/2/2024
48-hour project. A machine learning model trained on UD Police Daily Statistics from 2017-2021. Given a LOCATION, DATE, and TIME, it predicts the description of the crime that may be committed.
Web Scraping Data:
First, the data needed to be gathered from the UD Police Statistics website. It was not available as a single download and was published one day at a time. A simple Python script collected every day's entries into one CSV file.
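A minimal sketch of that collection step, not the original script. The column names in `FIELDS` and the `fetch_day` callable are hypothetical: `fetch_day` stands in for whatever code downloads and parses one day's page, since the site's URL scheme and page layout are not described here.

```python
import csv
from datetime import timedelta

# Assumed column names; the real CSV layout is not given in this write-up.
FIELDS = ["location", "date", "time", "description"]


def daterange(start, end):
    """Yield every date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def collect_to_csv(start, end, fetch_day, out_file):
    """Write all records between start and end to out_file as CSV.

    fetch_day is a hypothetical callable: it takes a date and returns a
    list of dicts keyed by FIELDS for that day's entries.
    Returns the number of records written.
    """
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for day in daterange(start, end):
        for record in fetch_day(day):
            writer.writerow(record)
            count += 1
    return count
```

Passing the fetcher in as an argument keeps the day-by-day loop separate from the site-specific parsing, so the loop can be tried out with a stub before touching the network.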
Cleaning Data:
Although the data seemed usable, it required some standardizing before it could be fed to an ML model.
- 1) The entries contained human error and typos, so the same value often appeared under several spellings or formats. For example, Trabant Student Center and Trabant Building refer to the same location.
- 2) The dates needed to be separated into DAY, MONTH, and YEAR.
- 3) The times needed to be standardized to military (24-hour) time with the ":" removed.
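The three cleaning steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the alias table `LOCATION_ALIASES`, the `MM/DD/YYYY` date format, and the accepted time formats are all guesses about what the raw entries look like.

```python
from datetime import datetime

# Hypothetical alias table: known variants mapped to one canonical name.
LOCATION_ALIASES = {
    "trabant building": "Trabant Student Center",
    "trabant student center": "Trabant Student Center",
}


def normalize_location(raw):
    """Collapse extra whitespace and map known variants to one spelling."""
    cleaned = " ".join(raw.split())
    return LOCATION_ALIASES.get(cleaned.lower(), cleaned)


def split_date(raw, fmt="%m/%d/%Y"):
    """Split a date string into (DAY, MONTH, YEAR); the format is assumed."""
    parsed = datetime.strptime(raw.strip(), fmt)
    return parsed.day, parsed.month, parsed.year


def to_military(raw):
    """Convert '3:45 PM' or '15:45' to military time without the colon."""
    text = raw.strip().upper()
    for fmt in ("%I:%M %p", "%H:%M"):  # assumed input formats
        try:
            return datetime.strptime(text, fmt).strftime("%H%M")
        except ValueError:
            continue
    raise ValueError(f"unrecognized time: {raw!r}")
```

For example, `to_military("3:45 PM")` gives `"1545"`, and both `"Trabant Building"` and `"Trabant Student Center"` normalize to the same string, so they no longer count as two different locations.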
Training the Model:
- 1) The model is a DecisionTreeClassifier() from sklearn.
- 2) The data was split into training and testing groups. The test size was 20% of the data.
- 3) All the data was encoded as numerical values, because ML models are essentially mathematical models that operate on numbers.
- 4) The model's accuracy (as of March 2024) ranges from 20% to 30%.
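A rough sketch of the training steps above with sklearn, assuming each cleaned row has the shape `[location, day, month, year, time]` and the label is the crime description. The function names, the row layout, and the choice of `OrdinalEncoder` and `LabelEncoder` are assumptions for illustration. The encoding is done first here, since the tree can only be fit on numbers.

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier


def train_crime_model(rows, descriptions, seed=0):
    """Encode, split 80/20, fit a decision tree, and report test accuracy.

    rows: lists of [location, day, month, year, time] (assumed layout).
    descriptions: one crime description string per row.
    """
    # Encode the text column; unseen locations map to -1 at predict time.
    loc_encoder = OrdinalEncoder(handle_unknown="use_encoded_value",
                                 unknown_value=-1)
    loc_codes = loc_encoder.fit_transform([[r[0]] for r in rows])
    X = [[loc_codes[i][0], *r[1:]] for i, r in enumerate(rows)]

    label_encoder = LabelEncoder()
    y = label_encoder.fit_transform(descriptions)

    # Hold out 20% of the data for testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)

    model = DecisionTreeClassifier(random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, loc_encoder, label_encoder, accuracy


def predict_description(model, loc_encoder, label_encoder,
                        location, day, month, year, time):
    """Predict a crime description for one LOCATION, DATE, and TIME."""
    code = loc_encoder.transform([[location]])[0][0]
    predicted = model.predict([[code, day, month, year, time]])
    return label_encoder.inverse_transform(predicted)[0]
```

Returning the encoders together with the model matters: a new LOCATION has to be converted with the same mapping that was used during training, or the predictions are meaningless.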
Tags: software engineering, machine learning, python, sklearn, jupyter notebook, udel