ZOË Y. VALLADARES

UD Crime AI - HenHacks 2024

3/2/2024

48-hour project. A machine learning classification model trained on UD Police daily statistics from 2017 to 2021. When given a LOCATION, DATE, and TIME, it predicts the description of a crime that may be committed.

Web Scraping Data:

First, the data had to be gathered from the UD Police Statistics website. It was not available as a single download and was split up by day, so a simple Python script collected all of it into one CSV file.
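The scraping step can be sketched with the standard library alone. This is a minimal sketch that assumes each daily page lists incidents as rows of an HTML `<table>`, with date, time, location, and description cells in that order. The helper names and CSV column names are hypothetical. Downloading the pages (for example with `urllib.request.urlopen`) is left out because the site's URLs are not given here.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical column layout; the real pages may order or name fields differently.
COLUMNS = ["date", "time", "location", "description"]


class TableRowParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text fragments of the <td> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows built only from <th> cells stay empty; skip them
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_daily_page(html):
    """Return the incident rows found in one daily page's HTML."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def write_csv(pages, out):
    """Write a header plus the rows of every daily page to one CSV stream."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    for html in pages:
        writer.writerows(parse_daily_page(html))


if __name__ == "__main__":
    # Made-up page for illustration only.
    page = ("<table><tr><th>Date</th><th>Time</th></tr>"
            "<tr><td>3/2/2019</td><td>2:35 PM</td>"
            "<td>Trabant Building</td><td>Theft</td></tr></table>")
    buf = io.StringIO()
    write_csv([page], buf)
    print(buf.getvalue())
```

When writing to a real file instead of a `StringIO`, open it with `newline=""`, as the `csv` module documentation recommends.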

Cleaning Data:

Although the data seemed usable, it needed some standardizing before it could be used to train an ML model.

  • 1) The entries contained human errors and typos, so many values appeared several times with different spellings or formatting. For example: Trabant Student Center & Trabant Building
  • 2) The dates needed to be separated into DAY, MONTH, and YEAR
  • 3) The times needed to be standardized to military (24-hour) time with the colon removed
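The three cleaning steps can be sketched with the standard library. This sketch assumes dates arrive as `M/D/YYYY` strings and times as either `H:MM AM/PM` or `HH:MM`. The `LOCATION_ALIASES` table and the function names are hypothetical, since the project's actual mapping of misspelled locations is not listed here.

```python
from datetime import datetime

# Hypothetical alias table: every known variant maps to one canonical name.
LOCATION_ALIASES = {
    "trabant building": "Trabant Student Center",
    "trabant student center": "Trabant Student Center",
    "trabant student ctr": "Trabant Student Center",
}


def normalize_location(raw):
    """Collapse case and whitespace differences, then apply the alias table."""
    key = " ".join(raw.lower().split())
    return LOCATION_ALIASES.get(key, raw.strip())


def split_date(raw):
    """Split '3/2/2019' into (DAY, MONTH, YEAR) -> (2, 3, 2019)."""
    d = datetime.strptime(raw.strip(), "%m/%d/%Y")
    return d.day, d.month, d.year


def to_military(raw):
    """Convert '2:35 PM' or '14:35' to 1435: 24-hour time with the colon removed."""
    text = raw.strip().upper()
    fmt = "%I:%M %p" if text.endswith(("AM", "PM")) else "%H:%M"
    t = datetime.strptime(text, fmt)
    return t.hour * 100 + t.minute  # an int keeps the ordering of times
```

Returning the time as an integer preserves its ordering, so a decision tree can split on it directly.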

Training the Model:

  • 1) Using sklearn, the model was trained as a DecisionTreeClassifier()
  • 2) The data was split into training and testing sets, with 20% of the data held out for testing.
  • 3) All the data was encoded as numerical values, because ML models are essentially mathematical models and operate on numbers.
  • 4) As of March 2024, the model's accuracy ranges from 20% to 30%
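The training steps map onto a few scikit-learn calls. This is a sketch that assumes a cleaned table with the columns `location`, `day`, `month`, `year`, `time`, and `description`. Using `OrdinalEncoder` for the location is one reasonable choice for the encoding step, not necessarily the one the project used. The function names are hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

NUMERIC = ["day", "month", "year", "time"]  # assumed columns, already numbers after cleaning


def to_matrix(loc_enc, records):
    """Build a numeric feature matrix: [location code, day, month, year, time]."""
    locs = loc_enc.transform([[r["location"]] for r in records])
    nums = np.array([[r[c] for c in NUMERIC] for r in records], dtype=float)
    return np.hstack([locs, nums])


def train(records, seed=0):
    """Encode the data, hold out 20% for testing, and fit a decision tree."""
    # Location is the only text feature; locations unseen during fitting map to -1.
    loc_enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
    loc_enc.fit([[r["location"]] for r in records])
    X = to_matrix(loc_enc, records)
    # The target (crime description) becomes integer class labels.
    y_enc = LabelEncoder()
    y = y_enc.fit_transform([r["description"] for r in records])

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = DecisionTreeClassifier(random_state=seed).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    return model, loc_enc, y_enc, acc


def predict(model, loc_enc, y_enc, location, day, month, year, time):
    """Predict a crime description for one LOCATION, DATE, and TIME."""
    rec = {"location": location, "day": day, "month": month, "year": year, "time": time}
    return y_enc.inverse_transform(model.predict(to_matrix(loc_enc, [rec])))[0]
```

A call such as `predict(model, loc_enc, y_enc, "Trabant Student Center", 2, 3, 2019, 1435)` returns one description string. Ordinal codes give locations an arbitrary order. A decision tree copes with that reasonably well, although one-hot encoding is a common alternative for other model types.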
Tags: software engineering, machine learning, python, sklearn, jupyter notebook, udel