3rd November, 2023

Since my data could not yet be finalized for plotting the densely populated cities and states, I worked in a different direction. The idea is that once the data is matched, I can plot the cities, look at heat maps of population density, and check whether the shooting clusters and the high-population areas show any relation or closeness.

To figure out the clustering, we went with K-means to start with, and I picked California as the example, the same as in class, because it is one of the few states where the data is seemingly isolated from other states and there are enough points to form legible clusters.
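To make it clearer what K-means is doing with the coordinates, here is a minimal sketch of the basic assign-and-update loop (Lloyd's algorithm) in plain Python. It is only an illustration and not what scikit-learn runs internally (its `KMeans` uses k-means++ initialization by default); the function name `simple_kmeans` and the sample points are made up for the example. It also treats latitude and longitude as flat x/y values, which is an approximation.

```python
import random

def simple_kmeans(points, k, n_iter=50, seed=42):
    """Plain Lloyd's algorithm on (lat, lon) tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance)
        labels = [
            min(range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2 + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[j])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:  # nothing moved, so the loop has converged
            break
        centroids = new_centroids
    return centroids, labels

# Made-up sample points: three near Los Angeles, three near San Francisco
sample_points = [(34.0, -118.2), (34.1, -118.3), (33.9, -118.1),
                 (37.7, -122.4), (37.8, -122.3), (37.9, -122.5)]
sample_centroids, sample_labels = simple_kmeans(sample_points, k=2)
print(sample_labels)
```

The two groups end up with different labels, which is the same thing K=4 does on the full California data, just with more points and more centroids.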

K-means clustering with K=4 for the California data

As we can see, the clustering alone does not tell us much. We can't tell whether these clusters are even what we need unless we have some other data to compare them against, which would make them more useful than clusters on their own.

This is where either population density data or police station data can be brought in.

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
import cartopy.crs as ccrs
import cartopy.feature as cfeature
# Specify the full path to your Excel file using a raw string
excel_file_path = r'C:\Users\91766\Desktop\fatal-police-shootings-data.xlsx'
# Read data from Excel file
df = pd.read_excel(excel_file_path)
# Filter shootings in the state of California and remove rows with missing latitudes or longitudes
# (.copy() so that adding the cluster column below does not modify a slice of df)
df_ca = df[(df['state'] == 'CA') & (df['latitude'].notna()) & (df['longitude'].notna())].copy()
# Extract latitude and longitude columns
coordinates = df_ca[['latitude', 'longitude']]
# Perform K-means clustering with K = 4
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df_ca['cluster'] = kmeans.fit_predict(coordinates)
# Create a map of California using Cartopy
fig, ax = plt.subplots(subplot_kw={'projection': ccrs.PlateCarree()}, figsize=(12, 9))
# California bounding box (west, east, south, north); the state reaches 42°N
ax.set_extent([-125, -113, 32, 42.5], crs=ccrs.PlateCarree())
# Plotting the clustered coordinates
for cluster in range(4):
    cluster_data = df_ca[df_ca['cluster'] == cluster]
    ax.scatter(cluster_data['longitude'], cluster_data['latitude'],
               label=f'Cluster {cluster + 1}', s=20, transform=ccrs.PlateCarree())
# Add map features
ax.coastlines(resolution='10m', color='black', linewidth=1)
ax.add_feature(cfeature.BORDERS, linestyle=':')
# Draw state lines
ax.add_feature(cfeature.STATES, linestyle='-', edgecolor='black')
ax.legend()
# Show the plot
ax.set_title('K-means Clustering of Fatal Police Shootings in California (K=4)')
plt.show()
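
Once population density or police station data is available, one simple way to relate it to the clusters would be to match each cluster centroid to its nearest known location. The sketch below does this with the haversine (great-circle) distance in plain Python. The city coordinates are approximate, the `example_centroids` values are placeholders rather than the real output of the clustering above, and the helper names `haversine_km` and `nearest_city` are made up for the example. With the real model, the centroids would come from `kmeans.cluster_centers_`.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Approximate coordinates of a few large California cities
cities = {
    "Los Angeles": (34.05, -118.24),
    "San Francisco": (37.77, -122.42),
    "San Diego": (32.72, -117.16),
    "Sacramento": (38.58, -121.49),
}

def nearest_city(lat, lon, cities):
    """Return (city name, distance in km) for the city closest to the given point."""
    name = min(cities, key=lambda c: haversine_km(lat, lon, *cities[c]))
    return name, haversine_km(lat, lon, *cities[name])

# Placeholder centroids as (latitude, longitude); NOT the real clustering result
example_centroids = [(34.0, -118.0), (37.8, -122.2), (33.0, -117.0), (38.5, -121.5)]
for i, (lat, lon) in enumerate(example_centroids, start=1):
    city, dist = nearest_city(lat, lon, cities)
    print(f"Cluster {i}: nearest city is {city} ({dist:.0f} km away)")
```

If each city (or police station) carries a population density value, that value could then be joined to the cluster by name to see whether the clusters sit on the dense areas.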
