6th November, 2023

Continuing from last time, we run the same comparison again, this time across K-means clusterings with different values of K.

The point of this exercise is mostly to compare how the different clusters look. On its own it is not strictly useful unless we have other data to compare the clusters against.

K = 4, K-means clustering for California

 

K = 3, K-means clustering for California

 

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
import cartopy.crs as ccrs
import cartopy.feature as cfeature
# Number of clusters; set to 4 to reproduce the K = 4 map
k = 3
# Specify the full path to your Excel file using a raw string
excel_file_path = r'C:\Users\91766\Desktop\fatal-police-shootings-data.xlsx'
# Read data from Excel file
df = pd.read_excel(excel_file_path)
# Filter shootings in the state of California and remove rows with missing latitudes or longitudes
# (.copy() so that adding the cluster column below does not write into a view of df)
df_ca = df[(df['state'] == 'CA') & (df['latitude'].notna()) & (df['longitude'].notna())].copy()
# Extract latitude and longitude columns
coordinates = df_ca[['latitude', 'longitude']]
# Perform K-means clustering with K = k
kmeans = KMeans(n_clusters=k, random_state=42)
df_ca['cluster'] = kmeans.fit_predict(coordinates)
# Create a map using Cartopy
fig, ax = plt.subplots(subplot_kw={'projection': ccrs.PlateCarree()}, figsize=(100, 28))
# Map extent as [lon_min, lon_max, lat_min, lat_max]; this covers the western US, not just California
ax.set_extent([-130, -106, 20, 50])
# Plotting the clustered coordinates, one scatter series per cluster
for cluster in range(k):
    cluster_data = df_ca[df_ca['cluster'] == cluster]
    latitudes = cluster_data['latitude'].tolist()
    longitudes = cluster_data['longitude'].tolist()
    ax.scatter(longitudes, latitudes, label=f'Cluster {cluster + 1}', s=20)
# Add map features
ax.coastlines(resolution='10m', color='black', linewidth=1)
ax.add_feature(cfeature.BORDERS, linestyle=':')
ax.legend()
# Draw state lines
ax.add_feature(cfeature.STATES, linestyle='-', edgecolor='black')
# Show the plot
plt.title(f'K-means Clustering of Fatal Police Shootings in California (K={k})')
plt.show()
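
One way to compare the K = 3 and K = 4 results beyond looking at the maps is to cross-tabulate the two sets of labels and see which clusters stay together and which get split. The snippet below is only a rough sketch of that idea: the coordinates are synthetic points generated around three made-up centres (not rows from the shootings spreadsheet), and `cluster_labels` is a hypothetical helper name, so the counts it prints say nothing about the real data.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the California rows: three made-up blobs of
# (latitude, longitude) points. These are NOT values from the real dataset.
rng = np.random.default_rng(42)
centers = np.array([[34.0, -118.2], [37.8, -122.3], [32.8, -117.1]])
points = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])
coords = pd.DataFrame(points, columns=["latitude", "longitude"])


def cluster_labels(coordinates, k, seed=42):
    """Fit K-means with k clusters and return one label per row (hypothetical helper)."""
    model = KMeans(n_clusters=k, random_state=seed, n_init=10)
    labels = model.fit_predict(coordinates)
    return pd.Series(labels, index=coordinates.index, name=f"k{k}")


labels_3 = cluster_labels(coords, 3)
labels_4 = cluster_labels(coords, 4)

# Rows: K = 3 clusters; columns: K = 4 clusters; cells: number of shared points.
overlap = pd.crosstab(labels_3, labels_4)
print(overlap)
```

With the real data, `coords` would be replaced by `df_ca[['latitude', 'longitude']]`. A row whose points land in two columns is a K = 3 cluster that K = 4 has split.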
