Since I could not finalize the data needed to plot the cities and states with dense populations, I worked in a different direction. The idea is that once the data is matched, I can plot the cities, overlay heat maps of population density, and check whether the clusters and the high-population areas are related or lie close to each other.
To explore clustering, we started with K-means, and I picked California as the example, the same as in class, because it is one of the few states where the data is seemingly isolated from other states and there is enough of it to form legible clusters.
K-means clustering with K=4 for the California data
As the clustering shows, the clusters alone do not tell us much. We cannot judge whether this clustering is what we need until we have other data to compare it against, which would make it more useful than clusters on their own.
This is where either population density data or police station data can be brought in.
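One simple way to make that comparison, once the data is matched, would be to measure how far each cluster centroid lies from the nearest reference point, such as a dense city or a police station. The sketch below is only an illustration of that idea and is not part of the actual analysis: the function names `haversine_km` and `nearest_reference` are hypothetical, and the centroid values are placeholders rather than output from the real clustering.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_reference(centroids, references):
    """For each centroid, return (name of the closest reference point, distance in km)."""
    results = []
    for lat, lon in centroids:
        name, dist = min(
            (
                (ref_name, haversine_km(lat, lon, ref_lat, ref_lon))
                for ref_name, (ref_lat, ref_lon) in references.items()
            ),
            key=lambda pair: pair[1],
        )
        results.append((name, dist))
    return results


# Placeholder values for illustration only, NOT results from the real data set.
example_centroids = [(34.0, -118.2), (37.8, -122.3)]
example_references = {
    "Los Angeles": (34.05, -118.24),
    "San Francisco": (37.77, -122.42),
}

matches = nearest_reference(example_centroids, example_references)
for (lat, lon), (name, dist) in zip(example_centroids, matches):
    print(f"centroid ({lat}, {lon}) -> {name}, {dist:.1f} km")
```

If most centroids sit close to a dense city or a station, that would suggest the clusters follow population; large distances would suggest they do not. With the real data, the centroids would come from the fitted K-means model instead of the placeholder list.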
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
import geopandas as gpd
import cartopy.crs as ccrs
import cartopy.feature as cfeature
# Specify the full path to your Excel file using a raw string