30th October, 2023 – gautammarathe

Continuing from last time, my main idea was to check for a correlation between the density of shootings and population density by plotting the two together.

This would make it easier to see whether more shootings take place in more populated areas, and in turn whether the frequency of shootings is a function of population.
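As a rough sketch of that comparison, the snippet below counts shootings per state, joins the counts to a population table, and computes a Pearson correlation. The `state` and `population` column names, the two-letter codes, and all the numbers are made up for illustration; they are not taken from the real dataset.

```python
import pandas as pd

# Toy data for illustration only; these are not real figures.
shootings = pd.DataFrame({"state": ["AA", "AA", "AA", "BB", "BB", "CC"]})
population = pd.DataFrame({
    "state": ["AA", "BB", "CC"],
    "population": [3_000_000, 2_000_000, 1_000_000],
})

# Count shootings per state, then attach each state's population.
counts = shootings.groupby("state").size().rename("shootings").reset_index()
merged = counts.merge(population, on="state", how="inner")

# A rate per million residents makes states of different sizes comparable.
merged["per_million"] = merged["shootings"] / merged["population"] * 1_000_000

# Pearson correlation between raw counts and population.
corr = merged["shootings"].corr(merged["population"])
print(merged)
print("correlation:", corr)
```

A high correlation between raw counts and population would mostly say that bigger places have more incidents, so the per-capita rate is probably the more informative column to map.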

Coding this has been a challenge because I have not been able to find population density data in the form I need.

A lot of popular Python libraries contain county-level population data, but I can't plot county-level data directly when I am looking at the state level. A lot of the population data online also identifies cities by name rather than by coordinates, which has been another challenge.
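One possible way around the county-versus-state mismatch is to roll the county rows up to state totals before plotting. This is only a sketch: the table layout, the `area_sq_km` column, and the values are placeholders I am assuming, not the format of any particular library.

```python
import pandas as pd

# Hypothetical county-level table; names and numbers are placeholders.
counties = pd.DataFrame({
    "state": ["AA", "AA", "BB"],
    "county": ["North", "South", "East"],
    "population": [100, 250, 400],
    "area_sq_km": [10.0, 25.0, 100.0],
})

# Sum population and area within each state, then derive the density.
states = counties.groupby("state", as_index=False)[["population", "area_sq_km"]].sum()
states["density"] = states["population"] / states["area_sq_km"]
print(states)
```

Summing population and area first and dividing afterwards gives the true state density; averaging the county densities would weight small and large counties equally.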

I am still trying to reconcile the two. The lookups I have coded to match the datasets are not working yet, but I am working on fixing them because I find this an interesting direction.
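A common reason name-based lookups fail is small differences in case or whitespace, plus the same city name appearing in several states. The sketch below builds a normalized city-plus-state key and uses the `indicator` option of `merge` to list the rows that still fail to match. The `make_key` helper, the column names, and the data are hypothetical.

```python
import pandas as pd

def make_key(city: pd.Series, state: pd.Series) -> pd.Series:
    """Build a 'city|STATE' join key that ignores case and stray spaces."""
    return city.str.strip().str.lower() + "|" + state.str.strip().str.upper()

# Placeholder rows standing in for the shootings and city population tables.
shootings = pd.DataFrame({
    "city": ["Springfield ", "springfield", "Shelbyville"],
    "state": ["AA", "BB", "AA"],
})
city_pop = pd.DataFrame({
    "city": ["Springfield", "Springfield"],
    "state": ["aa", "BB"],
    "population": [1000, 2000],
})

shootings["key"] = make_key(shootings["city"], shootings["state"])
city_pop["key"] = make_key(city_pop["city"], city_pop["state"])

# A left join keeps every shooting; _merge marks rows with no population match.
matched = shootings.merge(
    city_pop[["key", "population"]], on="key", how="left", indicator=True
)
unmatched = matched.loc[matched["_merge"] == "left_only", "city"].tolist()
print(matched)
print("unmatched cities:", unmatched)
```

Inspecting the unmatched list is usually the quickest way to see what further cleaning the names need.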

Furthermore, I would like to plot police station coordinates to see whether there is any pattern in how far shootings occur from police stations.
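If station coordinates become available, one way to get the distance from each shooting to its nearest station is the haversine (great-circle) formula. This is a minimal sketch: the function names and the sample coordinates are my own assumptions, and the brute-force comparison of every incident against every station may be too slow for very large station lists.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def nearest_station_km(inc_lat, inc_lon, st_lat, st_lon):
    """Distance from each incident to its closest station (brute force)."""
    # Shape incidents as a column and stations as a row so broadcasting
    # produces an (incidents x stations) distance matrix.
    inc_lat = np.asarray(inc_lat, dtype=float)[:, None]
    inc_lon = np.asarray(inc_lon, dtype=float)[:, None]
    st_lat = np.asarray(st_lat, dtype=float)[None, :]
    st_lon = np.asarray(st_lon, dtype=float)[None, :]
    return haversine_km(inc_lat, inc_lon, st_lat, st_lon).min(axis=1)

# Made-up coordinates, purely to show the call.
distances = nearest_station_km(
    inc_lat=[0.0, 10.0], inc_lon=[0.0, 20.0],
    st_lat=[0.0, 10.0], st_lon=[1.0, 20.0],
)
print(distances)
```

The resulting distances could be added as a new column and plotted as a histogram to check for a pattern.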

The code below is not yet fully working, but it's a reference point for later work.

import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from sklearn.cluster import KMeans
import us

# Specify the full path to your Excel file using a raw string
excel_file_path = r'C:\Users\91766\Desktop\fatal-police-shootings-data.xlsx'

# Read data from Excel file
df = pd.read_excel(excel_file_path)

# Latitude and longitude column names
latitude_column = 'latitude'  # Replace with your actual column name
longitude_column = 'longitude'  # Replace with your actual column name

# KMeans cannot handle missing values, so drop rows without coordinates
df = df.dropna(subset=[latitude_column, longitude_column]).copy()

# Perform K-means clustering
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
df['cluster'] = kmeans.fit_predict(df[[latitude_column, longitude_column]])

# Create a map of the USA using Cartopy
fig, ax = plt.subplots(subplot_kw={'projection': ccrs.PlateCarree()}, figsize=(12, 9))
ax.set_extent([-125, -66, 24, 49], crs=ccrs.PlateCarree())  # Contiguous USA bounding box

# Plot the coordinates with cluster colors (the data is in plain lon/lat degrees)
scatter = ax.scatter(df[longitude_column], df[latitude_column], s=10, c=df['cluster'],
                     cmap='viridis', marker='o', alpha=0.7, edgecolor='k',
                     transform=ccrs.PlateCarree())

# Add colorbar
cbar = plt.colorbar(scatter, ax=ax, orientation='vertical', fraction=0.03, pad=0.05)
cbar.set_label('Cluster')

# Add map features
ax.coastlines(resolution='10m', color='black', linewidth=1)
ax.add_feature(cfeature.BORDERS, linestyle=':')

# Label capital cities. The us library gives each state's capital name but no
# coordinates, so they have to come from a separate lookup. Fill in this dict
# as {capital name: (longitude, latitude)}; it is left empty here on purpose.
capital_coords = {}

for state in us.STATES:
    coords = capital_coords.get(state.capital)
    if coords is None:
        continue  # No coordinates available for this capital yet
    lon, lat = coords
    ax.text(lon, lat, state.capital, transform=ccrs.PlateCarree(),
            fontsize=8, ha='right', va='bottom', color='blue')

# Draw state lines
ax.add_feature(cfeature.STATES, linestyle='-', edgecolor='black')

# Show the plot
plt.title('K-Means Clustering on the Map of the USA with State Capitals Highlighted')
plt.show()
