6th December, 2023

Looking at the time series data, we had to start the analysis in a way that would let us look at trends. Since the data we have is at a very granular level, we cannot use it directly; we first need to apply some transformations to make it workable.

This can be done by taking the existing timestamp column and performing a group-by at the required level of granularity, which lets us check for trends at that level, whether that is daily, weekly or monthly.
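
As a minimal sketch of that idea (the DataFrame and column names here are made up purely for illustration; the actual script is further below), grouping a timestamp column by a pandas Period of the chosen frequency gives one count per day, week or month:

import pandas as pd

# Toy data: one timestamp column, one row per event (names are hypothetical)
events = pd.DataFrame({
    'occurred_on': pd.to_datetime([
        '2023-01-02 10:15', '2023-01-02 18:40',
        '2023-01-09 09:05', '2023-02-20 23:55',
    ])
})

# Daily, weekly and monthly counts derived from the same column
daily   = events.groupby(events['occurred_on'].dt.date).size()
weekly  = events.groupby(events['occurred_on'].dt.to_period('W')).size()
monthly = events.groupby(events['occurred_on'].dt.to_period('M')).size()

print(monthly)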

We start off by doing monthly, 6-monthly and weekly splits. We also group by day, just to check the level of detail that is evident when plotting at the daily stage.

We used the code below to make sure we accurately split the data across the different levels of granularity before plotting the time series.

import pandas as pd
import os

# Set the directory where your CSV files are located
directory_path = r"C:\Users\91766\Desktop\stats3"

# Get a list of all files in the directory starting with "tmp" and ending with ".csv"
csv_files = [file for file in os.listdir(directory_path) if file.startswith('tmp') and file.endswith('.csv')]

# Initialize an empty DataFrame to store the concatenated data
concatenated_df = pd.DataFrame()

# Loop through each CSV file, split the timestamp into date and time, and concatenate
for file in csv_files:
    file_path = os.path.join(directory_path, file)
    df = pd.read_csv(file_path)
    df[['date', 'time']] = df['OCCURRED_ON_DATE'].str.split(' ', expand=True)
    df['date'] = pd.to_datetime(df['date'])
    concatenated_df = pd.concat([concatenated_df, df], ignore_index=True)

# Display the concatenated DataFrame
#print(concatenated_df)
#print(df['date'])

# Save the concatenated DataFrame to a CSV file
#output_excel_path = r"C:\Users\91766\Desktop\stats3\concatenated_data.csv"
#concatenated_df.to_csv(output_excel_path, index=False)
#print(f"Concatenated data saved to {output_excel_path}")

# Output 1: Group by day
grouped_by_day = concatenated_df.groupby(pd.to_datetime(concatenated_df['date']).dt.date).size().reset_index(name='count')
grouped_by_day.to_excel(r"C:\Users\91766\Desktop\stats3\grouped_by_day.xlsx", index=False)

# Output 2: Group by month
grouped_by_month = concatenated_df.groupby(pd.to_datetime(concatenated_df['date']).dt.to_period('M')).size().reset_index(name='count')
grouped_by_month['date'] = grouped_by_month['date'].astype(str)  # Periods as strings so they write cleanly to Excel
grouped_by_month.to_excel(r"C:\Users\91766\Desktop\stats3\grouped_by_month.xlsx", index=False)

# Output 3: Group by 6 months
grouped_by_6_months = concatenated_df.groupby(pd.to_datetime(concatenated_df['date']).dt.to_period('6M')).size().reset_index(name='count')
grouped_by_6_months['date'] = grouped_by_6_months['date'].astype(str)
grouped_by_6_months.to_excel(r"C:\Users\91766\Desktop\stats3\grouped_by_6_months.xlsx", index=False)

# Output 4: Group by week
grouped_by_week = concatenated_df.groupby(pd.to_datetime(concatenated_df['date']).dt.to_period('W')).size().reset_index(name='count')
grouped_by_week['date'] = grouped_by_week['date'].astype(str)
grouped_by_week.to_excel(r"C:\Users\91766\Desktop\stats3\grouped_by_week.xlsx", index=False)

print("complete")
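
To actually look at the trends, the grouped counts can be drawn as simple line charts. The sketch below is a minimal example that assumes the grouped_by_day and grouped_by_month DataFrames from the script above are still in memory, with the 'date' and 'count' columns produced by the reset_index calls; the figure titles are just placeholders.

import matplotlib.pyplot as plt

# Daily counts: the noisy, fine-grained view
plt.figure(figsize=(12, 4))
plt.plot(grouped_by_day['date'], grouped_by_day['count'])
plt.title('Counts per day')
plt.xlabel('Date')
plt.ylabel('Count')
plt.tight_layout()
plt.show()

# Monthly counts: a smoother view of the overall trend
plt.figure(figsize=(12, 4))
plt.plot(grouped_by_month['date'], grouped_by_month['count'])
plt.title('Counts per month')
plt.xlabel('Month')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()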
