To start the analysis for Project 3, we are looking at public Boston data: a set of economic indicators for the city of Boston, collected monthly over 7 years, from 2013 to 2020.
We first wanted to check what was feasible to show, given what we were actually interested in, because it was not going to be possible to look at nearly 8 different variables at once along with the kind of values each one takes.
We also wanted to keep the interaction to a minimum to start with, which basically means exploring each parameter individually. Month by month, the data gives us information on a variety of things for the city of Boston that help indicate its overall economic and social health.
After looking at the entire data set, we determined that there are not many data points we can use: it is a relatively small data set with only about 200 entries, and nearly half of them are empty for quite a few of the variables described in the Excel file.
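As a rough sketch of how this kind of check could be done, the snippet below counts the share of empty cells per column. It assumes the Excel sheet has been exported to CSV; the sample text and the column names (`hotel_occupancy`, `unemployment_rate`) are made up for illustration and are not the real fields.

```python
import csv
import io

# Hypothetical sample standing in for the real export; column names are made up.
SAMPLE = """month,hotel_occupancy,unemployment_rate
2013-01,0.57,
2013-02,0.58,0.061
2013-03,,
2013-04,0.80,0.055
"""

def missing_fraction(csv_text):
    """Return {column: fraction of rows whose cell is empty}."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {
        col: sum(1 for r in rows if not (r[col] or "").strip()) / len(rows)
        for col in rows[0]
    }

fractions = missing_fraction(SAMPLE)
for col, frac in fractions.items():
    print(f"{col}: {frac:.0%} empty")
```

Columns where this fraction is close to one half are the ones we would treat with the most caution.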
Hence it would not make much sense to fit statistical learning methods to this data or to try to predict anything with a model. For now we will just look at the descriptive statistics of each column to see if we can find anything worthwhile.
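A minimal sketch of the per-column summary we have in mind is below, using only Python's `statistics` module. The `describe` helper and the example values are hypothetical; empty cells are assumed to have been read in as `None` and are skipped.

```python
import statistics

def describe(values):
    """Summarize one column, skipping missing (None) entries."""
    present = [v for v in values if v is not None]
    if not present:
        return {"count": 0}
    summary = {
        "count": len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
    }
    # Sample standard deviation needs at least two observations.
    if len(present) > 1:
        summary["stdev"] = statistics.stdev(present)
    return summary

# Made-up values for one hypothetical column; None marks an empty cell.
column = [2.0, None, 4.0, 6.0, None, 8.0]
stats = describe(column)
print(stats)
```

Reporting the count next to the other numbers matters here, since a mean over only a handful of non-empty months should be read with care.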
Clearly, plotting the distributions for much of this data would not show much, except that values tend to cluster around certain time periods and regions, since most of this data is indexed by time.