27th September, 2023

While fine-tuning my model a bit further, I looked at which terms I need to keep for the most significance and which ones I can remove because they add complexity without affecting the relation. I found that for modelling diabetes on inactivity and obesity, we are better off with their 2nd powers than with any higher-order polynomial terms.

A look at P values for all the terms of the model

As you can see, only the 2nd-power terms are significant, so we reduce the model down to just those. The fit below shows that the correlation has not changed.

Fit with removed linear terms
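
For anyone who wants to try the same kind of check, here is a minimal sketch in Python. It runs on synthetic data, not the data behind the plots in this post, and the variable names, coefficients and the `fit_ols` helper are all made up for illustration. The p-values use a large-sample normal approximation instead of the exact t distribution, so treat them as approximate.

```python
import math
import numpy as np

# Synthetic stand-in data (an assumption, NOT the dataset used in this post).
rng = np.random.default_rng(0)
n = 300
inactivity = rng.uniform(10, 25, n)
obesity = rng.uniform(12, 24, n)
diabetes = (3.0 + 0.008 * inactivity**2 + 0.006 * obesity**2
            + rng.normal(0, 0.5, n))


def fit_ols(X, y):
    """Hypothetical helper: OLS with intercept.

    Returns coefficients, approximate two-sided p-values and R^2.
    """
    X = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
    resid = y - X @ beta
    dof = len(y) - X.shape[1]                      # residual degrees of freedom
    sigma2 = resid @ resid / dof                   # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)          # coefficient covariance
    t_stats = beta / np.sqrt(np.diag(cov))
    # Normal approximation to the two-sided p-value (reasonable for large n).
    p_values = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, p_values, r2


# Full model: linear and 2nd-power terms. Reduced model: 2nd-power terms only.
full_X = np.column_stack([inactivity, obesity, inactivity**2, obesity**2])
reduced_X = np.column_stack([inactivity**2, obesity**2])

_, p_full, r2_full = fit_ols(full_X, diabetes)
_, p_reduced, r2_reduced = fit_ols(reduced_X, diabetes)

names = ["intercept", "inactivity", "obesity", "inactivity^2", "obesity^2"]
for name, p in zip(names, p_full):
    print(f"{name:>13}: p ~ {p:.3g}")
print(f"R^2 full = {r2_full:.4f}, R^2 reduced = {r2_reduced:.4f}")
```

If dropping terms leaves R^2 almost unchanged, as in the fit above, the simpler model is usually the better choice.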

Finally, I also ran a K-fold cross-validation to estimate the test error of the model and compare it to that of the linear model.

K fold Cross Validation

As you can see, the non-linear model has a slight edge, even though the difference is minimal.
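
The comparison can be sketched roughly like this. Again, this is only an illustration on synthetic data: the data, the choice of 5 folds and the `kfold_mse` helper are assumptions, not the exact setup behind the plot above.

```python
import numpy as np

# Synthetic stand-in data (an assumption, NOT the dataset used in this post).
rng = np.random.default_rng(1)
n = 300
inactivity = rng.uniform(10, 25, n)
obesity = rng.uniform(12, 24, n)
diabetes = (3.0 + 0.008 * inactivity**2 + 0.006 * obesity**2
            + rng.normal(0, 0.5, n))


def kfold_mse(X, y, k=5, seed=0):
    """Hypothetical helper: mean test MSE of an OLS fit over k shuffled folds."""
    X = np.column_stack([np.ones(len(y)), X])           # prepend intercept column
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)                      # k roughly equal folds
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the k-1 training folds, score on the held-out fold.
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errors.append(np.mean((y[test] - X[test] @ beta) ** 2))
    return float(np.mean(errors))


linear_X = np.column_stack([inactivity, obesity])
quadratic_X = np.column_stack([inactivity**2, obesity**2])

# Same seed means both models are scored on identical folds.
mse_linear = kfold_mse(linear_X, diabetes)
mse_quadratic = kfold_mse(quadratic_X, diabetes)
print(f"5-fold test MSE, linear model:    {mse_linear:.4f}")
print(f"5-fold test MSE, 2nd-power model: {mse_quadratic:.4f}")
```

Using the same folds for both models keeps the comparison fair, since any difference in test error then comes from the model and not from how the data happened to be split.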
