While fine-tuning my model a bit further, I looked at which terms I need to keep for the most significance and which ones to remove because they add complexity without affecting the relationship. For modelling diabetes on inactivity and obesity, I found that we are better off with their 2nd-power terms than with any higher-order polynomial terms.

As you can see, only the 2nd-power terms are significant, so we reduce the model to just those, and the correlation below shows that it has not changed.
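A minimal sketch of this term-selection step in Python with NumPy. The data is synthetic, generated with a known quadratic relationship purely for illustration; the variable names `inactivity`, `obesity`, and `diabetes` and the helpers `poly_design` and `fit_ols` are hypothetical, not the original analysis. It fits a polynomial of degree 4 in each predictor, prints each coefficient's t-statistic (|t| above roughly 2 is a common rule of thumb for significance at the 5% level with this many observations), then refits with powers up to 2 only. As an assumption, "correlation" is taken here to mean the correlation between observed and fitted values.

```python
import numpy as np

def standardize(x):
    # Centre and scale so that high powers stay numerically well-behaved.
    return (x - x.mean()) / x.std()

def poly_design(x1, x2, degree):
    # Intercept, then x1..x1^degree and x2..x2^degree (no interaction terms).
    cols = [np.ones_like(x1)]
    cols += [x1 ** d for d in range(1, degree + 1)]
    cols += [x2 ** d for d in range(1, degree + 1)]
    return np.column_stack(cols)

def fit_ols(X, y):
    # Ordinary least squares: coefficients, t-statistics, and fitted values.
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    sigma2 = resid @ resid / (n - p)  # unbiased residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se, fitted

# Synthetic stand-in data (NOT the real dataset): quadratic signal plus noise.
rng = np.random.default_rng(0)
n = 300
inactivity = rng.uniform(10, 30, n)
obesity = rng.uniform(15, 35, n)
diabetes = 2 + 0.01 * inactivity**2 + 0.005 * obesity**2 + rng.normal(0, 0.5, n)

z1, z2 = standardize(inactivity), standardize(obesity)

# Full model: powers up to 4 of each predictor.
X_full = poly_design(z1, z2, degree=4)
beta_full, t_full, fitted_full = fit_ols(X_full, diabetes)
names = (["intercept"]
         + [f"inactivity^{d}" for d in range(1, 5)]
         + [f"obesity^{d}" for d in range(1, 5)])
for name, t in zip(names, t_full):
    print(f"{name:>13}: t = {t:6.2f}")

# Reduced model: keep only powers up to 2.
X_red = poly_design(z1, z2, degree=2)
beta_red, t_red, fitted_red = fit_ols(X_red, diabetes)

# Correlation between observed and fitted values for each model.
r_full = np.corrcoef(diabetes, fitted_full)[0, 1]
r_red = np.corrcoef(diabetes, fitted_red)[0, 1]
print(f"correlation, degree 4: {r_full:.4f}")
print(f"correlation, degree 2: {r_red:.4f}")
```

Because the degree-2 columns are a subset of the degree-4 columns, the reduced model can never fit the training data better; the point of the check is that it fits almost as well with fewer terms.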

Finally, I also ran K-fold cross-validation to estimate the test error of the model and compare it with that of the linear model.
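A minimal sketch of such a comparison in Python with NumPy, assuming 10 folds and ordinary least squares (the value of K and the fitting tool are not stated in the post). The data is synthetic, and `kfold_mse`, `poly_design`, and the variable names are hypothetical. Setting `degree=1` gives the linear model and `degree=2` the quadratic one; both calls use the same seed, so they are scored on identical folds and the difference in average held-out MSE reflects the models rather than the split.

```python
import numpy as np

def poly_design(x1, x2, degree):
    # Intercept plus powers 1..degree of each predictor (degree=1 is the linear model).
    cols = [np.ones_like(x1)]
    cols += [x1 ** d for d in range(1, degree + 1)]
    cols += [x2 ** d for d in range(1, degree + 1)]
    return np.column_stack(cols)

def kfold_mse(x1, x2, y, degree, k=10, seed=0):
    # Average held-out mean squared error over k folds.
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)
        # Standardize with training-fold statistics only, so no test data leaks in.
        m1, s1 = x1[train_idx].mean(), x1[train_idx].std()
        m2, s2 = x2[train_idx].mean(), x2[train_idx].std()
        X = poly_design((x1 - m1) / s1, (x2 - m2) / s2, degree)
        beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        pred = X[test_idx] @ beta
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))

# Synthetic stand-in data (NOT the real dataset): quadratic signal plus noise.
rng = np.random.default_rng(0)
n = 300
inactivity = rng.uniform(10, 30, n)
obesity = rng.uniform(15, 35, n)
diabetes = 2 + 0.01 * inactivity**2 + 0.005 * obesity**2 + rng.normal(0, 0.5, n)

cv_linear = kfold_mse(inactivity, obesity, diabetes, degree=1)
cv_quadratic = kfold_mse(inactivity, obesity, diabetes, degree=2)
print(f"10-fold CV MSE, linear:    {cv_linear:.4f}")
print(f"10-fold CV MSE, quadratic: {cv_quadratic:.4f}")
```

The lower cross-validated MSE is the estimate of which model should generalize better; on real data the gap can be small, so it is worth checking it against the fold-to-fold spread of the errors.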

As you can see, the non-linear model has a slight edge, even though the difference is minimal.