Hi, so continuing from last time: I was having some difficulty coding a Monte Carlo simulation for the test between black ages and white ages. The reason for doing it this way is that running multiple t-tests is not advised, since each extra test increases the chance of a false positive.
The problem with running the Monte Carlo simulation so far has been that it was tricky to control the test and control groups as well as the sampling, and this was leading to a lot of errors while running the code.
More often than not, the errors came from grouping and scaling issues when plotting the histogram. While I continue to work on the histogram, here is a brief overview of what my code is trying to achieve.
It starts with three data sets. For the most part we use them two at a time, keeping the white ages (the ‘wage’ group) as the control group for our hypothesis testing. We take this approach because running a lot of pairwise t-tests is not advised, as the errors add up.
The solution is to take our control group and test group and draw a fixed-size sample from each with replacement, which adds randomness to the test. We then calculate the difference between the means of the two samples and store it.
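A single resampling step like the one just described could be sketched as follows. This is only a minimal illustration using the Python standard library, not my actual code: the age values are made up, and the names `bage` and `resampled_mean_difference` are hypothetical (only ‘wage’ comes from the project).

```python
import random
from statistics import mean

# Hypothetical ages, made up for illustration; not the real data.
wage = [34, 45, 29, 51, 38, 42, 47, 33, 40, 36]  # control group (white ages)
bage = [31, 27, 39, 44, 25, 30, 36, 28, 41, 33]  # test group (name assumed)

def resampled_mean_difference(control, test, sample_size, rng):
    """Draw sample_size values with replacement from each group and
    return mean(test sample) - mean(control sample)."""
    control_sample = rng.choices(control, k=sample_size)
    test_sample = rng.choices(test, k=sample_size)
    return mean(test_sample) - mean(control_sample)

rng = random.Random(0)  # fixed seed so the run is repeatable
diff = resampled_mean_difference(wage, bage, sample_size=10, rng=rng)
print(diff)
```

Passing the random generator in as an argument keeps the sampling in one place, which should make it easier to keep the control and test groups from getting mixed up.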
This is done in a for loop over a huge number of iterations, so that we have enough data to plot a histogram of the frequency of each value, that is, of the difference in means between the two data sets. The point of the test is to see whether, on average over a large number of random simulations, a recurring pattern shows up, namely that there really is a difference in the means.
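The loop and the frequency counts behind the histogram might look roughly like this. Again this is a sketch under assumptions: the data are invented, the names `simulate_differences` and `bin_counts` are hypothetical, and a plain-text histogram stands in for a real plot so that no plotting library is needed.

```python
import math
import random
from collections import Counter
from statistics import mean

# Hypothetical ages, made up for illustration; not the real data.
wage = [34, 45, 29, 51, 38, 42, 47, 33, 40, 36]  # control group (white ages)
bage = [31, 27, 39, 44, 25, 30, 36, 28, 41, 33]  # test group (name assumed)

def simulate_differences(control, test, sample_size, n_iter, seed=0):
    """Repeat the resampling step n_iter times and collect the
    difference in sample means from every iteration."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_iter):
        control_sample = rng.choices(control, k=sample_size)
        test_sample = rng.choices(test, k=sample_size)
        diffs.append(mean(test_sample) - mean(control_sample))
    return diffs

def bin_counts(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's left edge."""
    return Counter(math.floor(v / bin_width) * bin_width for v in values)

diffs = simulate_differences(wage, bage, sample_size=10, n_iter=5000)
counts = bin_counts(diffs, bin_width=2.0)

# Text histogram: one '#' per 50 simulated differences in the bin.
for left_edge in sorted(counts):
    print(f"{left_edge:6.1f} | {'#' * (counts[left_edge] // 50)}")
```

Counting the bins by hand like this sidesteps the grouping and scaling problems of the plot for the moment, since every simulated difference lands in exactly one bin.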
This would give us something more substantial and visually easy to grasp when deciding whether something is occurring by complete chance or whether something is causing it. For now we would just like to test, and if possible reject, the null hypothesis that the difference in means between the age groups is due to chance.
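One possible way to turn the simulated differences into a decision is to look at the middle 95% of them and check whether zero falls outside that range. This is an assumption on my part about how the result could be read, not a settled part of the method; the data are made up, and the names `percentile_interval` and `excludes_zero` are hypothetical. The interval uses a simple nearest-rank rule rather than interpolation.

```python
import random
from statistics import mean

# Hypothetical ages, made up for illustration; not the real data.
wage = [34, 45, 29, 51, 38, 42, 47, 33, 40, 36]  # control group (white ages)
bage = [31, 27, 39, 44, 25, 30, 36, 28, 41, 33]  # test group (name assumed)

def simulate_differences(control, test, sample_size, n_iter, seed=0):
    """Resample both groups with replacement n_iter times and collect
    the difference in sample means from every iteration."""
    rng = random.Random(seed)
    return [
        mean(rng.choices(test, k=sample_size))
        - mean(rng.choices(control, k=sample_size))
        for _ in range(n_iter)
    ]

def percentile_interval(values, level=0.95):
    """Return the (lower, upper) values that bracket the central `level`
    share of the data, using a nearest-rank rule."""
    ordered = sorted(values)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * n)]
    upper = ordered[min(n - 1, int((1.0 - tail) * n))]
    return lower, upper

diffs = simulate_differences(wage, bage, sample_size=10, n_iter=5000)
lower, upper = percentile_interval(diffs)

# If zero lies outside the interval, a difference of zero looks unlikely.
excludes_zero = not (lower <= 0 <= upper)
print(f"95% of simulated differences lie in [{lower:.2f}, {upper:.2f}]")
print("zero excluded:", excludes_zero)
```

If zero sits well outside the interval, that would count as evidence against the difference being pure chance; if zero is inside, the simulation gives no grounds to reject the null hypothesis.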