One of the reasons we study inferential analysis is prediction. In many cases it is impossible to collect the whole population in order to evaluate its characteristics. For example, it makes no sense to inspect every single product your factory produces just to manage quality. Instead, we take a sample of a certain size and treat the result of the evaluation as if it came from the whole population.
How is that possible?
Today I will briefly introduce two major theories that make this kind of sample test reliable.
The first is in line with the central limit theorem. I think I already gave an overview of the central limit theorem in a previous post. One thing I didn't clearly demonstrate was what happens to the average value.
In other words, as you take the mean of sample data over and over again, the average of those sample means approaches the population mean.
I am going to show you with a simple test in R.
I've made a new function that repeatedly draws a sample from a normal distribution (mean = 5, sd = 10) and collects the sample means:
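# t: number of samples to draw, n: size of each sample
genRandom <- function(t, n) {
  rr <- numeric(t)  # pre-allocate the vector of sample means
  for (k in seq_len(t)) {
    # mean of one sample of size n from N(mean = 5, sd = 10)
    rr[k] <- mean(rnorm(n, 5, 10))
  }
  return(rr)
}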
We already know that the population mean is 5, so we can compare the simulated result against it as the number of repetitions grows.
> hh <- genRandom(5,5)
> mean(hh)
[1] 5.409457
> hh <- genRandom(100,5)
> mean(hh)
[1] 5.339642
> hh <- genRandom(1000,5)
> mean(hh)
[1] 4.999683
> hh <- genRandom(10000,5)
> mean(hh)
[1] 5.001229
As you keep repeating the test, the mean of the sample means gets closer and closer to the population mean. This is very interesting, because we don't need to gather the whole data. Instead, we can estimate the population mean by gathering enough sample data.
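If you want to repeat the experiment in one go, here is a minimal sketch of the same idea. The helper name meanOfSampleMeans, its default arguments, and the seed are my own assumptions for illustration and are not part of the function above. It averages a given number of sample means and reports how far each average lands from the population mean of 5.

```r
# Hypothetical helper (an assumption for illustration): average of `reps`
# sample means, each computed from `n` draws of N(mean = mu, sd = sigma).
meanOfSampleMeans <- function(reps, n = 5, mu = 5, sigma = 10) {
  mean(replicate(reps, mean(rnorm(n, mean = mu, sd = sigma))))
}

set.seed(42)  # arbitrary seed, used only so the run is repeatable
reps <- c(5, 100, 1000, 10000)
estimates <- sapply(reps, meanOfSampleMeans)

# Distance of each estimate from the known population mean of 5
result <- data.frame(reps = reps, estimate = estimates, error = abs(estimates - 5))
print(result)
```

The exact numbers depend on the seed, and the error does not have to shrink at every single step, but it should generally get smaller as reps grows.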
I hope it makes sense to you.