Tuesday, February 12, 2013

Central Limit Theorem - No15

I think the central limit theorem is one of the most important theorems for understanding inferential statistics. Without understanding it, we can't go further into advanced
statistical analysis.

Anyway, I'll begin exploring the central limit theorem by pinning down its exact meaning.
There are many articles explaining it, but I chose this excerpt because it gives the most appropriate definition of the central limit theorem:
The central limit theorem states that, given certain conditions, the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed.  

Before moving on to the next step, we should pay attention to one important condition: to guarantee a good normal approximation, we have to use a sufficiently large number of samples. I'll show you this intuitively using an R function.

First, I've prepared a simple function that computes the mean of random values drawn between 0 and 100 (parent distribution: a uniform probability distribution).
Then I'll repeat this over and over again.





genRandom <- function(j, k) {
  rr <- numeric(k)  # preallocate the result vector
  for (i in 1:k) {
    # mean of j values drawn uniformly from 0..100, with replacement
    rr[i] <- mean(sample(0:100, j, replace = TRUE))
  }
  return(rr)
}

The function lets me specify the sample size, so we can see how the result differs according to the sample size.

1) Sample size 3: repeatedly calculate the sample mean 10,000 times
2) Sample size 30: repeatedly calculate the sample mean 10,000 times
3) Sample size 100: repeatedly calculate the sample mean 10,000 times
4) Sample size 250: repeatedly calculate the sample mean 10,000 times


> SampleMean <- genRandom(3, 10000)
> hist(SampleMean, main = 'N3')
> SampleMean <- genRandom(30, 10000)
> hist(SampleMean, main = 'N30')
> SampleMean <- genRandom(100, 10000)
> hist(SampleMean, main = 'N100')
> SampleMean <- genRandom(250, 10000)
> hist(SampleMean, main = 'N250')
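As a side check (not in the original post, but a consequence of the same theorem): the spread of these histograms should shrink as the sample size grows, roughly like the parent standard deviation divided by the square root of the sample size. A minimal sketch, reusing the same sampling scheme:

```r
# Standard deviation of the discrete uniform parent 0..100
parent.sd <- sd(0:100)  # roughly 29.3

# Simulate 10,000 sample means of size 250, as in case 4) above
sample.means <- replicate(10000, mean(sample(0:100, 250, replace = TRUE)))

# Observed spread of the sample means vs. the predicted standard error
sd(sample.means)         # close to...
parent.sd / sqrt(250)    # ...roughly 1.85
```

The two numbers should agree to within a few percent; the larger the sample size, the narrower the histogram of sample means.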



[Figure: histograms of SampleMean for N3, N30, N100, and N250]

As you can see above, as the sample size gets larger, the distribution becomes a better fit for the normal distribution. (I'll talk about the normal distribution more in the next post.)

The interesting thing is that our parent distribution is just a uniform probability distribution rather than a normal distribution. In other words, we don't need to care about the parent distribution: just take a sample from any distribution and average it. If we keep doing this over and over again, the resulting distribution will be approximately normal.

We call this distribution the "sampling distribution of the sample mean".
Another interesting thing is that the mean of the sampling distribution of the sample mean will be approximately the same as the mean of the original distribution.
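This is easy to check numerically. The parent here is the discrete uniform distribution on 0..100, whose mean is exactly 50, so the mean of the simulated sample means should land very close to 50. A quick sketch:

```r
# Draw 10,000 sample means of size 30 from the uniform parent 0:100
sample.means <- replicate(10000, mean(sample(0:100, 30, replace = TRUE)))

# The mean of the sampling distribution should be close to the parent mean
mean(0:100)         # exactly 50
mean(sample.means)  # approximately 50
```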

This is quite interesting, because it means we can predict the population mean from the sampling distribution of the sample mean. This is the reason the normal distribution is essential to inferential statistics.
