Thursday, February 28, 2013

Inferential analysis basic-2 - No18

I think another fundamental concept of inferential analysis is the standard error of the mean.
To explain it, I am going to use the same function I used before.

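As you can see below, the larger the sample size, the smaller the standard deviation of the sample means. Because we sample from a normal distribution, the sample mean is normally distributed at every sample size; what changes is the spread, which becomes more tightly concentrated around the population mean of 5 as the sample size grows.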





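genRandom <- function(t, j) {
  # Draw t samples of size j from a normal distribution
  # (mean 5, standard deviation 10) and return the t sample means.
  rr <- numeric(t)                 # preallocate the result vector
  for (i in seq_len(t)) {
    rr[i] <- mean(rnorm(j, 5, 10)) # mean of one sample of size j
  }
  return(rr)
}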
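The same vector of sample means can also be built without an explicit loop. This is a minimal sketch using base R's replicate(); the name genRandomVec and its mu and sigma arguments are hypothetical additions, with defaults chosen to match the mean 5 and standard deviation 10 used in genRandom.

```r
# Hypothetical helper: draws t samples of size j from N(mu, sigma)
# and returns the t sample means, like genRandom(t, j).
genRandomVec <- function(t, j, mu = 5, sigma = 10) {
  # replicate() evaluates the expression t times and collects the results
  replicate(t, mean(rnorm(j, mean = mu, sd = sigma)))
}

set.seed(1)                 # fix the seed so the run is reproducible
hh <- genRandomVec(10000, 10)
length(hh)                  # 10000 sample means
```

Exposing sigma as an argument makes it easy to repeat the experiment with a different population standard deviation without editing the function body.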



> hh <- genRandom(10000, 10)
> hist(hh, main = "Sample size 10", xlim = c(-10, 20))
> hh <- genRandom(10000, 30)
> hist(hh, main = "Sample size 30", xlim = c(-10, 20))
> hh <- genRandom(10000, 100)
> hist(hh, main = "Sample size 100", xlim = c(-10, 20))
> hh <- genRandom(10000, 200)
> hist(hh, main = "Sample size 200", xlim = c(-10, 20))
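To compare the four histograms directly, they can be drawn on one page, each overlaid with the normal density that theory predicts for the sample mean. This is a sketch that assumes the same population as genRandom (mean 5, standard deviation 10); the variable names are illustrative, not from the original post.

```r
# Four histograms of simulated sample means on one page, each overlaid
# with the predicted density of the sample mean, N(5, 10 / sqrt(n)).
mu <- 5
sigma <- 10
sizes <- c(10, 30, 100, 200)

set.seed(42)
op <- par(mfrow = c(2, 2))          # 2 x 2 grid of plots
observed_sd <- numeric(length(sizes))
for (k in seq_along(sizes)) {
  n <- sizes[k]
  means <- replicate(10000, mean(rnorm(n, mu, sigma)))
  observed_sd[k] <- sd(means)       # spread of the simulated sample means
  hist(means, freq = FALSE, breaks = 50,
       main = paste("Sample size", n), xlim = c(-10, 20))
  # predicted density of the sample mean for this n
  curve(dnorm(x, mean = mu, sd = sigma / sqrt(n)), add = TRUE, lwd = 2)
}
par(op)                             # restore the previous plot layout
observed_sd
```

Using freq = FALSE puts the histogram on the density scale, so the curve and the bars can be compared on the same axis.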

From the experiments above, we can see that the standard deviation of the sample mean shrinks as the sample size grows; more precisely, it is inversely proportional to the square root of the sample size. Statisticians summarize this relationship in a well-known formula. The standard error of the mean is the standard deviation of the sampling distribution of the sample mean.

It equals the standard deviation of the population divided by the square root of the sample size n:


SD_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
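A short derivation shows where the square root comes from. It assumes the n observations X_1, ..., X_n are independent draws from the same population with standard deviation σ, which is what genRandom does when it calls rnorm:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \operatorname{Var}(X_i) = \sigma^2
\operatorname{Var}(\bar{x}) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
SD_{\bar{x}} = \sqrt{\operatorname{Var}(\bar{x})} = \frac{\sigma}{\sqrt{n}}
```

For example, with σ = 10 and n = 100, the formula gives 10/10 = 1, so quadrupling the sample size only halves the standard error.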




I think the easiest way to check this formula is to see the phenomenon with our own eyes.
I am going to use the above function one more time, this time with the population standard deviation set to 15, that is, rnorm(j, 5, 15) inside genRandom. Because we draw random samples from a normal distribution whose standard deviation we already know, we can compute the value the formula predicts, 15/sqrt(n).
Therefore, if the formula is true, the standard deviation we observe across the simulated sample means should be close to this predicted value.

Let's look at the result.


> hh <- genRandom(1000, 10)
> sd(hh)
[1] 4.575883
> 15/sqrt(10)
[1] 4.743416
> hh <- genRandom(1000, 30)
> sd(hh)
[1] 2.829178
> 15/sqrt(30)
[1] 2.738613
> hh <- genRandom(1000, 100)
> sd(hh)
[1] 1.532839
> 15/sqrt(100)
[1] 1.5
> hh <- genRandom(1000, 1000)
> sd(hh)
[1] 0.4900068
> 15/sqrt(1000)
[1] 0.4743416
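The comparison above can be collected into a single table. This is a sketch under the same assumptions as the transcript (population mean 5, standard deviation 15, 1000 simulated means per sample size); compareSEM is a hypothetical helper name, not part of the original post.

```r
# Hypothetical helper: compare the simulated SD of the sample means
# with the predicted standard error sigma / sqrt(n) for several n.
compareSEM <- function(sizes, reps = 1000, mu = 5, sigma = 15) {
  observed <- sapply(sizes, function(n) {
    # SD of reps sample means, each from a sample of size n
    sd(replicate(reps, mean(rnorm(n, mu, sigma))))
  })
  predicted <- sigma / sqrt(sizes)
  data.frame(n = sizes,
             observed = observed,
             predicted = predicted,
             ratio = observed / predicted)   # close to 1 if the formula holds
}

set.seed(2013)
compareSEM(c(10, 30, 100, 1000))
```

The ratio column should stay near 1 at every sample size. Its scatter comes mainly from using only reps = 1000 simulated means per row; increasing reps tightens the agreement.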


For every sample size, the simulated standard deviation of the sample means is within about 4 percent of 15/sqrt(n), and both shrink as the sample size grows, just as the formula predicts.

It's pretty interesting.





 
