Tuesday, July 16, 2013

Simple Hypothesis Test Using R - No. 22

Today I am going to introduce a hypothesis test using R.
Before moving on to the main test, let's review what a hypothesis test is.
A hypothesis test is a commonly used statistical method for making decisions from data.
To begin a test, you have to define a null hypothesis and an alternative hypothesis.

Assuming the null hypothesis is true, we compute the probability of observing a result at least as extreme as the one we actually got. If this probability is very small, the observed result would be surprising under the null hypothesis, which is evidence that the null hypothesis is not true.
In that case, we reject the null hypothesis in favor of the alternative hypothesis.
This probability of getting a result at least as extreme is called the "p-value". Whether we reject the null hypothesis depends on a threshold (the significance level) chosen in advance, commonly 0.05: if the p-value is below it, we reject the null hypothesis; otherwise, we fail to reject it.
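The decision rule based on a threshold can be written as a tiny R function. This is a minimal sketch: the helper name `decide` and the default threshold `alpha = 0.05` are illustrative assumptions, not part of any package.

```r
# Hypothetical helper: turn a p-value into a test decision at significance level alpha.
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "reject H0" else "fail to reject H0"
}

decide(0.7054)               # the p-value reported by the t-test later in this post
decide(0.001)                # a very small p-value
decide(0.03, alpha = 0.01)   # a stricter threshold changes the decision
```

The same p-value can lead to different decisions depending on the threshold, which is why the significance level should be fixed before looking at the data.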

As you can see below, I will draw a sample of 1,000 values from a normal distribution with mean 0 and standard deviation 10.

> data <- rnorm(1000,0,10)
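Because `rnorm` draws random numbers, each run gives a different sample. If you want reproducible results, you can fix the random seed first. The seed value 22 below is an arbitrary choice, so the numbers you get will not match the ones printed in this post, which came from an unseeded draw.

```r
set.seed(22)                             # arbitrary seed, chosen only for reproducibility
data <- rnorm(1000, mean = 0, sd = 10)   # same call as above, with named arguments

length(data)   # 1000 draws
mean(data)     # close to 0, but not exactly 0
sd(data)       # close to 10
```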

Before we conduct a hypothesis test, we need to define the hypotheses first.
1) Null hypothesis: the true mean is equal to 0
2) Alternative hypothesis: the true mean is not equal to 0
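In `t.test`, these two hypotheses map onto two arguments: `mu` is the mean assumed under the null hypothesis, and `alternative = "two.sided"` (the default) encodes "not equal to". The sketch below uses a freshly generated sample `x` as a stand-in for `data`, and reads both settings back from the returned result object.

```r
x <- rnorm(1000, mean = 0, sd = 10)   # stand-in for `data` in this post

# mu = 0 is the null value; "two.sided" means H1: true mean != 0.
# Other choices are "less" and "greater" for one-sided alternatives.
res <- t.test(x, mu = 0, alternative = "two.sided")

res$null.value    # the hypothesized mean, 0
res$alternative   # "two.sided"
```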
In fact, we already know that the population mean is 0, because we drew the data from a distribution whose mean is 0.
Therefore, we expect the t-test not to reject the null hypothesis.
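"Should not reject" holds most of the time, but not always: even when the null hypothesis is true, a test at the 0.05 level rejects it in roughly 5% of samples by chance. A quick simulation can show this. It is a sketch under assumed settings (2,000 repetitions, seed 1, significance level 0.05), repeating the experiment from this post many times.

```r
set.seed(1)        # arbitrary seed for reproducibility
n_sims <- 2000     # number of repeated experiments (assumed)
alpha  <- 0.05     # significance level (assumed)

# Each experiment: draw 1,000 values from N(0, 10^2), test H0: mean = 0, keep the p-value.
p_values <- replicate(n_sims,
                      t.test(rnorm(1000, mean = 0, sd = 10), mu = 0)$p.value)

# Fraction of experiments that wrongly reject the true null hypothesis.
rejection_rate <- mean(p_values < alpha)
rejection_rate     # should be near 0.05
```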
Let's look at the test result. The command below tests whether the population mean is 0, given the sample.

> t.test(data, mu=0)

        One Sample t-test

data:  data 
t = 0.3781, df = 999, p-value = 0.7054
alternative hypothesis: true mean is not equal to 0 
95 percent confidence interval:
 -0.4931662  0.7285949 
sample estimates:
mean of x 
0.1177143 


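As expected, the p-value (0.7054) is large, far above a typical significance level such as 0.05, so we fail to reject the null hypothesis. Note that this does not prove the null hypothesis is true; it only means the data are consistent with it.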
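Instead of reading the numbers off the printed output, you can pull them out of the object that `t.test` returns. The sketch below uses a freshly generated sample `x` in place of `data`, so its values will differ from the output above; the component names (`statistic`, `parameter`, `p.value`, `conf.int`, `estimate`) are the standard fields of the test result.

```r
x   <- rnorm(1000, mean = 0, sd = 10)   # stand-in for `data` in this post
res <- t.test(x, mu = 0)

res$statistic   # t value (named "t")
res$parameter   # degrees of freedom (named "df"), here 999
res$p.value     # two-sided p-value
res$conf.int    # 95 percent confidence interval
res$estimate    # sample mean (named "mean of x")

# Decision at the 0.05 level, using the stored p-value
res$p.value < 0.05
```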

Now let's go through where each number in the output comes from.

First, the sample estimate at the bottom of the output (0.1177143) is simply the sample mean:
> mean(data)
[1] 0.1177143

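Second, the t value is computed with the same logic as a z-score: the sample mean minus the hypothesized mean (0 here), divided by the estimated standard error, sd(data)/sqrt(n) with n = 1000. Unlike a z-score, it uses the sample standard deviation instead of the population one, which is why the result follows a t distribution with n - 1 = 999 degrees of freedom.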

> mean(data)/(sd(data)/sqrt(1000))
[1] 0.3781356
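The calculation above works because the hypothesized mean is 0. For a general null value mu0, the statistic is (mean - mu0) / (sd / sqrt(n)). The sketch below wraps this in a hypothetical helper `manual_t` (not a built-in function) and compares it with `t.test` for two null values, using a fresh sample `x`.

```r
# Hypothetical helper: one-sample t statistic for an arbitrary hypothesized mean mu0.
manual_t <- function(x, mu0 = 0) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)   # estimated standard error of the mean
  (mean(x) - mu0) / se
}

x <- rnorm(1000, mean = 0, sd = 10)

manual_t(x, mu0 = 0)
unname(t.test(x, mu = 0)$statistic)   # should agree with the line above

manual_t(x, mu0 = 1)
unname(t.test(x, mu = 1)$statistic)   # should agree with the line above
```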

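Third, the p-value is the two-sided tail probability of a t distribution with 999 degrees of freedom: the probability of a t value at least as far from 0 as the observed one, in either direction. By symmetry, this is twice the lower-tail probability at -|t|.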

> 2 * pt(-abs(mean(data)/(sd(data)/sqrt(1000))), df = 999)
[1] 0.7054102
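The same pieces also reproduce the "95 percent confidence interval" line of the output: the sample mean plus or minus the 97.5th percentile of the t distribution (999 degrees of freedom) times the standard error. The sketch below checks this against `t.test` on a fresh sample `x`; `as.numeric` drops the `conf.level` attribute so only the two endpoints are compared.

```r
x  <- rnorm(1000, mean = 0, sd = 10)   # stand-in for `data` in this post
n  <- length(x)
se <- sd(x) / sqrt(n)                  # standard error, as in the t statistic
t_crit <- qt(0.975, df = n - 1)        # 97.5th percentile of t with n - 1 df

manual_ci <- mean(x) + c(-1, 1) * t_crit * se
manual_ci
as.numeric(t.test(x, mu = 0)$conf.int) # should match manual_ci
```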

Interestingly, we can reproduce the whole t-test result with what we learned in the last few posts.
