The chi-square test is one of the most widely used nonparametric tests. It checks whether two categorical variables are associated.
To do this, it measures how far the observed counts deviate from the counts we would expect if the two variables were unrelated.
It is a statistical hypothesis test whose null hypothesis is that the two categorical variables are independent.
The chi-square test is easiest to understand by running it on two contrasting samples.
The two tables below show candidate preference by men and women, and I would like to compare them. Intuitively, we can guess that men and women barely differ in case-2, while their preferences differ sharply in case-1.
[Case-1]
      | Candidate-1 | Candidate-2 |
man   | 35 | 8 |
woman | 10 | 45 |
[Case-2]
      | Candidate-1 | Candidate-2 |
man   | 10 | 35 |
woman | 11 | 34 |
Let's do a test right away.
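The calls below use a matrix named zz whose construction is not shown. Here is a minimal sketch of one way to enter the two tables in R, assuming rows are sex and columns are candidates. The names case1 and case2 and the dimnames labels are my own choices for illustration, not part of the original session.

```r
# Counts copied from the [Case-1] and [Case-2] tables; byrow = TRUE keeps
# the same row/column layout as the tables.
labels <- list(sex = c("man", "woman"),
               candidate = c("Candidate-1", "Candidate-2"))

case1 <- matrix(c(35,  8,
                  10, 45), nrow = 2, byrow = TRUE, dimnames = labels)
case2 <- matrix(c(10, 35,
                  11, 34), nrow = 2, byrow = TRUE, dimnames = labels)

# Assumption: zz is simply rebound to whichever case is being tested.
zz <- case1
print(zz)
print(rowSums(zz))  # totals per sex
print(colSums(zz))  # totals per candidate
```

Assigning case2 to zz and repeating the call gives the second result.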
[ CASE1 Chi-test result ]
> chisq.test(zz, simulate.p.value = TRUE)
Pearson's Chi-squared test with simulated p-value (based on 2000
replicates)
data: zz
X-squared = 38.8319, df = NA, p-value = 0.0004998
[ CASE2 Chi-test result ]
> chisq.test(zz, simulate.p.value = TRUE)
Pearson's Chi-squared test with simulated p-value (based on 2000
replicates)
data: zz
X-squared = 0.0621, df = NA, p-value = 1
As expected, the first test rejects the null hypothesis (p-value = 0.0004998), so we have strong evidence that men and women prefer different candidates.
The second test fails to reject the null hypothesis (p-value = 1), so the data give no evidence of a difference in preference by sex.
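Two details of the simulated output are worth a closer look. With 2000 replicates the smallest p-value R can report is 1 / (2000 + 1), which is the 0.0004998 seen in case-1. In case-2 the p-value is 1 because, with the margins fixed, no possible table sits closer to the expected counts than the observed one. The sketch below also runs the usual asymptotic test with correct = FALSE (no continuity correction) for comparison. That variant is my addition, not part of the original session, and the seed is arbitrary.

```r
case1 <- matrix(c(35, 8, 10, 45), nrow = 2, byrow = TRUE)
case2 <- matrix(c(10, 35, 11, 34), nrow = 2, byrow = TRUE)

B <- 2000
min_p <- 1 / (B + 1)  # smallest p-value a B-replicate simulation can report
print(min_p)          # about 0.0004998

set.seed(1)  # arbitrary; only makes the simulation reproducible
sim1 <- chisq.test(case1, simulate.p.value = TRUE, B = B)
sim2 <- chisq.test(case2, simulate.p.value = TRUE, B = B)

# Asymptotic chi-square test with df = 1. correct = FALSE turns off the
# continuity correction so the statistic matches the simulated runs.
asy1 <- chisq.test(case1, correct = FALSE)
asy2 <- chisq.test(case2, correct = FALSE)

print(c(simulated = sim1$p.value, asymptotic = asy1$p.value))
print(c(simulated = sim2$p.value, asymptotic = asy2$p.value))
```

The conclusions are the same either way: case-1 is far below any usual significance level, and case-2 is nowhere near it.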
* I would like to add how to get the X-squared value (X stands for the Greek letter chi, χ)
Why don't we calculate case-1 ourselves, using the formula attributed to Karl Pearson?
Under the null hypothesis, the observed frequency in each cell should be close to its expected frequency, so the statistic sums (observed - expected)^2 / expected over all cells. The expected frequency of a cell is (row total x column total) / grand total; for example, man and Candidate-1 gives 43 x 45 / 98 = 19.74.
Let's calculate it.
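The expected frequencies can be sketched in R as follows, assuming the case-1 counts are entered by row. The variable name expected is my own; chisq.test exposes the same values in its expected component.

```r
zz <- matrix(c(35, 8, 10, 45), nrow = 2, byrow = TRUE,
             dimnames = list(c("man", "woman"),
                             c("Candidate-1", "Candidate-2")))

# Expected count per cell under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(zz), colSums(zz)) / sum(zz)
print(round(expected, 2))

# Cross-check against the values chisq.test computes internally.
print(chisq.test(zz, correct = FALSE)$expected)
```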
(35-19.74)^2/19.74 + (10-25.26)^2/25.26 + (8-23.26)^2/23.26 + (45-29.74)^2/29.74
= 38.85718483
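This value is very close to the value R reported:
X-squared = 38.8319
The small gap comes from rounding the expected counts to two decimals; with unrounded expected counts the same formula gives 38.8319.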
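To check the hand calculation, here is a short sketch that evaluates Pearson's formula twice, once with unrounded expected counts and once with the two-decimal values used above. The variable names are my own.

```r
observed <- matrix(c(35, 8, 10, 45), nrow = 2, byrow = TRUE)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson's statistic: sum over cells of (observed - expected)^2 / expected
chi_exact <- sum((observed - expected)^2 / expected)

# Same formula, but with expected counts rounded to two decimals,
# as in the hand calculation.
expected_2dp <- round(expected, 2)
chi_rounded <- sum((observed - expected_2dp)^2 / expected_2dp)

print(c(exact = chi_exact, rounded = chi_rounded))
```

The unrounded version agrees with the X-squared value from chisq.test, and the rounded version agrees with the hand calculation.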