Thursday, July 25, 2013

Chi-square Test [nonparametric test] -No24


The chi-square test is one of the representative nonparametric tests. It tests whether categorical variables are associated with each other.

To check whether two discrete variables are related, the chi-square test measures the difference between the observed counts and the counts we would expect if the variables were independent.
It is a statistical hypothesis test whose null hypothesis is that the two categorical variables are unrelated.

It is quite easy to understand the chi-square test by running it on different samples.

As you can see below, there are two tables with data on candidate preference by men and women. I would like to compare the two. Intuitively, we can guess that preferences do not differ between men and women in case-2, whereas they differ considerably in case-1.
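
A comparison like this can be sketched in R with `chisq.test`. The counts below are hypothetical, made up only for illustration; they are not the tables from this post, and the names `case1` and `case2` are my own.

```r
# Hypothetical counts (assumption): rows are gender, columns are candidates.
case1 <- matrix(c(60, 40,
                  30, 70),
                nrow = 2, byrow = TRUE,
                dimnames = list(gender = c("man", "woman"),
                                candidate = c("A", "B")))
case2 <- matrix(c(50, 50,
                  52, 48),
                nrow = 2, byrow = TRUE,
                dimnames = list(gender = c("man", "woman"),
                                candidate = c("A", "B")))

# Null hypothesis in both tests: gender and candidate preference are independent.
result1 <- chisq.test(case1)
result2 <- chisq.test(case2)

# A small p-value is evidence against independence.
print(result1$p.value)
print(result2$p.value)
```

With these made-up counts, the first table gives a p-value below 0.05 and the second does not, which matches the intuition about a clear difference versus no difference.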


Tuesday, July 23, 2013

Correlation-No23


Correlation analysis measures the correlation between two variables. The important point is that this measure captures only the degree of correlation; it does not explain causality between the two variables. If you want to describe how a dependent variable changes with an independent variable as a mathematical equation, you should use regression analysis (which, on its own, still does not prove causation).
There are several types of correlation coefficient.
Today I will focus on one of the best known, the "Pearson correlation coefficient".

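This is a measure of linear correlation that ranges from -1 to +1.
A value of +1 represents a perfect positive linear relationship between the two variables, while -1 indicates a perfect negative linear relationship: as one variable increases, the other decreases. A value near 0 means there is little or no linear relationship.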
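
The two extremes can be checked with a small sketch in R. The vectors here are hypothetical illustration data, and `pearson` is a helper I wrote from the textbook definition (covariance divided by the product of the standard deviations); in practice the built-in `cor` does the same job.

```r
# Hypothetical data (assumption): exact linear relationships plus one irregular series.
x       <- 1:10
y_up    <- 2 * x + 1                          # rises with x
y_down  <- -3 * x + 5                         # falls as x rises
y_mixed <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)    # no clean linear pattern

# Pearson coefficient from its definition.
pearson <- function(a, b) {
  da <- a - mean(a)
  db <- b - mean(b)
  sum(da * db) / sqrt(sum(da^2) * sum(db^2))
}

print(pearson(x, y_up))     # perfect positive linear relationship
print(pearson(x, y_down))   # perfect negative linear relationship
print(pearson(x, y_mixed))  # somewhere in between
print(cor(x, y_mixed))      # built-in Pearson correlation for comparison
```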


Tuesday, July 16, 2013

Simple Hypothesis Test using R -No22

Today I am going to introduce a hypothesis test using R.
Before moving on to the main test, let's review what a hypothesis test is.
A hypothesis test is a commonly used statistical method for decision making.
To begin, you have to define a null hypothesis and an alternative hypothesis.

Assuming the null hypothesis is true, we compute the probability of observing a result at least as extreme as the one we got, and then decide whether to reject the null hypothesis. If this probability is very small, the data are hard to reconcile with the null hypothesis.
In that case, we reject the null hypothesis in favor of the alternative hypothesis.
This probability is called the "p-value". Whether we reject the null hypothesis or fail to reject it depends on the threshold (significance level) we choose, commonly 0.05.
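
The decision rule can be sketched by computing a p-value by hand. This is only an illustration under my own assumptions: the sample `x`, the hypothesized mean `mu0`, and the threshold `alpha` are hypothetical, and the test used is a two-sided one-sample t-test.

```r
# Hypothetical small sample (assumption) and hypothesized mean.
x     <- c(2.1, -0.4, 1.3, 0.8, -1.1, 0.5, 1.9, -0.2)
mu0   <- 0
alpha <- 0.05   # chosen threshold (significance level)

n      <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value: probability of a t statistic at least this extreme
# if the null hypothesis (true mean = mu0) holds.
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

reject_null <- p_value < alpha
print(p_value)
print(reject_null)

# The built-in test reports the same p-value.
print(t.test(x, mu = mu0)$p.value)
```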

As you can see below, I will sample 1,000 values from a normal distribution with mean 0 and standard deviation 10.

> data <- rnorm(1000,0,10)

Before we conduct a hypothesis test, we need to define the hypotheses.
1) Null Hypothesis : True mean is equal to 0
2) Alternative Hypothesis : True mean is not equal to 0
In fact, we already know that the population mean is 0, because we drew the data from a distribution whose mean is 0.
Therefore, we expect the t-test not to reject the null hypothesis (although, at a 5% significance level, it will still reject it by chance for about 5% of samples).
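
The expectation that the t-test does not reject a true null hypothesis holds for most samples but not for every one. A rough simulation sketch of this, assuming a 0.05 threshold and an arbitrary seed and repetition count of my own choosing:

```r
set.seed(1)          # arbitrary seed, only for reproducibility
alpha  <- 0.05       # assumed significance level
n_runs <- 2000       # number of repeated experiments (assumption)

# Repeat the experiment: draw 1,000 values with true mean 0, test "mean = 0".
p_values <- replicate(n_runs, t.test(rnorm(1000, 0, 10), mu = 0)$p.value)

# Share of runs in which the true null hypothesis is rejected anyway.
rejection_rate <- mean(p_values < alpha)
print(rejection_rate)
```

The rejection rate should come out close to the chosen threshold, which is what a significance level means.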
Let's look at the test result. The following command tests whether the population mean is 0 for the data we sampled.