Wednesday, September 11, 2013

Regression analysis - No26

Today I am going to explain regression analysis.
Before I talk about the analysis method, I would like to start with the concept.
This will be a challenge for me, but I'll do my best to explain it as simply as I can.
As I've already mentioned in a previous post, regression analysis focuses on the causal relationship
between the independent variable (explanatory variable) and the dependent variable (response variable).
What does this mean?
To answer this question, we can't go further without a correlation test.
Once you've run a correlation analysis,
you get a correlation coefficient that tells you the strength of the relationship between two variables.
The correlation coefficient ranges from -1 to +1, indicating a positive or a negative relationship.
However, this result does not explain the causal relationship between the two variables.
Regression goes one step further: from the regression coefficient,
we can figure out how much of the variability in
the dependent variable y is explained by the independent variable x.

Let's assume there are two data samples:
Type 1 shows a positive correlation, while Type 2 shows a negative correlation.
For each, we can compute a measurable quantity called the correlation coefficient.
In regression analysis, we use a similar index to gauge the strength of the relationship.
It is called R-squared, and for simple linear regression it is the square of the correlation coefficient.

> cor(Temperature, Discomfort_Index)
[1] 0.9963963
> cor(Temperature, Sales_Long_trousers)
[1] -0.8676816
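
To see the connection in R, here is a minimal sketch with made-up Temperature and Discomfort_Index values (the original vectors are not shown in this post); for simple linear regression, the R-squared reported by lm() is the square of the Pearson correlation coefficient.

> Temperature <- c(24, 26, 28, 30, 32, 34)            # made-up example values
> Discomfort_Index <- c(70, 73, 77, 80, 84, 88)       # made-up example values
> r <- cor(Temperature, Discomfort_Index)             # Pearson correlation coefficient
> fit <- lm(Discomfort_Index ~ Temperature)           # simple linear regression
> r^2                                                 # square of the correlation coefficient
> summary(fit)$r.squared                              # R-squared from the regression (same value)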

Tuesday, August 20, 2013

T-test (two samples)-No25

Today I am going to introduce the t-test for two samples.

The t-test is a statistical hypothesis test for comparing two groups.
It uses the mean and standard deviation of the sample data to determine whether the population means of the two groups differ.
There is another test, ANOVA (analysis of variance), that serves a similar purpose, but the t-test is usually much simpler than ANOVA.

This test can be applied in many cases, such as the following (a sketch in R appears after the list):
case 1) Income gap between urban and rural areas.
case 2) Testing a new medicine on patients.
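Here is a minimal sketch of how such a comparison looks in R; the group values below are made up purely for illustration.

> group_A <- c(52, 48, 55, 60, 47, 51, 58, 49)   # made-up incomes for area A
> group_B <- c(43, 45, 50, 41, 46, 44, 48, 42)   # made-up incomes for area B
> t.test(group_A, group_B)                       # two-sample (Welch) t-test

The reported p-value indicates whether the difference between the two group means is statistically significant.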


Thursday, July 25, 2013

Chi-square Test [nonparametric test] -No24


The chi-square test is one of the representative nonparametric tests. It checks for an association between categorical variables.

To compare two discrete variables, the chi-square test measures the statistical difference between the observed data and the expected counts.
It is also a statistical hypothesis test, whose null hypothesis is that the two categorical variables have no relationship.

It is quite easy to understand the chi-square test by running it on different samples.

As you can see below, there are two tables containing candidate preference data for men and women, and I would like to compare them. Intuitively, we can guess that there is no difference between men and women in case 2, whereas in case 1 the preferences clearly differ.
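
A minimal sketch of the test in R, using made-up counts for the two cases (the original tables are not reproduced here):

> # Case 1: preferences clearly differ between men and women (made-up counts)
> case1 <- matrix(c(70, 30, 35, 65), nrow = 2, byrow = TRUE,
+                 dimnames = list(c("Man", "Woman"), c("Cand_A", "Cand_B")))
> chisq.test(case1)    # small p-value: reject the null hypothesis of no relationship
> # Case 2: preferences look about the same (made-up counts)
> case2 <- matrix(c(52, 48, 50, 50), nrow = 2, byrow = TRUE,
+                 dimnames = list(c("Man", "Woman"), c("Cand_A", "Cand_B")))
> chisq.test(case2)    # large p-value: no evidence of a relationship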


Tuesday, July 23, 2013

Correlation - No23


Correlation analysis measures the correlation between two variables. The important thing is that this measure only captures the degree of correlation; it does not explain the causal relationship between the two values. If you want to express how the dependent variable responds to the independent variable as a mathematical equation, you should use regression analysis.
There are several types of correlation coefficient.
Today I will focus on one of the most famous, the "Pearson correlation coefficient".

This value measures linear correlation and ranges from -1 to +1.
A value of +1 represents a perfect positive correlation between the two variables, while -1 indicates a perfect negative (inversely proportional) relationship.
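
A minimal sketch in R with made-up values:

> x <- c(1, 2, 3, 4, 5, 6)               # made-up example values
> y <- c(2.1, 3.9, 6.2, 8.1, 9.8, 12.0)  # roughly increases with x
> cor(x, y)                              # Pearson correlation coefficient (close to +1)
> cor(x, -y)                             # reversing the direction flips the sign toward -1
> cor.test(x, y)                         # adds a significance test for the correlation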


Tuesday, July 16, 2013

Simple Hypothesis Test using R -No22

Today I am going to introduce a hypothesis test using R.
Before moving on to the main test, why don't we review what a hypothesis test is.
A hypothesis test is a commonly used statistical method for decision making.
To begin this test, you have to define a null hypothesis and an alternative hypothesis.

Assuming the null hypothesis is true, we compute the probability of observing a result at least as extreme as the one we got, and then we decide whether to reject the null hypothesis. If this probability is very small, the result suggests that the null hypothesis isn't true.
As a result, we reject the null hypothesis in favor of the alternative hypothesis.
This probability of getting such an extreme result is called the "p-value". Generally, the decision about whether to reject the null hypothesis depends on our chosen threshold (the significance level).

As you can see below, I will sample 1,000 data points from a normal distribution with mean zero and standard deviation 10.

> data <- rnorm(1000, 0, 10)   # 1,000 draws from a normal distribution with mean 0, sd 10

Before we conduct a hypothesis test, we need to define the hypotheses first.
1) Null hypothesis: the true mean is equal to 0
2) Alternative hypothesis: the true mean is not equal to 0
In fact, we already know that the population mean is 0, because we drew the data from a population distribution whose mean is 0.
Therefore, the t-test should not reject the null hypothesis.
Let's look at the test result.
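Here is a minimal sketch of the command, a one-sample t-test of whether the population mean is 0 given the data above:

> t.test(data, mu = 0)   # one-sample t-test of H0: true mean equals 0

If the reported p-value is larger than our threshold (for example 0.05), we do not reject the null hypothesis, which is the outcome we expect here.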