Thursday, February 28, 2013

Inferential analysis basic-2 - No18

I think another fundamental concept of inferential analysis is the standard error of the mean.
To explain it, I am going to use the same function I used before.

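As you can see below, the larger your sample size, the smaller the standard deviation of the sample mean. This standard deviation is called the standard error, and it equals the population standard deviation divided by the square root of the sample size. In other words, the sampling distribution of the sample mean becomes more tightly concentrated around the population mean and looks more and more like a narrow normal distribution.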
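
The relationship can be checked with a small simulation. This is only a sketch: it assumes the "same function" is one that returns the mean of a sample drawn from a normal distribution with mean = 5 and sd = 10, and the name `sample_mean_norm`, the sample sizes, and the seed are my own choices for illustration.

```r
# Hypothetical helper: mean of n draws from N(mean = 5, sd = 10).
sample_mean_norm <- function(n) {
  mean(rnorm(n, mean = 5, sd = 10))
}

set.seed(18)  # arbitrary seed, only for reproducibility

sizes <- c(5, 20, 80, 320)

# For each sample size, repeat the experiment 5000 times and measure
# how much the resulting sample means spread out.
observed_se <- sapply(sizes, function(n) {
  sd(replicate(5000, sample_mean_norm(n)))
})

# Theoretical standard error: population sd / sqrt(n).
theoretical_se <- 10 / sqrt(sizes)

print(data.frame(n = sizes,
                 observed = round(observed_se, 2),
                 theoretical = round(theoretical_se, 2)))
```

The observed column should shrink as n grows and stay close to the theoretical column; quadrupling the sample size roughly halves the standard error.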


Tuesday, February 26, 2013

Inferential analysis basic-1 - No17

One of the reasons we study inferential analysis is prediction. In some cases, it is impossible to gather the whole data set to evaluate its characteristics. For example, it makes no sense to inspect every product your factory produces for quality control. Instead, we take a sample of a certain size and trust the result of the evaluation as if we had examined the whole population.
How is that possible?
Today I will briefly introduce two major ideas that make this kind of sample test reliable.

The first is closely related to the central limit theorem. I think I already gave an overview of the central limit theorem in a previous post. One thing I didn't clearly demonstrate was what happens to the average value.
In other words, if you take the mean of a sample over and over again, the average of those sample means will approach the population mean.

I am going to show you this with a simple test using R.

I've made a new function that returns the mean of a sample drawn from a normal distribution (mean = 5, sd = 10).
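
A minimal sketch of such a test might look like the following. The function name `sample_mean_norm`, the sample size of 20, the 5000 repetitions, and the seed are assumptions made for illustration and may differ from the original function.

```r
# Hypothetical helper: mean of n draws from N(mean = 5, sd = 10).
sample_mean_norm <- function(n) {
  mean(rnorm(n, mean = 5, sd = 10))
}

set.seed(17)  # arbitrary seed, only for reproducibility

# Take a sample of 20 values and record its mean, 5000 times over.
means <- replicate(5000, sample_mean_norm(20))

# The average of the sample means should land close to the population mean, 5.
print(mean(means))

# The sample means pile up around 5.
hist(means, breaks = 40, main = "Sample means (n = 20)", xlab = "sample mean")
```

Any single sample mean can be well away from 5, but the average over many repetitions settles near it.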


Thursday, February 14, 2013

Normal distribution - No16

When it comes to probability distributions, you might think of two kinds of distribution.

  • Discrete distribution 
  • Continuous distribution

The normal distribution is a very popular continuous probability distribution, defined by the formula below, where \mu is the mean and \sigma is the standard deviation.


f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x-\mu)^2}{2\sigma^2} }
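
As a quick check, the formula can be typed directly into R and compared against the built-in density function `dnorm`. The function name `normal_pdf` and the evaluation points are illustrative choices of mine.

```r
# The density formula above, written out term by term.
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(-3, 3, by = 0.5)

# Compare the hand-written formula with R's built-in normal density.
print(all.equal(normal_pdf(x), dnorm(x)))
print(all.equal(normal_pdf(x, mu = 5, sigma = 10), dnorm(x, mean = 5, sd = 10)))
```

Both comparisons should agree to within floating-point tolerance, since `dnorm` implements the same density.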



Tuesday, February 12, 2013

Central Limit Theorem - No15

I think the central limit theorem is one of the most important theorems for understanding inferential statistics. Without understanding it, we can't go any further into advanced statistical analysis.

Anyway, I will begin exploring the central limit theorem by defining exactly what it means.
There are many articles that explain it, but I chose this excerpt because I find it the most appropriate definition of the central limit theorem.
The central limit theorem states that, given certain conditions, the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed.  

Before moving on to the next step, it is worth paying attention to one important point: to get a good approximation to the normal distribution, the number of random variables being averaged must be sufficiently large. I'll show you this intuitively using an R function.

First, I've prepared a simple function that returns the mean of random values between 1 and 100 (parent distribution: uniform probability distribution).
Then I'll keep doing this over and over again.
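
The experiment can be sketched roughly as follows. This is an assumption-laden illustration rather than the original code: it draws continuous values with `runif` (the original may have sampled integers instead), and the name `sample_mean_unif`, the sample sizes 2 and 30, the 10000 repetitions, and the seed are all my own choices.

```r
# Hypothetical helper: mean of n draws from a uniform distribution on [1, 100].
sample_mean_unif <- function(n) {
  mean(runif(n, min = 1, max = 100))
}

set.seed(15)  # arbitrary seed, only for reproducibility

# Repeat the experiment 10000 times for a small and a larger sample size.
means_n2  <- replicate(10000, sample_mean_unif(2))
means_n30 <- replicate(10000, sample_mean_unif(30))

# Side-by-side histograms of the sample means.
par(mfrow = c(1, 2))
hist(means_n2,  breaks = 40, main = "n = 2",  xlab = "sample mean")
hist(means_n30, breaks = 40, main = "n = 30", xlab = "sample mean")
```

With n = 2 the histogram is roughly triangular, while with n = 30 it is already close to a bell shape centered near 50.5, even though the parent distribution is flat.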