Sunday, January 13, 2013

Data Interpretation (Quartile) - No. 10

To get good insight into our data, I think we have to improve our ability to interpret it. There are many traditional measures for summarizing a group of data, such as the average, the maximum, and the standard deviation. Data can also be visualized in various charts, such as bar charts or pie charts.

Fortunately, thanks to the achievements of great mathematicians, we only need to understand what these measures mean mathematically.

R's summary() function gives us a simple data summary containing the minimum, maximum, median, mean, and quartiles.
A sample result follows:


> x <- c(1,2,3,4,5,6,7,8,9)
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       3       5       5       7       9 

I will explain the mean and median in later posts; in this post I am going to focus on quartiles.





I can get the same result by using the QUARTILE function in an MS Excel sheet.

 Expr : QUARTILE(A1:I1,1), QUARTILE(A1:I1,2), QUARTILE(A1:I1,3), QUARTILE(A1:I1,4)

Data (A1:I1)        Result   Quartile
1 2 3 4 5 6 7 8 9   3        1st quartile (25%)
1 2 3 4 5 6 7 8 9   5        2nd quartile (50%)
1 2 3 4 5 6 7 8 9   7        3rd quartile (75%)
1 2 3 4 5 6 7 8 9   9        4th quartile (100%)
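
For anyone who wants to cross-check the spreadsheet in R, here is a minimal sketch. It assumes that Excel's QUARTILE(array, quart) uses the same interpolation rule as R's type 7 (the R default); excel_quartile is a hypothetical helper name made up for this post, not a function in either product.

```r
# Hypothetical helper mimicking QUARTILE(array, quart), where quart is 0..4.
# Assumption: Excel's QUARTILE interpolates like R's quantile(type = 7).
excel_quartile <- function(values, quart) {
  stopifnot(quart %in% 0:4)
  unname(quantile(values, probs = quart / 4, type = 7))
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
sapply(1:4, function(q) excel_quartile(x, q))
# [1] 3 5 7 9
```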










The quartiles of a set of values can be obtained by first sorting the data set in ascending order and then dividing the ordered data into four equal groups. A quartile is a type of quantile.

  • first quartile (designated Q1) = lower quartile = splits off the lowest 25% of the data = 25th percentile
  • second quartile (designated Q2) = median = cuts the data set in half = 50th percentile
  • third quartile (designated Q3) = upper quartile = splits off the highest 25% of the data (or the lowest 75%) = 75th percentile
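
As a rough illustration of "four equal groups", the sketch below cuts a 12-value vector at its quartiles and counts the members of each group. The vector and the use of cut() are choices made here for illustration; with other data sizes, or with tied values, the groups may not come out exactly equal.

```r
values <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

# Cut points at 0%, 25%, 50%, 75%, 100% (R's default type 7)
breaks <- quantile(values)

# Assign each value to the interval between two neighbouring cut points
groups <- cut(values, breaks = breaks, include.lowest = TRUE)

group_sizes <- as.vector(table(groups))
group_sizes
# [1] 3 3 3 3
```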

This is a common way to calculate the quartile values by hand.

Example data: data <- c(1,2,3,4,5,6,7,8,9,10,11,12)

1. Get the second quartile by finding the median of the whole data set.
    => between 6 and 7 => (6+7)/2 => 6.5
2. Get the first quartile by finding the median of the lower half.
    => (1,2,3,4,5,6) => (3+4)/2 => 3.5
....
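
The hand calculation can be sketched in R as follows. This is a minimal sketch, assuming the remaining step mirrors step 2 on the upper half, and assuming that for an odd number of values the middle value is left out of both halves (other conventions exist). The name quartiles_by_halves is made up for this example.

```r
# Hypothetical helper: quartiles by the "median of each half" method.
# Assumption: with an odd count, the middle value belongs to neither half.
quartiles_by_halves <- function(values) {
  stopifnot(length(values) >= 2)
  v <- sort(values)
  n <- length(v)
  half <- n %/% 2
  lower <- v[1:half]             # lower half of the sorted data
  upper <- v[(n - half + 1):n]   # upper half of the sorted data
  c(Q1 = median(lower), Q2 = median(v), Q3 = median(upper))
}

data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
quartiles_by_halves(data)
# Q1 = 3.5, Q2 = 6.5, Q3 = 9.5
```

For the same vector, R's default quantile(data) reports 3.75 and 9.25 for the 25% and 75% points, because it interpolates between neighbouring values instead of taking the median of each half.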


Interestingly, there is more than one formula for a quartile, and the methods can give different values for the same data set. (Real data is not always as simple as the examples above.)

Check out the R help page for quantile to review the nine classic algorithms (types) for calculating quantiles. There we can see that R uses Type 7 by default.

> ?quantile
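
To make the default concrete: Type 7 places the quantile for probability p at the fractional position h = (n - 1)p + 1 in the sorted data and interpolates linearly between the two neighbouring values. The function below is a hand-written sketch of that rule (the name quantile_type7 is made up here, and it does no input checking), compared against R's own quantile().

```r
# Hand-written sketch of the Type 7 rule; not R's actual implementation.
quantile_type7 <- function(values, p) {
  v <- sort(values)
  n <- length(v)
  h <- (n - 1) * p + 1                 # fractional 1-based position in sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  v[lo] + (h - lo) * (v[hi] - v[lo])   # linear interpolation between neighbours
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
probs <- c(0, 0.25, 0.5, 0.75, 1)
quantile_type7(x, probs)
# [1] 1 3 5 7 9
all.equal(quantile_type7(x, probs), unname(quantile(x, probs = probs, type = 7)))
# [1] TRUE
```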


> x
[1] 1 2 3 4 5 6 7 8 9
> quantile(x,type=1)
  0%  25%  50%  75% 100% 
   1    3    5    7    9 
> quantile(x,type=2)
  0%  25%  50%  75% 100% 
   1    3    5    7    9 
> quantile(x,type=3)
  0%  25%  50%  75% 100% 
   1    2    4    7    9 
> quantile(x,type=4)
  0%  25%  50%  75% 100% 
1.00 2.25 4.50 6.75 9.00 
> quantile(x,type=7)
  0%  25%  50%  75% 100% 
   1    3    5    7    9 
> quantile(x,type=9)
    0%    25%    50%    75%   100% 
1.0000 2.6875 5.0000 7.3125 9.0000 



As you can see above, the results differ depending on the type you choose.
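
To see the spread across all nine types at once, a short loop over the type argument is enough. The sketch below reuses the 12-value example data from earlier in this post and reports only the 25% point; the variable name q1_by_type is just an illustrative choice.

```r
data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

# First quartile (25% point) under each of R's nine quantile types
q1_by_type <- sapply(1:9, function(t) {
  unname(quantile(data, probs = 0.25, type = t))
})
names(q1_by_type) <- paste0("type", 1:9)
print(q1_by_type)
```

Here Type 7 gives 3.75, while Type 2 gives 3.5, the same value as the hand calculation with the lower half.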

In conclusion, I think quantiles and quartiles are among the typical tools, created by great mathematicians, that we can use to interpret a group of data. I hope this post helps you understand the concept of quartiles.

