Monday, January 28, 2013

Basic Graph (2) - No14

This is the second part of the basic graph series, in which I am going to introduce some different kinds of graphs.
Some graphs require you to install a specific package.
New packages can be installed with the install.packages() command, which downloads the package from your preferred CRAN mirror.
To use a function from a package, you first have to load the package with library().
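
The install-then-load steps can be sketched as follows. The post does not say which package it installs, so the lattice package here is only an assumed example.

```r
# "lattice" is only an illustrative package name, not one named in this post.
pkg <- "lattice"

# Install the package only if it is not already available.
# install.packages() downloads it from the CRAN mirror you choose.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package so that its functions can be called directly.
library(pkg, character.only = TRUE)
```

Checking with requireNamespace() first avoids downloading the package again every time the script runs.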

(1)  Scatter plot matrix
The first graph I'd like to introduce is the scatter plot matrix.
I'll show you two different ways to generate this graph.

A scatter plot matrix is useful when your data has several variables that you want to compare with each other.

* The trees data will be used for the examples.


> plot(trees)
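
For a data frame, plot() draws the scatter plot matrix by calling pairs(), so pairs() can also be called directly, which makes the panels easier to customize. The following is a sketch of one such alternative and not necessarily the second method the post has in mind. The formula and the panel.smooth panel are my own illustrative choices.

```r
# trees is a built-in data set with the columns Girth, Height and Volume.
# The formula interface selects the variables to include in the matrix.
# panel.smooth adds a smoothed trend line to each panel (illustrative choice).
pairs(~ Girth + Height + Volume, data = trees,
      panel = panel.smooth,
      main = "Scatter plot matrix of trees")
```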

Sunday, January 27, 2013

Basic Graph (1) - No13

When it comes to R, you will no doubt hear that graphics support is one of its most powerful features. I feel the same way.

Today I will talk about some of R's graphics functions, and I will cover them in more detail in a later post.

I am going to use the trees data, a built-in sample data set, for our demonstration.


> trees
   Girth Height Volume
1    8.3     70   10.3
2    8.6     65   10.3
3    8.8     63   10.2
4   10.5     72   16.4
5   10.7     81   18.8
6   10.8     83   19.7
7   11.0     66   15.6
8   11.0     75   18.2
9   11.1     80   22.6
10  11.2     75   19.9
11  11.3     79   24.2
12  11.4     76   21.0
13  11.4     76   21.4
14  11.7     69   21.3
15  12.0     75   19.1
16  12.9     74   22.2
17  12.9     85   33.8
18  13.3     86   27.4
19  13.7     71   25.7

As you can see, there are three columns. Suppose you want to analyze the relationship between
Girth and Volume.
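
One way to look at that relationship is a simple scatter plot of the two columns. This is a minimal sketch, and the correlation coefficient at the end is my own addition as a numeric summary, not something taken from the post.

```r
# Scatter plot of Volume against Girth from the built-in trees data.
plot(trees$Girth, trees$Volume,
     xlab = "Girth", ylab = "Volume",
     main = "Girth vs. Volume")

# Pearson correlation coefficient as a numeric summary of the relationship
# (a value close to 1 indicates a strong positive linear relationship).
r <- cor(trees$Girth, trees$Volume)
print(r)
```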

Saturday, January 19, 2013

Standard Deviation- No12

In the last two posts, we reviewed the meaning of basic statistics such as the quartiles, median, and mean. Among them, I think the average is one of the most common, and we use it very often in our daily lives.
For example, the average math test score of your class, or the average height of your class.
However, the average alone does not tell us how the values are spread out.

Let's assume that there are two classes and their mathematics test results are as follows.



> classA <- c(80, 90, 75, 70, 80, 85, 80)
> classB <- c(100, 100, 100, 100, 55, 50, 55)

Now we can calculate the averages.

> mean(classA)
[1] 80
> mean(classB)
[1] 80

As you can see, the two classes have the same average.
Can you say that the students of the two classes have a similar level of achievement?
I don't think so, because the scores of classB are not evenly distributed.
In other words, the scores of classB lie further from the average than those of classA.
As a result, I can say that classA is much more consistent than classB in terms of its scores.
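
That difference in spread can be put into numbers with R's built-in sd() function, which computes the sample standard deviation. This is a small sketch using the same scores; the variable names sd_A and sd_B are my own.

```r
classA <- c(80, 90, 75, 70, 80, 85, 80)
classB <- c(100, 100, 100, 100, 55, 50, 55)

# Sample standard deviation: square root of the sum of squared distances
# from the mean, divided by (n - 1).
sd_A <- sd(classA)
sd_B <- sd(classB)

print(sd_A)  # about 6.45
print(sd_B)  # 25
```

Both classes have a mean of 80, but the standard deviation of classB is almost four times that of classA.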

Monday, January 14, 2013

Data Interpretation(Mean)- No11

Before continuing, review the summary data again.


> x <- c(1,2,3,4,5,6,7,8,9)
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       3       5       5       7       9 


I explained the meaning of quartiles in the last post.
This time, I will explain the box-and-whisker plot using the information above. It should be fun, because we will learn how to visualize a data set so that we can interpret it more easily.

First, let's look at the Min. and Max. results of the summary.
I will skip these two because minimum and maximum values are commonly used in our daily lives.
The last one I haven't explained is the average (mean).
This is one of the most common statistics, and I believe everybody knows it.
However, the average by itself tells us nothing about dispersion.
I will introduce the meaning of variance and standard deviation in the next post.

Anyway, I think we are ready to draw a box-and-whisker plot.
In the plot drawn by boxplot(x), the bold line in the middle of the box shows the median (second quartile), and the box spans from the first quartile to the third quartile.
Lastly, the two short lines at the ends of the dashed whiskers mark the maximum and minimum values (for this data, which has no outliers).

A box-and-whisker plot is a useful graph for understanding a whole data set intuitively.

> boxplot(x)
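
To see the numbers behind the picture, boxplot.stats() returns the five values that the plot draws. This is a minimal sketch, and the variable name b is my own.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# $stats holds, in order: lower whisker end, lower edge of the box,
# median, upper edge of the box, upper whisker end.
b <- boxplot.stats(x)
print(b$stats)  # 1 3 5 7 9

# $out lists points drawn individually as outliers (none for this data).
print(b$out)
```

For this data the five values match the Min., 1st Qu., Median, 3rd Qu. and Max. columns of summary(x).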

Sunday, January 13, 2013

Data Interpretation(Quartile)- No10

To gain good insight into our data, I think we have to improve our ability to interpret it. There are many traditional measures for describing a group of data, such as the average, maximum, and standard deviation. Furthermore, data can be visualized in various graphs such as charts, bar graphs, or pie charts.

Fortunately, thanks to the achievements of great mathematicians, we just need to understand what these measures mean.

R gives us a simple data summary that includes the min, max, median, mean, and quartiles.
A sample result is as follows.


> x <- c(1,2,3,4,5,6,7,8,9)
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       3       5       5       7       9 

I will explain the mean and median in the next posts; in this post I am going to focus on quartiles.
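
The quartiles can also be requested directly with quantile(). This is a short sketch using R's default quantile method, and the variable name q is my own.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# 25%, 50% and 75% points: the first quartile, the median (second
# quartile) and the third quartile.
q <- quantile(x, probs = c(0.25, 0.50, 0.75))
print(q)  # 25% = 3, 50% = 5, 75% = 7
```

One quarter of the sorted values lie at or below the first quartile, half at or below the median, and three quarters at or below the third quartile.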


Wednesday, January 9, 2013

Making function(2)- No9

In the last post we learned how to make our own function. To write a more complicated function, control statements are unavoidable.

In fact, I already used the "for" statement to make a simple function in the last post.
Today I will introduce two more classic control statements: "while" and "if".

First, let's review the function we made in the last post.



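 # Add 1 to x, N times (the result equals x + N).
 Addyourdata <- function(x, N) {
   # seq_len(N) also handles N = 0 correctly (the loop body is skipped).
   for (j in seq_len(N)) {
     x <- x + 1
   }
   return(x)
 }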
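
The same idea can be sketched with the two statements introduced in this post. This version is only an illustration under my own assumptions: the function name add_with_while and the check on N are hypothetical and not taken from the original post.

```r
# Hypothetical variant of the function above, written with "if" and "while".
add_with_while <- function(x, N) {
  # "if" runs its body only when the condition is TRUE.
  if (N < 0) {
    stop("N must not be negative")
  }

  # "while" repeats its body as long as the condition stays TRUE.
  j <- 0
  while (j < N) {
    x <- x + 1
    j <- j + 1
  }
  return(x)
}

print(add_with_while(10, 3))  # 13
```

Unlike "for", a "while" loop needs its own counter, so j is updated by hand inside the loop.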


Sunday, January 6, 2013

Making function(1)- No8

Some people might think that R is just a fabulous scientific calculator, but that's wrong.
R is a programming language: you can make your own functions or packages. Furthermore, if you are an advanced R programmer, you can contribute to the CRAN package list by submitting and registering your packages, so that everybody can use them.

Today I will make a very simple function.
As you are aware, the function is the basic unit of an R command. You can select and download any package from the CRAN site and then use the appropriate function for your own purpose.

Let's make a simple scenario.