Thursday, January 24, 2013

the wRath of R

If you are not familiar with R, it's a programming language used for statistical computing. In this post I will be talking about my rendezvous with this language via a course called "Computing for Data Analysis" (offered by Coursera).

credit: Computing for Data Analysis taught by Johns Hopkins' Roger D. Peng
For the past couple of days I have been busy programming in two languages: R (for Computing for Data Analysis) and SML (for the Programming Languages course). Needless to say, 80% of that time went to R.

The course assignment was very stressful, to say the very least (the title itself is a mouthful). The course dives into R too deep, too quickly. It started at the beginning of the year (2nd January, to be precise) and runs for a month, so we are almost done: just yesterday I handed in Assignment 3, and with one more assignment we are finished! In total there are four quizzes and four programming assignments, although for the first programming assignment we did not hand in any code; we had to work out the answers by programming and submit those instead.

If anyone wants a "lite" meal of R, I would suggest Try R or, if you are the kind of person who learns best from videos (like myself), this site. I would also suggest doing either of the two before taking the Computing for Data Analysis class. This isn't something that I did, but I wish I had; I only heard of them later on, and they would have made things easier, as they are gentler to handle and grasp.

Now, R is not an easy language to grasp at first; if you have habits from other programming languages, you will have a hard time adapting to its syntax and semantics. Thanks to the assignments and quizzes, however, you should eventually get the hang of the proper usage of R.

For Assignment 3, we had to use R on a data file (.csv file) that contains information about 30-day mortality and readmission rates for heart attacks, heart failure, and pneumonia for over 4,600 hospitals in the United States. Each record has 43 fields, so you can imagine the size of this data. In the first four parts of the assignment we had to make several types of plots on different types of data; we had to organize and categorize the data and display it in xy plots, histograms, boxplots (I never knew of boxplots until today), etc.
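To give a feel for those plot types, here is a minimal sketch using R's built-in graphics functions hist(), boxplot() and plot(). The numbers are randomly generated and the data frame and column names (rates, state, heart_attack, heart_failure) are made up for illustration; none of this is the actual course data or the assignment's solution.

```r
# Made-up numbers purely for illustration; this is NOT the course data.
set.seed(1)
rates <- data.frame(
  state = rep(c("TX", "MD", "CA"), each = 20),
  heart_attack = rnorm(60, mean = 15, sd = 2),
  heart_failure = rnorm(60, mean = 11, sd = 1.5)
)

# Histogram: how one outcome is distributed across all hospitals.
hist(rates$heart_attack, main = "Heart attack", xlab = "30-day death rate")

# Boxplot: the same outcome, split into one box per state.
boxplot(heart_attack ~ state, data = rates, ylab = "30-day death rate")

# xy plot: one outcome against another.
plot(rates$heart_attack, rates$heart_failure,
     xlab = "Heart attack", ylab = "Heart failure")
```

The formula form heart_attack ~ state is what lets boxplot() group the values by state without any manual splitting.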
The first four parts were not graded, but they were good practice for the last three parts. First, we had to write a function that takes the two-letter abbreviation of a US state and an outcome (heart attack, heart failure or pneumonia) and finds the best hospital (lowest 30-day mortality) for that outcome in the requested state. We then had to write a function that finds the hospital at the nth rank (rather than the best). Finally, we had to write a function that returns the nth-ranked hospital for every state.
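The rough shape of those three functions can be sketched on a tiny made-up data frame. This is only an illustration under my own assumptions, not the graded solution: the function names (rank_hospital, best, rank_all), the column names (hospital, state, heart_attack) and the tie-breaking by hospital name are all hypothetical.

```r
# Toy data standing in for the real file; every name here is hypothetical.
hospitals <- data.frame(
  hospital = c("Alpha General", "Beta Medical", "Gamma Clinic", "Delta Hospital"),
  state = c("TX", "TX", "TX", "MD"),
  heart_attack = c(14.2, 12.1, NA, 13.0),
  stringsAsFactors = FALSE
)

# Hospital at position `num` within one state, sorted by an outcome column
# (lowest rate first; ties broken by hospital name, which is an assumption).
rank_hospital <- function(data, state, outcome, num = 1) {
  if (!state %in% data$state) stop("invalid state")
  if (!outcome %in% names(data)) stop("invalid outcome")
  # Keep the requested state and drop hospitals with a missing rate.
  rows <- data[data$state == state & !is.na(data[[outcome]]), ]
  rows <- rows[order(rows[[outcome]], rows$hospital), ]
  if (num > nrow(rows)) return(NA_character_)
  rows$hospital[num]
}

# "Best" is simply rank 1.
best <- function(data, state, outcome) rank_hospital(data, state, outcome, 1)

# The nth-ranked hospital for every state, one row per state.
rank_all <- function(data, outcome, num = 1) {
  states <- sort(unique(data$state))
  data.frame(
    hospital = vapply(states,
                      function(s) rank_hospital(data, s, outcome, num),
                      character(1)),
    state = states,
    stringsAsFactors = FALSE
  )
}

best(hospitals, "TX", "heart_attack")
rank_hospital(hospitals, "TX", "heart_attack", 2)
rank_all(hospitals, "heart_attack", 1)
```

Writing the nth-rank function first and defining "best" as rank 1 means the sorting and filtering logic lives in one place, and the all-states version only has to loop over the states.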

You have to be really good at using all the "tools" that the instructor has provided; you have to put things together. For example, you have to know how to "filter filters" of data (i.e. apply several filtering techniques on top of each other) and know how to work with the data structures.
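Here is a small sketch of what stacking filters can look like in R, using logical indexing one step at a time and then the same thing in a single subset() call. The data frame and its columns (df, hospital, state, rate) and the threshold of 13 are made up for this example.

```r
# Hypothetical toy data; the column names are made up for this sketch.
df <- data.frame(
  hospital = c("A", "B", "C", "D", "E"),
  state = c("TX", "TX", "MD", "TX", "MD"),
  rate = c(14.2, NA, 13.0, 12.1, 15.5),
  stringsAsFactors = FALSE
)

# Filter 1: keep one state.
in_state <- df[df$state == "TX", ]
# Filter 2, applied to the result of filter 1: drop missing rates.
with_rate <- in_state[!is.na(in_state$rate), ]
# Filter 3: keep only rates below a threshold.
low <- with_rate[with_rate$rate < 13, ]

# The same three filters written as one condition with subset().
low_again <- subset(df, state == "TX" & !is.na(rate) & rate < 13)
```

Doing it step by step makes it easy to inspect each intermediate result, while the single condition is shorter once you trust the logic.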

Needless to say, the forum (i.e community) is one of the essential parts of learning the material.

I had to use up one of my late days (we have 5 in total) in order to finish this assignment.

I have thus far earned 80% for this course and should get a certificate (70% is required). So now I have to think about "opportunity cost": should I earn the remaining 20% or concentrate on my other MOOCs? Time is the limited resource, after all... maybe I should do a "data analysis" on that!


Update: cleaned up and edited by my dear friend +Ian Belcher. Thanks bro :)

4 comments:

  3. I will take the course in September 2013, thanks for your suggestions and links!!!!