Crosstab to Flat File in R with melt()

Most of the time when using R your data frames will be structured as a cross tabulation. This means each row corresponds to a single observation and each column corresponds to a single variable. For example, let us create a data frame consisting of height and weight data for 4 imaginary people in the form of a cross tab.

# generating data 
name = c('Nick','Josh','James','Kelly')
weight = c(72,85,62,50)
height = c(180,162,170,175)

# forming a cross tab
crossTab.df = data.frame(name,weight,height)

# viewing
crossTab.df
   name weight height
1  Nick     72    180
2  Josh     85    162
3 James     62    170
4 Kelly     50    175

Sometimes you may want your data frame to be re-structured as a flat file, particularly if you are using the data visualization and dash-boarding software Tableau to display output from R. The solution lies in the melt() function found in the package reshape2. Lets transform our example data into a flat file:

# loading the required library
library(reshape2)

# forming a flat file
flatFile.df = melt(data=crossTab.df,id='name') 

# viewing
flatFile.df
 
   name variable value
1  Nick   weight    72
2  Josh   weight    85
3 James   weight    62
4 Kelly   weight    50
5  Nick   height   180
6  Josh   height   162
7 James   height   170
8 Kelly   height   175

Of course, to reverse the process you have the function recast() at your disposal…

# back to cross tab
original.df = recast(flatFile.df,name~variable)

# viewing
original.df
 
   name weight height
1 James     62    170
2  Josh     85    162
3 Kelly     50    175
4  Nick     72    180

Note the order of the observations has changed but the data is as it was originally.

Advertisements

Fixing broken loops with tryCatch() in R

Say you are using a loop that calls a function at each iteration. What happens if that function returns a error part way through the process? Your loop wont complete even if some suitable data remains!

One solution is the tryCatch() function in R. It can be used to first ‘try’ the loop and upon encountering an error it will call a predetermined function. This allows your loop to complete.

Below is a (silly) example but illustrates the idea.

Say we want to generate a vector of correlations:

# generating columns
x = rnorm(4)
y = rnorm(4)
z = rnorm(4)
q = rnorm(4)

# forming a data frame
data = data.frame(cbind(x,y,z,q))

# lets make the 'z' column a collection of character values!
data$z = as.character(data$z)

# viewing
str(data)

# creating an empty vector
correlations = rep(0,length(data))

# trying to run a loop
for(i in 1:length(data)){
  correlations[i] = cor(data$x,data[,i])}

# oops!!! We have an error!!!
correlations

# using tryCatch()
# the function will try the expression first. When this fails, 0 will be returned.
for(i in 1:length(data)){
  correlations[i] = tryCatch({expr=cor(data$x,data[,i])},
                             error=function(i){return(NA)})
}

# result!
correlations

Hope this helps!