DIY Header Comments in R

It is always a good idea to include a header at the start of your R scripts that provides a title, records the author and date of writing, and describes what your code actually does! This makes it easy for someone else (or the future you!) to understand what you have done.

There is no need to type these out manually. header_comment() from the commentr package prints a really nice header to the R console:

library(commentr)

header_comment("Amazing Program", "Does everything you could ever want!",
               author       = "John Roberts",
               contact      = "john.roberts@someemailprovider.com",
               client       = "A Big Firm",
               date_created = date(),
               width        = "a4landscape")

Try it!

Otherwise, if you would rather the header took a different form, you could write your own function. What follows is an example function I quickly wrote that uses cat() to print a header. Go ahead and change it to suit your style!

# the function takes 5 arguments:
# title (required; the function stops without one)
# version (default: NULL)
# author (default: system user)
# contact (default: NULL)
# description (default: NULL)

header = function(title = NULL, version = NULL,
                  author = Sys.getenv('USERNAME'),
                  contact = NULL, description = NULL){
  # you must have a title
  if (is.null(title)){
    stop('You need a title!')
  } 
  # defines the bars
  bar = c(rep('#',4), rep('-',nchar(date())), rep('#',4),'\n')
  # what will print regarding version
  version.str = c('Version:', version,'\n')
  # what will print regarding contact
  contact.str = c('Contact:',contact,'\n')
  # what will print regarding description
  description.str = c('Description:',description,rep('\n',2)) 
  # if no version was entered, nothing will print
  if(is.null(version)){
    version.str = NULL
  }
  # same for contact
  if(is.null(contact)){
    contact.str = NULL
  }
  # same for description
  if(is.null(description)){
    description.str = NULL
  }
  # comment construction
  cat(  
    '',
    rep(x=bar,times=2),'\n', 
    'Title:',title,'\n',  
    version.str,
    'Date:',date(),'\n', 
    '\n',
    'Author:', author,'\n', 
    contact.str,'\n',
    description.str,
    rep(x=bar,times=2) 
  ) 
}

header(title = "Amazing Program", version = 2.1, author = "John Roberts",
       contact = "john.roberts@someemailprovider.com",
       description = "Does everything you could ever want!")

Output:

 # # # # - - - - - - - - - - - - - - - # # # 
 # # # # - - - - - - - - - - - - - - - # # # 
 
 Title: Amazing Program 
 Version: 2.1 
 Date: Mon Jul 20 21:14:30 2015 
 
 Author: John Roberts 
 Contact: john.roberts@someemailprovider.com 
 
 Description: Does everything you could ever want! 
 
 # # # # - - - - - - - - - - - - - - - # # # 
 # # # # - - - - - - - - - - - - - - - # # #

References
https://cran.r-project.org/web/packages/commentr/index.html

Maximum-likelihood Fitting with R

So you have gathered your data and now you want to use maximum-likelihood estimation, along with a chosen distribution, to fit a model. Here are two methods:

Method 1

In R, you can define the log-likelihood and use optim() to maximise it with respect to the parameters. This yields the maximum-likelihood estimates, which you can then use to fit a probability distribution to the data.

The process is perhaps best shown by way of an example.

Say we have some count data that we think may best be modelled using the Poisson distribution. The data looks like this:


set.seed(100)

# let’s generate some synthetic sample data:
x = rpois(n=10000,lambda=2)

# plotting
hist(x,col='lightblue')

[Figure: histogram of x]

Hint: We know the data has been generated from a Poisson distribution with \lambda=2!

The Poisson probability function has the form:

P(X=x) = \frac{\lambda^{x}e^{-\lambda}}{x!}
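As a quick check that this formula and R's built-in Poisson density agree, we can compute a single probability both ways (using \lambda=2 and x=3 as arbitrary values):

```r
lambda = 2
k = 3

# the pmf written out by hand...
manual = lambda^k * exp(-lambda) / factorial(k)
# ...and via R's built-in Poisson density
builtin = dpois(k, lambda)

c(manual, builtin)  # both about 0.1804
```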

The likelihood function considering n independent data points is therefore:

L(\lambda;x) = \prod_{i=1}^{n}\frac{\lambda^{x_i}e^{-\lambda}}{x_i!} = \frac{\lambda^{x_1}e^{-\lambda}}{x_1!}\times\frac{\lambda^{x_2}e^{-\lambda}}{x_2!}\times\ldots\times\frac{\lambda^{x_n}e^{-\lambda}}{x_n!}

= \frac{\lambda^{\sum x_i}e^{-n\lambda}}{x_1!x_2!\dots x_n!}

And the log-likelihood function:

\log(L(\lambda;x))=\log(\lambda^{\sum x_i}e^{-n\lambda})-\log(x_1!x_2!\dots x_n!)

Which, ignoring the terms not featuring the parameter \lambda,

=\sum x_i\log(\lambda)-n\lambda

Expressing this in R:


# the log-likelihood is a function of lambda and the data
poissonLogLik = function(lambda, data){
  # total number of observations
  n = length(data)
  # the log-likelihood (terms not involving lambda dropped)
  logLik = sum(data)*log(lambda) - n*lambda
  return(logLik)
}

To find our estimate \hat\lambda we need to maximise the log-likelihood function:

\frac{\partial}{\partial \lambda} \big(\log(L(\hat\lambda;x))\big)=0
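For the Poisson this maximisation can actually be done by hand: differentiating \sum x_i\log(\lambda)-n\lambda with respect to \lambda gives \frac{\sum x_i}{\lambda}-n, and setting this to zero yields \hat\lambda = \frac{1}{n}\sum x_i, the sample mean. This gives a quick sanity check on whatever optim() returns:

```r
set.seed(100)
x = rpois(n = 10000, lambda = 2)

# the closed-form Poisson MLE is just the sample mean
lambda_hat = mean(x)
lambda_hat  # 2.0001, agreeing with optim()'s 2.000195 to numerical tolerance
```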

Doing it in R:


# with a starting value of 1, maximise the function!
mle = optim(par=1,fn=poissonLogLik,data=x,control=list(fnscale=-1))

> mle
$par
[1] 2.000195

$value
[1] -6136.363

$counts
function gradient 
      30       NA 

$convergence
[1] 0

$message
NULL

Note that by default, optim() finds a minimum. This is no good because we want to maximise! Adding control=list(fnscale=-1) scales the objective function by -1, so minimising the scaled function maximises the original. Check out optim()'s help file for more details.
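An equivalent and very common alternative is to minimise the negative log-likelihood, which works with optim()'s default behaviour and avoids the fnscale trick. A sketch using the same synthetic data:

```r
set.seed(100)
x = rpois(n = 10000, lambda = 2)

# negative log-likelihood: flipping the sign lets optim() minimise as usual
negPoissonLogLik = function(lambda, data){
  n = length(data)
  -(sum(data)*log(lambda) - n*lambda)
}

mle2 = optim(par = 1, fn = negPoissonLogLik, data = x)
mle2$par  # same estimate as before, about 2.0002
```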

Now we have our estimate, \hat\lambda, how close are our observed and expected values?

Our observed values:


# observed
table(x)

x
   0    1    2    3    4    5    6    7    8    9 
1327 2781 2676 1790  880  367  126   38   14    1 

Our expected values:


# expected
expected = dpois(x=min(x):max(x), lambda=mle$par)*10000

> expected
 [1] 1353.088531 2706.441338 2706.705639 1804.646644  902.411439  360.999826
 [7]  120.345027   34.387651    8.597752    1.910798
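As a sanity check, these expected counts should sum to just under 10,000, because counts of 10 or more carry a small amount of probability mass that the 0 to 9 range misses:

```r
# expected counts over the observed range 0:9, using the optim() estimate
expected_total = sum(dpois(0:9, lambda = 2.000195) * 10000)
expected_total  # about 9999.5
```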

Overlaying the expected values onto the original histogram (with a red line):

# observed
hist(x,col='lightblue')

# expected
lines(min(x):max(x),expected,col='red',lwd=2)

[Figure: histogram of x with the expected values overlaid as a red line]

The model appears to fit well!

Method 2

The function fitdistr() in the MASS package allows us to fit a range of distributions to data using maximum likelihood, which means we don't have to use Method 1 each time! Woohoo! It's as easy as handing over your data, choosing your distribution and pressing RUN. You also get estimated standard errors, which is a bonus.

# loading the MASS package
library(MASS)

# fitting a Poisson distribution using maximum-likelihood 
fitdistr(x,densfun='Poisson')
     lambda  
  2.00010000 
 (0.01414249)

Very nice!
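The standard error that fitdistr() reports in brackets also has a closed form for the Poisson: the MLE's asymptotic standard error is \sqrt{\hat\lambda/n}. We can verify that this matches:

```r
lambda_hat = 2.0001  # the estimate from fitdistr() above
n = 10000

# analytic standard error of the Poisson MLE
se = sqrt(lambda_hat / n)
se  # 0.01414249, matching fitdistr()'s bracketed value
```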

References
http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf