It is always a good idea to include a header at the start of your R scripts that provides a title, records the author and the date of writing, and describes what your code actually does! This makes it easy for someone else (or the future you!) to understand what you have done.

There is no need to manually type these out. header_comment() from the commentr package prints a really nice header to the R console:

library(commentr)

header_comment("Amazing Program", "Does everything you could ever want!",
               author       = "John Roberts",
               contact      = "john.roberts@someemailprovider.com",
               client       = "A Big Firm",
               date_created = date(),
               width        = "a4landscape")


Try it!

Otherwise, if you would rather the header took a different form, you could write your own function. What follows is an example function I quickly wrote that uses cat() to print a header. Go ahead and change it to suit your style!

# the function takes 5 arguments:
# title (required with no default)
# version (default: NULL)
# author (default: system user)
# contact (default: NULL)
# description (default: NULL)

header = function(title, version=NULL, author=Sys.info()[['user']],
                  contact=NULL, description=NULL){
  # you must have a title
  if (is.null(title)){
    stop('You need a title!')
  }
  # defines the bars
  bar = c(rep('#',4), rep('-',nchar(date())), rep('#',4),'\n')
  # what will print regarding version
  version.str = c('Version:', version,'\n')
  # what will print regarding contact
  contact.str = c('Contact:',contact,'\n')
  # what will print regarding description
  description.str = c('Description:',description,rep('\n',2))
  # if no version was entered, nothing will print
  if(is.null(version)){
    version.str = NULL
  }
  # same for contact
  if(is.null(contact)){
    contact.str = NULL
  }
  # same for description
  if(is.null(description)){
    description.str = NULL
  }
  # comment construction
  cat(
    '',
    rep(x=bar,times=2),'\n',
    'Title:',title,'\n',
    version.str,
    'Date:',date(),'\n',
    '\n',
    'Author:', author,'\n',
    contact.str,'\n',
    description.str,
    rep(x=bar,times=2)
  )
}

header(title       = "Amazing Program",
       version     = "2.1",
       author      = "John Roberts",
       contact     = "john.roberts@someemailprovider.com",
       description = "Does everything you could ever want!")



Output:

 # # # # - - - - - - - - - - - - - - - # # #
# # # # - - - - - - - - - - - - - - - # # #

Title: Amazing Program
Version: 2.1
Date: Mon Jul 20 21:14:30 2015

Author: John Roberts
Contact: john.roberts@someemailprovider.com

Description: Does everything you could ever want!

# # # # - - - - - - - - - - - - - - - # # #
# # # # - - - - - - - - - - - - - - - # # #


# Maximum-likelihood Fitting with R

So you have gathered your data and now you want to use maximum-likelihood estimation, along with a selected distribution, to fit a model. Here are two methods:

## Method 1

In R, you can define the log-likelihood and use R’s optim() to maximise this function with respect to the parameters. This gives you your maximum-likelihood estimates, which then enable you to fit a probability distribution to the data.

The process is perhaps best shown by way of an example.

Say we have some count data that we think may best be modelled using the Poisson distribution. The data looks like this:


set.seed(100)

# let’s generate some synthetic sample data:
x = rpois(n=10000,lambda=2)

# plotting
hist(x,col='lightblue')



Hint: We know the data has been generated from a Poisson distribution with $\lambda=2$!

The Poisson probability function has the form:

$P(X=x) = \frac{\lambda^{x}e^{-\lambda}}{x!}$

The likelihood function considering $n$ independent data points is therefore:

$L(\lambda;x) = \prod_{i=1}^{n}\frac{\lambda^{x_i}e^{-\lambda}}{x_i!} = \frac{\lambda^{x_1}e^{-\lambda}}{x_1!}\times\frac{\lambda^{x_2}e^{-\lambda}}{x_2!}\times\ldots\times\frac{\lambda^{x_n}e^{-\lambda}}{x_n!}$

$= \frac{\lambda^{\sum x_i}e^{-n\lambda}}{x_1!x_2!\dots x_n!}$

And the log-likelihood function:

$\log(L(\lambda;x))=\log(\lambda^{\sum x_i}e^{-n\lambda})-\log(x_1!x_2!\dots x_n!)$

Which, ignoring the terms not featuring the parameter $\lambda$ (they do not affect the maximisation), reduces to:

$=\sum x_i\log(\lambda)-n\lambda$

Expressing this in R:


# the log-likelihood is a function of lambda and the data
poissonLogLik = function(lambda, data){
  # total number of observations
  n = length(data)
  # (partial) log-likelihood: sum(x) * log(lambda) - n * lambda
  sum(data) * log(lambda) - n * lambda
}
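As a quick sanity check (a sketch reusing the simulated `x` from above), the log-likelihood should be larger near the true rate of 2 than away from it:

```r
set.seed(100)
x = rpois(n = 10000, lambda = 2)

# the (partial) log-likelihood, as defined above
poissonLogLik = function(lambda, data){
  n = length(data)
  sum(data) * log(lambda) - n * lambda
}

# values near the true lambda = 2 should score higher
poissonLogLik(2, x) > poissonLogLik(1, x)   # TRUE
poissonLogLik(2, x) > poissonLogLik(5, x)   # TRUE
```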



To find our estimate $\hat\lambda$ we need to maximise the log-likelihood function:

$\frac{\partial}{\partial \lambda} \log\big(L(\lambda;x)\big)\Big|_{\lambda=\hat\lambda}=0$
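Side note: for the Poisson, this maximisation can also be done by hand. Differentiating gives $\frac{\sum x_i}{\lambda} - n = 0$, so $\hat\lambda = \frac{1}{n}\sum x_i$, the sample mean. A quick check against the simulated data:

```r
set.seed(100)
x = rpois(n = 10000, lambda = 2)

# the analytic Poisson MLE is simply the sample mean
lambda_hat = mean(x)
abs(lambda_hat - 2) < 0.05   # close to the true rate
```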

Doing it in R:


# with a starting value of 1, maximise the function!
# (control=list(fnscale=-1) makes optim() maximise rather than minimise)
mle = optim(par=1, fn=poissonLogLik, data=x, control=list(fnscale=-1))


> mle
$par
[1] 2.000195

$value
[1] -6136.363

$counts
function gradient 
      30       NA 

$convergence
[1] 0

$message
NULL


Note that by default, optim() finds a minimum solution. This is no good because we are only interested in maximising! Adding control=list(fnscale=-1) solves the problem. Check out optim()’s help file for more details.

Now that we have our estimate, $\hat\lambda$, how close are our observed and expected values?

Our observed values:

# observed
table(x)

x
   0    1    2    3    4    5    6    7    8    9 
1327 2781 2676 1790  880  367  126   38   14    1 

Our expected values:

# expected
expected = dpois(x=min(x):max(x), lambda=mle$par)*10000


> expected
[1] 1353.088531 2706.441338 2706.705639 1804.646644  902.411439  360.999826
[7]  120.345027   34.387651    8.597752    1.910798


Overlaying the expected values onto the original histogram (with a red line):

# observed
hist(x,col='lightblue')

# expected
lines(min(x):max(x),expected,col='red',lwd=2)


The model appears to fit well!
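One caveat with this approach: for a single parameter, optim()’s default Nelder-Mead method may warn that it is unreliable in one dimension. Base R’s optimize() searches an interval instead and works nicely here (a sketch reusing the same log-likelihood):

```r
set.seed(100)
x = rpois(n = 10000, lambda = 2)

poissonLogLik = function(lambda, data){
  n = length(data)
  sum(data) * log(lambda) - n * lambda
}

# optimize() hunts over an interval; maximum = TRUE asks for the peak
res = optimize(poissonLogLik, interval = c(0.01, 10),
               data = x, maximum = TRUE)
res$maximum   # agrees closely with the sample mean of x
```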

## Method 2

The function fitdistr() in the MASS package allows us to fit a range of distributions to data using maximum likelihood. This means we don’t have to use Method 1 each time! Woohoo! It’s as easy as handing over your data, choosing your distribution and pressing RUN. You also get the estimated standard errors, which is a bonus.

# loading the MASS package
library(MASS)

# fitting a Poisson distribution using maximum-likelihood
fitdistr(x,densfun='Poisson')

     lambda  
  2.00010000 
 (0.01414249)


Very nice!
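The object returned by fitdistr() also exposes the pieces programmatically, which is handy if you want to reuse the estimate or standard error downstream. A sketch (for the Poisson, the MLE is the sample mean and its standard error is $\sqrt{\hat\lambda/n}$):

```r
library(MASS)

set.seed(100)
x = rpois(n = 10000, lambda = 2)

fit = fitdistr(x, densfun = 'Poisson')

fit$estimate   # named vector of parameter estimates (here: lambda)
fit$sd         # corresponding standard errors
fit$loglik     # maximised log-likelihood
```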