So you have gathered your data and now you want use maximum-likelihood estimation along with a selected distribution to fit a model. Here are two methods:

**Method 1**

In R, you could define the log-likelihood and use R’s optim() to maximise this function with respect to the parameters. This gets you your maximum likelihood estimates which then enable you to fit a probability distribution to the data.

The process is perhaps best shown by way of an example.

Say we have some count data that we think may best be modelled using the Poisson distribution. The data looks like this:

set.seed(100) # let’s generate some synthetic sample data: x = rpois(n=10000,lambda=2) # plotting hist(x,col='lightblue')

Hint: We know the data has been generated from a Poisson distribution with !

The Poisson probability function has the form:

The likelihood function considering independent data points is therefore:

And the log-likelihood function:

Which, ignoring the terms not featuring the parameter ,

Expressing this in R:

# the log-likelihood is a function of lambda and the data poissonLogLik = function(lambda,data){ # total number of observations n = length(data) # equation logLik = (sum(data)*log(lambda)-n*lambda) }

To find our estimate we need to maximise the log-likelihood function:

Doing it in R:

# with a starting value of 1, minimize the function! mle = optim(par=1,fn=poissonLogLik,data=x,control=list(fnscale=-1))

> mle $par [1] 2.000195 $value [1] -6136.363 $counts function gradient 30 NA $convergence [1] 0 $message NULL

Note that by default, optim() finds a minimum solution. This is no good because we are only interested in maximising! Adding control=list(fnscale=-1) solves the problem. Check out optim()’s help file for more details.

Now we have our estimate, , how close are our observed and expected values?

Our observed values:

# observed table(x)

x 0 1 2 3 4 5 6 7 8 9 1327 2781 2676 1790 880 367 126 38 14 1

Our expected values:

# expected expected = dpois(x=min(x):max(x), lambda=mle$par)*10000

> expected [1] 1353.088531 2706.441338 2706.705639 1804.646644 902.411439 360.999826 [7] 120.345027 34.387651 8.597752 1.910798

Overlaying the expected values onto the original histogram (with a red line):

# observed hist(x,col='lightblue') # expected lines(min(x):max(x),expected,col='red',lwd=2)

The the model appears to fit well!

**Method 2**

The function fitdistr() in the MASS package allows us to fit a range distributions to data using maximum-likelihood. This means we don’t have to use method 1 each time! woohoo! It’s as easy as handing over your data, choosing your distribution and pressing RUN. You also get your estimated standard errors which is a bonus.

# loading the MASS package library(MASS) # fitting a Poisson distribution using maximum-likelihood fitdistr(x,densfun='Poisson')

lambda 2.00010000 (0.01414249)

Very nice!

References

http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf