Demystifying the GLM (Part 1)

Upon being thrown a prickly binary classification problem, most data practitioners will have dug deep into their statistical toolbox and pulled out the trusty logistic regression model.

Essentially, logistic regression helps us predict a binary (yes/no) response from other, hopefully related, variables. For example, one might want to predict whether a person will experience a heart attack given their weight and age, because we have reason to believe both are related to the incidence of heart attacks.

So, our practitioner will have sorted their data, fired up R and typed something along the lines of:

glm(heartAttack ~ weight + age, data = heartData, family = binomial())
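For anyone who wants to run that call, here is a minimal sketch. There is no real heartData set behind this post, so the data frame below is simulated, and the coefficients used to generate it are invented purely for illustration.

```r
# Simulated stand-in for heartData (an assumption: no real data set is implied)
set.seed(42)
n <- 500
heartData <- data.frame(
  weight = rnorm(n, mean = 80, sd = 15),  # weight in kg
  age    = runif(n, min = 30, max = 80)   # age in years
)

# Made-up "true" relationship on the log-odds scale
logOdds <- -8 + 0.04 * heartData$weight + 0.06 * heartData$age

# plogis() turns log-odds into probabilities; rbinom() draws the 0/1 outcome
heartData$heartAttack <- rbinom(n, size = 1, prob = plogis(logOdds))

# The call from the text
fit <- glm(heartAttack ~ weight + age, data = heartData, family = binomial())
summary(fit)

# Predicted probability for a hypothetical 90 kg, 65-year-old person
newPerson <- data.frame(weight = 90, age = 65)
predict(fit, newdata = newPerson, type = "response")
```

Note the type = "response" argument: without it, predict() returns values on the log-odds scale rather than probabilities.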

But what is a glm? What does family = binomial() actually mean?

It turns out the logistic regression model is a member of a broad group of models known as generalised linear models, or GLMs for short.
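A quick way to get a first feel for this in R is to poke at the family objects themselves. The sketch below is only a peek, not an explanation: it shows that binomial() returns an object describing a distribution and a link function, and that other families come with other default links.

```r
# binomial() returns a "family" object describing the model's ingredients
fam <- binomial()
class(fam)    # "family"
fam$family    # "binomial"
fam$link      # "logit"

# The link function maps a probability to the log-odds scale, and back again
fam$linkfun(0.5)  # 0
fam$linkinv(0)    # 0.5

# Other members of the GLM group use other default links
gaussian()$link   # "identity"
poisson()$link    # "log"
```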

This series will endeavour to demystify these highly useful models.

Stay tuned.

