Demystifying the GLM (Part 1)

Faced with a prickly binary classification problem, most data practitioners will have dug deep into their statistical toolbox and pulled out the trusty logistic regression model.

Essentially, logistic regression helps us predict a binary (yes/no) response, or more precisely the probability of a "yes", from other, hopefully related, variables. For example, one might want to predict whether a person will experience a heart attack given their weight and age. In this case, we have reason to believe weight and age are related to the incidence of heart attacks.

So our practitioners will have sorted their data, fired up R and typed something along the lines of:

glm(heartAttack ~ weight + age, data = heartData, family = binomial())
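For anyone who wants to run that call, here is a minimal sketch. There is no real heartData set behind this post, so the data below is simulated: the sample size, the units and the "true" coefficients are all made-up assumptions, chosen only so the model has something to fit. The newPerson data frame is likewise hypothetical.

```r
# Simulated stand-in for heartData; every value here is invented for illustration.
set.seed(42)
n <- 500
heartData <- data.frame(
  weight = rnorm(n, mean = 80, sd = 12),  # kilograms (assumed unit)
  age    = rnorm(n, mean = 55, sd = 10)   # years
)

# Assumed relationship on the log-odds scale; the coefficients are arbitrary.
logOdds <- -12 + 0.05 * heartData$weight + 0.12 * heartData$age
heartData$heartAttack <- rbinom(n, size = 1, prob = plogis(logOdds))

# The same call as in the text.
fit <- glm(heartAttack ~ weight + age, data = heartData, family = binomial())
summary(fit)

# Predicted probability of a heart attack for a hypothetical new person.
newPerson <- data.frame(weight = 95, age = 60)
predict(fit, newdata = newPerson, type = "response")
```

Asking predict() for type = "response" returns a probability between 0 and 1 rather than a value on the log-odds scale.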

But what is a glm? What does family = binomial() actually mean?

It turns out the logistic regression model is a member of a broad group of models known as generalised linear models, or GLMs for short.
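One way to get a feel for how broad that group is: ordinary linear regression can also be fitted with glm(), simply by choosing a different family. The sketch below is an illustration rather than part of the heart attack example. It uses the mtcars data set that ships with R, and the object names lmFit and glmFit are just placeholders.

```r
# mtcars ships with base R; mpg and wt are two of its columns.
lmFit  <- lm(mpg ~ wt, data = mtcars)
glmFit <- glm(mpg ~ wt, data = mtcars, family = gaussian())

# The two sets of coefficient estimates should agree up to numerical tolerance.
all.equal(coef(lmFit), coef(glmFit))
```

Swapping family = gaussian() for family = binomial() is, loosely speaking, what turns the same machinery into logistic regression.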

This series will endeavour to demystify these highly useful models.

Stay tuned.