Modelling churn, part 1 – Setting up the analytics project

This is the first blogpost in a series about predicting churn. In this post we’re going to talk about analytics methodology. But first, let’s start with a simple definition:


Churn – a customer stops being a customer


This includes:

  • A customer cancels their subscription service
  • A customer with money invested in your services either loses all of its value or withdraws the money
  • A recurring customer stops coming back to you for repeat purchases



To model this, you first need data on your customers – some who are still with you and some who have churned. In the table below, for example, 1 marks a month in which the customer was active and 0 a month in which they were not. Customers Id1 and Id4 are still customers in April, but Id2, Id3 and Id5 are not:

Customer  Jan  Feb  Mar  Apr
Id1       1    1    1    1
Id2       1    1    0    0
Id3       1    1    1    0
Id4       1    1    1    1
Id5       1    0    0    0
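
As a rough illustration, the activity table above can be turned into a churn label per customer. The sketch below is plain Python and assumes that "churned" means "not active in the last observed month"; the helper names `churn_label` and `last_active_month` are made up for this example and are not taken from any particular library.

```python
# Monthly activity flags copied from the table above (1 = active, 0 = not active).
MONTHS = ["Jan", "Feb", "Mar", "Apr"]
activity = {
    "Id1": [1, 1, 1, 1],
    "Id2": [1, 1, 0, 0],
    "Id3": [1, 1, 1, 0],
    "Id4": [1, 1, 1, 1],
    "Id5": [1, 0, 0, 0],
}


def churn_label(flags):
    """Return 1 if the customer is inactive in the last observed month, else 0.

    Assumption for this sketch: inactivity in the final month counts as churn.
    """
    return 0 if flags[-1] == 1 else 1


def last_active_month(flags):
    """Return the name of the last month with activity, or None if never active."""
    for month, flag in zip(reversed(MONTHS), reversed(flags)):
        if flag == 1:
            return month
    return None


labels = {customer: churn_label(flags) for customer, flags in activity.items()}

for customer, flags in activity.items():
    print(customer, "churned" if labels[customer] else "active",
          "- last active in", last_active_month(flags))
```

In a real project the definition of churn is a modelling decision in itself, for example how many inactive months have to pass before a customer counts as lost.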


Second, you need data to use for predicting who is going to churn. What data you have and how you use it is very important, and will be covered in more detail in a separate blogpost. For the time being we will make do with the example table below, which contains both categorical data (Gender, Frequency) and numerical data (Age, AvgValue).

Customer  Gender  Age  AvgValue  Frequency
Id1       Female  31   302       High
Id2       Male    22   195       Low
Id3       Female  29   152       Mid
Id4       Female  47   412       Low
Id5       Male    39   353       Mid
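
Most models need the categorical columns converted to numbers before they can be used. One common option is dummy coding, sketched below in plain Python. The choice of "Male" and "Low" as reference levels, and the names `FEATURE_NAMES` and `encode`, are assumptions made for this example only.

```python
# Customer attributes copied from the table above.
customers = {
    "Id1": {"Gender": "Female", "Age": 31, "AvgValue": 302, "Frequency": "High"},
    "Id2": {"Gender": "Male", "Age": 22, "AvgValue": 195, "Frequency": "Low"},
    "Id3": {"Gender": "Female", "Age": 29, "AvgValue": 152, "Frequency": "Mid"},
    "Id4": {"Gender": "Female", "Age": 47, "AvgValue": 412, "Frequency": "Low"},
    "Id5": {"Gender": "Male", "Age": 39, "AvgValue": 353, "Frequency": "Mid"},
}

# Column order of the encoded feature vector.
# Assumed reference levels: Gender = Male, Frequency = Low.
FEATURE_NAMES = ["is_female", "age", "avg_value", "freq_mid", "freq_high"]


def encode(row):
    """Turn one customer record into a list of numbers (dummy coding)."""
    return [
        1.0 if row["Gender"] == "Female" else 0.0,
        float(row["Age"]),
        float(row["AvgValue"]),
        1.0 if row["Frequency"] == "Mid" else 0.0,
        1.0 if row["Frequency"] == "High" else 0.0,
    ]


features = {customer: encode(row) for customer, row in customers.items()}

for customer, vector in features.items():
    print(customer, dict(zip(FEATURE_NAMES, vector)))
```

Treating Frequency as three unordered levels is only one possibility; an ordinal encoding (Low < Mid < High) would be another reasonable choice.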


Once you’ve got the data, you must decide how to go about predicting the churn risk for your customers. Churn prediction is a classification problem, and as such statistical methods (like logistic regression and survival regression) and classification algorithms (like decision trees, random forests and k-NN) are all under consideration. You must also decide whether to predict with a single model or with an ensemble of models.


In a recent project that we did for a client we opted for a single-model logistic regression approach. Our main reasons for doing this were:

  • We wanted an estimated model that could be inserted into Tableau for daily updated churn probabilities on all of the company’s customers
  • Model coefficients are interpretable and comparable, which is good when communicating results to non-data scientists
  • Insights from the variable selection process can be shared with the wider organization – knowing which phenomena are not related to churn can sometimes be as valuable as knowing which phenomena are related to churn
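
To make the first two points concrete: once a logistic regression is estimated, scoring a customer is just a closed-form formula, p = 1 / (1 + exp(-(b0 + b1·x1 + … + bk·xk))), which is why it is easy to reproduce as a calculated field in a reporting tool. The sketch below shows the formula in Python. The coefficient values are invented purely for illustration and are not the ones from the client project, and the feature names are assumptions for this example.

```python
import math

# Hypothetical coefficients, invented for illustration only.
INTERCEPT = -1.0
COEFFICIENTS = {
    "is_female": -0.4,
    "age": -0.02,
    "avg_value": -0.001,
    "freq_mid": 0.3,
    "freq_high": -0.8,
}


def churn_probability(features, intercept=INTERCEPT, coefficients=COEFFICIENTS):
    """Apply the logistic formula to one customer's feature values."""
    score = intercept + sum(coefficients[name] * value
                            for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def odds_ratios(coefficients=COEFFICIENTS):
    """exp(coefficient): the factor by which the odds of churn change
    when the feature increases by one unit, all else being equal."""
    return {name: math.exp(beta) for name, beta in coefficients.items()}


# Customer Id1 from the example table, dummy coded.
id1 = {"is_female": 1.0, "age": 31.0, "avg_value": 302.0,
       "freq_mid": 0.0, "freq_high": 1.0}

print("Churn probability for Id1:", round(churn_probability(id1), 3))
print("Odds ratios:", {k: round(v, 3) for k, v in odds_ratios().items()})
```

An odds ratio below 1 means the feature is associated with lower odds of churning and a value above 1 with higher odds, which tends to be easier to explain to non-data scientists than the raw coefficients.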


Now, when predicting churn there is no right or wrong choice of methodology. Sometimes some methodologies are less viable than others, but more often it’s the strengths of one methodology in combination with the restrictions imposed on you by the project that make the choice for you. No right, no wrong, just picking the tool best suited for the job at hand.
