Modelling churn, part 1 – Setting up the analytics project

Categories: Churn Modelling | Posted by Marcus on February 12, 2018

This is the first blogpost in a series about predicting churn. In this post we’re going to talk about analytics methodology. But first, let’s start with a simple definition:

Churn – a customer stops being a customer

This includes:

A customer cancels their subscription service
A customer with money invested in your services either loses all value or withdraws the money
A recurring customer stops coming back to you for repeat purchases

To model this, you first need data on your customers, some of whom are still with you and some of whom have churned. In the table below, for example, Id1 and Id4 are still customers in April, but Id2, Id3 and Id5 are not:

	Jan	Feb	Mar	Apr
Id1	1	1	1	1
Id2	1	1	0	0
Id3	1	1	1	0
Id4	1	1	1	1
Id5	1	0	0	0
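
As a rough illustration, the activity table can be turned into a churn label with a few lines of Python. This is only a sketch: the rule "inactive in the last observed month means churned" is an assumption made for the example, and the names churn_label and last_active_month are hypothetical helpers, not part of any library.

```python
# Monthly activity flags copied from the table above (1 = active, 0 = not active).
months = ["Jan", "Feb", "Mar", "Apr"]
activity = {
    "Id1": [1, 1, 1, 1],
    "Id2": [1, 1, 0, 0],
    "Id3": [1, 1, 1, 0],
    "Id4": [1, 1, 1, 1],
    "Id5": [1, 0, 0, 0],
}


def churn_label(flags):
    """Return 1 if the customer is inactive in the last observed month, else 0.

    Assumption: "churned" means not active in the final month of the window.
    """
    return 0 if flags[-1] == 1 else 1


def last_active_month(flags, month_names):
    """Return the name of the latest month with activity, or None if never active."""
    for name, flag in zip(reversed(month_names), reversed(flags)):
        if flag == 1:
            return name
    return None


churned = {cid: churn_label(flags) for cid, flags in activity.items()}
last_seen = {cid: last_active_month(flags, months) for cid, flags in activity.items()}

for cid in activity:
    status = "churned" if churned[cid] else "active"
    print(cid, status, "last active:", last_seen[cid])
```

The label in churned is what a model would try to predict, while last_seen hints at when each customer left, which matters if you later want to model time to churn rather than a simple yes or no.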

Second, you need data to use for predicting who is going to churn. What data you have and how you use it is very important, and will be covered in more detail in a separate blogpost. For the time being we will make do with the example table below, which contains both categorical and numerical data.

	Gender	Age	AvgValue	Frequency
Id1	Female	31	302	High
Id2	Male	22	195	Low
Id3	Female	29	152	Mid
Id4	Female	47	412	Low
Id5	Male	39	353	Mid
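
Most modelling methods need numbers rather than text, so the categorical columns have to be encoded before use. The sketch below shows one possible encoding of the example table. Treating Frequency as an ordered scale (Low < Mid < High) and using Male as the baseline for Gender are assumptions made for illustration, and the feature names are invented here.

```python
# Feature rows copied from the example table above.
customers = {
    "Id1": {"Gender": "Female", "Age": 31, "AvgValue": 302, "Frequency": "High"},
    "Id2": {"Gender": "Male", "Age": 22, "AvgValue": 195, "Frequency": "Low"},
    "Id3": {"Gender": "Female", "Age": 29, "AvgValue": 152, "Frequency": "Mid"},
    "Id4": {"Gender": "Female", "Age": 47, "AvgValue": 412, "Frequency": "Low"},
    "Id5": {"Gender": "Male", "Age": 39, "AvgValue": 353, "Frequency": "Mid"},
}

# Assumption: Frequency is ordinal, so it is mapped to ordered integers.
FREQUENCY_ORDER = {"Low": 0, "Mid": 1, "High": 2}

# Hypothetical names for the encoded columns, in the order produced by encode().
feature_names = ["is_female", "age", "avg_value", "frequency_level"]


def encode(row):
    """Turn one customer record into a list of numbers."""
    return [
        1.0 if row["Gender"] == "Female" else 0.0,  # dummy variable, Male is the baseline
        float(row["Age"]),
        float(row["AvgValue"]),
        float(FREQUENCY_ORDER[row["Frequency"]]),
    ]


X = {cid: encode(row) for cid, row in customers.items()}

for cid, values in X.items():
    print(cid, dict(zip(feature_names, values)))
```

If the ordering of Frequency were not meaningful, one dummy variable per level (minus a baseline) would be the safer choice.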

Once you’ve got the data, you must decide how to go about predicting the churn risk for your customers. Churn prediction is a classification problem, so statistical methods (like logistic regression and survival regression) and classification algorithms (like decision trees, random forests and k-NN) are all candidates. You must also decide whether to use a single model or a whole ensemble when predicting.

In a recent project that we did for a client we opted for a single-model logistic regression approach. Our main reasons for doing this were:

We wanted an estimated model that could be inserted into Tableau for daily updated churn probabilities on all of the company’s customers
Model coefficients are interpretable and comparable, which is good when communicating results to non-data scientists
Insights from the variable selection process can be shared with the wider organization: knowing which phenomena are not related to churn can sometimes prove as valuable as knowing which ones are
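
To make the first two points a bit more concrete: an estimated logistic regression is just an intercept plus one coefficient per variable, so scoring a customer is plain arithmetic that can be reproduced in a reporting tool. The sketch below uses made-up coefficients and hypothetical feature names (is_female, age, avg_value, frequency_level). None of these values come from the client project, and they were not fitted to any data.

```python
import math

# HYPOTHETICAL coefficients, invented purely for illustration (not fitted to data).
INTERCEPT = 1.0
COEFFICIENTS = {
    "is_female": -0.4,
    "age": -0.02,
    "avg_value": -0.003,
    "frequency_level": -0.8,
}


def churn_probability(features):
    """Apply the logistic function to the linear predictor.

    p = 1 / (1 + exp(-(intercept + sum(coefficient * value))))
    """
    z = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratios(coefficients):
    """exp(coefficient) is the multiplicative change in the odds of churn
    for a one-unit increase in that variable, holding the others fixed."""
    return {name: math.exp(beta) for name, beta in coefficients.items()}


# Id2 from the example table: Male, 22 years old, AvgValue 195, Low frequency.
id2 = {"is_female": 0, "age": 22, "avg_value": 195, "frequency_level": 0}
p = churn_probability(id2)
print(f"Estimated churn probability for Id2: {p:.3f}")

for name, ratio in odds_ratios(COEFFICIENTS).items():
    print(f"{name}: odds ratio {ratio:.3f}")
```

An odds ratio below 1 means the variable is associated with lower odds of churn and one above 1 with higher odds, which is the kind of statement that tends to be easy to communicate to non-data scientists.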

Now, when predicting churn there is no right or wrong choice of methodology. Sometimes some methodologies are less viable than others, but more often it’s the strengths of one methodology, in combination with the restrictions imposed on you by the project, that make the choice for you. No right, no wrong, just picking the tool best suited for the job at hand.
