Churn prediction is one of the best-known applications of Machine Learning and Big Data.
Data-based prediction technologies have been simplified so much that they are now within reach not only of big companies but of companies of any size. Tools such as those developed by BigML are bringing Machine Learning closer to companies, but it should not be forgotten that the raw material for any predictive system is data. In fact, the phase of collecting and preparing the data, before any algorithm comes into play, typically takes between 80% and 90% of total project time. In any case, let’s see how predictions can be made with carefully prepared example data.
Predicting in 3 Steps
In the previous article we made a short introduction to Machine Learning. In this one, we will look at a practical step-by-step example: how to exploit the data we have in the company to make decisions, in this case to prevent customers from churning (and probably moving to the competition).
Our fictitious company offers telephony services and the question we want to answer is: “Is this customer going to churn in the next X months?” The possible answers (our prediction) are two: yes or no.
The 3 steps we will go through are:
- Collect a set of historical data.
- Create a model with the data (we will train an algorithm).
- Make predictions.
Step 1: Data collection
The selection and preparation of the data used to train the system is one of the most important tasks in the process. As we’ll see in the rest of the article, predictions are so easy to make with BigML that we may be tempted to think that the more data we have, the better the predictions will be. But no, it is not worth keeping all the data: we need quality, well-structured data. If we do not select it correctly, we will simply introduce noise into the system, producing predictions of little or no value. Although during training the system itself can detect and discard data it considers superfluous, it is important to select the data that makes sense for the question we are asking. In the case covered in this article, this task is already done, but if you want more information, check out this blog post, which explains in more detail why it is important to clean, select and transform the data.
In our example, we are going to characterize each customer (or subscriber) with some key data from their profile, such as their seniority or the number of calls they have made. The minimum piece of information used to define the subscriber profile is called a feature. The set of subscribers, together with their features, forms the database (a CSV file) that we will use to make the predictions.
To predict churn in a telephony service we can divide the characteristics into 4 groups:
- Customer characteristics: basic user information (e.g. age, sex, city of residence…).
- Support features: data on the user’s interaction with the customer service (number of calls, questions asked, assessment of their satisfaction…).
- Characteristics of use: use made by the system subscriber (number of interactions with the service, contracted plans, monthly expenses…).
- Additional or contextual features: other types of information useful for prediction (e.g. customer seniority).
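As an illustration of these four groups, a subscriber record might look like the following Python dictionary. The field names here are invented for the sketch; the real CSV file has its own columns:

```python
# A toy subscriber record, grouped by the four feature types above.
# All field names and values are illustrative, not from the real dataset.
subscriber = {
    # Customer characteristics
    "age": 34,
    "city": "Valencia",
    # Support features
    "customer_service_calls": 2,
    "satisfaction": 4,
    # Characteristics of use
    "day_minutes": 180.5,
    "monthly_charge": 21.9,
    # Contextual features
    "account_length_months": 26,
}

print(len(subscriber), "features for this subscriber")
```

In the CSV file, each such record becomes one row, and each feature becomes one column.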
Let’s move on to the concrete example. We will base this article’s exercise on this CSV file to predict customer loss in a fictitious telephone company. The characteristics don’t exactly match what we would like to see in a real case, but they are enough to walk through the prediction process.
We will use a CSV file with information about 3,333 telecom subscribers. It looks like this:
Each row corresponds to one subscriber, with their characteristics, and a final column indicating whether or not that subscriber left the service (the “churn” column). We have divided the original file into two: one with 80% of the data and another with 20%. To train the system (create a model) we will use the 80% file. To verify whether the model makes good predictions we will use the remaining 20%.
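The 80/20 split itself is straightforward: shuffle the rows and cut the list at 80%. As a sketch, this is how it could be done in plain Python over a synthetic in-memory stand-in for the CSV rows (not the real dataset):

```python
import random

# Synthetic stand-in for the subscriber rows: an id, one feature,
# and the "churn" label in the last column.
rows = [[f"sub{i}", i % 5, "yes" if i % 4 == 0 else "no"] for i in range(100)]

random.seed(42)                       # reproducible shuffle
random.shuffle(rows)
cut = int(len(rows) * 0.8)
train, test = rows[:cut], rows[cut:]  # 80% to train, 20% to verify

print(len(train), len(test))          # 80 20
```

Shuffling before cutting matters: if the file happens to be sorted (by state, by date…), a plain head/tail split would give the model a biased view of the subscribers.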
Let’s get to work. Now it’s time to upload the data to the system. If you haven’t already, create an account in BigML (it’s free). On the control panel (“Dashboard”), click on the folder icon and select the file with 80% of the data on your computer, or simply drag it from the desktop to the workspace (drag & drop).
The file will appear in the “Sources” list:
By clicking on the file name you will be able to see a sample of the data you have uploaded (up to 25 instances). Notice that the rows have been converted into columns and the columns into rows:
BigML has detected the data type of each feature. In this case we only have text and numerical data, represented by “ABC” and “123” respectively.
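The idea behind this detection can be sketched roughly: a column whose values all parse as numbers is numeric (“123”), otherwise it is text (“ABC”). This is only an approximation of the concept, not BigML’s actual logic:

```python
# Rough sketch of per-column type detection: try to parse every value
# as a number; if any value fails, the column is treated as text.
def field_type(values):
    try:
        [float(v) for v in values]
        return "123"  # numeric
    except ValueError:
        return "ABC"  # text

print(field_type(["128", "107", "137"]))  # 123
print(field_type(["KS", "OH", "NJ"]))     # ABC
```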
Once the data has been uploaded we will create a Dataset, i.e. transform the CSV into a format that BigML can process and in which we will be able to make a preliminary analysis of the data.
By clicking on the cloud icon with the lightning bolt we create the Dataset (“1-CLICK DATASET”):
The following window will appear automatically:
Here we can do a preliminary analysis of the data. The histograms on the right side are used to analyze the variation and distribution of each characteristic. We won’t go into detail, but it is interesting to scroll down through the figures to see their properties. It is also worth looking at the last row: BigML automatically assigns the last column of the file as the target, although this can be changed:
Notice also that the first row, “State”, is flagged with an exclamation mark and the legend “This field is not preferred”. The system has detected that this field is not significant for predicting churn; as the histogram shows, its values can be considered random. It is a field we thought would be useful for the prediction, but BigML rules it out because it adds no value and could introduce noise into the predictions (we could still use it if we believe BigML’s assessment is wrong).
Step 2: Create a model (“train” the system)
Now that the Dataset has been created, we will create and train the model. In this step BigML will detect the behavior patterns that lead subscribers to churn. In the Dataset view, click again on the cloud icon with the lightning bolt and this time choose “1-CLICK MODEL”:
The pattern tree will appear on your screen as follows:
A pattern tree represents a model in which each node is associated with a question about the value of a feature, the branches represent the possible answers, and the leaves hold the output values. The first question is at the top node. As you go down through the nodes of the tree, more questions are answered. The value of the last node gives us the model’s prediction. Each node is also associated with a confidence percentage.
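This traversal can be sketched with a toy tree in Python. The fields, thresholds and confidences below are invented for illustration; they are not actual BigML output:

```python
# A toy pattern tree. Internal nodes ask a question about one feature;
# leaves hold the predicted class and a confidence, as in BigML's tree view.
tree = {
    "field": "day_minutes", "value": 254,
    "yes": {"prediction": "True", "confidence": 0.78},
    "no": {
        "field": "customer_service_calls", "value": 3,
        "yes": {"prediction": "True", "confidence": 0.65},
        "no": {"prediction": "False", "confidence": 0.92},
    },
}

def predict(node, subscriber):
    # Descend from the root, answering one question per node, until a leaf.
    while "prediction" not in node:
        branch = "yes" if subscriber[node["field"]] > node["value"] else "no"
        node = node[branch]
    return node["prediction"], node["confidence"]

print(predict(tree, {"day_minutes": 120, "customer_service_calls": 5}))
# ('True', 0.65)
```

Reading a path from root to leaf gives a human-readable rule, e.g. “if day_minutes ≤ 254 and customer_service_calls > 3, the subscriber churns (confidence 65%)”.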
At this point we have the model trained with historical data: the system has detected the patterns, together with the confidence of each one. Let’s make predictions now.
Step 3: Make predictions
There are several ways to make predictions. In this article we will use one of the simplest: an individual prediction, for a single subscriber, with characteristics that we define.
To start, in the model view click on the cloud icon with the lightning bolt and then on “PREDICT”:
A screen appears automatically to set a value for each characteristic:
What values have you set? What does the model predict for this subscriber?
A one-by-one prediction is not practical in many scenarios. For “massive” predictions we can use an input file with the data of all the users we want to predict for. These are made from the “BATCH PREDICTION” option that you may have seen the last time you clicked on the cloud icon with the lightning bolt.
This option, besides making massive predictions, is also used to verify whether the model is working properly. Do you remember the file with 20% of the data? It’s time to use it: upload it, create a Dataset from it and run a Batch Prediction.
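What this verification computes boils down to comparing each predicted “churn” value with the real one in the held-out 20% file. As a minimal sketch with invented labels (not results from the real dataset):

```python
# Invented actual labels from the held-out file vs. the model's predictions.
actual    = ["no", "no", "yes", "no", "yes", "no", "no", "yes", "no", "no"]
predicted = ["no", "no", "yes", "no", "no",  "no", "yes", "yes", "no", "no"]

hits = sum(a == p for a, p in zip(actual, predicted))
accuracy = hits / len(actual)
print(f"{hits}/{len(actual)} correct -> accuracy = {accuracy:.0%}")
# 8/10 correct -> accuracy = 80%
```

Because the 20% file was never used in training, this accuracy gives an honest estimate of how the model will behave on subscribers it has never seen.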
Have you done it already? Is the model we have created making reliable predictions? What percentage of the predictions were accurate? Note that this model can be improved, but we will explain how in future articles. Here is a hint if you want to try it yourself: “Ensembles”.
Machine Learning is getting closer to companies. But it would be a mistake to think that all we need to make predictions is a service like BigML. It should not be forgotten that data must be collected, cleaned, transformed… The quality of a prediction depends much more on the data, its structure and how we treat it than on the algorithm we use. On the other hand, just as we have seen how to predict churn, why not predict which plan is most appropriate for each client? The data is there. It’s time to start exploiting it to help your business evolve. There is no longer any need to install dedicated infrastructure with high implementation and management costs. Are you up for it?
Original article (in Spanish): “Machine Learning: predicciones basadas en datos con BigML”
Translation: Sergio Paul Ramos Moreno