Basic Machine Learning Concepts

by Andrés González

Machine Learning technologies are making the leap from academia and gaining ground in business. Nowadays anyone can use them to put their data to work and gain competitive advantages that, until recently, were available only to large companies and institutions.

We have compiled some basic ideas and concepts of Machine Learning to help those who have just landed in this exciting world understand it.

Supervised and unsupervised machine learning

Machine Learning is divided into two main areas: supervised learning and unsupervised learning. Although it may seem that the first refers to prediction with human intervention and the second to prediction without it, the two concepts have more to do with what we want to do with the data.

One of the most widespread uses of supervised learning is to make predictions about the future based on behaviors or characteristics observed in data that is already stored (historical data). Supervised learning makes it possible to search for patterns in historical data by relating all fields to a special field called “the target field”. For example, users label e-mails as “spam” or “legitimate”. The prediction process begins by analyzing which characteristics or patterns are shared by the e-mails already marked with each label. It may be determined, for example, that a spam e-mail is one that comes from certain IP addresses, and also has a certain ratio of text to images, and also contains certain words, and has no one in the “To:” field, and also (many more “and also”) … This would be just one of the patterns. Once all patterns have been determined (this phase is called “learning”), new e-mails that have never been marked as spam or legitimate are compared with the patterns and classified as “spam” or “legitimate” based on their characteristics.
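
The learning and classification phases described above can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions: the four e-mails are made up, the only "patterns" considered are word counts per label, and the names `learn` and `classify` are hypothetical rather than part of any library.

```python
from collections import Counter

# Made-up historical data: each e-mail text comes with the label
# (the target field) that a user assigned to it.
history = [
    ("win free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("lunch on monday", "legitimate"),
]

def learn(labeled_emails):
    """Learning phase: count how often each word appears under each label."""
    counts = {}
    for text, label in labeled_emails:
        counts.setdefault(label, Counter()).update(text.split())
    return counts

def classify(counts, text):
    """Score a new e-mail against each label's word counts; highest score wins."""
    scores = {
        label: sum(words[w] for w in text.split())
        for label, words in counts.items()
    }
    return max(scores, key=scores.get)

patterns = learn(history)
print(classify(patterns, "free prize inside"))   # words seen mostly in spam
print(classify(patterns, "agenda for lunch"))    # words seen mostly in legitimate mail
```

A real spam filter would use many more fields (sender IP, text/image ratio, recipients) and a far more robust learning method; the point here is only the two phases, learning from labeled data and then classifying unseen data.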

On the other hand, unsupervised learning uses historical data that has no target field. The aim is to explore the data and find some structure in it or a way to organize it. For example, it is often used to group customers with similar characteristics or behaviors in order to run highly segmented marketing campaigns.
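
As a rough sketch of grouping without a target field, the following Python snippet splits customers into two groups using only their monthly spend. The figures are made up, the function name `two_groups` is hypothetical, and the method is a deliberately tiny one-dimensional version of k-means clustering, which is just one of many possible grouping techniques.

```python
def two_groups(values, rounds=10):
    """Split numbers into two groups around two centers (a tiny 1-D k-means).

    Assumes the list contains at least two distinct values.
    """
    low, high = min(values), max(values)  # initial centers
    for _ in range(rounds):
        # Assign every value to its nearest center.
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each center to the average of its group.
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return near_low, near_high

# Made-up monthly spend per customer; note that there is no label column.
monthly_spend = [12, 15, 11, 95, 102, 14, 98]
small, large = two_groups(monthly_spend)
print(small)
print(large)
```

Nobody told the program which customers are "small" or "large" spenders; the two groups emerge from the structure of the data itself.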

Classification and regression

These are concepts of supervised machine learning. A classification system predicts a category, while a regression system predicts a number.

One example of classification is the spam case mentioned earlier: e-mails are categorized as “spam” or “legitimate”. Another classic example of classification in the machine learning world is churn prediction, for instance in a telecom company. The objective is to detect the behavioral patterns of customers so that they can be used to predict whether a customer is going to leave for the competition. In this case, customers are classified as “churn” or “no churn”.

Regression, on the other hand, predicts a number, such as the price an item is going to have, or the number of reservations that will be made in a hotel in May.
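
To make the contrast concrete, here is a minimal regression sketch in Python: it fits a straight line to past data by ordinary least squares and uses it to predict a number. The reservation figures are invented and perfectly linear for clarity, and the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Made-up history: month number and hotel reservations in that month.
months = [1, 2, 3, 4]
reservations = [30, 50, 70, 90]

slope, intercept = fit_line(months, reservations)
predicted_may = slope * 5 + intercept  # the output is a number, not a category
print(predicted_may)
```

A classifier would answer with a label such as “churn”; a regression model like this one answers with a quantity.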

Data mining

It is not uncommon to see the terms data mining and machine learning used interchangeably. These are strongly related concepts. From our point of view, the main difference lies in the objective of each discipline: data mining uncovers previously unknown patterns, while machine learning is used to reproduce known patterns and make predictions based on them.

In short, it could be said that data mining has an exploratory function while machine learning focuses on prediction.

Learning, training

This is the process in which the patterns in a dataset are detected; it is the heart of machine learning. Once the patterns are identified, predictions can be made for new data entered into the system.

For example, historical data from book purchases on a website can be used to analyze how customers behave when buying (titles visited, categories, purchase history…), group them into behavioral patterns, and make purchase recommendations to new customers who follow known or learned patterns.


Dataset

It is the raw material of the prediction system: the historical data used to train the system that detects patterns. The dataset is composed of instances, and the instances are composed of factors, characteristics or properties.

Instance, sample, record

An instance is each of the data items available for analysis. If you want to predict the behavior of telephony service customers, each instance would correspond to a subscriber. Each instance, in turn, is composed of features that describe it, such as how long the customer has been with the company, the money spent daily on calls, etc. In a spreadsheet, the instances would be the rows and the features the columns.

Feature, attribute, property, field

These are the attributes that describe each of the instances in the dataset. The names are used interchangeably depending on the author and context. In the case of a customer portfolio, we would be talking about each customer's number of purchases, age, whether he or she follows the company on social networks, whether he or she has subscribed to the newsletter, which products he or she has bought… In a spreadsheet, they would be the columns.


Target

It is the attribute or factor that we want to predict, the objective of the prediction, such as the probability of readmission of a patient after surgery.
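
The relationship between dataset, instances, features and the field to be predicted can be illustrated with plain Python structures. The customers and field names below are invented for the example; each dictionary plays the role of one spreadsheet row.

```python
# A made-up dataset: one dictionary per instance (row), one key per field (column).
dataset = [
    {"age": 34, "purchases": 12, "newsletter": True,  "churn": "no churn"},
    {"age": 51, "purchases": 1,  "newsletter": False, "churn": "churn"},
    {"age": 27, "purchases": 7,  "newsletter": True,  "churn": "no churn"},
]

target_field = "churn"  # the field we want to predict

# Every other field is a feature that describes the instance.
feature_names = [name for name in dataset[0] if name != target_field]

# Separate what the system learns from (features) and what it learns to predict.
features = [[row[name] for name in feature_names] for row in dataset]
targets = [row[target_field] for row in dataset]

print(feature_names)
print(features[0], "->", targets[0])
```

Most machine learning tools expect the data in roughly this shape: a table of feature values and, for supervised learning, one extra column holding the value to predict.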

Feature Engineering

This is the process that precedes the creation of the prediction model, in which the data fields are analyzed, cleaned and structured. It is one of the most important and costly steps of the prediction process. The objective is to eliminate the fields that do not help to make the prediction and to organize the remaining ones properly, so that the model does not receive useless information that could lead to predictions of poor quality or low confidence.

In short, it is the process that separates the signal from the noise.
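
As a small illustration of removing fields that cannot help a prediction, the Python sketch below drops fields that are constant or mostly empty. The data, the field names, the `select_fields` helper and the 50% missing-value threshold are all assumptions chosen for the example; real feature engineering involves much more analysis and domain knowledge.

```python
def select_fields(rows, target, max_missing=0.5):
    """Return the feature names worth keeping, using two simple rules."""
    kept = []
    for name in rows[0]:
        if name == target:
            continue
        values = [row[name] for row in rows]
        present = [v for v in values if v is not None]
        # Rule 1: a field that is mostly empty carries little information.
        mostly_missing = len(present) / len(values) < (1 - max_missing)
        # Rule 2: a field with the same value everywhere cannot distinguish instances.
        constant = len(set(present)) <= 1
        if not mostly_missing and not constant:
            kept.append(name)
    return kept

# Made-up customer records; "churn" is the field to predict.
rows = [
    {"age": 34, "country": "ES", "fax": None,  "purchases": 12, "churn": "no churn"},
    {"age": 51, "country": "ES", "fax": None,  "purchases": 1,  "churn": "churn"},
    {"age": 27, "country": "ES", "fax": None,  "purchases": 7,  "churn": "no churn"},
    {"age": 45, "country": "ES", "fax": "555", "purchases": 2,  "churn": "churn"},
]

print(select_fields(rows, target="churn"))
```

Here `country` is dropped because it never varies and `fax` because it is almost always empty, leaving only the fields that could carry some signal.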


Model

After training the system (that is, after detecting patterns in the data), a model is created to make predictions. A model can be thought of as a filter: new data goes in, and the output is the classification of that data according to the patterns detected in training. For example, if a model is trained with historical data to detect the risk of credit card cancellation, it will classify new customers based on their behavior to predict whether they will cancel.
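
The idea of a model as something produced by training and then applied to new data can be sketched as a Python function that returns another function. Everything here is an assumption made for illustration: the single feature (monthly card uses), the invented history, and the very simple learning rule that just looks for the cut-off with the fewest mistakes.

```python
def train(history):
    """history: list of (monthly_card_uses, cancelled) pairs. Returns a model."""
    best_threshold, best_errors = None, len(history) + 1
    # Try each observed value as a cut-off and keep the one with the fewest mistakes.
    for threshold, _ in history:
        errors = sum((uses <= threshold) != cancelled for uses, cancelled in history)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors

    def model(uses):
        """The trained model: new data in, classification out."""
        return "cancel" if uses <= best_threshold else "keep"

    return model

# Made-up historical data: customers who rarely used the card ended up cancelling.
history = [(1, True), (2, True), (3, True), (10, False), (14, False), (20, False)]

model = train(history)
print(model(2))    # a new customer who barely uses the card
print(model(15))   # a new customer who uses it often
```

Training happens once, on historical data; afterwards the returned `model` can be applied to any number of new customers without looking at the history again.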

Decision tree

It is the skeleton of the prediction model, usually represented graphically as a tree in which the branches are the patterns recognized in the learning process. The prediction for each pattern is placed on the leaf at the end of its branch.
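
A decision tree can be pictured as nested data: each inner node asks a question about one field, and each leaf holds a prediction. The tree below is written by hand purely to show the structure (it was not learned from data), and its fields, thresholds and the `predict` helper are hypothetical.

```python
# Each inner node tests one field against a threshold; each leaf holds a prediction.
tree = {
    "field": "monthly_card_uses", "threshold": 3,
    "at_most": {"predict": "cancel"},
    "more_than": {
        "field": "complaints", "threshold": 2,
        "at_most": {"predict": "keep"},
        "more_than": {"predict": "cancel"},
    },
}

def predict(node, instance):
    """Follow the branches that match the instance until a leaf is reached."""
    while "predict" not in node:
        if instance[node["field"]] <= node["threshold"]:
            node = node["at_most"]
        else:
            node = node["more_than"]
    return node["predict"]

print(predict(tree, {"monthly_card_uses": 1, "complaints": 0}))
print(predict(tree, {"monthly_card_uses": 12, "complaints": 0}))
print(predict(tree, {"monthly_card_uses": 12, "complaints": 5}))
```

Each path from the root to a leaf corresponds to one pattern of the form "this and also that", which is why trees are easy to read and explain.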


Confidence

It is the probability of success that the system calculates for each prediction.
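
One simple way to estimate such a probability, shown here only as an assumption and not as the method any particular system uses, is to look at the training instances that matched the same pattern and take the share that had the predicted label. The helper name and the labels below are made up.

```python
from collections import Counter

def predict_with_confidence(labels_matching_pattern):
    """Predict the most common label and report its share as the confidence."""
    counts = Counter(labels_matching_pattern)
    label, hits = counts.most_common(1)[0]
    return label, hits / len(labels_matching_pattern)

# Made-up labels of the historical e-mails that matched one pattern.
matched = ["spam", "spam", "spam", "legitimate"]

prediction, confidence = predict_with_confidence(matched)
print(prediction, confidence)
```

With three of the four matching e-mails labeled as spam, the prediction is “spam” with an estimated confidence of 0.75. Real systems usually correct this kind of estimate when a pattern is backed by only a few instances.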


Do you have any questions, or would you like to extend this list? Tell us and we will be glad to answer 🙂

Original article (in Spanish): “Conceptos básicos de Machine Learning”

Translation: Sergio Paul Ramos Moreno