Clustering is an unsupervised machine learning technique where the target is not known. Instead, groups are formed by placing similar observations in the same cluster while separating those that are clearly different. Clustering involves partitioning n observations into p clusters. For instance, in marketing analysis, an analyst has access to several measurements for statistically segmenting customer groups, such as age, income, sex and geographic location. Based on these parameters, one can perform market segmentation by recognizing distinct and perceptible subgroups of people who might be more receptive to a form of advertising or more likely to purchase a certain product. A segment is typically a cluster of customer observations used to make strategic decisions on how to up-sell and cross-sell based on user needs and wants.

Now, why do we need customer segmentation or clustering? The intent of clustering is fundamental to segmentation: grouping similar customers and products in a marketing activity. Companies cannot target each customer individually; instead, they partition customers based on their preferences and target individual clusters by positioning themselves in a unique segment. For instance, a firm might want to segregate customers based on their price sensitivity, product quality expectations and brand loyalty. The resulting variables are measured on a Likert scale, where a higher value signifies a greater inclination towards price sensitivity, product quality or brand loyalty and a lower value signifies a lower intensity.

Dell mainly uses customer segmentation in its marketing strategy alongside product segmentation, targeting several market segments and designing separate products or offers for each. On one hand, geographically, Dell has segmented the market into the US/Americas, EMEA and Asia Pacific-Japan, where each area has different pricing and marketing strategies. On the other hand, demographically, there is no age, gender or race bias, but income, occupation and education play a role in deciding the customer needs and hence the product offer.

Branding and segmentation help businesses create customer-driven marketing strategies, using insight from customer preferences to develop valuable customers. For instance, Dell, one of the world's largest computer systems companies, targets two classes of customers: relationship customers and transactional customers. Relationship customers are corporations, government and the education sector, which contribute a substantial portion of the profit. Transactional customers, on the other hand, are price-sensitive, looking for low-cost, reliable, quality service and added-value products.

Moreover, there are several approaches to partitioning observations into groups: hierarchical methods, partitioning methods (most notably k-means), and two-step clustering, which is largely a combination of the first two. An important decision in the application of cluster analysis is how many clusters should be derived from the data. There is always a trade-off between choosing many clusters, which allows numerous segments and subtle differences between them to be identified, and choosing as few clusters as possible to keep the segments easy to understand and actionable.

Hierarchical Methods

The method follows a typical tree-based approach to clustering. Clusters are formed based on similarity and dissimilarity measures, evaluated by calculating distances between pairs of objects: objects separated by shorter distances are clustered into the same group, while distant objects are considered dissimilar.

Agglomerative clustering is a bottom-up method in which clusters have sub-clusters. It starts by partitioning the data set into singleton nodes and merges the current pair of mutually nearest nodes step by step into a new node, until one final node is left, which comprises the entire data set. The underlying technique comprises various clustering schemes that differ in how the measure of inter-cluster dissimilarity is updated after each step. The seven most common methods are termed single, complete, average (UPGMA), weighted (WPGMA), Ward, centroid (UPGMC) and median (WPGMC) linkage.
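As an illustrative sketch (not part of the original article's code), agglomerative clustering with any of these linkage methods can be run with scipy on made-up toy data:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy data: two well-separated groups of 2-D points
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.5, (5, 2)),
                 rng.normal(5, 0.5, (5, 2))])

# Agglomerative clustering with Ward linkage; 'single', 'complete',
# 'average', 'weighted', 'centroid' and 'median' are the other options
Z = linkage(pts, method='ward')

# Cut the merge tree into two flat clusters
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)
```

Each row of Z records one merge step of the bottom-up procedure; cutting the tree at a chosen number of clusters yields the flat group labels.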

Divisive hierarchical clustering is a top-down method and is less commonly used. It works similarly to agglomerative clustering but in the opposite direction: it starts with a single cluster containing all objects and then successively splits the resulting clusters until only clusters of individual objects remain.

Distance Metrics

Euclidean distance- The most commonly used distance metric, Euclidean distance is computed as the square root of the sum of the squared differences between the coordinates of a pair of objects.

City block or Manhattan distance- Manhattan distance computes the sum of the absolute differences between the coordinates of a pair of objects.

Chebyshev distance- Chebyshev distance, also known as maximum value distance, is computed as the maximum of the absolute differences between the coordinates of a pair of objects. The metric is applied when observations are ordinal.

Minkowski Distance- A generalized metric with an order parameter p, which reduces to Manhattan distance at p=1 and Euclidean distance at p=2. This distance can be used for both ordinal and quantitative variables.
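These four metrics can be compared directly using scipy's distance functions; the two vectors below are arbitrary examples chosen for illustration:

```python
from scipy.spatial.distance import chebyshev, cityblock, euclidean, minkowski

a, b = [1, 2, 3], [4, 6, 8]  # coordinate differences: 3, 4, 5

# Euclidean: sqrt(3^2 + 4^2 + 5^2) = sqrt(50)
print(euclidean(a, b))       # ~7.071
# Manhattan / city block: |3| + |4| + |5|
print(cityblock(a, b))       # 12
# Chebyshev: max(|3|, |4|, |5|)
print(chebyshev(a, b))       # 5
# Minkowski with p=2 coincides with Euclidean; p=1 with Manhattan
print(minkowski(a, b, p=2))  # ~7.071
```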


K-means Clustering

Another important clustering procedure is the k-means partitioning method, one of the most powerful techniques for market research and entirely different from the algorithms discussed previously. The algorithm requires the computation of k centroids; every item is then assigned to the nearest centroid, and the process repeats iteratively until every observation is assigned to a cluster and the assignments no longer change.

An initial step is to choose k, the number of clusters, and the corresponding initial centroids.

Based on these parameters, characteristics within the elements of a cluster will be homogeneous while the differences between groups are maximized.

Two-Step Clustering

The method resolves the issue of analyzing mixed variables measured on different scale levels. The algorithm is based on a two-stage approach: in the first stage, it runs a pre-clustering procedure that builds a so-called cluster feature (CF) tree, whose leaves represent dense sub-clusters of the objects in the dataset. In the second stage, a modified hierarchical agglomerative clustering procedure combines these sub-clusters sequentially to form homogeneous clusters. The procedure can handle categorical and continuous variables simultaneously and can select the number of clusters using measures of fit such as Akaike's Information Criterion (AIC) or the Bayesian Information Criterion (BIC).

Furthermore, a good marketing strategy not only entails segmenting customer groups but also targeting and positioning them. Based on customer profiling, businesses bucketize different segments to make informed decisions about sales and marketing spend and increase ROI. Eventually, this helps businesses deliver enhanced customer service and boost customer satisfaction.
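The two-step procedure described above is associated with packages such as SPSS and is not directly available in scikit-learn, but scikit-learn's Birch estimator follows the same two-stage idea for continuous variables: a CF-tree pre-clustering stage followed by a global agglomerative step. A minimal sketch on made-up data (not the method or data from this article):

```python
import numpy as np
from sklearn.cluster import Birch

# Toy data: three groups of continuous 2-D measurements
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.3, (30, 2)) for loc in (0, 3, 6)])

# Stage 1: Birch builds a CF (cluster feature) tree of sub-clusters;
# Stage 2: an agglomerative step merges the CF leaves into n_clusters groups
model = Birch(threshold=0.5, n_clusters=3)
labels = model.fit_predict(X)
print(sorted(set(labels)))  # three final cluster labels, as requested
```

Note that Birch handles only continuous variables; the mixed categorical/continuous capability described above is specific to the two-step procedure itself.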

From Malvika Mathur

Clustering Algorithms For Customer Segmentation

From Sowmya Vivek

Context
In today’s competitive world, it is crucial to understand customer behavior and categorize customers based on their demographics and buying behavior. Customer segmentation allows marketers to better tailor their promotional, marketing and product-development strategies to various audience subsets.

Objective
This article demonstrates segmentation of a customer data set from an e-commerce site using k-means clustering in Python. The data set contains the annual income of ~300 customers and their annual spend on the site. We will use the k-means clustering algorithm to derive the optimum number of clusters and understand the underlying customer segments based on the data provided.

About the data set
The dataset consists of the annual income (in $000) of 303 customers and their total spend (in $000) on an e-commerce site over a period of one year. Let us explore the data using the numpy and pandas libraries in Python.

#Load the required packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#Plot styling
import seaborn as sns; sns.set()  # for plot styling
%matplotlib inline
plt.rcParams['figure.figsize'] = (16, 9)
plt.style.use('ggplot')
#Read the csv file
dataset=pd.read_csv('CLV.csv')
#Explore the dataset
dataset.head()  # top 5 rows
len(dataset)    # number of rows

#descriptive statistics of the dataset
dataset.describe().transpose()


The dataset consists of 303 rows. The mean annual income is $245,000 and the mean annual spend is $149,000. The distributions of annual income and annual spend are illustrated below with a distplot and a violinplot.

Visualizing the data
The distplot and violinplot give an indication of the distribution of Income and Spend.

#Visualizing the data - distplot
plot_income = sns.distplot(dataset["INCOME"])
plot_spend = sns.distplot(dataset["SPEND"])
plt.xlabel('Income / spend')
Distribution plot of Income & Spend
#Violin plot of Income and Spend
f, axes = plt.subplots(1,2, figsize=(12,6), sharex=True, sharey=True)
v1 = sns.violinplot(data=dataset, x='INCOME', color="skyblue",ax=axes[0])
v2 = sns.violinplot(data=dataset, x='SPEND',color="lightgreen", ax=axes[1])
v1.set(xlim=(0,420))

Clustering Fundamentals
Clustering is an unsupervised machine learning technique, where there are no defined dependent and independent variables. The patterns in the data are used to identify / group similar observations.

Illustration: the original dataset vs. the data after clustering

The objective of any clustering algorithm is to ensure that the distance between data points within a cluster is very low compared to the distance between two clusters. In other words, members of a group are very similar, and members of different groups are extremely dissimilar.

We will use k-means clustering to create customer segments based on their income and spend data.

K-Means Clustering
K-means clustering is an iterative clustering algorithm where the number of clusters K is predetermined and the algorithm iteratively assigns each data point to one of the K clusters based on the feature similarity.

Broad Steps of The K-means Algorithm

  1. Choose the number of clusters K and pick K initial centroids.
  2. Assign each data point to its nearest centroid.
  3. Recompute each centroid as the mean of the data points assigned to it.
  4. Repeat steps 2 and 3 until the cluster assignments no longer change.
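The broad steps of the algorithm can be sketched from scratch in a few lines of numpy. This is an illustrative toy implementation on hypothetical data, not the scikit-learn version used later in this article:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means sketch: random init, assign, update, repeat."""
    rng = np.random.default_rng(seed)
    # 1. Pick k random data points as the initial centroids
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # 2. Assign each point to its nearest centroid (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # 3. Move each centroid to the mean of its assigned points
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4. Stop when the centroids no longer move
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Toy example: two obvious groups of points
X = np.array([[1., 1.], [1.2, 0.8], [0.9, 1.1],
              [8., 8.], [8.1, 7.9], [7.9, 8.2]])
labels, centroids = kmeans(X, k=2)
print(labels)
```

A production implementation (such as scikit-learn's KMeans) additionally restarts from several initializations and uses smarter seeding such as k-means++, since the result depends on the initial centroids.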

The Mathematics of Clustering

The mathematics behind clustering, in very simple terms, involves minimizing the within-cluster sum of squares: the sum of squared distances between each cluster centroid and its associated data points:

WCSS = Σ (j = 1..K) Σ (i = 1..N_j) || x_ij - c_j ||^2

where

  • K = number of clusters
  • N_j = number of data points assigned to cluster j
  • c_j = centroid of cluster j
  • || x_ij - c_j || = distance between data point x_ij and the centroid to which it is assigned
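To make the objective concrete, the WCSS can be computed by hand and compared with scikit-learn's inertia_ attribute, which stores exactly this quantity after fitting. A minimal sketch on toy stand-in data (the income/spend matrix from CLV.csv is not reproduced here):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for the income/spend matrix
X = np.array([[1., 1.], [1.2, 0.8], [8., 8.], [8.1, 7.9]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# WCSS = sum over clusters of squared distances from points to their centroid
wcss = sum(np.sum((X[km.labels_ == j] - c) ** 2)
           for j, c in enumerate(km.cluster_centers_))
print(np.isclose(wcss, km.inertia_))  # inertia_ is exactly this sum
```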

Deciding on the optimum number of clusters ‘K’
The main input for k-means clustering is the number of clusters. This is derived using the concept of minimizing the within-cluster sum of squares (WCSS). A scree plot is created that plots the number of clusters on the x-axis and the WCSS for each cluster number on the y-axis.

Scree plot/ Elbow method to determine optimum number of clusters

As the number of clusters increases, the WCSS keeps decreasing. The decrease is initially steep, and then the rate of decrease slows down, resulting in an elbow-shaped plot. The number of clusters at the elbow usually gives an indication of the optimum number of clusters. This, combined with specific knowledge of the business requirement, should be used to decide on the number of clusters.

For our dataset, we will arrive at the optimum number of clusters using the elbow method:

#Using the elbow method to find the optimum number of clusters
from sklearn.cluster import KMeans
X = dataset.iloc[:, [0, 1]].values  #income and spend columns as a numpy array
wcss = []
for i in range(1,11):
    km=KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
    km.fit(X)
    wcss.append(km.inertia_)
plt.plot(range(1,11),wcss)
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('wcss')
plt.show()
Scree plot of given dataset on customer Income & Spend

Based on the elbow plot, we could choose 4, 5 or 6 clusters. Let us try k=4 and k=6 and visualize the resulting clusters to decide on the final number of clusters.

Fitting the k-means to the dataset with k=4

##Fitting kmeans to the dataset with k=4
km4=KMeans(n_clusters=4,init='k-means++', max_iter=300, n_init=10, random_state=0)
y_means = km4.fit_predict(X)
#Visualizing the clusters for k=4
plt.scatter(X[y_means==0,0],X[y_means==0,1],s=50, c='purple',label='Cluster1')
plt.scatter(X[y_means==1,0],X[y_means==1,1],s=50, c='blue',label='Cluster2')
plt.scatter(X[y_means==2,0],X[y_means==2,1],s=50, c='green',label='Cluster3')
plt.scatter(X[y_means==3,0],X[y_means==3,1],s=50, c='cyan',label='Cluster4')
plt.scatter(km4.cluster_centers_[:,0], km4.cluster_centers_[:,1],s=200,marker='s', c='red', alpha=0.7, label='Centroids')
plt.title('Customer segments')
plt.xlabel('Annual income of customer')
plt.ylabel('Annual spend from customer on site')
plt.legend()
plt.show()
Cluster plot : k=4

The plot shows the distribution of the 4 clusters. We could interpret them as the following customer segments:

  1. Cluster 1: Customers with medium annual income and low annual spend
  2. Cluster 2: Customers with high annual income and medium to high annual spend
  3. Cluster 3: Customers with low annual income
  4. Cluster 4: Customers with medium annual income but high annual spend

Cluster 4 straight away is one potential customer segment. However, Cluster 2 and 3 can be segmented further to arrive at a more specific target customer group. Let us now look at how the clusters are created when k=6:

##Fitting kmeans to the dataset - k=6
km6=KMeans(n_clusters=6, init='k-means++', max_iter=300, n_init=10, random_state=0)
y_means = km6.fit_predict(X)
#Visualizing the clusters
plt.scatter(X[y_means==0,0],X[y_means==0,1],s=50, c='purple',label='Cluster1')
plt.scatter(X[y_means==1,0],X[y_means==1,1],s=50, c='blue',label='Cluster2')
plt.scatter(X[y_means==2,0],X[y_means==2,1],s=50, c='green',label='Cluster3')
plt.scatter(X[y_means==3,0],X[y_means==3,1],s=50, c='cyan',label='Cluster4')
plt.scatter(X[y_means==4,0],X[y_means==4,1],s=50, c='magenta',label='Cluster5')
plt.scatter(X[y_means==5,0],X[y_means==5,1],s=50, c='orange',label='Cluster6')
plt.scatter(km6.cluster_centers_[:,0], km6.cluster_centers_[:,1],s=200,marker='s', c='red', alpha=0.7, label='Centroids')
plt.title('Customer segments')
plt.xlabel('Annual income of customer')
plt.ylabel('Annual spend from customer on site')
plt.legend()
plt.show()
Cluster plot : k=6

Setting the number of clusters to 6 seems to provide a more meaningful customer segmentation.

  1. Cluster 1: Medium income, low annual spend
  2. Cluster 2: Low income, low annual spend
  3. Cluster 3: High income, high annual spend
  4. Cluster 4: Low income, high annual spend
  5. Cluster 5: Medium income, low annual spend
  6. Cluster 6: Very high income, high annual spend

Thus it is evident that a six-cluster solution provides a more meaningful segmentation of the customers.

Marketing strategies for the customer segments
Based on the 6 clusters, we could formulate marketing strategies relevant to each cluster:

  • A typical strategy would focus certain promotional efforts for the high value customers of Cluster 6 & Cluster 3.
  • Cluster 4 is a unique customer segment: in spite of their relatively lower annual income, these customers tend to spend more on the site, indicating their loyalty. Discounted-pricing promotional campaigns could be run for this group so as to retain them.
  • For Cluster 2, where both income and annual spend are low, further analysis could be needed to find the reasons for the lower spend, and price-sensitive strategies could be introduced to increase spend from this segment.
  • Customers in clusters 1 and 5 are not spending enough on the site in spite of a good annual income. Further analysis of these segments could reveal the satisfaction or dissatisfaction of these customers, or lower visibility of the e-commerce site to them, and strategies could be evolved accordingly.

We have thus seen how we can arrive at meaningful insights and recommendations by using clustering algorithms to generate customer segments. For the sake of simplicity, the dataset used only two variables: income and spend. In a typical business scenario, there could be several more variables, which could generate much more realistic and business-specific insights.