Clustering is an unsupervised machine learning technique where the target is not known: there are no defined dependent and independent variables, and the patterns in the data are used to identify and group similar observations. The intent of clustering is fundamental to segmentation, i.e. grouping similar customers and products in a marketing activity. We will use k-means clustering to create customer segments based on their income and spend data.

**Clustering is an unsupervised machine learning
algorithm where the target is not known.** The target is estimated by
grouping similar observations into a single cluster while isolating
those that are entirely disparate. Clustering involves partitioning n
observations into p clusters. For instance, in marketing
analysis, an analyst has access to several measurements with which to
statistically segment customer groups, such as age, income, sex, geographic
location, etc. Based on the given parameters, one must perform market
segmentation by recognizing distinct and perceptible subgroups of people who
might be more receptive to a form of advertising or more likely to purchase a
certain product. A segment is typically a cluster of customer observations used to
make strategic decisions on how to up-sell and cross-sell based on
user needs and wants.

Now, why do we need customer segmentation or clustering? The intent of clustering is fundamental to segmentation: grouping similar customers and products in a marketing activity. Companies cannot target each customer individually; instead, they partition customers based on their preferences and target individual clusters by positioning themselves in a unique segment. For instance, a firm might want to segregate customers based on their price sensitivity, product quality and brand loyalty. The resulting variables can be measured on a Likert scale, where a higher value signifies a greater inclination towards price sensitivity, product quality or brand loyalty, and a lower value signifies a lower intensity.

Dell uses customer segmentation in its market strategy along with product segmentation, targeting several market segments and designing separate products or offers for each of them. On one hand, geographically, Dell has segmented the market into the US/Americas, EMEA and Asia Pacific-Japan, where each area has different pricing and marketing strategies. On the other hand, demographically, there is no age, gender or race bias, but income, occupation and education play a role in deciding the customer needs and hence the product offer.

Branding and segmentation have helped businesses create customer-driven market strategies, gaining insight from customer preferences to develop valuable customers. To give an instance, Dell, one of the world's largest computer systems companies, targets two classes of customers: the relationship customer and the transactional customer. Relationship-based customers are corporations, government, and the education sector, which contribute a substantial portion of the profit. Transactional customers, on the other hand, are price-sensitive and look for low-cost, reliable, quality service and added-value products.

Moreover, there are several approaches to partitioning observations into groups: hierarchical methods, partitioning methods (most notably, k-means), and two-step clustering, which is largely a combination of the first two. An important question in the application of cluster analysis is how many clusters should be derived from the data. There is always a trade-off between choosing many clusters, which allows numerous segments and subtle differences between them to be identified, and choosing as few clusters as possible, which makes them easier to understand and act upon.

**Hierarchical Methods**

The method follows a typical tree-based approach to cluster elements. The clustering is based on a similarity or dissimilarity measure, evaluated by calculating distances between a given pair of objects: objects with shorter distances between them are clustered into the same group, otherwise they are considered dissimilar.

**Agglomerative clustering** is a bottom-up method in which clusters have sub-clusters. It starts by partitioning the data set into singleton nodes and merges, step by step, the current pair of mutually nearest nodes into a new node, until only one final node is left, which comprises the entire data set. The underlying technique comprises various clustering schemes that differ in the way the measure of inter-cluster dissimilarity is updated after each step. The seven most common methods are termed single, complete, average (UPGMA), weighted (WPGMA), Ward, centroid (UPGMC) and median (WPGMC) linkage.
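As a quick sketch of how these linkage schemes are used in practice, scipy's `linkage` function accepts each of the seven method names; the tiny two-group dataset below is invented purely for illustration:

```python
# Agglomerative clustering with different linkage schemes, using scipy.
# The small 2-D dataset is synthetic, chosen only to illustrate the API.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # one tight group
              [8.0, 8.0], [8.3, 7.9], [7.8, 8.2]])  # another tight group

# Each method updates the inter-cluster dissimilarity differently after a merge.
for method in ["single", "complete", "average", "weighted",
               "ward", "centroid", "median"]:
    Z = linkage(X, method=method)                    # the merge tree (dendrogram data)
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
    print(method, labels)
```

On data this cleanly separated, all seven schemes recover the same two groups; on noisier data the choice of linkage can change the result considerably.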

**Divisive hierarchical clustering** is a top-down method and is less commonly used. It works in a similar way to agglomerative clustering but in the opposite direction: it starts with a single cluster containing all objects, and then successively splits the resulting clusters until only clusters of individual objects remain.

**Distance Metrics:**

- Euclidean distance: the most commonly used metric, it computes the square root of the sum of squared differences between the coordinates of a pair of objects.
- City block or Manhattan distance: computes the sum of absolute differences between the coordinates of a pair of objects.
- Chebyshev distance: also known as maximum value distance, it is computed as the maximum of the absolute differences between the coordinates of a pair of objects. This metric is applied when observations are ordinal.
- Minkowski distance: a generalization that can be used for both ordinal and quantitative variables; Manhattan (p=1) and Euclidean (p=2) distances are its special cases.
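All four metrics can be computed directly with `scipy.spatial.distance`; the pair of points below is an arbitrary example, not taken from the dataset used later:

```python
# Computing the four distance metrics above for one pair of points with scipy.
# The two points are arbitrary, chosen only to make the arithmetic easy to check.
import math
from scipy.spatial import distance

a, b = [1, 2, 3], [4, 6, 8]  # coordinate differences: 3, 4, 5

print(distance.euclidean(a, b))       # sqrt(3^2 + 4^2 + 5^2) = sqrt(50)
print(distance.cityblock(a, b))       # |3| + |4| + |5| = 12
print(distance.chebyshev(a, b))       # max(|3|, |4|, |5|) = 5
print(distance.minkowski(a, b, p=3))  # (3^3 + 4^3 + 5^3)^(1/3) = 6
```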


**K-means Clustering**

Another important clustering procedure is the K-means partitioning method, one of the most powerful techniques for market research, and it is entirely different from the algorithms discussed so far. The algorithm demands the computation of k centroids; every item is then assigned to the nearest centroid, and the process repeats iteratively until every observation is clustered into a group.

The algorithm proceeds as follows:

- An initial step is to identify the number k of partitioning centroids.
- Each observation is assigned to the nearest centroid, and each centroid is recomputed as the mean of its assigned observations; this repeats until the assignments stabilize.

Based on these steps, the characteristics of elements within a cluster will be homogeneous, while the differences between groups are maximized.
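The assign-and-update loop described above can be sketched from scratch in a few lines of numpy. This is a toy illustration of the iteration, not a substitute for a library implementation such as scikit-learn's KMeans, and the two-blob data is synthetic:

```python
# Toy k-means: illustrates the assign/update iteration on synthetic data.
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k initial centroids at random from the data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign every point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # assignments have stabilized
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated synthetic blobs: the loop should recover them.
X = np.vstack([np.random.default_rng(1).normal(0, 0.5, (20, 2)),
               np.random.default_rng(2).normal(5, 0.5, (20, 2))])
labels, centroids = kmeans_sketch(X, k=2)
```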

**Two-Step Clustering**

The method resolves the issue of analyzing mixed variables measured on different scale levels. The algorithm is based on a two-stage approach: in the first stage, the algorithm undertakes a procedure similar to the k-means algorithm. Based on the output of that step, the two-step procedure then conducts a modified hierarchical agglomerative clustering procedure that combines the objects sequentially to form homogeneous clusters. This is done by building a so-called cluster feature tree whose leaves represent distinct objects in the dataset. The procedure can handle categorical and continuous variables simultaneously, and the number of clusters can be chosen using measures of fit such as Akaike's Information Criterion (AIC) or the Bayesian Information Criterion (BIC).

Furthermore, a good marketing strategy not only entails segmenting customer groups but also targeting and positioning them based on **customer profiling**: businesses bucketize different segments to make informed decisions about sales and marketing dollars and increase ROI. Eventually, this helps businesses deliver enhanced customer service and boost customer satisfaction.
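The original two-step procedure is an SPSS feature and is not available in the common Python libraries, but the cluster feature tree idea it relies on also underlies the BIRCH algorithm in scikit-learn, which can serve as a rough illustration (the three-group data below is synthetic, and BIRCH is not the SPSS TwoStep procedure itself):

```python
# BIRCH builds a cluster feature (CF) tree and then clusters its leaf entries,
# the same two-stage idea described above. Sketch on synthetic data only.
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)),   # three synthetic groups
               rng.normal(3, 0.3, (30, 2)),
               rng.normal(6, 0.3, (30, 2))])

# Stage 1: build the CF tree (threshold bounds the subcluster radius);
# Stage 2: agglomeratively cluster the leaf entries into n_clusters groups.
model = Birch(threshold=0.5, n_clusters=3)
labels = model.fit_predict(X)
print(len(set(labels)))
```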

From Malvika Mathur

**Clustering Algorithms For Customer Segmentation**

From Swmya Vivek

**Context**

In today’s competitive world, it is crucial to understand customer behavior and
categorize customers based on their demography and buying behavior. This is a
critical aspect of customer segmentation that allows marketers to better tailor
their marketing efforts to various audience subsets in terms of promotional,
marketing and product development strategies.

**Objective**

This article demonstrates the segmentation of a customer data set from an e-commerce site using k-means clustering in Python. The data set contains the *annual income* of ~300 customers and their *annual spend* on an e-commerce site. We will use the k-means clustering algorithm to derive the optimum number of clusters and understand the underlying customer segments based on the data provided.

**About the data set**

The dataset consists of the Annual income (in $000) of 303 customers and their total spend (in $000) on an e-commerce site over a period of one year. Let us explore the data using the *numpy* and *pandas* libraries in Python.

#Load the required packages

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

#Plot styling

import seaborn as sns; sns.set() # for plot styling

%matplotlib inline

plt.rcParams['figure.figsize'] = (16, 9)

plt.style.use('ggplot')

#Read the csv file

dataset=pd.read_csv('CLV.csv')

#Explore the dataset

dataset.head() #top 5 rows

len(dataset) #number of rows

#Descriptive statistics of the dataset

dataset.describe().transpose()

The dataset consists of
303 rows. The mean annual income is 245000 and the mean annual spend is 149000.
The distribution of the annual income and annual spend has been illustrated
with a *distplot* and *violinplot*.

**Visualizing the data**

The distplot and violinplot give an indication of the data distribution of Income and Spend.

#Visualizing the data - distplot

plot_income = sns.distplot(dataset["INCOME"])

plot_spend = sns.distplot(dataset["SPEND"])

plt.xlabel('Income / spend')

#Violin plot of Income and Spend

f, axes = plt.subplots(1,2, figsize=(12,6), sharex=True, sharey=True)

v1 = sns.violinplot(data=dataset, x='INCOME', color="skyblue",ax=axes[0])

v2 = sns.violinplot(data=dataset, x='SPEND',color="lightgreen", ax=axes[1])

v1.set(xlim=(0,420))

**Clustering Fundamentals**

Clustering is an unsupervised machine learning technique, where there are no defined dependent and independent variables. The patterns in the data are used to identify and group similar observations.

The objective of any clustering algorithm is to ensure that the distance between datapoints in a cluster is very low compared to the distance between 2 clusters. In other words, members of a group are very similar, and members of different groups are extremely dissimilar.

We will use k-means clustering to create customer segments based on their income and spend data.

**K-Means Clustering**

K-means clustering is an iterative clustering algorithm where the number of clusters K is predetermined and the algorithm iteratively assigns each data point to one of the K clusters based on the feature similarity.

**Broad Steps of The K-means Algorithm**

- Choose the number of clusters K and initialize K centroids.
- Assign each data point to the nearest centroid based on feature similarity.
- Recompute each centroid as the mean of the data points assigned to it.
- Repeat the assignment and update steps until the cluster assignments no longer change.

**The Mathematics of Clustering**

The mathematics behind clustering, in very simple terms, involves minimizing the sum of squared distances between each cluster centroid and its associated data points:

$$ J = \sum_{j=1}^{K} \sum_{i=1}^{N} \left\lVert x_i^{(j)} - c_j \right\rVert^2 $$

where *K* is the number of clusters, *N* is the number of data points, *c_j* is the centroid of cluster *j*, and (*x_i^{(j)} − c_j*) is the distance between a data point and the centroid to which it is assigned.
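The quantity being minimized, the within-cluster sum of squared distances, is exactly what scikit-learn's KMeans reports as `inertia_`, which can be verified numerically on toy data:

```python
# Verifying that the summed squared distances to assigned centroids
# equal scikit-learn's inertia_ (the quantity k-means minimizes).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [8.2, 7.9], [0.8, 2.1]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Sum over clusters j, and over the points i assigned to each, of ||x_i - c_j||^2
wcss = sum(np.sum((X[km.labels_ == j] - c) ** 2)
           for j, c in enumerate(km.cluster_centers_))
print(np.isclose(wcss, km.inertia_))  # True
```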

**Deciding on the optimum number of clusters 'K'**

The main input for k-means clustering is the number of clusters. This is derived using the concept of **minimizing the within-cluster sum of squares (WCSS)**. A scree plot is created which plots the number of clusters on the x-axis and the WCSS for each cluster number on the y-axis. As the number of clusters increases, the WCSS keeps decreasing. The decrease in WCSS is initially steep, and then the rate of decrease slows down, resulting in an elbow plot. The number of clusters at the elbow formation usually gives an indication of the optimum number of clusters. This, combined with specific knowledge of the business requirement, should be used to decide on the optimum number of clusters.

For our dataset, we will arrive at the optimum number of clusters using the elbow method:

#Using the elbow method to find the optimum number of clusters

from sklearn.cluster import KMeans

#Extract the income and spend columns as the feature matrix X
X = dataset[['INCOME','SPEND']].values

wcss = []

for i in range(1,11):
    km = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
    km.fit(X)
    wcss.append(km.inertia_)

plt.plot(range(1,11),wcss)

plt.title('Elbow Method')

plt.xlabel('Number of clusters')

plt.ylabel('wcss')

plt.show()

Based on the elbow plot, we could choose 4, 5 or 6 clusters. Let us try 4 and 6 clusters and visualize the resulting segments to decide on the final number of clusters.

**Fitting the k-means to the dataset with k=4**

##Fitting kmeans to the dataset with k=4

km4=KMeans(n_clusters=4,init='k-means++', max_iter=300, n_init=10, random_state=0)

y_means = km4.fit_predict(X)

#Visualizing the clusters for k=4

plt.scatter(X[y_means==0,0],X[y_means==0,1],s=50, c='purple',label='Cluster1')

plt.scatter(X[y_means==1,0],X[y_means==1,1],s=50, c='blue',label='Cluster2')

plt.scatter(X[y_means==2,0],X[y_means==2,1],s=50, c='green',label='Cluster3')

plt.scatter(X[y_means==3,0],X[y_means==3,1],s=50, c='cyan',label='Cluster4')

plt.scatter(km4.cluster_centers_[:,0], km4.cluster_centers_[:,1],s=200,marker='s', c='red', alpha=0.7, label='Centroids')

plt.title('Customer segments')

plt.xlabel('Annual income of customer')

plt.ylabel('Annual spend from customer on site')

plt.legend()

plt.show()

The plot shows the distribution of the 4 clusters. We could interpret them as the following customer segments:

- Cluster 1: Customers with medium annual income and low annual spend
- Cluster 2: Customers with high annual income and medium to high annual spend
- Cluster 3: Customers with low annual income
- Cluster 4: Customers with medium annual income but high annual spend

Cluster 4 is straight away one potential customer segment. However, Clusters 2 and 3 can be segmented further to arrive at a more specific target customer group. Let us now look at how the clusters are created when k=6:

##Fitting kmeans to the dataset - k=6

km4=KMeans(n_clusters=6,init='k-means++', max_iter=300, n_init=10, random_state=0)

y_means = km4.fit_predict(X)

#Visualizing the clusters

plt.scatter(X[y_means==0,0],X[y_means==0,1],s=50, c='purple',label='Cluster1')

plt.scatter(X[y_means==1,0],X[y_means==1,1],s=50, c='blue',label='Cluster2')

plt.scatter(X[y_means==2,0],X[y_means==2,1],s=50, c='green',label='Cluster3')

plt.scatter(X[y_means==3,0],X[y_means==3,1],s=50, c='cyan',label='Cluster4')

plt.scatter(X[y_means==4,0],X[y_means==4,1],s=50, c='magenta',label='Cluster5')

plt.scatter(X[y_means==5,0],X[y_means==5,1],s=50, c='orange',label='Cluster6')

plt.scatter(km4.cluster_centers_[:,0], km4.cluster_centers_[:,1],s=200,marker='s', c='red', alpha=0.7, label='Centroids')

plt.title('Customer segments')

plt.xlabel('Annual income of customer')

plt.ylabel('Annual spend from customer on site')

plt.legend()

plt.show()

Setting the number of clusters to 6 seems to provide a more meaningful customer segmentation.

- Cluster 1: Medium income, low annual spend
- Cluster 2: Low income, low annual spend
- Cluster 3: High income, high annual spend
- Cluster 4: Low income, high annual spend
- Cluster 5: Medium income, low annual spend
- Cluster 6: Very high income, high annual spend

Thus it is evident that 6 clusters provides a more meaningful segmentation of the customers.

**Marketing strategies for the customer segments**

Based on the 6 clusters, we could formulate marketing strategies relevant to
each cluster:

- A typical strategy would focus promotional efforts on the high-value customers of Cluster 6 and Cluster 3.
- Cluster 4 is a unique customer segment, where in spite of their relatively lower annual income, these customers tend to spend more on the site, indicating their loyalty. There could be some discounted pricing based promotional campaigns for this group so as to retain them.
- For Cluster 2 where both the income and annual spend are low, further analysis could be needed to find the reasons for the lower spend and price-sensitive strategies could be introduced to increase the spend from this segment.
- Customers in clusters 1 and 5 are not spending enough on the site in spite of a good annual income. Further analysis of these segments could lead to insights on the satisfaction or dissatisfaction of these customers, or on the lesser visibility of the e-commerce site to them. Strategies could be evolved accordingly.

We have thus seen how we could arrive at meaningful insights and recommendations by using clustering algorithms to generate customer segments. For the sake of simplicity, the dataset used only two variables: income and spend. In a typical business scenario, there could be several variables, which could possibly generate much more realistic and business-specific insights.
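With more than two variables, the features are usually on different scales, so a standardization step before k-means keeps any one variable from dominating the distance computation. A minimal sketch with synthetic data (the three variables are hypothetical placeholders, not columns of the CLV.csv dataset used above):

```python
# Sketch: k-means on several standardized features. The variables are
# hypothetical placeholders, not columns of the CLV.csv dataset.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in for a customer table with mixed-scale variables
features = np.column_stack([
    rng.normal(250, 50, 500),   # e.g. annual income (in $000)
    rng.normal(150, 40, 500),   # e.g. annual spend (in $000)
    rng.integers(1, 50, 500),   # e.g. number of site visits
])

X_scaled = StandardScaler().fit_transform(features)  # mean 0, std 1 per column
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_scaled)
print(np.bincount(km.labels_))  # cluster sizes
```

Without the scaling step, the income column (values in the hundreds) would dwarf the visit counts in every Euclidean distance, and the clusters would effectively ignore the smaller-scale variables.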