K-Means Clustering

There are many clustering algorithms, all of which group objects based on how similar (or close in terms of distance) their attributes are.

We will look at just one type of clustering, K-Means clustering, but many other types exist. You can read more about other methods of clustering here.

Introduction

K-Means clustering is an unsupervised learning technique used in processes such as market segmentation, document clustering, image segmentation and image compression.
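For orientation, the standard K-Means procedure alternates between two steps: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it. The following is a minimal sketch in plain Python, not a production implementation. The function name `kmeans`, its parameters, and the six sample points are hypothetical and chosen only for illustration.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster a list of equal-length numeric tuples into k groups."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # nothing moved, so the assignment is stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical sample data: two loose groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The result depends on the random starting centroids, which is why library implementations typically run the procedure several times and keep the best run.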

Usually we do K-Means clustering to:

Understand the structure of the data, and group similar observations.
Cluster the data into subgroups and then make separate predictions for each subgroup.

If subgroup behaviours differ substantially, fitting a separate model for each subgroup will usually give more accurate predictions than fitting one model for all groups.
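The point about subgroup models can be illustrated with a small sketch. It assumes scikit-learn and NumPy are installed, and the data is synthetic: two hypothetical subgroups whose response to a single feature differs sharply. The choice of linear regression and of mean squared error as the measure is also an assumption made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Hypothetical data: two subgroups with very different relationships between x and y.
x_a = rng.normal(loc=0.0, scale=1.0, size=100)
x_b = rng.normal(loc=10.0, scale=1.0, size=100)
y_a = 2.0 * x_a + rng.normal(scale=0.5, size=100)
y_b = -3.0 * x_b + 50.0 + rng.normal(scale=0.5, size=100)

X = np.concatenate([x_a, x_b]).reshape(-1, 1)
y = np.concatenate([y_a, y_b])

# Option 1: one model for all observations.
global_model = LinearRegression().fit(X, y)
global_mse = mean_squared_error(y, global_model.predict(X))

# Option 2: cluster on the feature first, then fit one model per cluster.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
per_cluster_pred = np.empty_like(y)
for k in np.unique(labels):
    mask = labels == k
    model_k = LinearRegression().fit(X[mask], y[mask])
    per_cluster_pred[mask] = model_k.predict(X[mask])
per_cluster_mse = mean_squared_error(y, per_cluster_pred)

# Both errors are measured on the training data, purely for illustration.
print(f"global MSE:      {global_mse:.3f}")
print(f"per-cluster MSE: {per_cluster_mse:.3f}")
```

On real data the comparison should be made on held-out observations, since separate models have more freedom to overfit small subgroups.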

Tutorials

Guided tutorials

Unguided tutorial: Flower features

This tutorial is not compulsory, but you can go through it on your own for a gentle introduction to clustering. It is easier than the clustering assignment given in Projects.

Data: Iris species

Use K-Means cluster analysis to cluster different iris species. Make an elbow plot and/or use silhouette analysis to find the optimal number of clusters.
Which features differ between the iris species?
Create a plot of the clusters.
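As a possible starting point, the sketch below computes the two quantities that an elbow plot and a silhouette analysis are built from. It assumes scikit-learn is installed and uses the copy of the Iris data bundled with it (`load_iris`), which may differ in layout from the dataset linked above. Standardising the features and the range of k from 2 to 8 are choices made for the example, not requirements of the tutorial.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

iris = load_iris()
# Put the four measurements on a common scale so no single feature dominates the distance.
X = StandardScaler().fit_transform(iris.data)

inertias = {}
silhouettes = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squares, the y-axis of an elbow plot
    silhouettes[k] = silhouette_score(X, model.labels_)  # mean silhouette, between -1 and 1

for k in inertias:
    print(f"k={k}  inertia={inertias[k]:.1f}  silhouette={silhouettes[k]:.3f}")
```

Plotting `inertias` against k with a plotting library such as matplotlib gives the elbow plot. The two criteria do not have to agree on the same k, so it is worth comparing the chosen clusters against `iris.target` and inspecting `model.cluster_centers_` to see which features separate the groups.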