Introduction to K-Means Clustering in Python with Scikit-learn

Have you ever organized your bookshelf so that books on the same subject sit on the same rack or in the same block? You most likely have, which means you already know how to group similar objects together. The idea is remarkably simple, yet the number of use cases built on it is enormous. In the machine learning literature, this is referred to as clustering: automatically assigning similar objects to the same group.

In this article, we are going to take a look at the old faithful K-Means clustering algorithm, which has been applied in a wide variety of domains. We will start by building a general notion of clustering and some of the rules that govern it. We will briefly review some of the different types of clustering and then dive into the nitty-gritty details of K-Means. We’ll conclude by seeing K-Means in action in Python on a toy dataset. By the time you are done, you’ll have a working knowledge of the algorithm and can start applying it to your own use cases.

Sounds interesting? Let’s begin.

Making sense of unsupervised learning

In the world of machine learning, you will not always be working with a labeled dataset. Why, you ask? In most cases, data is labeled by us, human beings, and it is not always possible for us to annotate data with categories or classes. Sometimes the data itself may not be directly accessible. At other times, explicitly annotating it may simply not be cost-efficient.
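
To make the idea of working without labels concrete, here is a minimal sketch using Scikit-learn. The six points and the choice of two clusters are assumptions made purely for illustration; they are not taken from any dataset discussed in this article. The thing to notice is that the algorithm only ever sees the features and never a label column.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: six 2-D points with no labels attached.
X = np.array([
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # points near (1, 1)
    [8.0, 8.2], [7.9, 8.1], [8.3, 7.8],   # points near (8, 8)
])

# n_clusters=2 is an assumption that suits this toy data.
model = KMeans(n_clusters=2, n_init=10, random_state=0)

# Only the features X are passed in; there is no y.
cluster_ids = model.fit_predict(X)

print(cluster_ids)
```

The printed array holds one cluster index per point. The indices themselves are arbitrary (cluster 0 versus cluster 1 carries no meaning); what matters is which points end up sharing the same index.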