Search by Algolia
Add InstantSearch and Autocomplete to your search experience in just 5 minutes
product

Add InstantSearch and Autocomplete to your search experience in just 5 minutes

A good starting point for building a comprehensive search experience is a straightforward app template. When crafting your application’s ...

Imogen Lovera

Senior Product Manager

Best practices of conversion-focused ecommerce website design
e-commerce

Best practices of conversion-focused ecommerce website design

The inviting ecommerce website template that balances bright colors with plenty of white space. The stylized fonts for the headers ...

Catherine Dee

Search and Discovery writer

Ecommerce product listing pages: what they are and how to optimize them for maximum conversion
e-commerce

Ecommerce product listing pages: what they are and how to optimize them for maximum conversion

Imagine an online shopping experience designed to reflect your unique consumer needs and preferences — a digital world shaped completely around ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

DevBit Recap: Winter 2023 — Community
engineering

DevBit Recap: Winter 2023 — Community

Winter is here for those in the northern hemisphere, with thoughts drifting toward cozy blankets and mulled wine. But before ...

Chuck Meyer

Sr. Developer Relations Engineer

How to create the highest-converting product detail pages (PDPs)
e-commerce

How to create the highest-converting product detail pages (PDPs)

What if there were a way to persuade shoppers who find your ecommerce site, ultimately making it to a product ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

Highlights from GopherCon Australia 2023
engineering

Highlights from GopherCon Australia 2023

This year a bunch of our engineers from our Sydney office attended GopherCon AU at University of Technology, Sydney, in ...

David Howden
James Kozianski

David Howden &

James Kozianski

Enhancing customer engagement: The role of conversational commerce
e-commerce

Enhancing customer engagement: The role of conversational commerce

Second only to personalization, conversational commerce has been a hot topic of conversation (pun intended) amongst retailers for the better ...

Michael Klein

Principal, Klein4Retail

Craft a unique discovery experience with AI-powered recommendations
product

Craft a unique discovery experience with AI-powered recommendations

Algolia’s Recommend complements site search and discovery. As customers browse or search your site, dynamic recommendations encourage customers to ...

Maria Lungu

Frontend Engineer

What are product detail pages and why are they critical for ecommerce success?
e-commerce

What are product detail pages and why are they critical for ecommerce success?

Winter is coming, along with a bunch of houseguests. You want to replace your battered old sofa — after all,  the ...

Catherine Dee

Search and Discovery writer

Why weights are often counterproductive in ranking
engineering

Why weights are often counterproductive in ranking

Search is a very complex problem Search is a complex problem that is hard to customize to a particular use ...

Julien Lemoine

Co-founder & former CTO at Algolia

How to increase your ecommerce conversion rate in 2024
e-commerce

How to increase your ecommerce conversion rate in 2024

2%. That’s the average conversion rate for an online store. Unless you’re performing at Amazon’s promoted products ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

How does a vector database work? A quick tutorial
ai

How does a vector database work? A quick tutorial

What’s a vector database? And how different is it than a regular-old traditional relational database? If you’re ...

Catherine Dee

Search and Discovery writer

Removing outliers for A/B search tests
engineering

Removing outliers for A/B search tests

How do you measure the success of a new feature? How do you test the impact? There are different ways ...

Christopher Hawke

Senior Software Engineer

Easily integrate Algolia into native apps with FlutterFlow
engineering

Easily integrate Algolia into native apps with FlutterFlow

Algolia's advanced search capabilities pair seamlessly with iOS or Android Apps when using FlutterFlow. App development and search design ...

Chuck Meyer

Sr. Developer Relations Engineer

Algolia's search propels 1,000s of retailers to Black Friday success
e-commerce

Algolia's search propels 1,000s of retailers to Black Friday success

In the midst of the Black Friday shopping frenzy, Algolia soared to new heights, setting new records and delivering an ...

Bernadette Nixon

Chief Executive Officer and Board Member at Algolia

Generative AI’s impact on the ecommerce industry
ai

Generative AI’s impact on the ecommerce industry

When was your last online shopping trip, and how did it go? For consumers, it’s becoming arguably tougher to ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

What’s the average ecommerce conversion rate and how does yours compare?
e-commerce

What’s the average ecommerce conversion rate and how does yours compare?

Have you put your blood, sweat, and tears into perfecting your online store, only to see your conversion rates stuck ...

Vincent Caruana

Senior Digital Marketing Manager, SEO

What are AI chatbots, how do they work, and how have they impacted ecommerce?
ai

What are AI chatbots, how do they work, and how have they impacted ecommerce?

“Hello, how can I help you today?”  This has to be the most tired, but nevertheless tried-and-true ...

Catherine Dee

Search and Discovery writer

Looking for something?

facebookfacebooklinkedinlinkedintwittertwittermailmail

Just as with a school kid who’s left unsupervised when their teacher steps outside to deal with a distraction, unsupervised learning in a machine-learning scenario can have its drawbacks. For instance, consider search technology. With no data scientist overseeing the generative process, Ms. or Mr. Machine could get creative in the sources it deems accurate, stringing together unrelated information to present search results that don’t quite compute — that is, they aren’t necessarily true or accurate.

In our era of Big Data, supervised learning has inherent value. It seems safer. When a kid isn’t left alone to let who-knows-what ensue, there aren’t usually issues.

But when it comes to AI-powered search, among other applications, unsupervised learning can’t simply be written off as too risky. For one thing, it offers way too many major benefits. In terms of data science, one huge advantage is the removal of human bias. Instead of having a researcher predefine classification groups, all on its own, the independently minded machine puts forth a number of clusters based on empirical proof.

So an unsupervised data mining system that can detect many kinds of data groupings based on similarities it can detect — for instance, as applied to customer segmentation or image segmentation — is undoubtedly a powerful tool.

How clustering works

Unsupervised learning utilizes clustering: grouping data points in scatter plots in classes that share similar characteristics. Instead of predefined categories, unlabeled data is classified (clustered) by features.

When doing clustering segmentation with words, things can get complex because some words can be placed in a variety of clusters. For example, a word like “moving” might be placed in multiple grammatical clusters (because it’s a verb and an –ing word), as well as clusters related to driving a car and relocation.

Fortunately, the various problems arising from establishing word meaning in machine learning can be summarily solved. And that’s where the k-means clustering algorithm comes in handy.

What is k-means clustering?

The variable k, referring to the number of centroids you need in the dataset, is predefined. It represents the number of groups or categories. The data is divided into k clusters (classes). 

The “means” in k-means refers to the averaging of data related to the location of a center point, or centroid. A centroid is the (real or imaginary) center of the cluster.

K-means looks for a fixed number (k) of clusters in a dataset.

With k-means, the distance between each point in the dataset and each initial centroid is calculated. According to one study, “Distance measure plays a very important rule [in] the performance of this algorithm…. Based on the values found, points are assigned to the centroid with minimum distance. Hence, this distance calculation plays [a] vital role in the clustering algorithm.”

The goal of using the k-means algorithm is to minimize the sum of distances between the data points and their corresponding clusters, grouping similar data points together and thereby being able to see underlying patterns.

Each dataset can belong to only one group of similar properties, which pinpoints the location of the cluster center of mass (the centroid, a collection of feature values).

How the k-means algorithm works

The k-means clustering algorithm starts by randomly choosing data points as proposed cluster centroids, the beginner points for the initial clusters. A k number of centroids is identified.

The algorithm assigns each data point to its nearest center point. Every point is allocated to the nearest cluster by reducing the in-cluster sum of squares, while simultaneously ensuring that the centroids stay as compact as possible. The points near the center create the cluster.

The algorithm uses an iterative (repeating) approach to pinpoint the best value for each centroid, recalculating new centroids until it ultimately comes up with workable final cluster assignments. When the initial centroids are estimated, these estimates are pinpointed by repeatedly assigning examples to their closest centroids and, based on the new assignments, recomputing the centroids.

With this step-1 activity complete, a new data point is assigned for each cluster. The process is repeated until convergence is reached, with the recalculated centroids matching the initial centroids. The centroid values no longer change and the defined number of iterations has been reached. If the convergence criterion is not attained, there is still a limit on the maximum number of iterations (between 1 and 999). Either way, the algorithm has a trigger to stop creating and optimizing clusters.

Determining the number of clusters

With k-means, to determine the number of clusters in a data set, data scientists often apply the aptly named elbow method. Based on where the curve of the plotted data shows an “elbow” or starts to straighten out after heading up, they can see where returns will diminish. Based on that, they can determine the optimal number of clusters to use.

With k-means, there’s another clustering method as well. You can alternatively choose the number of clusters by analyzing the silhouette plot, which shows how close each point in a cluster is to points in other clusters.

Why is the k-means algorithm so popular?

The k-means clustering algorithm is employed extensively to analyze data clusters in applications ranging from search to business analysis.

Among the various unsupervised machine-learning algorithms available, what makes k-means stand out?

In a word, speed. Whether there’s not much information or there’s a very large dataset, k-means excels at quickly dividing data into clusters. It’s also flexible, and it’s known to work well immediately. This is an expedient way to figure out the categories of groups in an unlabeled dataset without training a machine.

The advantage of this approach is also that the human bias is taken out of the equation. Instead of having a researcher create classification groups, the machine creates its own clusters based on empirical proof, as opposed to people’s assumptions.

The downsides? A few considerations:

  • K-means performance is typically not as impressive as with other high-level clustering techniques, as minor variations in the data could possibly lead to high levels of variance.
  • Clusters are expected to be spherical and sized similarly, without too many outliers; when they’re not, it could lower the accuracy of k-means-clustering Python results.
  • In the k-means method of cluster analysis, observations are grouped by minimizing Euclidean distances between them. According to Wikipedia, “k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances…. The problem is computationally difficult…however, efficient heuristic algorithms converge quickly to a local optimum.”

For more information

Hope you’ve enjoyed this quick introduction to k-means clustering. If you want to learn more:

  • Consult the scikit-learn library for a straightforward explanation of how k-means clustering works
  • Check out some of the excellent in-depth tutorials available on k-means clustering
  • Get a good visualization of how the k-means machine-learning algorithm works using the Python programming language

Want the advantages of k-means-powered search?

Looking for potent optimization of your website? Algolia utilizes k-means clustering among other algorithms to return fast, accurate AI-powered search results and recommendations for our clients’ sites and apps.

See how our groundbreaking NeuralSearch technology can improve the search and discovery experience for your users, plus potentially boost your site metrics up along with your customer satisfaction levels. Contact us for all the details!

About the author
Catherine Dee

Search and Discovery writer

linkedin

Recommended Articles

Powered byAlgolia Algolia Recommend

What is clustering?
ai

Vincent Caruana

Sr. SEO Web Digital Marketing Manager

An introduction to machine learning for images and text — now and in the (near) future
ai

Peter Villani

Sr. Tech & Business Writer

How We Tackled Color Identification
engineering

Léo Ercolanelli

Software Engineer