Why weights are often counterproductive in ranking

Julien Lemoine

Co-founder & former CTO at Algolia

Search is a very complex problem

Search is a complex problem that is hard to customize to a particular use case — even search experts need a lot of iteration to configure a search engine.

Search is composed of several steps that all require different configurations. It is usually split into three steps:

  • Step 1: Retrieve all hits potentially relevant to the query
  • Step 2: Rank all those potential results to make sure the best results are first
  • Step 3: Rerank (dynamic ranking) using external signals, typically with a learning-to-rank approach

In most modern search engines, the first two steps are executed concurrently for performance reasons, but each has a distinct purpose: the first expands the search to the largest possible result set (i.e., optimizes for recall), and the second orders results from most to least relevant (i.e., optimizes for precision).

Regardless of the technology used, both retrieval and ranking are difficult and involve many parameters.
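The three steps above can be sketched as a toy pipeline. This is only an illustration under simple assumptions: the catalog, the word-overlap scoring, and the `clicks` signal are invented here, and a real engine would run the first two steps concurrently and use a learned model for the third.

```python
CATALOG = ["laptop 13 inch", "laptop 13 pro", "laptop sleeve", "garden hose"]

def words(text):
    return set(text.lower().split())

def retrieve(query, catalog):
    """Step 1 (recall): keep every record sharing at least one word with the query."""
    return [name for name in catalog if words(query) & words(name)]

def textual_score(query, name):
    """Toy textual relevance: number of query words found in the record."""
    return len(words(query) & words(name))

def rank(query, hits):
    """Step 2 (precision): order hits from most to least textually relevant."""
    return sorted(hits, key=lambda name: textual_score(query, name), reverse=True)

def rerank(query, hits, clicks):
    """Step 3: use an external signal (hypothetical click counts) to order
    hits that have the same textual score."""
    return sorted(
        hits,
        key=lambda name: (textual_score(query, name), clicks.get(name, 0)),
        reverse=True,
    )

query = "laptop 13"
hits = retrieve(query, CATALOG)          # "garden hose" is never considered
ranked = rank(query, hits)
clicks = {"laptop 13 pro": 12, "laptop 13 inch": 3}
final = rerank(query, ranked, clicks)
print(final)
```

In this sketch the external signal only breaks ties between equally relevant hits; it never promotes a weaker textual match over a stronger one.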

In this article, we explain where weights (also called boosts) are used as a solution to these challenges in search. Weights can feel like an intuitive solution, which is why they’ve been used for a long time. However, they can actually be dangerous and often counterproductive.

Challenges in the retrieval phase

The main challenge in the retrieval phase is to ensure that all the potentially relevant records are found. This challenge takes a different form depending on the technology used.

  • In a keyword-based search engine, the retrieval-phase challenge is to ensure that all the potential hits are scanned. The engine needs to identify all the approximations that may be relevant. In particular, it needs to identify all the synonyms and all the typos, including complex ones such as a concatenation or a split of two keywords, and to explore partial matches (where only a subset of the query words is found), etc.
  • In a typical vector-based search engine using LLMs, the retrieval-phase challenge is to keep performance acceptable, because every record has a nonzero similarity score with the query. To make it even more difficult, the scores for relevant vs. irrelevant items can vary greatly per query. Most engines use a graph representation (HNSW being the most popular) to scan only the records whose vectors are close to the query vector. In practice, bi-encoders produce only rough approximations, and no single score threshold works as a relevance cutoff. It is therefore now commonplace to include a re-ranker, such as a cross-encoder, as a merge step between keyword and vector hits. Cross-encoders retain more of the attention relationships that make transformers work, so they can make much better relevance decisions, even when the hits come from keyword retrieval. However, they need to process the query and each hit together, so unlike bi-encoders they are not efficient for retrieval.
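The vector-retrieval point above can be illustrated with a brute-force sketch. The three-dimensional vectors below are invented for illustration (real embeddings come from a model and have hundreds of dimensions), and a production engine would use an approximate index such as HNSW rather than scoring every record.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings, assumed for this example only
records = {
    "laptop 13 inch": [0.9, 0.1, 0.2],
    "laptop backpack": [0.6, 0.7, 0.1],
    "garden hose": [0.1, 0.2, 0.9],
}
query_vector = [1.0, 0.1, 0.1]

# Every record gets a score, including the clearly irrelevant one
scores = {name: cosine(query_vector, vec) for name, vec in records.items()}

def top_k(scores, k):
    """Keep only the k closest records instead of relying on a score threshold."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

for name in top_k(scores, 2):
    print(name, round(scores[name], 3))
```

Even "garden hose" has a positive similarity to the query here, which is why a vector engine cannot simply return "everything that matches" the way a keyword engine does.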

Today, more and more search engines rely both on keyword search and semantic search, meaning that the two challenges need to be addressed.

Challenges in the ranking phase

In the ranking phase, the challenge is to merge all the signals together to obtain one final ordering of the results. In particular, three categories of signals are merged together in the ranking:

  • Textual signals: the information coming from the search engine technology when comparing the query and the retrieved record, often with parameters of its own. This can be the output of a textual score in a keyword engine, like a BM25F score, or the similarity score between the query vector and the text in a vector engine.
  • Business signals: business-related information that helps sort the results independently of the textual relevance between the query and the record. In particular, it can include the popularity of the item, the availability of the item, items promoted for advertisement, contribution to the business objective, etc.
  • User signals: the information that is specific to the customer. It includes access rights (which records the user can access), preferences, geography, etc.

Mixing those criteria is very complex and the reflex is often to use weights set by the business to merge all those signals.

Weights inside the Textual Score

In the retrieval phase, the query terms can be found inside multiple attributes (also called fields) of a record. Not all attributes have the same importance, which is why most engines ask the customer to assign a score (or a boost) to every attribute to express its business importance. Setting those scores is not an easy task and often leads to relevance problems, because there is an infinite number of possible configurations, many of which produce very similar results.

From the business point of view, attributes differ in value. For example, in an ecommerce store, matching the query “laptop 13” in the “category” attribute is not the same as matching it in the “description” attribute (a backpack with a laptop pocket is not a relevant match for the query “laptop”).

The problem with setting the weights manually is that you never know if you have the best configuration or not, and the best configuration actually varies for different query types. So there is no one solution. You are limited by the number of different configurations you can test, each of them requiring a test period. Even worse, it is common to iterate on the weights based on the observation of a small set of queries without checking in advance the global impact of this change.
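A small sketch can make the difficulty with manual attribute weights concrete. The records, the matched attributes, and the three weight configurations are all hypothetical; the scoring is a plain weighted sum, not the formula of any particular engine.

```python
# Attributes in which the query "laptop" matched, per record (assumed data)
MATCHES = {
    "laptop 13 inch": ["category", "name"],
    "laptop backpack": ["name", "description"],
    "travel backpack": ["description"],
}

def weighted_score(matched_attributes, weights):
    """Sum the weights of the attributes in which the query matched."""
    return sum(weights[attr] for attr in matched_attributes)

def order(weights):
    """Order records by decreasing weighted score."""
    return sorted(
        MATCHES,
        key=lambda name: weighted_score(MATCHES[name], weights),
        reverse=True,
    )

config_a = {"category": 10, "name": 5, "description": 1}
config_b = {"category": 3, "name": 2, "description": 1}
config_c = {"category": 1, "name": 1, "description": 5}

print(order(config_a))  # same ordering as config_b
print(order(config_b))
print(order(config_c))  # the backpacks now outrank the laptop
```

Configurations A and B look very different but are indistinguishable on this query, while configuration C silently demotes the only real laptop. Observing a handful of queries does not reveal which of the many possible configurations is best overall.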

Instead of setting those weights manually, it is better to give a “hint” to an AI algorithm that will optimize the weights automatically and continuously (because the data also change over time). Such a hint is usually a list of attributes ordered from the most important to the least important, possibly with ties. For example, (“Category”, “Brand”, “Color”) > “Name” > “Description” when the business knows that “Category” and “Brand” are clean data reviewed by the business and “Name” and “Description” are more generic.
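One way to picture such a hint is as a list of tiers that constrains whatever weights an optimizer proposes. The sketch below does not implement any optimization; `HINT` and `respects_hint` are hypothetical names, and treating the hint as "no attribute in a lower tier may outweigh an attribute in a higher tier" is an assumption made for illustration.

```python
# Tiers from most to least important; attributes in the same tuple are tied
HINT = [("category", "brand", "color"), ("name",), ("description",)]

def respects_hint(weights):
    """Return True if no attribute in a lower tier outweighs an attribute
    in a higher tier."""
    tiers = [[weights[attr] for attr in attrs] for attrs in HINT]
    return all(min(high) >= max(low) for high, low in zip(tiers, tiers[1:]))

# Candidate weights, as an automatic optimizer might propose them
candidate_ok = {"category": 3.0, "brand": 2.5, "color": 2.5, "name": 2.0, "description": 0.5}
candidate_bad = {"category": 3.0, "brand": 2.5, "color": 2.5, "name": 2.0, "description": 5.0}

print(respects_hint(candidate_ok))
print(respects_hint(candidate_bad))
```

The business states only the ordering it is confident about, and the exact values inside that ordering are left to the algorithm.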

A weight to merge the Textual signals and the Business signals

Another common use of a weight is to merge the textual signals and the business signals (score = σ × TextualScore + (1 − σ) × BusinessScore). For example, a weight of 0.5 gives the same importance to the textual score and to the business score when computing the final ordering. This seems very intuitive but creates a lot of relevance issues. To illustrate the problem, let’s take an extreme example where you want to sort by a business signal. In this case, you totally ignore the textual signals and sort by one business criterion (sorting by increasing price, for example). Because the search engine is designed to consider all potential hits, even the ones that are far away from the query, you will end up with results that are very far from the query because they are cheap. This is a problem you find on a lot of websites, often forcing the customer to filter the query to get something a bit more relevant.
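The blending formula above is easy to try on toy numbers. The scores below are invented, with both signals normalized to the range 0 to 1 and the business score standing in for "cheapness".

```python
def blended(textual, business, sigma):
    """score = sigma * TextualScore + (1 - sigma) * BusinessScore"""
    return sigma * textual + (1 - sigma) * business

# name -> (textual score, business score); illustrative values only
HITS = {
    "laptop 13 inch": (0.95, 0.20),
    "laptop sleeve": (0.40, 0.80),
    "usb cable": (0.05, 1.00),
}

def order(sigma):
    """Order hits by decreasing blended score for a given sigma."""
    return sorted(HITS, key=lambda name: blended(*HITS[name], sigma), reverse=True)

print(order(1.0))  # textual only: the laptop comes first
print(order(0.5))  # equal weight: the sleeve overtakes the laptop
print(order(0.0))  # business only (a "sort by"): the usb cable comes first
```

With σ = 0.5 the sleeve already outranks the laptop for a laptop query, and with σ = 0 the least relevant record wins simply because it is cheap.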

The fundamental problem is that the business score is merged in the same way for all results, while some of them are very far from the query and should be eliminated. For the sort, one classical solution is to filter the result set and keep only the hits with good textual scores before applying the sort. In practice, most marketplaces use such an approach. (You can detect it on a site where the number of results after applying a “sort by” is lower than for the initial search.)
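The filter-then-sort approach described above might look like the following sketch. The hits, the textual scores, and the 0.8 cutoff are assumptions chosen for illustration; as noted earlier, no single threshold works for every query.

```python
HITS = [
    {"name": "laptop 13 inch", "textual": 0.95, "price": 900},
    {"name": "laptop 13 pro", "textual": 0.90, "price": 1400},
    {"name": "laptop sleeve", "textual": 0.40, "price": 25},
    {"name": "usb cable", "textual": 0.05, "price": 5},
]

def sort_by_price(hits, min_textual=0.0):
    """Drop hits below the textual cutoff, then sort the rest by increasing price."""
    kept = [hit for hit in hits if hit["textual"] >= min_textual]
    return sorted(kept, key=lambda hit: hit["price"])

naive = sort_by_price(HITS)
filtered = sort_by_price(HITS, min_textual=0.8)

print([hit["name"] for hit in naive])     # the cable and the sleeve come first
print([hit["name"] for hit in filtered])  # only laptops remain, cheapest first
```

The filtered list is shorter than the original result set, which is the visible side effect mentioned above.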

For the general problem of merging the textual signals and the business signals, we have exactly the same problem as in the extreme case of a “sort by”. To be done properly, the merge requires several buckets of textual relevance, with the sort applied inside each bucket.
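A minimal sketch of bucketing, assuming two hand-picked bucket boundaries (0.8 and 0.3) and invented scores. In practice the boundaries would have to be chosen or learned per query rather than hard-coded.

```python
HITS = [
    {"name": "laptop 13 inch", "textual": 0.95, "business": 0.2},
    {"name": "laptop 13 pro", "textual": 0.90, "business": 0.6},
    {"name": "laptop sleeve", "textual": 0.40, "business": 0.8},
    {"name": "usb cable", "textual": 0.05, "business": 1.0},
]

def bucket(textual, boundaries=(0.8, 0.3)):
    """Map a textual score to a bucket index; 0 is the most relevant bucket."""
    for index, lower_bound in enumerate(boundaries):
        if textual >= lower_bound:
            return index
    return len(boundaries)

def bucketed_order(hits):
    """Sort by bucket first, then by decreasing business score inside each bucket."""
    return sorted(hits, key=lambda hit: (bucket(hit["textual"]), -hit["business"]))

print([hit["name"] for hit in bucketed_order(HITS)])
```

The business score reorders the two laptops, which are comparably relevant, but it can never lift the sleeve or the cable above them.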

Similar to the problem of weights inside the Textual Score, the best approach is to define the business score and let an algorithm optimize the merging for you depending on the user behavior.

Weights between different signals

Ranking is a complex merge problem between a lot of signals. It is tempting to merge all of them with weights, at least at a high level between the textual signals, business signals, and user signals. In practice, setting those weights manually is an order of magnitude more complex than in the previous two examples and is a recipe for failure.

Depending on the context (the query, the section of the website, etc.), the merge may need to be different to provide relevant results to the user. It is fine for the search engine to use weights internally, with a machine learning algorithm setting them. But setting those weights manually never delivers the best results.

Weights are dangerous

Weights are more often part of the problem than part of the solution when they are set manually. The next time you are asked to manually configure a weight, think about the danger of such a setting and whether there is a way to configure it automatically. Note that LTR (learning-to-rank) algorithms often use boosted trees and other non-linear ways of finding dependencies and relationships between the ranking signals. As a result, the optimal weights end up being different for basically every query. For the sort example above, such an algorithm could learn that the relevance cutoff should increase when a hard sort (on the price attribute, for example) is applied.

If you want to learn how Algolia is tackling this challenge, watch this space to learn more.

About the author
Julien Lemoine

Co-founder & former CTO at Algolia
