Search by Algolia

Search has been around for a while, to the point that it is now considered a standard requirement in many applications. Decades ago, we managed to make search engines scale by leveraging inverted indexes, a data structure that allows a very quick lookup of documents containing certain words. We’ve come a long way since then and have moved to vector indexes.

To understand why vector search represents such a significant advancement in search capabilities, we need to look back and see how search functionality has developed over time. 

The early stage of search

Information retrieval is a pretty old (and still active) research area. Every year, conferences like ECIR or SIGIR attract considerable interest from researchers and engineers around the world to discuss progress made in this field. The Algolia team attended the recently held ECIR conference, and you can check out our key highlights from the event.


The development of search functionality as we know it can probably be traced as far back as the 1950s, as researchers tried to solve the problem of efficient information retrieval across large databases. Progress came steadily as computing power increased and new concepts were introduced, such as the inverted index, which let search scale better.

This data structure lets us fetch the documents that match specific words with great speed and accuracy. Because the index can also be distributed across many machines, the architecture can scale horizontally as the data grows. In the late 1990s, this kind of structure allowed Google to scale to the size of the Internet at the time.
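To make the idea concrete, here is a minimal Python sketch of an inverted index, assuming naive whitespace tokenization and a few made-up documents. It maps each word to the set of IDs of documents containing it, so answering a query means intersecting a few small sets instead of scanning every document. It illustrates the data structure only; it is not how any particular engine implements it.

```python
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()

def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of IDs of documents that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)

def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return IDs of documents that contain every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return set()
    # Start from the shortest posting list so intermediate results stay small.
    postings.sort(key=len)
    result = set(postings[0])  # copy, so the index itself is never mutated
    for posting in postings[1:]:
        result &= posting
    return result

# Made-up example documents.
docs = {
    1: "the cat sat on the mat",
    2: "the dog chased the cat",
    3: "a bird sang",
}
index = build_inverted_index(docs)
print(sorted(search(index, "the cat")))  # [1, 2]
print(sorted(search(index, "bird")))     # [3]
```

Because each term's posting list is independent of the others, an index like this can be split across machines by term or by document, which is what makes distributed scaling possible.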

For search functionality that is truly performant, fetching is just one part of the equation; the other is ranking. A major breakthrough in this area was TF-IDF (term frequency, inverse document frequency), which became the dominant term-weighting scheme for assigning each document a relevance score for a given query. The score combines how often the query words appear in a document (TF) with how rare those words are across the whole corpus (IDF, which decreases as the number of documents containing the word increases).

Combined, the two ensure that the top-ranked documents match the query well, ideally on words that are rare and specific. Matching an article such as “the” counts for less than matching a rarer, more meaningful term such as “cat.”
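The sketch below scores documents with one basic TF-IDF variant, assuming raw term counts for TF and log(N / df) for IDF, where N is the number of documents and df the number containing the term. Real engines add refinements such as length normalization or BM25. Because “the” occurs in all three made-up documents, its IDF is log(3/3) = 0, so only the match on “cat” contributes to the score.

```python
import math
from collections import Counter

def tf_idf_scores(docs: list[str], query: str) -> list[float]:
    """Score each document against the query.

    tf  = raw count of the query term in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear at least once?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0:
                continue  # a term absent from the whole corpus contributes nothing
            score += tf[term] * math.log(n_docs / df[term])
        scores.append(score)
    return scores

# Made-up example documents.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the bird sang in the tree",
]
print(tf_idf_scores(docs, "the cat"))  # first score is log(3), the others 0.0
```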

Limitations

While innovations like inverted indexes and term-weighting schemes are central to making search work and scale, early search engines also had some serious inherent limitations. Since the 1990s, the industry has been working to address these challenges in a variety of ways:

Text processing

Because search engines stored keywords, and the documents containing them, in inverted indexes, there was a need to formalize what each keyword was. This is harder than it sounds, for a variety of reasons: hyphenated words may be segmented differently, languages work differently, uppercase and lowercase words may be treated differently, and so on.

Each of these challenges may seem easy to solve on its own, but every rule creates exceptions and edge cases. For example, we may decide to lowercase all text, but then someone looking for “Bush” may find documents about bushes rather than about George Bush. Coming up with these rules for each language is an extremely tedious process and can still result in a suboptimal search experience.
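As a small illustration of how each normalization rule trades one problem for another, the sketch below turns text into index keywords under two hypothetical switches, lowercase and split_hyphens. Lowercasing merges the surname “Bush” with the plant “bush”; splitting on hyphens breaks “e-mail” into two tokens. Neither setting is right for every query.

```python
import re

def normalize(text: str, lowercase: bool = True, split_hyphens: bool = True) -> list[str]:
    """Turn raw text into index keywords under two illustrative, hypothetical rules."""
    if split_hyphens:
        text = text.replace("-", " ")
    # Runs of word characters; hyphens are kept inside a token when not split above.
    tokens = re.findall(r"[\w-]+", text)
    if lowercase:
        tokens = [t.lower() for t in tokens]
    return tokens

print(normalize("George Bush trimmed the bush"))
# ['george', 'bush', 'trimmed', 'the', 'bush']  (surname and plant collapse)
print(normalize("George Bush trimmed the bush", lowercase=False))
# ['George', 'Bush', 'trimmed', 'the', 'bush']
print(normalize("state-of-the-art e-mail"))
# ['state', 'of', 'the', 'art', 'e', 'mail']
print(normalize("state-of-the-art e-mail", split_hyphens=False))
# ['state-of-the-art', 'e-mail']
```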

Exact matches

If someone looks for the word “cats,” they will not find documents that contain the word “cat.” The usual solution to this challenge is stemming: removing the suffixes of all words in both queries and documents, so that someone looking for “cats” retrieves documents containing “cat” and “cats.”

The limitation of this approach is, once again, the resulting exceptions and edge cases. For example, “universal” and “university” are stemmed to the same token.
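Both the benefit and the edge case can be reproduced with a deliberately tiny suffix-stripping stemmer. The four-rule suffix list below is a hypothetical toy, far simpler than a production algorithm such as the Porter stemmer, although Porter also reduces “universal” and “university” to the same stem, “univers.”

```python
# Toy rule set (hypothetical), checked in order, longest suffix first, so "ities" wins over "s".
SUFFIXES = ("ities", "ity", "al", "s")
MIN_STEM_LEN = 3  # never reduce a word below three characters

def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least MIN_STEM_LEN characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word

print(toy_stem("cats"), toy_stem("cat"))              # cat cat
print(toy_stem("universal"), toy_stem("university"))  # univers univers
```

The first line is the intended effect; the second is the overstemming problem described above, where two unrelated words become indistinguishable in the index.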

Word ambiguity and synonyms

Some words are ambiguous, such as “jaguar.” If the query is just this word, it is impossible to really guess the intent: do they mean the American professional football team, the automobile, the animal, or something else entirely? But if the query is “jaguar zoo” or “jaguar price,” the intent is easy to tell from the context. However, search engines that rely on words alone cannot understand that context by default.

On the other hand, some different words refer to the same concept, for example “e-mail” and “email.” Similarly, a query for “chicken” should ideally retrieve documents containing “hen.” The usual solution is to maintain a list of synonyms that the engine uses to map words to each other.
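A synonym list is typically applied as query expansion: each query word is replaced by the set of words that should count as a match. The sketch below assumes hypothetical synonym groups built from the examples in the text; a real list would be curated per domain and per language, which is exactly the manual effort described here.

```python
# Hypothetical synonym groups taken from the examples above.
SYNONYM_GROUPS = [
    {"email", "e-mail"},
    {"chicken", "hen"},
]

def expand_query(query: str) -> list[set[str]]:
    """For each query word, return the set of words that should count as a match."""
    expanded = []
    for word in query.lower().split():
        alternatives = {word}
        for group in SYNONYM_GROUPS:
            if word in group:
                alternatives |= group
        expanded.append(alternatives)
    return expanded

def matches(doc: str, query: str) -> bool:
    """A document matches if it contains at least one alternative for every query word."""
    doc_words = set(doc.lower().split())
    return all(alternatives & doc_words for alternatives in expand_query(query))

print(matches("the hen laid an egg", "chicken"))  # True
print(matches("send me an e-mail", "email"))      # True
print(matches("the hen laid an egg", "duck"))     # False
```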

Misspelling

Because keyword search engines require accurate spelling, misspelled words will not retrieve the right results unless an autocorrect feature is in place. Autocorrecting queries requires specific development, since the types and impact of misspellings differ depending on whether you are an ecommerce platform, a publisher, or a legal document library.
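One common building block for query autocorrection is edit distance: pick the known word that needs the fewest single-character edits to match the query. The sketch below uses the Levenshtein distance against a small hypothetical vocabulary with an assumed threshold of two edits. The article does not say how any particular engine corrects queries, and a real system would also weigh word frequency and domain-specific vocabulary.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]

def autocorrect(word: str, vocabulary: list[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary word within max_distance, else the word unchanged."""
    best = min(vocabulary, key=lambda v: levenshtein(word, v))
    return best if levenshtein(word, best) <= max_distance else word

vocab = ["sofa", "table", "lamp", "cushion"]  # hypothetical ecommerce vocabulary
print(autocorrect("sfoa", vocab))       # sofa
print(autocorrect("cushoin", vocab))    # cushion
print(autocorrect("xylophone", vocab))  # xylophone (no close match, left as-is)
```

Plain Levenshtein counts a swap of two adjacent letters (“sfoa”) as two edits; the Damerau variant counts it as one, which is one of the domain-dependent tuning choices mentioned above.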

Language

In addition to the challenges presented so far, language support adds another layer of complexity. Each limitation above has to be resolved again for every language the system supports. Assuming you have fine-tuned your engine with proper tokenization, stemming, synonyms, and autocorrect for English, you will need to develop the same for every other supported language, adding tremendous complexity to your project.

New breakthroughs and expanded horizons 

Recent years have been marked by the development of increasingly powerful large language models (LLMs). Their power comes from several sources, such as:

  • Multitasking, e.g., a single pretrained model such as ELMo performing multiple tasks with equal or better quality than independent, task-specific models
  • Transformer models, which scale much better than the recurrent architectures that preceded them
  • Multilingual models, such as M-BERT
  • Massive models trained on massive datasets, such as GPT-4

A very high pace of innovation over the last five years has led us to where we are today: language models that are much more scalable and powerful than they were back in 2017. This breakthrough is significant for search engines, because it naturally removes many of the challenges and limitations mentioned earlier in this article.

LLMs can be used to understand both queries and documents by creating vector representations of them. Such a vector is a semantic representation of the entire content, which removes the points of failure listed earlier that stem from the word-by-word approach of traditional keyword search.

Stemming is also no longer needed, since the vector representation of the word “cat” will be very close to that of “cats.” Even better, the vector representation of “when was Barack Obama born” will be very close to that of “how old is the 44th President of the US.” Vector search engines are naturally much better at assessing semantic similarity, whereas keyword search needs a lot of manual work to reach this level.
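“Close” here usually means high cosine similarity between vectors. The sketch below shows brute-force nearest-neighbor search over hand-picked, hypothetical three-dimensional vectors; real embeddings come from a language model, have hundreds of dimensions, and are typically searched with an approximate nearest-neighbor index rather than by scanning every candidate.

```python
import math

# Hypothetical, hand-picked 3-D "embeddings" for illustration only.
# A real system would get high-dimensional vectors from a language model.
EMBEDDINGS = {
    "cat":  [0.90, 0.10, 0.00],
    "cats": [0.88, 0.12, 0.02],
    "car":  [0.05, 0.20, 0.95],
}

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v: 1.0 means same direction, 0.0 unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(query_vec: list[float], candidates: dict[str, list[float]]) -> str:
    """Brute-force nearest neighbor: the candidate with the highest cosine similarity."""
    return max(candidates, key=lambda name: cosine_similarity(query_vec, candidates[name]))

candidates = {name: EMBEDDINGS[name] for name in ("cat", "car")}
print(nearest(EMBEDDINGS["cats"], candidates))  # cat
```

No stemming rule is involved: “cats” lands next to “cat” only because their (here, invented) vectors point in nearly the same direction.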

Vector search also makes it much easier to build cross-language engines that are resilient to user misspellings. First, these models support more and more languages out of the box. Second, the data these LLMs are trained on typically comes from the Internet, so misspellings are present during training and the models learn how to represent them. Recent models are also designed to handle word variations much better than earlier ones.

Hence, building a search engine that works in all situations and truly understands queries and documents is now much more accessible and low-maintenance. Important challenges remain, but they have shifted from how to be relevant to how to make these models scale. Even though models are more scalable, running them in real time with high throughput remains a complex problem.

Luckily, Algolia has a proven track record in scalability, availability, and robustness, which lets us bring the best of both worlds to our customers. We’ve also designed a proprietary NeuralHashing solution that compresses vectors to a fraction of their size while retaining up to 99% of the information. This allows us to deliver vector-based results as fast as keyword results, and even to combine both in a single API call. With vector search, you spend more time on things that are specific to your use case, and less on fine-tuning relevance, optimizing individual queries, or wondering how to build reliable search into your app.
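NeuralHashing itself is proprietary and not described here. To give a feel for why compressed vectors can be compared so cheaply, the sketch below shows a different, generic technique, sign-based binary quantization: each float dimension is reduced to one bit (positive or not), and vectors are compared by counting differing bits (Hamming distance). With 32-bit floats this is a 32x size reduction. The corpus and query vectors are made up, and nothing here reflects Algolia's method or its 99% figure.

```python
# Generic sign-based binary quantization; NOT Algolia's proprietary NeuralHashing.

def binarize(vec: list[float]) -> int:
    """Pack the sign of each dimension into an int: bit i is 1 if vec[i] > 0."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits: a cheap distance between binary codes."""
    return bin(a ^ b).count("1")

def nearest_binary(query: list[float], corpus: dict[str, list[float]]) -> str:
    """Return the corpus entry whose binary code is closest to the query's code."""
    q = binarize(query)
    codes = {name: binarize(vec) for name, vec in corpus.items()}
    return min(codes, key=lambda name: hamming(q, codes[name]))

# Made-up 8-dimensional vectors: 32 bytes each as float32, 1 byte each once binarized.
corpus = {
    "doc_a": [0.8, -0.1, 0.3, -0.5, 0.2, -0.9, 0.4, 0.1],
    "doc_b": [-0.7, 0.2, -0.4, 0.6, -0.1, 0.8, -0.3, -0.2],
}
query = [0.6, -0.2, 0.1, -0.4, 0.3, -0.7, 0.5, 0.2]
print(nearest_binary(query, corpus))  # doc_a
```

A single XOR and bit count replaces a floating-point dot product, which is the general reason compressed representations can narrow the speed gap between vector and keyword retrieval.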

Vector search definitely solves many of the problems you can experience with standard keyword search. But it is important to remember that keyword search still has many advantages outside the edge cases presented above. Someone looking for an iPhone, for example, will simply type “iphone,” and there is no need to rely on vector search for this straightforward query. A great advantage of keyword search is that, as a mature and standard technology, it’s extremely fast and very cost-effective.

So an ideal system would get the best of both types of search: keyword search when the query is simple and known, and vector search for the long tail of unique and rare queries. Such a system can ensure that every user query retrieves results that are fast, relevant, and accurate.
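The article does not say how a hybrid system merges the two result sets. One widely used, generic way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF), sketched below with hypothetical product IDs. It is shown as an illustration of the general idea, not as the method NeuralSearch uses.

```python
from collections import defaultdict

# Generic reciprocal rank fusion; not a description of Algolia NeuralSearch internals.
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document scores the sum of 1 / (k + rank) over the lists
    it appears in. k = 60 is a commonly used default that dampens the weight of top ranks."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

keyword_hits = ["iphone-15", "iphone-case", "iphone-charger"]  # hypothetical IDs
vector_hits = ["iphone-15", "smartphone-x", "iphone-case"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['iphone-15', 'iphone-case', 'smartphone-x', 'iphone-charger']
```

Documents found by both engines rise to the top, while results that only the vector side finds (here “smartphone-x”) can still surface, which is the long-tail benefit described above.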

Where do we go from here?

At Algolia, we have managed to build out such a solution to combine these functionalities. Learn more about our newly launched Algolia NeuralSearch solution.

Contact our team to learn more about how you can start leveraging Algolia NeuralSearch today.

About the author
Nicolas Fiorini

Director, AI Engineering
