Search has been around for a while, to the point that it is now considered a standard requirement in many applications. Decades ago, we managed to make search engines scale by leveraging inverted indexes, a data structure that allows a very quick lookup of documents containing certain words. We’ve come a long way since then and have moved to vector indexes.
To understand why vector search represents such a significant advancement in search capabilities, we need to look back and see how search functionality has developed over time.
Information retrieval is a pretty old (and still active) research area. Every year, conferences like ECIR or SIGIR attract considerable interest from researchers and engineers around the world to discuss progress made in this field. The Algolia team attended the recently held ECIR conference, and you can check out our key highlights from the event.
The development of search functionality as we know it can probably be traced as far back as the 1950s, as researchers tried to solve the problem of efficient information retrieval across large databases. Progress was made steadily with the increase in computing power and the introduction of new concepts, such as the inverted index to scale better.
This data structure lets us fetch documents matching specific words with great speed and accuracy. It can also be distributed across machines, so the architecture scales horizontally as the corpus grows. In the late 1990s, this was the kind of structure that allowed Google to scale to the size of the Internet at the time.
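To make the idea concrete, here is a minimal sketch of an inverted index in Python. It assumes whitespace tokenization, lowercasing, and AND semantics between query words; the function names and sample documents are ours, chosen for illustration, and a production engine does far more.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the postings of the first word, then intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Illustrative documents (not from any real dataset).
docs = {
    1: "the cat sat on the mat",
    2: "the dog chased the cat",
    3: "a jaguar is a big cat",
}
index = build_inverted_index(docs)
print(search(index, "the cat"))
```

The lookup never scans the documents themselves: it only intersects the precomputed sets of ids, which is why this structure stays fast as the corpus grows.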
For search functionality that is truly performant, fetching is just one part of the equation; the other is ranking. A major breakthrough in this area has been TF-IDF (term frequency, inverse document frequency), a classic and widely used term-weighting scheme that assigns a score to each piece of content, ranking its relevance to each user query. This score combines the number of matches of query words in a document (TF) with how rare those words are across the whole corpus (IDF): the more documents contain a word, the lower its weight.
The combination of the two makes sure that the documents ranked first match the query well, ideally on words that are rare and specific. That is, matching on a very common word such as “the” matters less than matching on a less frequent but more informative term like “cat.”
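As a rough illustration, the sketch below scores documents with one common TF-IDF variant: raw term counts for TF and log(N / df) for IDF. Many other weightings exist, and the function name and sample documents are assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf_scores(docs, query):
    """Score each document against the query with a basic TF-IDF sum."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))

    scores = {}
    for doc_id, words in tokenized.items():
        tf = Counter(words)
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0:
                continue  # the term appears nowhere in the corpus
            idf = math.log(n_docs / df[term])
            score += tf[term] * idf
        scores[doc_id] = score
    return scores


# Illustrative corpus: "the" appears everywhere, "cat" in one document only.
docs = {
    1: "the cat sat on the mat",
    2: "the dog chased the ball",
    3: "the bird sang",
}
scores = tf_idf_scores(docs, "the cat")
print(scores)
```

Because “the” appears in every document, its IDF is log(3/3) = 0 and it contributes nothing to any score. Only the match on “cat” separates document 1 from the others.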
While innovations like inverted indexes and term-weighting schemes are central to making search actually work and scale, these early search engines also had some serious inherent limitations. Since the 1990s, the industry has been working to address these challenges in a variety of ways:
Because search engines used to store keywords and corresponding documents containing those keywords in inverted indexes, there was a need to formalize what each keyword was. This can be a difficult challenge due to a variety of reasons such as: hyphenated words may be segmented differently, languages work differently, uppercase and lowercase words may be treated differently, etc.
In each instance, it may seem easy to solve these challenges but there will always be exceptions for each word that may create edge cases. For example, we may decide to lowercase all text, but then someone looking for “Bush” may find documents that are not about “George Bush.” Coming up with these rules across each language is an extremely tedious process and can result in a suboptimal search experience.
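The sketch below shows how two such rules change the tokens an engine would index. The `tokenize` function and its options are hypothetical, and the ASCII-only pattern is itself one of the language-specific shortcuts described above.

```python
import re


def tokenize(text, lowercase=True, split_hyphens=True):
    """Split text into tokens under two configurable normalization rules."""
    if lowercase:
        text = text.lower()
    if split_hyphens:
        pattern = r"[A-Za-z0-9]+"
    else:
        # Keep hyphen-joined runs such as "e-mail" as a single token.
        pattern = r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*"
    return re.findall(pattern, text)


sentence = "State-of-the-art e-mail from Bush"
print(tokenize(sentence))
print(tokenize(sentence, lowercase=False, split_hyphens=False))
```

With the defaults, “Bush” becomes indistinguishable from “bush” and “e-mail” becomes two tokens. With both rules switched off, a query for “email” or “bush” no longer matches. Neither configuration is right for every query.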
If someone looks for the word “cats,” they will not find documents that contain the word “cat.” The solution in general for this challenge is stemming, where we remove the suffix of all words both in queries and documents, so that someone looking for “cats” will retrieve documents containing “cat” and “cats.”
The limitation of this approach is, once again, the resulting exceptions and edge cases. For example, “universal” and “university” are stemmed to the same token.
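A deliberately naive suffix stripper is enough to reproduce both behaviors. This is a toy written for illustration, not the Porter stemmer or any other production algorithm, and the suffix list is an assumption.

```python
# Toy suffix list, checked in order; real stemmers use far richer rules.
SUFFIXES = ("ity", "al", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ("cat", "cats", "universal", "university"):
    print(w, "->", naive_stem(w))
```

“cat” and “cats” both reduce to “cat”, which is the intended effect, but “universal” and “university” also collapse to the same token, “univers”, even though they mean different things.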
Some words are ambiguous, such as “jaguar.” If the query is just this word, it is impossible to really guess the intent: do they mean the American professional football team, the automobile, the animal, or something entirely different? But if the query is “jaguar zoo” or “jaguar price,” the intents should be easy to tell apart based on the context. However, search engines that rely on words alone cannot understand the context by default.
On the other hand, some words are different but refer to the same concept, for example “e-mail” and “email.” Or, it would be relevant for a query “chicken” to retrieve documents containing “hen.” The solution in general is to come up with a list of synonyms so the engine maps words together.
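In its simplest form, a synonym list is a lookup table applied to both the query and the documents. The mapping and function names below are illustrative assumptions; note that every entry has to be written and maintained by hand.

```python
# Hand-maintained mapping from a variant to its canonical form (illustrative).
SYNONYMS = {
    "e-mail": "email",
    "hen": "chicken",
}


def normalize(word):
    """Lowercase a word and replace it with its canonical synonym, if any."""
    word = word.lower()
    return SYNONYMS.get(word, word)


def matches(query, document):
    """True if every query word, after normalization, appears in the document."""
    doc_terms = {normalize(w) for w in document.split()}
    return all(normalize(w) in doc_terms for w in query.split())


print(matches("chicken", "a hen laid an egg"))
print(matches("email", "send me an E-mail"))
print(matches("chicken", "a jaguar at the zoo"))
```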
Because spelling needs to be accurate in keyword search engines, misspelled words will not allow the user to retrieve the right results unless an autocorrect feature is in place. Autocorrecting queries needs some specific development, since the impact and types of misspellings will differ depending on whether you are an ecommerce platform, a publisher, or a legal document library.
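One common building block for autocorrect is edit distance. The sketch below uses plain Levenshtein distance and picks the closest word from a known vocabulary. The vocabulary and the `max_distance` threshold are assumptions, and they are the kind of setting that would need tuning per use case.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def correct(word, vocabulary, max_distance=2):
    """Return the closest vocabulary word, or None if nothing is close enough."""
    best = min(vocabulary, key=lambda v: edit_distance(word, v))
    return best if edit_distance(word, best) <= max_distance else None


vocabulary = ["iphone", "ipad", "laptop"]
print(correct("iphnoe", vocabulary))
print(correct("xyz", vocabulary))
```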
In addition to the challenges presented so far, language support adds another layer of complexity. Each of the limitations above needs additional work in every language the system supports. Assuming you have fine-tuned your engine with proper tokenization, stemming, synonyms, and autocorrect for English, you will need to develop the same for every other supported language, adding tremendous complexity to your project.
Recent years have been marked by the development of more and more powerful large language models (LLMs). Their power comes from different sources such as:
The very fast pace of innovation over the last 5 years has led us to where we are today: language models that are much more scalable and powerful than they were back in 2017. This breakthrough is significant for the field of search engines, because it naturally removes a lot of the challenges and limitations mentioned earlier in this article.
LLMs can be used to understand both queries and documents by creating their vector representation. This is a semantic representation of the entire content, which removes the points of failure listed earlier that occur because of the word-by-word approach of traditional keyword search.
Stemming is also not needed anymore, since the vector representation of the word “cat” will be very close to that of “cats.” Even better, the vector representation of “when was Barack Obama born” will be very close to that of “how old is the 44th President of the US.” Vector search engines are naturally much better at assessing semantic similarity, while keyword search needs a lot of manual work to reach this level.
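“Close” is usually measured with a similarity function such as cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration. Real embeddings come from a trained model and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings, hand-written for this example (not model output).
EMBEDDINGS = {
    "cat": [0.90, 0.10, 0.05],
    "cats": [0.88, 0.12, 0.07],
    "invoice": [0.05, 0.20, 0.95],
}

print(cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["cats"]))
print(cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["invoice"]))
```

With these toy values, “cat” and “cats” score close to 1, while “cat” and “invoice” score much lower. No stemming rule is involved; the proximity comes entirely from the vectors.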
Cross-language engines that are resilient to user misspellings are also much more accessible with vector search. First, these models support more and more languages out of the box. Second, the data on which these LLMs are trained typically comes from the Internet, so misspellings are present in the training data and models learn representations for them. Furthermore, recent models are designed to handle word variations much better.
Hence today, building a search engine that works in all situations and that really understands the query and documents is much more accessible and low maintenance. There are still important challenges but they have shifted from how to be relevant to how to make these models scale. Even if models are more scalable, running them in real-time and making predictions with high throughput remains a complex problem.
Luckily, Algolia has a proven track record in scalability, availability, and robustness to bring the best of both worlds to our customers. We’ve also designed a proprietary NeuralHashing solution to compress vectors to a fraction of their size while retaining up to 99% of the information. This allows us to deliver vector-based results as fast as keyword results, and even to combine them in a single API call. With vector search, you will spend time on things that are specific to your use case, and less on fine-tuning relevance, optimizing some queries, or wondering how to build a reliable search on your app.
Vector search solves a lot of the problems you can experience with more standard keyword search. But it is important to remember that keyword search still has many advantages outside of the edge cases presented above. Someone looking for an iPhone, for example, will simply type “iphone,” and there is no need to rely on vector search for this straightforward query. A great advantage of keyword search is that, because it’s a pretty standard technology, it’s extremely fast and very cost effective.
So, an ideal system would benefit from the best of both types of search functionality, i.e., using keyword search when the query is simple and known, and vector search for the long tail of unique and rare queries. Such a system can make sure that every query generated by users will always retrieve results that are fast, relevant, and accurate.
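One very simple way to picture such a system is a router that tries keyword search first and falls back to vector search when too few results come back. The sketch below is only an illustration of that idea under our own assumptions, not a description of how Algolia combines the two. The catalog, the two toy backends, and the `min_hits` threshold are all hypothetical.

```python
def hybrid_search(query, keyword_search, vector_search, min_hits=1):
    """Use keyword results when there are enough of them, else fall back."""
    hits = keyword_search(query)
    if len(hits) >= min_hits:
        return "keyword", hits
    return "vector", vector_search(query)


# Hypothetical catalog and backends, for illustration only.
CATALOG = ["iphone 15 case", "usb-c charger", "wireless earbuds"]


def toy_keyword_search(query):
    """Return items that contain every query word as a whole token."""
    words = query.lower().split()
    return [item for item in CATALOG if all(w in item.split() for w in words)]


def toy_vector_search(query):
    """Stand-in for a real vector backend; returns a fixed result."""
    return ["wireless earbuds"]


print(hybrid_search("iphone", toy_keyword_search, toy_vector_search))
print(hybrid_search("something to listen to music on the go",
                    toy_keyword_search, toy_vector_search))
```

The short, known query is answered by the cheap keyword path, while the long descriptive query finds no keyword match and is routed to the vector path.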
At Algolia, we have managed to build out such a solution to combine these functionalities. Learn more about our newly launched Algolia NeuralSearch solution.
Contact our team to learn more about how you can start leveraging Algolia NeuralSearch today.
Director, AI Engineering