Broadly speaking, a search index is like the index at the end of a book, where a small, non-exhaustive set of words and subjects is listed with page numbers. More precisely, it’s the mapping of a query to the content in a corpus (a large set of online books and documents, or a product or film catalog). In computer jargon, it’s an inverted list (index) of words that a search engine uses to find every word in every document within a corpus.

But is the metaphor of the book index actually correct? As in all matters related to technology, it’s hard to find a good balance between providing an overview of a subject and diving in deep – without losing meaning or your audience.

In the past, we’ve answered the question What is a search index? in different ways: 

  • At the highest level, we described what an index does, focusing especially on how it has transformed our way of shopping online 
  • At a functional level, we illustrated how a fast indexing process can be leveraged to drive a number of different and surprising scenarios 
  • At the technical level, we dove inside our search engine index to describe the actual physical structure of an index, and how this structure enables our search engine to store millions of records and still perform a search in milliseconds

This article covers a middle ground between the functional and the technical, defining the capabilities of the powerful search indexes we often see in Google, Amazon, and Netflix, and providing an introduction to how these indexes can perform at such fast speeds. 

The book metaphor: a search index is like the index at the back of a book

A book index for a biography looks like this: 

  • Marriage: pages 30, 50-67, 246
  • Early life: pages 2-15, 77-89
  • Father: pages 10, 83-85, 150-178

A search index can be represented in a very similar manner:

  • Marriage -> documents 1, 5, 400, 900
  • Early life  -> documents 33, 45, 98, 235
  • Father  -> documents 101, 345, 785

The book metaphor is useful because it underscores the general idea that an index is a separate object from the underlying content, and that it is used to find specific parts of that content easily and quickly (pages in a book, documents in a collection of documents).

To use another metaphor, an index helps us navigate a book like a compass to a map, where the compass replaces the need to scan the map. In the same way, an index at the end of a book is far more efficient than scanning the whole book for one phrase: it obviously saves you time and is more reliable. In the example above, the index directs you reliably to the exact sections in a biography that discuss the “early life” of the subject. 

But is a book index the same as a search index?

A metaphor only goes so far. The book metaphor doesn’t fully capture the capabilities, purposes, and mechanisms, nor our expectations, of a search engine index.

For example:

  • A search index normally indexes every word and part of a book or set of content. A book’s index would be too long if it contained every word and theme in the book. However, search technology is powerful enough to create and search through an exhaustive index of words and themes.
  • We expect that a search engine will show us every piece of content that mentions the words we type in when entering a search query.
  • A search index usually searches more than one book. This is one of its most powerful aspects, to combine and search through large amounts of content in a split second. Examples of content: products on a company’s website (site search), a library’s full online catalog (every word in every book), or everything crawlable on the web (Google).
  • A search engine is tolerant: it allows typos and spelling errors, ignores superfluous words like “the” and “is”, and understands multiple languages with complex grammars.
  • A search index is dynamic, updating its data with every change in the underlying content. And it does this as fast as possible. Some changes are expected to be in real-time (Twitter’s real time indexing, Uber, airline reservations), others in regular intervals, such as every 5 minutes, or a few times over the course of a day or week. 
  • An index provides relevance – a central topic in search technology. For our purposes here, relevance helps you discover content, connections, and other terms that you might not have thought about, and puts them in an order that makes sense to your current research.
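To make the "tolerant" point above concrete, here is a minimal Python sketch. It assumes a tiny hand-picked stop-word list and uses the standard library's `difflib` as a stand-in for typo tolerance; production engines use their own, far more sophisticated techniques, and the vocabulary and function names here are purely illustrative.

```python
import difflib

# Illustrative subset of stop words; a real engine uses per-language lists.
STOP_WORDS = {"the", "is", "a", "an", "of"}

def normalize(query: str) -> list[str]:
    """Lowercase the query and drop superfluous (stop) words."""
    return [w for w in query.lower().split() if w not in STOP_WORDS]

def tolerant_lookup(word: str, vocabulary: list[str]) -> list[str]:
    """Return indexed words that closely resemble `word`, allowing small typos."""
    return difflib.get_close_matches(word, vocabulary, n=3, cutoff=0.8)

# Hypothetical vocabulary taken from the biography example.
vocabulary = ["marriage", "father", "early", "life"]

terms = normalize("The mariage of the father")
matches = {term: tolerant_lookup(term, vocabulary) for term in terms}
print(matches)
```

The misspelled "mariage" still resolves to "marriage", and "the" and "of" never reach the index at all.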

Let’s just say that the metaphor of a book index gets you in the door to understanding what an index does, but details like the above (and there are many more) help you understand the full potential of what a search engine index can accomplish and how it has transformed our lives.

So, what is a search index?

A search index can be used in two different contexts:

  • Searching for content, like the text in books, blogs, newspaper articles, system logs, and any other “document” that contains a lot of text. The expectation here is that every word counts, because the searcher is looking for the exact words and phrases used in the content. 
  • Searching for objects, like products on Amazon, films on Netflix, services in a hospital – any object that relies on attributes to define it. The expectation here is that the object is titled and described accurately enough so that a searcher using a sensible set of keywords will find the object.

Now, you can also search for books by tagging them with subjects, themes, authors, etc., but if the underlying goal of the search is to find content, the expectation is that every word and sentence in the book is searchable.

Object-based search and attributes

A successful object-based search (as we’ve defined it here) relies on a set of attributes that describe objects sufficiently so that a searcher can find what they are looking for using a reasonably small set of well-chosen keywords. A keyword can be one or more words, or even the first few characters of the first word. For example, while looking for the film Star Wars, a user might only need to type in “star”; but if the search engine bases its search algorithm on popularity (that is, it favors popular films in the first results), then “st” should be enough to find the blockbuster Star Wars.
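The “st” example can be sketched in a few lines of Python. This is only an illustration: the film list and the `popularity` scores are invented, and ranking purely by one popularity number is a simplifying assumption, not how any particular engine works.

```python
# Hypothetical catalog; the popularity scores are made up for illustration.
films = [
    {"title": "Star Wars", "popularity": 98},
    {"title": "Stand by Me", "popularity": 80},
    {"title": "Stargate", "popularity": 70},
    {"title": "Casablanca", "popularity": 90},
]

def prefix_search(prefix: str, records: list[dict]) -> list[str]:
    """Return titles with a word starting with `prefix`, most popular first."""
    p = prefix.lower()
    hits = [
        r for r in records
        if any(word.startswith(p) for word in r["title"].lower().split())
    ]
    hits.sort(key=lambda r: r["popularity"], reverse=True)
    return [r["title"] for r in hits]

results = prefix_search("st", films)
print(results)
```

Three titles match “st”, but because the results are ordered by popularity, Star Wars comes first after only two keystrokes.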

The number of attributes

If you want to find a movie, you most likely need only a few attributes, such as title, description, cast, crew, year, and a few others. If you want to perform more general research, you’ll add attributes like themes, dialogues, cross-references, and additional background information. However, the list of attributes can get quite large. For example, cars have thousands of attributes: materials used; the name, type, and year of each part; owner history; factories and repair history; speed; and so on.

What all objects have in common is the notion of keywords. Keywords are the words the owners of the content use as they build an object’s attributes, such as title, brand, author, year, and price. Or from another point of view: keywords are the “words” that a search engine uses to match the words in an index with the query of the searcher. 

Creating an index

As we’ve outlined above, a search engine identifies documents (books, web pages, products) that match a user’s query (keywords). To do this, it cannot scan every document. So it uses an index, either an exhaustive index of every word, or an attribute-based index with a subset of the most important descriptions.

An index is created before a user searches. It is a pre-scan of the underlying content. It’s also in a separate part of the server. For example, in a content-based scenario, the search engine pre-scans every document and saves all the unique words in an index. Many search engines structure their index in an “inverted index”, as we describe in the last (fun) section.  
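As a rough illustration of that pre-scan, the following Python sketch builds a small word-to-documents index ahead of any query. The three documents and the simple tokenizer (lowercase letters and digits only) are assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical documents, keyed by document ID.
documents = {
    1: "The aardvark is a nocturnal animal.",
    2: "An amoeba is not an animal.",
    3: "Aardvark and amoeba both start with a.",
}

def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Pre-scan every document once and map each unique word to its document IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    # Sort the IDs so each entry reads like a line in a book index.
    return {word: sorted(ids) for word, ids in index.items()}

index = build_inverted_index(documents)
print(index["aardvark"])
print(index["amoeba"])
```

The scan happens once, when the content is indexed. At query time, the engine only consults `index` and never rereads the documents.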

Different kinds of indexes – ordering results with accuracy and relevance

Search indexes come with an order. For online searches like Google and Amazon, search results are usually ordered by the “best” matches, not the most “accurate” ones.

In those contexts, it’s not only about accurate results. If a user types in “brad” and Brad Pitt comes up, that doesn’t mean it’s accurate. Other results will include Brad Davis or the Brady theater. They are all relevant in different ways, but none of them can be considered “accurate”. One user who types in “brad” might choose to go to Brad Pitt’s Wikipedia page, another might go to Brad Pitt’s IMDB page. Accuracy doesn’t really capture the meaning of these choices. 

It’s all about how right the result feels to a given user, or how well the result matches the intent of the searcher. To return to the compass metaphor, a compass helps us navigate by combining accuracy and relevance. It gives us accuracy in terms of north and south, but it also gives us relevance by pointing in the general direction of our destination and helping us match our intentions with our knowledge of the physical world. On the other hand, we expect a GPS system to be accurate, not relevant.

Consider the bank employee who looks up your records and finds out that you owe the bank some money. The bank employee’s search results had better be completely accurate. Likewise, when store employees or customers look for precise products, they are not interested in relevance: they rely on accurate, exact product identifiers.

This is not to say that searching by relevance does not contain an aspect of “accuracy”. For example, if someone types in “ball point pen”, the accuracy is to find all products that have an attribute with the words “ball point pen”. However, accuracy gives way to relevance: the relevance is which ball point pen to show first. 
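The two stages in the “ball point pen” example can be sketched as a filter followed by a sort. In this Python illustration, the products and the `sales` figure used as a relevance signal are hypothetical; real ranking combines many more signals.

```python
import re

# Hypothetical catalog; "sales" stands in for a relevance signal.
products = [
    {"name": "Blue ball point pen", "sales": 120},
    {"name": "Fountain pen", "sales": 300},
    {"name": "Ball point pen, 10-pack", "sales": 450},
    {"name": "Ball point pen refill", "sales": 60},
]

def tokenize(text: str) -> set[str]:
    """Split text into lowercase words, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search(query: str, catalog: list[dict]) -> list[str]:
    terms = tokenize(query)
    # Accuracy: keep only products whose name contains every query word.
    matches = [p for p in catalog if terms <= tokenize(p["name"])]
    # Relevance: decide which of the accurate matches to show first.
    matches.sort(key=lambda p: p["sales"], reverse=True)
    return [p["name"] for p in matches]

results = search("ball point pen", products)
print(results)
```

The fountain pen sells well but is filtered out because it is not an accurate match. Among the three accurate matches, the sort decides the order.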

A more technical way to explain the difference is to consider the difference between a database and a search index. Database-like indexing (the bank example) is centered around accuracy, ensuring that exact matches are properly sorted and exhaustive. A search-index-based search like Google’s is more flexible: the matching is a mix of textual accuracy and relevance (the reason content is optimized through what we call SEO, or search engine optimization).

Similar to Google, the site search we see on Amazon and Netflix, and on websites where search is provided by Algolia, relies on a combination of structured sets of attributes and a ranking system that bases relevance on popularity, trends, likes, and a business’s product-promotional needs.

The search index on the search engine’s server(s) 

Okay, so let’s open up the hood. A search engine index is saved in a structure that enables fast retrieval. We call this structure an inverted index. One thing to note: an index is saved separately on the server, in a different location than the data.

While there are many types of inverted indexes, with many nuances, the following diagram sums up the idea:

[Diagram: inverted index]

As you can see, with an inverted index, the search engine inverts the logic: instead of reading (scanning) a document looking for words, it uses the words to find the documents. Here’s an example of an inverted index:

  • Aardvark -> documents 1, 5, 400, 900
  • Amoeba  -> documents 101, 345, 785
  • Animals -> documents 33, 45, 98, 235

… and so on. Let’s say there are tens of thousands of unique words in 999 documents.
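At query time, using such an index comes down to a lookup rather than a scan. The Python sketch below reuses the three entries above and adds one hypothetical entry (“nocturnal”) so that a two-word query has something to intersect. Requiring every word to match is an assumption of this example, not a rule of search engines.

```python
# Entries from the example above, plus one hypothetical entry for illustration.
index = {
    "aardvark": [1, 5, 400, 900],
    "amoeba": [101, 345, 785],
    "animals": [33, 45, 98, 235],
    "nocturnal": [5, 98, 900],  # hypothetical, not part of the example above
}

def lookup(word: str) -> list[int]:
    """Find the documents for one word without scanning any document."""
    return index.get(word.lower(), [])

def search_all(query: str) -> list[int]:
    """Return the documents that contain every word in the query."""
    postings = [set(lookup(word)) for word in query.split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

print(lookup("Aardvark"))
print(search_all("aardvark nocturnal"))
```

A single word is one dictionary access. Several words are a set intersection over short lists of document IDs, which is far cheaper than reading 999 documents.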

In the above diagram, the search engine follows this process to find “aardvark” in an inverted index:

  • The index has a row with the 26 letters of the alphabet and the 10 digits. Underneath this row is an upside down tree of all the words. The first row, first column has every word that begins with “a”, the next column begins with “b”, and so on. 
  • When you type in “a”, you get every word that starts with “a”, and subsequently remove every other column.
  • This probably removes 90% of the words, so the engine has a lot less to surface.
  • You type in “aa”, which removes nearly all of the remaining words. You’re probably left with less than 1% of the original list.
  • You type in “aaa”, which removes all of the remaining words in the list: there is no document with a word that begins with “aaa”.
  • So you delete the last “a” from your query and add “r” to get “aar”, and only one word appears: “aardvark”, which leads you to 4 documents that include information about aardvarks.

That’s how every word in a set of documents is stored in an index. It gets more complicated for non-prefix, middle-of-the-word queries, but you get the idea.
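The letter-by-letter narrowing described above resembles a prefix tree (trie). The Python sketch below is one simple way to model it, using the three example words. It is an illustration under that assumption, not a description of how any particular engine stores its index.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # next letter -> child node
        self.doc_ids = None   # set only on nodes that end a complete word

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str, doc_ids: list[int]) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.doc_ids = doc_ids

    def words_with_prefix(self, prefix: str) -> dict[str, list[int]]:
        """Walk down one letter at a time, then collect every word below."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return {}  # no indexed word starts with this prefix
        found = {}
        stack = [(node, prefix)]
        while stack:
            current, word = stack.pop()
            if current.doc_ids is not None:
                found[word] = current.doc_ids
            for ch, child in current.children.items():
                stack.append((child, word + ch))
        return found

trie = Trie()
trie.insert("aardvark", [1, 5, 400, 900])
trie.insert("amoeba", [101, 345, 785])
trie.insert("animals", [33, 45, 98, 235])

for prefix in ["a", "aa", "aaa", "aar"]:
    print(prefix, sorted(trie.words_with_prefix(prefix)))
```

Each extra letter discards whole branches of the tree: “a” keeps all three words, “aa” leaves only “aardvark”, “aaa” matches nothing, and “aar” leads back to the four aardvark documents.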

And that’s all … Well, there are a lot more details. If you’re interested in more, check out the Algolia CTO’s article on the inside story of indexing.

About the author
Peter Villani

Sr. Tech & Business Writer
