Artificial intelligence has been built on the back of vector arithmetic. Recent advances show that, for certain AI applications, vectors can be drastically outperformed in memory use and speed by binary representations such as neural hashes, without a significant accuracy trade-off.

Once you work with things like neural hashes, it becomes apparent that many areas of AI can move away from vectors to hash-based structures and trigger an enormous speed-up in AI advancement. This article is a brief introduction to the thinking behind this and why it may well end up being an enormous shift.

Hashes

A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes.

You can read more about hashes here. The example from Wikipedia is illustrated below.

[Figure: hash function diagram]

Hashes are great for trading off accuracy, data storage size, performance, retrieval speed, and more.

Importantly, they are probabilistic in nature, so multiple input items can potentially share the same hash. This is interesting because, at the core, the trade-off is giving up slow exactness in exchange for an extremely fast answer that is right with high probability. The analogy here would be the choice between a 1-second flight that drops you somewhere random in the suburb of your choosing, in any city in the world, and a 10-hour trip that puts you at the exact house you wanted in the city of your choice. The former is almost always better, as navigating within a suburb with 10 hours to spare is a piece of cake.
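To make the two properties above concrete (fixed-size output, and several inputs sharing one hash), here is a minimal sketch using Python's standard `hashlib`. The 8-bit `tiny_hash` helper is a hypothetical toy made up for this illustration, and SHA-256 is an ordinary cryptographic hash, not the locality-sensitive kind discussed later.

```python
import hashlib

def digest(data: bytes) -> bytes:
    """Map input of any size to a fixed 32-byte SHA-256 digest."""
    return hashlib.sha256(data).digest()

def tiny_hash(data: bytes) -> int:
    """Toy 8-bit hash (illustration only): keep the first digest byte."""
    return digest(data)[0]

# Inputs of very different sizes produce digests of the same size.
small = digest(b"a")
large = digest(b"a" * 1_000_000)
print(len(small), len(large))

# 300 distinct inputs but only 256 possible 8-bit values,
# so some inputs are guaranteed to share a hash.
inputs = [f"item-{i}".encode() for i in range(300)]
buckets = {tiny_hash(x) for x in inputs}
print(len(inputs), "inputs fall into", len(buckets), "distinct hash values")
```

The smaller the hash space, the more often inputs collide, which is the accuracy-versus-size trade-off in miniature.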

When thinking about vectors, floats are the data representation of choice. Although they are more absolute in nature than hashes, they are still not exact either. More on floats below…

Floats

To understand AI you need to understand how computers represent non-integer numbers. If you have not read up on this, you can do so here.

The problem with floating point numbers is that they take up a decent amount of space, are pretty complex to do calculations with, and are still an approximation. Watching Rob Pike talk about a bignum calculator was probably the first time I thought about it much. And it’s bothered me a lot since. Thanks Rob 😁.

The binary representation can also be wildly different for tiny numerical changes (with respect to vector calculations) that have virtually zero impact on model predictions. For example:

Take 0.65 versus 0.66, which in float64 (64-bit floating point) are represented by these two bit patterns respectively:

0011111111100100110011001100110011001100110011001100110011001101

0011111111100101000111101011100001010001111010111000010100011111

It’s not easy to see, but with a numerical change of only about 1.5%, 25 of the 64 bits (roughly 39%) are different! From a vector perspective in a matrix calculation these two numbers are very, very similar, but in the underlying binary (where all the heavy lifting happens) they are worlds apart.
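The bit count above can be checked with a few lines of Python. This is a small sketch using only the standard `struct` module; the helper names `float64_bits` and `hamming` are chosen here for illustration.

```python
import struct

def float64_bits(x: float) -> int:
    """Return the IEEE 754 float64 bit pattern of x as a 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def hamming(a: int, b: int) -> int:
    """Count the bit positions where a and b differ (XOR, then popcount)."""
    return bin(a ^ b).count("1")

a = float64_bits(0.65)
b = float64_bits(0.66)

print(f"{a:064b}")    # bit pattern of 0.65, padded to 64 bits
print(f"{b:064b}")    # bit pattern of 0.66, padded to 64 bits
print(hamming(a, b))  # 25
```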

Our brains definitely don’t work like this, so they obviously don’t use floating point binary representations to store numbers. At least it sounds like an odd thing for neurons to do, except that there are people that can remember over 60,000 decimal places of Pi, so maybe I simply have no idea. But seriously, our brains are visual and visually our brain’s neural networks are great at handling fractional numbers representing intensities and such. But when you think of a half or a quarter, I’ll bet you immediately visualized something like a glass half or quarter full, or a pizza or something else. You likely weren’t thinking of a mantissa and exponent.

One idea commonly used to speed up float calculations and use less space is reducing the precision to float16 (16 bit) or even float8 (8 bit), which are much faster to compute. The downside here is the obvious loss of resolution.
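As a rough illustration of that loss of resolution, the sketch below round-trips a value through half precision using the `"e"` format of Python's `struct` module. The helper name `to_float16` is made up for this example, and float8 is left out because `struct` has no format for it.

```python
import struct

def to_float16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

for value in (0.5, 0.65, 0.66):
    rounded = to_float16(value)
    print(f"{value} -> {rounded!r} (error {abs(value - rounded):.2e})")
```

Values such as 0.5 survive unchanged, while 0.65 and 0.66 land on the nearest representable half-precision number.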

So you’re saying float arithmetic is slow/bad?

Not quite. Actually, it turns out this is a problem people have spent their careers on. Chips and their instruction sets have been designed to make float arithmetic more efficient and to process more calculations in parallel, so that they finish faster. GPUs and TPUs are now also used because they handle float-based vector arithmetic at massive scale even faster.

You can brute force more speed, but do you need to? You can also give up resolution, but again, do you need to? Floats aren’t absolute anyway. The point here is less that floats are slow and more about how to go much faster.

Neural hashes

So it turns out binary comparisons like XOR on bit sets can be computed much, much faster than float-based arithmetic. So what if you could represent the 0.65 and 0.66 in a binary hash space that was locality sensitive? Could that make models much faster in terms of inference?

Note: looking at a single number is a contrived example, but for vectors containing many floats, the hash can actually also compress the relationship between all the dimensions which is where the magic really happens.

It turns out there is a family of hash algorithms that does just this, called locality sensitive hashing (LSH). The closer the original items are, the more bits their hashes have in common.

[Figure: locality sensitive hashing (LSH)]

This concept is nothing new though, except that newer techniques have found added advantages. Historically, LSH used techniques like random projections and quantisation, but they had the disadvantage of requiring a large hash space to retain precision, so the benefits were somewhat negated.
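For a sense of what the classic random-projection approach looks like, here is a minimal sketch. It is an illustration of the general technique, not the method used in any particular product, and the function names, the 16-bit hash size, and the example vectors are all assumptions made up for it. Each bit records which side of a random hyperplane the vector falls on.

```python
import random

def make_hyperplanes(n_bits: int, dim: int, seed: int = 0) -> list:
    """Draw n_bits random hyperplanes (Gaussian normal vectors) of size dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_hash(vector: list, hyperplanes: list) -> int:
    """One bit per hyperplane: 1 if the vector lies on its positive side."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

planes = make_hyperplanes(n_bits=16, dim=4)
base = [0.9, 0.1, -0.3, 0.5]
near = [0.88, 0.12, -0.31, 0.52]   # small perturbation of base
opposite = [-x for x in base]      # points in the opposite direction

print(hamming(lsh_hash(base, planes), lsh_hash(near, planes)))
print(hamming(lsh_hash(base, planes), lsh_hash(opposite, planes)))
```

Vectors pointing in similar directions tend to agree on most bits, while opposite vectors disagree on essentially all of them. Keeping precision this way needs many hyperplanes, which is the large hash space problem mentioned above.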

It’s trivial for a single float, but what about vectors with high dimensionality (many floats)?

So, the new trick with neural hashes (sometimes called learn-to-hash) is to replace existing LSH techniques with hashes created by neural networks. The resulting hashes can be compared using the very fast Hamming distance calculation to estimate their similarity.

This initially sounds complicated, but in reality it isn’t too difficult. The neural network optimizes a hash function that:

  • retains almost perfect information compared to the original vector
  • produces hashes much smaller than the original vector size
  • is significantly faster for computations

This means that you get the best of both worlds: a smaller binary representation that can be used for very fast logical calculations, with virtually unchanged information resolution.
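The Hamming comparison mentioned above can be sketched in a few lines. The 16-bit hashes below are hand-written placeholders, not the output of a real neural hashing model, and `hamming_similarity` is one simple assumed way of turning the distance into a score between 0 and 1.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length binary hashes."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    xor = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(xor).count("1")

def hamming_similarity(a: bytes, b: bytes) -> float:
    """Fraction of matching bits: 1.0 means identical, 0.0 means all differ."""
    return 1.0 - hamming_distance(a, b) / (8 * len(a))

h1 = bytes([0b10110010, 0b01101100])
h2 = bytes([0b10110011, 0b01101000])  # differs from h1 in 2 bits
h3 = bytes([0b01001101, 0b10010011])  # bitwise complement of h1

print(hamming_distance(h1, h2), hamming_similarity(h1, h2))  # 2 0.875
print(hamming_distance(h1, h3), hamming_similarity(h1, h3))  # 16 0.0
```

The whole comparison is one XOR and one bit count, with no floating point multiplication involved.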

Use cases

The original use case we were investigating was approximate nearest neighbors (ANN) for dense information retrieval. This process allows us to search information using vector representations, so we can find things that are conceptually similar. This is why locality sensitivity in the hash is so important. We’ve taken this much further now and use hashes much more broadly for fast and approximate comparisons of complex data.
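As a toy illustration of retrieval over hashes, the sketch below ranks a handful of documents by Hamming distance to a query hash. The document IDs and 8-bit codes are invented for the example, and a plain linear scan stands in for whatever index structure a real ANN system would use.

```python
import heapq
from typing import Dict, List, Tuple

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes."""
    return bin(a ^ b).count("1")

def nearest(query: int, index: Dict[str, int], k: int = 2) -> List[Tuple[str, int]]:
    """Return the k entries whose hash codes are closest to the query."""
    scored = ((hamming(query, code), doc_id) for doc_id, code in index.items())
    return [(doc_id, dist) for dist, doc_id in heapq.nsmallest(k, scored)]

# Hypothetical 8-bit hash codes; real hashes would be far longer.
index = {
    "doc-a": 0b10110010,
    "doc-b": 0b10110110,
    "doc-c": 0b01001101,
    "doc-d": 0b11111010,
}
query = 0b10110011

results = nearest(query, index, k=2)
print(results)  # [('doc-a', 1), ('doc-b', 2)]
```

Because the hash is locality sensitive, the entries with the fewest differing bits are the ones most likely to be conceptually close to the query.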

Dense information retrieval

How many databases can you think of? Likely a lot. How about search indexes? Likely very few, and most of those are based on the same old technology anyway. This is largely because historically language was a rules-based problem – tokens, synonyms, stemming, lemmatization, and more have occupied very smart people for their entire careers, and these problems are still not solved.

Larry Page (Google co-founder) has been quoted as saying search won’t be a solved problem in our lifetime. Think about that for a second: the biggest minds of a generation, literally billions of dollars of investment, and it’s unlikely to be solved?

Search technology has lagged databases mainly due to language problems, yet we’ve seen a revolution in language processing over the last few years and it’s still speeding up! From a technology perspective, we see neural-based hashes dropping the barrier for new search and database technology (us at Algolia included!).

If this piques your interest, consider submitting a resume — we’re hiring! If you’re working on hash-based neural networks and indexes, I’d love to hear your thoughts on what’s coming next! You can find me on Twitter @hamishogilvy.

About the author
Hamish Ogilvy

VP, Artificial Intelligence
