
What do OpenAI and DeepMind have in common?

Give up? These innovative organizations both utilize technology known as transformer models.

What are transformer models?  

The transformer (represented by the T in ChatGPT, GPT-2, GPT-3, GPT-3.5, etc.) is the key element that makes generative AI so, well, transformational.

Transformer models are a type of neural network architecture designed to process sequential data, such as sentences or time series.

The concept of a transformer, an attention-layer-based, sequence-to-sequence (“Seq2Seq”) encoder-decoder architecture, was introduced in the 2017 paper “Attention Is All You Need” by Ashish Vaswani and his co-authors, pioneers of modern deep learning. Since then, in the realms of AI and machine learning, transformer models have emerged as a groundbreaking approach to a wide range of language-related tasks.

Compared with traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs), transformers stand out for their ability to capture long-range dependencies and contextual information.

The transformer “requires less training time than previous recurrent neural architectures, such as long short-term memory (LSTM), and its later variation has been prevalently adopted for training large language models on large (language) datasets,” notes Wikipedia.

From machine translation to natural language processing (NLP) to computer vision, plus audio and multimodal processing, transformers have revolutionized these fields with their ability to capture long-range dependencies and efficiently process sequential data. They’re widely used in neural machine translation (NMT), they perform or improve AI and NLP business tasks, and they help streamline enterprise workflows. Transformer technology has also given rise to generative pretrained transformers (GPTs) and Bidirectional Encoder Representations from Transformers (BERT).

Multi-head attention

A transformer measures relationships between pairs of input tokens (for example, if the content is text, the tokens are words); this measuring is known as attention. Attention heads are a key feature of transformers. A transformer uses parallel multi-head attention, meaning the attention module repeats its computations in parallel, giving the model more capacity to encode nuances of word meaning. The outputs of the individual heads are then combined to produce the final attention result.

In addition to multi-head attention mechanisms, transformers rely on layer normalization, residual and feed-forward connections, and positional embeddings.
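To make this concrete, here is a minimal NumPy sketch of multi-head scaled dot-product attention; the dimensions, weights, and random inputs are toy values for illustration, not taken from any production model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """X: (seq_len, d_model). Each head attends over the full sequence in parallel."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                        # (seq_len, d_model)
    # Split into heads: (n_heads, seq_len, d_head)
    split = lambda M: M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    # Scaled dot-product attention, computed for all heads at once
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)   # (n_heads, seq_len, seq_len)
    weights = softmax(scores, axis=-1)                      # attention weights per head
    heads = weights @ Vh                                    # (n_heads, seq_len, d_head)
    # Concatenate the heads and apply the output projection
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

rng = np.random.default_rng(0)
d_model, n_heads, seq_len = 64, 8, 6
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
print(multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads).shape)  # (6, 64)
```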

How do transformer models work?

Here’s how the transformer architecture works: 

1. Input embedding 

The first step in transformer operations is understanding the input data. The model takes a sentence, or any sequence of data, and turns each word or element into a numerical representation known as a vector embedding. These embeddings capture the meanings of the words or elements. Various techniques can be employed for input embedding, such as word embeddings and character embeddings.

This allows the model to work with continuous representations rather than discrete symbols.  
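As a minimal sketch (the vocabulary and dimensions here are invented for illustration), an input embedding is essentially a learned lookup table that maps token IDs to dense vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "is": 2, "on": 3, "mat": 4}
d_model = 8                                                # embedding dimension (toy size)
embedding_table = rng.normal(size=(len(vocab), d_model))   # learned during training

tokens = ["the", "cat", "is", "on", "the", "mat"]
token_ids = [vocab[t] for t in tokens]
X = embedding_table[token_ids]                             # (seq_len, d_model) continuous vectors
print(X.shape)                                             # (6, 8)
```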

2. Positional encoding 

Next, the transformer model gets to know the order. Transformers don’t naturally understand the order of words, so they use positional encoding to give the model information about each element’s position. This is done by combining the embeddings with sinusoidal functions (remember sine from trigonometry class?), which gives every position a unique signature and helps the model understand the relationships between parts of the sequence. For example, in the sentence “The cat is on the mat,” positional encoding is what lets the model distinguish the first “the” from the second and register that “cat” comes before “mat.”
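Here is a small sketch of the sinusoidal encoding described in the original paper; each position receives a unique pattern of sine and cosine values, which is added to the token embeddings:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d))."""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # (1, d_model/2)
    angles = pos / np.power(10000, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)               # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)               # odd dimensions get cosine
    return pe

pe = positional_encoding(seq_len=6, d_model=8)
print(pe[0, :4])  # position 0 and position 5 get distinct patterns
print(pe[5, :4])
# In a real model: X = X + positional_encoding(*X.shape) before the first layer
```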

3. Encoder layers

The embedded and encoded input sequence is passed through multiple encoder layers. Each layer consists of two sub-layers: a self-attention mechanism and a feed-forward neural network.

  • The self-attention mechanism allows the model to focus on different parts of the input sequence and capture dependencies. It calculates attention scores for each element based on its relationships with other elements in the sequence.

For each word in a sentence, the self-attention layer computes three vectors (query, key, and value). To determine which other words are contextually related to a given word, its query vector is compared, via dot products, with the key vectors of the other words (the multi-head attention sketch above shows this computation).

  • The feed-forward neural network applies a non-linear transformation to the outputs of the self-attention mechanism, adding complexity and expressive power to the model. In the standard configuration, the feed-forward layers account for roughly two-thirds of the parameters in a transformer model; a sketch of this sub-layer follows this list.
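Below is a minimal sketch of the feed-forward sub-layer, together with the residual connection and layer normalization mentioned earlier; the sizes follow the common d_ff = 4 × d_model convention, which is also where the feed-forward layer’s large parameter share comes from:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each position's vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def feed_forward(x, W1, b1, W2, b2):
    """Applied independently at every position: linear -> ReLU -> linear."""
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 64, 256, 6            # d_ff is typically 4 * d_model
x = rng.normal(size=(seq_len, d_model))
W1, b1 = rng.normal(size=(d_model, d_ff)) * 0.1, np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)) * 0.1, np.zeros(d_model)

out = layer_norm(x + feed_forward(x, W1, b1, W2, b2))  # residual add, then layer norm
print(out.shape)  # (6, 64)
```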

4. Decoder layers 

The encoder’s output is fed into the decoder layers next. Like the encoder layers, each of these contains a self-attention mechanism and a feed-forward network, plus a third sub-layer: the encoder-decoder attention mechanism.

  • The self-attention mechanism in the decoder allows it to attend to different parts within the output sequence, capturing dependencies between elements. It calculates attention scores based on the relationships between positions in the output sequence. In practice this attention is masked so that each position can attend only to earlier positions, which keeps the decoder from peeking at tokens it hasn’t generated yet.
  • The encoder-decoder attention mechanism enables the decoder to focus on different parts of the input sequence, incorporating information from the encoder. This helps the decoder understand the context of the input sequence, aiding in generating the output sequence (see the sketch after this list).
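Structurally, encoder-decoder attention is the same computation as self-attention; the difference is where the inputs come from: queries from the decoder, keys and values from the encoder output. A minimal single-head sketch with toy sizes:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(dec_x, enc_out, Wq, Wk, Wv):
    Q = dec_x @ Wq              # queries from the decoder's current representation
    K = enc_out @ Wk            # keys from the encoder output
    V = enc_out @ Wv            # values from the encoder output
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V  # each output position mixes in encoder information

rng = np.random.default_rng(0)
d = 32
enc_out = rng.normal(size=(7, d))   # 7 source tokens
dec_x = rng.normal(size=(4, d))     # 4 target tokens generated so far
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
print(cross_attention(dec_x, enc_out, Wq, Wk, Wv).shape)  # (4, 32)
```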

5. Output projection 

The output of the decoder layers is passed through a final linear projection layer, which maps each position’s vector to one score per entry in the vocabulary. Because these scores can take any value between negative and positive infinity, a softmax activation function is applied to turn them into a probability distribution for each position in the output sequence. The token with the highest probability is taken as the predicted output.
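A minimal sketch of the projection step, using a toy vocabulary size and greedy (argmax) selection:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d_model, vocab_size, seq_len = 64, 1000, 4
dec_out = rng.normal(size=(seq_len, d_model))         # decoder output
W_proj = rng.normal(size=(d_model, vocab_size)) * 0.1

logits = dec_out @ W_proj              # unbounded scores, one per vocabulary entry
probs = softmax(logits)                # probability distribution per position
predicted_ids = probs.argmax(axis=-1)  # highest-probability token at each position
print(predicted_ids.shape)             # (4,)
```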

6. Training and optimization 

Transformers are trained using supervised learning. The model’s predictions are compared with the correct target sequence, and optimization algorithms adjust the model’s parameters to minimize the difference between predicted and correct outputs. This is done by iterating over the training data in batches, gradually improving the model’s performance.
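The standard loss is cross-entropy: the average negative log-probability the model assigns to the correct target tokens. A minimal sketch of the computation (in a real training loop, gradients of this loss drive the parameter updates):

```python
import numpy as np

def cross_entropy(probs, target_ids):
    """Average negative log-probability assigned to the correct tokens."""
    return -np.log(probs[np.arange(len(target_ids)), target_ids]).mean()

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])      # predicted distributions (2 positions, vocab of 3)
target_ids = np.array([0, 1])            # correct tokens at each position
print(cross_entropy(probs, target_ids))  # ~0.29; lower means better predictions
```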

7. Inference 

A pretrained model can then be used for inference to generate predictions for new input sequences. During inference, the trained model applies the same preprocessing steps as during training (such as input embedding and positional encoding) to an input sequence, then feeds it through the encoder and decoder layers.  

The model generates predictions for each position in the output sequence, producing the most probable output at each step. The predictions are then decoded into the desired format, such as a translated sentence or a generated sequence of words.
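Generation is typically autoregressive: the decoder runs repeatedly, feeding each predicted token back in until an end-of-sequence token appears. Here is a sketch of a greedy decoding loop, where `model` is a hypothetical stand-in for a trained encoder-decoder that returns a next-token distribution:

```python
import numpy as np

def greedy_decode(model, src_ids, bos_id, eos_id, max_len=50):
    """model(src_ids, out_ids) -> next-token probability distribution (assumed interface)."""
    out_ids = [bos_id]
    for _ in range(max_len):
        probs = model(src_ids, out_ids)    # distribution over the vocabulary
        next_id = int(np.argmax(probs))    # greedy: pick the most probable token
        out_ids.append(next_id)
        if next_id == eos_id:              # stop at end-of-sequence
            break
    return out_ids

# Toy stand-in model: always predicts token 2, then EOS (id 3)
def toy_model(src_ids, out_ids):
    probs = np.zeros(4)
    probs[2 if len(out_ids) < 3 else 3] = 1.0
    return probs

print(greedy_decode(toy_model, src_ids=[5, 6, 7], bos_id=0, eos_id=3))  # [0, 2, 2, 3]
```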

Applications of transformer models 

Just how much of a help are transformer models in deciphering real-world challenges?

As documented by Google, Vaswani et al’s paper shows that “the Transformer outperforms both recurrent and convolutional models on academic English to German and English to French translation benchmarks. On top of higher translation quality, the Transformer requires less computation to train and is a much better fit for modern machine learning hardware, speeding up training by up to an order of magnitude.”

Because of this high level of effectiveness, transformer neural networks are used for various types of applications, including: 

Machine translation 

Traditional machine translation approaches relied on statistical methods and phrase-based models, which often struggled to capture the semantic meaning and syntactic structure of sentences. With the introduction of transformer models, translation accuracy has improved significantly.

In the transformer, the self-attention mechanism allows the model to attend to different parts of the input sequence, capturing long-range dependencies and improving the overall translation quality. Because transformer models can effectively learn the patterns in source and target languages, they can generate more-fluent and accurate translations.  

Some of the most successful machine translation systems powered by transformers include Google Translate, Microsoft Translator, and DeepL. This application can improve global communication between organizations, and it also powers multilingual chatbot support and content localization.

Natural language processing

Transformer models’ ability to handle long-range dependencies and capture contextual information makes them super effective in language understanding and humanlike text generation. Their functionality has been applied to tasks such as sentiment analysis, text classification, named entity recognition, and text summarization.  

In sentiment analysis, for example, models powered by transformers can accurately determine the sentiment expressed in text. This enables companies, for instance, to gain insight from customer feedback, identifying areas for improvement and ways to better manage their brand reputation. 

Furthermore, transformer-powered NLP is used in industries such as finance and healthcare to understand and analyze legal and regulatory documents, helping organizations ensure compliance, identify potential risks, and detect fraud.

Speech recognition 

Their ability to capture dependencies and contextual information has enabled transformer models to transcribe spoken language very accurately. This has led to utilization in popular voice assistants such as Amazon’s Alexa, Apple’s Siri, and Google Assistant.  

These models process the audio input, segment it into smaller units, and generate the corresponding text representation. Transformers have improved the accuracy and fluency of the transcriptions.

One result: more-seamless interaction between humans and machines, especially when it comes to chatbots. The ecommerce, finance, and health industries routinely employ chatbots in their customer service operations. By improving content quality, transformers have ensured that shoppers, clients, and patients can all chat with an AI entity to quickly get the support they need.

Image captioning 

Images contain rich visual information, while captions provide textual descriptions of the image content. Transformer models encode the visual features of an image and then decode them into corresponding captions.  

The transformer’s ability to capture dependencies and generate coherent text makes it effective in producing accurate and contextually relevant captions. Image captioning powered by transformers has found application in areas such as content understanding, visual search, and accessibility for visually impaired individuals. 

In ecommerce, image captioning is utilized to automatically generate captions for product images. Descriptive captions proactively provide shoppers with valuable information such as product features and dimensions and other specifications, thereby enhancing the shopping experience. 

Transform your outlook 

That’s it for this introduction to how transformers work their magic.

Want to use this technology to transform your ecommerce revenue? Here at Algolia, we’re incorporating transformer models and other amazing technology to improve our clients’ search results and recommendations. We use vector representation, along with machine-learning techniques such as spelling correction, language processing, and category matching, to make sense of language. Our smart search experiences have proven to enhance user engagement and increase conversion for a vast array of clients. 

Want to know more? Let’s chat, or take the next step and request a demo of how our AI-powered NeuralSearch can give your site surprisingly on-target search results.

About the author
Vincent Caruana

Senior Digital Marketing Manager, SEO
