It’s common to search by color on an ecommerce website. Unfortunately though, a purely text-based search index might not have all the information it needs to return the most relevant results when a user wants to search by color.
For example, searching for a “white t-shirt” might return results with thumbnails of clearly red or blue t-shirts just because their descriptions mention that the same cut also comes in white. Whether it’s technically correct or not, including those results at least gives the user the impression that our search engine is broken. What can we do about this?
The logical next step is to create some system that can automatically identify the color of the object in the thumbnail. There are some open source scripts that exist specifically for this, like josip/node-colour-extractor or lokesh/color-thief.
However, they largely work by finding the most common pixel value in an image, which comes with a few problems: the most common pixel is often part of the background (or the model's skin) rather than the product itself, and the output is a raw RGB value rather than a word a shopper would actually type into a search box.
So instead, in this article we're going to build something closer to Vue.ai, which identifies the foreground color as a queryable word. That commercial application will work better than what we come up with here, but if you'd like to see the process, or to check whether our approach fits your project's requirements, read on!
This problem is relatively complex, so the potential solutions are numerous. Most state-of-the-art computer vision frameworks nowadays go down the deep learning path, classifying images with convolutional neural networks. This approach leads to some astounding results, but the huge datasets and specialized hardware it requires place it slightly out of the scope of an experiment like this.
Deep learning frameworks are also notoriously hard to set up and run, and we wanted to release this as open source so you can take a stab at it. Here's the process we settled on:
1. Shrink the image so there's less data to process.
2. Separate the background from the foreground and discard it.
3. Optionally, filter out any skin pixels.
4. Cluster the remaining foreground pixels by color.
5. Map each cluster's average color to a human-readable name.
Since we were imagining using this on a fashion ecommerce website, we thought it might make sense to crop the image to just the foreground. That should make it easier for our algorithm to identify which part of the image really matters for our use case, since the foreground will take up more of the frame. Also, because we only cared about the primary color of the main object in the picture, extra detail was probably just going to confuse our algorithm and lengthen the processing time.
So our next step was to shrink all of the images down to about 100x100 px. Our tests showed that this was close to optimal for our case.
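If you want to follow along, here's a minimal sketch of that downscaling step using Pillow (the library choice and the file name are assumptions for illustration, not part of the released script):

```python
# A minimal downscaling sketch, assuming Pillow is installed.
# The file name "tshirt.jpg" is a hypothetical example.
from PIL import Image

def shrink(path, size=(100, 100)):
    """Downsample a product image so the later steps only have ~10,000 pixels to chew on."""
    img = Image.open(path).convert("RGB")
    img.thumbnail(size, Image.LANCZOS)  # preserves aspect ratio, never upscales
    return img

small = shrink("tshirt.jpg")
```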
The background could be plain white or very complex, but either way, we don't want to pull any data from it. How do we separate it from the data that we care about? It would be easy to make rules or assumptions, like mandating that the background be a plain color, or dictating where the main object should sit in the picture.
But to allow for broader use cases, we’ll try to combine several more common algorithms to handle reasonably complex backgrounds.
Let's start by thresholding the image: we take the delta E distance between every pixel and the four corner pixels of the image, and consider a pixel part of the background if the result is below some threshold. This step is particularly useful on more complex backgrounds because it doesn't use edge detection, but that also makes it vulnerable to incorrectly picking up gradients and shadows.
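Here's roughly what that thresholding pass could look like, sketched with NumPy and scikit-image; the delta E threshold of 10 is an arbitrary value for illustration, not a constant from our script:

```python
# A sketch of corner-based background thresholding, assuming scikit-image and NumPy.
import numpy as np
from skimage import color

def background_by_corners(rgb, threshold=10.0):
    """Mark pixels as background when they're close (in delta E) to one of the corner colors."""
    lab = color.rgb2lab(rgb)  # rgb is an (H, W, 3) array, uint8 or float in [0, 1]
    corners = [lab[0, 0], lab[0, -1], lab[-1, 0], lab[-1, -1]]
    mask = np.zeros(lab.shape[:2], dtype=bool)
    for corner in corners:
        # delta E between every pixel and this corner's color
        mask |= color.deltaE_cie76(lab, corner) < threshold
    return mask  # True where the pixel looks like background
```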
To fix that, we’ll use flood filling and edge detection to remove any shadows and gradients. The technique is largely inspired by this article on the Lyst engineering blog. The trick here is that backgrounds are usually defined by the absence of sharp edges. So if we smooth the image and apply a Scharr filter, we’ll usually get a clear outline of our foreground.
The caveat is just that complex backgrounds usually fool this test, so we’ll need to combine it with the thresholding from earlier for the best results.
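A rough sketch of that edge-based pass might look like the following; the smoothing sigma and edge threshold are assumptions chosen for illustration:

```python
# A sketch of the smooth -> Scharr -> flood-fill idea, assuming scikit-image.
import numpy as np
from skimage import color, filters
from skimage.segmentation import flood

def background_by_edges(rgb, edge_threshold=0.05):
    """Flood-fill from the corners through regions that contain no sharp edges."""
    gray = color.rgb2gray(rgb)
    smoothed = filters.gaussian(gray, sigma=2)   # blur away texture and noise
    edges = filters.scharr(smoothed)             # strong response along the foreground outline
    flat = edges < edge_threshold                # "edge-free" pixels
    mask = np.zeros_like(flat)
    h, w = flat.shape
    for seed in [(0, 0), (0, w - 1), (h - 1, 0), (h - 1, w - 1)]:
        if flat[seed]:
            mask |= flood(flat, seed)            # connected edge-free region touching a corner
    return mask
```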
Sometimes one of these two steps misfires and removes far too many pixels; when that happens, we have to ignore the result of our background detection. Both steps should work well on clean input, though, so we'll call that part done.
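One plausible way to combine the two passes, reusing the helper sketches above, looks like this; the 90% cutoff for "removed way too many pixels" is an assumption, not a value from our script:

```python
# A sketch of combining the two background passes with a sanity check.
import numpy as np

def background_mask(rgb, max_background_ratio=0.9):
    """Union of both passes, ignored entirely if it would wipe out most of the image."""
    mask = background_by_corners(rgb) | background_by_edges(rgb)
    if mask.mean() > max_background_ratio:
        # The detection probably misfired; keep the whole image instead.
        return np.zeros(mask.shape, dtype=bool)
    return mask
```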
The last thing we’ll want to remove from our image before trying to classify the color is skin. Images of clothing worn by models (especially swimwear) will often contain more pixels representing skin than representing the clothing item, so we’ll need to get it out of the image or our algorithm will just return that as the primary foreground color.
This is another rabbit hole we could easily dive down, but the simpler answer is to just filter out pixels in that general color range. We tuned the filter conservatively to reduce the risk of an orange-tinted garment being classified entirely as skin, but because that's still a possibility, we made this step completely optional in the final script.
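As a sketch, a simple chroma-range test in YCbCr space could look like this; the Cb/Cr bounds are common heuristic values from the literature, not necessarily the exact ones our script uses:

```python
# A sketch of an optional skin filter using a classic YCbCr range test.
from skimage import color

def skin_mask(rgb):
    """Mark pixels whose chroma falls in a typical skin-tone range."""
    ycbcr = color.rgb2ycbcr(rgb)
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return (cb > 77) & (cb < 127) & (cr > 133) & (cr < 173)
```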
Earlier we mentioned that raw RGB values weren't going to do; we need textual output, something a user might actually search for on a fashion ecommerce website. But now that we've isolated the clothing item in the image, how do we extract such a subjective label from it?
In the blouse the woman is wearing in the image above, you can see how the precise color might change from pixel to pixel because of how it is folding in the wind, where the light source is, and how many layers of fabric are visible at that exact spot. We need to find a cluster of pixels of similar color, average them together, and figure out which color category they belong to.
We don't know how many clusters we'll need, though, since different pictures can contain different numbers of distinct colors in the foreground. We can handle that with the jump method: we set some error threshold and only include a pixel in a cluster if it's (a) connected to that cluster, and (b) within the threshold color distance of the pixel at the center of the cluster. This creates as many clusters as appear to be necessary, and then (if we wanted to) we could walk along the borders of the clusters and reevaluate which group each pixel should belong to.
That step would give us finer edges, but it’s not really necessary for our use case, and it would just waste processing time.
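To make the idea concrete, here's an illustrative region-growing sketch of that clustering step, using plain breadth-first search and a Lab-space distance threshold; it's a simplification of what the script actually does, and the max_delta value is an assumption:

```python
# An illustrative region-growing sketch of the clustering step.
from collections import deque
import numpy as np
from skimage import color

def grow_clusters(rgb, foreground, max_delta=12.0):
    """Group connected foreground pixels whose color distance to the cluster seed is small."""
    lab = color.rgb2lab(rgb)
    h, w = foreground.shape
    labels = np.full((h, w), -1, dtype=int)  # -1 means "not clustered yet"
    current = 0
    for y in range(h):
        for x in range(w):
            if not foreground[y, x] or labels[y, x] != -1:
                continue
            seed = lab[y, x]                  # the seed color of this new cluster
            labels[y, x] = current
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and foreground[ny, nx] and labels[ny, nx] == -1
                            and np.linalg.norm(lab[ny, nx] - seed) < max_delta):
                        labels[ny, nx] = current
                        queue.append((ny, nx))
            current += 1
    return labels  # per-pixel cluster id, -1 for background
```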
This process gives us few enough clusters that similar colors are grouped together (for easy categorization), while distinctly different colors in the foreground still end up in separate clusters.
The last step is categorizing our cluster average colors into readable English names. That sounds like a difficult problem, given the subjectivity (not to mention the cultural implications) of what falls under each color category. We turned to a k-nearest-neighbors algorithm to give color names to RGB values, thanks to the XKCD Color Survey. The survey consists of 200,000 RGB values labeled with 27 different color names (black, green, teal, and so on), which we use to train a scikit-learn KNeighborsClassifier.
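Training that classifier is only a few lines; the snippet below assumes the survey data has already been flattened into a CSV with r, g, b, and name columns (the file name and the choice of k are hypothetical):

```python
# A sketch of training the color namer from the labeled survey data.
import csv
from sklearn.neighbors import KNeighborsClassifier

def train_color_namer(path="xkcd_colors.csv", k=50):
    """Train a KNN classifier that maps an RGB triple to one of the color names."""
    samples, names = [], []
    with open(path) as f:
        for row in csv.DictReader(f):
            samples.append([int(row["r"]), int(row["g"]), int(row["b"])])
            names.append(row["name"])
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(samples, names)
    return clf

namer = train_color_namer()
print(namer.predict([[250, 250, 245]]))  # e.g. ['white']
```

From there, each cluster's average RGB value just gets passed through the classifier to produce the tag we index.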
It works fairly well, but there is one big edge case in which it fails: gray. In the RGB system (where each channel is a dimension), the shades of gray all lie along the diagonal where R = G = B. That makes it hard to define a region where every value we want categorized as gray is closer to the center of the gray region than to the centers of the regions that represent other colors.
We ended up adding an extra step that computes the distance between an RGB value and its projection onto that gray diagonal; if the distance is less than a certain threshold, we call it gray and skip the classifier. The overall process isn't super elegant and it has its drawbacks, but it works well for this case.
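The gray check itself is just a point-to-line distance; here's a sketch (the distance threshold is an assumed value for illustration):

```python
# A sketch of the gray shortcut: distance from an RGB point to the R = G = B diagonal.
import numpy as np

def is_gray(rgb, max_distance=10.0):
    """Return True when an RGB triple lies close to the gray diagonal."""
    p = np.asarray(rgb, dtype=float)
    axis = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)  # unit vector along the gray diagonal
    projection = np.dot(p, axis) * axis              # the closest pure-gray point
    return np.linalg.norm(p - projection) < max_distance

print(is_gray([128, 130, 127]))  # True
print(is_gray([200, 40, 40]))    # False
```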
This experiment is available for you to mess around with here on GitHub! It's a standalone Python script that you can use to enrich your search index with extracted color tags. We also made a library and a CLI for your convenience. Everything is completely configurable: we're using sane defaults, but we left the option open to tweak all of the steps above. You can even add support for other languages by swapping out the color dataset. We're not using deep learning (yet), but we managed to hit ~84% accuracy even without the dramatic boost that route would give us.
Given this tool is designed to be used in conjunction with textual search, boosting images whose colors match what was searched for, this seems like a great place to leave version 1. Check back in next time for version 2!