Semantic search with Azure Cognitive Search

Brian Weeteling 9/14/2021 10:07:18 AM

While the search technology behind a lot of large e-commerce websites has reached a high level of sophistication, there are always ways to improve; search is still not a solved problem. In a previous article, we looked at an intriguing technology that can significantly boost search performance: semantic search.

"Online retailers can double conversion rates by investing in site search and discovery (source)"

Search for e-commerce goes beyond the search box itself; there are many other aspects that impact search performance. Users expect to be able to filter through the catalog quickly. For example, you can use facets to let your users filter results by product attributes such as price, color, size or reviews. Use a product recommendation engine to suggest products to your users, based on their user profile, their location or previous search queries.
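To make the idea of facets a bit more concrete, here is a minimal sketch in Python. The catalog, the attribute names and the helper functions are all made up for illustration; a real search engine computes these counts inside the index rather than in application code.

```python
from collections import Counter

# Hypothetical mini catalog; attribute names are illustrative only.
CATALOG = [
    {"name": "Trail shoe", "color": "red", "size": 42, "price": 89.0},
    {"name": "Road shoe", "color": "blue", "size": 42, "price": 120.0},
    {"name": "Sandal", "color": "red", "size": 40, "price": 35.0},
    {"name": "Boot", "color": "black", "size": 44, "price": 150.0},
]


def apply_filters(products, filters):
    """Keep products whose attributes equal every selected facet value."""
    return [p for p in products if all(p.get(k) == v for k, v in filters.items())]


def facet_counts(products, attribute):
    """Count how many products carry each value of one attribute."""
    return Counter(p[attribute] for p in products)


# The user ticks "red"; the size facet is recomputed on the filtered set.
red_products = apply_filters(CATALOG, {"color": "red"})
size_facet = facet_counts(red_products, "size")
```

The counts in `size_facet` are what you would show next to each size option, so users can see how many products remain before they click.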

If we look at the quality of the query itself, there are a lot of ways to improve that too. These include things like spelling correction, suggestions, synonyms, autocomplete, stemming and phrase search.

It is also important to note that search can be used in different ways to interact with your online store. People use it to browse the store and filter through the large number of products, but they also resort to the search bar when they have a specific product in mind. While this might seem easy, it can still be challenging depending on the query. Let's take a look at this example on Google:

what year did brian set foor on pluto

Credits and more fun examples here

For made-up queries like the one above, it's hard to come up with a search result that makes actual sense, as there is no real answer. The same thing can happen when users search for something that does not exist on your website, leading to the dreaded zero results page.

While the result above might be slightly confusing, it's the best Google can come up with at the moment, and we could argue that it makes sense that I'm looking for information about the moon landing instead of me landing on Pluto. Using semantic search you will never end up with zero results, since there is always a closest match. That sounds great, but you might not be entirely confident about the quality of those results. In that case you could mention it to the user and maybe suggest an alternative query instead, similar to what Google does: 'Did you mean ...'.
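A rough sketch of that fallback could look like the snippet below. Everything in it is an assumption for illustration: the vocabulary, the `min_score` threshold and the function names are hypothetical, and the suggestion uses simple string similarity from Python's standard `difflib` module, which is certainly not how Google does it.

```python
import difflib

# Hypothetical vocabulary, e.g. terms taken from your product catalog.
KNOWN_TERMS = ["sneakers", "sandals", "boots", "jacket", "backpack"]


def suggest_query(query, vocabulary=KNOWN_TERMS, cutoff=0.75):
    """Return a corrected query, or None when no word could be improved."""
    corrected = []
    changed = False
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            corrected.append(word)
    return " ".join(corrected) if changed else None


def present_results(query, scored_hits, min_score=0.5):
    """Flag low-confidence result lists and attach a suggestion, if any.

    scored_hits is a list of (document, score) pairs; min_score is an
    arbitrary threshold that you would have to tune for your own model.
    """
    best = max((score for _, score in scored_hits), default=0.0)
    low_confidence = best < min_score
    return {
        "hits": [doc for doc, _ in scored_hits],
        "low_confidence": low_confidence,
        "did_you_mean": suggest_query(query) if low_confidence else None,
    }
```

When `low_confidence` is set, the page can still show the hits, but with a note and the 'Did you mean ...' suggestion above them.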

More than just search

Search for e-commerce is a highly researched topic; it's not just a simple input field on a webpage. A lot of thought goes into the order of items (think about marketing and promotions), the way the items are displayed, and the filters or facets that will be shown to the user. These facets could even be personalized for the user or query!

Page with filters

While filtering and faceting have been around for a long time in classic search, they are actually a bit harder to implement with semantic search, especially when using scalable vector search technologies like approximate nearest neighbour search (ANN, using libraries like Faiss or Annoy). These technologies create an index for the complete search corpus (for example your product catalog) and store it in a highly optimized way. Storing the index like that has the downside that it is not easy to exclude parts of the indexed items while doing a search.
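One common workaround is to filter after the vector search, which the sketch below illustrates. It is only a toy under stated assumptions: `nearest` is an exact, brute-force stand-in for a real ANN index (it does not use Faiss or Annoy), and the items, the `in_stock` attribute and the `overfetch` parameter are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query_vec, items, k):
    """Stand-in for an ANN index: exact top-k by cosine similarity."""
    ranked = sorted(items, key=lambda it: cosine(query_vec, it["vec"]), reverse=True)
    return ranked[:k]


def search_with_post_filter(query_vec, items, predicate, k, overfetch=3):
    """Fetch k * overfetch neighbours, then drop those failing the filter."""
    candidates = nearest(query_vec, items, k * overfetch)
    return [it for it in candidates if predicate(it)][:k]


# Hypothetical items: the two closest vectors happen to be out of stock.
ITEMS = [
    {"id": "a", "vec": [1.0, 0.0], "in_stock": False},
    {"id": "b", "vec": [0.9, 0.1], "in_stock": False},
    {"id": "c", "vec": [0.8, 0.2], "in_stock": True},
    {"id": "d", "vec": [0.0, 1.0], "in_stock": True},
]
```

With `overfetch=1` a query for the two nearest in-stock items comes back empty, because both neighbours are filtered out afterwards. Fetching more candidates helps, but there is no guarantee that enough of them survive the filter, which is exactly why filtering is awkward in this setup.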

"As shown in a previous blogpost, searching with vectors opens up a world of possibilities: searching by or for images and searching across languages; it's all possible"

Another downside is that semantic search tends to work better with longer queries written in natural language. It's not a given that semantic search will instantly work well for all types of queries; it really depends on the model and how it was trained. It might be necessary to train a custom model, add query understanding, or add (automatic) filtering in order to get good results.

In the next section we'll take a look at what Microsoft has done to tackle these problems in order to add support for semantic search.

Hybrid solutions: re-ranking

In order to offer a fully fledged search engine which still provides sufficient control over the results, Microsoft decided to go for a hybrid solution: using an existing 'classic' search algorithm and wrapping it with some semantic magic. Basically, they add a layer on top of the existing system in order to improve the ranking of the search results. This means that you'll still receive the same search results, but they will probably be shown to you in a different order.

Semantic re-ranking the search results

"Here you can see that classic search and semantic search do not agree: hit 1 ends up as hit 50 with semantic search."

Azure Cognitive Search

Azure Cognitive Search has a beta program for semantic search, for which you can apply here. The core of the search system will still be their classic search algorithm, implemented on top of Lucene, which uses BM25 for relevance scoring. Azure Cognitive Search provides most of the functionalities we discussed before: filtering, boosting, faceting and more advanced things like suggestions and autocomplete.
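To give a feeling for what BM25 relevance scoring does, here is a small, self-contained sketch. It is an assumption-laden illustration and not Azure's or Lucene's actual code: tokenization is a plain whitespace split, `k1=1.2` and `b=0.75` are the commonly used defaults, and the IDF formula is the variant that stays non-negative.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score every document against the query with a basic BM25 formula."""
    docs = [doc.lower().split() for doc in documents]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            df = doc_freq[term]
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Note that a document only gets a score when it literally contains a query term. That is the gap the semantic layer tries to close.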

When you enable semantic search, you basically add an extra layer to this algorithm which will re-rank (or 're-order') the top-k results by using semantic models. If you fetch search results for page 1 of your search page, they will re-rank the top 50 results using their own semantic algorithm. The algorithm is pretty interesting: first they summarize all documents, after which they score these summaries by conceptual and semantic relevance for the given search query.
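The re-ranking step itself can be sketched in a few lines. This is a minimal sketch under assumptions, not Microsoft's implementation: `rerank_top_k` and `toy_semantic_score` are hypothetical names, and the scorer is a crude word-overlap placeholder with a made-up synonym table, where the real system uses summarization and trained semantic models.

```python
def rerank_top_k(ranked_hits, semantic_score, query, k=50):
    """Re-order only the first k classic hits; the rest keep their order."""
    head = ranked_hits[:k]
    tail = ranked_hits[k:]
    reranked = sorted(head, key=lambda doc: semantic_score(query, doc), reverse=True)
    return reranked + tail


def _normalize(text, synonyms):
    """Lowercase, split on whitespace and map synonyms to one canonical word."""
    return {synonyms.get(word, word) for word in text.lower().split()}


def toy_semantic_score(query, doc):
    """Placeholder scorer: share of query words found in the document."""
    synonyms = {"sneakers": "shoes", "trainers": "shoes"}  # hypothetical
    q = _normalize(query, synonyms)
    d = _normalize(doc, synonyms)
    return len(q & d) / len(q) if q else 0.0


# Order as returned by the classic search engine (hypothetical example).
classic_hits = ["running shoes sale", "red sneakers", "garden hose"]
reranked = rerank_top_k(classic_hits, toy_semantic_score, "red trainers", k=2)
```

The set of results stays the same; only the order of the first `k` hits changes. Anything the classic search did not return in its top `k` can never be promoted by this layer.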

Azure Semantic Search Diagram

Note that this is very different from full semantic search, as it only applies semantic re-ranking on the top-50 search results that were returned by the classic search algorithm. One of the downsides is that these top-50 results might not contain all items that would have been matched using full semantic search. Therefore, there is a chance that semantic search would have found completely different results, for example because of better query understanding, as that is one of the strengths of this semantic technology. Nonetheless, semantic re-ranking was one of the biggest 'single improvements' Microsoft has ever seen:

"By enabling semantic re-ranking, Microsoft found that the clickthrough rate increased by 2.0 ~ 4.5 percent, depending on the length of the query. This was the largest single improvement of key performance indicators for search engagement the Microsoft Docs team has ever seen. (source)"

Conclusion

Semantic search is a great way to improve the search experience on your site. Both Microsoft and Google make huge claims about improved search performance by implementing semantic search. Improving search is important as it may help you convert twice as many customers!

As with any technology, semantic search has its own strengths and weaknesses. It will take some time for the tools and libraries to mature before full semantic search becomes mainstream.

Microsoft has their own unique approach to semantic search. While it is not full semantic search, they have been able to combine the flexibility and configurability of classic search with the improved relevance that semantic search brings. Especially if you're already using Azure Cognitive Search, just ask them to enable semantic search and start comparing!