AI Semantic Search for Your Website with Azure Cosmos DB | E-commerce

Mechanics Team
9 min read · May 1, 2024

Build low-latency recommendation engines with Azure Cosmos DB and OpenAI. Elevate user experience with vector-based semantic search, going beyond traditional keyword limitations to deliver personalized recommendations in real-time. With pre-trained models stored in Cosmos DB, tailor product predictions based on user interactions and preferences. Explore the power of augmented vector search for optimized results prioritized by relevance.

Kirill Gavrylyuk, Azure Cosmos DB General Manager, shows how to build recommendation systems with limitless scalability, leveraging pre-computed vectors and collaborative filtering for next-level, real-time insights.

Build low-latency recommendation engines.

Use Azure Cosmos DB and Azure OpenAI Service, and get started.

Elevate search functionality with vector-based semantic search.

Discover relevant items with user intent. Check it out.

Personalized product predictions

Generate predictions based on user and product interactions. See how it works in Azure Cosmos DB.

Watch our video here:


00:00 — Build a low latency recommendation engine
00:59 — Keyword search
01:46 — Vector-based semantic search
02:39 — Vector search built into Cosmos DB
03:56 — Model training
05:18 — Code for product predictions
06:02 — Test code for product prediction
06:39 — Augmented vector search
08:23 — Test code for augmented vector search
09:16 — Wrap up

Link References

Walk through an example at

Try out Cosmos DB for MongoDB for free at

Unfamiliar with Microsoft Mechanics?

As Microsoft’s official video series for IT, you can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.


Video Transcript:

-Imagine finding your next purchase just by describing what you want to do in natural language, with results returned in real time. For example, you could ask for everything you’ll need to climb Mount Kilimanjaro and get links to the appropriate items in the catalog in just a matter of milliseconds, along with a just-in-time recommendation for items you are statistically likely to purchase.

-Now, as a developer, building a next-level and low-latency recommendation engine for distributed apps like this is not as difficult as you may think: We’ll use a combination of the vCore-based Azure Cosmos DB for MongoDB along with Azure OpenAI to generate vector embeddings and Cosmos DB’s built-in vector search for super fast similarity lookups over conversational data.

-And we’ll use a popular collaborative filtering model, Alternating Least Squares, ALS, in PySpark for learned and predictive recommendations. To show you what’s possible, let’s first look at the experience without AI and Vector Search. This is our e-commerce website, specializing in winter outdoor sports equipment.

-I’ll start with a classic text-based keyword search and type snowboards here in our text box and press Enter. And as you’d expect, I get a results page with a few snowboards. But what if I don’t know exactly what I want? Or maybe I want something very specific that is not in our keyword index.

-This time, I’ll try something different. I’ll type, “I want to snowboard like an Olympic champion.” And as you can see, this yields zero results. As you’ve probably experienced, keyword search works well when words or text strings are found in a database or search index, but it cannot apply semantic meaning.
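The limitation described above can be illustrated with a minimal sketch. It assumes a naive keyword search that requires every query term to appear in the product name; the catalog entries and the `keyword_search` function are hypothetical and are not taken from the eShop code.

```python
# Hypothetical mini-catalog; the entries are illustrative only.
CATALOG = [
    {"id": 1, "name": "Shaun White snowboard"},
    {"id": 2, "name": "All-mountain snowboard"},
    {"id": 3, "name": "Insulated ski gloves"},
]


def keyword_search(query, catalog):
    """Return products whose name contains every term of the query."""
    terms = query.lower().split()
    return [
        product
        for product in catalog
        if all(term in product["name"].lower() for term in terms)
    ]


# A literal term matches the product names that contain it.
print(len(keyword_search("snowboard", CATALOG)))  # 2

# A natural-language request shares almost no literal terms with the catalog.
print(len(keyword_search("I want to snowboard like an Olympic champion", CATALOG)))  # 0
```

The second query fails because terms such as "want" and "Olympic" never appear in a product name, even though the intent clearly relates to the first product.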

-Let me now show you the difference with vector-based semantic search. I’ll type the same query as before, “I want to snowboard like an Olympic champion.” And here you can see I get a page of results. The very first result is a Shaun White snowboard from the three-time Olympic champion.

-In this case, we’re combining the power of our predictive recommendation model, ALS, along with the results from Cosmos DB’s built-in vector search, and Azure OpenAI GPT-4 for personalization of the response. And to keep me engaged and to stop me from clicking away, if I click on the Shaun White snowboard here, I’m also presented with a list of other products that I might like based on my preferences, my location, similarity between items, and user ratings. More on that in a moment.

-And as you saw, this happens in real time without delays that could make me hit the back button. Speed and relevance of results are important here, which is why Cosmos DB with its single-digit millisecond latency and built-in vector search for semantic similarity is such an advantage. Let me explain how it works.

-First, on the backend, for data in your database, we use a helper function that calls Azure OpenAI’s text-embedding-3 model to automatically generate vector embeddings in real time as data is ingested into Cosmos DB. Think of embeddings as a coordinate-like way to refer to chunks of data in your database. And later, those are used for lookups.
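A helper of this kind might look like the following minimal sketch. It assumes `client` is an `openai.AzureOpenAI` instance from the `openai` Python package (version 1.0 or later). The deployment name, the `vectorContent` field, and the `name` and `description` fields are assumptions for illustration, not details confirmed by the video.

```python
def generate_embedding(client, text, deployment="text-embedding-3-small"):
    """Return the embedding vector for `text`.

    `client` is assumed to be an `openai.AzureOpenAI` instance;
    `deployment` is a hypothetical Azure OpenAI deployment name.
    """
    response = client.embeddings.create(input=text, model=deployment)
    return response.data[0].embedding


def embed_on_ingest(client, product, text_fields=("name", "description")):
    """Return a copy of `product` with an embedding attached before it is stored."""
    # Concatenate the (assumed) text fields into one string to embed.
    text = " ".join(str(product.get(field, "")) for field in text_fields).strip()
    # `vectorContent` is an assumed name for the field holding the vector.
    return {**product, "vectorContent": generate_embedding(client, text)}
```

Passing the client in as a parameter keeps the helper easy to test with a stub, and the same `generate_embedding` function can be reused for search strings at query time.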

-Then in the app frontend, when a user performs a search, their search string is also converted to a vector embedding by Azure OpenAI, and the lookup will try to find the dimensionally closest matches between the search string embedding and the embeddings in the product database. We then use the ALS model, which has been trained on data including the user’s purchase history, product entries in the database, and their ratings, to re-rank the results by likelihood of purchase with collaborative filtering.
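To make "dimensionally closest" concrete, here is a brute-force sketch that uses cosine similarity over tiny three-dimensional vectors. This is only an illustration: real embeddings have far more dimensions, a database vector index typically uses an approximate search rather than comparing every vector, and the similarity metric used in the demo is not stated. The product names and vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_products(query_vector, products, k=3):
    """Return the k (score, name) pairs most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vector, p["vectorContent"]), p["name"])
        for p in products
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up vectors standing in for real embeddings.
products = [
    {"name": "Pro snowboard", "vectorContent": [0.9, 0.1, 0.0]},
    {"name": "Ski gloves", "vectorContent": [0.1, 0.9, 0.0]},
    {"name": "Snowboard boots", "vectorContent": [0.7, 0.3, 0.1]},
]
query = [1.0, 0.0, 0.0]

for score, name in nearest_products(query, products, k=2):
    print(f"{name}: {score:.3f}")
```

The two snowboard-related products rank ahead of the gloves because their vectors point in nearly the same direction as the query vector.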

-This is then presented to the Azure OpenAI GPT-4 large language model to generate a conversational response. And because vector search is built into Azure Cosmos DB, you don’t have to move the data to a separate vector database. Let me show you the steps to build this recommendation engine, first by looking at model training.

-This is where you’ll want to compute the predictions ahead of time, store them in Cosmos DB, and use them for real-time personalized recommendations. We’ll use the ALS model from the PySpark package to make our recommendations. Now we’ll skip over some of the configuration setup and get right into the model. We’ve split the data so 80% was used to train the model, and 20% was used to evaluate how well it performs.

-And then we created an ALS model and configured it to train multiple different models so we could choose the one with the best parameters. The training itself takes a while, so we skipped that here to have a fully trained model. Now we’ve picked the best model and see that it has a root mean square error of 0.64. This means that, on average, we would expect a predicted rating to be off by about 0.64, which in our case ends up being less than 10%. Not too bad.
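One common way to express this training flow in PySpark is sketched below. The column names (`user_id`, `product_id`, `rating`), the grid values, and the use of `CrossValidator` are assumptions for illustration; the notebook in the video may select its model differently.

```python
def train_best_als(ratings_df, seed=42):
    """Train ALS over a small parameter grid and return (best_model, rmse).

    `ratings_df` is assumed to be a Spark DataFrame with integer `user_id`
    and `product_id` columns and a numeric `rating` column.
    """
    # Imported lazily so this sketch can be loaded without Spark installed.
    from pyspark.ml.evaluation import RegressionEvaluator
    from pyspark.ml.recommendation import ALS
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    # 80% of the ratings for training, 20% held out for evaluation.
    train, test = ratings_df.randomSplit([0.8, 0.2], seed=seed)

    als = ALS(
        userCol="user_id",
        itemCol="product_id",
        ratingCol="rating",
        coldStartStrategy="drop",  # drop rows the model cannot score
    )

    # Illustrative grid values, not the ones used in the video.
    grid = (
        ParamGridBuilder()
        .addGrid(als.rank, [10, 50])
        .addGrid(als.regParam, [0.05, 0.1])
        .build()
    )
    evaluator = RegressionEvaluator(
        metricName="rmse", labelCol="rating", predictionCol="prediction"
    )
    cv = CrossValidator(
        estimator=als, estimatorParamMaps=grid, evaluator=evaluator, numFolds=3
    )

    # Fit every combination in the grid, keep the best, score it on held-out data.
    best_model = cv.fit(train).bestModel
    rmse = evaluator.evaluate(best_model.transform(test))
    return best_model, rmse
```

Calling `best_model, rmse = train_best_als(ratings_df)` on a ratings DataFrame would return the selected model together with its root mean square error on the held-out 20%.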

-Then we used the model to make predictions for all of our users, and all of the products they have not rated before, and saved those to Cosmos DB. This way when we look up predictions for specific users, we can simply do a point read with the user ID to retrieve the predictions from Cosmos DB in under 10 milliseconds. Now, let’s look at the code for product predictions based on specific users and the products they’re viewing.

-This function takes the current user ID and the product ID from the product page that the user has opened and returns the user’s predicted products. The first step is to execute this point read for this user’s product predictions. Next, we need to remove the current product if it is one of the predicted products for this user. This is an important step as we don’t want to display a recommendation for the same product they’re actively looking at.

-Finally, we fetch the product details to display to the user for each of the product predictions, add the rating for each product to the resulting list, and return the list to the user. And now with the code complete for our function, let’s test it out. Here I have some values to feed into the function we’ve defined above.
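The steps of this function can be sketched as follows. The sketch assumes `predictions` and `products` are PyMongo-style collections exposing `find_one` and `find`, and that each user's predictions are stored as a single document shaped like `{"_id": user_id, "predictions": [{"product_id": ..., "rating": ...}]}`. The function name and document shape are hypothetical.

```python
def get_user_predictions(predictions, products, user_id, current_product_id,
                         num_results=5):
    """Return product details plus predicted rating for one user."""
    # Step 1: point read of the user's pre-computed predictions.
    doc = predictions.find_one({"_id": user_id})
    if doc is None:
        return []

    # Step 2: drop the product the user is currently viewing.
    candidates = [
        p for p in doc["predictions"] if p["product_id"] != current_product_id
    ]
    # Keep the strongest predictions first, then trim to the requested count.
    candidates.sort(key=lambda p: p["rating"], reverse=True)
    candidates = candidates[:num_results]

    # Step 3: fetch product details in one query and index them by ID.
    ids = [p["product_id"] for p in candidates]
    details = {d["_id"]: d for d in products.find({"_id": {"$in": ids}})}

    # Step 4: attach each predicted rating to its product document.
    return [
        {**details[p["product_id"]], "rating": p["rating"]}
        for p in candidates
        if p["product_id"] in details
    ]
```

Keeping all of a user's predictions in one document keyed by the user ID is what makes the first step a single point read rather than a query.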

-This includes user_id, product_id for the Shaun White snowboard we saw earlier, and we’ll return 10 results just so you can see a more complete list. Let’s run the function. And here you can see the recommended products. Notice the ratings on the right-hand side are in descending order. The higher the rating, the stronger the prediction. If you remember, this list is what we saw on the right side of the screen when we clicked on the Shaun White snowboard.

-Now we’ll move on to our augmented vector search where we can again use these calculated predictions to improve the results based on what the user is most likely to buy. I’ll show what that code looks like. The first step is to generate vector embeddings from the user’s search text. Here we’re using our helper function to generate Azure OpenAI embeddings.

-Next, we execute a point read to grab all of the predicted products for the user. And this time, we’ll return every product so we can have a more complete set of results from our vector search. This is what we will use to perform our filtered vector search in Cosmos DB, so we’ll pass a list of product IDs to the $in operator for our vector query. Hybrid queries like this are an advantage of using a database with built-in vector search.

-Now it’s time for the vector search itself. This takes the array of embeddings from the user’s search and the filter criteria of predicted products. Then in my projection, I’ll return the entire product document, as well as the similarity score that I’ll show you in a minute. Next, after the vector search executes, I want to add in the prediction rating for that user to each product in my results. And our last step is to rank the results.
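A filtered vector query of this kind could be assembled as in the sketch below. It assumes the `cosmosSearch` aggregation stage of the vCore-based Azure Cosmos DB for MongoDB, with embeddings stored in a field named `vectorContent` and the filter applied to `_id`. Both field names and the two function names are assumptions, and the filter option may require an index on the filtered field.

```python
def build_filtered_vector_search(query_vector, product_ids, k=10, id_field="_id"):
    """Build an aggregation pipeline for a filtered vector search."""
    return [
        {
            "$search": {
                "cosmosSearch": {
                    "vector": query_vector,
                    "path": "vectorContent",  # assumed embedding field
                    "k": k,
                    # Restrict the search to the user's predicted products.
                    "filter": {id_field: {"$in": product_ids}},
                },
                "returnStoredSource": True,
            }
        },
        {
            # Return the similarity score alongside the whole product document.
            "$project": {
                "similarityScore": {"$meta": "searchScore"},
                "document": "$$ROOT",
            }
        },
    ]


def add_prediction_ratings(results, predictions):
    """Attach the user's predicted rating to each vector search result."""
    ratings = {p["product_id"]: p["rating"] for p in predictions}
    return [
        {**result, "rating": ratings.get(result["document"]["_id"])}
        for result in results
    ]
```

With a PyMongo collection, the pipeline would be executed as `list(products.aggregate(build_filtered_vector_search(vector, ids)))`, and the output passed to `add_prediction_ratings` together with the user's prediction list.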

-As I mentioned, we want to return the top result from our vector search, which will have the highest similarity score. Then order the remaining results by the prediction rating for each of the remaining products. So I’ll remove the top vector search result, then sort the remaining results by their rating, then re-insert the top vector result back at the top of the list. And after all that, we can return the results to the user.
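The ranking step described above can be sketched in a few lines. The `similarityScore` and `rating` keys and the sample values are assumptions chosen for illustration.

```python
def rank_results(results):
    """Keep the best vector match first, then order the rest by predicted rating."""
    if not results:
        return []
    # The top vector search result is the one with the highest similarity score.
    top = max(results, key=lambda r: r["similarityScore"])
    # Remove it, sort everything else by rating, then put it back at the top.
    rest = [r for r in results if r is not top]
    rest.sort(key=lambda r: r["rating"], reverse=True)
    return [top] + rest


# Made-up results standing in for the vector search output.
results = [
    {"name": "Ski goggles", "similarityScore": 0.71, "rating": 4.8},
    {"name": "Shaun White snowboard", "similarityScore": 0.93, "rating": 4.9},
    {"name": "Freestyle snowboard", "similarityScore": 0.88, "rating": 4.1},
    {"name": "Thermal jacket", "similarityScore": 0.65, "rating": 4.5},
]

for item in rank_results(results):
    print(item["name"], item["similarityScore"], item["rating"])
```

In this sample the closest match stays first, while a second snowboard with a lower predicted rating falls below the goggles and the jacket.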

-With all the logic coded for our augmented semantic search to provide the best result for the user’s query and top-rated additional products, we can now test it out. I’ll use the same user_id I showed in the web app before, and I’ll also use the same text for the search we used earlier. Let’s run the cell. And notice how the top result is our Shaun White snowboard.

-This, of course, has the highest similarity score in our results. Coincidentally, it also has the highest rating. The rest of the ratings are in descending order, but the similarity scores are not. This is because of the sorting we did earlier, which orders the list by rating so that the products the user is most likely to buy come first. That is also why these results are not all snowboards.

-And these are the results that we saw on the website when we enabled vector search. And with that, I’ve shown you the power of using pre-computed vectors in Azure Cosmos DB, combined with collaborative filtering and large language models for generative AI to help you build next-level, real-time recommendation systems of limitless scale.

-You can walk through this entire example yourself; we’ve published the eShop and our notebook on GitHub. And you can try out Cosmos DB for MongoDB for free with our quickstart. Keep watching Microsoft Mechanics for the latest updates, and thank you for watching!