Code AI apps on Azure — Python, Prompty & Visual Studio

Mechanics Team
10 min read · Oct 9, 2024

--

Build your own custom applications with Azure AI right from your code. With Azure AI Studio, you can leverage more than 1,700 models and integrate them seamlessly into your coding environment to create tailored app experiences. Use features like Retrieval Augmented Generation and vector search to enrich responses with contextual information, along with prebuilt Azure AI services to incorporate cognitive skills such as language, vision, and safety detection.

Dan Taylor, Principal Product Architect for Microsoft Azure AI, also shares how to streamline your development process with tools for orchestration and monitoring. Use templates to simplify resource deployment and run evaluations against large datasets to optimize performance. With Application Insights, gain visibility into your app’s metrics, enabling data-driven decisions for continuous improvement.

Build custom AI apps with Azure AI.

Orchestrate models, automate deployments, run evaluations, and monitor apps with GenAIOps. Get started.

Utilize vector search with Azure AI to enhance model responses.

Convert prompts into vector embeddings for precise data retrieval. See how to ground models with contextually accurate answers and detailed references from your dataset.

Customize your dashboards.

Customize dashboards to visualize key metrics, including evaluation scores, token usage, and model duration. Access detailed trace information using a transaction search. See all the options to monitor your deployed GenAI app.

Watch our video here:

QUICK LINKS:

00:00 — Build custom AI apps with the studio in Azure AI
00:27 — Leverage the studio in Azure AI
01:37 — Build apps grounded on custom data
03:03 — Retrieval Augmented Generation
03:48 — Vector search
04:17 — Set up your coding environment
06:11 — How to build in code
07:16 — Traces
07:45 — Evaluate performance against large data set
08:19 — Options for monitoring
08:58 — Wrap up

Link References

To get started, go to https://ai.azure.com

Check out our code samples at https://aka.ms/AIAppTemplates

Unfamiliar with Microsoft Mechanics?

Microsoft Mechanics is Microsoft’s official video series for IT. You can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.

Keep getting this insider knowledge by joining us on social.

Video Transcript:

-You now have everything you need to build your own custom AI app experiences with the studio in Azure AI, which is now generally available. In fact, today I’ll show how you can use the studio together with your code as you build apps grounded with your data. I’ll also show how you can orchestrate different models together across multi-step processes and set up automated deployments, run evaluations and continuously monitor your production app as part of GenAIOps. Now, don’t think of the studio as just the user interface.

-You can use it to create the Azure AI resources you need and then call them directly from your code. It’s also a great way to familiarize yourself with the building blocks for your AI apps, and you can get to it from ai.azure.com. Even without signing in, you can view the model catalog, which gives you access to the latest models.

-There are more than 1,700 of them and counting, from OpenAI, Microsoft, Meta, Mistral, and more. Then in model benchmarks, you can get help choosing the right model for your app experience, with comparisons based on model accuracy and averages across different models. The same applies to coherence, which evaluates how well the model generates smooth and natural-sounding responses. Groundedness is a measure of how well the model refers to the provided source materials.

-Fluency looks at the language proficiency of the answers and relevance helps you gauge how well the model meets expectations based on prompts and more. Next, if we navigate to AI services, you’ll see that we give you access to prebuilt Azure AI services to build multimodal applications that can incorporate cognitive skills like speech, language and translation, vision and OCR, and detect harmful or inappropriate inputs and outputs using content safety. Now let me show you how you can use the studio from Azure AI together with your coding environment to build apps grounded on your custom data.

-Once you’ve signed in with your Azure account, you can create a project for your app. I’ll do that and give it a name. This project lets you, as a developer, securely connect to the Azure resources you need to consume models; it organizes the assets for your project and provisions the necessary resources so that you can get right to work. So now, with the core resources deployed, the next thing you need to do is deploy a model.

-Back in the model catalog, I’ll select GPT-4o, which is an all-around, high-performing model. Then I just need to hit Deploy, leave the default name and deployment details, and confirm. Deploying this model takes just a few seconds, and if you have multiple models deployed, you can easily switch between them. Now let’s head to the playground to test our GPT-4o model, which is automatically selected. The simplest way to give your app a personality is by defining the system message.

-Think of this as an instruction set that’s appended as the first message of every chat session. I’ll keep the default system message and add a sentence: be cheerful in your responses and use emojis. For context, the app I’m building is meant to help people make purchasing decisions for Contoso’s outdoor products. I’ll prompt it to generate a response based on its open-world training, and I’ll say, “What kind of shoes do you have?” It responds in the expected way because it has no context or product knowledge, but it’s still cheerful and, as instructed, it has added some fun emojis.
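
As a rough illustration (the prompt text paraphrases the demo), the system message is simply the first entry in the list of messages sent with every chat request:

```python
# Sketch: the system message is the first entry in the messages list sent
# with each chat request; the wording here paraphrases the demo.
messages = [
    {
        "role": "system",
        "content": (
            "You are an AI assistant that helps people find Contoso outdoor products. "
            "Be cheerful in your responses and use emojis."
        ),
    },
    {"role": "user", "content": "What kind of shoes do you have?"},
]
```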

-Now, if we want it to respond with context, I need to ground the model with reference data or what’s known as Retrieval Augmented Generation. Still in the playground, I’ll select the Add Your Data tab, and because I’m starting from scratch, I’ll add a new data source. I’m given options for an existing Azure AI search index, or to create a new index automatically by uploading files from my device or by pointing to existing data sources.

-I’ll choose the upload option and I’ll pick the files that I want. I have a few dozen product info files that I’ll select, and then I’ll select the Search service for my index. And by default, it’s going to vectorize my data and also enable keyword matches with hybrid search. I’ll go ahead and hit create, and after a few moments, our vector search index will be created and can be used to retrieve data relevant to user prompts.

-Now, if you are new to vector search, think of vectors like floating GPS coordinates. When a prompt is submitted, it is also converted into vector embeddings, and by looking for the closest coordinate matches in the dataset, grounding data is retrieved. And with that, let’s go back to the playground. If we try our prompt again, we see that with the vector search index added, the model is able to generate an answer based on our grounding knowledge. It even references our files.
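
To make the GPS-coordinates analogy concrete, here is a minimal sketch of vector retrieval done by hand, outside of Azure AI Search: embed the documents and the prompt with an embedding model, then rank documents by cosine similarity. The endpoint, API version, embedding deployment name, and product snippets below are placeholders.

```python
# Minimal vector-retrieval sketch (the demo's search index does this for you).
# Endpoint, API version, deployment name, and product text are placeholders.
import os
import numpy as np
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

def embed(text: str) -> np.ndarray:
    """Convert text into a vector embedding (its 'coordinates')."""
    result = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(result.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = [
    "TrailMaster X4 Tent: four-person, three-season tent for camping trips.",
    "TrekReady Hiking Boots: waterproof leather boots with high-traction soles.",
]
doc_vectors = [embed(d) for d in docs]

# Embed the user prompt and retrieve the closest-matching grounding passage.
query_vec = embed("What kind of shoes do you have?")
best = max(range(len(docs)), key=lambda i: cosine(query_vec, doc_vectors[i]))
print(docs[best])
```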

-And when I click into one of the references, you can see all of the details of our hiking shoe. So now that it’s working in the playground, let’s build this capability into our app. First, to set up your coding environment, you’ll want to install the OpenAI SDK and the Azure Identity library. These let us call the Azure OpenAI service from our Python code using our Azure credential, without needing API keys.
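
A minimal sketch of that setup, assuming `pip install openai azure-identity` has already run and the endpoint is supplied via an environment variable:

```python
# Keyless Azure OpenAI client: DefaultAzureCredential picks up your signed-in
# Azure identity, so no API key is needed. The endpoint and API version are
# placeholders for your project's values.
import os
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_ad_token_provider=token_provider,
    api_version="2024-06-01",
)
```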

-Now with VS Code set up, I can return to the playground for a second and grab the code we’ve built in the playground so far. I’ll copy that code to my clipboard, move back to VS Code, and paste it in. I can then open the terminal and run the code. Once it’s run, you can see the response output with the same information that we saw in the playground, but in JSON format, which enables me to integrate this with the rest of my code. And there we have it. We have now implemented Retrieval Augmented Generation in our app.
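
The code the playground generates will differ in its details, but the “add your data” setup roughly corresponds to attaching the search index as a data source on the chat request. Continuing with the keyless client from the previous sketch, and with placeholder names for the search endpoint and index:

```python
# Rough sketch of a grounded chat call (names are placeholders): the Azure AI
# Search index is attached as a data source so answers come from our files.
# Reuses the `client` and `os` import from the previous sketch.
completion = client.chat.completions.create(
    model="gpt-4o",  # the deployment name chosen earlier
    messages=[
        {
            "role": "system",
            "content": "You help people find Contoso outdoor products. "
                       "Be cheerful in your responses and use emojis.",
        },
        {"role": "user", "content": "What kind of shoes do you have?"},
    ],
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": os.environ["AZURE_AI_SEARCH_ENDPOINT"],
                    "index_name": "contoso-products",
                    "authentication": {
                        "type": "api_key",
                        "key": os.environ["AZURE_AI_SEARCH_API_KEY"],
                    },
                },
            }
        ]
    },
)

# The JSON output includes the grounded answer plus reference metadata.
print(completion.model_dump_json(indent=2))
```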

-That said, real apps can take multiple steps with different prompts and models to perform a given task. They can be customized based on a user prompt, with an orchestrator running the logical steps for completing a task. For example, here we see a solution that performs an orchestrated sequence of tasks to write an article for Contoso Creative, one of those being the product search step we just built.

-There’s a research task to help curate the right information, for example, trends to give us writing ideas; a product task to take the research summary and connect it back to actual products in our catalog; and finally, an assignment task, which takes the information from the last two steps and writes an article. Each step is its own discrete task within the end-to-end orchestration and builds on the previous step using session history.
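
The template’s actual orchestration is richer than this, but the chaining pattern can be sketched in a few lines. Each function below stands in for a task whose instructions would, in the real template, live in its own Prompty file; all names and prompt text are illustrative.

```python
# Hypothetical sketch of chaining tasks: each step's output feeds the next,
# mimicking the research -> product -> assignment flow. In the real template,
# each task's instructions live in a Prompty file and may target a different
# model deployment.
import os
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_ad_token_provider=get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    ),
    api_version="2024-06-01",
)

def run_task(instructions: str, content: str) -> str:
    """Run one orchestration step against the deployed model."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": content},
        ],
    )
    return response.choices[0].message.content

def write_article(topic: str) -> str:
    research = run_task("Research current outdoor trends related to the topic.", topic)
    products = run_task("Match this research summary to Contoso catalog products.", research)
    return run_task(
        "Write a short article using the research notes and matched products.",
        f"Research:\n{research}\n\nProducts:\n{products}",
    )

print(write_article("winter camping shelters"))
```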

-Now let’s run it, and that will take a moment to parse through the information on the web and correlate against the product catalog before generating the article. It’s found a trend here with the Quinzee shelters for camping. It then sends the article to the editor task for review. The editor either accepts or rejects the article and provides feedback if the article needs additional edits. And here is our article featuring the Quinzee shelter, and it even references a similar item in our catalog, the TrailMaster X4 Tent. So now let’s look at how you would build this.

-Now, you could use the studio’s new assistant playground with additional tools for building systems like this, but I’m going to show you how to build this in code and how you set up GenAIOps as you build your app. And here, instead of creating the project in the studio, I’ll start my assistant using one of our new templates in VS Code. I just need to use a simple azd up command to deploy all of the necessary resources in one step. And this takes a few moments to run.

-While it provisions, I’ll show you the available templates at aka.ms/AIAppTemplates. There are quite a few available, with more on the way. And back in VS Code, you’ll also want to run a pipeline config command for GitHub, and I’ve already done this to save a little time. Next, I’ll move over to the Orchestrator Python file. Here you can see three of our four tasks, and each uses different models and prompts.

-Now, I’ll move over to the Prompty file for the researcher, where I can iterate in our local in-code playground if I want. A Prompty file is essentially a prompt template, and each of our tasks uses prompts to give instructions to the models. All task roles in this example have their own Prompty file that runs from Python code. Now, one thing to note: because there are so many interactions happening, it can be really hard to see where things go wrong and debug any issues in your code, and that’s where traces come in. I can run the orchestrator file to test this out, and you’ll see all the raw prompt and completion information from the run.

-From here, I can pull up a trace. Here we see each step and model call that was taken in creating the article and can view the details to better understand and debug our app. And for each step in our trace, we can see what was sent to the model. So now that I’ve done a manual test of the experience, I can move on to the next step: evaluating how it performs against a larger dataset.
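
Under the hood, trace views like this are typically fed by OpenTelemetry spans emitted around each step. The AI app templates wire this up for you; a bare-bones, hypothetical sketch of wrapping one task in a span looks something like this:

```python
# Hypothetical sketch: wrap a task in an OpenTelemetry span so the step and
# its inputs/outputs show up in the trace view. run_task is the helper from
# the orchestration sketch above; an exporter still needs to be configured.
from opentelemetry import trace

tracer = trace.get_tracer("contoso.orchestrator")

def research_task(topic: str) -> str:
    with tracer.start_as_current_span("research_task") as span:
        span.set_attribute("input.topic", topic)
        result = run_task("Research current outdoor trends related to the topic.", topic)
        span.set_attribute("output.length", len(result))
        return result
```

Because spans nest automatically, the orchestrator, each task, and each model call show up as one tree in the trace view.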

-To run that evaluation, I’m going to use our orchestrator file with the Azure AI evaluators built into our Python file. When I run it and let it complete, it tests using four basic built-in evaluators for relevance, fluency, coherence, and groundedness, with scores for each, the higher the better on a one-to-five scale, along with the averages across a few runs. From here, I can use prompt templates to optimize prompt engineering, add content filters, or update the system message, and evaluate and iterate until I’m happy with the outcome.
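
As a rough sketch, assuming the azure-ai-evaluation package (the exact package and signatures may differ from what the template uses), a batch evaluation over a JSONL dataset with the four built-in quality evaluators can look like this; the dataset path, deployment name, and config values are placeholders:

```python
# Hedged sketch of a batch evaluation with the four built-in quality
# evaluators. Dataset path, deployment, and config values are placeholders.
import os
from azure.ai.evaluation import (
    evaluate,
    CoherenceEvaluator,
    FluencyEvaluator,
    GroundednessEvaluator,
    RelevanceEvaluator,
)

model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "azure_deployment": "gpt-4o",
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
}

result = evaluate(
    data="eval_dataset.jsonl",  # rows of query, response, and grounding context
    evaluators={
        "relevance": RelevanceEvaluator(model_config),
        "fluency": FluencyEvaluator(model_config),
        "coherence": CoherenceEvaluator(model_config),
        "groundedness": GroundednessEvaluator(model_config),
    },
)

# Aggregate scores on the one-to-five scale, averaged across rows.
print(result["metrics"])
```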

-Now let’s jump ahead to production and look at your options for monitoring. If you’re already using Application Insights, the reporting visuals can be added to your dashboards. This, in fact, is the dashboard for the deployed app with all of the metrics we’ve configured to monitor, including the evaluation scores across runs, token usage over time and by model, as well as model duration, which is useful if you’re testing out multiple models.
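
If you wire this up yourself rather than starting from a template, a minimal sketch (assuming the azure-monitor-opentelemetry package and your own connection string) is to route OpenTelemetry data to Application Insights:

```python
# Hedged sketch: export OpenTelemetry traces and metrics to Application
# Insights so they appear in dashboards and transaction search.
import os
from azure.monitor.opentelemetry import configure_azure_monitor

configure_azure_monitor(
    connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"],
)
# After this call, spans (like the task spans sketched earlier) and metrics
# emitted by the app flow into the configured Application Insights resource.
```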

-And using a transaction search, you can get down to detailed trace information like we saw before in VS Code. But here, it’s across all runs and traces for a defined range of time. And with that, we’ve built and tested our GenAI app using the Retrieval Augmented Generation pattern.

-As you’ve seen, you now have everything you need to build your own custom experiences with Azure AI right from your code. To get started, go to ai.azure.com. Check out our code samples at aka.ms/AIAppTemplates. Subscribe to Mechanics if you haven’t already, and thanks for watching.
