How to Choose the Right Models for Your Apps | Azure AI

Mechanics Team
8 min read · Oct 15, 2024

--

With more than 1,700 models to choose from on Azure, selecting the right one is key to enabling the right capabilities, at the right price point, and with the right protections in place. That’s where the Azure AI model catalog and model benchmarks can help.

With Azure AI, you can seamlessly integrate powerful GenAI models into your app development process, making your applications smarter, more efficient, and highly scalable. Access a vast selection of AI models, from sophisticated large language models to efficient small models that can run offline.

Matt McSpirit, Microsoft Azure expert, shows how to compare and select the right AI model for your specific needs. Azure AI’s model benchmarks evaluate models on accuracy, coherence, groundedness, fluency, relevance, and similarity. Experiment with different models in Azure AI Studio or your preferred coding environment, and optimize costs with serverless pricing options.

Choose the right AI model for your app.

See how to make apps smarter, more efficient, and more user-friendly. Get started.

Switch between models.

Integrate with LLM development tools, and choose embedding models. Use your environment of choice to access AI models via Azure AI’s unified API. See it here.

Compare different models.

Use the Azure AI model inference package, and test models with your own data in your preferred coding environment. Check it out.

Watch our video here.

QUICK LINKS:

00:00 — Build GenAI powered apps

00:53 — Model choice

02:11 — Use your environments of choice

02:44 — Choose the right AI model

05:28 — Compare models

08:04 — Wrap up

Link References

Get started at https://ai.azure.com

See data, privacy, and security for use of models at https://aka.ms/AzureAImodelcontrols

Unfamiliar with Microsoft Mechanics?

Microsoft Mechanics is Microsoft’s official video series for IT. Watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.

• Subscribe to our YouTube: https://www.youtube.com/c/MicrosoftMechanicsSeries

• Talk with other IT Pros, join us on the Microsoft Tech Community: https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/bg-p/MicrosoftMechanicsBlog

• Watch or listen from anywhere, subscribe to our podcast: https://microsoftmechanics.libsyn.com/podcast

Keep getting this insider knowledge, join us on social:

• Follow us on Twitter: https://twitter.com/MSFTMechanics

• Share knowledge on LinkedIn: https://www.linkedin.com/company/microsoft-mechanics/

• Enjoy us on Instagram: https://www.instagram.com/msftmechanics/

• Loosen up with us on TikTok: https://www.tiktok.com/@msftmechanics

Video Transcript:

-Gen AI has forever changed the way we interact with apps and data, but how do you find and integrate the right AI model for your app? In fact, integrating Gen AI into your app dev process can make your app smarter, more efficient, and more user-friendly. Responses to user inputs are more personalized, engaging, and natural, and because Gen AI models can reason over large volumes of data and interactions, it’s easier to scale your app to accommodate the growing needs of your user base without compromising app performance. Additionally, when you combine and orchestrate multiple AI models with different functional components of your app, you can easily automate repetitive tasks and processes. In the next few minutes, I’ll walk you through choosing the right AI model for your app, and as part of that, how to compare models, as well as your options to deploy and minimize the cost of inference, all from the studio in Azure AI as well as in code. First, let’s take a look at model choice.

-Here, the choice you have today to incorporate different classes of AI models in your apps has never been broader. Everything from large language models capable of sophisticated reasoning based on their vast open-world knowledge comprising multiple billions, and even trillions of parameters hosted on Azure supercomputer infrastructure, to powerful quantized small language models that can also run locally and offline, such as the Phi family of models from Microsoft. In the studio, we provide a continually expanding central location to bring you the best selection of AI models as you develop your apps. The Model Catalog in Azure AI currently hosts more than 1,700 models, both premium models and hundreds of open models organized by collections. There are even regional flavored large language models, such as Core42 JAIS that support the Arabic spoken language, and Mistral Large, focused on European spoken languages. All models available on Azure have been vetted to meet Microsoft’s stringent security and compliance standards, which you can learn more about at aka.ms/AzureAImodelcontrols.

-Additionally, using the HiddenLayer Model Scanner, models are scanned for embedded malware, backdoors, and common vulnerabilities and exposures (CVEs) to detect tampering and corruption across model layers before being hosted on Azure. Importantly, the choice you get with the Azure AI service also extends to how you can access these models from your favorite tools and languages via the Azure AI model inference API, which, with its unified API schema, works across all models, making it easy to switch between them. It’s also integrated with LLM app development tools like LangChain, as well as Semantic Kernel, Azure AI Prompt Flow, and more. We also let you choose your embedding model for vector generation, such as ADA from OpenAI or Cohere.
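To illustrate the point about the unified API schema, here is a minimal sketch of why switching models is easy. The real client lives in the azure-ai-inference Python package (ChatCompletionsClient); since calling a live endpoint needs a deployment and key, this sketch uses a stand-in Endpoint type and hypothetical URIs to show that only the target URI and key change between models — the message payload stays the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    target_uri: str  # hypothetical, e.g. https://<name>.<region>.models.ai.azure.com
    key: str         # the deployment's API key


def build_request(endpoint: Endpoint, system: str, prompt: str) -> dict:
    """Assemble a chat-completions payload in the unified schema.

    The message shape is identical regardless of which model sits
    behind the endpoint, which is what makes model switching easy.
    """
    return {
        "url": f"{endpoint.target_uri}/chat/completions",
        "headers": {"api-key": endpoint.key},
        "body": {
            "messages": [
                {"role": "system", "content": system},
                {"role": "user", "content": prompt},
            ]
        },
    }


# Two hypothetical deployments of different models -- same code path for both.
phi = Endpoint("https://phi-demo.eastus.models.ai.azure.com", "KEY-1")
mistral = Endpoint("https://mistral-demo.eastus.models.ai.azure.com", "KEY-2")

req_a = build_request(phi, "You are a helpful assistant.", "Summarize Azure AI.")
req_b = build_request(mistral, "You are a helpful assistant.", "Summarize Azure AI.")
assert req_a["body"] == req_b["body"]  # only the URL and key differ
```

In the real SDK the same idea holds: you construct the client with a different endpoint URI and credential, and the rest of your code is unchanged.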

-Next, with access to so much choice, let’s get into choosing the right AI model for your needs. Here, it’s essential to clearly define your app’s use case and the specific tasks it needs to accomplish, where in the Azure AI studio, you can start by filtering models by inferencing task. For example, if natural language processing is the main priority, for tasks like chat completion, you can see recommendations for models like OpenAI ChatGPT, Microsoft’s various SLMs like Phi models, Meta’s Llama or Mistral as options. For audio-focused tasks like speech recognition or generating speech from text, you could consider OpenAI Whisper, or for computer vision tasks like text to image for generating contextually relevant images from text prompts, DALL-E 3 and Stability AI appear as potential options.

-Now, if you need more precision and domain knowledge, here is where you can proactively look for off-the-shelf models, for example, Nixtla’s TimeGEN model for time-series forecasting and anomaly detection. Additionally, if you and your team have the expertise, you can start with a foundational base model and fine-tune the model you want right from Azure AI. That said, ultimately, cost is top of everyone’s mind. Here, to optimize your app budget, you have the choice of hundreds of free open models, and even if you start there, you can move on to more performant models as needed. That’s where our Model as a Service comes in: its Serverless API option provides serverless pricing for dozens of foundational models, with pay-as-you-go billing for inference input and output tokens, so you literally pay as you go.
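Pay-as-you-go billing is easy to estimate ahead of time: multiply expected token volumes by per-token rates. The prices below are made-up placeholders per 1K tokens (check the model’s Azure Marketplace listing for real rates); the arithmetic is the point.

```python
# Back-of-the-envelope cost estimate for serverless (pay-as-you-go) inference.
# (input, output) USD per 1K tokens -- hypothetical placeholder rates.
PRICES_PER_1K = {
    "model-a": (0.0005, 0.0015),
    "model-b": (0.0030, 0.0090),
}


def monthly_cost(model: str, in_tokens: int, out_tokens: int) -> float:
    """Cost = input tokens x input rate + output tokens x output rate."""
    p_in, p_out = PRICES_PER_1K[model]
    return in_tokens / 1000 * p_in + out_tokens / 1000 * p_out


# 10M input / 2M output tokens per month on each model:
cost_a = monthly_cost("model-a", 10_000_000, 2_000_000)
cost_b = monthly_cost("model-b", 10_000_000, 2_000_000)
print(f"model-a: ${cost_a:.2f}, model-b: ${cost_b:.2f}")  # -> $8.00 vs $48.00
```

Running the same estimate across candidate models makes the quality-versus-cost trade-off discussed next concrete before you commit to a deployment.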

-Alternatively, you can choose to run hundreds of open models on hosted hardware with pay-per-GPU managed compute. The trade-offs based on your use case lie in model quality and the sophistication of the models themselves, combined with the impact on inference costs. As you start to build your app prototype, the good news is the studio in the Azure AI service makes it really easy for you to make decisions on choosing the right AI model for your app. One path is to choose one and experiment. Of course, you’ll need an Azure subscription and access to Azure AI for that, and once you select Deploy, that’s going to connect your Azure subscription with the Azure marketplace, so that you can be billed for use.

-From there, in the studio playground, it’s easy to test the model you’ve deployed by crafting the system message to instruct the model on the purpose and style of response, and you can experiment with sample prompts to test the output based on its open world knowledge. You can even continue this experimentation by adding your own data and testing the model responses in context of your data. And by the way, your prompts and completions are not shared with model providers or used for training models, it’s your private data. That said, you’ll likely want to compare multiple models, and that’s where model benchmarks come in. For example, if you’re looking to build an app primarily for chat completion, once you’ve filtered the list in the model catalog, you can head over to Model benchmarks, which are scored based on multiple industry datasets for breakdowns on each model across multiple categories. First is Model accuracy, which is just like it sounds. The line in the middle is based on averages across the different benchmarks.

-Next, Model coherence evaluates how well the model generates smooth and natural-sounding responses, then Model groundedness looks at how well the model refers to source materials in its default training set. Model fluency measures the language proficiency of answers, Model relevance scores how well the model meets expectations based on prompts, and Model similarity measures the similarity between a source data sentence and the generated response. So now, for example, if I want to optimize for Model coherence, I might decide to choose this Meta Llama 3.1 model, and I can also look at more details on the model and its pricing.
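The comparison the benchmarks page performs boils down to ranking models on the metric you care about. Here is a small sketch of that selection step; the scores are illustrative, not real benchmark numbers.

```python
# Illustrative per-model scores across a few of the quality categories
# described above (not real benchmark values).
scores = {
    "llama-3.1": {"accuracy": 0.78, "coherence": 4.6, "groundedness": 4.1},
    "model-x":   {"accuracy": 0.81, "coherence": 4.2, "groundedness": 4.4},
    "model-y":   {"accuracy": 0.74, "coherence": 4.5, "groundedness": 3.9},
}


def best_by(metric: str) -> str:
    """Return the model with the highest score on a single metric."""
    return max(scores, key=lambda model: scores[model][metric])


print(best_by("coherence"))  # optimizing for coherence -> "llama-3.1"
```

In practice you would weigh several metrics together (and pricing), but a single-metric ranking is often the starting point when one quality dimension dominates your use case.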

-You can also apply Model Benchmarks to select your embedding model to create vectors, where data is given numeric, coordinate-like values to map similar terms based on contextual similarity. These are then used for vector-based search to retrieve grounding data for models in Retrieval Augmented Generation. The most common embedding models are also compared in the model benchmarks, and here, you’ll see which embedding models perform best across categories like Classification, Clustering, and more.
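The vector-based retrieval described above typically reduces to cosine similarity between an embedded query and embedded documents. The 3-dimensional vectors below are toy values (real embedding models like ADA or Cohere return hundreds or thousands of dimensions), but the retrieval logic is the same.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy document embeddings (hypothetical values, tiny dimensionality).
docs = {
    "pricing page":  [0.9, 0.1, 0.0],
    "security docs": [0.1, 0.8, 0.3],
}
query = [0.85, 0.15, 0.05]  # toy embedding of "how much does it cost?"

# Retrieve the closest document to ground the model's answer (the R in RAG).
best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)  # -> "pricing page"
```

A better-performing embedding model places related text closer together in this vector space, which is why the benchmark categories like Classification and Clustering matter for retrieval quality.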

-Beyond the studio, you can also compare different models using the Azure AI model inference package, so that you can test models with your own data in your preferred coding environment. The only differences are the endpoint’s Target URI and Key, which makes it easy to switch between models. So here, for example, we have set up three different notebooks using three different models to test generated answers from our custom data. Running them all at the same time with the same prompt can help provide like-for-like comparisons, and once you’ve made your model selection and have your app running, you can continue to evaluate how well it performs in code. There are basic built-in evaluators for Relevance, Fluency, Coherence, and Groundedness, with scores for each on a one-to-five scale, averaged over a handful of runs. Additionally, you can use Application Insights dashboards to visualize model performance and other key metrics over time and across multiple runs, including detailed evaluation score trends, token usage over time and by model, which can help you evaluate costs, along with model duration, which is useful if you’re testing multiple models.
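The averaging step described above is straightforward to sketch: collect the four evaluator scores (each on a one-to-five scale) for every run, then average per metric. The run data below is illustrative, not output from the actual evaluators.

```python
from statistics import mean

# Hypothetical evaluator scores from three evaluation runs,
# each metric on a 1-5 scale as described in the transcript.
runs = [
    {"relevance": 4, "fluency": 5, "coherence": 4, "groundedness": 3},
    {"relevance": 5, "fluency": 4, "coherence": 4, "groundedness": 4},
    {"relevance": 4, "fluency": 5, "coherence": 5, "groundedness": 4},
]

# Average each metric across all runs.
averages = {metric: mean(run[metric] for run in runs) for metric in runs[0]}
print(averages)
```

Tracking these per-metric averages over time (for example, in an Application Insights dashboard) is what lets you spot a regression in, say, groundedness after swapping models or changing your grounding data.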

-So, now you know the essential steps for evaluating AI models based on your use case, from initial considerations to comparing models, and exploring deployment options with Azure AI. Beyond the model choices we give you, you can also benefit from responsible AI controls with content filters that work for prompt inputs as well as generated response outputs, and the Azure platform overall provides the scalability, intelligence, and security your Gen AI apps need, including extensive global data center reach, seamless integration with other Microsoft products, and of course, one of the most comprehensive suites of AI and machine learning tools and more.

-To learn more and get started, check out ai.azure.com, subscribe to Microsoft Mechanics for more explanations and tech updates, and thanks for watching.
