In addition to the public services and APIs offered by the major AI providers, there are situations where we prefer to deploy a model within our own tenant to ensure better alignment with legislation, data compliance, and data-residency requirements.
Microsoft offers a straightforward solution for this through Azure AI Foundry.
The Azure AI Foundry portal is a Microsoft platform that simplifies deploying AI models. It provides a centralized hub to build, manage, and launch AI solutions, including generative models, using tools and prebuilt options from OpenAI and others.
The initial step is to set up a project (and a hub, if you don’t already have one for your projects).
We can personalize the project and resource names. For this tutorial, I’ll call the project "AI-tutorial-demo-2025" and name the hub "Singularity" (inspired by Ray Kurzweil’s book The Singularity Is Near). I’ll also update the resource group name to "rg-ai-demo-2025". An Azure AI/OpenAI service will also be created.
* Pay attention to model availability by region, as it changes frequently. The current list can be found on the Azure OpenAI Service models page on Microsoft's website.
After clicking Next and Create, the resources will be provisioned and we can see them in Azure:
The next step is to deploy one of the available models; in this case I will deploy gpt-4o.
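If you prefer scripting over clicking through the portal, the same deployment can also be created programmatically. Below is a minimal sketch assuming the azure-identity and azure-mgmt-cognitiveservices Python packages; the subscription ID, account name, model version, and SKU are illustrative placeholders, not values taken from this walkthrough.

# Sketch: deploying gpt-4o on an existing Azure AI/OpenAI resource.
# Assumes the azure-identity and azure-mgmt-cognitiveservices packages;
# all names and values below are placeholders to adapt to your tenant.
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<your-subscription-id>",  # placeholder
)

poller = client.deployments.begin_create_or_update(
    resource_group_name="rg-ai-demo-2025",
    account_name="<your-ai-services-account>",  # placeholder
    deployment_name="gpt-4o",
    deployment={
        "properties": {
            # Assumption: pick a model version currently available in your region.
            "model": {"format": "OpenAI", "name": "gpt-4o", "version": "2024-08-06"},
        },
        # Assumption: adjust the SKU and capacity to your quota.
        "sku": {"name": "GlobalStandard", "capacity": 10},
    },
)
print(poller.result().name)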
As you can see, we now have an "Open in playground" button. This opens a chat session and lets us customize parameters such as temperature, top P, and past messages included, or even add our own data for a RAG implementation. For this article I will leave them at their defaults; I will address RAG and other implementations in a later article.
We can also modify what is called the "system prompt": the prompt that instructs the model how to behave and provides initial context. For this demo I will update the system prompt so that the model has some initial context for its replies.
After we have this configured, we can start chatting to test how the model responds.
As we can see, the model is aware of the information we provided in the system prompt and uses it in its replies.
Now that we see the model is working as expected, we can click "View code" to see details about how the model is configured.
{
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "You are an AI assistant created by Valentin Valeanu. Your job is to help Valentin to demonstrate how Azure AI Foundry works for a tutorial on his blog."
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "hello, how are you?"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "Hello! I'm just a virtual assistant, so I don't have feelings, but I'm here and ready to help you demonstrate how Azure AI Foundry works for your tutorial. Let me know how you'd like to proceed! 😊"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "tell me more about you."
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "Of course! I’m an AI assistant created by you, Valentin Valeanu, to help demonstrate the capabilities of Azure AI Foundry for your blog tutorial. My purpose here is to showcase how Azure's advanced AI tools can be used to build interactive and intelligent systems.\n\nI can help you simulate real-world use cases, answer questions, perform tasks, and even generate creative ideas. My architecture is powered by state-of-the-art AI models, which can process natural language, analyze data, and assist in building solutions using Azure's AI ecosystem.\n\nLet me know what you'd like me to do for your tutorial—whether it’s exploring Azure Cognitive Services, building machine learning models, or just showing off how conversational AI works! 😊"
        }
      ]
    }
  ],
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 800
}
As we can see from the JSON, the system prompt is passed to the model as the initial message with the system role, followed by my first message with the user role. All subsequent chat messages are appended to the payload so the model has a "memory"; how much of the history is kept is controlled by the "past messages included" parameter.
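Outside the playground, the same request can be reproduced with the openai Python package against the Azure endpoint. This is a minimal sketch: the endpoint, API key, and API version are placeholders to replace with the values shown on the "View code" page, and the history-trimming line is a hypothetical illustration of what "past messages included" does.

# Sketch: calling the gpt-4o deployment with the same payload the playground
# generates. Endpoint, key, and API version are placeholders -- copy the real
# values from the "View code" page of your deployment.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder
    api_key="<your-api-key>",                                    # placeholder
    api_version="2024-06-01",  # assumption: any recent GA API version works
)

SYSTEM_PROMPT = (
    "You are an AI assistant created by Valentin Valeanu. Your job is to help "
    "Valentin to demonstrate how Azure AI Foundry works for a tutorial on his blog."
)

# The chat history we send back on every request -- this is the model's "memory".
history = [{"role": "user", "content": "hello, how are you?"}]

# Hypothetical equivalent of the playground's "past messages included" setting:
# keep only the last N messages before prepending the system prompt.
PAST_MESSAGES_INCLUDED = 10
messages = [{"role": "system", "content": SYSTEM_PROMPT}] + history[-PAST_MESSAGES_INCLUDED:]

response = client.chat.completions.create(
    model="gpt-4o",  # for Azure this is the deployment name, not the base model name
    messages=messages,
    temperature=0.7,
    top_p=0.95,
    max_tokens=800,
)
print(response.choices[0].message.content)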
I will write another article on how to test the model endpoint using Postman and how to include the AI model in a website.
Featured image created with Grok.