Welcome to this hands-on AI-900 lab session, where we explore the Azure AI Speech Service and its capabilities, including speech recognition, text-to-speech, and translation. Whether you're preparing for the Microsoft AI-900 Certification or looking to integrate AI-powered voice solutions into your applications, this step-by-step demo walks you through the service in Speech Studio.
🔍 What You’ll Learn in This Video:
1️⃣ Introduction to Azure AI Speech Service and its key features
2️⃣ Setting up Speech Services in the Azure portal
3️⃣ Converting speech to text (STT) and text to speech (TTS)
4️⃣ Speech translation for multilingual applications
5️⃣ Speaker recognition to identify and authenticate users
6️⃣ Real-world applications in voice assistants, call centers, and accessibility solutions
🛠️ Who Is This For?
Beginners exploring AI-powered voice and speech technologies
Professionals preparing for the Microsoft AI-900 Certification
Developers and businesses looking to integrate speech AI into their applications
📌 Key Highlights:
✅ Speech-to-text & text-to-speech with high accuracy
✅ Real-time speech translation for global communication
✅ Speaker recognition & voice authentication
✅ Step-by-step hands-on demo using Azure AI Speech Service
Explore Our Other Azure Courses and Practice material on: https://www.youtube.com/@skilltechclub
Transcript
00:00good morning good afternoon and good evening my name is Maruti and I would like to welcome you
00:14to skill tech club here we are going to learn new things every day related to cloud technologies and
00:20AI and today I'm here with the new service of Azure AI which is speech service we are going
00:27to learn about Azure AI speech service where we will talk about speech synthesis speech recognition
00:33and many more things step by step with the practical demo so let's get started I have
00:41successfully logged into my Azure portal and as you can see right now I am going to create a new
00:47resource so I'm going to click on a create new resource and the service which I am going to
00:52create today is a speech service remember speech service is available as a separate standalone
00:57service in Azure cloud as well as if you are creating Azure AI service multi-service account
01:02this is also going to be part of that we are creating a separate service which is a speech
01:07service I'm going to create a new resource group called AI 900 RG because all the services are part
01:14of the official course which is AI-900 region is East US which is fine I'm going to give a meaningful
01:21name of the service and I'm going to choose a pricing tier which can be free or standard I'm going
01:31with standard right now and then everything else is fine we'll click on review plus create most of
01:37the time when you are creating this kind of a service the provisioning of the service is going
01:41to happen in this Azure portal which is portal.azure.com but on the successful creation of that service you
01:47are going to use that with the help of a separate studio like for vision you have vision studio for
01:54document you have document studio same way for a speech service you have a speech studio and that's
02:00what exactly we are going to use I just clicked on a create and then it's going to submit my deployment
02:05and within few minutes this particular service is going to be available now because we are learning
02:11speech service right now let me tell you one thing that this is going to be very useful if you
02:16are having a requirement of converting your voice into text or maybe you have a text and you want
02:22to generate a specific voice with that even in the multiple languages yes we have something which is
02:28known as speech gallery where Microsoft is actually providing various voices with multiple languages
02:34I'm going to introduce speech gallery also today in this particular video so let's get started with
02:39that yes I think my deployment is complete it is showing me that speech service is successfully created
02:46and it's right now loading the page of this it's showing me status of the service is active
02:51and if I really want to use this I just have to click on go to speech studio let's do that
02:59now you can see right now when I click on the speech studio it is actually going to load
03:03a separate portal this is what we call speech studio and by default if you are clicking on this based on
03:11your azure portal service creation then it will associate that speech studio with your account
03:16and that's exactly what it's going to do initially once this is done
03:25yes I'm done with this I have logged into my speech studio right now with my account
03:29and if I just click on the settings section the first thing which I have to do is I have to make sure
03:35that my speech service resource which I have created is selected here you can see right now
03:39it's showing me that the current selection is selecting this service my subscription is also
03:44visible there and it's also showing me what kind of an access I have on this I'm good to go I'll go
03:51back to speech studio inside the speech studio the first thing which I want you to focus on is we have
03:56four separate sections we have speech capabilities by scenario and they are giving you certain scenarios and
04:02demos associated with that you have speech to text section where we are basically going to convert
04:07your speech into text then you have text to speech kind of a section also with that you have a voice
04:13gallery option and then you have a voice assistant kind of thing now let's start with the voice gallery
04:19first because we want to understand that what kind of voices microsoft is providing in this so I'm going
04:25to click on explore the voice gallery this is going to load a voice catalog where actually you're going
04:32to have multiple voice samples now if you want to generate a voice with a specific pronunciation or
04:39specific tone then that tonality that modality is also all going to be coming into this particular voice
04:44catalog as you can see if it is loaded here they are giving me a catalog with the multiple voices
04:50we have a female voice called ava we have andrew we have cora we have adam now all these are actually
04:57voice samples which i can actually check play and then try now you can see right now there is a
05:04speaking style which is also mentioned here like if i go into ava ava is a multilingual voice
05:09where they are actually providing in english which is based on united states plus 90 languages are available
05:16so for this single voice itself you have 90 different languages which are available and there
05:21are a couple of other voices also where you have 90 plus languages which are available now when i try
05:26to play this audio like this is my initial speaking style of ava i'm going to click on this play button
05:34cooking and gardening each episode is packed with inspiration and step-by-step tutorials to unleash your inner diy
05:42now this is perfectly fine this is speaking in english us and you can see we have 91 languages
05:47which are available in this let's say i want to see that how exactly this person is going to speak
05:52in german which is based in austria i'm going to play the sound das gesundheitsministerium gab grünes
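Picking a voice and a spoken language, as done in the gallery here, is usually expressed in an application through SSML (Speech Synthesis Markup Language). The following is a minimal sketch under stated assumptions: the helper name build_ssml is hypothetical, and the voice name en-US-AvaMultilingualNeural is assumed to be the multilingual voice heard in the video. It only builds the SSML text and does not call the service.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str,
               voice: str = "en-US-AvaMultilingualNeural",  # assumed voice name
               spoken_lang: str = "de-AT") -> str:
    """Build an SSML document that asks one voice to speak in a given locale.

    The <lang> element switches the spoken language of a multilingual voice;
    de-AT corresponds to "German (Austria)" as selected in the gallery.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        f"<lang xml:lang={quoteattr(spoken_lang)}>{escape(text)}</lang>"
        "</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    # Illustrative text only; any sentence in the target language works.
    print(build_ssml("Guten Morgen und willkommen"))
```

The resulting string would typically be sent to the text to speech endpoint of the Speech resource or passed to an SDK call that accepts SSML; which styles and languages a given voice supports should be checked in the voice gallery first.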
06:04now obviously i don't understand german but yeah this is how it's going to speak same way you can try
06:08different voices you can try different languages and then while configuring this in your application
06:14you can actually choose which voice which language and what style you want to use when you want to
06:20convert your text into speech this is awesome seriously now there are a couple of next steps which
06:26they are trying to highlight here that you can select your speech resource you can configure that thing
06:31accordingly which we are not going in depth right now but this is something which is actually giving you
06:35a voice gallery with multiple voices and then there are also some of the voices which are very popular
06:42if you want this voice to be used like a news anchor or if you want that this is something which is going
06:48to be used with some different kind of a style you can actually customize this thing with a separate markup
06:54language which we are going to see in the coming videos but yes right now this is my voice gallery i'm going
07:00back to my speech studio and inside the speech studio the next thing which i want to do is i want to
07:05associate captions with speech to text now because we saw that we have a voice gallery voice gallery will
07:11be useful when you're going to use text to speech so you have some text and you want to convert that
07:17into voice then you're going to use this now we can first try this one which is speech to text and then
07:23we are going to try text to speech so let's say i want to try this one which is real-time speech to text
07:30is taking me to this particular page it's selecting my service resource with that and all i'm going to do
07:36is i have to either drag and drop audio files if i have or i can click on the mic and then i can record
07:43my voice now let me try this once so i'm going to use mic first if this is not working i'll upload an
07:49audio file allow while visiting the site yes hey how are you doing i am just trying to check
07:58whether this is working or not and let's see now you can see i just stopped this thing and it's
08:08showing me hey how are you doing i'm just trying to check whether this is now this is something which
08:14i have said and then they have written this thing and this is stored here as one particular
08:18wave file now same way if i want to browse for a file
08:26i have some speech sample which i have downloaded from the official microsoft lab i'm opening that
08:36and you can see this particular audio file is actually having this text inside that
08:40so this is actually just having few lines of statement that ai enables us to build amazing
08:45software that can improve healthcare enable people to overcome physical disadvantages empower smart
08:52infrastructure create incredible entertainment experience and even save the planet now this is
08:59what which was mentioned in that audio file i'll know that thing now if you want to try the same lab
09:04with the step-by-step instructions this is an official lab of microsoft and the link of that official lab is
09:09given in the description of this video so i strongly recommend you to try this thing and check out this
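For readers who want to try the same file-based recognition outside Speech Studio, here is a minimal sketch that uses the Speech service REST API for short audio. The helper names build_stt_request and recognize_wav, the key placeholder, and the file name are illustrative assumptions and are not shown in the video; the region eastus matches the resource created earlier. The sketch assumes a 16 kHz PCM WAV file of under a minute.

```python
import json
import urllib.parse
import urllib.request


def build_stt_request(region: str, key: str, language: str = "en-US"):
    """Return the URL and headers for a short-audio speech to text request."""
    query = urllib.parse.urlencode({"language": language, "format": "simple"})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # key of the Speech resource
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return url, headers


def recognize_wav(path: str, region: str, key: str, language: str = "en-US") -> dict:
    """Upload a WAV file and return the service's JSON response as a dict."""
    url, headers = build_stt_request(region, key, language)
    with open(path, "rb") as audio:
        body = audio.read()
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Prints the request target only; no network call is made here.
    url, _ = build_stt_request("eastus", "<your-speech-key>")
    print(url)
```

Calling recognize_wav("sample.wav", "eastus", key) with a real key would return the JSON response; in the simple format the recognized text is expected under the DisplayText field.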
09:15now once we have the speech to text which is real time speech to text let's go back and let's try
09:21something which is this one which is live chat avatar okay it's showing me that region is not supported
09:31there are certain services in speech service which are still not generally available globally available in
09:35all the regions when you go for this kind of one you have to make sure that you are creating your
09:40resource in a specific region now you can see right now i have created my resource in east us region
09:45while this is saying that please switch to a resource in a supported region west us 2 west europe southeast
09:51asia south central us or whatever which are mentioned if i have created my resource in one of these
09:56regions then only i'll be able to use this thing now intentionally we have not created in this region
10:01because i want you to know this thing that regions are very important now right now we are not going
10:07with this one but in the coming videos if you want to see the demo of this one just comment down that
10:12in the comment of this particular video and we will work on it and we'll give you a separate video on that
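The region point above can be checked in code before a feature is used. This is a small sketch under stated assumptions: the region set contains only the names read out from the Speech Studio message in the video, so it is partial, and "westus2" is an assumed reading of what was said. The helper names are hypothetical.

```python
# Regions read out from the Speech Studio message in the video. The list is
# partial, and "westus2" is an assumed reading of the spoken region name.
REGIONS_FROM_VIDEO = frozenset(
    {"westus2", "westeurope", "southeastasia", "southcentralus"}
)


def normalize_region(name: str) -> str:
    """Turn a display name such as 'East US' into the short form 'eastus'."""
    return "".join(name.split()).lower()


def is_supported(region: str, supported=REGIONS_FROM_VIDEO) -> bool:
    """Return True when the resource region is in the supported set."""
    return normalize_region(region) in supported


if __name__ == "__main__":
    for region in ("East US", "West Europe"):
        status = "supported" if is_supported(region) else "not supported"
        print(f"{region}: {status}")
```

A check like this lets an application report an unsupported region early instead of failing on the first request; the authoritative list is the one in the current Speech service documentation.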
10:18right now let me go back to speech studio and this thing next which we are going to try is going to be
10:25this one pronunciation assessment with speech to text now i strongly need this because my pronunciation
10:32is not that great uh i am just going to try certain things so you can see right now they have given us
10:38certain samples here and then if you want to check your pronunciation with this you can try the sample
10:44record this thing and then based on that they are going to give you an assessment which is here so let's
10:49say i want to try sample number two right now uh i'm going to record an audio with a microphone and then i'm
10:55going to speak this
10:59it took me a long time to learn where he came from the little prince who asked me so many questions
11:05never seems to hear the ones i ask him it was from words dropped by chance that little by little
11:14everything was revealed to me
11:18now i just saved this and now if you check right now they are showing me that my pronunciation score is
11:2294 which is obviously not that great i made some mistakes somewhere i took a little longer pause
11:28so this is actually going to give you a full assessment of your pronunciation
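The same kind of assessment can be requested programmatically. With the REST API for short audio, the assessment settings travel in a Pronunciation-Assessment header that carries base64-encoded JSON. The sketch below only builds that header value; the function name is hypothetical, and the chosen settings (a 100-point scale, word-level detail) are assumptions for illustration rather than the settings Speech Studio uses.

```python
import base64
import json


def pronunciation_assessment_header(reference_text: str) -> str:
    """Return the base64-encoded JSON value for the Pronunciation-Assessment header."""
    params = {
        "ReferenceText": reference_text,  # the text the speaker is expected to read
        "GradingSystem": "HundredMark",   # scores on a 0 to 100 scale
        "Granularity": "Word",            # report scores down to word level
        "Dimension": "Comprehensive",     # accuracy, fluency and completeness
    }
    raw = json.dumps(params).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    header_value = pronunciation_assessment_header(
        "It took me a long time to learn where he came from."
    )
    print({"Pronunciation-Assessment": header_value})
```

The header would be added to a speech to text request that uploads the recorded audio, and the scores would then come back as part of the recognition response.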
11:32now this is one of the tools which you can use for a correct pronunciation before going for
11:36any particular online recording or video kind of thing now this is something which is going to
11:42be used when you are creating an application which is focusing on reading speaking and even
11:48gaming you can see there is a gaming section also where you can try this thing and then you can
11:53actually improve your pronunciation with that so there is a game association also with that
11:59right now we are going back to the next thing which is now text to speech now we already saw voice
12:04gallery let's say i want to focus on custom voice or a personal voice kind of a thing let's say we'll try
12:11the personal voice configuration right now when i click on personal voice they are first going to show me that
12:17we have multiple things where we can use the personal voice and there are certain samples which they
12:21have given so we can use it as a voice assistant broadcast conversation or story you can explore
12:29some sample human voice prompts and their replicas in the different scenarios and across multiple
12:34languages let's say you want to try your own voice then you can just add a new voice in that list
12:40and then you can try that let me try some of the voice assistant things here so this is a prompt i've
12:46turned off 10 a.m alarm good morning today's weather is sunny with a high of 75 degrees you have two
12:55meetings scheduled and a reminder to call your mom how can i assist you further so this is something which
13:02is showing you the sample of that particular voice which is perfect for voice assistants same way if you
13:07want some emotional voices where people are actually whispering or shouting or maybe very excited you
13:13can try that kind of voice what i just won the lottery we're going on a dream vacation now this is showing
13:20me that this voice is very excited well same way this is going to be shouting watch out the ball is heading
13:27right towards you and this is whispering sure here is the note what else can i do for you now all these
13:35styles are things you can use in your customization when you're developing an app or when you're trying
13:41to customize your voice with that last but not the least we have multilingual where as i told you from
13:46the voice gallery we can try different languages with that now we have a prompt customization and they
13:52are giving us the statements in english chinese and french as of now and sometimes german or spanish also
13:58you can try this thing and then if you want to add your own voice you can just add your voice your voice will
14:04appear here and then you can create an ai voice easily from a human voice sample so that is going to
14:09create an ai voice from your voice so that whenever you're going to use that everyone is going to feel
14:14like this is the same person who's actually talking now right now if i click on this new voice it's actually
14:20showing me that background noise should not be there you should relax and then you should try a good
14:25microphone and then you can do this thing i'm not doing this right now but i'll surely request you that
14:30you should try this thing let's move forward if i click on got it it's going to show me that you
14:38just have a new voice configuration you just mention i then state your first and last name am aware that
14:45recordings of my voice will be used to create and use a synthetic version of my voice and this is
14:51something which i have to do here now i'm going to mention this thing and i'm going to state the
14:56name of the company so this is something which i'm going to say right now but because these are
15:01personal information i do not want to do this thing so i strongly recommend you try this thing
15:06you can specify your voice talent name your company name into this and then you can provide this data
15:12and that's going to create your own voice which you can use anywhere also there is one particular
15:19message here which is for responsible use of ai whenever you are using any ai service you have to make
15:24sure that you are following responsible ai guidelines even microsoft and all the countries across the
15:31globe are taking strict actions if you're not following or if you're not using ai responsibly so make sure
15:37whatever you're going to use you're not going to misuse that so that's something which you have to
15:43understand from the responsible ai notice thank you so much this is maruti signing off i'll see you tomorrow