AI Voice Agents: Full Guide from Beginner to Pro
Full transcript
hey guys welcome back to the channel in
this video I'm going to provide you with
a complete guide on AI voice assistants
I'm going to answer some frequently
asked questions break down how AI voice
agents work today and cover some use
cases for businesses however the biggest
part of this video will be dedicated to
reviewing and comparing the three best
tools for building these assistants and
of course I'll include some practical
tutorials that's for sure so that by the
end of this video you'll be able to
build these AI voice solutions yourself
if you're new to the channel my name is
Bogdan and I run an AI automation agency
called bosar agency we specialize in
providing AI chatbots AI voice systems
and automation solutions for businesses
so if you're looking for an AI caller for
your business feel free to get in touch
at bosar dot agency I've been in this
space for quite a while now and we
receive a lot of feedback on how people
perceive this technology today so you
know there is a common skepticism
towards AI-based voice assistants and I
think it's important to address that
upfront so I'll go over the main
concerns and questions just to make sure
we are all on the same page when it
comes to the types of solutions we are
discussing and the tasks we are
considering them for so the general
perception is that AI is going to change
a lot but today it's not there yet it is
frustrating for customers to talk to
robots they prefer human interactions
these AI voice assistants can't really
handle complex tasks etc so long story
short AI callers are not there yet and
human interactions are just better and I
agree with that human interactions are
better AI is not here to replace humans
at least at this stage but it is here to
help us automate all the generic and you
know repetitive tasks such as answering
FAQs scheduling appointments redirecting
the calls to the right department and
troubleshooting for these kinds of tasks
AI is doing a great job many businesses
have implemented these solutions and
they're saving a lot of money by not
wasting human resources on generic tasks
and it is worth noting that even these
basic tasks were not handled well by
older technology a good comparison here
is chatbots we used to have old school
rule-based chatbots that were pretty
limited they were predefined with a set
of rules and they couldn't handle
anything outside those rules but now
those chatbots are powered by AI right
and they use large language models to
understand the user's query and then
generate an answer rather than just
spitting out a preset response the same
thing is happening with voice technology
we used to have those awful interactive
voice response systems or IVR systems
that work like a phone tree where
everything was preconfigured press one
for this department press two for
another and so on but now we have
AI-powered voice assistants that use large
language models to understand the user's
inquiry and then generate an answer on
the spot without the need for
preconfigured options the output is
generated by AI based on the user's
input making the whole experience much
more dynamic and effective okay this
little flowchart illustrates how
voice-based AI systems work it starts
with voice input which is converted to
text using speech recognition technology
the text input is then processed by the
large language model to generate the
text output and then this text is
converted back to speech using text to
speech technology resulting in the voice
output of course it is incredibly
simplified in between it probably uses
some kind of a logic platform to manage
the whole process but for now I want you
to understand that it takes audio input
converts it to text for processing and
then it converts the text output back to
audio that's how it works right now there
is no direct audio input and output so
for example if you breathe heavily it
wouldn't be able to understand it
because it is hard to convert your
breath into text right however OpenAI
has released direct audio input and
output for a handful of users and it is
expected to be fully released by the end
of the year and in my opinion it will
make the communication with this
voice-based assistant even more
realistic and hence it will further
increase the demand for these kinds of
solutions but for now I like to think of
it as an AI text-based chatbot which
you're probably already familiar with
the only difference is that there are
two extra steps speech recognition at
the start and text to speech at the end
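The three stages described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular platform implements it: the class name `VoicePipeline` and the three stand-in functions are hypothetical, and the stand-ins just pass text through as bytes so the example runs without any speech or LLM provider.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical wrapper: speech recognition -> LLM -> text to speech."""
    speech_to_text: Callable[[bytes], str]
    generate_reply: Callable[[str], str]
    text_to_speech: Callable[[str], bytes]

    def handle_turn(self, audio_in: bytes) -> bytes:
        text_in = self.speech_to_text(audio_in)    # extra step 1: audio -> text
        text_out = self.generate_reply(text_in)    # same step a text chatbot has
        return self.text_to_speech(text_out)       # extra step 2: text -> audio


# Stand-ins (assumptions for illustration only). A real system would call
# a speech recognition provider, an LLM, and a text to speech provider here.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_llm, fake_tts)
reply_audio = pipeline.handle_turn(b"what are your opening hours")
print(reply_audio)
```

Only the middle step is shared with a text chatbot, which is why existing chatbot logic carries over: swapping `fake_llm` for real chatbot logic leaves the other two stages untouched.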
if you've already built AI text-based
chatbots maybe using Voiceflow or other
chatbot builders you can apply the same
logic here you already have the skills
you just need to switch to tools that
support AI voice-based assistants and
I'll show you exactly how to do it later
in this video but before that let's
answer the most common questions right
away is it usable yet yes businesses are
actively adopting these solutions I'm
impressed by how much platforms for
building these voice systems have
improved over the last few months you
can now control latency you can add
filler sounds like hmm you know to make it
sound more human plus you can implement
custom functions allowing these
assistants to not only answer questions
but handle tasks like scheduling
appointments creating CRM records making
outbound calls qualifying leads and much
more I've seen people implementing these
solutions for real estate agents law
firms dentists roofing companies and
many more really any business that has
a fairly straightforward workflow and
handles a good number of calls each
month can benefit from this technology
the second question is can it speak
languages other than English we've had
leads requesting bots in Spanish in
Czech and the answer is yes yes it can
but the quality depends on the language
and on the provider on the technology
provider for text to speech right but if
we take ElevenLabs for example they provide
a high-quality text to speech model that
supports 32 languages including Spanish
German French Italian Russian and more
the next question is what does it cost
well it depends on the tool you use some
of them may charge a monthly
subscription fee others can charge per
minute so they offer a pay as you go
model right you also have to consider
a few factors like technology providers
for speech recognition for text to
speech for LLM processing but let's
say we use Vapi for the system
infrastructure GPT-3.5 Turbo as the LLM
and ElevenLabs for text to speech in that
case you'd be looking at 13 cents per
minute or 65 cents for a 5 minute call
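The per-minute arithmetic above can be written out as a short Python sketch. The 13 cents per minute rate comes from the example in the video; the function names and the monthly call volume are made up for illustration, and real pricing depends on the providers you choose.

```python
def call_cost_cents(minutes: int, rate_cents_per_minute: int = 13) -> int:
    """Cost of one call in cents at a flat per-minute rate."""
    return minutes * rate_cents_per_minute


def monthly_cost_dollars(calls: int, avg_minutes: int,
                         rate_cents_per_minute: int = 13) -> float:
    """Rough monthly bill in dollars for a given call volume."""
    total_cents = calls * call_cost_cents(avg_minutes, rate_cents_per_minute)
    return total_cents / 100


# The 5 minute call from the example
print(call_cost_cents(5))            # 65

# Hypothetical volume: 200 calls a month averaging 5 minutes each
print(monthly_cost_dollars(200, 5))  # 130.0
```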
another question is how customizable or
how flexible it is to meet your business
needs I already mentioned this you can
fully customize the behavior of the bot
and you can integrate it with your
website with your CRM system calendars
anything that has API endpoints this is
what we do at our agency we create these
systems and we build custom integrations
with your business so again if you need
a solution like that definitely reach
out to us the last question is about
speech recognition and robotic sound as
I said the technology is evolving
rapidly it's already much better than it
was just a few months ago and it's only
becoming better and better as we move
forward its ability to recognize speech
handle pauses and understand when you
want to interrupt it is already
impressive robotic sound is no longer an
issue I'm telling you it's becoming
increasingly hard to tell the difference
between these systems and a real human
so don't judge based on an interaction
you had with an AI voice bot a long time
ago the technology has changed and you
don't know how that bot was built as