AI Voice Agents: Full Guide from Beginner to Pro

1h 42m 43s · 17,108 words · 2,384 segments · English

Full transcript

0:00

hey guys welcome back to the channel in

0:02

this video I'm going to provide you with

0:03

a complete guide on AI voice assistants

0:07

I'm going to answer some frequently

0:09

asked questions break down how AI voice

0:12

agents work today and cover some use

0:15

cases for businesses however the biggest

0:17

part of this video will be dedicated to

0:20

reviewing and comparing the three best

0:22

tools for building these assistants and

0:26

of course I'll include some practical

0:27

tutorials that's for sure so that by the

0:30

end of this video you'll be able to

0:32

build these AI voice solutions yourself

0:35

if you're new to the channel my name is

0:37

Bogdan and I run an AI automation agency

0:40

called bosar agency we specialize in

0:43

providing AI chatbots AI voice systems

0:46

and automation solutions for businesses

0:48

so if you're looking for an AI caller for

0:51

your business feel free to get in touch

0:53

at bosar.agency I've been in this

0:56

space for quite a while now and we

0:58

receive a lot of feedback on how people

1:01

perceive this technology today so you

1:03

know there is a common skepticism

1:05

towards AI-based voice assistants and I

1:09

think it's important to address that

1:10

upfront so I'll go over the main

1:13

concerns and questions just to make sure

1:15

we are all on the same page when it

1:17

comes to the types of solutions we are

1:19

discussing and the tasks we are

1:22

considering them for so the general

1:24

perception is that AI is going to change

1:26

a lot but today it's not there yet it is

1:29

frustrating for customers to talk to

1:32

robots um they prefer human interactions

1:35

these AI voice assistants can't really

1:37

handle complex tasks etc so long story

1:40

short AI callers are not there yet and

1:43

human interactions are just better and I

1:45

agree with that human interactions are

1:48

better AI is not here to replace humans

1:51

at least at this stage but it is here to

1:54

help us automate all the generic and you

1:57

know repetitive tasks such as answering

2:00

FAQs scheduling appointments redirecting

2:03

the calls to the right department uh

2:06

troubleshooting for these kinds of tasks

2:08

AI is doing a great job many businesses

2:11

have implemented these solutions and

2:13

they're saving a lot of money by not

2:16

wasting human resources on generic tasks

2:19

and it is worth noting that even these

2:22

basic tasks were not handled well by

2:25

older technology a good comparison here

2:27

is chatbots we used to have old school

2:30

rule-based chatbots that were pretty

2:32

limited they were predefined with a set

2:35

of rules and they couldn't handle

2:36

anything outside those rules but now

2:40

those chatbots are powered by AI right

2:43

and they use large language models to

2:47

understand the user's query and then

2:48

generate an answer rather than just

2:50

spitting out a preset response the same

2:53

thing is happening with voice technology

2:56

we used to have those awful interactive

2:58

voice response systems like IVR systems

3:01

that worked like a phone tree where

3:03

everything was preconfigured press one

3:06

for this department press two for

3:08

another and so on but now we have AI

3:12

powered voice assistants that use large

3:14

language models to understand the user's

3:16

inquiry and then generate an answer on

3:19

the spot without the need for

3:20

preconfigured options the output is

3:23

generated by AI based on the user's

3:25

input making the whole experience much

3:27

more dynamic and effective okay this

3:29

little flowchart illustrates how

3:31

voice-based AI systems work it starts

3:34

with voice input which is converted to

3:37

text using speech recognition technology

3:40

the text input is then processed by the

3:43

large language model to generate the

3:46

text output and then this text is

3:48

converted back to speech using text to

3:51

speech technology resulting in the voice

3:53

output of course it is incredibly

3:55

simplified in between it probably uses

3:57

some kind of a logic platform to manage

3:59

the whole process but for now I want you

4:02

to understand that it takes audio input

4:04

converts it to text for processing and

4:07

then it converts the text output back to

4:10

audio that's how it works right now there

4:12

is no direct audio input and output so

4:15

for example if you breathe heavily it

4:18

wouldn't be able to understand it

4:19

because it is hard to convert your

4:21

breath into text right however OpenAI

4:25

has released direct audio input and

4:27

output for a handful of users and it is

4:29

expected to be fully released by the end

4:31

of the year and in my opinion it will

4:34

make the communication with these

4:36

voice-based assistants even more

4:38

realistic and hence it will further

4:41

increase the demand for these kinds of

4:43

solutions but for now I like to think of

4:45

it as an AI text-based chatbot which

4:48

you're probably already familiar with

4:50

the only difference is that there are

4:52

two extra steps speech recognition at

4:55

the start and text to speech at the end
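To make that flow concrete, here is a minimal sketch of one conversation turn, assuming three interchangeable stages. The names `VoicePipeline`, `fake_stt`, `fake_llm` and `fake_tts` are hypothetical placeholders invented for illustration; they are not the API of any tool mentioned in this video, and a real build would swap in an actual speech recognition service, LLM and text to speech provider.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains the three stages: speech recognition -> LLM -> text to speech."""
    speech_to_text: Callable[[bytes], str]
    generate_reply: Callable[[str], str]
    text_to_speech: Callable[[str], bytes]

    def handle_turn(self, audio_in: bytes) -> bytes:
        text_in = self.speech_to_text(audio_in)   # extra step 1: audio -> text
        text_out = self.generate_reply(text_in)   # the step a text chatbot already does
        return self.text_to_speech(text_out)      # extra step 2: text -> audio


# Placeholder stages (assumptions) so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are already words


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"  # stand-in for a real model call


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend these bytes are synthesized audio


pipeline = VoicePipeline(fake_stt, fake_llm, fake_tts)
reply_audio = pipeline.handle_turn(b"book an appointment")
print(reply_audio.decode("utf-8"))
```

Only the middle stage overlaps with a text chatbot, which is why existing chatbot logic tends to carry over once the two audio stages are added around it.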

4:58

if you've already built AI text-based

5:00

chatbots maybe using Voiceflow or other

5:03

chatbot builders the same logic you can

5:05

apply here you already have the skills

5:07

you just need to switch to tools that

5:09

support AI voice-based assistants and

5:12

I'll show you exactly how to do it later

5:14

in this video but before that let's

5:16

answer the most common questions right

5:20

away is it usable yet yes businesses are

5:24

actively adopting these solutions I'm

5:26

impressed by how much platforms for

5:29

building these voice systems have

5:30

improved over the last few months you

5:33

can now control latency you can add

5:35

filler sounds like hmm you know to make it

5:38

sound more human plus you can implement

5:40

custom functions allowing these

5:42

assistants to not only answer questions

5:45

but handle tasks like scheduling

5:47

appointments creating CRM records making

5:50

outbound calls qualifying leads and much

5:53

more I've seen people implementing these

5:55

solutions for real estate agents law

5:58

firms dentists roofing companies and

6:01

many more really any business that has

6:04

a quite straightforward workflow and

6:06

handles a good number of calls each

6:08

month can benefit from this technology

6:10

the second question is can it speak

6:13

other languages than English we've had

6:15

leads requesting bots in Spanish in

6:18

Czech and the answer is yes yes it can

6:21

but the quality depends on the language

6:23

and on the provider on the technology

6:26

provider for text to speech right but if

6:29

we take ElevenLabs for example they provide

6:32

a high-quality text-to-speech model that

6:34

supports 32 languages including Spanish

6:37

German French Italian Russian and more

6:40

the next question is what does it cost

6:42

well it depends on the tool you use some

6:44

of them may charge a monthly

6:46

subscription fee others can charge per

6:48

minute so they offer a pay-as-you-go

6:51

model right uh you also have to consider

6:54

a few factors like technology providers

6:56

for speech recognition for text to

6:59

speech for uh LLM processing but let's

7:02

say we use Vapi for the system

7:05

infrastructure GPT-3.5 Turbo as the LLM

7:09

and ElevenLabs for text to speech in that

7:12

case you'd be looking at 13 cents per

7:15

minute or 65 cents for a 5 minute call
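The arithmetic behind that figure is simple enough to sketch. The 13 cents per minute rate comes from the example above; the monthly call volume used below (400 calls of 5 minutes each) is a made-up number for illustration only, and the function names are hypothetical. Actual per-minute pricing depends on the providers you pick.

```python
RATE_CENTS_PER_MINUTE = 13  # figure quoted in the example above


def call_cost_cents(minutes: int, rate_cents: int = RATE_CENTS_PER_MINUTE) -> int:
    """Cost of a single call in cents under pay-as-you-go pricing."""
    return minutes * rate_cents


def monthly_cost_cents(calls_per_month: int, avg_minutes: int,
                       rate_cents: int = RATE_CENTS_PER_MINUTE) -> int:
    """Rough monthly cost in cents, assuming every call lasts avg_minutes."""
    return calls_per_month * call_cost_cents(avg_minutes, rate_cents)


print(call_cost_cents(5))                    # 5-minute call from the example: 65
print(monthly_cost_cents(400, 5) / 100)      # hypothetical volume, in dollars: 260.0
```

Working in whole cents keeps the numbers exact; convert to dollars only when displaying the result.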

7:18

another question is how customizable or

7:21

how flexible it is to meet your business

7:23

needs I already mentioned this you can

7:25

fully customize the behavior of the bot

7:29

and you can integrate it with your

7:31

website with your CRM system calendars

7:34

anything that has API endpoints this is

7:36

what we do at our agency we create these

7:39

systems and we build custom integrations

7:42

with your business so again if you need

7:44

a solution like that definitely reach

7:46

out to us the last question is about

7:48

speech recognition and robotic sound as

7:51

I said the technology is evolving

7:53

rapidly it's already much better than it

7:55

was just a few months ago and it's only

7:57

becoming better and better as we move

8:00

forward its ability to recognize speech

8:02

handle pauses and understand when you

8:04

want to interrupt it is already

8:06

impressive robotic sound is no longer an

8:09

issue I'm telling you it's it's becoming

8:12

increasingly hard to tell the difference

8:15

between these systems and a real human

8:17

so don't judge based on an interaction

8:20

you had with an AI voice bot a long time

8:23

ago the technology has changed and you

8:26

don't know how that bot was built as
