
Nathan French from Mia.inc #1

Okay.

So, Nathan, the first question I wanted to ask you is: when you're building with voice AI, when you're building in voice,

if you could wave a magic wand at anything to make it better, easier, faster,

Yeah.

What would you be waving the magic wand at?

Yeah, that's a good question.

There are a lot of things at the moment.

I would say evals are one of the more difficult things to pin down. Getting reliable ways to assess how well your agent's doing in production is surprisingly difficult, because the models are so non-deterministic that you end up getting a lot of issues you wouldn't have expected if you just manually regression test something.

So I think just having an eval system where you can test tens of thousands of calls across a broad set of test cases, and have that actually mimic what you'd see in production. And if you make an improvement, have a simulated test suite for that improvement to make sure it actually improves whatever you're trying to measure.

But yeah, there are a lot of challenges within evals specifically, just because a lot of the time there isn't a set metric that you're trying to hit.

There are some examples where, if you want the agent to book a test drive or something, then it's very objective: was this tool call made, and did the action actually happen in the back end? That's a very objective thing.

But there are a lot of non-objective things that are really hard to put a value on in a way that makes sense.
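For the objective case, the check can be as simple as asserting on the tool call and on the backend record. A minimal sketch in pytest style, where the tool name, the `call_trace` structure, and the `crm` client are all hypothetical placeholders rather than a real API:

```python
# Hypothetical objective eval: did the agent invoke the booking tool,
# and did a booking actually land in the backend?
def test_agent_books_test_drive(call_trace, crm):
    tool_calls = [call["name"] for call in call_trace["tool_calls"]]
    assert "book_test_drive" in tool_calls, "agent never invoked the booking tool"

    booking = crm.find_booking(phone=call_trace["caller_phone"])
    assert booking is not None, "tool was called, but no booking exists in the backend"
```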

Yeah, is there anything that you've tried, anywhere you've gotten to with that?

Yeah, yeah, there are a lot of things. I use this thing called DeepEval, which is just a pretty standard pytest-based LLM eval framework, but it's not really meant for voice. So we just use it on the transcripts themselves and only deal with text-based evals.
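As a rough illustration of that transcript-only approach, a DeepEval test over a single transcript turn might look something like the sketch below; the example turn and the threshold are made up:

```python
# Text-based eval over one call-transcript turn, using DeepEval's pytest-style API.
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_transcript_turn_relevancy():
    test_case = LLMTestCase(
        input="Hi, I'd like to book a test drive for Saturday morning.",
        actual_output="Sure, I can help with that. Which vehicle are you interested in?",
    )
    # Fails the test run if the agent's reply isn't relevant enough to the caller's request.
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```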

And then we're using this thing called Hamming. I think they went through YC a couple of years back.

But they,

sorry.

Well, sorry, I'm just looking them up.

Hamming, you said?

Yeah, yeah.

H-A-M-M-I-N-G-A-I.

And it's founded by a guy named Sumanyu. Super smart team.

The problem they solve is this eval problem on voice agents, but it's just a very difficult problem to solve.

So we're implementing them now, but we'll see how much of a lifesaver they are.

Hopefully they help a lot, but yeah, there are so many cases to cover, and it's hard to get 100% test coverage in real life, in production.

And so, almost asking the question again: based on evals and what you have at the moment,

if you could wave a magic wand at what you have, what would you change right now?

Yeah, I mean, there are just a lot of things that we haven't been able to get a good way of testing.

So, for instance, pronunciation issues have been a huge thing recently with a lot of our customers. If you're a car dealership implementing our solution and our system is mispronouncing your dealership name, that's one of the biggest issues: why would you buy the solution?

It's embarrassing, because it's a bad reflection on them.

So how do you test for bad pronunciations in a way that guarantees the model is going to pronounce things right?

Because text-to-speech models like Cartesia are pretty unreliable, and they have a lot of issues that you just can't really fix.

So yeah, I'd say there are just so many tests that we don't have.

So the magic wand would be: if I could have a metric that we can test against and try to optimize across the board, that would be fantastic. Have a metric for how well the model is pronouncing, I don't know, a set of keywords or something.

But yeah, there's a lot of things that are really hard to test for.
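One sketch of what such a metric could be: round-trip each keyword through TTS and back through STT, and score the result with plain string similarity. The `synthesize` and `transcribe` callables stand in for whichever TTS and STT providers are in use (they are not shown here), and the 0.9 threshold is arbitrary:

```python
import difflib

def pronunciation_score(phrase: str, synthesize, transcribe) -> float:
    """Round-trip a phrase through TTS then STT and compare it to the expected text."""
    audio = synthesize(phrase)                 # placeholder: text -> audio bytes
    heard = transcribe(audio).lower().strip()  # placeholder: audio -> text
    return difflib.SequenceMatcher(None, phrase.lower(), heard).ratio()

# Flag any dealership name whose round-trip similarity drops below the threshold.
KEYWORDS = ["Smith Honda of Springfield", "Lakeside Motors"]

def test_keyword_pronunciation(synthesize, transcribe):
    for name in KEYWORDS:
        assert pronunciation_score(name, synthesize, transcribe) >= 0.9, name
```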

So does that answer the question?

Yeah, that's really, really valuable.

If we were able to deliver this, and you could test pronunciation and really feel confident in it, get that value of running tens of thousands of calls and have development match production, how would it change your life?

Yeah, I think mostly it would make us a lot more certain of the guarantees of the system.

Because right now we have to provide SLAs, and we have to provide certain guarantees that, for a lot of things, customers will receive the service they're asking for.

With these systems, we can build the most reliable thing, but the fact of the matter is we're relying on external providers in some ways.

We're always going to have to use external providers, like Cartesia for text-to-speech and Deepgram for speech-to-text.

Just having more confidence that the model is actually doing what you want it to do, and that our voice system is doing what we want it to do.

Because right now, we manually regression test a lot.

We have a couple of engineers whose time is dedicated to calling the test numbers, just calling them probably hundreds of times.

So it would save us a lot of development time, that's the main thing.

Yeah, in addition to giving us confidence and everything.

But yeah, it's a big problem.

Considerable time saving then if you could automate that.

Yeah, for sure.

For sure.

Yeah, it would be

easily hundreds of hours a week.

Probably good for your developer retention, I guess.

Yeah, luckily we haven't had issues with that, but

yeah, yeah, yeah.

But for sure, I'd be working on it, not having to test it.

I'll say that.

Is there anything about this, and this might sound a bit pie in the sky, is there anything about the world now that makes this more valuable than it was, say, a year ago?

Yeah, yeah.

I think voice AI in general wasn't really feasible a year ago, just because the models were so much more unreliable, and it was so new as well that people were kind of scared to implement something so different.

Yeah, there's a lot of social reasons why.

I think people have worries that their customers aren't going to receive talking to an AI as well as talking to a real human.

So with things this disruptive, I think there's typically some tentativeness in implementing it.

So I think now we're finally seeing the models get good enough that people like talking to them, and people can actually see them solving big problems that they have.

So for dealerships, inbound lead generation is a huge thing.

And normally with car dealerships, the team just doesn't have enough capacity to actually field all the people calling in.

So those people will just go to a different dealership and take their money elsewhere.

So I think there are niche cases like that where there's just a massive benefit to having even a simple agent that can at least field those calls and schedule an appointment.

I just think that

the models are so much better now that it's possible to have voice AI that does something useful.

Whereas before, even structured output wasn't reliable at all. So you couldn't really interface with an LLM's functions. You just kind of had to hope, and use them more like chatbots.
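For context, "interfacing with functions" here means the model emits structured output that can be validated before anything touches the backend. A minimal sketch with an illustrative schema, not Mia's actual interface:

```python
from pydantic import BaseModel, ValidationError

class BookTestDrive(BaseModel):
    customer_name: str
    vehicle_model: str
    appointment_time: str  # would be an ISO 8601 timestamp in a real system

def parse_tool_call(raw_json: str) -> BookTestDrive | None:
    """Validate the model's JSON before calling any backend function."""
    try:
        return BookTestDrive.model_validate_json(raw_json)
    except ValidationError:
        return None  # unreliable output falls through instead of crashing the call
```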

But yeah, there's been a lot of... yeah.

Sorry, sorry, I interrupted you, Nathan.

Oh, no, no worries.

I might ask you, sorry, this will be very loopy.

I'm in a for loop, basically.

But

you had evals as the number one thing that you would wave your magic wand at.

Is there anything else that would, you know, come after that?

Yeah.

Just in terms of like the voice stack?

Well, yes, or yeah, anything that's causing you a lot of problems right now as an engineer?

Yeah.

Yeah.

There are a lot of small details, but I don't know how specific you want this to be.

Yeah, I'd love to hear anything that's on your mind.

Yeah.

Eval is definitely the big one.

What else?

I mean, I think latency is always a consideration. So, how do we...

Everything kind of ties back to evals, though, because you can optimize models, you can take open source models and quantize them.

You can do so many things for latency, but it all ties back to evals because you can try out a million things.

And if you know the performance isn't degrading past a certain percent, you can accept the latency gains. A lot of the time I can complain about latency, but if we had super strong evals, then I could test out as many different methods as I want to reduce latency.

And then I can be 100% sure that there isn't going to be some sort of numerical instability in the models, or some sort of downstream effect that's being caused by the faster systems.
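The gating logic being described is simple once the eval suite exists; a sketch with made-up numbers, where a latency change is accepted only if the suite's pass rate holds within a tolerance:

```python
# Accept a latency optimization only if the eval suite shows no meaningful regression.
BASELINE_PASS_RATE = 0.92   # current pipeline's pass rate on the simulated test suite
MAX_REGRESSION = 0.02       # tolerate at most a two-point drop in quality

def accept_latency_change(candidate_pass_rate: float,
                          candidate_p95_ms: float,
                          baseline_p95_ms: float) -> bool:
    faster = candidate_p95_ms < baseline_p95_ms
    quality_holds = candidate_pass_rate >= BASELINE_PASS_RATE - MAX_REGRESSION
    return faster and quality_holds
```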

But yeah, I mean, I think there are a lot of challenges specifically in the pipeline I'm working with, because it was built very non-modularly.

So we have one LLM instance that's hard-coded to one provider, and when that provider has downtime, we're just kind of fucked.

So we have a load balancer across a bunch of different regions of the same model, but it's a single model.

So it's just really ugly.

And making that modular is kind of a huge pain because there are so many touch points.

But I wouldn't say that's a problem that I just don't know how to fix or something.

It's more of an annoyance, having everything hard-coded, because everything moves so fast. There's probably going to be a new state-of-the-art LLM in two or three weeks.

So just having the ability to swap LLMs for each other, instantiate a different LLM, and then, going back to the evals, run an eval to make sure that your tasks still work properly with the different LLM, is super valuable.
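A rough sketch of the modularity being described: a minimal provider interface plus an ordered fallback, so swapping in a new model is a one-line change followed by rerunning the eval suite. All names here are hypothetical:

```python
from typing import Protocol

class LLMProvider(Protocol):
    def complete(self, prompt: str) -> str: ...

class FallbackLLM:
    """Try providers in order so a single vendor outage doesn't take the agent down."""

    def __init__(self, providers: list[LLMProvider]):
        self.providers = providers

    def complete(self, prompt: str) -> str:
        last_error: Exception | None = None
        for provider in self.providers:
            try:
                return provider.complete(prompt)
            except Exception as err:  # downtime, rate limits, timeouts
                last_error = err
        raise RuntimeError("all LLM providers failed") from last_error
```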

But yeah, does that make sense?

It's like a second problem.

Yeah.

Where are you currently learning about building with voice and what resources are you using?

I kind of just had to fuck around and find out for a lot of it.

I'd say the open source community I'm most active in is Pipecat.

And I think they have a really strong group of developers, and they have a lot better documentation now, so you may not have to fuck around as much to find out the same amount.

But yeah, I think the only way to really do this type of thing is by actually hosting a model, or hosting a pipeline, in the cloud somewhere and at least internally testing it and seeing what happens. Because I wouldn't say there's a great way out there to just learn about it, because it's so new.

I would say almost nothing I learned was from one consolidated resource. It's kind of like, you see some Twitter post about some weird thing and you're just like, oh, that seems like maybe it would be useful here.

And then you try it out and it sucks or you try it out and it's awesome.

But I definitely wouldn't say there's a consolidated resource; Pipecat is the one I would say is the closest to one.

Yeah.

Okay, that's really, really helpful.

Those are most of the questions that I had, and this is like very, very valuable.

One very rogue question, we're putting together some swag for the technical advisory board.

Nice.

What's the best sort of swag?

Dude, I would say hoodies.

I'm a big hoodie guy.

I like hoodies,

shirts.

Kind of.

Yeah, kind of anything.

I mean, I would say my go-to is hoodies, but I'd say anything.

Okay.

Would you want it to say something about voice, or technical advisory board, or just our logo? What sort of thing?

Yeah.

What would you actually wear?

I think.

Yeah.

Oh, I see.

I have no idea.

I think.

Yeah, I have no idea what it would even look like, but you can send me designs if you have someone. I think right now I just have no idea what it would look like.

It's a good...

Yeah, but you got a few ideas in mind.

That's why Jack's asking.

Yeah.

Okay, cool.

Yeah, yeah, yeah, yeah, yeah.

If you have some ideas, I would definitely provide my opinion if you send me some.

Okay, sweet.

I have a quick question.

Yeah.

Like, you know, walking around and finding out has been your adventure in how to actually figure out how to build this stuff.

Yeah.

I think, like, you know, they do a really good job with a lot of the docs and stuff.

I'm wondering, because we're obviously thinking, as you said, this is super early, and I think there's an opportunity for us to do some education, and we're just thinking about the medium that makes sense for people.

Are you into, yeah, newsletters, podcasts, video, a combination of those things?

I mean, if we were gonna create some content, it's kind of the same question as the hoodie question.

It's like, yeah, yeah, yeah.

Actually, there's a chance you'd be like, you know what? That's actually how I would like to learn, you know?

Yeah, yeah, yeah.

I definitely know how to answer this question better than the hoodie question. I would say YouTube would definitely be a good resource.

There are a lot of communities, though. I would say open source communities are kind of the gold standard way to interface with other people building in this space and to have educational resources.

Because when people are first trying to build products, most people aren't just going to be like, oh, fuck it, I'll spend 5K a month on some enterprise voice orchestration platform.

Most of them are going to search "how to build voice AI" or something.

And it'll come up with the open source repositories online like Pipecat or LiveKit.

And then they join those Discords, they join the Twitters, maybe a Slack if they have one, and they'll just kind of collaborate on resources there and learn there.

So I'd say it depends on what exactly you're trying to do. If it's to extend your company's reach and get feedback on a product, I would definitely say some sort of open source community.

Yeah, honestly, there are kind of three pieces to this.

One is like, you know, for our own learning, what do we wish existed when we were kind of getting into this, right?

Like creating that.

Second, there's obviously an audience and community building component for us. The third is that all of this ultimately leads, at some point, to hopefully some of these people wanting to try our product. But I'm thinking way before that, because I know with developers in particular,

it's not like a traditional funnel.

Obviously, people just pull out, they might never come back.

You might actually just provide a learning resource that someone who's into voice on the weekend actually finds value in, but they may never need to build voice.

They have a friend later who asks them about it or something.

And so I think, yeah, yeah.

If we're going to, and we are going to, be creating more content, it really has to be valuable.

And so how we can actually teach people about this, or some aspects of it, is kind of top of mind.

So that's really, really helpful context just on your preferences there.

And I think, yeah,

Yeah, definitely with the voice stuff, I think the most active communities are on Twitter and Discord, in voice AI groups.

But in terms of like general public, it's always different.

So if you're trying to market towards the more general public, there are actually a lot of people on TikTok who like this type of content and absorb AI content there.

But I don't know, marketing is definitely not my forte.

That's fine, you're obviously getting value from content that's out there.

So it's a really good perspective.

Yeah, yeah, yeah, yeah, yeah.

I would say most of that content I kind of stay away from.

I don't consciously stay away from it, but I feel like a lot of it isn't very well informed, because it's such a new space.

And so people are kind of experimenting with it, thinking this thing is cool, but they don't really know too much about what they're talking about, which is, of course, fine.

That's just the way it is.

But I've been down the mines for a while, and so...

Yeah, yeah, yeah.

So most of the stuff I look at now is general math and machine learning type stuff on Twitter. Or reading research papers on specific architectures, stuff like that.

But most of it's from Twitter.

What's your angle? Is it curiosity, like when you read a machine learning paper?

Yeah, curiosity, or you're thinking that it...

Gives you... Yeah, for sure.

Yeah, a lot of it's curiosity, which is tied into... I was a math and CS, and I guess technically finance, undergrad, but math was the thing I liked the most. So I think there's definitely an aspect of mathematical beauty that I like seeing a lot on my feed, and it just piques my interest.

So yeah, I'd say curiosity would probably be the main thing, wondering how people built some sort of technical architecture, or how some new type of machine learning model works.

Which I wouldn't say is mainstream content, for sure.

So that's why I think my opinion's a little bit skewed and may not be as helpful.

But yeah.

Any people stand out that you like to watch?

On the research side?

Yeah.

Yeah.

Yeah, yeah, yeah.

On Twitter, there's a couple.

The first one that comes to mind, his name is Francois Chollet.

Yeah,

yeah.

He's like the serious one.

He's like a real researcher, right?

Yeah, yeah, yeah, yeah.

He's a researcher.

He founded ARC-AGI, the benchmark, and he was also the creator of the machine learning package, Keras.

How was it?

Keras.

Yeah, yeah, yeah, yeah.

He works at DeepMind now, but he always has really good content.

But I think that sort of person is the most captivating for my attention.

Okay, you're going straight.

Yeah, yeah, yeah, yeah.

Yeah, I think we're coming towards the end.

So.

Is it okay if I send you another Calendly, to find a time that would work for sessions like this, sort of once a month?

Yeah, for sure.

Okay, amazing.

And we're gonna reach out as well with what we learn as we learn it, and also probably at some point to get your... maybe, I don't know if we're gonna get hoodies.

Sounds like you've got a strong vote in here.

I think we should get hoodies.

We should get hoodies.

And stuff as well.

Awesome.

Yeah, yeah, yeah.

I'll be looking forward to it.

Okay.

Nathan, thank you so much.

That was incredible.

Incredibly, incredibly useful.

Yeah.

Yeah.

We'll try and provide some value to you as well.

So thank you so much.

Yeah, of course.

Thank you all.

Have a good one.

Yeah.

Great to meet you, Nathan.

Yeah, yeah.

Good to meet you too.

All right.

Bye.