Nathan French TAB 2
Hello, Nathan.
How you doing?
Good.
How's it going?
Yeah, not too bad.
Not too bad.
Yeah.
Back in London now, so.
Oh, nice.
Where were you before?
So I was visiting Canada, because Aiden's in, like, Victoria, near Vancouver.
Oh, okay.
Far West London.
No.
So most of the team... it is kind of weird. There's, like, an engineer in Glasgow in Scotland, and then there's me in London. Damian, the CEO, is, like, outside of London, kind of near Brighton, if you know where that is.
Okay, yeah, yeah, yeah, yeah, yeah.
And then we have one engineer in Egypt and one in Switzerland.
So it's pretty much, like, EMEA, Europe, Middle East, I guess.
For sure.
West Coast Canada.
So that's crazy.
Yeah, that's very remote.
Yeah.
Yeah.
How have you been?
Good.
Yeah.
Same old, same old.
Nothing too exciting.
I moved into a new place.
Oh, sweet.
But yeah, other than that, same old stuff.
Nice, nice.
That's fun.
You settled in or is it all just chaos?
It's been slow.
I just got a bed frame because I was just sleeping on a mattress on the floor.
Dude, you're a meme.
Yeah, the classic.
Did you get a lamp or a chair yet?
No, I don't have a lamp.
He's gonna say he doesn't need one.
No, but I do have a bed frame and a chair and a couch.
So I'm pretty much set now.
Yeah, it's been slow, but we're working on it slowly.
That's awesome.
That's awesome.
Yeah.
By the way, I'm recording this just because I want to save it, if that's okay.
Cool.
Yeah, yeah.
Yeah.
Because the last few calls, half the time I've been, like, locked in on the wrong Zoom account, and then I'm like, oh shit, hang on one second, let me just come back. So anyway, that's recording, if that's okay.
And yeah,
Yeah, sorry. So is it okay if I just... basically, what I wanted to do is just show you some of the stuff that we've been pulling together. And this is, like, sorry, this is, from what people have said, some of the themes that have come out. And I just wanted to get your reaction to it, to start to figure out... and it's very... this can be a bit chaotic, how we've put this down, because it's really hard to say, like, oh, this is the big theme, and so on.
Yeah, yeah, yeah.
Anyway, without further ado, I will just share it.
Okay.
Yes.
So, I know that says four, but these are some of the ones that we've been finding and I just want to, like, get a reaction to some of these.
Yeah, yeah, I think that's definitely accurate.
Yeah, it's kind of... I mean, for the "conversations feeling natural" part, I'd say a lot of things revolve around that, but it's just, like, a super hard thing. But it's accurate. It's just, like, a very wide net, I guess.
And do you think that... so if we start with, like, pains, would you say that this is accurate on, like, your top three or four pains? And does it match your ranking?
Like, and does it match your ranking?
Yeah.
Yes.
I think that now it's shifted a good amount in the last month, because I implemented this thing called Krisp that basically amplifies the signal, or rather removes the background signals, and kind of helps the speech-to-text engine pick up on voice signals and transcribe lower-quality audio. That's actually fixed a lot of stuff for us, and a lot of problems that we were having.
Oh, sorry, it's Krisp with a K.
Sorry.
Yeah, I've heard really good things about Krisp.
Yeah, I would highly, highly recommend it.
Yeah, it definitely... I don't know how I felt about it before. I had heard some good things, but it definitely exceeded my expectations, especially with our use case, because a lot of people are calling in on the move, where there's a lot of background noise. But I'm definitely aware that not all use cases have a lot of background noise, or people calling in with background noise.
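For context, the shape of that setup as a minimal sketch: a denoising stage sits in front of the speech-to-text engine, so the transcriber only ever sees cleaned frames. Every name below is a hypothetical stand-in, not Krisp's or any STT vendor's actual SDK.

```python
# Hypothetical voice pipeline: denoise each audio frame before transcription.
# NoiseSuppressor / SpeechToText are placeholders for real SDK clients.

class VoicePipeline:
    def __init__(self, suppressor, stt):
        self.suppressor = suppressor  # e.g. a Krisp-style noise suppressor
        self.stt = stt                # e.g. a streaming speech-to-text client

    def process_frame(self, pcm_frame: bytes) -> str | None:
        clean = self.suppressor.process(pcm_frame)  # strip background noise
        return self.stt.transcribe(clean)           # transcribe the clean frame
```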
But yeah, number two is...
So that might not be number one now, basically.
Yeah, I would say that definitely would... I mean, I don't even know if I would call that a huge... I mean, I'd still maybe call that a pain, but below the turn-taking.
So, yeah, like, at the bottom of the list, because it still does happen, but now it's usually just that the person's talking super quietly. And I'm actually investigating some stuff with, I think it's AI Acoustics.
Yeah, we're talking to them, actually.
Yeah, we're gonna... yeah, yeah.
But the thing is, Krisp only removes background signals, whereas AI Acoustics removes background signals and amplifies the actual voice.
So on issues where people are talking quietly, in theory, it'll, like, amplify that and still be able to extract a signal from it.
I would assume that would pretty much solve the transcription issues that we're having. That, and, like, keyword prompting with Deepgram too.
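Keyword prompting here refers to Deepgram's keyword-boosting parameter, which nudges the transcriber toward domain terms like model names and dealership names. A rough sketch against the prerecorded endpoint (the key and terms are placeholders, and newer Deepgram models use `keyterm` instead of `keywords`, so check the current docs):

```python
import requests

# Boost domain-specific terms so the transcriber doesn't mangle them.
resp = requests.post(
    "https://api.deepgram.com/v1/listen",
    params={
        "model": "nova-2",
        "keywords": ["Kia Soul:2", "Smithtown Kia:2"],  # term:boost pairs
    },
    headers={
        "Authorization": "Token YOUR_DEEPGRAM_API_KEY",
        "Content-Type": "audio/wav",
    },
    data=open("call_audio.wav", "rb"),
)
print(resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"])
```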
Yeah, I definitely don't, I wouldn't say like as of now it's a huge issue that I'm like worried about.
What would you say is your number one?
Yeah, or were you asking me or Aiden?
Sorry, you, Nathan.
Yeah, yeah.
I was making sure.
I would still definitely say evals is, like, the biggest thing, because we implemented Deepgram Flux, which is, like... I'm sure you're aware, but it's a speech-to-text model with a turn-taking model built into the speech-to-text layer. And I implemented that, and we're using it for a couple of our production clients, just because it's a sandbox URL and we don't want to just put it on everyone, in case it goes down. But we have it on, like, some low-priority customers, just to see how it works in the real world.
And it's just, like... we had, I don't think it was due to Flux, but we had some regression in the voice stack, and our database was just, like, locking, and no one... it's been, like, the whole weekend, basically, and no one really knows why. And it's something, like, I don't know... it's something async-related. I think something's blocking the event loop, but...
No one really knows.
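One cheap way to confirm a blocked event loop, sketched below: a watchdog coroutine sleeps for a fixed interval and measures how late it wakes up; large drift means some coroutine is hogging the loop. (asyncio's debug mode, `PYTHONASYNCIODEBUG=1`, will also log slow callbacks.)

```python
import asyncio
import logging

async def event_loop_watchdog(interval: float = 0.25, threshold: float = 0.5):
    """Log a warning whenever the event loop stalls longer than `threshold`."""
    loop = asyncio.get_running_loop()
    while True:
        start = loop.time()
        await asyncio.sleep(interval)
        lag = loop.time() - start - interval  # how late did we wake up?
        if lag > threshold:
            logging.warning("event loop blocked for ~%.2fs", lag)

# At app startup: asyncio.create_task(event_loop_watchdog())
```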
So I think observability into what's going on in the voice stack is still the number one thing. Which, I mean, is more like traditional software engineering, I guess, but there are still elements of the voice stack, and there are some things that are fully voice stack. Evals into pronunciation would be something that's fully voice stack. But yeah, the observability stuff is just such a huge pain.
Yeah.
What have you tried there actually on the observability?
So we use this thing called Hamming. It's just, like... we hired someone to take care of it, and it didn't work out. He, like, passed it on to someone else, and it's just, like... I guess we're just, like, hammering it out, and there are, like, a couple of kinks with Hamming. The idea is, like, you have a bunch of these agents running in parallel that are just, like, stress testing your thing. And we have text-to-speech concurrency limits, so when we're running these tests, it obviously, like, hits the concurrency limits pretty quickly. So yeah, there are just, like, a couple of kinks like that that have slowed down the development of that.
But.
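The usual workaround for that kind of kink is to cap the simulated callers below the TTS provider's concurrency limit, something like the sketch below (`run_simulated_call` is a hypothetical test-harness hook, not Hamming's API):

```python
import asyncio

TTS_CONCURRENCY_LIMIT = 5  # stay under the provider's ceiling for test traffic

async def run_all(scenarios, run_simulated_call):
    sem = asyncio.Semaphore(TTS_CONCURRENCY_LIMIT)

    async def throttled(scenario):
        async with sem:  # at most N simulated calls hold a TTS slot at once
            return await run_simulated_call(scenario)

    return await asyncio.gather(*(throttled(s) for s in scenarios))
```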
Yeah. Interesting.
Yeah, that's the only thing we've done so far, though.
And then I want to get us on DeepEval, which is, like, a pytest-like LLM eval thing, but that only works on transcripts. So it's, like, good enough, because the voice stuff is obviously just working off of transcripts, but it's still not ideal, because it wouldn't pick up stuff like pronunciations, etc.
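For reference, a transcript-level DeepEval check looks roughly like this, based on its documented pytest integration and GEval metric. The test case and criteria are made up for illustration, and, as noted, this only sees text, so pronunciation issues never show up here.

```python
from deepeval import assert_test
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

def test_reply_sounds_natural():
    naturalness = GEval(
        name="Naturalness",
        criteria="The assistant's reply reads like natural spoken dialogue.",
        evaluation_params=[
            LLMTestCaseParams.INPUT,
            LLMTestCaseParams.ACTUAL_OUTPUT,
        ],
        threshold=0.7,
    )
    case = LLMTestCase(
        input="Hi, do you have any Kia Souls in stock?",
        actual_output="We do! There's a 2024 Kia Soul on the lot right now.",
    )
    assert_test(case, [naturalness])
```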
Yeah.
Makes a lot of sense.
That's such a weird thing that I would never have thought about, but, like, how concurrency would be, like, a massive issue in tests.
It's just like, yeah, yeah, yeah.
It makes sense, I guess.
Or extremely slow to run, I guess.
Yeah.
It's, yeah, it's pretty annoying.
It's pretty annoying for sure.
But yeah, I mean, I can't even think about how... I mean, I guess the best way to do it is just to not run as many concurrent tests, and just have it run, like, nightly or something.
But.
We still haven't gotten that figured out fully.
But.
Yeah, kind of on that same note, too, with the second point, the "hard to reliably know how well conversations are doing" one. There's not really any benchmark that makes sense. Like, I know Deepgram tried to make one, called VAQI, but I think voice is one of those things where it's really hard to come up with logical benchmarks. So that also makes it hard, because it's very non-trivial to make benchmarks that actually align with human preference for voice.
Yeah.
I was surprised when I was reading about some of the ways they developed these. I don't know if I've said this to you already, but reading about how they were actually creating the models... a lot of it, like, human preference is the gold standard, right? So you just ask humans: do you prefer this? Rate this out of ten. Does it sound natural?
Yeah.
Yeah.
And it turns out that, like, humans don't agree with each other on those preference benchmarks, like, over 50% of the time. So there's a research paper... I forgot how long ago it was, it was kind of a while, like, a few months ago. But it was saying that using an LLM as an evaluator is actually more consistent with human preference than a human guessing whether something would line up with human preference. So, like, an AI is more likely to agree with a human preference in a lot of domains than, like, two humans are.
Oh, whether they agree with each other. So it's, like, kind of closer to the mean or whatever.
Yeah, yeah, exactly.
Yeah.
Yeah.
Which is, like, really funny to think about, but I guess there's a lot of variance in human preference on these sorts of things.
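The LLM-as-judge setup being described is roughly the following: show the model two candidate agent replies and ask which a typical caller would prefer. A bare-bones sketch (the model choice and prompt wording are illustrative, not from the paper):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def judge_preference(user_turn: str, reply_a: str, reply_b: str) -> str:
    """Ask an LLM which reply a typical caller would prefer: 'A' or 'B'."""
    prompt = (
        f"A caller said:\n{user_turn}\n\n"
        f"Reply A: {reply_a}\nReply B: {reply_b}\n\n"
        "Which reply would a typical caller prefer? Answer with 'A' or 'B' only."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep the judgment deterministic
    )
    return resp.choices[0].message.content.strip()
```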
Yeah.
Yeah.
Wow.
That's cool.
Yeah.
I'm not sure about that. There's that guy, though, like, the vibes guy that was, like, the producer.
Anyway, wait what?
You know, there's that meme where there's that producer that's got like the good taste.
I can't remember his name, but the guy with the big beard.
Oh, yeah, yeah, yeah, yeah, yeah.
He's like, what are you giving me?
And he's like, absolutely nothing.
Unless you have, unless you find that human.
Okay, wait, so... I think we've got your number one, which is evals.
What would you say is your number two problem, pain?
Number two pain?
Yeah.
I mean, as a company or just in the voice stack specifically?
Well, kind of curious on the company side, if you think it could be relevant. Yeah, yeah.
I mean, I guess this is super specific to the industry, like the car dealership industry, but onboarding customers, and the customer relations aspect of how car dealerships work, is really difficult to get right. We basically need a dedicated forward-deployed engineer for every dealership.
And relationship building is so important. Because we were toying around with the idea of making a voice agent that would just onboard customers, because our onboarding process is, like, very long, like, months long. And there's just a lot of back and forth between, like, our team and the dealership. So we were toying around with the idea of just making a voice agent that would speed up the onboarding process, but they just, like, really do not want to onboard that way.
And so it's really hard to, like, hit a flywheel where we're automatically onboarding people, where they can just, like, sign up and go. There's still, like, a super slow customer onboarding process. Um, so it's not really clear how to best do that, except just, like, hiring more people.
But yeah, I think we'll definitely have to figure something out for that.
But our competitors are in like the exact same boat with onboarding as well.
So it's definitely not just, like, an us thing, but it's something I wouldn't have expected, because I would think that you could just build a voice agent or something like that that would just, like, collect some set of information, and it would just work.
But yeah, it doesn't.
Interesting.
Do you think there's any technology that could help you with onboarding?
I think, like, not on the voice side. I think most of it would just be, like, developing tooling that our customer service team can use, where they can just click a button on the dashboard and it just automatically fills something out. Or, I mean, maybe something that would take the transcript of a call and just trigger some tooling to fill in information from the transcript or something.
But it's just, like, a hard problem, because there's so much back and forth with dealers. And a lot of times, like, the dealerships don't know what systems they use, because, like, someone set it up, like, 30 years ago, and those people left. So, like, they don't know what CRM they use, and it's like, okay, well, we kind of need that to move forward.
But yeah, that's one of our bigger, I guess, like, pain points. It's a very manual process right now.
Yeah.
I can definitely say you're not alone on that.
I think this is something that we're hearing quite a lot, actually.
And.
Oh, really?
Yeah.
Yeah.
Is it integrations? Is that part of it? Like, you mentioned CRMs.
Yeah.
Yeah.
We have a lot of integrations, and every dealership obviously has, like, different integrations.
And so we integrate with all of them now, but it's just the dealerships... like, I was shocked. So many dealerships just don't know, like, what technology they use. And so it's like we have to be like, okay, this is how you check. And if it's over email, like, that correspondence can take a very long time, just going back and forth.
So our strategy now is just, like, trying to get on a call with them as quickly as possible, and just doing everything over a call, where you can just, like, bang out everything in the shortest number of call sessions possible. Because the email stuff takes, like, so long. Every single time something has to be done over email, it takes a minimum of a week or two, even for simple things.
Yeah, but it would be nice if we could have, like, ultra-realistic voice agents that, like, literally are indistinguishable from humans. I think that would kind of solve a lot of our issues here.
Yeah, that's the challenge here, it feels like. It's a lot of human challenges, I don't know.
Yeah.
Interesting.
Very, very interesting.
If you had to say a number three, would you have a sort of a number three in mind?
Number three.
Pain.
I mean, most of them are, like, smaller things. There are a lot of small things, I'd say. I wouldn't say there are any huge problems. I mean, the LLM would be one thing.
Azure's OpenAI endpoints are just getting slammed, and we're seeing times to first token of three seconds sometimes. I built an LLM gateway that can just route requests using a load balancer. If any requests are taking a while, it'll just start routing requests elsewhere, to the lowest-latency endpoints.
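The routing idea described here, as a toy sketch: track a rolling (exponentially weighted) time-to-first-token per endpoint and send each request to whichever endpoint is currently fastest. The endpoint names are made up.

```python
import time

class LatencyRouter:
    """Route each request to the endpoint with the lowest recent TTFT."""

    def __init__(self, endpoints, alpha: float = 0.2):
        self.ttft = {ep: 0.5 for ep in endpoints}  # optimistic prior, seconds
        self.alpha = alpha  # weight given to the newest sample

    def pick(self) -> str:
        return min(self.ttft, key=self.ttft.get)

    def record(self, endpoint: str, observed: float):
        prev = self.ttft[endpoint]
        self.ttft[endpoint] = (1 - self.alpha) * prev + self.alpha * observed

router = LatencyRouter(["azure-eastus", "azure-westeurope", "openai-direct"])
endpoint = router.pick()
start = time.monotonic()
# ... stream the completion from `endpoint`, stop timing at the first token ...
router.record(endpoint, time.monotonic() - start)
```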
But each LLM has, like, different little quirks. So Gemini, it'll... I mean, each one has, like, super small quirks about it that make it so it's, like, technically compatible, but you still have to do more work. So, like, with Gemini...
Oh yeah, it was just like a function call to return inventory.
Someone would ask, Hey, do you have any Kia Souls in stock?
And it would run the inventory tool call and it would get a bunch of information.
And let's say there is a Kia Soul in SOC and it's like a 2024 or whatever, then it would just.
Say.
2024, Kia Soul.
And it's just, like, not a human-like way of talking. And there's no real way to know that it would... it's just super random, but it's just, like, something that you have to prompt away. Like, make sure to, I don't know, basically talk like a human.
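"Prompting it away" usually means a system-prompt clause like the sketch below, telling the model to verbalize tool results instead of reading fields back. The wording is illustrative only, not the team's actual prompt.

```python
# Hypothetical system-prompt snippet for verbalizing inventory results.
SPOKEN_STYLE_RULES = """
When you mention tool results, speak like a human on the phone:
- Say "we've got a 2024 Kia Soul in stock", never raw fields like "2024, Kia Soul".
- Read prices, dates, and model years the way you would say them aloud.
"""

messages = [
    {"role": "system",
     "content": "You are a dealership phone assistant.\n" + SPOKEN_STYLE_RULES},
    {"role": "user", "content": "Do you have any Kia Souls in stock?"},
]
```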
There are so many little quirks. You'd think that just making a load balancer and, like, distributing traffic would just work, because you'd think that all LLMs respond about the same. But there's actually a lot of variation in how, like, each type of LLM responds, and each one has, like, its own sort of tone that may or may not be what you're looking for in your agent. So it's kind of hard.
So, like, it seems like you can just swap them out, but you actually can't do that.
Yeah, yeah, yeah.
Yeah, exactly.
Specific quirks.
Yeah, yeah.
And there's a lot of companies that make these, like, LLM gateways.
Yeah.
Like, a lot of these things popped up, and it's, like... it's a really good idea, but, like, at the end of the day, I think it for sure speeds up the process a lot, but there are still, like, some small details that always have to be custom.
That is super interesting.
Yeah.
It's a fun time.
Just all this stuff.
Yeah.
Just quickly, Nathan, if I could ask you to describe... so we've got the pains... sorry, we've got these pains, like, your pains, so I guess just the first two, really. If you had to describe what the gain would be from, like, evals... going back to that kind of magic wand, and you're describing what your number one gain would be, how would you describe that?
So, like, describing the gain associated with evals?
Yeah.
Yeah.
Yeah.
I mean the biggest thing is just knowing before the customers that something's gone wrong.
Because even with the time to first token... I mean, the time to first token is actually a really easy example, because you can just have, like, a rolling average of that.
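That rolling-average idea, made concrete: keep a window of recent TTFT samples and fire an alert when the mean drifts past a threshold, before customers notice. The alert hook here is a placeholder.

```python
from collections import deque
from statistics import mean

class TTFTMonitor:
    """Alert when the rolling mean time-to-first-token exceeds a threshold."""

    def __init__(self, window: int = 100, threshold_s: float = 1.5):
        self.samples = deque(maxlen=window)
        self.threshold_s = threshold_s

    def record(self, ttft_s: float, alert=print):  # swap print for a pager hook
        self.samples.append(ttft_s)
        if len(self.samples) == self.samples.maxlen:
            avg = mean(self.samples)
            if avg > self.threshold_s:
                alert(f"rolling TTFT {avg:.2f}s exceeds {self.threshold_s}s")
```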
But with some of the voice things... like if, I don't know, your voice is just 25% of the time pronouncing a dealership name wrong, and it's due to, like, some update that ElevenLabs pushed out, then, in an ideal world, we would know that before we start getting complaints about it. So we can have it fixed, and we don't get mad customers saying that, like, this thing's pronouncing my dealership's name wrong, which happens more than you think.
But just generally, there are a lot of problems with that sort of thing.
Integrations will sometimes just fail, tool calls will fail.
And if there's an underlying issue, we always just want to know before a customer either figures out about it or before a customer complains to us.
Because right now, our process is basically just waiting for a customer to complain. And then if we happen to get an alert, from, like, an uptime alert or something, then, I mean, we might be able to fix it, but it's not the way to run a business, I'll tell you that.
Yeah,
yeah, yeah, yeah.
Relatable.
Yeah.
Yeah.
Okay.
And then also, the same for the onboarding. I mean, I guess we didn't talk about what this might be, but, like... if that was a magical product, what would... yeah?
Yeah, definitely just reducing the number of days to onboard someone is, like, the main thing. So, I mean, in an ideal world, you would just have, like, a self-serve product, because it's really hard to scale things when you have to work on one-on-one correspondence with a dealership, and it's taking, like, 60 days to start collecting payment. And it's just, like, a lot of stress on our customer support team, having to have that one-on-one sort of thing. I think to scale anything, it kind of has to be, like, somewhat self-serve. If you're trying to get to more than, like, a couple hundred customers, you just can't manually onboard people.
Yeah, yeah, true.
Yeah.
Yeah, that's okay.
That's amazing.
And are you happy with this order, then? Number one evals, number two onboarding?
Yeah, yeah, I think that's good.
Yeah, I think that's good.
Amazing.
I think we're coming towards the end, so I don't want to go down the rabbit hole and end up talking over time.
But this is extremely, extremely helpful.
Thank you so much.
Well, yeah, of course.
Yeah, really, really appreciate it.
Yeah, we're just, like... if you're happy to still chat in, like, another month, we're going to start to, like, get your feedback on how we're thinking about things. And I think we need to start talking about some of these challenges internally as well, because some of them keep coming up again and again.
Yeah.
And by the way, just as an aside, I think you're, like, really good at, like, describing challenges, and also just, like, even just, like, what's the gain here? You're just like, yeah, let me go. Whereas, like... I don't know. You're just very sharp on that, so.
Thank you.
Yeah.
We're doing a lot of these, so definitely... that's not normal. Like, you know, you're smart, dude, so.
Just to say thank you again for your time.
It's such an amazing thing to just hear, because you're so obviously right at the cutting edge of this.
To just hear the things that you're struggling with is like so interesting.
Yeah, thank you.
Thank you very much.
Thank you so much, Nathan, and look forward to chatting again in a month.
Yeah, likewise.
Yeah, yeah, for sure.
All right, have a good one.
Thanks.
Appreciate it.
See you later.
Bye.