
Ben from Achilles HR TAB 2

So I was just gonna show you

some of the stuff that we've been seeing from people.

Hey, Aiden.

Hey.

Nice to meet you, man.

Yeah, you as well.

Yeah.

So I was just gonna show you some of the stuff that we've been seeing, Ben, and just wanted to get your reaction and, like, hopefully get towards some sort of ranking on, like, how you see things, basically.

I'll just share my screen; it'll make more sense.

Cool.

Yeah.

So we've got.

We've been starting to, like, write down some of these, like, pains and gains and just wanted to get, like, your sort of reaction to these.

Yeah.

Yeah, I would say that the biggest one is number one for sure.

And then... we don't... are you talking about onboarding new customers in terms of, like, me onboarding new customers onto our product?

Yeah, sorry, I should have been clear on this.

Yes, yes.

I don't think that's actually too difficult for us.

There's some, like, data stuff that's a pain in the ass for us, but it doesn't have anything to do with voice.

I would say bigger things are... not... maybe this isn't really, like, a voice issue, but we need to be better about getting the model to go into a little bit more detail on interviews it's doing and think on its feet a little bit more.

That's not really voice stuff though.

It's just like we need to improve our prompts.

And, you know, improve contacts and stuff like that.

But.

Context.

Context, sorry.

But yeah, those are probably

our two biggest ones.

So it's the logic of like the actual agent stuff.

Yeah, a bit.

That's not like, we're definitely not in a bad place on that right now, but it could be better.

I mean, the other big thing for us, which again, it's not really related to voice, is like,

you know, for our product, we have, you know, basically building an AI recruiter.

So we have people moving through a candidate flow and just managing like the context.

And, you know, when we have another touch point with the candidate, are we asking a question that we've already asked, or stuff like that?

Just getting all that stuff right is difficult, but again, that's not really a voice issue.

Yeah, yeah, yeah.

Are you asking questions you've already asked?

Yeah, that makes sense.

It's like the... are you using any frameworks and stuff, like the AI SDK or whatever?

Right now we're not doing any of that, but we should be at some point.

Wow.

Yeah.

I don't know.

I feel like you hear different things from different people about, like, whether they're worth using and stuff.

Yeah.

Okay.

That's cool.

So it's just that... that's sort of, like, design slash... I don't know. I feel like that's the core of your differentiation, and doing that well is, like, the driver of the product, I guess.

That's part of it, yeah.

There's a lot of other stuff as well, but that's definitely a big part.

Yeah.

Is there anything else that's kind of keeping you up at night at the moment?

Those are the big ones on the technical side, honestly.

Yeah.

Yeah.

Just, like, zooming in on "the conversation doesn't feel natural": is there anything that's really painful there for you guys?

I think, yeah, mostly just turn taking.

Right now we have our silence duration between turns set pretty long.

And so the conversations feel kind of like there's a high amount of latency.

The reason we have it set long is to avoid interruptions, like, avoid the model... like, in that exact scenario where I was thinking of what I was going to say, the model would interrupt. Basically to avoid that kind of thing.

So just having, like, a more context-driven approach to that, where the model understands what's being said and will wait to talk until it's obvious that, you know, it's the model's turn to talk.

I know there's some new stuff on this, like the Deepgram release, the big new thing, which I've been meaning to get around to trying, but...
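The tradeoff described here, where a longer silence window avoids interrupting a caller who pauses mid-thought but makes replies feel laggy, can be sketched as a toy end-of-turn detector. This is purely illustrative and not the speaker's pipeline; the frame size, window, and class name are assumed values.

```python
class SilenceEndpointer:
    """Toy end-of-turn detector: end the turn after N consecutive silent
    VAD frames. A long window (e.g. 50 x 20ms = 1.0s) avoids interrupting
    a caller who pauses mid-thought, at the cost of sluggish replies; a
    short one feels snappy but interrupts more."""

    def __init__(self, threshold_frames: int = 50):
        self.threshold_frames = threshold_frames
        self.silence_frames = 0

    def feed(self, is_speech: bool) -> bool:
        """Feed one 20ms VAD frame; return True once the turn is over."""
        if is_speech:
            self.silence_frames = 0  # any speech resets the countdown
            return False
        self.silence_frames += 1
        return self.silence_frames >= self.threshold_frames


# 0.5s of speech (25 frames) then silence: the turn only ends once the
# full silence window has elapsed, at frame 25 + 50 - 1 = 74.
ep = SilenceEndpointer(threshold_frames=50)
frames = [True] * 25 + [False] * 60
fired_at = next(i for i, f in enumerate(frames) if ep.feed(f))
print(fired_at)  # 74
```

Dropping `threshold_frames` to 25 (0.5s) would halve the perceived latency, but it would also fire during any natural half-second pause, which is exactly the interruption problem being described.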

Yeah.

Yeah.

Yeah.

We're just starting to implement that; one of our engineers is.

You're talking about the Flux model.

Yeah.

Yeah, it seems really good.

We're getting, I think, timed at, like, two seconds latency on turn taking, and the whole thing of, like, replies feels really, really fast.

I don't know.

That's great.

Our only problem is right now it's only in English, which is a problem for us, but I'm sure it'll be multilingual soon.

Yeah,

yeah,

yeah.

I can, if it's helpful, not about using us or anything, but just, if you wanted, I can send the demo that our engineer did, so you have an idea of what it's like in the real world, of someone implementing it.

That'd be great.

Yeah.

Cool.

I'll send that.

That's mostly it.

I mean, really, like, it boils down to, I think, turn taking.

I mean, the voice is already, like, good in terms of how it sounds as a voice.

It's really just like turn taking and content that we need to improve on, I would say.

Okay.

Okay.

Yeah.

Yeah.

Makes sense.

Okay, great.

This is super, super helpful.

And these are, like, the kind of gains from that. Like, maybe this one is very specific, to be honest, and I don't know if it applies to you as well, but something that came up for a lot of people was perfect transcription of low-quality audio.

That's definitely somewhat of an issue for us.

It used to be a really big issue.

We kind of have a hack for that, which is, if the model can't hear or kind of gets in a bad state, we basically just have the model say, hey, sorry, I can't hear you, can you speak up?

Which is kind of a hack because, you know, it'll prompt the person... people know how to make their phone audio quality better, so they'll do that.

That tends to help.

It's not really, like, great, but it, you know, helps.

That is genius.

Gotta say, that's the first time I've heard someone doing that.

Yeah, it works pretty well.

Yeah, I do that with my friends all the time.

Genius.

Yeah.

Have you tried, like, the noise, like, isolation stuff?

I know some people are doing that.

Like, messing with the audio.

Nah, we just take the audio that comes straight from Twilio and send it to... yeah, I guess we're using Deepgram now.

So, yeah, I haven't really messed with any of that yet.

Maybe there's something you can do, but I don't know.

My kind of theory on that is, like, the iPhone's already doing so much to make the audio good, and then some of the phone networks... like, I am probably not going to do a better job at that than Twilio or Apple.

Yeah, yeah, yeah.

That's a good point.

But we have spoken to some people that said that they've got a lot of benefit from using, like, Krisp and AI-coustics. But I can't actually remember if it was, like, over phone or if that was WebRTC.

Yeah, I think it makes a big difference.

When things come in over the phone, it definitely changes a lot because, like, if you have just, like, a laptop, it depends on the laptop that it's coming from.

But there can be a lot of,

like, excess noise outside of the speech that comes in.

But with phones, they're already, like, so tuned to that.

That, like, when I listen to the Twilio recordings, it's like... yeah. I mean, we did try stuff. We actually at one point tried using, like, FFmpeg; we used a combination of VAD and FFmpeg so that basically any time the VAD was saying there was no speech, we would strip out the audio, effectively, and then send that. And it didn't really amount to anything, and it caused a bunch of problems, too. So...

Yeah, I don't know.

Maybe there's something we could get from that now, but I just don't see it as, like, a good use of time at this moment.
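The VAD-plus-FFmpeg experiment described above amounts to cutting non-speech frames out of the stream before sending it to the STT. Here is a stdlib-only sketch of that idea, with a crude energy threshold standing in for a real VAD; the frame size (20ms at 16kHz) and threshold are arbitrary assumptions.

```python
import math
import struct


def frame_rms(frame: bytes) -> float:
    """RMS energy of a frame of 16-bit little-endian PCM samples."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    return math.sqrt(sum(s * s for s in samples) / max(len(samples), 1))


def strip_silence(pcm: bytes, frame_bytes: int = 640, threshold: float = 500.0) -> bytes:
    """Drop frames whose energy is below `threshold`: a crude stand-in for
    a real VAD. Aggressive stripping like this can clip word onsets and
    distort timing, which is one plausible reading of the problems it caused."""
    kept = []
    for i in range(0, len(pcm), frame_bytes):
        frame = pcm[i:i + frame_bytes]
        if frame_rms(frame) >= threshold:
            kept.append(frame)
    return b"".join(kept)


# One loud 20ms frame and one near-silent frame: only the loud one survives.
loud = struct.pack("<320h", *([4000] * 320))
quiet = struct.pack("<320h", *([10] * 320))
out = strip_silence(loud + quiet, frame_bytes=640)
print(len(out))  # 640
```

Besides clipping, stripping silence also destroys the natural pauses that downstream turn-taking logic relies on, which may be another reason the experiment didn't pay off.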

Yeah, it's not, not that big.

So would you say it's more reducing... from what you said, was it reducing latency between, like, turns?

Not necessarily reducing latency.

It's more like just making sure the model, like having, you know, half a second latency is like totally fine.

It's more like... if we take... I mean, we intentionally set the latency to basically a second in most calls, but if you have it at half a second, you're a lot more likely to get an... like, the model will interrupt the caller in a way that's not good.

And so it's really, like, how do you keep the latency at half a second and then reduce those interruptions?

Okay.

So reducing interruptions while keeping latency at half a second.

Yeah.

And is that... so the gain would be reduce interruptions while keeping latency?

Yeah, I mean, basically I think it's what Deepgram has done.

From what I read, it's like what they're doing, where it's like you look at the actual text that's happening, or, you know, the content of the audio, and then you decide whether or not to, you know, have the model start speaking.

Because, like, even when we're talking right now, sometimes I talk immediately after you talk, less than half a second; sometimes I wait longer than half a second. So for it to feel natural, it can't be just, like, when there is silence, start talking after half a second. It has to be contextually driven.
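That contextual rule can be caricatured in a few lines. Here a toy `looks_complete` heuristic stands in for the actual model-based end-of-turn prediction (which is roughly what Deepgram's Flux does); the word list and both thresholds are invented for illustration.

```python
FILLER_ENDINGS = {"and", "but", "so", "because", "like", "um", "uh"}


def looks_complete(transcript: str) -> bool:
    """Toy stand-in for a semantic end-of-turn model: a trailing comma,
    ellipsis, or dangling conjunction suggests the caller is mid-thought."""
    t = transcript.rstrip().lower()
    if not t or t.endswith((",", "...")):
        return False
    words = t.rstrip(".?!").split()
    return bool(words) and words[-1] not in FILLER_ENDINGS


def should_speak(transcript: str, silence_s: float) -> bool:
    """Context-driven endpointing: jump in quickly after a complete-sounding
    utterance, but give an unfinished one much longer before interrupting."""
    threshold = 0.5 if looks_complete(transcript) else 2.0
    return silence_s >= threshold


print(should_speak("I worked there for two years.", 0.6))  # True
print(should_speak("I was thinking that, um,", 0.6))       # False
```

The point of the sketch is only the shape of the decision: the silence threshold becomes a function of what was just said, rather than a single fixed number.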

So it's like, ah, context appropriate.

Contextual.

Yeah.

Okay.

Yeah, that makes sense.

And I guess this perfect transcription is, like, not that high for you.

Yeah, we don't really... I mean, frankly, I've looked at the transcriptions that come in on the call

and they're definitely not perfect when the call is actually happening.

But the model can basically get the gist of what's happening almost always.

It's totally fine that the transcription isn't 100% perfect.

Then post-processing, we use a different transcription that is honestly pretty good. Like, every time I see a weird transcription and I listen to the call, I'm like, you know, it's pretty good.

I don't think there's really anything we could do there.

Like, even I am kind of misunderstanding what that person's saying, or, like... I don't really... you know. So, totally.

Yeah.

Okay.

Yeah, totally makes sense.

Yeah.

So this is, like, kind of the jobs to be done. And do you think there's any... well, I guess... yeah, I struggle on the best place to say this, but, like, this is what we're trying to think about: what problem we should work on, the job that we should try to solve, essentially.

Yeah.

Do you have a view on, like, what should be worked on... like, what you would want someone to be working on?

I mean... yeah, yeah. In general, like... well, right now we use LiveKit, which does a lot of the, you know, actual transport. Like, we never really handle the audio ourselves, obviously. Or, yeah, not really. But...

I don't know.

I mean, just the biggest thing is, like, yeah, we've had to spend a ton of time tuning settings and putting in all this stuff, where it's like, for example, we have a bunch of code to make the conversation feel more natural in terms of adjusting values at different times in the conversation, and all that kind of stuff based on, is it incoming or outgoing, all that kind of stuff.

Handling... yeah, it's tough, because, like, would I like to not have to do that work? Yes. But also I want to somewhat be in control of what's happening there. So...

So

yeah, I don't know.

It really just depends on, like, how good the solution is. Like, if someone was like, hey, you can basically just provide an LLM node and, like, 12 numbers or something, and the conversation felt perfect every single time, then I'd feel comfortable giving away control on that.

But

yeah.

Does that make sense?

Yeah.

Yeah, I understand.

It's like, you're kind of like, well, it would be great if I didn't have to, like, configure and test all these options. But also it sounds like you feel a lot of those were quite specific, and you want them a certain way for your customers, where probably someone else would want them differently.

Potentially, yeah.

I don't know.

I mean, most phone calls are probably pretty similar.

And so you can probably get by with a somewhat generalized solution there.

But also we're in a pretty good spot now.

We're not perfect, but I listen to other calls... you'll see on Twitter, like, oh, this is a good voice AI thing. And I'll listen to their call and I'll be like, we're better than that.

Yeah, yeah, yeah.

I'm not, like... yeah, in dire... the first time we talked, I was in much more dire straits. Now we're in a better spot.

Yeah, yeah.

I don't know if I've spelled that like the band or... I probably actually spelled it "dire straights."

Yeah, yeah, that makes sense.

And by the way, just, like... have you seen any examples that were, like, unbelievably good?

Honestly, not really.

I keep going onto people's websites, and, like, Hello Patient, the other day I tried, and I was like, this is good, but it's basically just the same.

It's definitely not any better than what we're doing.

They're just using LiveKit under the hood and doing all the same tricks that we are.

Yeah.

And then like in that example, I tried it and I was like, well, this quality is like

maybe the same, maybe slightly worse than ours.

And I tried it on their website.

So it was going like just over the internet and through my super high quality MacBook microphone.

So I was like, over the phone, this seems like it'd be even worse.

Yeah.

So yeah.

How do you feel about the ChatGPT voice one?

I haven't used it much.

Yeah, it's pretty good.

It's good.

Yeah, I think it's good, but be curious what you think.

I mean, I feel like long term, all of this is heading towards there just being a speech-in, speech-out model.

But

the problem right now with that is, like... for example, with the real-time API, it's not going to work well for a business use case, because you're going to want to be able to control it.

You're going to want to make your own voice and you can't do that with speech to speech right now.

I feel like the tool calling and stuff's harder as well.

Also there's the timing stuff.

Yeah, I mean, great points and all that.

Yeah, yeah, yeah.

It's interesting.

Yeah.

Okay.

And so that makes sense.

Yeah.

This is really cool.

Yeah, I know.

Sorry.

Do you have, like... can we actually just, like, play around with Achilles right now? Just kind of curious, because I should have done this before, but I don't know if you've got anything that we can speak to.

I can send you something after. I just need to double check with my co-founders, because we're kind of playing things a little close to the chest right now, just because there's a lot of competition in the space and everything. But I can probably send you guys, like, a demo call.

Yeah, yeah, that'd be amazing.

Yeah, it's just purely for ourselves; we just want to see, like, what good looks like, basically.

Yeah, yeah.

Yeah.

That'll be amazing.

I'm sure it's fine if I just send you our, like, demo number after this, and you can just call in on it and see how it sounds.

I think the prompts on the demo number are not up to date, so the content's not exactly what we do now, but I keep it up to date with our whole voice pipeline.

So it sounds like the conversation will behave like our existing conversation basically.

Yeah, it makes sense.

In terms of, like... well, I think this one, I don't know if it's, like, super useful, but do you think there's anything that's really changed recently that makes solving these problems more valuable?

I mean, with the whole concept of voice, the canonical answer is, like, there are so many problems that can now be solved with software. But I don't think anything has changed since we last talked, really.

I haven't really experimented with like any new models in the last like couple months at all.

Been working on other stuff. But, I mean, there's, like... I mean, how many people are employed doing phone calls that, you know, can read off of a script?

Like, there's just a ton of money to be made in 100% automating that.

Yeah.

100%.

Yeah, this is amazing.

Yeah.

Ben, those are the questions I wanted to ask you.

So,

Just, is there anything else that we can help you guys with at the moment?

I'm just curious where you guys are at in terms of development.

Yeah, so we have been really focusing on... like, we made the onboarding a lot simpler, because obviously for us we want people trying it out and using it so that we can actually make it better. So we launched a CLI that lets you sort of get started in a minute.

It's just lots of quality-of-life stuff, for tunnels and stuff, because right now... I don't know if you have to do this with LiveKit, but you probably do, I guess, if you're running it on their cloud: you have to have a tunnel running locally and then update it in the LiveKit dashboard.

We run on our own cloud, so from the agent process we can just hit our own server, basically. Is that what you're talking about?

We were talking about... well, I guess for us, because we're doing, like... we have the authentication and stuff, so you give it, kind of, like, an endpoint to hit locally in local development, so that Layercode sends you the data at an endpoint that you define. So we have to basically put, like, localhost:3000 online so that it can be hit, and that sort of stuff.

So it's like, to test it out, you have to do that.

And then so we've got like a.

Oh, so you've got a fancy way to do it.

Yeah, just like an update in the dashboard and stuff.

Just, like, little things that make our life easier, that sort of thing.

And so now we've got people actually using it and there's just like

so much stuff with trying to make it work well with AI SDK and all this kind of thing.

And just lots of small things. But we're trying to think about the bigger picture as well, so we're not just getting bogged down in making SDK changes and stuff, but actually thinking about what's going to be most useful, which is why it's extremely helpful to talk to you, and we really appreciate it.

Yeah.

Yeah, we're gonna hopefully have the Flux model integrated soon.

And yeah, that's kind of mostly what we've been working on.

Cool.

And by the way, I just pulled up your LinkedIn, I see you live in Victoria.

I've actually been there a bunch.

I'm from Seattle originally.

Oh no way.

Yeah.

I worked for Kenmore Air, so I would... like, in high school, I gassed planes for them, basically.

You've done a lot of seaplanes then?

Yeah.

Yeah.

Yeah.

Jack, you didn't get to do that when you came but

we've got to do that because it is fun.

Did you ever get tired of it though Ben?

I feel like if you do it all the time, it's...

It was pretty cool.

I mean, I wasn't flying every day.

I was just... I was on the dock, and I flew just occasionally.

So yeah, it was pretty fun.

Did you get to fly them yourself?

I have... I had a friend, a family friend, that was a pilot, and she one time just let me fly the plane with passengers in it, which was pretty wild.

That's cool.

Not landing.

Yeah.

No, no, it is.

I don't know if they do it with Kenmore, but the Canadian one, Harbour Air, they let you sit up front with the pilot.

Oh yeah, they do.

They do it at Kenmore too. Harbour Air is much more professional than we were. We were a little bit wild west. Not in a dangerous way, but Harbour Air had their shit together a little bit more.

I've done it a few times.

It's always kind of terrifying because you're like, this is really fun.

Then you're like, what if the pilot fucking has a heart attack?

Like I'm literally sitting here.

So yeah, that's cool.

Where are you based now?

I'm in New York.

Nice, okay.

Yeah.

I actually have a meeting right after this, so I gotta run, but thanks guys.

Thanks, thanks.

Have a good rest of your day.

Thanks, see you later.