Nathan Amin TAB 1
This sounds good.
Awesome.
Cool.
Yeah, we have a standard process for the tab just to try and get the same questions.
And then we're looking at everyone's answers to those questions and trying to pattern match.
But yeah, we are now recording.
Yeah.
Okay, cool.
Yeah, so Nathan, and this is our favorite question.
We're building with voice AI and the stuff you've just described.
If there's anything that you could wave a magic wand at and make it better or make it less terrible or make it change anything, what would you wave the magic wand at?
Right now, because I'm in the phase of just having gone from zero to working product in the last week, being able to integrate with our system would have been fantastic.
I definitely think that with the tools that are out there now, it is 100% possible and would have been great if it was similar to Devin maybe in a way where you're able to kick off PRs because that's really the process now.
It's just code reviews.
If it can build out the reasoning for like, hey, this is what your system looks like, and this is how we can integrate with you.
This is a PR for how we've done that.
Please review this.
That would be magical.
That would allow us to just start working immediately.
Awesome.
And you kind of answered it, I guess, but how would that change your life if it existed?
I think, oh, how would it have changed last week?
I think I would have already been able to start focusing on metrics and decreasing delays, improving the audio quality, improving the script that we're going through.
So I'm doing that now or I'm doing that next.
I'm not even doing that now.
I'm doing that in a few days, but it has to be done by Friday.
So it would have been great to have already kicked that off last week, because a lot of that is very self-contained and doesn't have to relate to integrating with our full system.
Is there anything different about the world now versus a year ago?
Yeah, sorry.
I was building in voice AI three years ago.
It was two years ago.
It was 2023.
You have one of the last ones.
It was summer of 2023.
Yeah, because we went through YC, we were like one of the first voice AI companies that went through YC, because OpenAI had just come out.
And it was crazy back then.
We were building Pipecat essentially on our own.
So the fact that it just works is insane.
And I think tomorrow could be even better, especially with where we're seeing AI tools moving.
So I would love for there to be a world where people can build out their own products and be solo founders.
I definitely think we're moving towards that, where anyone with an idea can build out a full system.
But anyone who is perhaps not a staff level software engineer can also just build this out on a weekend.
That would be ideal.
That would be incredible.
And also building out demos that way, you guys would be the go-to place for any startup that was building out a demo.
I've got friends that are building in EdTech voice AI, and they didn't even know what Pipecat was.
And I was like, how did you build the system?
What did you do?
And like the other options are like, you're using a system that doesn't have very good customization and you have very little control over how to scale or how to improve the system.
Or it kind of feels like you can't even get started.
Like Pipecat, I think when you first look at it, can be a bit intimidating, even though the documentation is so beautiful.
But it doesn't give you that confidence that this will just start working immediately until you actually just start.
You go through the five minute tutorial, then you're like, shit, okay, cool, this is working immediately.
Now, how do I integrate this into my system?
Yeah, yeah.
This is not on the standard sheet of questions, but I mean, it sounds like Pipecat have kind of won your heart.
It seems like they've done a good job there.
Is there anything that would make you think, oh, let me look at something else?
Oh, totally.
I think it's definitely just being able to integrate with our system.
They just don't have that.
And that is far harder than creating a GitHub repo that just works.
Interesting.
Like, dig into the integrating with your system, because I feel like the nuance is where the pain is, right?
Okay, maybe I'll just ask this question.
So within integrating with your system, if you could wave a wand at any part of it, what would particularly get a magic wand waved at it?
Let me take a look actually.
Let's see.
I'm literally looking through the code right now and seeing what was really annoying.
Oh.
So there was some restructuring things where it's like, how can we ensure that this is a process that we can use multiple times?
So like separating out the Flow Factory, separating out the agent and the runner into different files.
And then being able to do plan renewal and then other kinds of voice agents.
That was more of just like, it's not very complicated, but just like structural changes.
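A minimal sketch of the kind of split being described, a reusable flow factory kept separate from the agent definitions and the runner; every name here is illustrative, not the actual codebase:

```python
# Sketch: separate agent definitions, a flow factory, and a shared runner,
# so one process can be reused for plan renewal and other voice agents.
# All names are illustrative placeholders, not the real repo.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Flow:
    name: str
    steps: list[str]


# agents/plan_renewal.py (illustrative): one agent definition per use case
def plan_renewal_steps() -> list[str]:
    return ["greet", "confirm_identity", "offer_renewal", "wrap_up"]


# flows/factory.py (illustrative): builds a flow from any agent's step list
def build_flow(name: str, steps_fn: Callable[[], list[str]]) -> Flow:
    return Flow(name=name, steps=steps_fn())


# runner.py (illustrative): the shared piece that wires a flow into a call
def run(flow: Flow) -> None:
    for step in flow.steps:
        print(f"[{flow.name}] running step: {step}")


run(build_flow("plan_renewal", plan_renewal_steps))
```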
When you're looking at the GitHub repo versus integrating into our system, it's separating out the client and the server, kicking off an outbound call.
Maybe the best way of thinking about it is, how can you build out a script that kicks off a call that integrates with your existing server and FastAPI architecture, as opposed to creating its own FastAPI system, which is what it did.
So it should have, yeah, it should have looked at seeing that we have a FastAPI service and started using it, but that was definitely on me too.
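A minimal sketch of the pattern being described: hanging the outbound-call trigger off the FastAPI app you already run instead of spinning up a separate one. The route name and the run_voice_agent helper are hypothetical placeholders, not Pipecat's actual API.

```python
# Sketch: reuse the existing FastAPI service rather than creating a new one.
# `run_voice_agent` is a hypothetical placeholder for whatever builds the
# voice pipeline (transport, STT, LLM, TTS) and dials out.
import asyncio

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()  # in practice, import the app your server already defines


class OutboundCallRequest(BaseModel):
    phone_number: str
    user_id: str


async def run_voice_agent(phone_number: str, user_id: str) -> None:
    # Placeholder: construct the pipeline and start the outbound call here.
    ...


@app.post("/calls/outbound")
async def start_outbound_call(req: OutboundCallRequest):
    # Fire-and-forget so the HTTP request returns immediately.
    asyncio.create_task(run_voice_agent(req.phone_number, req.user_id))
    return {"status": "dialing", "user_id": req.user_id}
```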
But it's the, yeah, I know.
It's the vibe code to non-vibe-code software.
Yeah, yeah.
Okay, that, that makes sense.
So it seems like it's kind of just the general stuff that comes with having a fairly complicated system that does a few different things.
Do you already have a Twilio service?
Do you already have everything set up?
Or are you trying to build this out as its own server, as its own system?
As you come to launch it, is there anything that you would wave a magic wand at?
You're going to start scaling it out a lot soon.
Is there anything that you're anticipating waving a magic wand at?
I think in terms of the scale plan, we're already able to kick off multiple agents at the same time.
I think we'll definitely get to 10 and then we'll see what the growing pains are.
And then beyond that.
But I think the thing that I'm most concerned about and I don't understand is how do we improve transcription?
How do we improve our text to speech as well?
I think transcription is the one I'm most concerned about because that's the one that screws up the entire system.
But
if we're finding that there are delays or there are issues, how do we solve those issues if it's not associated with OpenAI?
Because in terms of improving the prompt and fine tuning, I understand that system, but I don't really know what to do with Deepgram.
Yeah.
Yeah, that's, yeah, we had a lot.
So, yeah, like a diagnosis.
Like, okay, so now we have this working in prod.
We're seeing that between these steps in the flow, there is X amount of delay.
So you can see where in the flow there are issues.
And then what are your options?
How do you improve each of those parts?
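One rough way to get at "where in the flow is the delay": wrap each stage with timestamps and log the gaps. The stage functions below are stubs standing in for the real Deepgram, OpenAI, and TTS calls; nothing here is tied to a particular framework.

```python
# Sketch: crude per-stage latency logging for a voice pipeline turn.
# The stages are stubs; the point is measuring the gaps between them.
import time
from contextlib import contextmanager


@contextmanager
def timed(stage: str, timings: dict):
    """Record wall-clock time (ms) for one pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = (time.perf_counter() - start) * 1000


# Stub stages; in a real pipeline these would be the STT / LLM / TTS calls.
def transcribe(audio: bytes) -> str: return "hello"
def generate_reply(text: str) -> str: return "hi there"
def synthesize(text: str) -> bytes: return b""


def handle_turn(audio_chunk: bytes) -> dict[str, float]:
    timings: dict[str, float] = {}
    with timed("stt", timings):
        transcript = transcribe(audio_chunk)
    with timed("llm", timings):
        reply = generate_reply(transcript)
    with timed("tts", timings):
        synthesize(reply)
    print(timings)  # or ship to your metrics system
    return timings


handle_turn(b"\x00\x01")
```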
And the OpenAI part is easy.
In terms of Deepgram, what have you guys found for how you can improve Deepgram?
I think some people are experimenting with the audio isolation stuff.
There's a tool called AI Acoustics.
We haven't implemented it yet, but some people have told us it helps.
There's also another one that's more well known called Krisp, K-R-I-S-P.
Oh, I've seen that.
I think Pipecat integrates with Krisp.
Yeah.
So some people, and I think typically the biggest issue is when it's a difficult environment, lots of background noise, that sort of stuff.
I don't think it will solve all problems, but we have heard from a few people that it does make a big difference.
That makes a lot of sense.
Yeah, but yeah, I guess there's not that much more.
They have a new model.
I don't know if you saw that.
Flux, Deepgram Flux.
So maybe that's something to look at as well.
Nice.
That's awesome.
I've only heard good things about Flux.
Okay, cool.
Yeah, we're trying to integrate Flux, right, Jack?
Yeah.
How do you guys think about the different services you can be using as well?
So I think also ElevenLabs is trying to expand into the full pipeline as well.
They're trying to do text-to-speech, speech-to-text.
I don't know if they have their own AI, like they're trying to create their own agents as well.
And then I'm sure Gemini can probably also jump into each of those as well, maybe.
But what have you guys seen?
Because I think Deepgram to OpenAI to ElevenLabs is kind of the classic.
But what are the other configurations you guys have seen that work really well?
We're testing, we've been using Rime for text to speech because they have limits on concurrency.
So for us, that's like a very useful thing.
I think the quality of the voice is definitely not as good as ElevenLabs' best models.
But for real time, you can't use the V3 one anyway, I think, right now.
So, yeah, I don't know, I personally feel like text to speech is not the biggest of deals.
No, it's not.
I think text to speech is fine, to be honest.
Yeah, it's good enough.
So we spend a lot more time, I guess, thinking about the speech to text.
Yeah.
Yeah, it seems like Deepgram Flux, audio isolation.
Yeah.
And then I think there's a lot of stuff that's just around latency, a lot of tips and tricks of doing it well. It seems like it's mostly just spending a lot of time understanding where the latency is.
Yeah.
So, I was actually going to ask you one more question, and it's basically the one question.
Cool.
But, like, I know I asked you that the first time within voice AI, but I'm just very curious: if we take voice AI out of the equation, it could be anything.
Your answer could be the same, but if you could wave a magic wand at anything that's a big problem right now, what would you wave it at?
Oh.
I would love for our systems to be self-improving.
And I think being able to go from human in the loop to fully autonomous, using Slack as that medium of, hey, this is an AI service.
It's having issues, diagnose these issues, summarize them on Slack with links to user IDs so that we can address each of the issues with an Ops team.
And then over time, actually replace the Ops team with another agent that is then improving the prompt or is building out a fine tuning data set or is improving the evals itself
would be fantastic.
I can definitely see it doing those three things.
Then that would kick off a PR and then we'd be able to review that and the system just keeps going.
That's what I want for the first AI system that I was building out for Frtuna and that is what I will want for the voice AI platform as well.
Wow, that's so cool.
That would be.
Yeah, it's not complicated.
And to be honest, most of the system I was working on, the whole writing to Slack a summary of the issues, is just a grader.
You have to build that grader for anything anyways.
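A minimal sketch of that "grader that writes to Slack" piece, assuming a standard Slack incoming webhook; the webhook URL, the user-link format, and the issue structure are placeholders for illustration.

```python
# Sketch: post a summary of diagnosed issues to Slack via an incoming webhook.
# The webhook URL, user links, and issue shape are assumptions for illustration.
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder


def post_issue_summary(issues: list[dict]) -> None:
    lines = [
        f"- <https://example.com/users/{i['user_id']}|user {i['user_id']}>: {i['summary']}"
        for i in issues
    ]
    payload = {"text": "Voice agent issues needing review:\n" + "\n".join(lines)}
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # fire the webhook


# Example (would need a real webhook URL to actually post):
# post_issue_summary([{"user_id": "123", "summary": "call dropped after STT timeout"}])
```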
So then the next step is really, I think Devin's been doing this, where you're able to integrate with Slack and then kick off an agent that will create a PR.
But that process shouldn't be too hard, because creating a PR is just a bunch of Git commands.
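And the "just a bunch of Git commands" part, sketched with plain subprocess calls; the branch name, commit message, and use of the GitHub CLI (gh) are assumptions for illustration, and it presumes gh is installed and authenticated.

```python
# Sketch: open a PR from an agent's local edits using git plus the GitHub CLI.
# Branch, commit, and PR text are placeholders; assumes `gh` is set up.
import subprocess


def run(*cmd: str) -> None:
    # Raise if any command fails, so partial pushes don't go unnoticed.
    subprocess.run(cmd, check=True)


def open_pr(branch: str, title: str, body: str) -> None:
    run("git", "checkout", "-b", branch)
    run("git", "add", "-A")
    run("git", "commit", "-m", title)
    run("git", "push", "-u", "origin", branch)
    run("gh", "pr", "create", "--title", title, "--body", body)


# Example (run inside a git repo with changes staged-able):
# open_pr("agent/prompt-tweak", "Tighten renewal prompt",
#         "Auto-generated from eval failures; please review.")
```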
That would be so cool.
I've seen Sentry is starting to do some of this; I've not looked into it.
Are they?
I think they're starting to do, here's the bug, and then, I don't know.
I've not dug into it yet, but I've seen them doing stuff.
Doing stuff that's super general, I think, will be incredibly hard.
But if you can just enable tools for engineers to build it out themselves, that would be so beautiful.
If I could just call an API that just did the whole Slack integration, that'd be great.
And then another API that was like, okay, this is how you kick off a PR.
That would be great.
And then you're really just building out the grader yourself.
Yeah, this is so cool.
You're like living in the future.
I love it.
It's like, I feel like you're living in the past though, because in San Francisco, everyone there is like, we're creating a new AI code whatever, slop.
Every block, there's a new one that they're making.
And they're like, we're going to have it be very specific for ASIC fab design or something.
Specific to like server engineering or yeah, yeah, yeah.
So I feel like New York is very much like if you are working in fintech, it's on the cutting edge.
But for AI and for engineering, that's still San Francisco.
Yeah, San Francisco is crazy.
So you have to keep one foot in SF no matter what.
Yeah, absolutely.
Yeah.
Okay, amazing.
Actually, one massive side note I was thinking about.
So I felt like I should just test everything as a vibe coder rather than as a... Oh, 100%.
Yeah, yeah.
It's like it and stuff.
It should be like, because right now it's me, but it should just...
Why I say that is because even the best engineers, not me, I'm working on that, but our CTO is incredible.
And even he was saying, hey, before we raised our Series A, everything was on fire.
I was vibe coding fucking everything.
Like, I needed to move so fast.
So it's far more to do with urgency than it is to do with anything else.
Yeah, 100%.
But it makes so much sense that we should optimize for that.
And even at the beginning, when things are probably harder in a way, because, yeah, this is great.
Yeah, it bridges "make it work" to "make sure it scales."
Yeah.
Yeah.
I think more and more people are kind of getting to that point.
You know, it's like, yeah, we talked about this before, Nathan, that, like, the industry is so young, right?
Like, so many people just kind of, like, doing demos and then starting to hit production problems.
And then I feel like there's just this gradient of, the wider you scale, the more you run into that.
We're speaking to people who are at various points on that line.
So yeah, they've raised.
They don't need to impress investors anymore.
They need to like execute.
Yeah.
Yeah.
Yeah.
Yeah.
Nathan, thank you so much.
This is super helpful.
Thank you.
Yeah, I'm excited to see you guys in a few weeks.
Or maybe I'll see Jack, right?
Yeah, yeah, yeah.
I'll see you before, I think.
And I'm gonna be in New York in, like, 10 days or something, so I need to follow up with you on that and book something in.
Yeah.
And then Jack and the rest of the team will be at the hackathon.
Yeah, I have this other event booking thing where they could just book, like, five sessions for the tab, if that's okay with you, Nathan, that I could send.
But maybe we want to.
Oh, sure.
And, like, because I'm gonna see you anyway, maybe we start it in, like, six weeks.
Yeah, we can start it in London and then we can discuss.
That sounds great.
Yeah.
Okay, cool.
I'll send it to you anyway and then yeah, cool.
Sounds awesome.
Just a quick question.
You mentioned one of your colleagues wanted potentially to come to the hackathon.
I wanted to make sure that if they did, they reserve their space.
I'll check back in with them, because I think they're also in Texas and they're trying to figure out how to be in London at the same time.
And I'm like, you can't have two of you.
So I think it's low likelihood that they'll be able to make it.
So I wouldn't stress.
Yeah.
Cool.
That's great.
Okay.
Perfect.
Well, yeah.
Thanks so much for the time, man.
Really appreciate it.
This is super valuable.
Of course.
Yeah.
Oh, also, if at some point you guys could do a hackathon with...
I think maybe something that looked really cool too was, Gemini and Pipecat are doing a hackathon on the 11th, so this weekend, in San Francisco.
So maybe check out how they did that too, because if you could also get Gemini in the UK, that would be awesome.
Or, like, partnering with another one of these organizations could be really cool.
I'm definitely young.
But it sounds like it's fully booked, which is a really good sign.
So that's awesome.
Well, testing voice in a room full of people is an experiment.
Oh yeah, that's gonna be madness.
Yeah, that's gonna be crazy.
Okay, I will make sure I use Krisp.
Also, I just have to say, guys, thank you so much for your advice, especially with Deepgram, because I knew that was gonna be an issue, but I was like, I'll just address this when it comes up.
But at least now I have a little bit more plan, so thanks for that.
Yeah, we're actually working with Cloudflare and Deepgram as well.
So you can do it through Deepgram directly or you can go through Cloudflare.
That's awesome.
Find one that's better than the other.
Cool.
Sweet.
Well, thank you guys so much and I hope you have a good rest of your day.
Thanks, Nathan.
You too.
Cheers.
Bye for now.
Bye.