Jan Siml (Swap) TAB call 1

Incredibly useful. Skip to 20:58 to the end to hear Jan talk about our positioning

Resisting my urge to ask you about your London eBay AirPods edition, because I just saw that and it's very interesting.

But I'll hold that off.

Yeah, so

Jan, in voice AI in general, with the stuff that you've been doing, if you could wave a magic wand at anything to change it, make it better,

just what would you wave it at?

So there is a hopeful answer which I have to say, which is I would wave it to be a founder of a successful startup that just exited.

So I can finally like chill out and just build for fun.

But in terms of what you were actually asking.

So like win the lottery basically and that's in that front or like just-.

Oh, having like a successful startup, it's like more you deserve the money.

I just want to fast.

Yeah.

You get more like status as well.

Oh, I don't care about that.

I just want the money.

Okay.

With the voice AI,

I would love a magical wand where I wave it and basically I couldn't care less that it's a voice AI or text AI or image AI.

I just want one thing that basically is my one agent framework, and it doesn't really matter which modality I use.

And that's not necessarily abstracted away, because we saw with LangChain how that can get really shit and basically unmaintainable.

But it's more

having the right primitives that if you want to bolt on the other modalities and things, when you have something like your compositional pipeline, to make it so effortless that basically you don't really notice the additional lift.

If you had that, how would it change your life?

Because I wouldn't have to worry about Voice Agent.

Ideally, when I say that, that in my head includes some way of dealing with latency.

So potentially having some pattern that I can just copy-paste, you know, shadcn style, but like AI SDK, where I slip in this other small model that keeps kind of chatting to the user while my bigger pipeline runs.

And then I can stop fucking around with these dumb models and literally chuck in my Sonnet

to basically deliver the intelligence.

I don't have to worry too much about prompting because it's pretty smart.

And in the meantime, this simple model on the side will basically keep commenting on, oh, we have just started doing this tool.

Oh, yeah, I'm still looking because it's been three seconds and we haven't given an update.

Oh, there are so many products in our database, but we are really working on it.

It would give me a recipe, like shadcn style, that, oh, does your pipeline take longer than five seconds?

Then you should use these components to make the waiting shorter.

That would be an ideal world.
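
A minimal sketch of the filler pattern described above, assuming an asyncio-based pipeline. The names `run_with_filler`, `slow_pipeline`, and `FILLERS` are hypothetical, and the canned strings stand in for whatever a small, fast model would actually generate while the bigger pipeline runs.

```python
import asyncio
from typing import Awaitable, Callable, List

# Canned lines standing in for what a small, fast model would generate.
FILLERS = [
    "I've just started looking that up.",
    "Still looking, there are a lot of products in the database.",
    "Almost there, thanks for waiting.",
]


async def run_with_filler(
    pipeline: Awaitable[str],
    say: Callable[[str], None],
    fillers: List[str],
    interval_s: float = 3.0,
) -> str:
    """Run the slow pipeline and speak a filler line every interval_s until it finishes."""
    task = asyncio.ensure_future(pipeline)
    spoken = 0
    while True:
        # Wait for the pipeline, but give up after interval_s so we can update the user.
        done, _pending = await asyncio.wait({task}, timeout=interval_s)
        if done:
            return task.result()
        say(fillers[spoken % len(fillers)])
        spoken += 1


async def slow_pipeline() -> str:
    """Hypothetical stand-in for the big model plus its tool calls."""
    await asyncio.sleep(0.25)
    return "Found 3 matching products."


def demo() -> List[str]:
    """Collect everything the user would hear, fillers first, then the real answer."""
    heard: List[str] = []
    answer = asyncio.run(
        run_with_filler(slow_pipeline(), heard.append, FILLERS, interval_s=0.1)
    )
    heard.append(answer)
    return heard
```

In a real voice agent, `say` would push text to TTS, and the five-second threshold from the conversation would simply be a larger `interval_s`.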

Basically, seamless voice as an addition, because I don't believe in a voice as a standalone modality.

It's too noisy, too imperfect, too difficult because it requires some privacy and all this shit.

I don't think it's a big thing on its own.

I think it's going to be useful only when you seamlessly can jump between all the different things.

And do you see it as like, is it, if it kind of collapses into like one thing, is it like, kind of you deal with text or is it just

like, yeah, yeah, that's.

Kind of the modality that's like, it would be harder to deal with images or sounds.

So I would have a text agent.

The idea there is, I wish that it will be easy for our design team to figure out an experience, because we will be able to tell them, hey, it's going to take this many seconds to do this, this many seconds to do that.

We can like entertain the user so they can do their design.

And all we will focus on is not going to be like, how do we get this voice platform up in the air?

We will just use the tools we already know in the past that would have been like PyTorch AI where we wanted to run OR.

AI and we would just basically build the tools we need to run the application that we want to run.

And voice would be just a thing that you almost like, yeah, I want the voice too.

Yeah.

So you don't want to go down this very specific world of voice.

You want it to just be like.

Yeah, because that's what it is.

I don't want to undermine what you guys are doing.

Not at all.

No.

But if I was doing this myself, you just slap in the STT and TTS around your normal pipeline.

There is a bit more latency, but hey, that's life.

And you just go with it.

It's just like you add a button, sometimes you integrate it slightly better in your UI, but that's what it is.
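
A rough sketch of "STT and TTS around your normal pipeline", assuming the text agent is an ordinary function from string to string. `VoiceWrapper` and the fake STT/TTS functions are hypothetical placeholders, not any provider's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceWrapper:
    """Wraps an existing text agent with speech on both ends."""

    stt: Callable[[bytes], str]   # speech-to-text provider (assumed interface)
    agent: Callable[[str], str]   # the normal text pipeline, unchanged
    tts: Callable[[str], bytes]   # text-to-speech provider (assumed interface)

    def handle_turn(self, audio_in: bytes) -> bytes:
        text_in = self.stt(audio_in)     # extra latency: transcription
        text_out = self.agent(text_in)   # same agent as the text product
        return self.tts(text_out)        # extra latency: synthesis


def fake_stt(audio: bytes) -> str:
    """Placeholder: pretends the audio bytes are already UTF-8 text."""
    return audio.decode("utf-8")


def fake_tts(text: str) -> bytes:
    """Placeholder: pretends UTF-8 bytes are audio."""
    return text.encode("utf-8")


def text_agent(message: str) -> str:
    """Placeholder for the existing text pipeline."""
    return f"You said: {message}"
```

The point of the shape is that the agent in the middle never learns it is being used over voice.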

True, true.

Yeah, it makes total sense.

If you could wave a magic wand at anything else, what would you wave it at?

Outside of the voice AI or more ambitious?

Or is there like a number two thing that you,

I guess if that didn't exist

and there was like kind of more of like a

immediate, I don't know, like kind of right now wave it at, I guess.

Sort of.

I don't know.

However you want to answer it, to be honest.

Yeah.

I'll leave it to you, as always.

Focus on voice AI or in general, what I worry about or what I'm thinking about.

Actually, yeah, let's do in general.

If we take out voice AI, what would you wave the magic wand at?

For me right now, it's still voice AI, but basically what I'm thinking about is also how do I deliver the memory layer to this?

Because

it has the same problem that you have some providers that kind of give you something out of the box, but it's shared, so you end up doing the work anyway.

Then you get a custom, but when you start doing a custom, you don't actually know all the edge cases and you haven't really tuned your tooling for the model to really understand all the edge cases because you haven't pushed the volume of memories through it.

So you basically

you have to give in, sorry, very specifically memory layer, let up, obvious one, right?

Because literally their DevRel is a friend of mine, and I ultimately decided not to go for it because I have to use their SDK, their abstraction.

Even though I can self host it, it still imposes, you need to use Postgres, you need to capture everything.

I don't care about that.

I need 10% of their features right now.

So the complexity trade-off right now is not worth it.

It might be in the future, but like right now when I'm sprinting, I don't care.

We don't have like six months to plan out like what will be the optimal agent because the reality is it's kind of a build or die.

If we don't build it fast enough, if we don't release it fast enough, like we're gonna pivot and go something else because everyone else would have done it.

Yeah.

So in that world, I basically built my custom thing in two days.

It's too hacky, and something that would be really good is primitives, Pydantic AI style, where they basically bake in good engineering practices, good best practices of the workflows that they see from other teams.

They

just converge on them and create a light abstraction layer that you can just hack on.

That would have been amazing if it was in memory.

I couldn't find any.

Then search.

Search is trivial.

A million businesses do it for you.

It's a very hard decision because everyone wants to own their data.

So we end up running with GCP Cloud SQL, which is this really crippled Postgres instance where a bunch of the extensions don't work.

So I can't even do fucking BM25.

So now some of my queries have a P99 line that says, oh, 15 seconds, even though I can do the same workload on my laptop in one millisecond.

Yeah.

It's like completely stupid, but I'm basically locked in to certain architecture and infrastructure decisions

and I need to get that to work.

And if it was easy to switch over to someone who would manage the search for us, I think we would do it.

It's just everyone promises it, but it's not always that easy.

I built a full POC on Vespa, if you know them.

They

are basically a bunch of these old school searches as well with some big names.

It's a really fast database, but basically you configure the whole thing in these crazy configs.

Which means it's very domain specific, right?

And I have literally built a whole GenAI ecosystem in the Julia language, so I know exactly what I want when I'm building this system.

And I was still struggling to basically navigate how they do all these things in their config formats.

So that gives me abstraction for the infra, which is great, but it doesn't give me abstraction to basically quickly build the thing I want.

Which might be too niche of an ask because ultimately you spend longer time hosting your services than building them.

But the world I know and the world I'm in, in this weird space, is mostly the world of building and sprinting.

And in that world, I haven't found a good search provider.

So right now I'm basically just hacking on top of Postgres, and I'm always this close to basically chucking in my local search powered by NumPy, because I can beat the shit out of any search index I can see.
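
For a sense of how small a local BM25 search can be, here is a minimal in-memory sketch. It uses only the Python standard library rather than NumPy so it runs anywhere. The class name `BM25Index`, the whitespace tokenizer, and the parameter defaults (`k1=1.5`, `b=0.75`) are illustrative assumptions, not what Jan actually runs.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Naive lowercase whitespace tokenizer (an assumption; real search needs more)."""
    return text.lower().split()


class BM25Index:
    """Tiny in-memory Okapi BM25 index over a list of documents."""

    def __init__(self, docs: List[str], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]        # term counts per doc
        self.lengths = [sum(tf.values()) for tf in self.tfs]   # doc lengths in tokens
        self.avg_len = sum(self.lengths) / len(docs) if docs else 0.0
        doc_freq = Counter(term for tf in self.tfs for term in tf)
        n = len(docs)
        # Smoothed IDF that stays positive even for very common terms.
        self.idf: Dict[str, float] = {
            t: math.log(1 + (n - c + 0.5) / (c + 0.5)) for t, c in doc_freq.items()
        }

    def score(self, query: str, i: int) -> float:
        tf, length = self.tfs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * length / self.avg_len)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def search(self, query: str, top_k: int = 3) -> List[Tuple[int, float]]:
        """Return (doc_index, score) pairs for matching docs, best first."""
        scored = [(i, self.score(query, i)) for i in range(len(self.tfs))]
        hits = [(i, s) for i, s in scored if s > 0]
        return sorted(hits, key=lambda pair: pair[1], reverse=True)[:top_k]


PRODUCTS = ["red running shoes", "blue denim jacket", "red wool jacket for winter"]
```

Calling `BM25Index(PRODUCTS).search("red jacket")` ranks the third product first, since it is the only one matching both terms.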

Yeah.

So that's kind of the, those are the concerns, basically not necessarily the voice agent itself, but like the voice is one layer of the cake, but then there is all the tools that they need, one of which is always going to be some sort of a retrieval.

Then there is always the memory because nowadays, like without memory, you can probably, probably everyone expects it in a way.

It doesn't have to be fancy, but like you need it.

So those are the other things I'm thinking about.

That's super, super helpful.

Yeah.

It is very helpful.

Yeah.

So as you kind of see like the big picture, what actually like, yeah, is what actually is the biggest problem, I guess, with like, is it like the fact that you mentioned that some of the searches are slow, stuff like that?

Is there anything that's actually really a blocker for getting it out and stuff?

It's not the biggest blocker for us.

This is proprietary, so make sure you don't share this video.

Biggest blocker for us is because we want to compete with Shopify, which is strategically the dumbest decision I've ever heard of.

We are a Shopify plugin.

Our whole business line is basically a Shopify plugin.

So if they cut us off, we lost the whole business.

You can't raise, you can't build, you're done.

And because of that, we are building out separate checkout and everything because their terms and conditions are you can't use their checkout partially, you need to really buy in or you need to do everything else.

So obviously for the actual devs, their biggest nightmare is dealing with the terms of the service and payments.

But that is not my problem, right?

Because that's vanilla, that's engineering.

Like figure out your shit, like how will you pay and do the checkout and everything.

Because I need to worry only about the AI pieces.

That is what they worry about.

And to be clear, most of it is

there is right now one of me for the foreseeable few months.

I think we already hired a bunch of people, but long,

they have always three months notice.

So there is effectively one of me.

And now that I've finished memory, I will move on to search and I will make it faster.

However I do it, I will figure out a way to make it faster, more indices.

It's still a database.

It all can be solved, but ultimately

there are too many choices, too many dev tools, too many providers.

You tend to know one which has obviously like yourself, podcasts or whatever.

You basically have awareness because of the channels you consume.

Sometimes you top it up with some research, then you have a bunch of other providers.

Even if you basically move as fast as you can, you end up with this weird, like basically having to build it because docs, you can't really trust docs.

You need to build the thing to understand what the real edge cases are.

And with every one of these, you basically end up two to five days building the piece of the platform you need, not even putting it all together

for every piece of that final proposition.

It's basically this empirical science to an extent, where even choosing providers is this empirical science.

That's why I had to build for voice agents.

I had to build this like side by side comparison of all of them.

Until you build it, when you just look at the docs, you have no idea that Gemini was like single most painful thing to get done to properly connect.

Now that was fine, but like at the time their docs were just insufficient for what they had.

And even all your coding agents just can't figure it out.

So you basically need to build it.

You need to do it.

And that takes multiple days if you have only limited time.
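
A side-by-side comparison like the one mentioned above can start as a very small harness. This is only a sketch under assumptions: each provider is reduced to a hypothetical callable taking a prompt, and the three stub providers below are invented for illustration, not real SDK calls.

```python
import statistics
import time
from typing import Callable, Dict, List


def compare_providers(
    providers: Dict[str, Callable[[str], str]],
    prompts: List[str],
    runs: int = 3,
) -> Dict[str, Dict[str, float]]:
    """Call every provider on every prompt and report median latency and failures."""
    report: Dict[str, Dict[str, float]] = {}
    for name, call in providers.items():
        latencies: List[float] = []
        failures = 0
        for prompt in prompts:
            for _ in range(runs):
                start = time.perf_counter()
                try:
                    call(prompt)
                except Exception:
                    failures += 1  # count the failure and keep going
                    continue
                latencies.append(time.perf_counter() - start)
        report[name] = {
            "median_s": statistics.median(latencies) if latencies else float("nan"),
            "failures": float(failures),
        }
    return report


def fast_provider(prompt: str) -> str:
    """Invented stub: answers immediately."""
    return prompt.upper()


def slow_provider(prompt: str) -> str:
    """Invented stub: simulates network and model latency."""
    time.sleep(0.01)
    return prompt.upper()


def flaky_provider(prompt: str) -> str:
    """Invented stub: always fails to connect."""
    raise ConnectionError("could not connect")
```

Numbers from stubs mean nothing, of course. The edge cases only show up once the callables wrap the real integrations.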

Of course, yeah.

So I guess that's the challenge.

But yeah, there might be people who know it.

But the other thing I had to deal with was virtual try-on, right?

Yet another piece of it, which is completely visual video generation, all those pieces.

So knowing all the models, all the workflows.

So none of these really

They are not really adjacent.

There is no knowledge.

It's completely different domains and you kind of end up with all of them.

Sorry, what did you say?

Virtual what?

Virtual try-on.

So in our app, when you basically select the garment.

Sorry, yeah, yeah, yeah, yeah, yeah.

So you can see how it looks on you.

Yeah, yeah, yeah, yeah.

Which requires image model, video models, background removals, different things, which is a completely different modality, which you need to integrate in your app.

And it has its own latencies, its own failures and stuff.

Yeah, I guess I've seen, yeah, I saw like Levels was doing that, right?

We've seen him doing that with Fal and I think he's just...

Yeah, he does a lot of this interior designing for whatever reason.

Photo AI as well, yeah.

That's super cool.

That's fun.

Okay.

Very, very cool.

So it's like the challenge is like, is it almost like you're saying like you just need to build it yourself in a way, like you or like you build it yourself first and then use tools rather than you can't just go straight to the tool.

No, I usually go into the tool, but basically I need to build the thing I want with the tool to learn whether it's actually useful, because

I don't understand whether it can do the thing I want very quickly.

It's like OpenAI Realtime, right?

It's the same thing everywhere in the docs.

Just do this real time voice agent.

Boom, you're done.

You can define the tools, put an MCP there, you're done, right?

Turns out you're not because if you want to do any other operations, all the transcripts and things, you have to hook into the transport layer, which is like pure session data.

So you need to read up all the docs and figure out what the events are.

And they have a lot of them.

Yeah, yeah.

Basically start changing everything by sending these real time events and ending your turns and things like that.

So everything is magical in the docs, but not in reality.
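
To illustrate what hooking into the transport layer tends to involve, here is a minimal event-dispatch sketch that assembles transcripts from streamed events. The event names `transcript.delta` and `turn.end` and the dict shape are invented for this example. They are not the actual OpenAI Realtime event names, which have to be looked up in the docs.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]


class TranscriptCollector:
    """Assembles per-turn transcripts from a stream of session events."""

    def __init__(self) -> None:
        self.turns: List[str] = []
        self._buffer: List[str] = []
        # Hypothetical event types mapped to their handlers.
        self._handlers: Dict[str, Callable[[Event], None]] = {
            "transcript.delta": self._on_delta,
            "turn.end": self._on_turn_end,
        }

    def handle(self, event: Event) -> bool:
        """Dispatch one event; returns False for event types we do not handle."""
        handler = self._handlers.get(event.get("type", ""))
        if handler is None:
            return False
        handler(event)
        return True

    def _on_delta(self, event: Event) -> None:
        # Partial transcript text arrives in pieces.
        self._buffer.append(event.get("text", ""))

    def _on_turn_end(self, event: Event) -> None:
        # Flush the buffered pieces into one finished turn.
        if self._buffer:
            self.turns.append("".join(self._buffer))
            self._buffer.clear()
```

A real session emits far more event types than this, which is the point being made: every extra operation means finding and handling another one.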

And that's why you have to build to try.

Yeah, that's

always the case.

Yeah.

Sorry, it's always we're part of that as well.

But it's a challenge.

How do you find tools actually?

By the way, just one quick question.

How do you discover tools and stuff?

How do you...

depends on the tools.

If it's a package or something else.

I tend to have like two to three windows of ChatGPT open at any point in time anyway.

Because basically I'm in a constant learning mode because I'm building all these things.

Yeah, of course.

Generally, I don't find myself, I always know a tool that I would use because I spend probably

two to three hours a day listening to all the various podcasts.

Oh, cool.

And reading all the small AI news and refreshing my X probably too much.

Because I listen at 4x speed.

I actually cover a lot of podcasts.

4X speed.

Yeah.

So I tend to have a reasonably good awareness, because of all the various channels, of what the tools are, but that doesn't mean I have used them.

Yeah, that's where always the issue is.

Most of the time, the bottleneck is.

And I was like, oh, search, we should do Vespa.

They do ColBERT and all these cool things.

And then you actually try to use it and you realize, well, I would be the only person in the startup who would actually know how to do anything.

Yeah.

And that's not a good one, right?

When you need to scale around and things.

Get people to use it.

Yeah.

So yeah, that's always the challenge.

But yeah, that's what you learn when you actually build with the tools.

But the very nice is standard.

Yeah.

Newsletters.

Yeah.

The tech news one as well.

You know, comes every 1pm and 2pm.

There are two different ones.

TLDR, I think, is the one.

Oh, yeah, yeah, yeah.

That one I like as well because it keeps me current on top of the small, small AI one.

Have you gotten to use Console, on console.dev?

I don't know that one.

Yeah, it's quite a good one.

It might cover some,

I don't think it's going to give you the full thing, but what they try to do is they say like, here's what it's good at, here's what we like, what we don't like about each tool.

And so you might get some, I know, because you said that's the challenge, is knowing what it can't do.

But I, sorry, go on.

I think, no, I was going to say, I think they used to spend a lot more time doing that.

I feel like they don't, I know the guy is a really cool guy, but he's got a startup now, so

yeah.

Cool.

Sorry, what were you going to say?

No idea.

I was going to say, I read even these various blog posts of people using different things, but it still doesn't cover that what you want to do with the restrictions you have, which are imposed by what the design wants to do, what your PM thinks the functions should be and what your architecture is mostly laid out with your infra.

Within that, when you then start using any of these tools, it might still be very different thing, but it's very cool when someone shrinks it.

I don't do Hacker News and I don't do Product Hunt and any of these things.

Yeah, yeah.

Makes sense.

Yeah.

I don't think that many people use product hunt.

I think obviously a lot use Hacker News, but for like product hunt is it's good, but it's something that's good to do, but you know, it's not.

And I think it's where most people get their information.

Amazing.

Well, yeah, that was so helpful.

That's all the questions I had.

So we can...

Can I ask you something about Layercode?

Yeah, absolutely.

So I had like random call about this with Damien at some point and I gave him like brain dump.

One thing that I was struggling to figure out is like,

who do you guys want to serve?

Like I really can't figure it out from your, because like,

even as a dev tool, you can't be too general as a dev tool because then you're unusable.

You always have to pick something and it wasn't clear to me because everyone else is like already positioning who they want to serve and which vertical and potentially use case as well.

Who is Layercode for?

If you had to, just before I answer that, if you had to say who you thought it was for, I'd just be curious.

It sounds like you don't know, but if you had to just

say, someone held a gun to your head, who is it for?

Well, I would say it's devs.

But

that also doesn't make sense because you still need devs who are AI engineers.

So it's like a subtranche of that.

And within that, I am still not seeing...

I don't see how you stand out too strongly.

Who are the devs who need the full control and have that kind of scale that you guys need for all these things to fit together.

So that's why I was curious.

Yeah, I think, well, I would say right now, firstly, I don't think we're doing a good job on this.

That's one thing.

So I think our answer is probably not going to be like satisfying answer.

But.

I think we've narrowed it down to devs, as you said.

But as you also said, it's just

not narrow enough.

My personal view is that we should narrow it down to JavaScript first, like TypeScript, and really integrate well with AI SDK and stuff as a first port of call, and try and just do that really well and just have lots of good TypeScript examples.

Good.

Even just focusing on Next.js and just really double down on that.

Because I think that's the one that we know a little bit better.

But,

yeah,

that's my view.

But right now we are also just trying to, I think, just talk to a lot of people and maybe figure out where there are gaps because

I think, I don't know, it's already starting to feel like a lot of people are figuring out the pipeline stuff.

Okay.

There's like, at the scale is like really difficult, but maybe there are like some kind of tools around.

Like I feel like a lot of people might end up, I don't know, our feeling maybe is that a lot of people are going to end up using the real time APIs and just run it through OpenAI.

And so then like, where can we be useful?

Are there things that people are building with voice AI that like, we can build.

But it's very much like we're trying to

be useful.

That's why we're trying to do all these calls.

We're putting a lot of time on

trying to figure out where we can actually solve problems that aren't being solved well.

That would be interesting because I don't think even TypeScript Dev is narrow enough.

I would love to.

It's none of my business.

No, no, no, it is.

It is your business.

We want to hear.

You know, I am solving a problem of I'm not building a voice agent.

I'm building a factory of voice agents.

Right.

So what I need is actually to be able to create customized voice agents for different brands and run that whole kind of factory.

Yeah.

So I'm a dev, but, like, I have a specific take, which is, like, this repeatable pattern, because I'm serving a lot of brands.

If you go to a dev, then they might be serving one brand which has its own unique things, right?
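
One way to picture the "factory" idea is a shared pipeline plus a small per-brand config. This is a sketch under assumptions only: `BrandConfig`, `VoiceAgent`, `AgentFactory`, the field names, and the two example brands are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class BrandConfig:
    """Everything that differs between brands lives here."""

    name: str
    voice: str
    greeting: str
    tools: List[str] = field(default_factory=list)


@dataclass
class VoiceAgent:
    config: BrandConfig
    tools: Dict[str, Callable[[str], str]]

    def greet(self) -> str:
        return f"[{self.config.voice}] {self.config.greeting}"


class AgentFactory:
    """Builds per-brand agents from one shared tool registry."""

    def __init__(self, shared_tools: Dict[str, Callable[[str], str]]) -> None:
        self.shared_tools = shared_tools

    def build(self, config: BrandConfig) -> VoiceAgent:
        missing = [t for t in config.tools if t not in self.shared_tools]
        if missing:
            raise KeyError(f"unknown tools for {config.name}: {missing}")
        # Each brand only gets the tools its config asks for.
        tools = {t: self.shared_tools[t] for t in config.tools}
        return VoiceAgent(config=config, tools=tools)


SHARED_TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda q: f"results for {q}",
    "memory": lambda q: f"remembered {q}",
}

BRANDS = [
    BrandConfig("Acme Shoes", "warm", "Hi, welcome to Acme Shoes!", ["search"]),
    BrandConfig("Nord Coats", "calm", "Hello from Nord Coats.", ["search", "memory"]),
]
```

The engineering effort goes into the shared registry once, and each new brand becomes a config entry rather than a new build.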

Like if you look at ElevenLabs, they clearly figured out the single biggest thing where you need voice is customer support.

So they literally have a widget, you drop on the web page, you have this stupid playground configurator, whatever, and you're done.

So they are targeting the dev who's kind of not too much of a dev, but it's so simple.

And also they didn't go for a vertical, they hit the horizontal of that customer support function, and it just makes it so easy because what they built is perfect for that, like change a few whatever, regenerate the widget.

So their focus is actually incredible.

It was pissing me off because that wasn't our use case and hence features were missing.

But where I'm challenging you is basically doesn't help us honestly.

I don't know why I'm telling you that, but like in general, because you love it, yeah.

I can't help it.

I spend like 10 minutes.

You just, this is, yeah, you're a startup guy.

I mean, same, same.

I'm a startup guy for past four months.

But

it would be so much easier for you if you literally pick what is the other line in customer support or a specific pattern in customer support, right?

Like vertical industry, whatever, that has some specific interaction, whether it's a hard to guard rails, right?

Like when you're buying it's so much more.

I will have to jump.

I have different call, but like rails are tough, right?

Yeah.

How do you integrate guard as well?

Well, that's when you're buying stuff like retail or whatever or where you have some sort of a danger.

Like when you guys were discussing all this PII and all those things, can you figure out how to do that like validation with like UI or whatever?

Yeah.

Like when you focus on few specific problems.

Yeah.

So much better.

You can add.

Your marketing would be so much better and you would pop up in every fucking GPT deep research for like who's this provider?

Yeah.

You're focused.

When you're broad, you're gone.

You will never- With nobody.

Yeah, with nobody.

We were building GEO for a while and that's all about being like ultra niche and then having a few of these ultra niche things you solve.

Anyway.

Thank you, Jan.

We appreciate it.

That was helpful.

So helpful.

See you later.

See you on Thursday.

Next Thursday.

Yeah.

Monday, right?

Wait, hackathon?

I thought it's Monday.

23, no?

Oh, is it?

Wait, isn't that Thursday?

That's Thursday, no?

23, yeah.

Okay, all right.

Oh my God, this is the first time ever that I am...

Yeah, the Oracle, yeah.

Did.

I'll see you later.

See you next week.

Bye.