Jan (swapcommerce) feedback at pre-hackathon
Jan shares lots of feedback - including why he chose Layercode.
I want to make sure we don't mess anything up, because I'm trying to write down as much as I can. But the files, um, is it okay if I throw them to Damien?
Yeah, throw it to Damien.
There's this amazing documentation that he has written.
No, it's just generated Mermaid stuff, but it's basically how I explain to devs how it works.
Wow.
Okay.
So you can probably use it.
It's like a whole sequence.
You feel so handicapped knowing how it's built, trying to explain to people how it works.
Because I just don't understand the bits that people don't understand.
So that's super valuable.
Well, yeah, I think that looks really good. I hadn't thought of that.
It's also a good execution on that.
Can we go left, please?
So, what would you like help with?
Well, you want to speak like a pirate?
Wow, this is a, any, yeah, shirt.
I want to drink, by the way.
Is that the, yeah, it's a config file.
Oh, that's for the money, from the internet.
Yeah, the profit of welcome is high and falling today.
Bunch of different things here.
So, you can trigger this video in the system.
Oh, nice.
Yeah, yeah, yeah.
No, no, I get that.
I think I'm a bit stuck on like where to go next because I like,
so like I did everything it told me to do.
And then I'm like, okay, I've spoken to it.
What should I do?
They don't give you any kind of suggestions.
I don't know if we should.
Yeah.
Yeah.
So now I'm more complex.
Walk me through it.
I think the, yeah, then the terminal should say as well, right?
On the output.
I'd say the next thing people probably do is put in a tool.
But then the other option you've got is making your own UI.
Now, sorry, there is one thing I really dislike and needed to remove immediately from the demo: these weird waveforms, the synth stuff.
It was nice to know that something's happening, but like then I, very quickly got tired of it.
So put in a customer-facing UI example, because we used to have this very customer-facing UI thing.
Because when you put it in front of customers, you don't want this, do you?
No, that's why it didn't turn into a whole.
It needed to be clean and basically resembling ChatGPT as much as you can.
But like we could have like dev mode, customer mode, switch in this.
One thing I will say is: now that I know how ours looks, when I use that one, I feel handicapped because I cannot type text.
That's like the, especially when you have a room full of people, you try to say something twice, you just can't get it over the line.
You just want to put that thing in and you can't.
I would really love to have that text box in there, now that I know you can.
We, we, yeah, we should.
The text box and then customer facing UI, voila.
So naturally, we have got an AGENTS.md file here.
So you could launch Codex or Claude Code, if you do use those for instance, and then say, now turn it into a smart agent that does this.
Okay.
And then it could do things like build a UI.
And it's got all the Layercode docs in there, so it should know how to use Layercode.
Nice.
But yeah, this will be the point where there would be some specific voice.
We made it too easy.
You've done everything you need to do.
You've got a voice AI assistant.
Nice.
Cool.
Can you just hear it? That's happening.
It's almost like a video.
Like you can just like, you know, hello.
Every day.
Yeah, you didn't need to say anything over there.
Yeah.
Because Dan told you.
Solid takes.
No.
What are you building?
E-commerce voice agents.
Okay.
Okay.
Is this the enemy?
No, it's only.
We are actually using Layercode, but it's a little bit more work than that.
Yeah.
I think that's the hard thing about voice: a demo is quite easy to do.
Yeah.
Production, not so much.
Yeah.
I had a minor heart attack.
I was stress testing our database and discovered that p95 is almost 20 seconds on some searches.
So I need to tell people like a whole fairy tale while it's searching.
I have no idea what I'm going to do.
That is the crazy thing, isn't it?
As fast as the voice AI is, it's hitting your legacy database.
So that's why I keep obsessing.
I have this like little framework in my head of different patterns for the voice agents where literally
I would love to have it simpler to basically combine something that's fast and slow and keeps the conversation going, which is like one pattern in one quadrant, right?
But you ultimately want to have how that fast agent, if you have two agents, fast and slow, like fast and slow, how much autonomy does the fast agent have?
Should it just be doing chat? Should it literally just be checking in on the bigger agent and saying, oh, I'm still looking, oh, can't find it, be funny, you know, like Claude Code with that gibberish at the bottom?
Like could it literally do that?
And then the next level is it can literally do speculative drafting, when you really give it open autonomy, because Flash-Lite is pretty good and you can have it.
And then as soon as it finishes, you can still run your slow model, and that slow model can then issue a correction, where, if you know you can risk it, you can basically have the slow model be like, oh, sorry, I just checked.
We don't have the size n, right?
Which is still okay.
So I'm basically thinking about this, like if there could be a few patterns which are around.
Autonomy and like how deeply you parallelize these like different architectures, but like they will be, they will make it so much easier to build something that's both smart, but also like fluent that it keeps the conversation going.
But I think that's the Thinking, Fast and Slow side of things.
Yeah, but it's really how do you integrate the two?
That's the biggest problem because then you can end up in garbage from both.
And how do you control it?
And I think that's where I'm really trying to see that.
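The fast/slow pattern described here could be sketched roughly like this. Everything is a stub: slow_search stands in for the slow backend/tool pipeline, fast_filler for a small fast model with limited autonomy, and all names and timings are illustrative, not any specific provider's API.

```python
import asyncio

# Rough sketch of the fast/slow pattern: a fast agent keeps the conversation
# going with safe, non-committal chatter while the slow path runs, and the
# slow result (or its correction) lands at the end.
async def slow_search(query: str) -> str:
    await asyncio.sleep(0.3)  # simulates the multi-second p95 search
    return f"3 results for {query!r}"

async def fast_filler(elapsed: float) -> str:
    # Non-committal chatter only: nothing the slow answer can contradict badly.
    return "Still looking, one sec..." if elapsed < 0.2 else "Almost there..."

async def run_turn(query: str) -> list[str]:
    spoken: list[str] = []
    task = asyncio.create_task(slow_search(query))
    elapsed = 0.0
    while not task.done():
        spoken.append(await fast_filler(elapsed))  # keep the conversation going
        await asyncio.sleep(0.1)
        elapsed += 0.1
    spoken.append(task.result())  # slow model's answer, which may correct the filler
    return spoken

lines = asyncio.run(run_turn("dark jacket under 200"))
```

Giving the fast agent more autonomy (speculative drafting) would mean replacing the canned filler with a real fast-model call; the shape of the loop stays the same.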
Remember how we're looking at voice stuff?
Yeah.
There's the two streams that we're talking about.
Yeah.
Yeah.
You kind of want to do like doing stuff like that.
Did you do that? Did the Realtime API let you chat while the tool call runs?
It does, doesn't it?
I think so.
Yeah.
You can do interrupt, but the tool call still runs in the back end.
So it would basically be real time.
Yeah.
Yeah.
It would be more similar to that kind of shadow conversation that the fast one keeps the conversation.
You would literally prompt it to speak more slowly and describe what the tool is that's currently running, which is very hard to do with normal models, because they are basically tuned to send a tool call and only then emit the assistant message.
Exactly.
So the only way to wire it up is to have a fake pre-call tool, which then pushes the model to have to call more tools, and it's more complicated and fails more.
So that's why I kept thinking like two tools are in a way simpler.
Sorry, two models are simpler.
But yeah, it's really messy to integrate that.
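The single-model workaround being described, a fake tool call that forces the model to say something before the real tool runs, could look like this. The tool names and the dispatch loop are illustrative, not any specific provider's API.

```python
# Fake "pre-call" tool pattern: the prompt forces the model to call
# announce(...) before any real tool, so something gets spoken before the
# slow call starts.
def announce(status: str) -> str:
    return status  # in a real agent this text would go straight to TTS

def search_catalog(query: str) -> str:
    return f"found 12 items for {query!r}"  # stands in for the slow backend

TOOLS = {"announce": announce, "search_catalog": search_catalog}

def handle_tool_calls(calls: list[dict]) -> list[str]:
    # More tool calls per turn means more ways for the model to fail,
    # which is why the two-model approach can end up simpler.
    return [TOOLS[c["name"]](**c["args"]) for c in calls]

outputs = handle_tool_calls([
    {"name": "announce", "args": {"status": "Let me check our catalog..."}},
    {"name": "search_catalog", "args": {"query": "jacket"}},
])
```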
Did they export nearly?
Yeah, it's solely exported.
Do you have, is that your laptop?
Yeah, you can drop it to me.
Okay.
Why did you look at using real time?
I literally built a rig with OpenAI, Gemini, ElevenLabs, you guys, and others, to do a bit of a side-by-side.
Really?
Yeah.
Why did you pick us?
I spoke to Damien, and I was really pissed at OpenAI that day because there was some sort of a bug, but I just could not define tools at all, even though I was exactly following docs.
So I realized ultimately that magic...
The problem with OpenAI Realtime is it makes for incredibly fast demos, because they have really good abstractions where you just say, I want a real-time agent.
But you get hit, because for anything meaningful, like passing the payloads, you need to hit the transport layer.
And at that point, even for session updates and everything, you have to go down to the transport layer, because that transport interface is so incomplete.
And providing all the tool definitions and things, it just was constantly telling me this is not valid definition of tool.
I spent literally two hours dissecting what exactly about the tool.
Fast forward two days, update the library, and it worked.
Maybe about, maybe I did something different, I don't know.
But basically at that point I was like, well, we will have this breakage and we'll completely fuck it.
And that morning I spoke to Damien and he was like, well, you will have your own backend.
So we actually know that works.
Worst thing that happens, that audio goes away, but you still have text.
So it felt like way more robust.
So we were just talking earlier today about how we add in Realtime and Gemini Live, and then give people the ability to test those and transition to those when they get reliable, or whatever, for that specific context.
Because we've got another person who's doing tire-change bookings for Germany.
You know, the world of voice AI.
Um, and they were saying that the number plate and VIN number in German with real time API is just not accurate enough for them.
Although it seems to do okay in English for us.
Um, that was like their biggest problem.
And
it's interesting.
It feels like it's just, it's going to get better.
But then, in all these scenarios, it doesn't cut it quite yet.
It's fascinating, because we never felt like it's not accurate enough until we were using the Deepgram model in Layercode.
Before, we were using OpenAI Realtime for testing, and ElevenLabs on their platform.
And that was literally picking up very clearly people speaking in Spanish, like eight meters away from me.
OpenAI real time.
But ElevenLabs has their own transcription.
When I was using ElevenLabs agents, I think it was pretty good in noise. It wasn't too noisy, but it was literally picking up a Spanish colleague.
I was super freaked out about what it's writing.
And then I realized it's picking up his chat.
Is that his helmet?
Can I assist you on your boat ride today?
It was picking up the Spanish, picking up background.
It was picking up background stuff, but it was picking up clear language and writing.
What's on the front page of accuracy?
Right now I was sitting in front of my laptop and I couldn't say, I want a jacket.
It was literally impossible.
I feel like we need to get.
So...
Did you try AssemblyAI?
On the front page of BBC?
They've got a, they were the next one on the list and they've got a lower word error rate.
I think this is the one.
The problem is the live transcriptions didn't come through as live as they do here.
So you came from talking?
Oh, okay.
Can you find a jacket for me?
Yeah, I don't know if it's still working.
I might have killed the backend actually.
I'm not sure.
Oh no.
Thank you.
What did you do?
Hook up to an API.
What API did you hook up to?
Yeah, what was it?
One of my friends was just...
One.
Of my friends was searching API.
Matt, one of my friends was teaching a coding class once, and he always used BBC as the example for doing flexboxes and stuff.
And then he was like, okay guys, let's look at the BBC.
Oh, by the way, there's been a terrorist attack.
Okay.
And it was like literally a terrorist attack in London.
Like, while it was on the front page. And the class was in London.
Just like, buy me a jacket immediately.
Cool.
Did you send it over?
Natalie?
Yes.
Okay.
Do you think this is the best one?
That was my memory, but I'm testing it in the noise.
You'll actually confirm to have the same environment.
Find me a jacket.
I want something dark for less than 200.
Yeah, do it ASAP.
Yeah, it was brilliant.
Yeah.
You should.
Yeah, the language.
I don't know why, but this is actually pretty clear recording for the numbers and the voice.
Just missed a lot.
Yeah, we need five wardens.
I said find new jacket.
Come on.
Perfectly jacket.
Every time I try to do it on the previous one, it says check it.
It only searches most fun thing ever.
Okay, we're gonna put in the current temperature, but it's not set up here.
You were literally speaking and it picked up perfectly.
Deepgram has not got high enough accuracy.
Well, like you say, it might be better in some situations, but these are all my previous attempts to say "jacket".
Oh, this one actually worked.
But, all the previous ones were check it, check it.
This is good.
Tell me any own fact about...
So, wait, you wanna go to Google?
I wanna add something, which is...
This is so valuable.
This is like the most valuable fact.
I'm using the truth for...
Jan, this.
Is...
Guys, this is so funny.
So you're saying it's going to be fixed on Monday?
Yeah.
Yeah.
Wow.
We're going to be working hard.
We know the list now.
Give me some questions.
Or something.
I'll write me in.
Whoa.
There's a couple of groups in there, David.
What would be fixed on Monday if that was, oh, it was like for something.
You could wait a little longer.
Oh, it would be unrelated to that.
It would be unrelated to the docs.
It would literally be: our devs are now stuck and they cannot build our front end.
I think I posted in a group today.
Yeah, we haven't been in sync for.
A long way today.
It's this one.
So what is happening is we basically have a lot of services behind API Gateway on GCP, and they cannot provide the headers, so the request can't get through.
Yeah.
From the car.
So the simplest way would be if we can basically tell you, these are the headers to provide, and then those are the ones you come back with.
Yeah.
Okay.
Just give me the number.
This would 100% be the number one, which is literally why I was looking at the docs.
I was trying to figure out, can we build it from your primitives?
So hard.
So these are some topics you guys think about.
Never was fun.
Gave examples on these topics to contact us.
This one is different.
No, no, this is literally: devs at our company are now stuck and cannot implement our voice agent, because they cannot provide the creds to get into the VPC.
Ah, got it.
Right, because there's an authorization handshake and everything, and there need to be some forwarded headers. This is the same thing, basically.
So yeah, we're coming back to it.
This is where it got to: basically, they flagged that we can't move.
But we can do that tomorrow.
I wonder if we could expose that.
Okay.
So we can fix that.
Okay.
We already have this section.
I mean, it's not a hard bit. We had already thought that through, and there is a metadata field in the database table for sessions.
It's just not settable by the API.
I don't know if that's going to be enough just to have the metadata.
I said it as well.
My concern is you will not get through to our back end.
That's fair.
So what this is, is basically creds to get into our VPC on incoming requests.
What would you imagine?
So we need those to be included already on the way out.
I'm just making my own project.
And I think that's a very great.
For your agent or
your dev environment.
I would expect there are some temporary to be saved.
That's the end state.
I mean, I mean, mean selling this project.
But I think a lot of customers will have the same issue.
They have similar incoming requirements.
Yeah, basically, if we added it, it's more of a demo of: this is the API, yeah, under the webhook.
And you could set any headers how to set that project on that code.
You're assuming they are fixed.
That's the problem.
And there must be a lot of credential rotation to be safe, right?
This is how you do it.
So I don't think there's a way to avoid it.
Like a lot of the, but basically, yeah, if there was a way, yeah.
Easy, like all takes to the ground and run.
The simplest way would be if there are these custom headers that should be included on any communication in the session, from session start.
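What's being asked for could look something like this on the wire: custom headers supplied when the session is authorized, which the platform would then replay on every webhook call into the VPC. The field names below are made up for illustration, not the actual Layercode API.

```python
import json

# Hypothetical session-create payload: the client supplies custom headers
# that the platform would attach to every webhook request into the
# customer's VPC. Field names are illustrative, not a real API.
def build_session_request(agent_id: str, webhook_headers: dict) -> str:
    payload = {
        "agent_id": agent_id,
        "metadata": {"source": "storefront"},  # free-form session metadata
        "webhook": {
            # e.g. short-lived creds for the GCP API Gateway in front of the VPC
            "headers": webhook_headers,
        },
    }
    return json.dumps(payload)

req = build_session_request(
    "agent_123",
    {"Authorization": "Bearer <short-lived-token>", "X-Api-Key": "<gateway-key>"},
)
parsed = json.loads(req)
```

Credential rotation would then live on the customer's side: each new session carries whatever the gateway currently accepts.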
I appreciate this.
It loads so fast.
Look how many comments.
Yeah, I don't think you will be able to get to your agent, basically, without them.
So it's better for all users.
More flexible.
Exactly.
But we'd have metadata on agents and custom headers when you do the authorize and create the session.
Perfect.
Yeah.
Basically if both are open, then the customer can do whatever they want on that end.
We have each do that.
Yeah, because client side, we can set it all ourselves.
It's really the layer code to backend.
Yeah, because I'm working on this.
Yeah, that we can't.
We need to reroute the one.
For demo.
I could.
Sorry, reroute out there.
Yeah, I have that one.
That's the one I built in.
Oh,
I built it.
So I have a company on the
market.
I need to sell it.
I could probably have code carved out.
Yeah, we've been like 10 years.
I felt like the quick start sound like infrastructure within that.
I've been visiting for four weeks.
Yeah, like transform into different frameworks.
I'm from the States, from Colorado, Boulder.
Lovely.
I'm from like the countryside.
Nice.
Yeah.
That's right.
I'm not.
Yeah, I'm back.
So I'm pretty happy about that.
Oh, I think it's actually.
What I'll do.
Oh, do you know what?
I would give anything for Claude Code.
No.
Yeah.
I do not have it.
What do you do?
I have OpenCode.
You're allowed to use that?
Yeah, but it just doesn't have the wow factor.
You can actually use Codex.
I've got a course.
You can't actually use a lot with that side of me will work.
Yeah, that's how I do it.
I mean, I'll see you a lot.
Yeah, I'm gonna just.
I was paying so much.
This is what I was joking to Jeff about earlier.
I'm just taking.
What you said.
I'm putting it into code right now.
So it's a good first draft.
Sure.
Do some of our teams have Devin?
I don't know.
They seem to like it, but you never know, like, how much exposure they've had to, like, the different models.
I don't know.
Whether they just like it because they're going from zero to something.
Or whether they already had this amazing workflow in Claude Code and Git.
I was just something like with black tree or no, where to get a house?
Yeah, with issues.
It's like parallel Claude Codes with GitHub issues.
Really?
It was so good.
That's why I like codeamps.
Kind of really, like, you don't have to keep the JDF on your phone.
It's actually, you can just go and.
Then just be like, yeah, with my wrapper.
It's the only one that works remotely these days.
Claude Code fucked it up after 167-something, so that you cannot remotely log in.
So I used to code from the couch on my phone.
I did from home.
I just make an issue of my code.
Yeah, but you have to do this basically work around, right?
But I like to hit my computer exactly, with the terminal and everything, and I like to run Claude Code and a bunch of stuff, basically, so I can pick it up without any handover between platforms and Claude Code.
I feel like they must have done it on purpose because people just connecting randomly.
But you can't log in.
Is this that thing called VibeTunnel?
Can vibe that?
But, like, I'm using Tailscale.
That sounds like...
It also has the worst sound when you do the VibeTunnel.
Oh my god.
I was expecting a different drop there.
Sounds like a video game to me.
Yeah, it's like Mario, right?
Yeah.
They changed it.
This used to be fully loud, and you couldn't turn it off.
So that's kind of cool.
So what is your company trying to build exactly?
Oh, I wish I knew.
It changes every month.
Right now... we are a Series A company with a bunch of products for fashion brands.
It's like a backend, core e-commerce software.
Yeah.
Yeah.
Backend in the sense of the software layer: you know, inventory, all those things.
That's basically what the core is.
And like global commerce.
And I am in the venture arm where they are basically trying to place a bunch of random bets to figure out what would be the next analog for much higher proxies and terminal.
In the four months I've been there, there have been like three different strategies already.
The latest one, the one I'm involved in: voice agents, virtual try-on, basically memory, to give the user an experience.
Basically, you come in from Instagram, you click on a brand when they have a commercial or a show or whatever, it opens this full-screen page, and you basically chat to it.
This is what I want to find, whatever.
If you like it, you can log in and at that point you can throw in your image.
It basically does the video on you.
With all the pieces, like basic composed outfit.
And if you keep logging in, that basically it remembers all the...
Because the biggest pain when you speak to like girls is basically they have to keep filtering for the same stuff over and over.
So it can remember all the basic preferences.
I think I listened to my nails.
Yeah, that's the idea.
Yeah, but yeah, the voice part of it, like I've been telling to Damien, like it's really cool.
It has the wow factor.
But the reality is it's like 7% of the service we need to nail the interaction with our voice.
Because
I have never seen my wife shop with voice.
It's not very good, because you're self-conscious of what you say, the prices you mention, the sizes you mention. Anything you say, you would have to be completely alone in a room while shopping, and then how many people shop like that?
And she's even like watching YouTube.
And she's shopping during watching YouTube influencer.
So she can't use the voice assistant.
So practically that voice is like very narrow.
That's why I was pushing on text so much in the beginning, because voice can be amazing, but we need to nail not the voice, we need to nail the fusion of voice, text and the UI together, in a way that doesn't feel like a finance app.
Like a toilet paper chatbot.
But it feels like: however I want to interact with it right now, whether it's a click or saying what I want, it's kind of convenient.
That's what we need to know.
And I see.
That would be something.
You could just do it before you press record.
I learned from Jack that you never press record fresh. You're always recording.
ABR.
Always be recording.
Is this like ABC, which is...
Like, this is a.
Yeah, it's from on the water.
Yeah, yeah, yeah.
I'm not sure about ABR.
I made it up.
I think Tim Ferriss or someone was saying the most important part of any interview is the five minutes before you start the call, something like that.
Me and Will just got so pissed off that we chatted about all the fun stuff right at the beginning, and we had the best recording.
Yeah.
It was like the stuff that we were going to chat about, we explored that.
Yeah.
And then we're like, oh, we could do a call at some point.
Yeah.
Well, like, at the end, sometimes, like, you stop recording and you're like, oh, yeah, by the way, like, I can't believe you said that.
I'm like, yeah, let me tell you.
Did you grow up in London?
Yeah.
It's like, that be time.
Oh, been around here on Netflix for a few years.
I'm not going to go off for that.
Thanks.
Nice.
The headers one.
Yeah, I tried to look for the repo you shared with the React Router, to see how my implementation is different.
I have no idea how to find it.
Oh, I can find it.
Okay.
There
on my drink.
Yeah.
Can you have open containers here in London?
Yeah.
Can you earn an alcohol?
Can you always surprise them?
You can get arrested, right?
Yeah, yeah, I'm doing it fine.
And sort of when you can too.
But you can have beer.
Anything goes around.
Yeah, you used to be able to like drink on the tube and stuff as well, but they changed that.
I think a lot of people still do.
Does anyone ever actually get told off for drinking on the tube?
I haven't drunk on the tube, so I can't say.
I see, I would like spill half of it anyway.
Having a drink on.
It's got that bad where you have to have a drink.
Afterward, let me see this.
That is quite a lot of stuff, actually.
We got a load.
Yeah, this one is thank you so much.
Animal guy.
That's awesome.
So when is the Python version coming?
It's awesome.
Yeah.
So, well, we already have it.
We need to fix it.
The thing we did is we fixed that whole conversation-storage thing in the JavaScript ones.
So I think we just need to apply that to the Python one.
So would that be the second most useful thing to do?
It's going to be very relevant to my work.
It's just that I was hacking on these patterns. And for the pattern hacking, it's easier for me to think in the Python SDK than in the AI SDK.
Okay.
What's the number two thing that would be helpful besides that?
Request?
For us to do to make your life easier?
And then it has a WebSocket connection to a DO that records all of the messages.
Everything I can think about comes down to latency.
Latency. And it's not that you can improve it, but I'm making suboptimal decisions all the time because I'm afraid it's going to be too slow.
And so that collects all the data.
It is basically the key concern that I think about, because in terms of the interaction with a user, I think that's going to be the most obvious thing when it's basically shit, even worse than showing the wrong thing.
So there is a whole range here. Our devs are building something that I think is going to have really bad latency because of the patterns they've used.
So it starts from being able to see it well on the dashboard.
Yeah.
So, see where the latency is coming from.
And also how much worse it is than when I have my toy app on the side.
Right?
So basically being able to see the stats, and I mean actual stats, not the average, because the average is really misleading.
It hides the whole LLM computation, and how long search takes, and all those things.
I would really like to understand their implementation of the problem and my implementation of the problem in the handshake and the first connection time, whatever it is that can be done.
Like that would be helpful because it helps me have conversation with them.
Guys, this is very cute.
I know you like to do these things this way, but it just doesn't work.
It's just too slow.
So that's the first thing: to have a data-driven conversation that this approach doesn't work.
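The stat being asked for, percentiles rather than the mean, is easy to illustrate: a couple of slow searches in twenty barely move the average, but they are exactly what the unlucky caller hears. A minimal nearest-rank sketch:

```python
# Percentiles instead of averages: a few slow searches disappear in a mean
# but dominate the spoken experience. Nearest-rank percentile method.
def percentile(samples: list[float], p: float) -> float:
    ordered = sorted(samples)
    # index of the smallest value covering at least p% of the samples
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

# 18 fast searches plus two big outliers, like the stress test described above.
latencies_ms = [300.0] * 18 + [15000.0, 20000.0]
mean = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```

Here the mean is 2,020 ms, which looks survivable, while the p95 of 15,000 ms is what the caller actually sits through on a bad turn.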
That's the first one.
Then I don't really know in the middle
how exactly to manage making things better.
I guess that's where...
Yeah, yeah, agent stuff.
Yeah, it's like
I know you guys can't just fuck around or whatever, but I wish someone would literally spend a ton of time and figure out: these are our top three models.
This is a little bit of PR that we do.
And the reason we do it is because we found that you actually can do the fast one with 500 tokens of reasoning.
It's still super fast on time to first byte, but it's actually smart like the other one.
That's actually some combination.
So we're figuring out which models, and which...
Yeah, and it's not exactly which model, but basically, if someone tests it: these models are okay, you can actually have some reasoning here.
Because if you think about how would I test it today?
How would I test it today?
I would have to hack it up on my backend, run it through a Cloudflare tunnel, and then measure responses, and always wonder: is it Cloudflare that's giving me slightly different latency?
And obviously I have my LLM observability, but it's in a completely different place.
So it's these little experiments which are not too hard to implement, like if you change a model, but they are really hard to measure.
Yeah.
So that's the answer.
It's almost like, if you just had something where you could drop in a prompt for your specific use case.
Right.
And then just like get results for like each of the different models.
Yeah.
Or replay the same conversation somehow.
Yeah.
Replay the same conversation somehow a few times.
Yeah.
And just changing the backend.
That would be great.
Probably too hard.
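The replay idea could be sketched as a tiny harness: the same scripted conversation run against interchangeable backends, comparing per-turn latency. Both backends below are stubs, with sleep calls standing in for real model endpoints; swapping in actual API calls would make this a real A/B harness.

```python
import time

# Replay the same scripted conversation against interchangeable backends and
# compare worst-case turn latency.
def flash_backend(turn: str) -> str:
    time.sleep(0.01)  # fast, shallow model
    return f"fast reply to {turn!r}"

def reasoning_backend(turn: str) -> str:
    time.sleep(0.05)  # slower model with a small reasoning budget
    return f"considered reply to {turn!r}"

def replay(conversation: list[str], backend) -> dict:
    timings = []
    for turn in conversation:
        start = time.perf_counter()
        backend(turn)  # reply text ignored here; only latency is compared
        timings.append(time.perf_counter() - start)
    return {"turns": len(timings), "worst_s": max(timings)}

script = ["find me a jacket", "something dark", "under 200"]
fast = replay(script, flash_backend)
slow = replay(script, reasoning_backend)
```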
But even a simple set of best practices around chunking.
We see that this chunk size, or sharding in a DO, and wanting to make it large or whatever, is basically killing your latency more.
Probably not.
Honestly, I haven't found anything like that because people can deal with a lot of latency.
There are a lot of experiential things that I think would stop people from even starting, where it's, you should never do it.
Sorry.
Example being, we are testing all this when we're at home on 500 megabit Wi-Fi.
Yeah.
We have no idea how much worse the slow experience will be when I'm on the corner between two hotspots.
And it's basically one of, you know, like in London, in so many places, we are just like one bar 5G.
Yeah.
Probably with us, like one bar on each side.
So, something, but like if you had a way...
Testing all these, you know how in Chrome there is that throttling dropdown, like, I'm on 3G?
Yeah, yeah.
I'm on Edge or whatever.
Yeah.
If there was a way to artificially create some of these slowdowns, where a dev at a computer would be able to see what it looks like when you're on that shit coverage, that would be huge to drive that empathy.
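The same idea could be approximated outside the browser: wrap whatever sends frames with added latency and jitter, so a dev on fast Wi-Fi can feel the one-bar experience. The transport below is a stub; in practice you'd wrap the real websocket send/receive the same way.

```python
import asyncio
import random

# Dev-side "bad network" toggle: wrap a send function with base latency
# plus random jitter. real_send stands in for the actual transport call.
def degraded(send, base_delay: float, jitter: float, rng: random.Random):
    async def wrapped(frame: bytes) -> bytes:
        # Simulate shaky coverage before the frame actually goes out.
        await asyncio.sleep(base_delay + rng.random() * jitter)
        return await send(frame)
    return wrapped

async def real_send(frame: bytes) -> bytes:
    return frame  # stand-in for the real network call

async def main() -> list[bytes]:
    rng = random.Random(0)  # deterministic jitter for the demo
    send = degraded(real_send, base_delay=0.05, jitter=0.05, rng=rng)
    return [await send(b"hello"), await send(b"jacket")]

frames = asyncio.run(main())
```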
But the problem is devs probably don't want that.
It's the people who care about the experience who care about that.
Yeah.
So I'm not sure you will say anything.
But that will be huge.
You will.
At some point you must.
So yeah, definitely this is like, well.
Devs do it for free, right?
I'm like, I mean, you're a dev.
Like, or you've been maybe, I think.
Our devs are more like: we have 150 things to build.
We have time to build 50.
Yeah.
And we need to make it.
We're like driving like crazy so we don't have time to fuck around.
Tell us what you need to build and we will build it for you.
Right?
It's not that they don't care, it's just getting them to do these things.
I think they just don't have a bandwidth.
They are like, they're already running at 100 miles an hour.
Everyone thinks agents are meant to do all the things we're short on, which is complete nonsense, but basically they are completely killed on scope.
Because everyone's like, oh, you should be able to do this really fast because of all the agents.
So that's where having a way to quickly show to someone rather than running 100 experiments, it can be actually a powerful way to do a conversation.
The other thing: because they are putting everything in this VPC on Kubernetes, I have no idea how that experience is going to be.
Obviously, if I have the data, I can show that. If we had a simulator and some good measurements, in the console even, or something simple that you can do.
Oh yeah, you actually have a lot of power.
Anyway, I was basically thinking, how can we go fast, right? From architecture decision to quickly testing it, to basically checking whether it works. Because this is unlike any other experience.
Hi everyone.
Hi everyone.
How are you all?
I'm sorry.
I'm late.
Unlike any other experience, this one is so grounded in latency.
That's why I'm obsessing about it.
And the last bit of it is a big wish.
Jack from higher.
Nice to meet you.
Wonderful friends.
Oh boy, we're now just enjoying because they successfully integrated.
Yeah, maybe we can try and mock something up.
Like just knock up something quick.
Because I guess it's just about recording.
Yeah.
And ultimately, like, you get a bunch of requests, but then you, you, yeah.
But you want to know really, like.
Yeah, I, I kind of don't know exactly what I want to hear.
Yeah.
Because there are several different tasks.
One task is how can I, from the beginning, start from a good set of principles that this works 80% of the time?
Yeah.
So I don't waste time even testing things I shouldn't be.
I can come prepared to our devs and be like, hey guys, we cannot do this.
We can't do this, you know, like that stuff.
Even some stats around it, if you hack it up and measure it in your network inspector.
Well, have a look at those things.
Okay.
And if it takes more than 1.5 seconds when you try it, your customer will notice.
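The "hack it up and measure it" idea is quick to sketch. This is a minimal illustration, not a real benchmark: `fake_tts_stream` is a stand-in for a streaming provider response, and the point is simply timing how long the first audio chunk takes to arrive, roughly the TTFB you would read off a browser's network inspector.

```python
import time

def time_to_first_chunk(chunk_iter):
    """Seconds from request start until the first audio chunk arrives."""
    start = time.monotonic()
    first = next(chunk_iter, None)
    return time.monotonic() - start, first

def fake_tts_stream(delay_s):
    # Stand-in for a streaming TTS/agent response; a real measurement
    # would iterate chunks from the actual provider here.
    time.sleep(delay_s)
    yield b"\x00\x01"  # first audio bytes

latency, first = time_to_first_chunk(fake_tts_stream(0.2))
verdict = "customer will notice" if latency > 1.5 else "ok"
print(f"time to first audio: {latency:.2f}s ({verdict})")
```

The 1.5-second threshold here is just the number mentioned in the conversation, used as an illustrative cutoff.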
You know, we don't even know.
It doesn't make sense.
Yeah, I feel like you're trying to translate the numbers that I can show them in a dashboard.
Yeah, we do real experiences, right?
Like this is like 0.3 seconds.
What does that mean?
What is, is 1.5 seconds a long time or is it not?
Like it's like, it goes back to that empathy of like being able to like see the numbers and link them to experience or having a way to show them the experience.
Um, but at the very end of it, and this will be a real killer for me, is there's going to be this tension between small fast models and then everyone who has their actual chatbot that they're using everywhere else, but it's too slow, right?
And like, how do you bridge that?
And if you guys can come up with some patterns, which could be just: keep the conversation going in the background, and make it really easy to set up, right?
Just simple conversational non-committal, be very safe.
There's this whole "say something on every tool call" thing, you know; some of those patterns don't even need to be a new approach.
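The "keep the conversation going" pattern described here is tiny to implement. This is just an illustrative sketch (the filler lines, the `search_products` tool, and the timeouts are all made up): run the slow tool call on a thread and speak a safe, non-committal line whenever it runs long, so the user never hears dead air.

```python
import random
import threading
import time

FILLERS = ["One sec, let me check that.", "Okay, pulling that up.", "Just a moment."]

def speak(text):
    # Stand-in for the TTS output channel.
    print(f"agent: {text}")

def call_with_filler(tool, *args):
    """Run a slow tool on a thread; speak a filler if it takes too long."""
    result = {}
    t = threading.Thread(target=lambda: result.setdefault("value", tool(*args)))
    t.start()
    t.join(timeout=0.3)                   # grace period before saying anything
    while t.is_alive():
        speak(random.choice(FILLERS))     # non-committal, always safe to say
        t.join(timeout=2.0)
    return result["value"]

def search_products(query):
    # Hypothetical slow tool call.
    time.sleep(0.6)
    return [f"{query} #1", f"{query} #2"]

items = call_with_filler(search_products, "jacket")
print(items)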
But they could go really far in knowing how to basically push the system, because I'm half expecting I will have to do that anyway. We are already on Gemini 2.5 and I am not seeing great performance on the search quality, and I'm worried we will have to drive the whole checkout, and the checkout happens at the end of the session, where potentially the context is going to be pretty long if someone goes back and forth.
So I'm worried that basically I will not have enough intelligence to successfully sell through
cart manipulation and checkout by the end of it.
And I keep thinking about, well, what can I do here?
Can I more aggressively keep shrinking the context?
Yeah.
Or can I potentially have a smarter model like Sonnet 4.5, but tap into it only sometimes?
Maybe I can have the simple model driving virtually everything, but some tools I actually want to make sure are only handled by the right model.
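That routing idea (cheap fast model by default, a stronger model only when a high-stakes tool is in play) is easy to sketch. The model identifiers and tool names below are assumptions for illustration, not a real config:

```python
# Assumed model identifiers, for illustration only.
FAST_MODEL = "gemini-2.5-flash"
SMART_MODEL = "claude-sonnet-4-5"

# Tools where a mistake is expensive: force the smarter model.
HIGH_STAKES_TOOLS = {"update_cart", "checkout"}

def pick_model(pending_tool=None):
    """Route the turn: fast model for chat/search, smart model for risky tools."""
    if pending_tool in HIGH_STAKES_TOOLS:
        return SMART_MODEL
    return FAST_MODEL

print(pick_model())              # ordinary conversational turns stay fast
print(pick_model("checkout"))    # escalate when money changes hands
```

The same gate can also trigger the context shrinking mentioned above, compacting history right before the expensive model is invoked.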
Yeah.
All those things, when you know what you want, you can prepare for.
It's not coming down.
But it's knowing which ones to actually just pick.
So it's like kind of knowing where the potential to improve it is.
Or having a few patterns that you can start with that are really simple to bootstrap.
They give you 80% of the output.
But even for some of these patterns, I have done deep research every other day.
And there is just not much material on how to architect real voice agents and bridge this, like, intelligence-versus-latency gap.
Yeah, we did.
We're here.
Like, on the site, I'm gonna try and write up some of the stuff we had, but, like, I don't think it answers any of yours.
We had, like, an interesting one yesterday: someone was having issues with, like, bad-quality audio, and then he just made it so when the audio quality is bad, the agent says, "Can you speak up, because I can't hear you?"
And it just, like, solved, like, 90% of the problems.
I'm with you.
I'm doing.
I'm eating your job.
Like, usually people know how
to improve the quality; usually it's, like, using their headphones.
Yeah.
Switch to like their phone or something.
Those are great tips.
And then so like we want to write up those.
I don't know if we have like other one best prep.
I mean like that was one thing I found.
I don't think it's fixed yet.
But I kept getting really good voice performance from the ElevenLabs playground.
And the first message in Layercode was always really shit.
Very like mechanical.
Yeah.
Almost like it's sounding from a trash can.
Yeah.
Turns out, and then basically, it turns out, I think, they are pushing some warm-up text in their playground.
It's still going to get below five.
Which then means it basically continues.
And it doesn't have that weird ramp-up in the beginning.
Yeah.
When the parachute is basically already.
Yeah.
And then completely took it out.
I was basically generating all these audio files, and that's how I found that there's this previous-text parameter.
Yeah.
If you send that.
You remove most of these artifacts.
So you send a warm-up text that you don't play.
Yeah, exactly.
The ElevenLabs API already has that.
And they even have it so you can link your past requests.
So you don't even have to send the text again.
Requests can literally link to each other.
You guys just don't have that exposed in your backend.
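For reference, this is roughly what that fix looks like against the ElevenLabs text-to-speech endpoint: a `previous_text` field (or `previous_request_ids` for linking earlier requests) conditions the voice so the first audible message doesn't start cold. Only the request payload is built here; the voice ID, model ID, and warm-up text are placeholders.

```python
import json

VOICE_ID = "your-voice-id"  # placeholder

# Warm-up text that is never played; it just conditions the prosody
# so the first real message doesn't sound cold and mechanical.
WARMUP = "Thanks for calling, it's great to have you here."

payload = {
    "text": "Welcome to our shop! What are you looking for today?",
    "model_id": "eleven_turbo_v2_5",  # assumed model id
    "previous_text": WARMUP,
    # Alternatively, link to an earlier request instead of resending text:
    # "previous_request_ids": ["<id-of-warmup-request>"],
}

url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream"
print(url)
print(json.dumps(payload, indent=2))
```

Sending this payload (with an `xi-api-key` header) is left out; the point is just where the warm-up text goes.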
I think I mentioned it.
It's in the chat.
But that one is an easy win.
I can do that.
Like, almost every time I start an app, it starts with a very mechanical-sounding
"welcome to our shop" line.
Oh, it's like, by the way, if you are viewing me or my friends extend you need to get receipts.
He says something like, thanks, thanks.
Oh, yeah.
Oh, it was cold.
Don't worry.
I can think I can share it.
Pretty sure that's against the terms.
Just turn that out again.
They try to get me like video.
I don't know why.
Yeah.
Oh, my God.
Do you have a group?
That's amazing.
Yeah.
So we need.
It's capital big capital events.
There's like, yeah, we've got until 9, I think, but, yeah, Phil, you don't, obviously don't.
I literally live, like, 10 minutes from here.
I do 10 minutes.
Oh, nice.
It was very convenient.
Yeah.
Okay, nice.
It's a nice area.
Have you been here before?
Two minutes.
Yeah.
Because I, I don't want to say.
Only the actual, only the nightclub, yeah.
So actually not.
So we have a golden doodle.
But we don't, we don't, golden doodle.
It's one of those like, really mix.
Oh, yeah, yeah, yeah, yeah.
We've been here for woofmas, literally in this very space.
Woofmas.
It was literally sent out that you could bring the dogs.
I left, but like, my wife was very much into it.
So it was like a Christmas for people who don't have kids and have only dogs.
That's how we ended.
Oh my God, that was so funny.
I don't want to talk about my ministry experience.
Okay, that's my photo is going to be something crazy.
Like, it's just like ministry.
Sadly not.
Yeah, yeah.
I got it.
People actually, anything doodle is usually a good dog.
I like, my parents have a cockapoo.
She's quite chill.
Like, oh, that's right.
She's like very, she's just very chill.
She's very happy to see you, but she's not like annoying because we've got, they look after some other dogs sometimes and like, they're really like hyper, but she's, she's pretty chill.
I mean, she's kind of an
old lady, so that's probably why she's, like, a bit chill.
Then just generated
exactly, like first again, you know, we do this as tooling within the screen.
The real time stuff is pretty exciting.
I don't know.
It's like it feels, it allows a.
Lot of cool stuff.
But the thing is, you can do all that stuff without the real time changes.
I feel like the tone of voice though, right?
That is, yeah, harder to do.
I've been looking a lot into voice cloning and then the remixing of it.
And it doesn't work as well as we thought.
Basically, unless you're American, it's really hard to clone so fast with three minutes or less.
Oh yeah.
So we're figuring out where the threshold is, if our brands have, like, Northern accents,
but yeah, American, like one minute.
I cloned mine and it was, like, perfect, but I put in over 100 minutes, I think, and it was with, like, a real microphone and stuff.
That's the other thing.
Most people can't record without a little bit of background noise, and if you do noise removal, it distorts it as well.
Yeah, yeah.
But they literally say in ElevenLabs, they say three minutes, you know.
Mine didn't sound like me; the three-minute one sounded nothing like me at all.
100 minutes was like, I showed it to my mom and she was like, this sounds like you.
I know.
It sounded exactly like me.
Must be easier when you do podcasts.
I don't think I ever recorded it.
It's actually a lot, because if you think about it, even if the episode is, like, an hour, it was probably only, like, eight minutes or less of me speaking.
So it takes a lot, because, I mean, I don't speak that much in it.
I mostly asked questions.
So especially, but it was like a lot.
So it was quite easy to.
It was actually quite easy. I feel like they should actually just have a tool, because, like, I can't remember what exactly, but it's one of the filters in FFmpeg: you can actually just strip out all the silence really easily.
So if you just get the track that's just your voice, you can just like strip out all the silence on
the VLP.
So I feel like it would be easier if you could just dump tons and tons of audio.
Yeah, but then you get...
And they would cut them.
Then you get into this weird state that we were basically, it was picking up too well the pacing and everything.
So the American guy, it was perfect how his voice sounded, but he read really boring FT article.
And literally the voice of the shop was this like really bored American.
Like it literally picks up the pacing and everything.
So I think when you chop it up and lose the gaps that are natural, it's going to pick that up as well.
So actually that I would be worried about.
Oh, so like cut, so I should have just left in the gaps.
I would say you can cut out the gaps, but you need to be clear about not collapsing it too much.
It's hard because unless you have like a monologue, like if it's in a conversation, I cut out the gaps, I also have to cut out the other person, and then it's like, so then you don't get the gap anyway, so.
It's like, well, it's fine as long as it delivers your full range and there's nothing blending together, so you still need to leave some islands, basically.
That's what I mean, like completely cutting it off.
Oh yeah, yeah, yeah.
That's just, you know, basically.
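The FFmpeg silence-stripping mentioned above, with the caveat about not collapsing the gaps completely, might look like this. The `silenceremove` filter is real, but the thresholds below are guesses you would tune by ear:

```python
import shlex

# Strip long silences from a voice-only track before feeding it to cloning.
# stop_duration keeps up to ~0.4s of each pause so the pacing stays natural
# (don't collapse the gaps completely); -40dB is the silence threshold.
cmd = (
    "ffmpeg -i voice_only.wav "
    "-af silenceremove=stop_periods=-1:stop_duration=0.4:stop_threshold=-40dB "
    "trimmed.wav"
)
parts = shlex.split(cmd)
print(parts)
```

Leaving `stop_duration` at a few hundred milliseconds is what preserves the "islands" of silence the conversation warns about losing.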
What we're now doing is basically we have, like, a vocabulary that we found somewhere, like phonetic stuff with tongue twisters and things, to have full coverage of the language and the most important sounds.
Then we need to develop a brand-specific one for their key products, because those can be badly mispronounced.
Yeah.
And then we're figuring out what the other fillers will be. And the problem with news, which is, like, perfect infinite content, is that people stop doing the inflections when they speak, and then that translates into their voice.
Like most people will struggle to record even two minutes in.
We still haven't figured out the cloning and the remixing doesn't work very well.
Who do they want to do the voice?
You should.
Who should the voice be?
It's kind of an assumption.
Our PMs don't talk to customers enough, but basically they think it's going to be easier if we basically say to brands: you need to give us these recordings per the instructions, and there will be a brand voice.
Because that way they define the pace, the tone, the emphasis.
There's so much that I don't even know how to describe.
Do you think they do want to control?
I don't think they want it.
But it's easier for them to define it and iterate on their own than us building a playground where they have all these, like, presets and different voices to pick from.
So ultimately there is always a founder or someone who has a cool voice that they're going to identify as.
Ultimately it's going to be: which accent, which gender, and basically how deep or high. And then, how do you want to convey the energy?
I think those are the main directions.
But it's really hard to get right just with like remixing or whatever.
It feels easier just by his voice.
So it's a guess.
We have so many of these that.
It's still kind of...
We shot from...
Silik is quite Natsa.
But like in We Need Docs.
We should say the rendering thing is to download the one stop
and then improve the code.
Do you think you'll be using real-time voice in six months?
Did you prefer to do that instead?
Because me personally, when you do this.
You get like, or for this product.
Yeah, your work how it's going to be.
So to be clear, I use speech-to-text quite heavily when I'm at home for all coding.
But like these voice agents,
I do think it's gonna work.
I don't think it's gonna be, I don't think it's gonna be the best thing ever, because it's effectively competing with the ChatGPT UI, I think.
And it doesn't have the same.
Yeah, this is all recording, right?
It's okay.
Yeah, it's a good point.
It doesn't have the biggest benefit.
We will try to do things technically aligned to the fashion industry and different things.
But what I think we will always struggle with is that we don't have the spend on multiple brands.
That's where we started.
But we realized you can't cold start it.
So we will start from the brand specifically.
And that's the biggest problem.
I don't think it's going to be whether the voice is fluent
enough; it's really going to be: when I'm in the store, I want to buy this outfit, but they do jackets and I need to go to another store, and they do pants.
And then the ideal app like ChatGPT is basically, I want an outfit for this event.
I think that's the app.
So I think it's going to work.
I think all brands will want their own experience, to compete with ChatGPT, which basically makes them nameless and drives down the price.
But I don't think the huge volume will all move to brand specific.
Haha, Angel, it will be my best.
Yeah.
Because, you know, I think it would just be,
I don't think it's gonna be ChatGPT.
I think it's gonna be vertical.
Oh, sorry.
So yeah, like, for example, I want to buy a new, you know, like, a men's fashion app you could chat to?
Yeah.
And then it's like, got a cool brand to that.
Yeah.
And those apps already exist, and while they're growing, they're nowhere near ChatGPT, because your only problem getting in is that, like, ChatGPT does everything.
People only have one subscription.
So even if they do a really good job, it's going to be impossible to pull you out of it.
Yeah.
Because you just have them.
It's the best thing for us.
You're okay waiting.
So cautious.
Like they can say whatever, like they even build.
I don't know if you know Pulse.
Pulse?
ChatGPT Pulse.
Yeah.
It's like a feed or that you get every day.
I actually said it.
I'm looking for a jacket.
These are the things I'm looking for.
Or it will even discuss the article you were buying the next day in your daily update.
It will find a bunch of new products that you missed and literally surface them in this, like, daily update, this Pulse.
It has all these different channels into your life and it's like constant contact.
And on top of it, it's, like, VC-subsidized models and patterns which are already established.
So I think it's going to be really hard for anyone to really kill that.
Yeah.
Um, otherwise they can do their own work.
And like anything you can build, they can build too.
It's so, I don't think it's going to be the biggest one.
So there's, yeah, so many vertical voices.
That are going to totally, that's no, I think.
So when we talked, she's telling me about critical.
Yeah.
I still do think there is a lot of stuff that can be done with voice.
All of that.
So painful.
I would love to control more of my life with voice.
Right now, you like to run your voice.
Control more of my life with your voice.
The Alexa dream kind of thing.
Yes.
Like if I...
And the heck out of...
If I'm gonna be hacking, I'm basically gonna be playing with something
so that I can Claude Code multiple sessions when I'm walking or when I'm out, because I don't like to be chained to my phone.
Yeah.
It's not much better.
Yeah.
Or I always have to be chained to my desk.
Yeah.
You like to be outside walking.
Yeah.
Like when I'm looking to a photo.
Yeah.
I agree.
I feel like I got more good stuff done.
But for that, I think you need to solve also the digital twin problem.
Yeah.
Because you need to do this, like, compression of what's on the screen.
There's only so much you can visually comprehend when you do Claude Code or whatever.
And if it knows what you care about, in the diffs or in the next steps or whatever, it can let you know.
You don't want to compress it blindly.
So you need something that knows what you're looking for in it.
And that should be passed back as voice.
And I give it some guidance as voice, but that voice gets translated.
Yeah.
So that's why I think the unlock there is something that's like really to do.
It's almost like, oh, it tells you when it got blocked.
Yeah.
Yeah.
You've been doing this time.
Yeah.
Can you pass me the mic?
Yes.
Yeah.
Sorry, this is so far.
Okay.