00:00:19.840
First, if you have any questions, feel free to ask. No need to wait.
00:00:26.880
I gave the first version of this talk in 2013
00:00:32.480
and it was quite a change in my Rails career, because after that I got a lot of
00:00:40.800
contracts for training and consulting, for optimization of Rails and later Phoenix
00:00:49.280
applications. My main focus is always to increase
00:00:55.199
performance, many times through caching.
00:01:00.800
Today I use two different sets of slides. The old ones from 2013 are with
00:01:07.840
a black background and the new ones are with a white background.
00:01:13.280
The old slides use code examples. The new ones use agent prompts. I think
00:01:20.400
it's easier for everybody to just copy and paste them and to redo that work in
00:01:26.799
your own application. In German, we have one idiom which says
00:01:32.880
"Kleinvieh macht auch Mist," and I thought this is a good way of teaching a little bit of German.
00:01:39.920
So "Kleinvieh macht auch Mist" means that even small farm
00:01:45.920
animals, which is "Kleinvieh,"
00:01:51.759
make a lot of... and I didn't think that through. I don't know if I'm allowed to say what "Mist" is in
00:01:59.520
English, if that's PC or not. I don't know. Look it up. It's a real idiom. I
00:02:05.439
use it every week. So, stuff I didn't think about in 2013:
00:02:16.080
mainly, I didn't think about CPU cache, and that's, from today's perspective, in
00:02:22.239
my opinion the biggest, I don't want to say hidden secret, but too few people
00:02:30.160
think about this. I asked ChatGPT to draw this pyramid, and you see ChatGPT has
00:02:39.120
some problems here and there, but still, I wouldn't be
00:02:44.480
able to draw this myself, so I'll take it.
00:02:50.560
A CPU has internal cache, normally three levels: L1, L2, L3. A couple of them have a
00:02:58.480
fourth level, L4. So before a CPU
00:03:03.519
takes anything from RAM, it tries to get the data from the internal cache. The L1
00:03:09.840
cache is built directly into the CPU. The L2 cache most of the time, too. And the
00:03:17.280
bigger the number, the bigger the cache. So L1 cache is very small and L3 cache
00:03:23.599
is kind of big. And you see, the farther away, the more CPU cycles it takes to
00:03:30.879
read that cache. So here, at L1 cache, that's just up to five cycles. And if
00:03:36.799
you go down all the way to RAM, that's 100 to 500 CPU cycles.
00:03:44.080
That is what is most important for me to understand: how long does it take to get that data from that place?
00:03:52.400
And you see, hard drive or SSD, that takes forever in CPU cycle terms.
00:03:58.640
Why is this important? I
00:04:03.840
have these two minimal one-liners. One is 1 + 1, and one is getting the
00:04:12.159
initials of the string "Stefan Wintermeyer". This is the assembler code for this. So
00:04:19.759
you see it takes five CPU cycles to add 1 + 1, or any addition like this.
00:04:28.240
So it would be stupid to have the result of
00:04:34.720
this addition somewhere in the cache, because we can do this faster in the CPU. No need to waste real estate in the
00:04:42.639
cache. This is the assembler code for the SW
00:04:48.639
initials example. That takes 12 CPU cycles. So that's not fast enough for
00:04:55.840
L1, but we are already in L2 land. So depending on your application and your
00:05:03.040
situation, I wouldn't even bother about caching that either because it's so easy
00:05:09.120
and so cheap to calculate in the CPU. Why should we cache it?
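The reasoning above can be sketched as a simple comparison. This is only an illustration that uses the cycle counts quoted in the talk; the helper `worth_caching?` is a hypothetical name, and real numbers depend on the CPU.

```ruby
# Cycle counts quoted in the talk; real values vary by CPU.
L1_READ_CYCLES  = 5     # "up to five cycles"
RAM_READ_CYCLES = 100   # lower bound of "100 to 500 cycles"

ADD_CYCLES      = 5     # the 1 + 1 example
INITIALS_CYCLES = 12    # the initials example

# Hypothetical rule of thumb: caching a result only pays off when
# recomputing it costs more cycles than reading it back.
def worth_caching?(compute_cycles, read_cycles)
  compute_cycles > read_cycles
end

puts worth_caching?(ADD_CYCLES, L1_READ_CYCLES)
puts worth_caching?(INITIALS_CYCLES, RAM_READ_CYCLES)
```

With these figures neither one-liner is worth caching, which matches the point made in the talk.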
00:05:15.360
So for me, one of the most important things in the daily consulting business,
00:05:21.680
there's no one size fits all in the caching aspect because
00:05:28.639
it all depends on how fast is your CPU, how fast is the cache, how fast is the
00:05:34.240
RAM, how fast is the SSD. There's a whole variety. So the very same code can
00:05:41.360
do well on one machine and not so well on the other. So you really want to
00:05:47.120
either understand your hardware or just try it. Write yourself a Rake task
00:05:55.680
with some benchmarking, test out your idea, and then see what
00:06:02.400
happens. Again, one size doesn't fit all in this aspect.
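A minimal sketch of such a measurement, using only Ruby's standard `Benchmark` module. The helper name `total_ms` and the two competing code paths are assumptions for illustration; in a real project this would live inside a Rake task and hit the actual application.

```ruby
require "benchmark"

# Run the block `runs` times and return the total wall-clock time in ms.
def total_ms(runs)
  seconds = Benchmark.realtime { runs.times { yield } }
  (seconds * 1000).round(2)
end

# Two hypothetical variants of the same work: recompute vs. memoize.
text = "Ruby on Rails"
memo = {}

recomputed = total_ms(10_000) { text.split.map { |word| word[0] }.join }
memoized   = total_ms(10_000) { memo[text] ||= text.split.map { |word| word[0] }.join }

puts "recomputed: #{recomputed} ms, memoized: #{memoized} ms"
```

Which variant wins, and by how much, depends on the machine, which is exactly why it has to be measured.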
00:06:09.039
So, back to the 2013 slides. My setup in 2013 was a Raspberry Pi, first
00:06:17.120
generation. I used that because I wanted to show how
00:06:23.199
little CPU power you need, or how little hardware power you need, to run a nice
00:06:28.639
web shop. This was the old model. We have
00:06:35.680
a product, we have ratings, we have users, we have a shopping cart,
00:06:41.600
we have categories, and we had discount groups.
00:06:47.280
This was the old web shop. I think that was Bootstrap,
00:06:53.520
so pretty much a normal web shop.
00:06:58.560
And this was the slide I used to show how much improvement we got from the
00:07:04.639
speed. So we started with this vanilla version, that was our baseline, and here was
00:07:10.080
the latest improvement from the caching. I set up the same thing
00:07:15.440
for today's example. So, in 2025, we use a slightly better
00:07:24.240
Raspberry Pi, the Model 3, because I'm cheap. Actually, I have this one
00:07:31.120
here. Does anybody have a kid who wants to play with this?
00:07:43.759
It has a German adapter, so you have to get a US power plug. But say hi. Have
00:07:51.360
fun. Welcome. Anybody could have had it.
00:08:00.800
So, these are the new models. It's pretty much the same. The only
00:08:05.919
difference is that I don't use discount groups anymore. For the same
00:08:13.440
idea of changing data, I use expiration dates now. So all the products have an expiration date, and I have a little bit
00:08:20.240
more detail about the product, but that's it. So let's start a new Rails app and fire
00:08:27.919
up Claude Code and tell it what we want. We want an e-commerce platform, and
00:08:35.680
here are the models we want. And I tell you, in the last couple of months, bootstrapping a new application became
00:08:42.320
much much faster.
00:08:57.440
So once we have the new web shop covered, I asked Claude to install
00:09:06.399
the Faker gem and to set up seed data for 50,000 products and some 200 categories,
00:09:14.080
ratings for the products, and expiration dates for the products.
00:09:21.120
So this is the result. We have a typical web shop. We have a shopping
00:09:27.600
cart here at the top. This is a show view. We can delete and
00:09:36.240
we can add to the cart. Pretty normal, basic stuff.
00:09:41.680
Well, that is our base system. To measure any kind of improvement, we
00:09:48.080
need some sort of baseline, and we have to measure it. So again, I fire up Claude
00:09:53.440
and ask it to create a task where it uses the setup I think is optimal
00:10:01.360
for this case, and that's different for any customer, for any web page.
00:10:06.959
So I'm assuming that most users of this web page are just window shopping and
00:10:12.000
I'm guessing that's about 80%. So they are just walking through the web page,
00:10:17.440
don't buy anything, and about 20% are using the shopping cart. So that's my
00:10:22.959
ratio, and all the numbers will change if you change that ratio. That's
00:10:28.640
very important to understand. So if you do that, you want to have a look in your
00:10:33.920
Google Analytics or any other sort of analytics system and get a feeling
00:10:39.760
for the share of real users who use a shopping cart versus just window shoppers.
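The 80/20 split can be turned into a reproducible request plan for a benchmark task. This is a sketch under stated assumptions: the scenario names, the helper `scenario_plan`, and the fixed seed are all hypothetical, and the weights should come from your own analytics.

```ruby
# Traffic mix assumed in the talk: 80% window shoppers, 20% cart users.
SCENARIO_WEIGHTS = { browse: 80, cart: 20 }.freeze

# Build a shuffled list of scenario names that follows the weights.
# A fixed seed keeps the order identical between benchmark runs.
def scenario_plan(total_requests, weights = SCENARIO_WEIGHTS, seed: 42)
  sum = weights.values.sum
  plan = weights.flat_map do |name, weight|
    Array.new(total_requests * weight / sum, name)
  end
  plan.shuffle(random: Random.new(seed))
end

plan = scenario_plan(1_000)
puts "browse: #{plan.count(:browse)}, cart: #{plan.count(:cart)}"
```

Changing the weights changes every number measured afterwards, so the ratio belongs in the benchmark's documentation.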
00:10:45.519
So as a result of that benchmark I get these numbers. I used to dive into
00:10:52.880
views and DB and optimize it, and I still optimize it of course, but as I said, it
00:10:59.519
became a little bit more complex today, because sometimes I save,
00:11:06.399
let's say, a thousand milliseconds here, but for whatever strange reason this one
00:11:11.519
gains 2,000 milliseconds. That happens all the time. So today I'm
00:11:18.320
just taking the total time, and that is my benchmark. That is my baseline. So we
00:11:24.320
start with about 30,000 milliseconds for this specific benchmark.
00:11:30.560
The first thing I'd like to introduce to the code is select.
00:11:36.959
The beauty of Active Record is you don't have to think a lot about this stuff. You just load everything and everything
00:11:43.440
is there by magic. The problem with that is, for example,
00:11:49.360
for the category: the category has a description. We don't need that description. But that's a text. So
00:11:55.920
if we load that description, we bloat our whole cache. We have a lot of
00:12:01.040
information in the system which we never use. So the first thing here is that I
00:12:07.760
tell Claude to please read the documentation for select and then to use
00:12:14.000
it to optimize our models, or the queries. A good example here
00:12:22.320
is created_at. I don't use created_at at all in this application.
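To get a feeling for how much an unused text column costs, here is a plain-Ruby sketch with a hypothetical category row. It is not Active Record code; the query-level counterpart would be something like `Category.select(:id, :name)`. The row contents and the use of `Marshal` as a size estimate are assumptions for illustration.

```ruby
# Hypothetical category row as it might come back from the database.
full_row = {
  id: 1,
  name: "Garden",
  description: "Long marketing text. " * 200, # text column the view never renders
  created_at: "2025-01-01 00:00:00",
  updated_at: "2025-01-02 00:00:00"
}

# Keep only the columns the view needs.
slim_row = full_row.slice(:id, :name)

# Marshal is used only as a rough measure of how many bytes each row occupies.
full_bytes = Marshal.dump(full_row).bytesize
slim_bytes = Marshal.dump(slim_row).bytesize

puts "full: #{full_bytes} bytes, slim: #{slim_bytes} bytes"
```

Multiplied by every row on a page, the difference decides how much of the working set fits into the faster cache levels.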
00:12:29.200
So this is the result: quite a bit of optimization here. Let's dive into one.
00:12:35.680
This used to be includes category, which just included everything from the
00:12:42.480
category, including the description, which we don't need. And then it was replaced with
00:12:49.279
select ID and name, because that's what we really need, and that saves a lot of
00:12:59.839
bytes. Same idea with data types. For example, the rating by default is an integer.
00:13:07.279
The rating is 0 to five stars. I don't need an integer for that. A smallint or
00:13:12.399
a tinyint is good enough for that. And that saves space. So let's ask Claude to go through all
00:13:18.959
the models and optimize the data types and use, for example, smallint if it makes
00:13:26.320
sense. So this is the result, and you can see
00:13:32.000
the rating now is a tinyint and we save 75% of storage. The state, same
00:13:40.399
thing here: 90% of storage saved. Line items quantity:
00:13:45.839
smallint, 50%. This doesn't sound like a lot, because we
00:13:52.560
are talking about maybe kilobytes or maybe megabytes, I don't know. It's not like gigabytes. But the cache is not a lot
00:13:59.920
either. So if we find a way to squeeze all this information, in the
00:14:05.600
best case scenario, into our L1 or L2 cache, that means a massive improvement in
00:14:12.880
speed. So you really want to, or no, who
00:14:18.480
am I to tell you what you want, but if you really want to go down the road
00:14:24.399
of maximum performance, you want to have a look at the select thing and you want to have a look at your data types,
00:14:33.040
because that squeezes out quite a bit. In this case that's minus 3%. So
00:14:40.560
the application just became 3% faster. Again, Kleinvieh macht auch Mist.
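The 75% and 50% figures follow from the column widths. A small sketch, assuming the common sizes of 4 bytes for an integer, 2 for a smallint and 1 for a tinyint (actual storage depends on the database; PostgreSQL, for example, has no tinyint). `Array#pack` is used only as a stand-in to show the byte widths.

```ruby
# Pack directives as stand-ins for column widths:
# "l" = 32-bit integer, "s" = 16-bit integer, "c" = 8-bit integer.
INT_BYTES      = [5].pack("l").bytesize
SMALLINT_BYTES = [5].pack("s").bytesize
TINYINT_BYTES  = [5].pack("c").bytesize

# Percentage of storage saved when a column shrinks from `before` to `after` bytes.
def saved_percent(before, after)
  ((before - after) * 100.0 / before).round
end

puts "rating as tinyint saves #{saved_percent(INT_BYTES, TINYINT_BYTES)}%"
puts "quantity as smallint saves #{saved_percent(INT_BYTES, SMALLINT_BYTES)}%"
```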
00:14:48.079
Indexes. That's a classic. I really don't have to talk that much about that. I asked Claude to analyze everything
00:14:56.160
and to improve or create indexes. So that's another 6%.
00:15:03.120
And that didn't even take, I don't know, 10 minutes. So the application is already more than 10% faster.
00:15:11.760
One pro tip about indexes, which I see all the time: if you have two indexes, one
00:15:18.639
just for last name and one for last name and first name, and you need both lookups,
00:15:24.880
get rid of the first one, because you can use the second one for this scenario:
00:15:31.279
a composite index is read from left to right. So you can use the second index just to
00:15:37.199
search for last name, and that again saves space.
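Why the left-to-right rule works can be shown with a toy model: a composite index is essentially a list sorted by its first column, then its second. The data and the helper `find_by_last_name` below are hypothetical and only illustrate the principle, not how any particular database implements it.

```ruby
# A composite index modeled as an array sorted by [last_name, first_name].
PEOPLE_INDEX = [
  %w[Weber Carl], %w[Meyer Zoe], %w[Schmidt Ben], %w[Meyer Anna]
].sort.freeze

# Entries are ordered by last name first, so a binary search on the last name
# alone works; a separate single-column index would be redundant.
def find_by_last_name(index, last_name)
  start = index.bsearch_index { |(last, _first)| last >= last_name }
  return [] unless start

  index.drop(start).take_while { |(last, _first)| last == last_name }
end

p find_by_last_name(PEOPLE_INDEX, "Meyer")
```

The reverse does not hold: a search by first name alone cannot use this ordering, because first names are only sorted within each last name.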
00:15:43.120
Fragment caching, and that includes Russian doll caching. That's a classic. So this is our application, and let's see
00:15:51.040
what we can do with fragment caching. So these cards are fragments. We use the
00:15:56.800
updated_at attribute of each product as a key for that fragment.
00:16:04.160
And Russian doll is easy. That's just one more cache wrapped around everything. And you can
00:16:10.399
stack that. So you can have, I don't know, multiple Russian dolls on one page.
00:16:17.360
It kind of gets confusing after a time. It is a bit error-prone. So you really want to know what you're doing. It is
00:16:24.560
hard to debug, and you need a clean data structure, but then it's
00:16:30.800
magic. So again I tell Claude to read the
00:16:36.000
official guide for fragment and Russian doll caching and then implement
00:16:41.839
it. And very important, I tell it to include updated_at, because we didn't
00:16:48.320
in the step before, but we need it now. And what is very important: every time we
00:16:55.040
have something like a belongs_to association, we want to touch the parent.
00:17:00.240
To give you an example, if I change the rating, or if I add or delete a rating,
00:17:06.640
I want to touch the product. Touch means set updated_at to the current
00:17:13.919
timestamp. That is very important.
00:17:19.120
That's a 60% improvement in speed. Well, that's a big one.
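The interplay of cache key and touch can be sketched without Rails. The structs, the in-memory `FRAGMENTS` hash and the render counter below are stand-ins invented for this illustration; in a Rails app the same idea is expressed with the `cache` view helper and `belongs_to :product, touch: true`.

```ruby
# Minimal stand-ins for the shop models; not Rails code.
Product = Struct.new(:id, :name, :updated_at) do
  def touch(now = Time.now)
    self.updated_at = now
  end

  # The fragment key changes whenever updated_at changes.
  def cache_key
    "products/#{id}-#{updated_at.to_f}"
  end
end

Rating = Struct.new(:product, :stars) do
  # Saving a rating touches its parent, so the product card is re-rendered.
  def save(now = Time.now)
    product.touch(now)
  end
end

FRAGMENTS    = {}
RENDER_COUNT = Hash.new(0)

# Render the card only when no fragment exists for the current key.
def product_card(product)
  FRAGMENTS[product.cache_key] ||= begin
    RENDER_COUNT[product.id] += 1
    "<div class=\"card\">#{product.name}</div>"
  end
end
```

If the touch is forgotten, the key never changes and the stale card keeps being served, which is the kind of bug that only shows up in production.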
00:17:25.520
HTTP caching. HTTP caching means we still use the code
00:17:33.679
in the controller, but depending on the situation, we don't
00:17:38.720
render the HTML. And normally that's the part where we really spend time. So, normally the
00:17:45.520
controller is the fastest part. So we save time and money on the server
00:17:51.280
and we have less payload to deliver to the client. So that will be faster too.
00:17:58.320
Normally we use an ETag for that. An ETag is like a fingerprint. So if we do
00:18:03.440
this, if I pull the same page two times with curl, I get two different
00:18:10.960
ETags. This means the page has changed. It's not the same page.
00:18:17.600
So I'm asking Claude now to read the documentation and to implement a strong
00:18:22.720
ETag. What is a strong ETag compared to a weak ETag? A weak ETag is
00:18:28.559
something like this: if you have a proxy on the way from the server to the client, and the proxy thinks, okay, you
00:18:34.240
compressed this with, for example, I don't know, gzip, but I prefer Brotli,
00:18:40.960
the proxy can change the compression.
00:18:46.480
But that would break the strong ETag, because it's not the same,
00:18:52.559
it's not the same file. The weak ETag would stay the same. So weak ETag means more or less
00:18:58.000
the same. Strong ETag means exactly the same.
00:19:03.120
And this is the code which gets added to the controller, just one line for each
00:19:10.000
occasion. We use the top products here and we use the product here. And
00:19:17.360
this is a very good example of Claude failing, because we don't need the product ratings: because we asked
00:19:25.200
Claude to add a touch, we don't need the ratings here.
00:19:31.440
Yeah, go ahead.
00:19:37.919
Come again?
00:19:43.039
Okay. The question is: what does an ETag cache? The ETag doesn't cache anything.
00:19:48.480
The ETag is just a fingerprint. This is a
00:19:53.600
controller. We don't see the beginning of that controller part, but let's say this is like an index view of products.
00:20:00.320
So this ETag takes all the products. It has a list, an
00:20:06.559
array of attributes of the product list, and then it computes what ETag
00:20:12.080
makes sense for that. If nothing has changed, it's the same fingerprint. In
00:20:17.360
other words, if one product changed, the fingerprint changes too. It's not a cache, it's just a
00:20:23.600
fingerprint, and I'll show you why we
00:20:29.039
do that. So if we get the same page twice, we have the same fingerprint here,
00:20:34.159
and that means the same page. And what the browser now does is, the browser says, okay, I already have a copy of
00:20:41.440
this, and that is this fingerprint here. That's the ETag. So has anything
00:20:47.440
changed? And that's often the case: you browse through a shop and you go back and forth. So you already
00:20:54.559
have that page, and if it hasn't changed, you don't need to render the HTML, because that's a waste
00:21:01.280
of time. In that case the Rails system, or the server, says nothing has
00:21:08.400
been modified, and that's HTTP code 304.
00:21:14.400
Did I answer that question? Okay, any questions? Another question here?
00:21:24.400
Okay, the question is: is there a specific reason why I'm using a strong ETag? The answer is yes, I'm
00:21:30.400
German.
00:21:36.559
So, Last-Modified: same idea, but now we are using a timestamp. The
00:21:44.320
code is here: last modified, and we use a timestamp. So the request will
00:21:50.159
look like this: the browser will say, I have a copy and this is from this time in the
00:21:55.520
past, is it still good? And then again we get a 304 if it
00:22:01.120
hasn't been modified. So both work in parallel. I'd suggest using both,
00:22:07.039
there's no harm. So we save another 2%.
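The timestamp variant can be sketched the same way. The helper `status_for` is a hypothetical name; `Time#httpdate` and `Time.httpdate` come from Ruby's standard `time` library and produce and parse the date format used in the `Last-Modified` and `If-Modified-Since` headers.

```ruby
require "time"

# Returns 304 when the browser's copy is at least as new as the record,
# otherwise 200. HTTP dates have one-second resolution, hence to_i.
def status_for(last_modified, if_modified_since_header)
  return 200 if if_modified_since_header.nil?

  since = Time.httpdate(if_modified_since_header)
  last_modified.to_i <= since.to_i ? 304 : 200
end

updated_at = Time.utc(2025, 7, 1, 12, 0, 0)
header     = updated_at.httpdate # what the browser sends back on the next request

puts status_for(updated_at, nil)
puts status_for(updated_at, header)
puts status_for(updated_at + 60, header)
```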
00:22:14.240
That's already a big improvement. And that, even for a bigger application,
00:22:21.039
doesn't even take a day today. It used to take a lot longer, but with
00:22:26.559
Claude or other agents, it's very easy to do as long as you have a good testing
00:22:33.280
environment. Testing with caching is absolutely key, because
00:22:40.320
otherwise you always get into a situation where something works in development but your
00:22:46.720
customers are calling support that it doesn't, and it's very hard to debug.
00:22:54.559
So the last slides are for what I call Autobahn.
00:23:00.640
A lot of German today. The fastest page that is delivered from
00:23:07.600
our online shop is delivered by nginx or Apache only, where Rails doesn't do
00:23:13.840
anything. And this already happens, this happens always with pages like 404.html.
00:23:21.919
These are static pages which are in the public directory and don't even trigger
00:23:28.320
anything in Rails. So:
00:23:33.440
static copies of product show. And remember, we have 50,000 products.
00:23:40.640
Why not? So we tell Claude to create a method
00:23:48.799
to store a static copy of that view in the file system, in the directory
00:23:54.480
structure, and while we're at it, we
00:23:59.520
create a Brotli version. So that's the state-of-the-art compression
00:24:04.640
the browser uses. That used to be gzip, and today it's Brotli.
00:24:10.559
And we just add these hooks in the model, which are triggered every time. And
00:24:18.880
because I or Claude made a mistake, the delete has to be done here in
00:24:25.120
addition. I guess it was Claude. So this is the
00:24:32.320
result. We have a list of 50,000 files, and disk space is cheap. I don't care
00:24:39.600
about disk space. And while we are at it, we can use
00:24:45.440
Brotli compression too, because that saves time for nginx. Otherwise, nginx would grab that file, compress
00:24:53.279
it, and then deliver the compressed file. And now nginx says, "Oh, wonderful. There's already a Brotli
00:24:59.200
version of that. Let's take that." So that's a big performance boost on heavy-traffic sites.
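A sketch of what such a write hook and delete hook might do, using only the standard library. The method names and the directory layout are hypothetical. Note the swap: gzip stands in for Brotli here because it ships with Ruby, while Brotli needs an additional library.

```ruby
require "fileutils"
require "tmpdir"
require "zlib"

# Writes products/<id>.html plus a pre-compressed sibling file, so the web
# server can deliver the compressed version without compressing on the fly.
def write_static_copy(public_dir, product_id, html)
  path = File.join(public_dir, "products", "#{product_id}.html")
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, html)
  Zlib::GzipWriter.open("#{path}.gz") { |gz| gz.write(html) }
  path
end

# Counterpart for the delete hook: remove both files.
def delete_static_copy(public_dir, product_id)
  path = File.join(public_dir, "products", "#{product_id}.html")
  FileUtils.rm_f([path, "#{path}.gz"])
end

Dir.mktmpdir do |dir|
  path = write_static_copy(dir, 42, "<h1>Lamp</h1>" * 100)
  puts "#{File.size(path)} bytes plain, #{File.size("#{path}.gz")} bytes compressed"
end
```

In the shop, these two methods would be called from the model hooks mentioned above, once for every create, update and delete.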
00:25:05.840
So how does nginx know which one to use? Because now we have two different worlds.
00:25:12.240
That's what cookies are for. So we ask Claude to set up the system so that we set a
00:25:18.799
cookie when somebody is logged in or when somebody has an active shopping cart. If somebody logs out
00:25:26.320
or has zero items in the shopping cart, we delete that cookie. And this is the
00:25:32.400
code for that. And this is the config for nginx. And that's pretty much it.
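The cookie rule is small enough to state as code. This sketch models the cookie jar as a plain hash; the cookie name `dynamic` and both helper names are assumptions, since the talk does not name them. The second method mirrors the decision the web server config has to make.

```ruby
# Hypothetical cookie name; the talk does not name it.
DYNAMIC_COOKIE = "dynamic"

# The cookie is present only for visitors who need a personalized page:
# somebody who is logged in or has items in the shopping cart.
def sync_cookie(cookies, logged_in:, cart_items:)
  if logged_in || cart_items > 0
    cookies[DYNAMIC_COOKIE] = "1"
  else
    cookies.delete(DYNAMIC_COOKIE)
  end
  cookies
end

# No cookie means the pre-rendered static copy is safe to deliver.
def serve_static?(cookies)
  !cookies.key?(DYNAMIC_COOKIE)
end

cookies = {}
sync_cookie(cookies, logged_in: false, cart_items: 2)
puts serve_static?(cookies)
```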
00:25:41.760
So that is a big performance boost. What we do have to do now is we have
00:25:47.760
to create a cron job which runs over all 50,000 products once a day at
00:25:53.600
midnight, because we have that expiration date. So, at the beginning of
00:25:59.120
the new day, we want to delete all the files and recreate them. That is very
00:26:05.039
important, because otherwise we get yesterday's news.
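A sketch of the nightly job, assuming a flat directory of static files and a hypothetical renderer whose output depends on today's date. The product hash layout and both method names are invented for illustration; the real job would render the actual product view.

```ruby
require "date"
require "fileutils"
require "tmpdir"

# Hypothetical renderer: the page depends on today's date via the expiration date.
def render_product(product, today)
  state = product[:expires_on] < today ? "expired" : "available"
  "<h1>#{product[:name]}</h1><p>#{state}</p>"
end

# Delete every static copy and write fresh ones; meant to run once a day.
def regenerate_all(products, dir, today)
  FileUtils.rm_f(Dir.glob(File.join(dir, "*.html")))
  products.each do |product|
    File.write(File.join(dir, "#{product[:id]}.html"), render_product(product, today))
  end
end

Dir.mktmpdir do |dir|
  products = [{ id: 1, name: "Milk", expires_on: Date.new(2025, 7, 10) }]
  regenerate_all(products, dir, Date.new(2025, 7, 11))
  puts File.read(File.join(dir, "1.html"))
end
```

Scheduled with a crontab entry such as `0 0 * * *`, this runs every day at midnight.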
00:26:11.039
So now with this we have a major additional performance boost. So we
00:26:17.760
started at 30,000 milliseconds and now we have about 3 and a half thousand. But
00:26:24.960
let's say we are down to 10%. And
00:26:30.480
that means on the same hardware we can have 10 times the amount of traffic and
00:26:37.840
everything is going to be faster. Why is speed so important?
00:26:46.480
That's because we are humans. Every time you press on the button in an
00:26:53.600
elevator or at a traffic light and it takes forever, you're in this table
00:27:00.799
because if something takes longer than a second, the human does context
00:27:07.919
switching. So the gold standard is we want to have web pages which load initially under 1
00:27:15.440
second and everything after that should be even faster.
00:27:20.960
Most web pages don't do that, but that's the gold standard. And really, in the
00:27:26.559
next couple of days, whenever you have a slow web page, remember my words, remember this talk.
00:27:33.760
You feel the sluggishness, and that's a negative thing. This is a slide from 10 years, 12 years
00:27:41.600
ago. Nothing has changed there. Every second the web page is slower
00:27:48.080
increases the bounce rate by 0.65. And that's massive.
00:27:56.480
So, the slower the web page, the less products you're going to sell.
00:28:03.039
That's the major reason why I'm stressing this point. So let's compare two random web pages
00:28:11.120
which I came up with today, like three hours ago.
00:28:17.200
This is the Wi-Fi of the hotel. So I pinged railsconf.org
00:28:22.240
and I pinged my own web page. And you
00:28:27.600
see the difference is quite impressive. That's because my server
00:28:34.080
is located in Germany and railsconf.org isn't. So the Atlantic makes quite a
00:28:41.840
difference. That's 10 times as slow. So this is the web page we all use
00:28:48.480
today. And this web page cries for caching. We see all this
00:28:55.679
fragment caching here, all these possibilities. That would be just a perfect example of how to do good caching, if it
00:29:03.039
were done well. So let's have a look at that page. And if you want to have a look,
00:29:10.240
here is a URL. I already uploaded the slides. I will publish the URL of the
00:29:16.799
slides on my Twitter account, @wintermeyer. So with this page, the
00:29:25.919
browser starts to get the first byte at 0.3 seconds. Well, that's pretty quick.
00:29:33.520
But then it takes more than a second to start to render, and render just means the
00:29:38.799
browser does anything. And then it takes a little bit over 3 seconds to show you
00:29:44.559
the page. Well, that's not below 1 second.
00:29:50.000
And you feel that. That's my web page.
00:29:55.440
That's below 1 second, and that's both numbers compared. And the tricky
00:30:01.919
part here is, you see the first line, the first byte is way faster here than
00:30:09.520
here. And that is because of the Atlantic. I can fix a lot of things. I can make my
00:30:16.399
server faster. I can write better code. But I cannot increase the speed of
00:30:22.000
light. So everything has to move via fiber from
00:30:27.440
Frankfurt to Philadelphia. And that's my biggest disadvantage here. I cannot fix
00:30:33.200
that. But still, because everything else is fast,
00:30:40.159
I get the web page below 1 second, and that's the gold standard.
00:30:45.600
These numbers, by the way, are via an LTE network. So that's kind of, that's
00:30:51.440
on your mobile phone. It'll be a little bit faster on fiber,
00:30:57.840
but that's a good comparison. So that's for today. Everything for
00:31:06.000
today. Any questions?
00:31:12.640
No. Okay. Ah, one question. It's not the system. Okay, the question is: did
00:31:20.080
I say that Russian doll caching and fragment caching are buggy? Which I
00:31:25.520
deny. I never said that. What I meant is, it's very easy to introduce
00:31:31.279
bugs with that. So it is very easy to get started, and
00:31:37.919
then you forget about it. And often you don't use caching in development
00:31:43.120
and everything is wonderful, and then in production something happens: you
00:31:49.360
forgot to touch something. So for example, you didn't touch the product with a new
00:31:56.559
rating, and then the whole cache is corrupt until the next morning,
00:32:03.840
and that is very hard to debug. So that's the reason for that.
00:32:10.320
So the Rails code is perfect. I have never seen any bug in Rails, never.
00:32:18.399
Any other questions? Okay. The question is: have I ever measured SSDs? Because today SSDs
00:32:25.840
became quite fast. Yes. The first talk I gave was in 2013, and in those
00:32:32.880
days a lot of my servers used hard drives. So that was like the stone age. My
00:32:38.559
kids don't even know what that is. That was so much slower. So today,
00:32:44.640
for me, it's like magic. But still, you want to have the fastest SSD possible, and it's that pyramid. So it
00:32:53.679
is very easy to buy a cheap SSD which is slow,
00:33:00.080
and then that's wasting money. So you
00:33:05.600
really... and yeah, is that the answer to your question? Perfect. Any other
00:33:11.200
questions? No. So remember: Kleinvieh macht auch Mist.