00:00:08.720
Hello, everyone.
00:00:11.040
Hello, hello. I'm Daisuke Aritomo. Today we're
00:00:14.880
going to talk about "profile and
00:00:16.640
benchmark every
00:00:18.840
change." Some of you might know me as
00:00:22.000
osyoyu; that's my online handle. I work
00:00:26.000
at SmartBank, Inc.,
00:00:29.119
which is a credit card company. We're
00:00:30.880
hosting the hack space on the second
00:00:32.719
floor, so please pay a visit if you want
00:00:34.880
to hack away or just want to relax. I
00:00:37.760
usually work as a Rails developer
00:00:40.320
at
00:00:42.520
SmartBank. Now, before we go into the
00:00:46.480
main topic, I'd like to deliver some
00:00:49.039
updates on my own Ruby profiler, Pf2,
00:00:51.360
which received five honorable
00:00:54.480
mentions at the keynote. I'm sure I
00:00:58.559
don't have to explain what a profiler is,
00:01:01.039
so I'm just going to deliver the recent
00:01:05.479
updates. I've changed some internals and
00:01:08.560
reduced memory consumption by 90%, I
00:01:11.520
have expanded the features for comparing
00:01:14.080
profiles, which I'm going to explain
00:01:16.400
later, and I have rewritten the core
00:01:19.280
in
00:01:20.280
C. And one thing I have realized
00:01:23.200
this year is that profilers don't make
00:01:25.840
programs fast, just like debuggers
00:01:28.880
don't find bugs:
00:01:31.439
profilers themselves don't make the
00:01:34.640
program
00:01:36.040
fast. So today I'm going to address that.
00:01:39.759
Today I'm going to speak about three
00:01:41.680
things. One: introducing
00:01:44.960
benchmark-driven development,
00:01:48.560
which I will refer to as
00:01:51.240
BenchDD. Next I'll show you the tool-
00:01:54.880
set for doing BenchDD, and lastly I'll
00:01:58.320
show how I applied benchmark-driven
00:02:00.880
development to build a 100-times-faster
00:02:03.600
version of
00:02:04.920
Sinatra. Also, for this year I have
00:02:07.680
attached short Japanese translations
00:02:09.759
on the slides as an experiment. This part
00:02:12.800
is pure translation and it doesn't
00:02:15.760
contain any flavor text, so if you
00:02:18.000
don't read Japanese you can safely
00:02:19.680
ignore
00:02:21.239
it. I have created a web framework called
00:02:24.560
Xinatra.
00:02:26.040
It implements a fair
00:02:29.200
part of the Sinatra API, so it can be used
00:02:31.599
as a drop-in replacement: you can
00:02:35.519
simply change your app's base class from
00:02:38.000
Sinatra::Base to Xinatra::Base. I first
00:02:41.280
thought to name it Cinatra, since C is 100
00:02:45.200
in Roman numerals, but I soon realized
00:02:47.599
that it's too hard to distinguish
00:02:51.440
Cinatra and Sinatra, the C and the S, so I just used
00:02:55.360
Xinatra as the name. But I realized
00:02:57.760
later that X is 10 in Roman
00:03:00.080
numerals, so it might be only 10 times
00:03:02.760
faster, but let's just ignore that. Is
00:03:06.480
it really 100 times faster? Well, it depends
00:03:09.200
on the definition of "fast". The
00:03:12.000
routing and handling logic is 100 times
00:03:14.680
faster, which means that hello-
00:03:17.440
world apps are 100 times faster. Of
00:03:20.840
course, the framework's handling isn't the
00:03:24.080
most expensive part of a web application;
00:03:26.319
obviously the app itself has the most
00:03:29.360
weight. But even in real-world benchmarks,
00:03:32.799
making routing 100 times faster can
00:03:36.319
lead to a
00:03:37.879
2% gain. So I'd say it's like a free
00:03:42.799
lunch: you can just change the base class
00:03:45.200
and get a 2% performance
00:03:47.480
gain. Now, how did I make it 100 times
00:03:50.319
faster? That's the main part: I'm
00:03:52.799
going to introduce a technique called
00:03:54.400
benchmark-driven development
00:03:56.400
and a toolset. So let's dig into
00:03:59.840
the
00:04:01.959
details. Benchmarking itself is a common
00:04:04.799
technique used to
00:04:06.959
measure the performance of a program. The
00:04:10.080
Ruby world has many benchmarking
00:04:12.239
libraries, including benchmark, which
00:04:14.480
comes as a standard library, and I'm
00:04:16.880
showing benchmark-ips, which is a gem,
00:04:20.160
in the slide. In this case, a program which
00:04:24.479
adds up the numbers from 1 to 100 is benchmarked,
00:04:27.600
revealing that it takes 2.5 microseconds to
00:04:30.800
execute the code
00:04:33.080
block.
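For reference, a minimal benchmark-ips sketch of that kind of measurement (the slide's exact code isn't in the transcript, so the summing block is an assumption):

```ruby
require "benchmark/ips"

Benchmark.ips do |x|
  # Measure how many times per second this block can run.
  x.report("sum 1..100") do
    (1..100).sum
  end
end
# benchmark-ips reports iterations per second; about 400k i/s
# corresponds to roughly 2.5 microseconds per iteration.
```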
00:04:36.560
Given that, let's talk about development driven by benchmarks. The concept of
00:04:39.280
benchmark-driven development
00:04:41.280
is simple:
00:04:43.840
before starting to write code, design a
00:04:45.919
benchmark; then start writing code; now
00:04:48.400
make it fast; and the cycle continues
00:04:50.600
on. This can be compared to TDD, which
00:04:53.520
stands for test-driven development. In TDD,
00:04:57.360
we start by writing a failing test, then
00:04:59.759
make it pass, then refactor. Do you see
00:05:03.040
the similarity?
00:05:05.840
The reason why I brought up writing a
00:05:08.320
benchmark first can be explained
00:05:10.720
using this matrix. There are four
00:05:13.120
quadrants in this matrix: broken and slow
00:05:15.919
code, fast but broken code, working but
00:05:19.039
slow code, and then finally working and
00:05:21.680
fast code. We all know that turning
00:05:25.520
slow-but-working code into fast-and-
00:05:29.280
working code is very
00:05:31.919
difficult; it's easier to write fast code
00:05:35.680
from the
00:05:37.400
beginning. Now, when we start talking
00:05:39.840
about bottlenecks, the first thing many
00:05:41.840
people mention is that you should focus
00:05:43.919
on the bottlenecks: if there's some
00:05:46.560
significant bottleneck, you should work
00:05:48.240
on that part; or you should focus on the
00:05:51.120
algorithm and
00:05:52.919
make it a magnitude faster; or you could just
00:05:56.639
rework the entire
00:05:58.199
architecture. Of course, that is perfectly
00:06:01.120
valid advice and you should stick to
00:06:03.440
these principles, but that is not
00:06:06.600
everything. In reality, there's not always
00:06:09.680
something significant. Nope, there's not
00:06:12.800
always something significant: no
00:06:15.280
particular bottleneck may exist. For
00:06:17.919
example, in this flame graph, a lot
00:06:21.199
of blocks have the same length, so
00:06:23.759
there's no particularly significant
00:06:27.240
bottleneck. Even if your code is running
00:06:29.520
on a logarithmic algorithm, it
00:06:32.960
could still not meet performance
00:06:36.840
needs. So lots of slight slowdowns
00:06:40.319
will impact performance as a whole.
00:06:43.199
However, those are hard to find, since
00:06:46.319
they are slight, even though they may be
00:06:48.720
easy to fix. We should remember
00:06:51.680
that "not slow" is not "fast".
00:06:55.120
In this example, we want to check whether any
00:06:57.680
even number exists in the array numbers. The
00:07:00.880
slow version always iterates through the
00:07:02.880
entire array, while the fast version
00:07:04.880
bails out early when it hits the first
00:07:06.720
even number. This might not sound
00:07:11.280
realistic, but we see a lot of this kind
00:07:14.080
of code in real apps. They
00:07:16.880
have the same time complexity, but the
00:07:19.039
latter version is 10 times faster.
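A minimal sketch of the two versions (the exact snippet from the slide isn't in the transcript, so the array contents are an assumption):

```ruby
numbers = (1..10_000).to_a

# Slow: select walks the entire array and builds an intermediate
# array of all even numbers before any? checks it.
numbers.select { |n| n.even? }.any?

# Fast: any? with a block returns as soon as it finds
# the first even number.
numbers.any? { |n| n.even? }
```

Both are linear in the worst case, but the second version returns almost immediately here.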
00:07:22.960
To address this problem, while
00:07:26.000
developing Xinatra I benchmarked
00:07:29.199
really every single change: to each
00:07:31.759
pull request I attached a benchmark, and
00:07:35.599
I compared the before-and-after
00:07:41.479
performance. Now, this sounds cool.
00:07:46.880
You could run benchmarks as much as
00:07:48.800
possible to catch slow code: maybe on
00:07:51.440
every pull request, just like I showed
00:07:54.120
you, or on every commit.
00:07:57.560
Maybe even more: every time you type a
00:08:02.240
line in your editor, you could run a
00:08:07.000
benchmark. I have created tools and
00:08:10.160
frameworks to keep benchmarking in the
00:08:12.319
loop.
00:08:13.919
Combined together, they form a
00:08:17.919
framework called BenchDD. So that's
00:08:21.960
BenchDD. Now let me walk through how to
00:08:25.599
do
00:08:28.199
it. Benchmarking Xinatra started from
00:08:33.440
setting a measurable performance goal.
00:08:36.240
This means defining what needs to
00:08:38.240
be 100 times faster, so I wrote a benchmark
00:08:41.760
for that first.
00:08:44.560
My first benchmark looks very simple:
00:08:47.680
it just takes the app and calls it. This is
00:08:51.279
a Rack app, so I called call and passed an
00:08:56.560
almost empty hash.
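A minimal sketch of such a benchmark, with a placeholder app class (the real class and env contents aren't shown in the transcript):

```ruby
require "benchmark/ips"

# Placeholder standing in for the Sinatra/Xinatra app under test.
class MyApp
  def call(env)
    [200, { "content-type" => "text/plain" }, ["ok"]]
  end
end

app = MyApp.new
# An almost empty Rack env hash.
env = { "REQUEST_METHOD" => "GET", "PATH_INFO" => "/" }

Benchmark.ips do |x|
  # One iteration == one request handled through the Rack interface.
  x.report("GET /") { app.call(env.dup) }
end
```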
00:09:02.680
Now let's see the results: Sinatra took 35,000 nanoseconds, which is
00:09:09.200
35,000 nanoseconds per request.
00:09:13.120
And I found another
00:09:15.920
interesting benchmark target: Roda,
00:09:18.640
which is known as a very fast web
00:09:22.760
framework. Now, if I wanted to make it
00:09:26.080
100 times
00:09:27.320
faster, this means that it
00:09:30.480
has to complete one request in 350
00:09:33.640
nanoseconds. And the empty Rack app I had
00:09:38.160
just built takes 250 nanoseconds per
00:09:43.240
request. This graph looks a little
00:09:47.839
squashed, so let's make the axis
00:09:51.440
logarithmic to make it easier to read.
00:09:56.839
Comparing these, we
00:10:01.760
have the empty Rack app, and if we want to
00:10:04.240
make a Xinatra that completes a request in
00:10:06.720
350 nanoseconds, we only have a 135-
00:10:10.560
nanosecond headroom per request to do the
00:10:13.519
heavy lifting; we need to fit every
00:10:16.200
feature in 105 nanoseconds. That's what we
00:10:19.440
found through
00:10:21.720
benchmarks. So in code it looks like this;
00:10:25.839
the question is: how much time
00:10:27.920
can we spend here?
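In code, that question might look like this hypothetical skeleton (the class and method shape are assumptions, not the actual Xinatra source):

```ruby
module Xinatra
  class Base
    # The whole request path lives here: routing, params access,
    # hooks, and building the Rack response triple.
    # How much time can we spend here?
    def call(env)
      [200, { "content-type" => "text/plain" }, ["hello"]]
    end
  end
end
```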
00:10:31.320
Now, setting the target: well,
00:10:36.560
it looks like 100 times is a fun target,
00:10:39.360
but maybe a different target could be set.
00:10:44.120
I initially wanted to overtake Hono,
00:10:47.360
which is a
00:10:49.360
JavaScript web framework known to
00:10:51.600
be very fast. Now, the sad part is that
00:10:55.360
full-featured Hono was faster than an
00:10:57.600
empty Rack app: Hono could complete a
00:11:00.800
request in 51
00:11:02.360
nanoseconds. I wanted to overtake it, but
00:11:07.360
it turned out to be kind of impossible
00:11:10.079
to do that on Rack, so I just dropped
00:11:12.160
that goal. Sometimes
00:11:15.680
benchmarking is useful for setting the goal
00:11:18.560
itself.
00:11:21.560
Anyway,
00:11:25.120
we now know our goal, so now it's time to
00:11:27.760
write real code. The process goes
00:11:32.720
like: start Vim, write code, run a
00:11:37.480
benchmark, start Vim again, and then
00:11:39.839
benchmark.
00:11:41.360
So I started writing code like this. I
00:11:44.560
started from writing the routing code:
00:11:47.760
do_routing takes an env and checks it. I'll
00:11:52.480
talk about this later, but the problem
00:11:55.279
I found is that this is not fun at
00:11:58.120
all. Because...
00:12:00.959
Look at the time-
00:12:02.440
stamps: I was running like ten
00:12:07.000
benchmarks in five minutes while
00:12:11.040
editing code. And this is, well,
00:12:14.399
it was kind of fun, but not really the
00:12:17.040
ideal
00:12:19.160
process. So to remedy
00:12:23.320
this, I created a benchmarking framework.
00:12:27.279
This is
00:12:28.760
a way to describe benchmarks:
00:12:32.600
it's a DSL, like RSpec,
00:12:38.240
that has a setup part, a dataset
00:12:40.880
part, and a scenario part. In this case,
00:12:44.160
the setup part is not measured, and the
00:12:46.480
scenario part is the part that is
00:12:49.800
benchmarked.
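The transcript doesn't show the framework's actual syntax, so this is a purely hypothetical sketch of what such an RSpec-like benchmark description could look like (every name here is an assumption):

```ruby
benchmark "routing" do
  setup do
    # Not measured: build the app once before the runs.
    @app = TestApp.new
  end

  dataset do
    # The workload to replay, e.g. the 10k-request small tier.
    load_requests("requests-small.log")
  end

  scenario do |request|
    # Measured: one iteration per request in the dataset.
    @app.call(request.env)
  end
end
```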
00:12:51.480
So, yeah,
00:12:54.320
I have created a
00:12:56.279
benchmarking framework. Sorry, I flipped
00:12:58.800
the slides.
00:13:01.760
We see that
00:13:05.760
designing the workload is very important:
00:13:08.720
workloads should be realistic and
00:13:11.320
representative, and compact at the same
00:13:14.440
time. For Xinatra I prepared multiple
00:13:17.600
tiers of workloads.
00:13:20.320
The small one is a randomly
00:13:23.040
generated set of requests, which consists
00:13:25.519
of 10k, so 10,000 requests.
00:13:31.440
The large one is a log collected
00:13:35.120
from a real Sinatra app, which consists
00:13:37.200
of 100,000
00:13:39.800
requests. We use the small one when
00:13:42.880
running benchmarks in a tiny
00:13:45.440
loop: when you're writing code and
00:13:47.920
trying to experiment, we can use the
00:13:49.760
small dataset to get results fast, and
00:13:53.200
to get more accurate results we use the
00:13:55.600
large
00:13:59.160
one. Another problem is that benchmarking
00:14:02.000
does not provide us enough
00:14:07.800
information. What benchmarking tells us
00:14:10.639
is the time per iteration of the current
00:14:13.560
code. However, what we really need
00:14:17.360
is how the performance changed from before
00:14:20.639
to after, and why the performance
00:14:25.720
changed. In this case, the only
00:14:29.040
information we know is the time per
00:14:32.680
iteration. This is where profiling comes
00:14:37.480
in: explaining
00:14:40.720
performance is exactly what profilers do.
00:14:44.320
So, as I maintain my own profiler, I've
00:14:47.920
added a new view to show the performance
00:14:50.000
diff between two
00:14:53.079
revisions. This is called a
00:14:55.760
differential flame
00:14:57.240
graph. Unlike a normal
00:15:00.480
flame graph, some frames
00:15:03.600
are colored in red and blue.
00:15:08.560
This flame graph is generated from
00:15:14.160
two benchmark results and two
00:15:16.519
profiles. The blue flames are
00:15:20.160
the ones that have improved
00:15:22.720
performance since the previous benchmark,
00:15:25.680
and the red ones are the ones that
00:15:28.560
degraded since the last run.
00:15:31.000
So seeing this, we can tell that the blue
00:15:34.240
ones got better and the red ones got
00:15:36.760
worse. This gives us hints on where
00:15:40.480
to
00:15:41.560
optimize. Actually, this screenshot is
00:15:45.600
not from my profiler; this is
00:15:48.399
actually a screenshot from Brendan Gregg
00:15:51.279
sensei's site. And there is a reason:
00:15:55.519
his books are in the bookstore.
00:15:59.040
Brendan Gregg is a great person; you
00:16:02.480
really should buy his book, so
00:16:04.399
you should go to the bookstore now. And I
00:16:07.759
have implemented another
00:16:10.040
thing: editor integration. I
00:16:13.759
felt
00:16:16.639
that reading flame graphs is kind of
00:16:18.399
tedious and not a thing I want to do
00:16:20.720
every single time, so I have made an
00:16:24.560
editor integration that
00:16:27.000
shows the time spent per line as ghost
00:16:30.959
text. Do you notice the
00:16:34.480
text in gray? For example, the
00:16:38.000
continuation check on line 22 is
00:16:40.800
spending 82,000 nanoseconds per iteration,
00:16:43.920
indicating that this is a possible
00:16:46.320
hot spot.
00:16:50.160
This is implemented in cooperation with
00:16:52.079
the benchmark framework and
00:16:54.440
Pf2. When a
00:16:57.920
scenario runs, it automatically engages
00:17:00.639
Pf2
00:17:03.000
profiling. The results are recorded in
00:17:06.480
a temporary directory, and the diff
00:17:09.199
engine in Pf2 generates the differential
00:17:12.240
flame graph and the data for the editor
00:17:15.039
integration.
00:17:18.880
So now we have walked through a single
00:17:20.400
cycle of BenchDD. It is
00:17:23.280
important to write a benchmark for each
00:17:25.120
feature, not just one overall. In the case of
00:17:28.160
Xinatra, I wrote benchmarks for routing,
00:17:30.160
handling actions, and more. Now we'll see
00:17:33.039
how I improved performance for each
00:17:34.880
feature in
00:17:36.440
detail. So: making Xinatra 100 times
00:17:41.799
faster. As a reminder, we need to
00:17:44.880
fit all features in the 155-
00:17:47.760
nanosecond
00:17:49.720
time budget. We have to start by
00:17:52.480
optimizing the significant part: routing
00:17:55.520
is the largest part in Sinatra, as in
00:17:58.320
web frameworks in general. It is the
00:18:01.120
equivalent of what routes.rb
00:18:03.840
does in
00:18:04.840
Rails. Some algorithms come to mind,
00:18:08.559
like trie-based routing or linear routing:
00:18:12.720
there are multiple
00:18:14.480
options of algorithms for routing. One is
00:18:17.679
linear: it tries to match every single
00:18:20.480
registered route against the request. Another
00:18:23.520
one is trie-based, built on a hash-like data
00:18:26.240
structure called a trie. The one is
00:18:30.480
linear time, and the other one is
00:18:32.280
logarithmic. So there's a
00:18:37.240
trade-off. It's important to know
00:18:39.679
where the line is: the line where
00:18:42.400
a simple linear
00:18:46.000
algorithm loses to a complex logarithmic
00:18:48.559
algorithm.
00:18:50.240
For 10 to 20 routes, linear routing
00:18:52.320
was faster, which was found by
00:18:54.760
benchmarking. So Xinatra
00:18:57.200
automatically switches the algorithm
00:19:00.320
based on the number of routes registered.
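A hypothetical sketch of such an adaptive router (the threshold and class names are assumptions; only the 10-to-20-route crossover is from the talk):

```ruby
# Pick the routing strategy based on how many routes are registered.
def build_router(routes)
  if routes.size <= 20
    # Scanning a handful of routes one by one is cheapest.
    LinearRouter.new(routes)
  else
    # A trie lookup wins once the route table grows.
    TrieRouter.new(routes)
  end
end
```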
00:19:03.960
Now that we have implemented
00:19:06.400
routing: it turns out
00:19:10.320
that routing consumed about 50 nanoseconds, so
00:19:13.679
we have 84 nanoseconds of headroom to
00:19:16.360
go. We have many more features that we
00:19:19.600
have to implement to make
00:19:22.160
Xinatra a useful web framework.
00:19:25.400
Not only routing makes a
00:19:30.080
framework: we need to implement params
00:19:32.160
access, before and after actions, and
00:19:34.120
cookies. Those may not
00:19:37.280
be bottlenecks, but it is still
00:19:40.640
challenging to fit them in 84 nanoseconds for a
00:19:46.039
request. Sinatra has a params API: if you call
00:19:49.440
params in a handler, you can get the
00:19:53.120
params passed to the request. The
00:19:57.600
problem is that params is a method call,
00:20:00.720
and method calls are actually
00:20:04.760
expensive. If we could make params an
00:20:07.360
instance variable, that
00:20:10.640
would be a
00:20:12.440
gain. However, that is an API
00:20:16.520
change.
00:20:18.760
Obviously, @params as an instance variable
00:20:21.280
is faster, but it's mutable and
00:20:23.280
can do less work. Xinatra supports
00:20:26.640
both, so users can gradually switch to
00:20:29.440
the instance variable version and
00:20:31.440
gain performance.
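A hypothetical sketch of the two styles (the handler syntax is assumed to mirror Sinatra's; only the @params fast path itself is from the talk):

```ruby
class App < Xinatra::Base
  get "/hello" do
    # Sinatra-compatible: params is a method call (slower).
    "Hello, #{params[:name]}"
  end

  get "/hi" do
    # Opt-in fast path: read the instance variable directly.
    "Hi, #{@params[:name]}"
  end
end
```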
00:20:32.679
So there's a lesson
00:20:35.120
learned here: performance can influence
00:20:37.840
API
00:20:38.919
design. If you start from params and make
00:20:42.240
that a public API, you can't change it
00:20:46.000
easily without introducing a breaking
00:20:48.280
change. You have to carefully check
00:20:51.840
every option.
00:20:54.799
Another example is the request
00:20:57.080
object. request is another
00:21:01.640
well-known API: it returns a request object, and it's
00:21:04.799
kind of useful. Now, can we change this to
00:21:07.520
gain some
00:21:09.240
performance? We have some options to
00:21:12.440
implement request, but actually there is
00:21:15.840
another usage: unlike params, you can
00:21:18.240
call request.ip or request.params to get
00:21:21.280
another object.
00:21:24.000
There are multiple options to implement
00:21:26.039
this: option one is making it
00:21:29.679
a class, option two is making it a Data,
00:21:32.400
and option three is making it a Struct.
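A sketch of the three candidate shapes in plain Ruby (the attribute set is an assumption):

```ruby
# Option 1: a plain class with readers.
class RequestClass
  attr_reader :env, :params
  def initialize(env, params)
    @env = env
    @params = params
  end
end

# Option 2: a Data value object (Ruby 3.2+, immutable).
RequestData = Data.define(:env, :params)

# Option 3: a Struct.
RequestStruct = Struct.new(:env, :params)
```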
00:21:37.000
Data is the fastest of the
00:21:41.159
three. Oh, sorry,
00:21:46.640
sorry about that. So the obvious choice here
00:21:50.799
is Data. One interesting thing is that
00:21:53.120
these results get reversed when YJIT
00:21:55.200
is off: when YJIT is off,
00:21:59.200
a plain class is actually the fastest. So the
00:22:02.480
lesson learned here is that it is
00:22:04.000
important to experiment in an
00:22:05.919
environment near
00:22:10.360
production. Another feature is before and
00:22:13.200
after actions: before and after actions
00:22:15.760
can be registered, and those are saved at
00:22:18.880
app startup. They get called on every
00:22:21.799
request, so they can be used for
00:22:24.240
authentication and other
00:22:26.520
tasks. In Ruby,
00:22:29.919
there are a handful of ways to call a
00:22:33.720
proc. One is a plain call, so you can just
00:22:37.919
call call on the
00:22:39.880
proc; another one is instance_exec;
00:22:43.520
and another one is instance_eval. These
00:22:46.720
have very different
00:22:48.240
performance characteristics.
00:22:50.000
proc.call is the fastest, but
00:22:53.360
in this case it was unusable, since the
00:22:56.640
execution context
00:22:59.159
changes. So the options left are instance_exec
00:23:02.559
and instance_eval, and benchmarking found
00:23:05.280
that instance_eval was somewhat
00:23:08.280
faster, so I used instance_eval
00:23:13.640
here.
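A minimal sketch of the three invocation styles being compared (the hook body is a stand-in):

```ruby
app = Object.new
hook = proc { self } # stand-in for a real before-action body

hook.call                # fastest, but self stays the defining context
app.instance_exec(&hook) # re-binds self to app; can also pass arguments
app.instance_eval(&hook) # re-binds self to app
```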
00:23:15.480
However, 422 nanoseconds is only 95 times
00:23:19.919
faster than Sinatra, so we're missing the
00:23:22.000
100-times
00:23:24.039
line there. So we have to come up with
00:23:26.559
another
00:23:28.280
strategy. One way is to just make the
00:23:31.679
before-action an actual method: we
00:23:34.559
just call define_method. Instead of
00:23:36.480
saving a block, I defined the
00:23:40.400
actual method and called it later using
00:23:43.120
self.
00:23:44.520
send. Calling methods is faster than
00:23:46.960
calling blocks while allowing the same logic, so this
00:23:50.559
gained us
00:23:52.840
significant nanoseconds.
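A hypothetical sketch of this approach (the method name and registration API are assumptions):

```ruby
module Xinatra
  class Base
    # Register the before-action as a real method instead of a stored proc.
    def self.before(&block)
      define_method(:__before_action, &block)
    end
  end
end

# In the request path:
#   app.send(:__before_action)  # faster than instance_eval on a proc
#   app.__before_action         # a fixed method call is faster still,
#                               # but then only one before-action fits
```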
00:23:54.600
However, send is still kind of
00:23:57.520
slow.
00:23:59.440
What if we could just call a static
00:24:03.400
method? We could do that, but multiple
00:24:06.159
before-actions cannot be defined in this version,
00:24:08.559
so it's kind of breaking. However, this
00:24:11.840
led us to jumping to a 144-
00:24:17.600
times-
00:24:20.360
faster Xinatra. Now, one last feature I'd like
00:24:23.600
to introduce is Rack::Session.
00:24:26.960
This is technically not a part of
00:24:28.240
Sinatra, but many Sinatra apps use
00:24:30.559
it. Session handling wasn't in
00:24:32.640
the original benchmark, so I'm not
00:24:34.320
showing numbers here, but profiling
00:24:37.679
shows that Rack::Session usually consumes
00:24:39.760
quite a lot of CPU.
00:24:41.760
I tried to implement an equivalent in
00:24:44.080
Rust, and it was quite fast; that's just for
00:24:48.799
your information. And more and more, I
00:24:52.080
reduced hash accesses and reduced object
00:24:54.120
allocation. In this case:
00:24:59.000
you see a lot
00:25:01.840
of this pattern in Ruby code, but it
00:25:03.840
actually accesses the hash key twice, so that's
00:25:07.760
some damage caused to performance, and
00:25:09.679
I was reducing a lot of things like that.
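An illustrative sketch of the double-lookup pattern (not the exact code from the slide):

```ruby
hash = { key: 1 }
default = 0

# Two hash lookups: one for the truthiness check, one for the read.
value = hash[:key] ? hash[:key] : default

# One lookup: fetch falls back to the default when the key is absent
# (note: behaves differently if the stored value is nil or false).
value = hash.fetch(:key, default)
```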
00:25:12.520
So, wrapping up: building Xinatra was
00:25:15.600
removing a ton of small
00:25:17.880
debris. Achieving high performance was not about
00:25:20.480
making it fast but about not making it slow,
00:25:23.600
because 10%-slower code, ten times over, is
00:25:27.799
150%-slower
00:25:30.360
code. Now I'd like to introduce some tips
00:25:33.679
for better
00:25:35.720
benchmarking. Does this presentation
00:25:38.720
even matter? Yes, because in Ruby and
00:25:41.840
Rails, CPU time is very precious. Many
00:25:45.840
people say that the database is the
00:25:47.679
bottleneck and Ruby code won't matter,
00:25:50.000
but actually Rails isn't that IO-bound:
00:25:53.919
it only waits on IO for like 50 to 60%
00:25:58.080
of the time,
00:25:59.440
and reducing one millisecond in Ruby
00:26:01.919
code can go very far, especially when
00:26:04.400
you
00:26:06.200
scale. Another important thing is to
00:26:08.880
not do gacha. Gacha machines are, by the
00:26:12.000
way, in the photo on the left: you
00:26:14.760
can pay 100 or 200 yen to get some
00:26:18.640
random mikan goods. Well,
00:26:24.559
running benchmarks like that,
00:26:26.960
repeating benchmark commands until you
00:26:29.200
get a good result, is, well, tempting
00:26:31.840
and fun, but that's kind of a waste of time.
00:26:34.880
If you are unsure, you should do a
00:26:37.679
statistical hypothesis
00:26:41.640
test. And why is enabling YJIT
00:26:46.720
during benchmarks important? Just
00:26:48.799
as I said, keeping the
00:26:50.720
benchmarking environment close
00:26:53.200
to production is
00:26:55.000
important. And why does performance improve for
00:26:57.840
methods called a few times? That's warm-up; a short
00:27:00.559
warm-up should suffice.
00:27:04.720
When profiling a benchmark:
00:27:07.039
yes, you will see lower
00:27:10.320
scores with the profiler enabled, but
00:27:13.360
that's okay as long as the overhead is
00:27:17.320
consistent. And one more thing: benchmarking in CI.
00:27:21.240
The continuous benchmarking idea
00:27:25.600
kind of sounds like a
00:27:28.240
thing to do in
00:27:29.559
CI. Why do it locally when we have
00:27:33.840
GitHub Actions or something? Well,
00:27:36.320
that is because CI environments are very
00:27:38.480
unstable: you don't get the same results
00:27:40.799
from the same code. They have an unstable
00:27:43.600
base CPU, you might have some noisy
00:27:46.320
neighbors, the libraries could get
00:27:48.640
automatically updated, and hyperthreading
00:27:51.200
could be enabled or
00:27:53.720
disabled. So, wrapping up: do you now
00:27:57.520
feel like benchmarking? Some optimizations I
00:28:00.559
covered today won't be easy to do after
00:28:02.720
writing code, but at the
00:28:06.640
instant you're writing the code, they are
00:28:08.240
possible. So always run
00:28:11.279
benchmarks when writing code, and find
00:28:13.440
slow spots before you
00:28:14.919
commit. Thank you.