00:00:05.120
Hello everyone, good afternoon. My topic today is SDB: efficient Ruby stack scanning without the GVL. Seeing this title, you might wonder why anyone would create yet another stack profiler. Ruby already has stackprof and many others, and they work well in their own use cases, so why build a new one? You might also wonder how, or why, we would scan the stacks without the GVL. How could that even work? I will answer all of those questions in this talk. First I will describe why I designed a new stack profiler, then I will describe how I designed it, and after that I will discuss how to scan Ruby stacks without the GVL.

Let me answer the first question: why a new stack profiler?
00:01:00.160
To answer this, we need to look at how we use instrumentation and stack profilers. People use instrumentation as the default solution: we insert code around the functions whose latency we want to measure, whether through an instrumentation library such as Datadog or New Relic, or by writing that code ourselves. The problem is that we cannot add instrumentation code to every function; that would cost too much, both in manual effort and in the runtime overhead of the instrumentation itself. So for the functions that carry no instrumentation code, we cannot know their latency. They are the blind spots of our instrumentation. When a performance issue happens inside those blind spots, we may enable a stack profiler such as stackprof to see whether it can help us.
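Manual instrumentation of the kind described above usually looks something like the following sketch. The names here (`SLOW_CALLS`, `instrument`, `checkout`) are hypothetical illustrations, not part of SDB or any specific library:

```ruby
# Minimal manual-instrumentation sketch: wrap a method and record its latency.
# SLOW_CALLS stands in for whatever metrics backend you report to.
SLOW_CALLS = []

def instrument(label)
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
  SLOW_CALLS << [label, elapsed]   # hand the measurement to the metrics store
  result
end

def checkout
  instrument("checkout") do
    sleep 0.01                     # stand-in for real work
    :done
  end
end

checkout
```

Every method you care about needs a wrapper like this, which is exactly the manual effort and runtime overhead the talk is pointing at.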
00:01:58.159
So you see: first we have instrumentation, and we know it may not cover everything, then we use another tool to work as a complement. But my question is, why not use just one tool? I mean, why not use the stack profiler in the first place? The benefits seem obvious: a stack profiler is implementation agnostic, it has fewer blind spots, or none at all, and more importantly it does not require us to write instrumentation manually, which means less effort. In practice, though, this does not seem to have happened, and I think there must be several reasons. First, most stack profilers are designed for average latency. For example, many profilers use 10 milliseconds as their sampling interval. That means if a request takes around 50 milliseconds, those profilers can only gather five samples, and five samples cannot say much about a single request. So they are not enough for diagnosing any specific request.
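The arithmetic behind that claim is simple enough to check:

```ruby
# How many samples does a sampling profiler see of one request?
def samples_per_request(request_ms, interval_ms)
  request_ms / interval_ms
end

samples_per_request(50, 10)  # => 5  samples: too few to explain one request
samples_per_request(50, 1)   # => 50 samples: enough to sketch its timeline
```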
00:03:15.920
Instrumentation, however, works on the other side: even though it has blind spots, it can observe any individual request. That is the good part of instrumentation.
00:03:31.519
Another issue with existing solutions is that they may use a lot of CPU. If a performance tool consumes more than 10% CPU, we may hesitate to use it. More importantly, existing solutions need the GVL. When the stack profiler holds the GVL, the Ruby application cannot do its work, so the profiler slows down our Ruby applications. So I think if we want a stack profiler that can replace instrumentation, so that we can use a stack profiler alone, this profiler must fulfill several things. First, we should reduce the sampling interval from 10 milliseconds to 1 millisecond. Second, we should keep the CPU usage as low as possible, maybe less than 3 to 5%. And most importantly, we should not take the GVL when we scan the stacks, because no one wants to slow down their applications.
00:04:39.280
Okay. Now that we know why I wanted a new stack profiler, I will discuss how I designed it. Let us check the general solution first. A typical stack profiler has two parts: a signal trigger and a signal handler. The signal trigger sends signals to the application threads regularly, maybe every 10 milliseconds or every 1 millisecond, and when an application thread receives a signal, it invokes the handler. In the handler it scans the stacks, symbolizes them, and maybe does some analysis. The signal trigger itself does a very simple job: it just sends signals, and we can release the GVL inside it. So for the signal trigger there is not much to say. Let us check each component of the handler instead.
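The trigger-plus-handler shape can be sketched in pure Ruby. This is only an illustration of the architecture, not SDB's implementation, which does this in native code; it also assumes a POSIX platform where SIGPROF is available:

```ruby
samples = []

# Signal handler: runs on the main thread whenever SIGPROF arrives,
# and records the current Ruby stack.
Signal.trap(:PROF) { samples << caller }

# Signal trigger: a thread that signals our own process every ~1ms.
trigger = Thread.new do
  50.times do
    Process.kill(:PROF, Process.pid)
    sleep 0.001
  end
end

# Some work on the main thread for the profiler to observe.
finish = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 0.06
x = 0
x += 1 while Process.clock_gettime(Process::CLOCK_MONOTONIC) < finish

trigger.join
puts "collected #{samples.size} stack samples"
```

Note that a pure-Ruby version like this still runs the handler under the GVL; the whole point of SDB is to move the scanning out of that constraint.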
00:05:37.360
One component of the handler is the analyzer, which aggregates the samples we collected. That is because the stack scanner can only know which functions are on the stacks, that is, which functions are currently executing, and we need another component to calculate, or infer, those functions' latency from the samples. That is the analyzer. After that, we may also need the analyzer to convert the data into other formats: maybe you want to present the data as a flame graph, or produce a text summary of each function's average latency. So the analyzer actually does the hard work. But for it we can do lazy, offline analysis. A full stack profiler can collect a lot of samples, but we are only interested in some of them, so we can analyze only the samples we think relate to our performance issues. Lazy analysis removes a lot of work. Another thing: our design goal here is to reduce the impact on our production environment, our online servers. That means we can move the data to our local machines, or to other machines, so the analysis does not affect the online servers. So for the analyzer, the solution is to perform lazy, offline analysis.
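Offline analysis means the scanning side only appends raw stacks to a log, and aggregation happens later, possibly on another machine. A toy sketch of that split; the semicolon-joined "folded stacks" format and the frame names are assumptions for illustration:

```ruby
require "tempfile"

# Scanner side: just append one line per sample, frames joined by ";".
log = Tempfile.new("samples")
raw_samples = [
  %w[main handle_request render],
  %w[main handle_request render],
  %w[main handle_request query_db],
]
raw_samples.each { |stack| log.puts(stack.join(";")) }
log.flush

# Analyzer side, run later and possibly elsewhere:
# count how often each full stack was seen.
counts = Hash.new(0)
File.foreach(log.path) { |line| counts[line.chomp] += 1 }

counts.each { |stack, n| puts "#{n}  #{stack}" }
```

A folded-stack count file like this is exactly what flame-graph tools consume, which is why deferring the aggregation loses nothing.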
00:07:12.880
The next component of the handler is the symbolizer, which converts memory addresses into human-readable symbols. The stack scanner can only gather the addresses of functions or methods, so to understand them we need to convert those addresses into method names and other details. But we know that any given Ruby application only has a limited set of methods. That means if we avoid converting the same method address to a symbol repeatedly, the work is manageable. So for the symbolizer, the solution is to cache the translation results.
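Caching the address-to-name translation can be a memoizing hash. A sketch, where `resolve` is a stand-in for the real, expensive symbol resolution:

```ruby
# Stand-in for the expensive address -> name resolution a symbolizer does.
LOOKUPS = []
def resolve(addr)
  LOOKUPS << addr                    # count how often the slow path runs
  "method_at_0x#{addr.to_s(16)}"
end

# Memoizing cache: each distinct address is resolved at most once.
symbols = Hash.new { |cache, addr| cache[addr] = resolve(addr) }

stack = [0x1000, 0x2000, 0x1000, 0x1000]
names = stack.map { |addr| symbols[addr] }

puts names.uniq.inspect
puts "slow lookups: #{LOOKUPS.size}"   # 2, not 4
```

Because an application's method set is bounded, the cache converges quickly and the steady-state symbolization cost drops to a hash lookup.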
00:07:58.639
After those two changes, our architecture looks like this: the stack scanner and symbolizer only gather data and log it into a file, and we perform the analysis later, when necessary. That is the offline analyzer. After those two changes the stack profiler should have much better performance. However, there is still one last element: the stack scanner. The stack scanner walks the stacks and collects frames regularly, maybe every 10 milliseconds or every 1 millisecond. A typical Rails stack has maybe 100 to 200 frames, and a modern CPU runs very fast: scanning a 200-element array does not take much time. For us that means that as long as we do not scan the stacks at an extremely high sampling rate, this part's work is light compared to the other two components. But the existing solutions are not ideal, because they take the GVL. When the stack scanner holds the GVL, our application cannot do its job: it blocks our application's execution, so it slows down our applications, and it limits the scanning frequency as well as the accuracy. So let us go to the next section: how to scan the stacks without the GVL.

Let us recheck our stack scanner.
00:09:29.519
The stack scanner reads the active frames and records those frames' information into a file or into memory, and at the same time the Ruby VM is updating them, pushing frames onto the stack or popping frames off it. More importantly, the frame is a Ruby internal structure, and we know the GVL is used to protect all Ruby objects. So it seems that we must take the GVL. But if we look closely, that is not entirely true. To see why, we need to check the duty of the GVL, or of a lock in general. Here is an example. There are two threads, and both share the same object, an instance of a linked list. Thread 1 wants to free the last element of the linked list, and thread 2, at the same time, wants to read it. Unfortunately, thread 2 first takes the tail element's address, then thread 1 frees that element, and then thread 2 reads it. Since the element has already been freed, this causes a segmentation fault. To solve this, people generally use a lock. The lock groups operations together and turns them into critical sections; the critical sections cannot overlap in time, so the issue cannot happen.
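In Ruby, that grouping is what `Mutex#synchronize` provides. A sketch of the scenario above, with the lock making the free and the read mutually exclusive; a plain array stands in for the linked list:

```ruby
list = [:a, :b, :c]          # stands in for the linked list
lock = Mutex.new

freer = Thread.new do
  lock.synchronize do        # critical section: remove the tail element
    list.pop
  end
end

reader = Thread.new do
  lock.synchronize do        # critical section: read the tail element
    list.last                # never observes a half-removed element
  end
end

freer.join
value = reader.value
puts value.inspect           # :b or :c, depending on which section ran first
```

The two critical sections may run in either order, but they can never interleave, which is exactly the guarantee the linked-list example needs.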
00:11:03.279
If we look at it closely, we can see that one job of the lock is to group instructions: it takes several instructions and makes them a single unit, something atomic. But here is my question: what if we only need one single instruction? If the instruction itself is guaranteed to be atomic, do we still need the lock? The answer is no, we do not. For example, with a simple counter we can read and write it without any lock. A program may see a stale value, but never corrupted data. Say CPU 1 has already updated the value to some number, and then CPU 2 reads it: CPU 2 may read an old value, but it will never read an invalid or corrupted value, some mixture of the two numbers. So for a counter, the lock is not necessary. That is because 64-bit aligned memory reads and writes are atomic: the read or write happens fully or not at all, with no intermediate state. This property is guaranteed by the hardware.
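The stale-but-never-corrupt behavior can be illustrated in Ruby, with a caveat: in MRI the GVL already serializes these accesses, so the hardware claim about aligned 64-bit loads and stores is really a statement about native code. The sketch only shows the observable distinction between stale and corrupted values:

```ruby
counter = 0
seen = []

writer = Thread.new do
  1_000.times { counter += 1 }   # unsynchronized writes
end

reader = Thread.new do
  100.times { seen << counter }  # unsynchronized reads
end

[writer, reader].each(&:join)

# Every read is some value that was actually written at some point,
# possibly stale, never a corrupted mixture of two writes.
puts seen.minmax.inspect
```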
00:12:23.519
Knowing this, we can check the SDB stack scanner. The stack scanner actually depends on only one single field, iseq, which holds the address of the instruction-sequence structure. You can think of this structure as the internal representation of a Ruby method, to keep it simple. And the iseq field is a 64-bit aligned memory location, which means we can read it safely without any lock, without the GVL. That is the foundation of why we can read the stack without the GVL. But this only guarantees that if a frame is alive, we can read it. We also need to guarantee that we only read live threads. We know Ruby uses a stack to store those frames, and the stack is a contiguous array. This array is allocated when Ruby creates the thread and is freed after the thread finishes its work. So while a thread is alive, its frames array must be valid, and it is safe to read.
00:13:33.320
The implementation is quite simple: we drop some code around the thread's lifecycle, to synchronize with the Ruby VM. When a thread starts, it tells the stack scanner: there is a new thread, you can start doing your work. And after the thread finishes its work, it tells the stack scanner: this thread is about to die, you should not scan it anymore.
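That handshake can be sketched as a registry that threads join on start and leave before exiting, with the scanner iterating only over registered threads. The names here are illustrative, not SDB's API; SDB does this in native code against VM internals:

```ruby
require "set"

# Registry of threads the scanner is allowed to scan.
REGISTRY = Set.new
REGISTRY_LOCK = Mutex.new  # registry updates are rare, so a lock is fine here

def with_scanning_registered
  REGISTRY_LOCK.synchronize { REGISTRY << Thread.current }      # "new thread, scan me"
  yield
ensure
  REGISTRY_LOCK.synchronize { REGISTRY.delete(Thread.current) } # "dying, stop scanning"
end

# The scanner only ever looks at registered (hence live) threads.
def scan_once
  REGISTRY_LOCK.synchronize { REGISTRY.map { |t| [t, t.alive?] } }
end

t = Thread.new { with_scanning_registered { sleep 0.05 } }
sleep 0.01
mid_run = scan_once          # thread is registered while running
t.join
after = scan_once            # and gone after it finished

puts "during: #{mid_run.size}, after: #{after.size}"
```

The `ensure` block is the important part: unregistration happens before the thread's stack goes away, so the scanner can never touch a freed frames array.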
00:14:02.240
After this, we still need to consider how the stack scanner works with the Ruby GC. Because the stack scanner scans Ruby memory, we need to know how Ruby manages its memory, how its GC works. One duty of the GC is to free inaccessible objects: for example, after a web request, all its local objects are no longer useful, so the GC releases them. But the SDB stack scanner only scans live threads, so it is safe here. Another duty of the GC is memory compaction. We know Ruby can enable auto-compaction, and with this feature it moves objects to different locations. Since SDB needs to scan Ruby memory, if that memory has been moved somewhere else, scanning it would cause errors. The good news is that the Ruby VM does not move all objects: some objects stay where they were allocated and are never moved. The SDB stack scanner only depends on such objects, namely the execution context and its stacks. So that is why it is safe.
00:15:22.720
But if you do want to read movable objects without the GVL, there are several options: you can read only the immutable ones, or avoid reading during any GC phase, or insert some locks for them. To summarize: SDB only reads the iseq field; it synchronizes thread state with the Ruby VM, so it never reads any dead thread; and since SDB only depends on pinned objects, it works well with GC compaction. That is why SDB can scan the stacks without the GVL safely. Okay, now that we know the implementation, we need to discuss the trade-offs around thread safety.
00:16:12.320
If you look closely, you will see that the Ruby VM updates the stack while SDB reads it without a lock. That means data races can happen: it has a thread-safety issue, and I have not solved it. You can see that several similar tools have this issue. For example, rbspy and py-spy both offer a non-blocking mode, and in that mode they have the same issue; they are not strictly safe in that mode. Or you can look at Java's async-profiler, which uses some functions that do not guarantee atomicity, so it has the same issue too. So I believe that in practice this data race does not have to be solved.
00:16:57.440
That is because, for a stack profiler, staying out of the application's way matters far more than strict correctness. Consider when the stack profiler can yield a wrong result: only when a function is extremely fast. But our stack scanner is for detecting high-latency functions, so those extremely fast functions do not matter to our interest. That is why I think low overhead is far more important than strict correctness here. This trade-off appears in many areas, for example eventual consistency in databases, or RCU in operating systems. However, if you still want to solve it, there are solutions. For example IODB, a paper published at OSDI 2024, uses a generation number for concurrency control, so the race is still solvable.
00:17:54.960
After we have all of this information and know how it works, we need to run some tests to see whether it works as expected. So I did several evaluations, focused on two things. One is the overhead: I measured SDB against a GVL-based solution, and I ran several stack profilers on a real application to see the overhead of those tools. Additionally, I sampled one single request to see how much information SDB can gather. The first evaluation was on an AWS machine. I used a simple script doing some calculation, and compared SDB, the no-GVL solution, with a GVL-based solution. I ran them at different sampling intervals to see how the sampling interval affects the execution time, that is, how much they impact performance.
00:18:56.679
For the GVL-based side I did not use any existing solution, because the existing solutions do too much work, making it hard to isolate the impact of the GVL. So I wrote a simple one that does only the bare minimum, for the best possible performance. Here are the results. The x-axis is the sampling interval, ranging from 1 millisecond down to 1 microsecond, and the y-axis is the execution time; larger means the stack profiler impacts the script more. The blue line represents the GVL-based solution. You can see clearly that as we decrease the sampling interval, the execution time increases sharply. That demonstrates how the GVL hurts performance in a stack profiler. On the other side, for the no-GVL solution, even when we decrease the sampling interval, the execution time does not change, which means it does not impact our application. That is the first evaluation.
00:20:17.600
After that, I did an evaluation on a real application, because many Ruby applications use a real web framework, so it represents production usage. I picked Homeland, the most famous Ruby forum software in China. It is written by professional programmers, so I think this application is quite representative. I ran it on one machine and used a different machine as the stress sender; here is the setup. I picked a sampling interval of 1 millisecond. The reason is that Homeland's latency is larger than 30 milliseconds, so a 1 millisecond sampling interval is quite enough for it. The measurement method is simple: first I ran Homeland without any stack profiler to get its baseline latency and CPU usage. Then I ran it with one single stack profiler and measured the CPU usage and the average request latency again. The overhead of that stack profiler is the difference between those two tests. That is the method.
00:21:40.880
Here are the details of those stack profilers; I will not read them out. And here are the results. There are two groups of bars. The first group is the average latency increase: after we enabled the stack profiler, how much the average latency went up. The second group is the CPU usage: how much CPU each stack profiler used. The green bars belong to SDB: it only increases average latency by 0.5% and uses less than 3% CPU. I think that is quite good for most applications, even for some large-scale Ruby servers. The blue bars represent rbspy: it does not increase our latency at all, but it uses a lot of CPU. I guess that may be because it needs to read memory from another process, and it does a lot of work inside its profiler: when we enable rbspy it calculates the average latency dynamically, and that could cost a lot. The next bars belong to Vernier 1.0. It does not use much CPU, but it increases our average latency by 10%. I think that is not good for a default, always-on solution: if we enable a stack profiler and it increases latency by 10%, we will not use it that way. To be clear, Vernier has several newer versions after 1.0 that I have not tested, for various reasons, and I think the newest version of Vernier could perform much better than in this test. And I must say that both rbspy and Vernier are quite good, useful stack profilers. If you just want to enable a profiler occasionally, a 10% CPU usage or a 10% average latency increase does not matter that much. And for rbspy, its CPU usage here is high because we reduced the sampling interval from 10 milliseconds to 1 millisecond; at the default interval its CPU usage is reasonable. However, SDB targets being enabled by default, enabled always, so it must keep the impact as low as possible and use little CPU and other resources. That is the difference.
00:24:23.840
Here are the samples of one request. You can see SDB collects a lot of information, a lot of methods. Actually the whole picture it collects is very large, too large to show, so I cut out maybe 10% of the whole image. That is the result, and I think it is quite enough for most cases.
00:24:53.039
The last question I would like to answer is: does the effort pay off? Engineering is never as simple as finding a better solution; it is about the payoff, whether it is worth it or not. Here are the facts about SDB. Its stack scanner and symbolizer do not have many lines of code, less than 1,000 lines, so the implementation's code size is quite small. But to write it, we need to understand several things under the hood: the GVL, the GC, the iseq, and a little bit of low-level concurrency. After that, we need to make the design and the trade-offs. So, to be honest, the no-GVL approach does introduce additional effort. Different people have different standards for whether this effort is worth it, so I cannot answer that directly. But for me, this journey of developing this stack profiler has been quite interesting. Even though it is challenging to learn a large, complex system, the Ruby VM, and much harder still to debug inside it, I learned a lot, so it was interesting and memorable.
00:26:11.360
As for the current state, I believe SDB is still experimental. It is not production ready, because it only supports one Ruby version; I will support more in the future. Another thing is that I want to rewrite the symbolizer without eBPF. Even though eBPF is a common solution for observability, and several libraries use it, for example the Golang OpenTelemetry implementation, I think it is an unnecessary dependency here, so I plan to remove it. Additionally, I would like to integrate more Ruby internal events, which can help us detect tail latency more easily. That is my plan.
00:27:00.159
This part is the related work. If you are interested in this field you can check these projects, but due to time constraints I will not cover the details. I would like to point out one thing, though. People have devoted quite a lot of effort to making performance debugging easier, whether by reducing the effort of manual instrumentation or the effort of root-cause analysis. Several previous works have shown that if we collect as much information as we can, we can make root-cause analysis very simple, but that also introduces a lot of overhead. SDB wants to explore that direction, but it aims for a balance: collect as much information as we can while keeping the resource usage as low as possible, so that it stays practical and can really be used in industry projects.
00:28:02.360
Okay, to conclude: a stack profiler can release the GVL, and I think that is beneficial. SDB provides a high sampling rate with low impact on performance and CPU usage, and because of those things I believe SDB has the potential to be a default observability solution. That is my talk, thank you very much. The project is available here. Feel free to ask questions; I hope I have described it clearly.
00:28:48.360
Oh, feel free to reach out to me if you are interested in this project. Does anyone have a question? Okay, if there are no questions, I have one little thing: I have brought some synthetic rubies. They are true rubies, just lab-made ones, as small gifts. I think for Ruby programmers this gift is quite interesting, so feel free to come find me and I will offer you one, though I have not brought many. That is all, thank you.