Keynote: Performance Bugs and Low-level Ruby Observability APIs


Summarized using AI


Ivo Anjo • April 17, 2025 • Matsuyama, Ehime, Japan • Keynote

The keynote presentation, titled "Performance Bugs and Low-level Ruby Observability APIs" by Ivo Anjo at RubyKaigi 2025, focuses on enhancing the understanding of Ruby's performance through effective observability APIs. The speaker introduces the concept of low-level observability APIs that delve into the C-level internals of the Ruby virtual machine, allowing developers to debug performance issues more effectively. Key topics discussed include the following:

  • Introduction to the Speaker and Context: Ivo Anjo, a senior software engineer at Datadog, highlights the importance of monitoring applications and emphasizes the potential of Ruby’s observability tools for performance analysis.

  • Definition of Low-level Observability APIs: These APIs provide insights into application behavior and performance issues that are not commonly visible through standard debugging tools. They can help detect problems like N+1 queries, thread latencies, and garbage collection delays.

  • Utilization of TracePoint and Debug APIs: Anjo explains the usage of trace points to capture significant events in Ruby applications, including how to track object allocations and garbage collection timing.

    • He advocates that understanding the performance bottlenecks can help developers debunk the myth that Ruby is inherently slow. Instead, it often arises from inefficient queries or misconfigured services.
  • Examples of Tools and Code Implementations: Throughout the presentation, Anjo presents examples of how to create tools using low-level APIs. He introduces a new gem called 'low-level toolkit' that provides small utilities for investigating Ruby applications.

  • The Garbage Collector (GC) APIs: An explanation is given on how to track garbage collection events and manage the identifiers for different objects being collected, which can enhance profiling accuracy.

  • GVL Instrumentation: The presentation also covers the Global VM Lock (GVL) and its impact on performance. Anjo demonstrates how to track thread activity related to the GVL, helping to identify where Ruby threads may be waiting unnecessarily.

  • Conclusion on Performance Bugs: Anjo concludes by addressing whether performance issues should be considered bugs. He mentions the impact of performance on user experience and how observability tools can help identify and quantify these concerns. Use of observability tools is emphasized to manage and mitigate these performance bugs effectively.

In summary, the presentation invites developers to explore Ruby's low-level APIs to improve their understanding of application performance, urging them to embrace the tools available to address observability and profiling challenges effectively.


Date: April 17, 2025
Published: May 27, 2025

RubyKaigi 2025

00:00:02.639 hello
00:00:11.160 again Okay let's get started So uh welcome to performance bugs and
00:00:16.960 low-level Ruby observability APIs And I would like to start with good morning Matsuyama
00:00:28.560 It's super exciting and uh I feel super honored to be here to open the second day of RubyKaigi Still so
00:00:35.680 many talks to go So let's get started So who am I to be talking to you about this
00:00:41.200 today my name is Ivo Anjo I am originally from Portugal and I'm a senior software engineer at Datadog I work on the Datadog
00:00:49.280 Ruby continuous profiler and the reason I started doing that is that I really enjoy kind of making tools that
00:00:56.719 allow us to look at our applications in many different and new ways and see what we can find out what we can learn when
00:01:02.879 we're doing that If you haven't heard about my employer before Datadog uh we
00:01:08.159 do modern monitoring and security as a service And specifically for Ruby applications that means that we have
00:01:14.880 profiling distributed tracing test optimization security errors logs Here's
00:01:20.799 a pretty screenshot showing a bunch of metrics and a bunch of information about different endpoints on the flow between
00:01:27.680 uh microservices Uh here's a pretty screenshot of the profiler the product that I work on And this is a flame graph
00:01:34.720 showing where the performance is going for this application And this is another screenshot of the profiler where I'm
00:01:40.720 showing the timeline view where basically we have a list of threads and you can see what every single thread is
00:01:45.840 doing at any point in time So with that said thanks to my employer for sponsoring this trip Let's talk about
00:01:52.720 low-level Ruby observability APIs And what do I mean when I say low-level Ruby
00:01:58.560 observability APIs by low level I mean APIs that are kind of close to the C
00:02:04.799 Ruby VM implementation to details that like usually you don't need to know about and it's even better to ignore
00:02:11.520 them But in some in some cases when you're debugging hard issues or where you're trying to find out what exactly
00:02:17.760 happened in some situation it's a good thing to know a bit more about these things Uh this means that JRuby and
00:02:24.720 TruffleRuby will not kind of be included in this talk but they do offer some amazing tools and I'm super happy
00:02:30.640 to talk about them at any point in time And well observability APIs which is the
00:02:37.120 idea is that they allow you to X-ray what and how your application is running what's it doing exactly and why is it
00:02:44.239 doing it okay so but why um I like to
00:02:49.840 think of uh the how would you know problem which goes a bit like okay if
00:02:55.599 your tests are passing and the app returns the uh the the right result but
00:03:01.760 it takes 5 seconds to return the result Um how would you know if uh maybe you
00:03:08.080 have an n plus one query problem you're going way more often to the database than you think you're going accidentally
00:03:15.040 Or maybe your service is like perfect but it's slow because a different service is slow It just keeps waiting
00:03:22.000 for the other service and your service has done nothing wrong It's like it's perfect Um no bugs uh if there's how
00:03:30.400 would you know if there's like some background thread that's uh using up a bunch of CPU and uh basically adding uh
00:03:37.840 latency to your requests uh how would you know if the Ruby garbage collector is having to collect a lot of uh objects
00:03:44.720 and so it's kind of adding a bunch of latency to your application and well yeah the idea is that you can use Ruby's
00:03:51.040 observability APIs to understand what's going on or more commonly like tools that you build using these APIs which is
00:03:57.680 the topic of this talk you can also use a lot of puts debugging and uh shout out to
00:04:02.799 Aaron we are all like big puts debuggers it's like the best debugger uh when you have nothing else but sometimes
00:04:10.720 some of these uh observability APIs give you a bit more Okay so um observability
00:04:17.759 also helps with the problem if your tests are passing and the app returns the right result in staging but breaks
00:04:24.000 in production So it can give you a lot of information on a lot of different things There's also a related problem
00:04:31.120 that I like to think about which is the Ruby is slow and like I need more air quotes Lots of air quotes Ruby is slow
00:04:37.440 problem which is like oh uh you build something and it doesn't go as fast as
00:04:42.720 possible and you're like okay obviously this is not going faster because Ruby is slow it's not that the service
00:04:49.280 maybe is doing N+1 queries and you didn't figure that out or that like there's some connection pool that was
00:04:55.520 not configured correctly and like a bunch of threads are waiting on it and or the service is using too many threads
00:05:02.240 and you need to kind of uh configure it a bit differently and use a bit more Ruby processes etc So uh like Ruby is
00:05:10.240 not slow It's just we should learn why our applications are taking time and if we learn about that we can actually fix
00:05:16.800 them and uh Ruby is really fast Um you might like have thought about okay
00:05:22.639 there's a lot of existing tools uh the Datadog continuous profiler stackprof Vernier GVL tools byebug etc And the
00:05:31.520 really cool thing about these tools is that all of them are based on the APIs that we're going to be discussing today
00:05:38.560 And the thing I find really interesting is that um the different combinations of
00:05:44.000 the APIs that we're going to talk about today create very different tools So you
00:05:49.039 use this API and this API and you get a debugger But if you use this API and this API you get the profiler And like
00:05:55.520 it's kind of very interesting to see that all of these tools are very close to each other and they are all accessing
00:06:02.880 very similar kinds of information And so it's very cool to see uh the underpinnings what's beneath these tools
00:06:09.360 how are they doing their magic and um I also want to see new and interesting
00:06:14.800 combinations Maybe people have tried this this API and this API but no no one tried this one and this one So I really
00:06:21.360 want to see what other things we could be building what other ways we could be looking at our applications and as well
00:06:28.400 like maybe there are things that can be added maybe there are things that can be improved maybe there are things that right now we can't see in Ruby but we
00:06:34.720 would like to So let's get that conversation started and let's improve those APIs Okay with this said let's look at
00:06:42.800 what Ruby offers and what problems we can investigate with it So where the magic happens today
00:06:49.680 we'll be focusing at like kind of two files uh in Ruby this file uh
00:06:55.960 debug.h that offers the trace point APIs it offers the postpone job API the frame
00:07:03.039 profiling APIs the debug inspector APIs and then there's another file that
00:07:08.240 I want to kind of steal uh a couple more points from which is the thread.h file which offers the GVL instrumentation API
00:07:16.479 and the thread specific storage APIs So yes uh I will be showing some C
00:07:23.360 code today I'm sorry in advance and um but to make it easier I have kind of
00:07:30.000 created a new gem called low-level toolkit And I will be using this gem to uh give you an example of how to use
00:07:36.800 these APIs And the whole idea of this gem is to be two things at once It's
00:07:41.919 going to I'm hoping to make it a repository of very um a lot of very
00:07:47.919 small tools that you can use to investigate applications but it's also a living example of how you can use this
00:07:53.360 API So if you want to use them you want to experiment them instead of having to uh start from scratch and figure out how
00:07:59.599 you build like some C code and how you do that you can just like clone this uh um repository and start using it uh and
00:08:06.720 playing with it Okay so let's talk about the tracepoint APIs And the first thing you might start
00:08:14.000 thinking when I say trace point APIs with wait doesn't Ruby already have the
00:08:19.280 trace point class like it's right there in the documentation like a really great documentation a really amazing
00:08:25.879 tool And yes that's the answer Uh you can use the trace point
00:08:31.919 class uh with all of these events You can use it to figure out when a method gets called when a method
00:08:38.000 returns when a class gets defined uh when an exception gets raised etc from
00:08:43.200 Ruby code But there is a bit more about trace points is that there are a few
00:08:48.880 more special events that are the internal trace point events and you can
00:08:54.160 really only use these internal events from C code So you cannot uh use these
00:08:59.440 events using the regular Ruby trace point class in your code Um obviously
00:09:05.120 the C code can still use uh all other events So uh basically C code has a bit
00:09:11.120 more of an advantage here because it can use the other events and it can use this events and the the thing about these
00:09:18.640 internal events is that they are separate because they are emitted at very special like situations where uh
00:09:25.600 it's not possible to access many Ruby APIs and I'm guessing this is why they are not wired up to the regular like
00:09:32.480 trace point Ruby class is because it would be really hard to to do that So let's look at a few of these events and
00:09:39.360 what cool things we can do with them So the first one is like let's look at
00:09:44.480 investigation uh object allocation the new object event Uh it can be used to
00:09:49.760 answer questions such as like are my uh applications web requests delayed by GC
00:09:55.200 because you can use it to look at like uh objects that are being created and because if you are creating too many
00:10:02.080 objects this gets expensive for two reasons One is that creating many objects uses up CPU You're doing a lot
00:10:09.040 of work to create those objects And then a bit a bit of um a bit of time after
00:10:14.959 that is that Ruby will have to clean up all of those objects as they become garbage So when you create a lot of
00:10:20.480 objects you kind of add overhead in two places in creating the objects and in uh
00:10:25.680 collecting the all of those objects with the garbage collection So let's build like a tool to investigate object
00:10:32.079 allocation using uh the new object event specifically this tool the track objects
00:10:38.320 created tool and the track objects created tool basically uh executes a
00:10:43.360 block and then at the end returns all of the objects that were created inside the block and you'll see here that uh object
00:10:50.560 new and hello world are directly created like inside this block but the time object actually gets created inside the
00:10:57.279 UTC uh method So this is basically telling me all of the objects that got
00:11:02.640 created even in other methods that are getting called from this code So how can we build this uh very simple uh tool
00:11:10.079 with the track objects uh with the new object event Okay Uh this is where the C
00:11:15.600 code comes in So we create like this uh track objects created function in C and
00:11:21.040 we create this is how you create a trace point from C code So you see RB trace point new It's the equivalent of what
00:11:26.800 you would do in Ruby But now you can pass the special event which is the a new object event which is one of those
00:11:32.800 events that are not available to Ruby code And then we pass a function on new object event which is the function that
00:11:39.360 Ruby is going to call when a new object gets created And then there's a bit more boiler plate We have to call a function
00:11:45.360 to enable the trace point Then we call RB yield to yield back to Ruby and run
00:11:50.880 the block And then we when Ruby returns we disable the trace point and we have
00:11:56.000 to do that filter hidden objects that I will explain in a bit So the next part
00:12:01.760 is the on new object event So when a new object is created Ruby calls this uh function and what this function does is
00:12:08.880 it gets the object that was created and if the object is not an internal object then it it adds it to an array as simple
00:12:16.320 as this uh I will explain a bit what uh the internal objects are in this case
00:12:21.440 Okay so this is one So we are basically adding the all of the objects that get created into our array and then uh the
00:12:28.480 filter hidden objects is like an extra step that we do at the end where we again go through like this is just a
00:12:34.240 loop So we're going through every object that we collected and we are checking if the object still has a class set and if
00:12:41.120 the object doesn't have a class set then we kind of replace that with a nil And this is because uh the objects that
00:12:47.600 don't have the class set are hidden And I will explain a bit what this means Um and that's it Like this is all we needed
00:12:54.720 to implement track objects created And we can now investigate uh in like a given like piece of code uh what are all
00:13:03.120 the objects that got created and what are those objects and uh an interesting thing that you see is that because we
00:13:10.000 are collecting all of the objects that got created whereas you would expect like object new and time and etc to be
00:13:16.160 garbage collected Uh in this case we're actually holding references to them so they don't get garbage collected Um so
00:13:22.639 we can look at them later Okay things to keep in mind when you are
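For reference, a rough Ruby-level approximation of this track objects created idea is possible without the C-level NEWOBJ event, using `ObjectSpace.trace_object_allocations` from the stdlib `objspace` extension. This is a hedged sketch, not the low-level toolkit implementation itself:

```ruby
require "objspace"

# Record objects allocated inside a block, plus where they were allocated.
# trace_object_allocations enables allocation tracing just for the block;
# the recorded source locations remain queryable afterwards.
created = []

ObjectSpace.trace_object_allocations do
  obj = Object.new
  str = "hello world"
  created << obj << str
end

created.each do |o|
  file = ObjectSpace.allocation_sourcefile(o)
  line = ObjectSpace.allocation_sourceline(o)
  puts "#{o.class} allocated at #{file}:#{line}"
end
```

Unlike the C-level event, this only sees objects the VM is willing to expose, so the internal/hidden object filtering described above is not needed here.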
00:13:28.240 uh using this kind of event The uh one is that the objects are empty when the
00:13:33.680 function gets called So you when Ruby calls your uh your function Ruby gives you the object But you if you try to
00:13:41.680 look at the object you will see nothing because Ruby calls you just after allocating the object So before initialize
00:13:48.240 got called So it means that if you'd like to say oh I wanted to say to see what the value of this variable or
00:13:54.240 whatever was in the the object you can't because the object is still empty Uh furthermore you cannot call into most
00:14:01.120 Ruby APIs inside this callback So um so even like you might want to to try to do
00:14:07.760 a few things but you need to be careful because most Ruby APIs you cannot call them uh in this situation and if you do
00:14:14.800 it leads to very confusing crashes and well ask me how I know that it leads to very confusing crashes Um and yes be
00:14:23.199 careful about internal and hidden objects which is why you saw like two different filters You saw me filter if
00:14:29.120 an object was internal and you saw me filter if an object was hidden and those objects I am not returning back to Ruby
00:14:34.720 code So let's talk about that internal and hidden objects Did you know that
00:14:41.519 Ruby actually itself the Ruby VM uses objects to keep like internal details
00:14:48.320 for itself and this can be uh caches this can be method data such as like the
00:14:54.480 information on the method or even like the bite codes uh included modules etc So Ruby has like special objects that it
00:15:02.240 uses to keep its own things Uh but these are regular objects that the garbage collector also uh kind
00:15:09.560 of controls and scans and whatnot And even you can see them sometimes like for instance if you call
00:15:16.320 ObjectSpace.count_objects you'll see like a breakdown of different kinds of objects that Ruby's tracking and at the bottom
00:15:22.399 you'll see T_IMEMO and T_ICLASS and these are like some of these uh internal
00:15:27.440 objects that you don't usually see and the like Ruby usually like uh the reason
00:15:34.079 why you you might not even know that this object exists is that all Ruby APIs
00:15:39.360 hide these objects from you So you're never supposed to get an instance of these objects like going back to your
00:15:45.279 Ruby code And that's why uh we were kind of like uh removing those objects the
00:15:50.320 internal and the hidden objects from uh our tracking because uh we shouldn't
00:15:55.360 return them back to Ruby code because sometimes that causes crashes and issues because again they're not supposed to go
00:16:01.199 back to Ruby code but they exist Um and these low-level APIs show them So this
00:16:06.480 is kind of a pattern with the low-level APIs which is well when you're very close to what Ruby is doing sometimes
00:16:12.959 you're seeing some of the magic under the hood and sometimes you need to collaborate with Ruby so that you don't
00:16:18.880 um break the rules but sometimes you can bend them Okay so let's look at a few
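These internal object types can at least be counted from plain Ruby, even though individual instances are never handed back to Ruby code. A small sketch using `ObjectSpace.count_objects`:

```ruby
# ObjectSpace.count_objects returns a per-type breakdown of heap objects,
# including the VM-internal types mentioned above:
#   :T_IMEMO  - VM-internal data (method entries, bytecode metadata, caches)
#   :T_ICLASS - a module as it appears in a class's ancestor chain
counts = ObjectSpace.count_objects

puts "T_IMEMO:  #{counts[:T_IMEMO]}"
puts "T_ICLASS: #{counts[:T_ICLASS]}"
```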
00:16:25.199 more events the garbage collector activity like the garbage collection events are really cool to answer
00:16:31.600 questions such as when is garbage collection happening and there's a bunch of events related to
00:16:37.839 like different uh garbage collection events when it starts when it finishes when different
00:16:44.880 stages of garbage collection finish and the uh this is obviously
00:16:50.800 useful if you're kind of trying to understand uh the activity of the garbage collector
00:16:56.720 in Ruby like why is the garbage collector uh kicking in now what's happening in my application and if
00:17:03.600 you've used GC.stat from Ruby code that gives you a lot of metrics you can kind of think about this as like
00:17:10.919 GC.stat++ so it gives you a few more things you can experiment and get a bit more information that GC.stat does not
00:17:17.199 give you so uh let me give you an example let's build this print GC timing
00:17:23.919 uh tool the print GC timing tool What what it does is well we create a new trace point and it will hook on the GC
00:17:31.440 enter and GC exit events So we're looking for when a GC starts and when a
00:17:37.360 GC finishes and then we ask Ruby to call our on_gc_event uh function And
00:17:43.760 uh in our on_gc_event function what we do is when we get called we need to figure out okay which event are we talking
00:17:50.720 about is it start or is it uh exit if it's start what we do is u or um enter
00:17:58.000 what we do is we get the current time So basically uh we have I've created this very tiny helper function that kind of
00:18:04.480 gives it's the equivalent of time dot now uh but what's the current time we
00:18:09.600 store this current time and then when the um when the garbage collection finishes we get the time again we
00:18:16.480 subtract the current time with the time of start and we can print like how much time the GC took So and in this case
00:18:23.600 we're printing this directly to uh the standard output So basically whenever the garbage collector runs
00:18:30.640 we'll get like a very simple print saying like okay the garbage collector ran and it took this amount of time And
00:18:36.240 and that's it This is all we needed to build the print GC timing uh tiny tool And here I'm uh explicitly
00:18:44.320 telling Ruby to do a GC both a minor and a major And then we see that we get the
00:18:50.080 output that we wanted uh in this case and you will also see that kind of as expected minor GCs
00:18:57.760 are much faster to execute and then a major GC actually takes longer
00:19:02.960 but still 2 milliseconds that's that's pretty fast but yeah ideally this is why
00:19:08.240 you want to avoid having many major GCs in your application and ideally you'd want to have Ruby just kind of do minor
00:19:15.280 GCs uh most of the time and um so Yes Uh so next slide So things to
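A pure-Ruby approximation of this print GC timing idea is possible with just `GC.stat` and a monotonic clock. Unlike the GC_ENTER/GC_EXIT trace point events, this can only time collections we trigger ourselves, so it is a sketch of the measurement, not the tool:

```ruby
# Time an explicitly-triggered major GC and confirm via GC.stat counters
# that major and minor collections actually ran.
before_major = GC.stat(:major_gc_count)
before_minor = GC.stat(:minor_gc_count)

start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
GC.start(full_mark: true, immediate_sweep: true)  # force a major GC
elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0

puts format("major GC took %.2f ms", elapsed_ms)

GC.start(full_mark: false, immediate_sweep: true) # force a minor GC
puts "major GCs: #{GC.stat(:major_gc_count) - before_major}"
puts "minor GCs: #{GC.stat(:minor_gc_count) - before_minor}"
```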
00:19:24.320 keep keep in mind when you're uh doing this work around the GC APIs Uh you
00:19:29.679 cannot allocate any objects which well if you think about it like uh somewhere
00:19:36.160 in the Ruby code uh you're trying to create an object and you're when you're going to create an object Ruby uh looks
00:19:42.320 at it and thinks oh there's no space I need to do a garbage collection And then Ruby starts doing a garbage
00:19:48.640 collection and then Ruby would call into your code and what would happen if you allocated an object it would
00:19:55.360 kind of start the cycle again like oh I need to allocate an object I need to do garbage collection it would not be very
00:20:00.960 good So you cannot do this and actually Ruby will immediately stop will kind of say nope nope you're doing this wrong Um
00:20:08.080 so this is very helpful but yes uh obviously it means you need to be a bit careful when you're writing this kind of
00:20:13.520 tool uh to not allocate any objects and also you cannot call into most Ruby APIs
00:20:18.559 inside this callback because again most Ruby APIs often need to create objects uh so kind of uh this would
00:20:26.799 happen Okay before we talk about the other APIs I will kind of suggest
00:20:32.720 that if you're doing things around trace point and performance you might want to look at the bottom of debug.h where
00:20:39.360 this event hook thing comes because uh hint hint the trace point in Ruby is
00:20:45.120 actually built on top of the low-level event hook API And uh
00:20:50.960 if you're trying to build a trace point that gets executed very often and you want to make it go faster this one is
00:20:56.400 kind of slightly less overhead or at least can be made to be slightly less overhead So you might want to take a
00:21:01.440 look at this undocumented advanced tracing APIs uh if uh it's something
00:21:06.559 that uh you're interested in Okay let's talk about the postpone job APIs uh
00:21:12.640 which is uh a very interesting API So the postpone job API kind of looks like
00:21:17.919 this There's a pre-register function and then a trigger function And the idea of
00:21:23.760 the postpone job API is to ask Ruby to run code safely or at a safe point And
00:21:30.720 if you haven't heard the term safe point uh well it's not surprising because Ruby
00:21:36.159 doesn't actually call it a safe point Uh the term safe point comes from like a
00:21:41.200 bunch of other languages uh it's used in Java Go .NET Truffle and it just means
00:21:47.280 like a moment in execution of the code where it's safe to perform certain operations So like when you're thinking
00:21:54.400 of the Ruby VM the Ruby VM sometimes it's kind of in the middle of doing a bunch of things and doing GC and doing
00:22:00.799 whatever and like it's not always safe to just like run Ruby code there And and this is kind of what we mean by a safe
00:22:07.520 point uh in the case of the Ruby VM again like uh it means uh that we can get called when it's safe to call into
00:22:14.480 Ruby code when the GVL is being held not in the middle of GC or allocations etc So
00:22:20.640 why would you want this thing uh well the uh the the thing is that like you've
00:22:25.760 kind of saw in the uh when I was talking about the trace points that I was saying oh yeah uh in low-level APIs it's very
00:22:33.840 common that I will say oh you cannot create any objects when you get called in these APIs you cannot uh you can't
00:22:40.159 call into methods written in Ruby code um you can't call most VM APIs and you
00:22:46.640 might not even have the global VM lock when Ruby kind of calls your low-level thing So the idea of the postpone job
00:22:53.760 APIs is to kind of create a bridge between those APIs and doing some of the things that you couldn't Um it's a way
00:23:00.080 of saying to Ruby hey Ruby when you have a moment like can you call this i know
00:23:06.320 that now you might be busy but like when you are a bit like chill call this and so it's a way of uh doing a lot more
00:23:13.039 things from the low-level APIs that you wouldn't uh otherwise be able to do And well let me give you an example So
00:23:20.240 before you saw me doing um a very simple tool that would print uh whenever a
00:23:25.760 garbage collection happened and you might not have noted that I did something on purpose I
00:23:33.919 did my printing using C code I told the C code to actually print directly because
00:23:38.960 if I try to use any Ruby code to do the printing it wouldn't work because I can't call Ruby code So what if we
00:23:46.400 wanted to do something like this uh we have the on_gc_finish and on_gc_finish gets a lambda and our lambda will
00:23:52.960 actually print when the GC finishes and uh we'll also print the uh kind of GC
00:23:58.400 and here again we're like running two different kinds of GC and we want this kind of to be the output how can we
00:24:05.120 build this uh using the postpone job APIs well uh what we'll do is we start
00:24:10.559 by re uh pre-registering like a callback our callback is called uh postpone in this situation so we kind of tell Ruby
00:24:18.080 Okay I want I at some point in the future I will want to call this and then
00:24:23.440 uh we build our ong finish where we save the uh the block or the lambda that the
00:24:31.600 user got uh called us with and then we create a new trace point with the GC
00:24:37.039 exit event and then when the uh exit event happens so when GC finishes
00:24:43.360 instead of like immediately printing what we do is we tell Ruby Okay we're ready Please trigger the postpone job
00:24:50.480 And then in the postpone job we are now able to call into Ruby code So we do this at the bottom which is just we take
00:24:56.799 the call back and uh we call call on it So we call uh call on the on the lambda
00:25:02.799 that we get and that's it This is all that we needed to build this like
00:25:08.000 variant of the uh of our tool that now can call into a Ruby block whenever GC
00:25:13.120 finishes So this is why the postpone job API is really interesting because we kind of can use it to bridge back to
00:25:19.360 Ruby code and to do things that otherwise would be hard to do Okay so uh things to keep in mind
00:25:26.799 when you use the postpone jobs API Uh this API actually got very uh uh it
00:25:33.039 became very different in Ruby 3.3 So it got redesigned and so if you want to use it on an older Ruby it's still there But
00:25:40.640 uh we uh basically what happened was uh we found out that there's like a a very
00:25:46.799 like difficult to solve design problem in the old version of the API and that's why the API got redesigned So if you
00:25:53.200 want to use it with older Rubies you still can but you need to take into account that there you might be uh
00:25:59.760 hit by this problem And uh the the other thing is that the when you call trigger
00:26:05.200 the trigger flag is thread local Uh which means that if you call it from a background thread it might not work as
00:26:11.840 you expect So you might want to experiment a bit and like read a bit into how to use that correctly The other
00:26:18.880 thing to keep in mind is that there is a limit of 32 different postpone jobs that can be registered and no more after that
00:26:26.480 There is no garbage collection There is no removing postpone jobs it's like you register 32 and that's it You need to
00:26:33.200 restart Ruby if you want to change your mind and actually the Ruby garbage collector already uses one So
00:26:40.400 actually we only get 31 So try not to use too many in your application because
00:26:46.000 you might want to use a gem that uses like one or two and then well you have a limit Um a final
00:26:53.360 thing to keep in mind is safe point bias and this is a term that also comes from Java and some of the other runtimes
00:27:00.480 which means that basically when you call trigger
00:27:05.919 usually Ruby is pretty fast in like running your code but it might still call into the next method or
00:27:11.919 return from a method before it actually calls uh the thing that you've asked Ruby to do So the next safe point might
00:27:18.320 actually be in a different method And this is often important for profilers or some tools that okay if you're looking
00:27:24.080 at the current method the current method might have changed by the time that the safe point gets
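This run-it-later-at-a-safe-moment pattern has a Ruby-visible cousin: `Signal.trap` handlers are also not run inside the raw OS signal handler, but deferred by the VM until the main thread reaches a point where running Ruby code is safe, much like a postponed job. A sketch (POSIX platforms only):

```ruby
# Demonstrate deferred execution: the trap handler is queued by the VM and
# run later on the main thread at an interrupt check (a safe point), not
# inside the low-level signal handler itself.
log = []

Signal.trap("USR1") do
  log << :handler_ran
end

Process.kill("USR1", Process.pid) # send ourselves the signal
sleep 0.2                         # give the main thread a chance to reach a safe point
puts log.inspect
```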
00:27:30.200 executed Okay let's go to the next one the frame profiling APIs And well
00:27:35.760 there's a bunch of them but basically uh the the the two ones at the top give you
00:27:41.200 uh the current back trace and then the ones at the bottom are just like the C functions to get the information from
00:27:47.279 the back trace And so uh this allows you to get the back trace both cheaply and
00:27:52.320 safely and answer questions such as well what methods are being called and where
00:27:57.600 So you will not be very surprised when I tell you that this is like the source of many like pretty uh profilers that like
00:28:04.799 use this to build like beautiful flame graphs of what's going on in your application And an exception I would say
00:28:11.520 if if you were here for the uh SDB profiler session yesterday Well SDB does
00:28:16.960 not use these APIs which is like the the big key point of SDB but like a lot of
00:28:22.399 the other Ruby profilers uh use these APIs and why they use them because the
00:28:28.720 these APIs are really useful because they give you a back trace without allocating new objects or calling into
00:28:34.720 Ruby code And again if you think back to the the things I was saying before where I say oh uh you cannot create new
00:28:41.360 objects you cannot call into Ruby code uh in some of the other APIs This API is
00:28:46.480 kind of built to kind of connect very easily with that because it doesn't need to uh create any objects and this is
00:28:52.799 also makes it very very cheap which is good for profilers because you don't want your profiler to make your
00:28:58.240 application even slower and it can even be used in a signal handler So this is
00:29:03.520 what's called an a sync signal safe function Although I think it's not officially documented as one yet but
00:29:10.799 that's kind of the intention and that's how everyone uh is using it most of the time And uh so yeah it's very similar to
00:29:19.360 getting back trace locations So if you've used like back trace locations it's basically the same thing but it's
00:29:25.279 very it's designed to be used with the other low-level APIs that we've looked at And it's cheaper than back trace
00:29:31.440 locations uh because well uh it's for profilers So let me give you an example
00:29:37.279 Let's try to build this thing: last_allocation_at. Inside this block,
00:29:43.200 when we call track_last_allocation_at, we can call a bunch of code, and here we're just calling into the hello method.
00:29:50.080 And when we call last_allocation_at, what we get is the stack trace of
00:29:56.559 where the last object got created. In this case we see that the object got created inside the hello method,
00:30:03.039 because it was that object that you see over there. So the idea is:
00:30:09.120 we want to investigate where objects are getting created. Okay, let's build this helper.
00:30:15.919 We start with our track_last_allocation_at that we saw being called from Ruby, and again we're using a trace
00:30:22.880 point, and now we're using the newobj event — so you can see how these things connect. When a new object gets
00:30:28.880 created, what we do is check if it's an internal object. If it's an internal object we don't want to touch it; ignore
00:30:35.919 it. If not, we create this temporary buffer. And this buffer is
00:30:40.960 created in C code, so it's not a Ruby allocation. Then we call rb_profile_frames to get the backtrace,
00:30:48.720 and then we copy the backtrace into a temporary Ruby array that I've created at the top,
00:30:57.120 so that we can access it later. Then, when you call last_allocation_at
00:31:03.679 to be able to ask the question — where was the last allocation at? — we
00:31:08.720 have to use a small trick and pause the recording of the information, because the one thing we want to do is
00:31:15.279 create a new array, and well, if we created a new array while still recording, then
00:31:20.320 the last object would be our new array. So we pause, and then what we do
00:31:25.440 is ask Ruby: okay, what's the file name, what's the
00:31:30.640 method name? We copy them into this new array, return this information, and resume recording
00:31:36.960 again. And that's it — that's all we need to build a helper such as this
00:31:42.399 last_allocation_at. And the really cool thing about this helper is that, as you saw, whenever a new object gets created we do
00:31:49.039 very, very little work. So keeping this enabled will not impact
00:31:54.480 your application a lot. This is why this API is useful, and in particular it will not create a lot
00:32:00.480 of objects, so it will not add more work to the Ruby garbage collector. Because otherwise, if you think about it, if
00:32:07.279 every time you create one Ruby object you create 10 more to record the information about creating
00:32:14.000 the Ruby object, then this becomes a problem, because suddenly you're creating 10 or 20
00:32:20.799 times more objects than you should. So that is why this is useful.
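You can sketch the same "where was this object allocated?" idea in pure Ruby with the stdlib objspace extension — heavier than the C helper in the talk, since it records metadata for every allocation, but it needs no C code. The helper name `allocation_site_of` is my own, not from the talk:

```ruby
require "objspace"

# Pure-Ruby cousin of last_allocation_at: run a block while allocation
# tracing is on, then ask where the object the block returned was created.
def allocation_site_of
  result = nil
  ObjectSpace.trace_object_allocations do
    obj = yield
    file = ObjectSpace.allocation_sourcefile(obj)
    line = ObjectSpace.allocation_sourceline(obj)
    result = [obj, "#{file}:#{line}"]
  end
  result
end

def hello
  "hello world" # the object whose allocation site we want to find
end

obj, site = allocation_site_of { hello }
puts site # file and line of the string literal inside hello
```

Unlike the rb_profile_frames version, this allocates Ruby objects of its own while tracking, so it is fine for investigations but too heavy to leave always-on in production.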
00:32:25.880 Okay, things to keep in mind when you use this kind of thing. If you store the values you get from Ruby in a C
00:32:33.440 data structure, remember that you need to tell the Ruby GC about it. This is a general thing for every
00:32:39.519 Ruby native C extension, but it's easy to forget:
00:32:45.200 you go "oh, I'll just add this here", forget to tell the Ruby garbage collector about it, and that will
00:32:51.600 create problems. Okay, let's talk about the debug inspector APIs. The
00:32:57.440 debug inspector APIs give you a nice way to navigate the currently
00:33:03.279 active backtrace and methods. If you're paying
00:33:08.960 attention, you might be thinking: wait, backtraces again? Wasn't the previous one the
00:33:15.760 whole thing about backtraces? Why do we need a second API for backtraces? The answer is that the debug inspector
00:33:22.960 is way more expensive than frame profiling. Again, this is the
00:33:28.559 reverse of what I was saying: frame profiling has no object allocations, it's very fast, and so on. This one
00:33:36.000 actually does object allocations, and it's kind of slow, and that's why
00:33:41.440 it's not very often used by profilers. It also needs to call into Ruby code,
00:33:46.720 so it has a bunch more restrictions. So why is it useful? Because it's built for a very different use case:
00:33:53.360 it's built for debuggers. And the key thing about the debug inspector is that it allows you to
00:33:59.919 go deeper than any other backtrace API in Ruby. Whereas for the previous
00:34:05.600 API I was saying it's the same thing as backtrace locations, the debug inspector is not the same as
00:34:12.399 backtrace locations — it's much more powerful. It allows you to access the objects that
00:34:17.520 methods are getting called on. It allows you to access the binding objects for those calls, which
00:34:24.639 means you can read and change arguments from those calls, read and change local variables, and so on. You can even
00:34:31.919 access the instruction sequences, and you can then ask Ruby to print the bytecode of a method
00:34:38.560 that's still here on the stack. So the debug inspector goes much deeper.
00:34:43.760 Let me give you an example. I'll jump directly to C code, and I will build two helpers: the who_called_me
00:34:50.639 helper and who_called_me_binding. The idea is that who_called_me, when you call it from Ruby
00:34:56.800 code, will give you the object that called you, and who_called_me_binding gives you the binding of the object that
00:35:03.359 called you. We do this by calling rb_debug_inspector_open, and then inside our
00:35:09.839 debug inspector callback, if we're asked for who_called_me
00:35:15.280 we use rb_debug_inspector_frame_self_get; if we're asked for the
00:35:20.800 binding we use rb_debug_inspector_frame_binding_get. And then there's the depth. The idea
00:35:26.079 of the depth is that, for the current call,
00:35:31.200 depth zero is us, one is our caller, and we want the caller of our
00:35:36.960 caller, so we need to ask for depth two — that's why the two is there. But this
00:35:42.320 is a bit abstract, so let me show you what cool things you can do with it. Imagine you have this very simple
00:35:50.320 Hello class. When you call say_hello, you pass it a "to" object, which
00:35:56.079 it will call hello! on. And there's a secret: the secret is an
00:36:02.560 argument, so it's just a local — nobody can see the secret. Then we create the Spy object. The Spy object
00:36:09.599 implements hello! (with the exclamation mark), and it uses who_called_me and
00:36:15.839 who_called_me_binding to actually access the secret and see what object called it
00:36:21.280 and what the value of the secret was. So when we create a new instance of Hello and tell it to say hello to the
00:36:28.400 Spy, and we give it the secret "trust no one", the Spy is actually able
00:36:34.720 to say: this is the object that called me, and this is that object's secret. And
00:36:40.160 you can see how otherwise this would never be something that you could do in Ruby.
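A location-only version of this idea works in plain Ruby: caller_locations can tell you *where* you were called from, though not *which object* or its binding — for those you need the debug inspector C API (or the debug_inspector / binding_of_caller gems the talk mentions next). The helper name `who_called_my_caller` is mine, just for illustration:

```ruby
# "Who called me", location edition: the caller's caller as a
# Thread::Backtrace::Location, mirroring the "depth 2" idea from the
# C example (depth 0 = us, 1 = caller, 2 = caller's caller).
def who_called_my_caller
  # caller_locations already excludes this helper's own frame;
  # start at 2 to also skip our direct caller.
  caller_locations(2, 1).first
end

def inner
  who_called_my_caller
end

def outer
  inner
end

location = outer
puts location.label # the method that called inner — its label contains "outer"
```

This gives you the file, line, and method label, but not the receiver or its locals — which is exactly the extra power the debug inspector adds.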
00:36:46.720 So, things to keep in mind when using the debug inspector: there are
00:36:52.560 already gems for this. If you look at the binding_of_caller or debug_inspector gems, the whole
00:36:59.920 thing those gems do is give you access to the debug inspector from Ruby code. So you don't
00:37:06.400 actually need to write C code to use the debug inspector. But this is a really powerful thing that you cannot do in any
00:37:13.280 other way in Ruby. Okay, let's go on to the last one: the GVL,
00:37:18.720 or global VM lock, instrumentation API. And wait — I've
00:37:24.800 done a few presentations in the past about the GVL, but if you skipped my
00:37:30.640 presentations and you're going "what's this GVL thing again?", here's the global VM lock explained in one slide.
00:37:38.079 It's similar to the Python GIL, the global interpreter lock — we just call it differently in Ruby. It's
00:37:45.839 something inside the Ruby VM that protects the VM state. It's actually
00:37:51.599 not global anymore: if you're using multiple Ractors you will have multiple
00:37:57.160 GVLs. But if you're not using Ractors in your application you will only have one, so you're back to kind of having one. And
00:38:05.359 it prevents Ruby code execution parallelism, which means that multiple Ruby threads cannot be executing Ruby
00:38:11.359 code in parallel at the same time. But waiting can happen in parallel. So if you're doing a bunch of calls to
00:38:18.000 the database and a bunch of network calls, this is fine — the GVL only prevents running multiple threads of Ruby code at
00:38:24.320 the same time. And this means that the GVL can impact your application latency if you have multiple threads
00:38:31.119 that need it to work at the same time. So this is the context of the GVL instrumentation API:
00:38:38.160 to be able to know more about the global VM lock. You basically get a
00:38:43.920 bunch of events that we can subscribe to, and there's an API to subscribe to these events. So it's kind of similar to
00:38:50.320 TracePoint, but it's not called TracePoint — it's slightly different. And so
00:38:55.440 this API allows you to track Ruby thread activity, so you can answer
00:39:00.560 questions such as: are my web requests being delayed by other requests, or by other threads in general? It allows you to
00:39:08.320 know when threads are running — which Ruby calls resumed — or waiting for
00:39:14.960 something to happen — which is called the suspended event — or wanting the
00:39:20.560 GVL — which means ready. At the top you see a screenshot from the gvl-tracing
00:39:26.800 gem, which is a gem that I created that gets this information from the GVL
00:39:32.400 instrumentation API and generates a visual timeline from it.
00:39:37.839 This is the kind of information that you get from this API — without this API, this gem could not exist. So
00:39:45.040 let's see what we can do when we use this API directly. Let's say, for instance, we have a
00:39:52.640 Ruby method that uses up a bunch of CPU by incrementing a counter in a loop. Then let's
00:40:00.160 build a track_once_gvl helper, in which we create two threads, each of them
00:40:06.960 running that loop, and then we join the threads. At the end, what we
00:40:12.480 want to know is: how much time did those threads spend waiting for the global VM lock? To do that, inside
00:40:20.079 track_once_gvl we call rb_internal_thread_add_event_hook, say that
00:40:25.760 we're interested in the started, ready, and resumed events, and ask Ruby to
00:40:30.880 call our on_thread_event function. Then, inside
00:40:37.760 on_thread_event, we get the information for the current thread. If the thread got started, we do some bookkeeping — and I will tell
00:40:44.400 you a secret: the way I did it in the gem right now is technically not completely safe, so I
00:40:52.560 need to fix that. So imagine that it is very safe right now. And
00:40:58.480 when there's a ready event, which means that the thread wants the
00:41:03.920 GVL, what we do is store the information in a thread-local
00:41:09.119 variable. There's this API, rb_internal_thread_specific_set, which
00:41:14.800 is very useful for getting thread-locals from C code. You
00:41:20.079 basically get the current time and record: okay, this thread said
00:41:25.200 that it wanted the GVL at this point in time. Then, when the
00:41:30.480 thread resumes — which means the thread finally gets the GVL and can run —
00:41:36.160 what we do is get the time at which the thread started waiting, and
00:41:41.359 also the total time the thread has spent waiting so far, add them up, and store all that information in
00:41:46.880 the thread-local. There's a bunch of boilerplate C code, and this is a problem with C code: if you did this in
00:41:53.280 Ruby it would be, I don't know, three lines of Ruby code, and it would be very beautiful. But in C
00:41:58.560 there's always more code. But yeah, basically I'm reading the two
00:42:03.920 thread-local variables and storing the result in the total
00:42:09.160 value, and that's all that's needed. So what we do is call
00:42:16.079 track_once_gvl, and at the end we get a list of threads and how much time those
00:42:21.520 threads spent waiting for the GVL. Here the result is in nanoseconds, but I'm
00:42:26.560 converting it to seconds because, well, it's much nicer to read. Let's print this, and here's the result. We can
00:42:33.440 actually see that these two threads each ran for
00:42:39.920 some time, but they both needed to wait 1.88 seconds for the GVL. So
00:42:46.480 basically, in total, this took 3.71 seconds, but both threads had to wait for
00:42:52.000 a bunch of that time because they were fighting over the GVL. Okay, so things to keep in mind when
00:42:58.800 you use this API: only the resumed event gets called with
00:43:05.280 the GVL being held. For all of the others, you cannot call Ruby APIs, so
00:43:10.480 the postponed job API is your friend. And this rb_internal_thread_specific API
00:43:16.480 is very useful for this kind of thing, where you want to record information inside the thread itself. I'm
00:43:23.280 glad this API got added, because before, doing this was very hard. Also, if
00:43:28.800 you want this kind of information as a metric, as I was showing in our example, the gvltools gem already offers
00:43:35.359 a lot of nice counters. But maybe there's a counter it doesn't offer yet and you want to experiment a bit — so you can
00:43:40.880 experiment, and maybe you can then open a PR on gvltools. Okay, so you saw a bunch
00:43:47.200 of things — a bunch of very small things — so you might be convinced by now that it's very easy to build very small
00:43:53.319 utilities. But let's put everything we've learned together and build an actual new profiler in the time
00:44:00.000 that we have left in this presentation. I can do that. So let's build a new
00:44:05.119 kind of Ruby profiler that I don't think anyone has built before: a release-GVL profiler. And what is a
00:44:12.400 release-GVL profiler? It's a profiler that answers the question: when is the GVL being
00:44:19.040 explicitly released in our application? The global VM lock gets explicitly released when Ruby is about
00:44:26.240 to do some network calls, or about to do some operation on files, or when you're going to the database using
00:44:32.880 a native database driver. So we're going to get information about this. Plus, when
00:44:38.000 native code wants to do a bunch of work, it may also release the GVL. So we want to see where all of these points are
00:44:44.880 happening in our application. And by knowing when the GVL gets released, we can see the impact of this: these are
00:44:51.359 also the places in our application where Ruby can switch away from the current thread. So if you have multiple
00:44:57.200 requests, you might go from the current request to another one, because, well, Ruby has released the GVL. And here's an
00:45:04.640 example of some code that we want to profile with our release-GVL profiler. This code runs the counter loop
00:45:11.280 again, then does a network call to get the Google front page, reads a very big file, lists
00:45:18.079 which files are inside a directory with Dir, and
00:45:23.599 does a bit of compression of the big file that we read. And this is the result:
00:45:28.640 a beautiful flame graph that I've generated using our new release-GVL profiler. You can
00:45:35.440 see there, on the left, the multiple places where Ruby is
00:45:41.760 doing work to read the Google homepage — multiple
00:45:46.880 stacks and multiple things Ruby needs to do for that. We can also see where Ruby is calling IO read
00:45:53.280 to read from the file, and where Ruby is compressing
00:45:58.800 the file and releases the GVL. And at the top,
00:46:04.480 the width of the flame graph represents time: how much time our very simple
00:46:09.520 application spent with the GVL released. What you might not see is
00:46:15.760 the counter loop — it does not show up there, because the counter loop never releases the GVL. Ruby might take the GVL away
00:46:22.240 from that thread, but the code never explicitly releases it. And
00:46:28.480 you might also not have spotted the listing of the directory, which is because listing directories is a
00:46:34.560 very fast operation. But if we switch our profiler from time to count — how many
00:46:40.319 times the GVL got released — it shows up very quickly:
00:46:45.640 the Dir call released the GVL 180 times. And I checked — 180 is the number of files I had in
00:46:52.400 that directory. So basically Ruby will release the GVL once for every file that's in that directory.
00:46:58.240 Okay, let's build our release-GVL profiler. Oh — before we do that, I will
00:47:04.800 say it's often surprisingly easy to generate the output in a format that can be visualized using existing UI tools.
00:47:12.400 In the example you saw, I'm using the speedscope.app website, and it's very easy to
00:47:19.920 generate output for these kinds of tools: speedscope, Perfetto, Firefox Profiler, sometimes even Excel. So, pro tip:
00:47:27.520 sometimes you're looking at the data and you just have a big text file with a bunch of output. Try to see if
00:47:33.680 you can generate it in a format that can be loaded into a tool, because it's often very useful to look at that data in a UI
00:47:41.200 tool. Okay, so part number one: when you call release_gvl_profiler, we add the
00:47:47.359 internal event hook — we're using the GVL instrumentation API — and we want to hear about suspended and resumed threads.
00:47:54.240 Then we have the usual boilerplate: we yield back to Ruby code, and at the end we stop the event hook and
00:48:01.520 do some work to write the output in the format that speedscope needs. Okay, second part: when there's an
00:48:09.520 event, we look for the suspended event and use that magic
00:48:15.520 ruby_thread_has_gvl_p function. I hadn't talked about that one before, but basically it tells
00:48:20.720 you whether the thread explicitly released the GVL or not. So this is how we
00:48:25.920 can tell when a thread actually said: oh, I want to release the GVL, because I want to do something without
00:48:31.280 the GVL. And what do we do? Again, we just record the current timestamp so
00:48:36.400 we can use it later. Then, when the thread is able to resume again, we
00:48:42.960 read the time at which it released the GVL, and then we get the
00:48:49.680 current backtrace. And here I'm not using the rb_profile_frames API, and I'm
00:48:55.680 not using the debug inspector API — I'm using rb_make_backtrace. Why?
00:49:01.280 Because it's shorter, so it fits in the slides, in this case. But again, if you wanted to make this profiler
00:49:07.359 lower overhead, you could use rb_profile_frames. If you wanted to make this profiler gather more information, you could
00:49:12.880 use the debug inspector. So that's the idea behind these different APIs existing: they're for different use cases. My use
00:49:19.040 case here is: make it easier to fit in the slides. And what I'm doing is:
00:49:24.559 I have a Ruby hash, and for every stack trace I
00:49:29.680 record some information. I first check if I already have information; if not, I initialize with zeros; and if I do,
00:49:36.640 I basically count the time spent without the GVL. At the
00:49:42.640 bottom, you see I keep two counters: one for how much time we spent without the GVL,
00:49:48.079 and one for how many times we released the GVL. So we can create the two kinds of profiles that I
00:49:53.520 was showing you: the count profile and the time profile. And then that's it. The only thing left to do is write the
00:50:00.240 stacks in a format that speedscope understands. This is a lot of boilerplate, and I will admit that I've
00:50:06.559 simplified it a bit, but basically we go through the hash with the information that we recorded; we
00:50:12.880 print out the stack trace that we got, and then we print the counters that we recorded. And this is the result —
00:50:20.319 this is how it looks. It's basically two text files. At the top you see a text file with the counts: we
00:50:27.119 record the stack trace and then the 180 times for the Dir children call. The
00:50:32.640 other file is: we record the stack trace and then the time in nanoseconds that we spent. And that's it — we
00:50:41.440 built a full new profiler in 107 lines of code, and I almost made it fit in
00:50:46.960 three slides. So before I go, let me
00:50:52.480 switch gears a bit and talk about performance bugs. And yes,
00:50:59.839 I'm not sure if this counts as yet another talk about profilers at RubyKaigi — if
00:51:07.200 it counts, this year we had three talks about profilers again; if not, maybe it's two and a half
00:51:12.800 talks about profilers at RubyKaigi. Let's talk about performance bugs.
00:51:18.680 So, are performance bugs really bugs? If something takes 10 seconds but
00:51:26.559 then gives you the right result, is it a bug? What if it takes one hour — is it a
00:51:32.800 bug? If it takes one day? One week? Performance bugs — sometimes people call them problems or issues — are in a
00:51:39.760 way usually about resources. They're about how much of X you use to
00:51:45.920 give back an answer, which might still be a correct answer. It might be
00:51:51.200 about how much CPU you need to use to give an answer, or how much memory you need to give an answer,
00:51:57.839 or how much time you take to give that answer. And obviously, if you run out of that resource — if you run out
00:52:04.800 of CPU, time, whatever — then you might crash the application. But the key thing is to
00:52:10.800 think about impact, and from that, priority. So when thinking about this, you
00:52:16.319 should ask yourself: is this performance bug breaking
00:52:21.640 production? How is it affecting the user experience? Will it take down our site on Black Friday? Is it costing
00:52:30.319 us a lot of money? If any of these are true, then you should usually
00:52:36.319 go: let's fix this — we should fix this. And if you're not sure how
00:52:41.920 to answer these questions, observability tools can help you answer what the
00:52:48.000 impact of the problem is, and also where the problem is coming from. These are the tools that we looked at today that you
00:52:53.599 can build using Ruby's APIs. But in cases where none of these are
00:53:01.040 true, I would say: use your best judgment. We often have
00:53:06.319 the impulse to optimize something, but sometimes that something does not need to be optimized,
00:53:12.240 and we could do other things instead. So use your best judgment. Okay, I am very excited to
00:53:18.960 see what tools you will build with debug.h, and what bending of rules — and things you
00:53:25.200 shouldn't be doing — you will do with the knowledge you've learned today. I'm very much looking forward to seeing
00:53:31.280 observability tools on stage during this and future RubyKaigis. Thank you — that
00:53:36.559 was my presentation.