Keynote: Performance Bugs and Low-level Ruby Observability APIs


Summarized using AI


Ivo Anjo • April 17, 2025 • Matsuyama, Ehime, Japan • Keynote

The keynote presentation, titled "Performance Bugs and Low-level Ruby Observability APIs" by Ivo Anjo at RubyKaigi 2025, focuses on enhancing the understanding of Ruby's performance through effective observability APIs. The speaker introduces the concept of low-level observability APIs that delve into the C-level internals of the Ruby virtual machine, allowing developers to debug performance issues more effectively. Key topics discussed include the following:

  • Introduction to the Speaker and Context: Ivo Anjo, a senior software engineer at Datadog, highlights the importance of monitoring applications and emphasizes the potential of Ruby’s observability tools for performance analysis.

  • Definition of Low-level Observability APIs: These APIs provide insights into application behavior and performance issues that are not commonly visible through standard debugging tools. They can help detect problems like N+1 queries, thread latencies, and garbage collection delays.

  • Utilization of TracePoint and Debug APIs: Anjo explains the usage of trace points to capture significant events in Ruby applications, including how to track object allocations and garbage collection timing.

    • He advocates that understanding the performance bottlenecks can help developers debunk the myth that Ruby is inherently slow. Instead, it often arises from inefficient queries or misconfigured services.
  • Examples of Tools and Code Implementations: Throughout the presentation, Anjo presents examples of how to create tools using low-level APIs. He introduces a new gem called 'low-level toolkit' that provides small utilities for investigating Ruby applications.

  • The Garbage Collector (GC) APIs: An explanation is given on how to track garbage collection events and manage the identifiers for different objects being collected, which can enhance profiling accuracy.

  • GVL Instrumentation: The presentation also covers the Global VM Lock (GVL) and its impact on performance. Anjo demonstrates how to track thread activity related to the GVL, helping to identify where Ruby threads may be waiting unnecessarily.

  • Conclusion on Performance Bugs: Anjo concludes by addressing whether performance issues should be considered bugs. He mentions the impact of performance on user experience and how observability tools can help identify and quantify these concerns. Use of observability tools is emphasized to manage and mitigate these performance bugs effectively.

In summary, the presentation invites developers to explore Ruby's low-level APIs to improve their understanding of application performance, urging them to embrace the tools available to address observability and profiling challenges effectively.


Date: April 17, 2025
Published: May 27, 2025

RubyKaigi 2025

00:00:02.639 hello
00:00:11.160 again Okay let's get started So uh welcome to performance bugs and
00:00:16.960 low-level Ruby observability APIs And I would like to start with good morning Matsuyama
00:00:28.560 It's super exciting and uh I feel super honored to be here to open the second day of RubyKaigi Still so
00:00:35.680 many talks to go So let's get started So who am I to be talking to you about this
00:00:41.200 today my name is Ivo Anjo I am originally from Portugal and I'm a senior software engineer at Datadog I work on the Datadog
00:00:49.280 Ruby continuous profiler and the reason I started doing that is that I really enjoy kind of making tools that
00:00:56.719 allow us to look at our applications in many different and new ways and see what we can find out what we can learn when
00:01:02.879 we're doing that If you haven't heard about my employer before Datadog uh we
00:01:08.159 do modern monitoring and security as a service And specifically for Ruby applications that means that we have
00:01:14.880 profiling distributed tracing test optimization security errors logs Here's
00:01:20.799 a pretty screenshot showing a bunch of metrics and a bunch of information about different endpoints on the flow between
00:01:27.680 uh microservices Uh here's a pretty screenshot of the profiler the product that I work on And this is a flame graph
00:01:34.720 showing where the performance is going for this application And this is another screenshot of the profiler where I'm
00:01:40.720 showing the timeline view where basically we have a list of threads and you can see what every single thread is
00:01:45.840 doing at any point in time So with that said thanks to my employer for sponsoring this trip Let's talk about
00:01:52.720 low-level Ruby observability APIs And what do I mean when I say low-level Ruby
00:01:58.560 observability APIs by low level I mean APIs that are kind of close to the C
00:02:04.799 Ruby VM implementation to details that like usually you don't need to know about and it's even better to ignore
00:02:11.520 them But in some in some cases when you're debugging hard issues or where you're trying to find out what exactly
00:02:17.760 happened in some situation it's a good thing to know a bit more about these things Uh this means that JRuby and
00:02:24.720 TruffleRuby will not kind of be included in this talk but they do offer some amazing tools and I'm super happy
00:02:30.640 to talk about them at any point in time And well observability APIs which is the
00:02:37.120 idea is that they allow you to X-ray what and how your application is running what's it doing exactly and why is it
00:02:44.239 doing it okay so but why um I like to
00:02:49.840 think of uh the how would you know problem which goes a bit like okay if
00:02:55.599 your tests are passing and the app returns the uh the the right result but
00:03:01.760 it takes 5 seconds to return the result Um how would you know if uh maybe you
00:03:08.080 have an n plus one query problem you're going way more often to the database than you think you're going accidentally
00:03:15.040 Or maybe your service is like perfect but it's slow because a different service is slow It just keeps waiting
00:03:22.000 for the other service and your service has done nothing wrong It's like it's perfect Um no bugs uh if there's how
00:03:30.400 would you know if there's like some background thread that's uh using up a bunch of CPU and uh basically adding uh
00:03:37.840 latency to your requests uh how would you know if the Ruby garbage collector is having to collect a lot of uh objects
00:03:44.720 and so it's kind of adding a bunch of latency to your application and well yeah the idea is that you can use Ruby's
00:03:51.040 observability APIs to understand what's going on or more commonly like tools that you build using these APIs which is
00:03:57.680 the topic of this talk you can also use a lot of puts debugging and uh shout out to
00:04:02.799 Aaron we are all like big puts debuggers it's like the best debugger uh when you have nothing else but sometimes
00:04:10.720 some of these uh observability APIs give you a bit more Okay so um observability
00:04:17.759 also helps with the problem if your tests are passing and the app returns the right result in staging but breaks
00:04:24.000 in production So it can give you a lot of information on a lot of different things There's also a related problem
00:04:31.120 that I like to think about which is the Ruby is slow and like I need more air quotes Lots of air quotes Ruby is slow
00:04:37.440 problem which is like oh uh you build something and it doesn't go as fast as
00:04:42.720 possible and you're like okay obviously this is not going faster because Ruby is slow it's not that the service
00:04:49.280 maybe is doing N+1 queries and you didn't figure that out or that like there's some connection pool that was
00:04:55.520 not configured correctly and like a bunch of threads are waiting on it and or the service is using too many threads
00:05:02.240 and you need to kind of uh configure it a bit differently and use a bit more Ruby processes etc So uh like Ruby is
00:05:10.240 not slow It's just we should learn why our applications are taking time and if we learn about that we can actually fix
00:05:16.800 them and uh Ruby is really fast Um you might like have thought about okay
00:05:22.639 there's a lot of existing tools uh the Datadog continuous profiler stackprof Vernier GVL tools byebug etc And the
00:05:31.520 really cool thing about these tools is that all of them are based on the APIs that we're going to be discussing today
00:05:38.560 And the thing I find really interesting is that um the different combinations of
00:05:44.000 the APIs that we're going to talk about today create very different tools So you
00:05:49.039 use this API and this API and you get a debugger But if you use this API and this API you get the profiler And like
00:05:55.520 it's kind of very interesting to see that all of these tools are very close to each other and they are all accessing
00:06:02.880 very similar kinds of information And so it's very cool to see uh the underpinnings what's beneath these tools
00:06:09.360 how are they doing their magic and um I also want to see new and interesting
00:06:14.800 combinations Maybe people have tried this this API and this API but no no one tried this one and this one So I really
00:06:21.360 want to see what other things we could be building what other ways we could be looking at our applications and as well
00:06:28.400 like maybe there are things that can be added maybe there are things that can be improved maybe there are things that right now we can't see in Ruby but we
00:06:34.720 would like to So let's get that conversation started and let's improve those APIs Okay with this said let's look at
00:06:42.800 what Ruby offers and what problems we can investigate with it So where the magic happens today
00:06:49.680 we'll be focusing at like kind of two files uh in Ruby this file uh
00:06:55.960 debug.h that offers the trace point APIs it offers the postpone job API the frame
00:07:03.039 profiling APIs the debug inspector APIs and then there's another file that
00:07:08.240 I want to kind of steal uh a couple more points from which is the thread.h file which offers the GVL instrumentation API
00:07:16.479 and the thread specific storage APIs So yes uh I will be showing some C
00:07:23.360 code today I'm sorry in advance and um but to make it easier I have kind of
00:07:30.000 created a new gem called low-level toolkit And I will be using this gem to uh give you an example of how to use
00:07:36.800 these APIs And the whole idea of this gem is to be two things at once It's
00:07:41.919 going to I'm hoping to make it a repository of very um a lot of very
00:07:47.919 small tools that you can use to investigate applications but it's also a living example of how you can use this
00:07:53.360 API So if you want to use them you want to experiment them instead of having to uh start from scratch and figure out how
00:07:59.599 you build like some C code and how you do that you can just like clone this uh um repository and start using it uh and
00:08:06.720 playing with it Okay so let's talk about the tracepoint APIs And the first thing you might start
00:08:14.000 thinking when I say trace point APIs with wait doesn't Ruby already have the
00:08:19.280 trace point class like it's right there in the documentation like a really great documentation a really amazing
00:08:25.879 tool And yes that's the answer Uh you can use the trace point
00:08:31.919 class uh with all of these events You can use it to figure out when a method gets called when a method
00:08:38.000 returns when a class gets defined uh when an exception gets raised etc from
00:08:43.200 Ruby code But there is a bit more about trace points is that there are a few
00:08:48.880 more special events that are the internal trace point events and you can
00:08:54.160 really only use these internal events from C code So you cannot uh use these
00:08:59.440 events using the regular Ruby trace point class in your code Um obviously
00:09:05.120 the C code can still use uh all other events So uh basically C code has a bit
00:09:11.120 more of an advantage here because it can use the other events and it can use this events and the the thing about these
00:09:18.640 internal events is that they are separate because they are emitted at very special like situations where uh
00:09:25.600 it's not possible to access many Ruby APIs and I'm guessing this is why they are not wired up to the regular like
00:09:32.480 trace point Ruby class is because it would be really hard to to do that So let's look at a few of these events and
00:09:39.360 what cool things we can do with them So the first one is like let's look at
00:09:44.480 investigation uh object allocation the new object event Uh it can be used to
00:09:49.760 answer questions such as like are my uh applications web requests delayed by GC
00:09:55.200 because you can use it to look at like uh objects that are being created and because if you are creating too many
00:10:02.080 objects this gets expensive for two reasons One is that creating many objects uses up CPU You're doing a lot
00:10:09.040 of work to create those objects And then a bit a bit of um a bit of time after
00:10:14.959 that is that Ruby will have to clean up all of those objects as they become garbage So when you create a lot of
00:10:20.480 objects you kind of add overhead in two places in creating the objects and in uh
00:10:25.680 collecting the all of those objects with the garbage collection So let's build like a tool to investigate object
00:10:32.079 allocation using uh the new object event specifically this tool the track objects
00:10:38.320 created tool and the track objects created tool basically uh executes a
00:10:43.360 block and then at the end returns all of the objects that were created inside the block and you'll see here that uh object
00:10:50.560 new and hello world are directly created like inside this block but the time object actually gets created inside the
00:10:57.279 UTC uh method So this is basically telling me all of the objects that got
00:11:02.640 created even in other methods that are getting called from this code So how can we build this uh very simple uh tool
00:11:10.079 with the track objects uh with the new object event Okay Uh this is where the C
00:11:15.600 code comes in So we create like this uh track objects created function in C and
00:11:21.040 we create this is how you create a trace point from C code So you see RB trace point new It's the equivalent of what
00:11:26.800 you would do in Ruby But now you can pass the special event which is the a new object event which is one of those
00:11:32.800 events that are not available to Ruby code And then we pass a function on new object event which is the function that
00:11:39.360 Ruby is going to call when a new object gets created And then there's a bit more boiler plate We have to call a function
00:11:45.360 to enable the trace point Then we call RB yield to yield back to Ruby and run
00:11:50.880 the block And then we when Ruby returns we disable the trace point and we have
00:11:56.000 to do that filter hidden objects that I will explain in a bit So the next part
00:12:01.760 is the on new object event So when a new object is created Ruby calls this uh function and what this function does is
00:12:08.880 it gets the object that was created and if the object is not an internal object then it it adds it to an array as simple
00:12:16.320 as this uh I will explain a bit what uh the internal objects are in this case
00:12:21.440 Okay so this is one So we are basically adding the all of the objects that get created into our array and then uh the
00:12:28.480 filter hidden objects is like an extra step that we do at the end where we again go through like this is just a
00:12:34.240 loop So we're going through every object that we collected and we are checking if the object still has a class set and if
00:12:41.120 the object doesn't have a class set then we kind of replace that with a nil And this is because uh the objects that
00:12:47.600 don't have the class set are hidden And I will explain a bit what this means Um and that's it Like this is all we needed
00:12:54.720 to implement track objects created And we can now investigate uh in like a given like piece of code uh what are all
00:13:03.120 the objects that got created and what are those objects and uh an interesting thing that you see is that because we
00:13:10.000 are collecting all of the objects that got created whereas you would expect like object new and time and etc to be
00:13:16.160 garbage collected Uh in this case we're actually holding references to them so they don't get garbage collected Um so
00:13:22.639 we can look at them later Okay things to keep in mind when you are
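For reference, a rough Ruby-level approximation of this track objects created idea is possible without the C-level NEWOBJ event, using `ObjectSpace.trace_object_allocations` from the stdlib `objspace` extension. This is a hedged sketch, not the low-level toolkit implementation itself:

```ruby
require "objspace"

# Record objects allocated inside a block, plus where they were allocated.
# trace_object_allocations enables allocation tracing just for the block;
# the recorded source locations remain queryable afterwards.
created = []

ObjectSpace.trace_object_allocations do
  obj = Object.new
  str = "hello world"
  created << obj << str
end

created.each do |o|
  file = ObjectSpace.allocation_sourcefile(o)
  line = ObjectSpace.allocation_sourceline(o)
  puts "#{o.class} allocated at #{file}:#{line}"
end
```

Unlike the C-level event, this only sees objects the VM is willing to expose, so the internal/hidden object filtering described above is not needed here.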
00:13:28.240 uh using this kind of event The uh one is that the objects are empty when the
00:13:33.680 function gets called So you when Ruby calls your uh your function Ruby gives you the object But you if you try to
00:13:41.680 look at the object you will see nothing because Ruby calls you just after allocating the object So before initialize
00:13:48.240 got called So it means that if you'd like to say oh I wanted to say to see what the value of this variable or
00:13:54.240 whatever was in the the object you can't because the object is still empty Uh furthermore you cannot call into most
00:14:01.120 Ruby APIs inside this callback So um so even like you might want to to try to do
00:14:07.760 a few things but you need to be careful because most Ruby APIs you cannot call them uh in this situation and if you do
00:14:14.800 it leads to very confusing crashes and well ask me how I know that it leads to very confusing crashes Um and yes be
00:14:23.199 careful about internal and hidden objects which is why you saw like two different filters You saw me filter if
00:14:29.120 an object was internal and you saw me filter if an object was hidden and those objects I am not returning back to Ruby
00:14:34.720 code So let's talk about that internal and hidden objects Did you know that
00:14:41.519 Ruby actually itself the Ruby VM uses objects to keep like internal details
00:14:48.320 for itself and this can be uh caches this can be method data such as like the
00:14:54.480 information on the method or even like the bite codes uh included modules etc So Ruby has like special objects that it
00:15:02.240 uses to keep its own things Uh but these are regular objects that the garbage collector also uh kind
00:15:09.560 of controls and scans and whatnot And even you can see them sometimes like for instance if you call
00:15:16.320 ObjectSpace.count_objects you'll see like a breakdown of different kinds of objects that Ruby's tracking and at the bottom
00:15:22.399 you'll see T_IMEMO and T_ICLASS and these are like some of these uh internal
00:15:27.440 objects that you don't usually see and the like Ruby usually like uh the reason
00:15:34.079 why you you might not even know that this object exists is that all Ruby APIs
00:15:39.360 hide these objects from you So you're never supposed to get an instance of these objects like going back to your
00:15:45.279 Ruby code And that's why uh we were kind of like uh removing those objects the
00:15:50.320 internal and the hidden objects from uh our tracking because uh we shouldn't
00:15:55.360 return them back to Ruby code because sometimes that causes crashes and issues because again they're not supposed to go
00:16:01.199 back to Ruby code but they exist Um and these low-level APIs show them So this
00:16:06.480 is kind of a pattern with the low-level APIs which is well when you're very close to what Ruby is doing sometimes
00:16:12.959 you're seeing some of the magic under the hood and sometimes you need to collaborate with Ruby so that you don't
00:16:18.880 um break the rules but sometimes you can bend them Okay so let's look at a few
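These internal object types can at least be counted from plain Ruby, even though individual instances are never handed back to Ruby code. A small sketch using `ObjectSpace.count_objects`:

```ruby
# ObjectSpace.count_objects returns a per-type breakdown of heap objects,
# including the VM-internal types mentioned above:
#   :T_IMEMO  - VM-internal data (method entries, bytecode metadata, caches)
#   :T_ICLASS - a module as it appears in a class's ancestor chain
counts = ObjectSpace.count_objects

puts "T_IMEMO:  #{counts[:T_IMEMO]}"
puts "T_ICLASS: #{counts[:T_ICLASS]}"
```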
00:16:25.199 more events the garbage collector activity like the garbage collection events are really cool to answer
00:16:31.600 questions such as when is garbage collection happening and there's a bunch of events related to
00:16:37.839 like different uh garbage collection events when it starts when it finishes when different
00:16:44.880 stages of garbage collection finish and the uh this is obviously
00:16:50.800 useful if you're kind of trying to understand uh the activity of the garbage collector
00:16:56.720 in Ruby like why is the garbage collector uh kicking in now what's happening in my application and if
00:17:03.600 you've used GC.stat from Ruby code that gives you a lot of metrics you can kind of think about this as like
00:17:10.919 GC.stat++ so it gives you a few more things you can experiment and get a bit more information that GC.stat does not
00:17:17.199 give you so uh let me give you an example let's build this print GC timing
00:17:23.919 uh tool the print GC timing tool What what it does is well we create a new trace point and it will hook on the GC
00:17:31.440 enter and GC exit events So we're looking for when a GC starts and when a
00:17:37.360 GC finishes and then we ask Ruby to call our on_gc_event uh function And
00:17:43.760 uh in our on_gc_event function what we do is when we get called we need to figure out okay which event are we talking
00:17:50.720 about is it start or is it uh exit if it's start what we do is u or um enter
00:17:58.000 what we do is we get the current time So basically uh we have I've created this very tiny helper function that kind of
00:18:04.480 gives it's the equivalent of time dot now uh but what's the current time we
00:18:09.600 store this current time and then when the um when the garbage collection finishes we get the time again we
00:18:16.480 subtract the current time with the time of start and we can print like how much time the GC took So and in this case
00:18:23.600 we're printing this directly to uh the standard output So basically whenever the garbage collector runs
00:18:30.640 we'll get like a very simple print saying like okay the garbage collector ran and it took this amount of time And
00:18:36.240 and that's it This is all we needed to build the print GC timing uh tiny tool And here I'm uh explicitly
00:18:44.320 telling Ruby to do a GC both a minor and a major And then we see that we get the
00:18:50.080 output that we wanted uh in this case and you will also see that kind of as expected minor GCs
00:18:57.760 are much faster to execute and then a major GC actually takes longer
00:19:02.960 but still 2 milliseconds that's that's pretty fast but yeah ideally this is why
00:19:08.240 you want to avoid having many major GCs in your application and ideally you'd want to have Ruby just kind of do minor
00:19:15.280 GCs uh most of the time and um so Yes Uh so next slide So things to
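A pure-Ruby approximation of this print GC timing idea is possible with just `GC.stat` and a monotonic clock. Unlike the GC_ENTER/GC_EXIT trace point events, this can only time collections we trigger ourselves, so it is a sketch of the measurement, not the tool:

```ruby
# Time an explicitly-triggered major GC and confirm via GC.stat counters
# that major and minor collections actually ran.
before_major = GC.stat(:major_gc_count)
before_minor = GC.stat(:minor_gc_count)

start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
GC.start(full_mark: true, immediate_sweep: true)  # force a major GC
elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0

puts format("major GC took %.2f ms", elapsed_ms)

GC.start(full_mark: false, immediate_sweep: true) # force a minor GC
puts "major GCs: #{GC.stat(:major_gc_count) - before_major}"
puts "minor GCs: #{GC.stat(:minor_gc_count) - before_minor}"
```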
00:19:24.320 keep keep in mind when you're uh doing this work around the GC APIs Uh you
00:19:29.679 cannot allocate any objects which well if you think about it like uh somewhere
00:19:36.160 in the Ruby code uh you're trying to create an object and you're when you're going to create an object Ruby uh looks
00:19:42.320 at it and thinks oh there's no space I need to do a garbage collection And then Ruby starts doing a garbage
00:19:48.640 collection and then Ruby would call into your code and what would happen if you allocated an object it would
00:19:55.360 kind of start the cycle again like oh I need to allocate an object I need to do garbage collection it would not be very
00:20:00.960 good So you cannot do this and actually Ruby will immediately stop will kind of say nope nope you're doing this wrong Um
00:20:08.080 so this is very helpful but yes uh obviously it means you need to be a bit careful when you're writing this kind of
00:20:13.520 tool uh to not allocate any objects and also you cannot call into most Ruby APIs
00:20:18.559 inside this callback because again most Ruby APIs often need to create objects uh so kind of uh this would
00:20:26.799 happen Okay before we talk about the other APIs I will kind of suggest
00:20:32.720 that if you're doing things around trace point and performance you might want to look at the bottom of debug.h where
00:20:39.360 this event hook thing comes because uh hint hint the trace point in Ruby is
00:20:45.120 actually built on top of the low-level event hook API And uh
00:20:50.960 if you're trying to build a trace point that gets executed very often and you want to make it go faster this one is
00:20:56.400 kind of slightly less overhead or at least can be made to be slightly less overhead So you might want to take a
00:21:01.440 look at this undocumented advanced tracing APIs uh if uh it's something
00:21:06.559 that uh you're interested in Okay let's talk about the postpone job APIs uh
00:21:12.640 which is uh a very interesting API So the postpone job API kind of looks like
00:21:17.919 this There's a pre-register function and then a trigger function And the idea of
00:21:23.760 the postpone job API is to ask Ruby to run code safely or at a safe point And
00:21:30.720 if you haven't heard the term safe point uh well it's not surprising because Ruby
00:21:36.159 doesn't actually call it a safe point Uh the term safe point comes from like a
00:21:41.200 bunch of other languages uh it's used in Java Go .NET Truffle and it just means
00:21:47.280 like a moment in execution of the code where it's safe to perform certain operations So like when you're thinking
00:21:54.400 of the Ruby VM the Ruby VM sometimes it's kind of in the middle of doing a bunch of things and doing GC and doing
00:22:00.799 whatever and like it's not always safe to just like run Ruby code there And and this is kind of what we mean by a safe
00:22:07.520 point uh in the case of the Ruby VM again like uh it means uh that we can get called when it's safe to call into
00:22:14.480 Ruby code when the GVL is being held not in the middle of GC or allocations etc So
00:22:20.640 why would you want this thing uh well the uh the the thing is that like you've
00:22:25.760 kind of saw in the uh when I was talking about the trace points that I was saying oh yeah uh in low-level APIs it's very
00:22:33.840 common that I will say oh you cannot create any objects when you get called in these APIs you cannot uh you can't
00:22:40.159 call into methods written in Ruby code um you can't call most VM APIs and you
00:22:46.640 might not even have the global VM lock when Ruby kind of calls your low-level thing So the idea of the postpone job
00:22:53.760 APIs is to kind of create a bridge between those APIs and doing some of the things that you couldn't Um it's a way
00:23:00.080 of saying to Ruby hey Ruby when you have a moment like can you call this i know
00:23:06.320 that now you might be busy but like when you are a bit like chill call this and so it's a way of uh doing a lot more
00:23:13.039 things from the low-level APIs that you wouldn't uh otherwise be able to do And well let me give you an example So
00:23:20.240 before you saw me doing um a very simple tool that would print uh whenever a
00:23:25.760 garbage collection happened and you might not have noted that I did something on purpose I
00:23:33.919 did my printing using C code I told the C code to actually print directly because
00:23:38.960 if I try to use any Ruby code to do the printing it wouldn't work because I can't call Ruby code So what if we
00:23:46.400 wanted to do something like this uh we have the on_gc_finish and on_gc_finish gets a lambda and our lambda will
00:23:52.960 actually print when the GC finishes and uh we'll also print the uh kind of GC
00:23:58.400 and here again we're like running two different kinds of GC and we want this kind of to be the output how can we
00:24:05.120 build this uh using the postpone job APIs well uh what we'll do is we start
00:24:10.559 by re uh pre-registering like a callback our callback is called uh postpone in this situation so we kind of tell Ruby
00:24:18.080 Okay I want I at some point in the future I will want to call this and then
00:24:23.440 uh we build our ong finish where we save the uh the block or the lambda that the
00:24:31.600 user got uh called us with and then we create a new trace point with the GC
00:24:37.039 exit event and then when the uh exit event happens so when GC finishes
00:24:43.360 instead of like immediately printing what we do is we tell Ruby Okay we're ready Please trigger the postpone job
00:24:50.480 And then in the postpone job we are now able to call into Ruby code So we do this at the bottom which is just we take
00:24:56.799 the call back and uh we call call on it So we call uh call on the on the lambda
00:25:02.799 that we get and that's it This is all that we needed to build this like
00:25:08.000 variant of the uh of our tool that now can call into a Ruby block whenever GC
00:25:13.120 finishes So this is why the postpone job API is really interesting because we kind of can use it to bridge back to
00:25:19.360 Ruby code and to do things that otherwise would be hard to do Okay so uh things to keep in mind
00:25:26.799 when you use the postpone jobs API Uh this API actually got very uh uh it
00:25:33.039 became very different in Ruby 3.3 So it got redesigned and so if you want to use it on an older Ruby it's still there But
00:25:40.640 uh we uh basically what happened was uh we found out that there's like a a very
00:25:46.799 like difficult to solve design problem in the old version of the API and that's why the API got redesigned So if you
00:25:53.200 want to use it with older Rubies you still can but you need to take into account that there you might be uh
00:25:59.760 hit by this problem And uh the the other thing is that the when you call trigger
00:26:05.200 the trigger flag is thread local Uh which means that if you call it from a background thread it might not work as
00:26:11.840 you expect So you might want to experiment a bit and like read a bit into how to use that correctly The other
00:26:18.880 thing to keep in mind is that there is a limit of 32 different postpone jobs that can be registered and no more after that
00:26:26.480 There is no garbage collection There is no removing postpone jobs it's like you register 32 and that's it You need to
00:26:33.200 restart Ruby if you want to change your mind and actually the Ruby garbage collector already uses one So
00:26:40.400 actually we only get 31 So try not to use too many in your application because
00:26:46.000 you might want to use a gem that uses like one or two and then well you have a limit Um a final
00:26:53.360 thing to keep in mind is safe point bias and this is a term that also comes from Java and some of the other runtimes
00:27:00.480 which means that basically when you call trigger
00:27:05.919 usually Ruby is pretty fast in like running your code but it might still call into the next method or
00:27:11.919 return from a method before it actually calls uh the thing that you've asked Ruby to do So the next safe point might
00:27:18.320 actually be in a different method And this is often important for profilers or some tools that okay if you're looking
00:27:24.080 at the current method the current method might have changed by the time that the safe point gets
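This run-it-later-at-a-safe-moment pattern has a Ruby-visible cousin: `Signal.trap` handlers are also not run inside the raw OS signal handler, but deferred by the VM until the main thread reaches a point where running Ruby code is safe, much like a postponed job. A sketch (POSIX platforms only):

```ruby
# Demonstrate deferred execution: the trap handler is queued by the VM and
# run later on the main thread at an interrupt check (a safe point), not
# inside the low-level signal handler itself.
log = []

Signal.trap("USR1") do
  log << :handler_ran
end

Process.kill("USR1", Process.pid) # send ourselves the signal
sleep 0.2                         # give the main thread a chance to reach a safe point
puts log.inspect
```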
00:27:30.200 executed Okay let's go to the next one the frame profiling APIs And well
00:27:35.760 there's a bunch of them but basically uh the the the two ones at the top give you
00:27:41.200 uh the current back trace and then the ones at the bottom are just like the C functions to get the information from
00:27:47.279 the back trace And so uh this allows you to get the back trace both cheaply and
00:27:52.320 safely and answer questions such as well what methods are being called and where
00:27:57.600 So you will not be very surprised when I tell you that this is like the source of many like pretty uh profilers that like
00:28:04.799 use this to build like beautiful flame graphs of what's going on in your application And an exception I would say
00:28:11.520 if if you were here for the uh SDB profiler session yesterday Well SDB does
00:28:16.960 not use these APIs which is like the the big key point of SDB but like a lot of
00:28:22.399 the other Ruby profilers uh use these APIs and why they use them because the
00:28:28.720 these APIs are really useful because they give you a back trace without allocating new objects or calling into
00:28:34.720 Ruby code And again if you think back to the the things I was saying before where I say oh uh you cannot create new
00:28:41.360 objects you cannot call into Ruby code uh in some of the other APIs This API is
00:28:46.480 kind of built to kind of connect very easily with that because it doesn't need to uh create any objects and this is
00:28:52.799 also makes it very very cheap which is good for profilers because you don't want your profiler to make your
00:28:58.240 application even slower and it can even be used in a signal handler So this is
00:29:03.520 what's called an a sync signal safe function Although I think it's not officially documented as one yet but
00:29:10.799 that's kind of the intention and that's how everyone uh is using it most of the time And uh so yeah it's very similar to
00:29:19.360 getting back trace locations So if you've used like back trace locations it's basically the same thing but it's
00:29:25.279 very it's designed to be used with the other low-level APIs that we've looked at And it's cheaper than back trace
00:29:31.440 locations uh because well uh it's for profilers So let me give you an example
00:29:37.279 Let's try to build this thing: last_allocation_at. Inside this block,
00:29:43.200 when we call track_last_allocation_at, we can call a bunch of code, and here we're just calling into the hello method.
00:29:50.080 And when we call last_allocation_at, what we get is the stack trace of
00:29:56.559 where the last object got created. In this case we see that the object got created inside the hello method,
00:30:03.039 because it was that object that you see over there. So the idea is:
00:30:09.120 we want to investigate where objects are getting created. Okay, let's build this helper.
00:30:15.919 We start with our track_last_allocation_at that we saw being called from Ruby, and again we're using a trace
00:30:22.880 point, and now we're using the newobj event — so you can see how these things connect. When a new object gets
00:30:28.880 created, what we do is check if it's an internal object. If it's an internal object we don't want to touch it; ignore
00:30:35.919 it. If not, we create this temporary buffer. And this buffer is
00:30:40.960 created in C code, so it's not a Ruby allocation. Then we call rb_profile_frames to get the backtrace,
00:30:48.720 and then we copy the backtrace into a temporary Ruby array that I've created at the top,
00:30:57.120 so that we can access it later. Then, when you call last_allocation_at
00:31:03.679 to be able to ask the question — where was the last allocation at? — we
00:31:08.720 have to use a small trick and pause the recording of the information, because the one thing we want to do is
00:31:15.279 create a new array, and well, if we created a new array while still recording, then
00:31:20.320 the last object would be our new array. So we pause, and then what we do
00:31:25.440 is ask Ruby: okay, what's the file name, what's the
00:31:30.640 method name? We copy them into this new array, return this information, and resume recording
00:31:36.960 again. And that's it — that's all we need to build a helper such as this
00:31:42.399 last_allocation_at. And the really cool thing about this helper is that, as you saw, whenever a new object gets created we do
00:31:49.039 very, very little work. So keeping this enabled will not impact
00:31:54.480 your application a lot. This is why this API is useful, and in particular it will not create a lot
00:32:00.480 of objects, so it will not add more work to the Ruby garbage collector. Because otherwise, if you think about it, if
00:32:07.279 every time you create one Ruby object you create 10 more to record the information about creating
00:32:14.000 the Ruby object, then this becomes a problem, because suddenly you're creating 10 or 20
00:32:20.799 times more objects than you should. So that is why this is useful.
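You can sketch the same "where was this object allocated?" idea in pure Ruby with the stdlib objspace extension — heavier than the C helper in the talk, since it records metadata for every allocation, but it needs no C code. The helper name `allocation_site_of` is my own, not from the talk:

```ruby
require "objspace"

# Pure-Ruby cousin of last_allocation_at: run a block while allocation
# tracing is on, then ask where the object the block returned was created.
def allocation_site_of
  result = nil
  ObjectSpace.trace_object_allocations do
    obj = yield
    file = ObjectSpace.allocation_sourcefile(obj)
    line = ObjectSpace.allocation_sourceline(obj)
    result = [obj, "#{file}:#{line}"]
  end
  result
end

def hello
  "hello world" # the object whose allocation site we want to find
end

obj, site = allocation_site_of { hello }
puts site # file and line of the string literal inside hello
```

Unlike the rb_profile_frames version, this allocates Ruby objects of its own while tracking, so it is fine for investigations but too heavy to leave always-on in production.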
00:32:25.880 Okay, things to keep in mind when you use this kind of thing. If you store the values you get from Ruby in a C
00:32:33.440 data structure, remember that you need to tell the Ruby GC about it. This is a general thing for every
00:32:39.519 Ruby native C extension, but it's easy to forget:
00:32:45.200 you go "oh, I'll just add this here", forget to tell the Ruby garbage collector about it, and that will
00:32:51.600 create problems. Okay, let's talk about the debug inspector APIs. The
00:32:57.440 debug inspector APIs give you a nice way to navigate the currently
00:33:03.279 active backtrace and methods. If you're paying
00:33:08.960 attention, you might be thinking: wait, backtraces again? Wasn't the previous one the
00:33:15.760 whole thing about backtraces? Why do we need a second API for backtraces? The answer is that the debug inspector
00:33:22.960 is way more expensive than frame profiling. Again, this is the
00:33:28.559 reverse of what I was saying: frame profiling has no object allocations, it's very fast, and so on. This one
00:33:36.000 actually does object allocations, and it's kind of slow, and that's why
00:33:41.440 it's not very often used by profilers. It also needs to call into Ruby code,
00:33:46.720 so it has a bunch more restrictions. So why is it useful? Because it's built for a very different use case:
00:33:53.360 it's built for debuggers. And the key thing about the debug inspector is that it allows you to
00:33:59.919 go deeper than any other backtrace API in Ruby. Whereas for the previous
00:34:05.600 API I was saying it's the same thing as backtrace locations, the debug inspector is not the same as
00:34:12.399 backtrace locations — it's much more powerful. It allows you to access the objects that
00:34:17.520 methods are getting called on. It allows you to access the binding objects for those calls, which
00:34:24.639 means you can read and change arguments from those calls, read and change local variables, and so on. You can even
00:34:31.919 access the instruction sequences, and you can then ask Ruby to print the bytecode of a method
00:34:38.560 that's still here on the stack. So the debug inspector goes much deeper.
00:34:43.760 Let me give you an example. I'll jump directly to C code, and I will build two helpers: the who_called_me
00:34:50.639 helper and who_called_me_binding. The idea is that who_called_me, when you call it from Ruby
00:34:56.800 code, will give you the object that called you, and who_called_me_binding gives you the binding of the object that
00:35:03.359 called you. We do this by calling rb_debug_inspector_open, and then inside our
00:35:09.839 debug inspector callback, if we're asked for who_called_me
00:35:15.280 we use rb_debug_inspector_frame_self_get; if we're asked for the
00:35:20.800 binding we use rb_debug_inspector_frame_binding_get. And then there's the depth. The idea
00:35:26.079 of the depth is that, for the current call,
00:35:31.200 depth zero is us, one is our caller, and we want the caller of our
00:35:36.960 caller, so we need to ask for depth two — that's why the two is there. But this
00:35:42.320 is a bit abstract, so let me show you what cool things you can do with it. Imagine you have this very simple
00:35:50.320 Hello class. When you call say_hello, you pass it a "to" object, which
00:35:56.079 it will call hello! on. And there's a secret: the secret is an
00:36:02.560 argument, so it's just a local — nobody can see the secret. Then we create the Spy object. The Spy object
00:36:09.599 implements hello! (with the exclamation mark), and it uses who_called_me and
00:36:15.839 who_called_me_binding to actually access the secret and see what object called it
00:36:21.280 and what the value of the secret was. So when we create a new instance of Hello and tell it to say hello to the
00:36:28.400 Spy, and we give it the secret "trust no one", the Spy is actually able
00:36:34.720 to say: this is the object that called me, and this is that object's secret. And
00:36:40.160 you can see how otherwise this would never be something that you could do in Ruby.
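A location-only version of this idea works in plain Ruby: caller_locations can tell you *where* you were called from, though not *which object* or its binding — for those you need the debug inspector C API (or the debug_inspector / binding_of_caller gems the talk mentions next). The helper name `who_called_my_caller` is mine, just for illustration:

```ruby
# "Who called me", location edition: the caller's caller as a
# Thread::Backtrace::Location, mirroring the "depth 2" idea from the
# C example (depth 0 = us, 1 = caller, 2 = caller's caller).
def who_called_my_caller
  # caller_locations already excludes this helper's own frame;
  # start at 2 to also skip our direct caller.
  caller_locations(2, 1).first
end

def inner
  who_called_my_caller
end

def outer
  inner
end

location = outer
puts location.label # the method that called inner — its label contains "outer"
```

This gives you the file, line, and method label, but not the receiver or its locals — which is exactly the extra power the debug inspector adds.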
00:36:46.720 So, things to keep in mind when using the debug inspector: there are
00:36:52.560 already gems for this. If you look at the binding_of_caller or debug_inspector gems, the whole
00:36:59.920 thing those gems do is give you access to the debug inspector from Ruby code. So you don't
00:37:06.400 actually need to write C code to use the debug inspector. But this is a really powerful thing that you cannot do in any
00:37:13.280 other way in Ruby. Okay, let's go on to the last one: the GVL,
00:37:18.720 or global VM lock, instrumentation API. And wait — I've
00:37:24.800 done a few presentations in the past about the GVL, but if you skipped my
00:37:30.640 presentations and you're going "what's this GVL thing again?", here's the global VM lock explained in one slide.
00:37:38.079 It's similar to the Python GIL, the global interpreter lock — we just call it differently in Ruby. It's
00:37:45.839 something inside the Ruby VM that protects the VM state. It's actually
00:37:51.599 not global anymore: if you're using multiple Ractors you will have multiple
00:37:57.160 GVLs. But if you're not using Ractors in your application you will only have one, so you're back to kind of having one. And
00:38:05.359 it prevents Ruby code execution parallelism, which means that multiple Ruby threads cannot be executing Ruby
00:38:11.359 code in parallel at the same time. But waiting can happen in parallel. So if you're doing a bunch of calls to
00:38:18.000 the database and a bunch of network calls, this is fine — the GVL only prevents running multiple threads of Ruby code at
00:38:24.320 the same time. And this means that the GVL can impact your application latency if you have multiple threads
00:38:31.119 that need it to work at the same time. So this is the context of the GVL instrumentation API:
00:38:38.160 to be able to know more about the global VM lock. You basically get a
00:38:43.920 bunch of events that we can subscribe to, and there's an API to subscribe to these events. So it's kind of similar to
00:38:50.320 TracePoint, but it's not called TracePoint — it's slightly different. And so
00:38:55.440 this API allows you to track Ruby thread activity, so you can answer
00:39:00.560 questions such as: are my web requests being delayed by other requests, or by other threads in general? It allows you to
00:39:08.320 know when threads are running — which Ruby calls resumed — or waiting for
00:39:14.960 something to happen — which is called the suspended event — or wanting the
00:39:20.560 GVL — which means ready. At the top you see a screenshot from the gvl-tracing
00:39:26.800 gem, which is a gem that I created that gets this information from the GVL
00:39:32.400 instrumentation API and generates a visual timeline from it.
00:39:37.839 This is the kind of information that you get from this API — without this API, this gem could not exist. So
00:39:45.040 let's see what we can do when we use this API directly. Let's say, for instance, we have a
00:39:52.640 Ruby method that uses up a bunch of CPU by incrementing a counter in a loop. Then let's
00:40:00.160 build a track_once_gvl helper, in which we create two threads, each of them
00:40:06.960 running that loop, and then we join the threads. At the end, what we
00:40:12.480 want to know is: how much time did those threads spend waiting for the global VM lock? To do that, inside
00:40:20.079 track_once_gvl we call rb_internal_thread_add_event_hook, say that
00:40:25.760 we're interested in the started, ready, and resumed events, and ask Ruby to
00:40:30.880 call our on_thread_event function. Then, inside
00:40:37.760 on_thread_event, we get the information for the current thread. If the thread got started, we do some bookkeeping — and I will tell
00:40:44.400 you a secret: the way I did it in the gem right now is technically not completely safe, so I
00:40:52.560 need to fix that. So imagine that it is very safe right now. And
00:40:58.480 when there's a ready event, which means that the thread wants the
00:41:03.920 GVL, what we do is store the information in a thread-local
00:41:09.119 variable. There's this API, rb_internal_thread_specific_set, which
00:41:14.800 is very useful for getting thread-locals from C code. You
00:41:20.079 basically get the current time and record: okay, this thread said
00:41:25.200 that it wanted the GVL at this point in time. Then, when the
00:41:30.480 thread resumes — which means the thread finally gets the GVL and can run —
00:41:36.160 what we do is get the time at which the thread started waiting, and
00:41:41.359 also the total time the thread has spent waiting so far, add them up, and store all that information in
00:41:46.880 the thread-local. There's a bunch of boilerplate C code, and this is a problem with C code: if you did this in
00:41:53.280 Ruby it would be, I don't know, three lines of Ruby code, and it would be very beautiful. But in C
00:41:58.560 there's always more code. But yeah, basically I'm reading the two
00:42:03.920 thread-local variables and storing the result in the total
00:42:09.160 value, and that's all that's needed. So what we do is call
00:42:16.079 track_once_gvl, and at the end we get a list of threads and how much time those
00:42:21.520 threads spent waiting for the GVL. Here the result is in nanoseconds, but I'm
00:42:26.560 converting it to seconds because, well, it's much nicer to read. Let's print this, and here's the result. We can
00:42:33.440 actually see that these two threads each ran for
00:42:39.920 some time, but they both needed to wait 1.88 seconds for the GVL. So
00:42:46.480 basically, in total, this took 3.71 seconds, but both threads had to wait for
00:42:52.000 a bunch of that time because they were fighting over the GVL. Okay, so things to keep in mind when
00:42:58.800 you use this API: only the resumed event gets called with
00:43:05.280 the GVL being held. For all of the others, you cannot call Ruby APIs, so
00:43:10.480 the postponed job API is your friend. And this rb_internal_thread_specific API
00:43:16.480 is very useful for this kind of thing, where you want to record information inside the thread itself. I'm
00:43:23.280 glad this API got added, because before, doing this was very hard. Also, if
00:43:28.800 you want this kind of information as a metric, as I was showing in our example, the gvltools gem already offers
00:43:35.359 a lot of nice counters. But maybe there's a counter it doesn't offer yet and you want to experiment a bit — so you can
00:43:40.880 experiment, and maybe you can then open a PR on gvltools. Okay, so you saw a bunch
00:43:47.200 of things — a bunch of very small things — so you might be convinced by now that it's very easy to build very small
00:43:53.319 utilities. But let's put everything we've learned together and build an actual new profiler in the time
00:44:00.000 that we have left in this presentation. I can do that. So let's build a new
00:44:05.119 kind of Ruby profiler that I don't think anyone has built before: a release-GVL profiler. And what is a
00:44:12.400 release-GVL profiler? It's a profiler that answers the question: when is the GVL being
00:44:19.040 explicitly released in our application? The global VM lock gets explicitly released when Ruby is about
00:44:26.240 to do some network calls, or about to do some operation on files, or when you're going to the database using
00:44:32.880 a native database driver. So we're going to get information about this. Plus, when
00:44:38.000 native code wants to do a bunch of work, it may also release the GVL. So we want to see where all of these points are
00:44:44.880 happening in our application. And by knowing when the GVL gets released, we can see the impact of this: these are
00:44:51.359 also the places in our application where Ruby can switch away from the current thread. So if you have multiple
00:44:57.200 requests, you might go from the current request to another one, because, well, Ruby has released the GVL. And here's an
00:45:04.640 example of some code that we want to profile with our release-GVL profiler. This code runs the counter loop
00:45:11.280 again, then does a network call to get the Google front page, reads a very big file, lists
00:45:18.079 which files are inside a directory with Dir, and
00:45:23.599 does a bit of compression of the big file that we read. And this is the result:
00:45:28.640 a beautiful flame graph that I've generated using our new release-GVL profiler. You can
00:45:35.440 see there, on the left, the multiple places where Ruby is
00:45:41.760 doing work to read the Google homepage — multiple
00:45:46.880 stacks and multiple things Ruby needs to do for that. We can also see where Ruby is calling IO read
00:45:53.280 to read from the file, and where Ruby is compressing
00:45:58.800 the file and releases the GVL. And at the top,
00:46:04.480 the width of the flame graph represents time: how much time our very simple
00:46:09.520 application spent with the GVL released. What you might not see is
00:46:15.760 the counter loop — it does not show up there, because the counter loop never releases the GVL. Ruby might take the GVL away
00:46:22.240 from that thread, but the code never explicitly releases it. And
00:46:28.480 you might also not have spotted the listing of the directory, which is because listing directories is a
00:46:34.560 very fast operation. But if we switch our profiler from time to count — how many
00:46:40.319 times the GVL got released — it shows up very quickly:
00:46:45.640 the Dir call released the GVL 180 times. And I checked — 180 is the number of files I had in
00:46:52.400 that directory. So basically Ruby will release the GVL once for every file that's in that directory.
00:46:58.240 Okay, let's build our release-GVL profiler. Oh — before we do that, I will
00:47:04.800 say it's often surprisingly easy to generate the output in a format that can be visualized using existing UI tools.
00:47:12.400 In the example you saw, I'm using the speedscope.app website, and it's very easy to
00:47:19.920 generate output for these kinds of tools: speedscope, Perfetto, Firefox Profiler, sometimes even Excel. So, pro tip:
00:47:27.520 sometimes you're looking at the data and you just have a big text file with a bunch of output. Try to see if
00:47:33.680 you can generate it in a format that can be loaded into a tool, because it's often very useful to look at that data in a UI
00:47:41.200 tool. Okay, so part number one: when you call release_gvl_profiler, we add the
00:47:47.359 internal event hook — we're using the GVL instrumentation API — and we want to hear about suspended and resumed threads.
00:47:54.240 Then we have the usual boilerplate: we yield back to Ruby code, and at the end we stop the event hook and
00:48:01.520 do some work to write the output in the format that speedscope needs. Okay, second part: when there's an
00:48:09.520 event, we look for the suspended event and use that magic
00:48:15.520 ruby_thread_has_gvl_p function. I hadn't talked about that one before, but basically it tells
00:48:20.720 you whether the thread explicitly released the GVL or not. So this is how we
00:48:25.920 can tell when a thread actually said: oh, I want to release the GVL, because I want to do something without
00:48:31.280 the GVL. And what do we do? Again, we just record the current timestamp so
00:48:36.400 we can use it later. Then, when the thread is able to resume again, we
00:48:42.960 read the time at which it released the GVL, and then we get the
00:48:49.680 current backtrace. And here I'm not using the rb_profile_frames API, and I'm
00:48:55.680 not using the debug inspector API — I'm using rb_make_backtrace. Why?
00:49:01.280 Because it's shorter, so it fits in the slides, in this case. But again, if you wanted to make this profiler
00:49:07.359 lower overhead, you could use rb_profile_frames. If you wanted to make this profiler gather more information, you could
00:49:12.880 use the debug inspector. So that's the idea behind these different APIs existing: they're for different use cases. My use
00:49:19.040 case here is: make it easier to fit in the slides. And what I'm doing is:
00:49:24.559 I have a Ruby hash, and for every stack trace I
00:49:29.680 record some information. I first check if I already have information; if not, I initialize with zeros; and if I do,
00:49:36.640 I basically count the time spent without the GVL. At the
00:49:42.640 bottom, you see I keep two counters: one for how much time we spent without the GVL,
00:49:48.079 and one for how many times we released the GVL. So we can create the two kinds of profiles that I
00:49:53.520 was showing you: the count profile and the time profile. And then that's it. The only thing left to do is write the
00:50:00.240 stacks in a format that speedscope understands. This is a lot of boilerplate, and I will admit that I've
00:50:06.559 simplified it a bit, but basically we go through the hash with the information that we recorded; we
00:50:12.880 print out the stack trace that we got, and then we print the counters that we recorded. And this is the result —
00:50:20.319 this is how it looks. It's basically two text files. At the top you see a text file with the counts: we
00:50:27.119 record the stack trace and then the 180 times for the Dir children call. The
00:50:32.640 other file is: we record the stack trace and then the time in nanoseconds that we spent. And that's it — we
00:50:41.440 built a full new profiler in 107 lines of code, and I almost made it fit in
00:50:46.960 three slides. So before I go, let me
00:50:52.480 switch gears a bit and talk about performance bugs. And yes,
00:50:59.839 I'm not sure if this counts as yet another talk about profilers at RubyKaigi — if
00:51:07.200 it counts, this year we had three talks about profilers again; if not, maybe it's two and a half
00:51:12.800 talks about profilers at RubyKaigi. Let's talk about performance bugs.
00:51:18.680 So, are performance bugs really bugs? If something takes 10 seconds but
00:51:26.559 then gives you the right result, is it a bug? What if it takes one hour — is it a
00:51:32.800 bug? If it takes one day? One week? Performance bugs — sometimes people call them problems or issues — are in a
00:51:39.760 way usually about resources. They're about how much of X you use to
00:51:45.920 give back an answer, which might still be a correct answer. It might be
00:51:51.200 about how much CPU you need to use to give an answer, or how much memory you need to give an answer,
00:51:57.839 or how much time you take to give that answer. And obviously, if you run out of that resource — if you run out
00:52:04.800 of CPU, time, whatever — then you might crash the application. But the key thing is to
00:52:10.800 think about impact, and from that, priority. So when thinking about this, you
00:52:16.319 should ask yourself: is this performance bug breaking
00:52:21.640 production? How is it affecting the user experience? Will it take down our site on Black Friday? Is it costing
00:52:30.319 us a lot of money? If any of these are true, then you should usually
00:52:36.319 go: let's fix this — we should fix this. And if you're not sure how
00:52:41.920 to answer these questions, observability tools can help you answer what the
00:52:48.000 impact of the problem is, and also where the problem is coming from. These are the tools that we looked at today that you
00:52:53.599 can build using Ruby's APIs. But in cases where none of these are
00:53:01.040 true, I would say: use your best judgment. We often have
00:53:06.319 the impulse to optimize something, but sometimes that something does not need to be optimized,
00:53:12.240 and we could do other things instead. So use your best judgment. Okay, I am very excited to
00:53:18.960 see what tools you will build with debug.h, and what bending of rules — and things you
00:53:25.200 shouldn't be doing — you will do with the knowledge you've learned today. I'm very much looking forward to seeing
00:53:31.280 observability tools on stage during this and future RubyKaigis. Thank you — that
00:53:36.559 was my presentation.