00:00:16.240
welcome you to Rails World. Yesterday I learned something really
00:00:21.439
interesting, and that is that if you use a specific color, like a green color, as
00:00:27.599
the background on your slides, it becomes transparent like this. Yeah.
00:00:33.200
Isn't that cool? Yeah. So, my display up here is the slide like
00:00:38.800
this, but it's got a green background. They actually key it out for you. I wanted to do this in my slides
00:00:43.920
today because I wanted to be completely
00:00:49.600
transparent with all of you. Yes. Thank you.
00:00:57.920
Happy Friday, everybody. Say "happy Friday," please. Yes.
00:01:04.640
It is always Friday somewhere, and today it is Friday here, and I'm very happy that it is actually Friday. The
00:01:11.040
sad thing, though, is that I had planned out my keynote in advance, done all this
00:01:16.799
work. I was going to talk to all of you about system tests today,
00:01:22.880
using my Macintosh.
00:01:29.040
But since I can't do that today, I'm going to talk to you about David's keynote.
00:01:36.240
I really, really enjoyed David's keynote. Now, we all know how much David thinks
00:01:43.520
about the Roman Empire. I'm also really excited about the new framework, Action Push.
00:01:50.159
That seems really exciting. Right? Yes. It's very exciting. But
00:01:55.600
I'm going to let you all in on a little bit of inside baseball. It wasn't originally called Action Push. It was originally called Action
00:02:02.479
Push Native. That was the original name. But the rest of us on the core team didn't really like that name so
00:02:09.679
much. So we gave David some Active Pushback. But the thing is, when it
00:02:16.239
comes to Action Push — when Action Push comes to Action Shove —
00:02:22.000
David can be a reasonable person. So he renamed it, and I appreciate that. I
00:02:28.239
also enjoyed David's presentation about Omarchy, and I hate to correct him on stage. Wait, no, never
00:02:34.879
mind. I love to correct him on stage. It's
00:02:40.720
actually GNU/Omarchy, just —
00:02:52.239
anyway. I really didn't have "David grows a neckbeard" on my Rails World
00:02:58.160
bingo card this year. Yes. I was also excited about 30,000 assertions
00:03:06.239
in 2 minutes. That's really wild, right? But I mean, come on. I know how to beat this. I can easily beat
00:03:12.560
this. So I want to give you a demonstration. This is intense. Right here, we're going to
00:03:18.000
beat that number. I have a test right here. Let's go. So we're going
00:03:23.760
to do 30,001 assertions. Yeah, look at that. Using Megatest.
00:03:31.680
Look at that. 10 milliseconds. Yes.
00:03:42.000
And this was all done on my slow old MacBook Air M4. Terrible.
00:03:49.599
Anyway, my name is Aaron Patterson. Hello, everybody. On the internet I go by Tenderlove.
00:03:55.760
I've been on the Ruby core team since 2009, and I've been on the Rails core team since 2011. I don't know how
00:04:03.840
many of you were here this morning, but it is true, I do speak Japanese. And I want to give a very short lesson —
00:04:10.879
I'm going to teach you all a handy phrase today. The phrase is "omachi kudasai."
00:04:19.199
What it means is, "can you wait a little bit, please?" And you'll hear this all the
00:04:24.639
time. What David didn't know is that Omarchy is super duper popular in Japan. Really popular. You'll
00:04:31.120
hear this phrase when you go to restaurants, when you go to hotels, everywhere. So I want to teach you
00:04:37.120
another phrase, too. What if you go up to somebody and say, "Hey, I'd like to have a little Omarchy,
00:04:42.639
please. Can I have that? I would like to try out this operating system." You can say exactly the same
00:04:48.160
thing: "omachi kudasai." That is how you would say
00:04:54.880
it. I work as a senior staff engineer at a mom-and-pop startup called Shopify.
00:05:01.199
Small company. Hopefully you've heard of us. We use Ruby and Ruby on Rails. I
00:05:07.280
think we run probably the biggest Rails app in the world. Now, unfortunately, I
00:05:13.280
didn't really know how to measure that. How do you measure what is the biggest app in the world? So the way I
00:05:19.039
decided to measure it was by font size, and indeed, we have the
00:05:25.919
biggest Rails app in the world. Since today is Friday and this is the last
00:05:32.000
talk of the conference, I thought we would try to have some fun and talk about some very, very light
00:05:38.080
topics. So I hope you're all excited for that. I'm just
00:05:43.680
kidding. We are not going to do that today. Today we are going to have a very, very technical presentation, and I apologize. I know some of you
00:05:51.120
are happy, because this means the end of the pun section of my presentation and the start of the technical section of my
00:05:57.600
presentation. But why am I doing a technical presentation?
00:06:03.919
The reason is mainly that I love programming. I love programming a lot.
00:06:10.000
I like to do it as my hobby, and I also get to do it as my job, and I just
00:06:15.039
really love it, and I'm excited about the things I work on. Yeah, slide. There we go. I'm very
00:06:21.759
excited, and I'm really excited to share the stuff I've been working on with my team at work. So that's what I
00:06:28.000
want to talk about today, mainly: work stuff. Specifically, the stuff
00:06:34.240
that my team has been working on at Shopify and how it's going to improve our lives and your lives, as well as
00:06:40.800
some pro tips for Rails developers. At Shopify, I'm on the Ruby and Rails
00:06:47.840
infrastructure team. And you may be surprised to learn this, but we work on
00:06:53.759
Ruby and Rails and infrastructure. So we are the Ruby and Rails
00:06:59.199
infrastructure team. We
00:07:05.120
work on a lot of stuff, but I want to give you some context around the work that we do. A big goal of
00:07:12.479
ours is to improve machine utilization at work, so we work on performance, and that helps us improve
00:07:18.960
machine utilization. I want to make that a little more concrete and give you an example from work of the kind of
00:07:25.280
things we're focusing on, to provide context for the work that we do inside of Ruby and Rails.
00:07:32.720
So, one of the things we want to do — and when I say
00:07:37.759
"improve," what I mean is — we want to increase the amount of parallel
00:07:43.919
work that we do on a machine, but we don't want to increase latency. So, for example, we want to
00:07:49.520
service more requests, but we don't want to destroy latency for anyone. We want to improve the amount of parallel
00:07:54.879
work that we're able to handle anywhere. And I'm careful not to say "requests," necessarily, because we're really talking
00:08:00.879
about parallel work. It could be web servers, test suites, whatever. We just want to get more work done on a
00:08:06.800
machine, but we don't want to degrade latency for anybody. To make these terms a little more concrete,
00:08:13.840
we have a very large application at work that has a very, very unpredictable workload. And
00:08:19.520
unfortunately, what this means is when we get a request coming in, we don't know whether that request is going to be IO-
00:08:25.199
bound or whether it's going to be CPU-bound. We don't really know. And what this means is,
00:08:32.399
let's say we have a process-based web server on a machine with four cores. This is a mom-and-pop
00:08:40.399
startup, so we can only afford four-core machines. Let's say there are
00:08:47.680
four cores, and we know that some of the requests are going to be IO-bound and some of them are going to be CPU-bound, and we want to
00:08:54.320
utilize this machine as much as possible. So what we'll do is fork off, say, 1.5 times the
00:09:00.080
number of cores. So we'll have six processes on our four-core machine. We'll pre-fork them. Each process can
00:09:07.680
only handle one request at a time, so we can handle six requests in parallel. But I want to consider two extreme
00:09:15.360
cases. Let's say we get six requests coming in and they're all doing IO-bound
00:09:27.519
work. If that happens, all of these processes are basically doing IO stuff — I don't know, writing to a
00:09:33.200
database, whatever. And our CPUs aren't doing anything at all. This is kind of a
00:09:39.040
bummer, because our web server could take on more work, but it's not. We're
00:09:46.720
not processing any more requests. Now let's consider the other end of the spectrum. Let's say we get six
00:09:55.440
CPU-bound requests coming in. Now we have six processes that are all fighting over four CPUs. So
00:10:02.320
unfortunately this impacts our latency, and we don't like that either. This ends up increasing latency
00:10:07.440
because we have noisy neighbors. They all want time on the CPU, so some of them have to be scheduled off
00:10:13.200
the CPU, and that's just not good for our end users. And I want to go on a little bit of a side note.
00:10:13.200
This impacts all web servers. This isn't just process-based web servers; it's also Falcon, Puma, whatever. And
00:10:28.160
the reason this happens is that we only have one construct in Ruby for CPU-bound parallelization, and
00:10:34.959
that's processes. We can only run code in parallel with processes. It's the only way we can do it. And I want
00:10:40.560
to show a little demonstration here. Let's say we've got two examples. We're going to calculate a
00:10:46.160
Fibonacci sequence, because that's what we do at work.
00:10:53.760
Fibonacci sequences. There's going to be a lot of this in the presentation. We're going to do it sequentially versus threads. So we'll do
00:10:59.279
one straight-line version, and we'll compare it to threads. If we do this on my lowly
00:11:06.480
MacBook M4, we get it done in about 2 seconds. If we do the same example using threads, unfortunately, it
00:11:11.760
takes the same amount of time: 2 seconds. So we got zero parallelization.
00:11:18.560
Now let's compare that to fibers, using the async framework. Again, 2 seconds is our
00:11:25.760
baseline. If we run this example with async: again, 2 seconds. We're not doing any better than before;
00:11:31.200
fibers take the same amount of time. Let's do this again, but this time we'll use processes. Of
00:11:37.200
course, linearly it takes 2 seconds. If we run this with processes on my machine, it was about 480
00:11:45.920
milliseconds. So we're actually seeing some parallelization here; we're getting some time back.
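To make that concrete, here's a minimal sketch of the kind of comparison I'm describing — this is not the exact code from the slides; fib, the workload size, and the job count are made up for illustration:

```ruby
require "benchmark"

def fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

N = 28      # per-job workload (illustrative)
JOBS = 4    # number of parallel jobs

Benchmark.bm(8) do |x|
  x.report("serial")  { JOBS.times { fib(N) } }

  # Threads: CPU-bound Ruby code still serializes on the GVL,
  # so this takes roughly as long as the serial version.
  x.report("threads") { JOBS.times.map { Thread.new { fib(N) } }.each(&:join) }

  # Processes: each child has its own GVL, so this actually runs
  # in parallel. (fork is Unix-only.)
  x.report("procs") do
    pids = JOBS.times.map { fork { fib(N) } }
    pids.each { |pid| Process.wait(pid) }
  end
end
```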
00:11:45.920
So, let's return to web servers now for a minute. What we'd really like to have is a load-aware web server
00:11:52.079
where, for example, our first four requests come in and get scheduled
00:11:57.120
to these different processes. Let's say all of them are IO-bound. Now another
00:12:03.279
request comes in. We can take it on; we'll just spin up another process and
00:12:09.040
take it on, because our machine isn't that busy — we can take on more. Now let's compare that to getting a
00:12:15.920
CPU-bound request. The CPU-bound request comes in, and we start getting load on the CPU. Eventually
00:12:22.720
we use up all four of our CPUs, and when the next request comes in, we'll say, "Yeah, you know, we can't
00:12:29.040
take that. Not right now — we're busy. Can you send that off to another machine? We're
00:12:35.839
currently at capacity." But the question is, how do you do this? This is going to be a little bit handwavy, but an idea that
00:12:42.399
we've talked about is to provide backpressure using HTTP/2. The reason we're thinking about
00:12:49.600
doing this is that H2 can send information upstream
00:12:54.959
asynchronously. So we can say, "Hey, proxy, I'm busy." We can set max
00:13:00.880
concurrent streams (the SETTINGS_MAX_CONCURRENT_STREAMS setting), for example, and say, "I can't take any more streams right now. Please send data somewhere
00:13:07.120
else." So we can actually provide backpressure to the proxy. This solves the
00:13:12.240
communication gap between the proxy and the web server, and we should be able to load balance better. And, by the
00:13:18.800
way, this is all theoretical. This is what we would like to get to.
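To make the idea a bit more concrete, here's a purely illustrative sketch. This is not a real server or a real HTTP/2 library API — every name here is invented — it just shows the shape of the backpressure calculation:

```ruby
require "etc"

# Illustrative only: track how many in-flight requests are burning CPU,
# and shrink the stream limit we'd advertise to the proxy when the
# machine is saturated.
class LoadAwareLimiter
  def initialize(max_streams: 6)
    @max_streams = max_streams
    @cpu_busy = 0                 # requests currently doing CPU-bound work
    @cores = Etc.nprocessors
  end

  def start_cpu_work  = @cpu_busy += 1
  def finish_cpu_work = @cpu_busy -= 1

  # The value we'd push upstream in an HTTP/2 SETTINGS frame as
  # SETTINGS_MAX_CONCURRENT_STREAMS: once every core is busy, stop
  # advertising spare capacity so the proxy routes requests elsewhere.
  def advertised_max_concurrent_streams
    @cpu_busy >= @cores ? @cpu_busy : @max_streams
  end
end
```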
00:13:25.360
So let's say this actually existed. What if it did exist? Now, when a request comes in — I showed
00:13:30.639
this before — what do we do in this particular case? We have to create a new process, right? We have to spin up a new
00:13:37.360
process. Now, how do we do that? What is the code that we write to do that? One idea is:
00:13:44.480
well, we could fork. We could just fork a new process in our web server and take on that request. But the question
00:13:49.920
is, can we fork fast enough? We don't know if that's true. Another potential
00:13:55.920
answer is Thread.new, or a fiber — create a new thread, create a new fiber. But as
00:14:01.600
we saw in the previous benchmarks, those can only help with IO-bound requests. They can't handle CPU-bound requests. So what
00:14:08.720
could we do? The answer is Ractor.new. This is where Ractors come into play. We can absolutely allocate
00:14:15.440
new Ractors fast enough, and they allow us to handle CPU-bound parallelization.
00:14:21.120
So this is kind of the context for our team. This is what we've been working on. The stuff that I've described —
00:14:27.680
these production problems — is what we've been thinking about, and what I want to discuss today is,
00:14:34.720
from the language level, how we're trying to attack these problems. We're attacking them on
00:14:40.480
two different fronts. The first front is multi-CPU performance, so
00:14:46.000
multi-core performance, and that is the work on Ractors. We're hoping that Ractors
00:14:53.199
will allow us to make the most efficient usage of all CPUs on the machine at the same time. But this doesn't address
00:15:00.560
single-core performance. For single-core performance, we're working on a new JIT compiler called
00:15:07.600
ZJIT. I want to talk about both of these efforts today, Ractors and ZJIT. So first let's
00:15:16.880
discuss Ractors. This year my team has been working on improving Ractor speed and usability in Ruby 3.5. John
00:15:23.279
Hawthorn — if you've met him here at the conference — is leading the project, and our team has also been working
00:15:29.519
really closely with Koichi Sasada, who is the original author of Ractors. So we're
00:15:35.600
working on these. Now, you might be asking — we get this question all the time — we have a lot of concurrency
00:15:42.240
choices in Ruby. Which one is the best? We have threads, we have fibers, we have processes, we have Ractors. Which one
00:15:47.600
is the best? The problem is, if you ask talking heads like me what
00:15:53.839
the answer to this is, they'll say "it depends." But I think that's a small dog answer.
00:16:04.240
I think the big dog answer is Ractors. You just always use Ractors.
00:16:15.279
Also, John made a really, really great logo for Ractors, which I want to show you.
00:16:29.199
All right. So, if you're not familiar, let's discuss Ractors. Ractors are Ruby actors,
00:16:35.839
and that's where the name comes from. Ruby actors: Ractors. They are basically an actor style of
00:16:42.720
parallelism in Ruby, and Ractors give us true parallelism. Now,
00:16:48.320
unfortunately — well, actually, no, it doesn't matter. Why did I say unfortunately? There is still a GVL. Ruby still has a GVL, but the
00:16:56.000
way we've designed the system is such that each Ractor has its own independent GVL, so they can
00:17:02.880
all run independently of each other. Oh, let's do the slide. Each Ractor has its own GVL that can run
00:17:08.559
independently of the others. And that means that we can get true parallelism out of all of them. So let's do our
00:17:15.520
Fibonacci sequence test again. If we run this serially, of
00:17:20.720
course it's 2 seconds, like you saw before. If we try this with Ractors, we'll hit 480 milliseconds.
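Roughly, the Ractor version looks like this — a sketch, since Ractor is still experimental and the result-reading API has changed across versions (Ractor#value is the Ruby 3.5 spelling; older Rubies used Ractor#take):

```ruby
def fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

# Each Ractor has its own GVL, so these four run truly in parallel.
ractors = 4.times.map do
  Ractor.new { fib(28) }
end

p ractors.map(&:value)   # collect each Ractor's return value
```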
00:17:26.720
So we're able to get true parallelization with Ractors. And just to refresh
00:17:33.520
your memory, I've got a benchmark here comparing our base measurement versus threads, fibers,
00:17:39.039
processes, and Ractors. You'll see that the first three are around 2 seconds,
00:17:44.400
whereas if we use processes or Ractors we can actually get down to 480 milliseconds. So again: true
00:17:51.919
parallelism — the only way we can get that in Ruby is via processes or Ractors — and we want to focus on the
00:17:59.600
Ractor solution. Unfortunately, if you use Ractors today, you'll
00:18:04.960
see this warning. This warning comes out, and it is a big scary warning.
00:18:10.160
It says Ractor is experimental and the behavior
00:18:15.440
may change, and unfortunately that's totally true. It absolutely has changed. APIs are changing, and part of the
00:18:22.960
work that we want to do on the team is stabilize that API and make sure it works well and fast. The other
00:18:30.799
thing it says is that there are many implementation issues, and that is true as well. There are many
00:18:36.720
implementation issues. So we are trying to take on those behavioral changes as
00:18:43.600
well as the implementation issues — make sure the behavior is stable and the implementation
00:18:50.000
issues are fixed — and we're hoping that if we can do enough work on that, we can actually get rid of this message, so that
00:18:56.640
people will feel a lot more comfortable using Ractors in the future. So I want to talk a little bit about the behavior
00:19:02.000
changes, or how to use Ractors. This is in Ruby 3.5, so you need
00:19:09.760
to either wait for Ruby 3.5 — or don't wait. You can build it from
00:19:14.799
edge. Please do that. And we'll also
00:19:19.919
talk about some implementation issues. So, Ractors are very similar to threads. They're like threads, but harder to use —
00:19:27.360
in a good way. I'm going to explain that in a minute. Ractors have a rule that comes with
00:19:33.200
them. The rule is that you cannot share mutable objects between Ractors. That's not
00:19:39.840
allowed. Ractors will copy mutable objects, and
00:19:45.520
I have an example of that here. We have two code examples: on the left is a thread example, and on the
00:19:51.919
right is our Ractor example. All we're doing here is
00:19:57.360
getting a mutable string — and this is important, we've got a mutable string here — and we're
00:20:02.480
popping the string off of a queue. In thread world, we pop it off a queue. In
00:20:07.760
Ractor world, we pop it off what is actually the default mailbox for the Ractor.
00:20:14.000
After that, we print out the object ID of the string along with
00:20:19.760
the string itself. One nice thing about Ractors is we don't need to make a
00:20:25.440
queue; they already have default queues, so we can just push onto that default queue. If we run this code, we'll see
00:20:33.039
that on the threaded side, the object IDs are identical. There was no copy.
00:20:38.559
But if we look on the Ractor side, we'll see that the object IDs changed. And that's because we actually duped the
00:20:44.320
string when it crossed the boundary. When that string went from one Ractor to the other, we ended up copying the
00:20:49.600
string. So we're not allowed to share mutable objects between Ractors. If
00:20:55.919
we change this code — we just add, where is it, frozen_string_literal: true at the top, the thing I was
00:21:02.000
complaining about this morning — we'll see that the object IDs are actually identical.
00:21:08.240
So as long as the objects are immutable, we can pass references between Ractors and it doesn't matter.
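Here's a minimal reconstruction of that slide example — not the exact slide code, but the same shape (the Ractor#join call assumes the Ruby 3.5 API; older versions used take):

```ruby
# Thread version: the SAME String object comes out of the queue.
q = Queue.new
t = Thread.new { s = q.pop; puts "#{s.object_id} #{s}" }
str = "hello"          # a mutable string
puts "#{str.object_id} #{str}"
q << str
t.join                 # both lines print the same object_id

# Ractor version: a mutable string is copied at the boundary.
r = Ractor.new do
  s = Ractor.receive   # reads from the Ractor's default mailbox
  puts "#{s.object_id} #{s}"
end
puts "#{str.object_id} #{str}"
r.send(str)            # str is duped here, so the object_ids DIFFER
r.join
```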
00:21:13.840
It's fine; there's no copy. So how do we make immutable data? I already showed one example: we can
00:21:20.400
use frozen_string_literal: true, or we can freeze an object. But unfortunately
00:21:26.720
that's not going to work if we have a deeply nested object. Let's say, for example, we parse some JSON and get a
00:21:32.159
big old JSON hash out of it. In that case we can use Ractor.make_shareable, and that will deeply freeze the
00:21:38.960
object's data structure. Or some libraries, like JSON, provide APIs
00:21:44.159
that give you back frozen data structures already. In this case you can pass freeze: true into
00:21:49.600
JSON.parse, and the data structure that comes back will already be frozen, so you can pass it
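For example — Ractor.make_shareable and JSON's freeze: option are both real APIs; the data here is made up:

```ruby
require "json"

# 1. Deep-freeze an existing structure so it becomes shareable.
config = { "db" => { "host" => "localhost" } }
Ractor.make_shareable(config)
p Ractor.shareable?(config)   # => true

# 2. Ask JSON.parse for frozen output directly.
data = JSON.parse('{"name":"Rails World"}', freeze: true)
p data.frozen?                # => true — safe to share without a copy
```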
00:21:55.520
between your Ractors without copying. So — communication. How does that work?
00:22:00.640
We know how to make immutable data, but how do we communicate between Ractors? Ractors use what is called a
00:22:06.080
port, and this is in Ruby 3.5. Again, ports are basically a queue, and we
00:22:15.200
use them like this. Every Ractor has a default port. These two bits of code are exactly
00:22:21.120
identical, except that rather than calling Ractor.current.default_port and receiving on that, we
00:22:27.120
can just call Ractor.receive. One is shorthand for the other; these two chunks of
00:22:33.679
code do exactly the same thing. Ports are just like queues. However, they have to abide by two rules. One is
00:22:39.840
that any Ractor can write to a port. And here
00:22:45.120
is the mind-bending rule: only the Ractor that
00:22:50.960
created the port can read from it. This takes a second to
00:22:56.080
wrap your head around. Only the creating Ractor can read, and I'm showing an example here on the left. This
00:23:01.840
example works fine. We create a port in the main Ractor, we pass it to a child Ractor, and the child is allowed
00:23:08.080
to write to that port, and the main Ractor is allowed to read from that port because it created it. On the
00:23:13.840
other hand, if we create a port on the main Ractor, pass it to a child Ractor, and the child tries
00:23:19.679
to read from it, you'll get an exception. It doesn't work. These are the rules we have to play by.
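A sketch of both rules, assuming the Ruby 3.5 Ractor::Port API (which is still settling, so details such as the exact error class may differ):

```ruby
port = Ractor::Port.new             # created by the main Ractor

# Rule 1: any Ractor may write to a port.
Ractor.new(port) { |pt| pt.send("hello from a child") }

# Rule 2: only the creator may read.
puts port.receive                   # fine: the main Ractor created `port`

# A child trying to read from a port it didn't create raises.
r = Ractor.new(port) do |pt|
  pt.receive                        # boom — this Ractor isn't the creator
end
begin
  r.join                            # the child's error surfaces here
rescue => e
  puts "child failed: #{e.class}"
end
```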
00:23:25.039
Of course, if you're coming from the threaded world, this may require a mental shift. I know
00:23:31.679
it did for me. Here's an example with threads: these threads are
00:23:37.280
all sharing references to exactly the same queue. We have many threads referring to the same queue, and
00:23:43.760
each child thread is trying to read from that queue. But if only one Ractor is allowed to read from
00:23:49.280
a port, how can you write this code example? How can you do this? It's a little more
00:23:54.720
complicated in the Ractor world, and I want to show you how we can accomplish this. In the Ractor
00:24:00.480
world we create a rendezvous point. I don't know the right name for it, but I call it a rendezvous point,
00:24:07.120
or a coordination point, for producers and consumers. First we have a coordinator. This Ractor is in
00:24:14.480
charge of collecting work from producers and handing that work out to consumers. So in this case we'll have a
00:24:19.919
coordinator. It takes the work in and then hands that work out to the consumers.
00:24:25.360
The biggest difference between this and the queue-based system is that all of the worker Ractors have to specifically
00:24:32.159
ask for work rather than pulling it off a shared queue. So our coordinator
00:24:37.760
says, "Hey, I'm going to wait for the next Ractor that wants work, and as soon as I get that Ractor, I'm going to give
00:24:44.320
that Ractor work." Each of the child Ractors has to ask for work. So
00:24:50.159
here is what the workers look like. Our workers say, "Hey, coordinator, I need work.
00:24:56.320
Please give me something to do." The coordinator will give it work. It sits there and waits until the
00:25:02.799
coordinator gives it work, and then it does whatever it needs to do. It's a little more complicated setup than
00:25:08.480
the queuing system, but what's nice is there are no locks or mutexes. We didn't have to write synchronize anywhere.
00:25:14.080
That's not a thing. There is no synchronization. There are no deadlocks. And what's even
00:25:20.480
better is that we get CPU parallelism in pure Ruby code.
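Here's a sketch of the whole rendezvous pattern — again assuming the Ruby 3.5 Ractor::Port API, with made-up job numbers:

```ruby
def fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

requests = Ractor::Port.new        # workers announce "I want work" here

workers = 4.times.map do
  Ractor.new(requests) do |requests|
    inbox = Ractor::Port.new       # created here, so this worker may read it
    loop do
      requests.send(inbox)         # ask the coordinator for work
      n = inbox.receive            # rendezvous: wait to be handed a job
      break if n.nil?              # nil means "no more work"
      puts "fib(#{n}) = #{fib(n)}"
    end
  end
end

# Coordinator: collect requests, hand out work, then shut everyone down.
jobs = (25..32).to_a
requests.receive.send(jobs.shift) until jobs.empty?
workers.size.times { requests.receive.send(nil) }
workers.each(&:join)
```

No Mutex, no Queue#pop contention — each port has exactly one reader, so there's nothing to lock.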
00:25:26.799
One more feature I want to show — and I think this particular feature is especially interesting to Rails developers. I was talking about how
00:25:34.799
mutable data gets copied. Let's say we have two Ractors; one Ractor creates a string and wants to pass
00:25:41.679
that string to a second Ractor. When that happens, it copies the data. Now,
00:25:46.799
there's one exception to this, which I think is a very interesting exception, and that is that the Ractor return value
00:25:53.039
is not copied — but only once. I'm going to show you an example to make this a little more clear. Let's say we have a Ractor like this. We're
00:25:59.840
creating a Ractor r. This Ractor allocates object o. We're going to
00:26:05.919
print out the object ID and the frozen state of the object. r returns the object o, and then our main
00:26:13.679
Ractor will read that object, get it, and then print out the object ID and the frozen status. And if
00:26:19.360
we do that, we'll see that these objects are identical. Oh no. Oh no, it's the same object. All
00:26:26.240
right, I'm going to do the next slide. I'm sad my little thing didn't line up. Anyway, if you run this code, you'll
00:26:33.200
see that neither of them is frozen. However, the object IDs are identical.
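Here's roughly what that example looks like — a reconstruction, using Ractor#value from Ruby 3.5:

```ruby
r = Ractor.new do
  o = "some mutable object"
  puts "child: #{o.object_id} frozen=#{o.frozen?}"
  o                   # the return value is moved, not copied
end

o = r.value           # read the return value in the main Ractor
puts "main:  #{o.object_id} frozen=#{o.frozen?}"
# Both lines print the same object_id, and frozen=false in both.
```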
00:26:41.679
So that means we are able to pass a mutable object between two Ractors without doing a copy. So why is this interesting? Why
00:26:49.120
would any of this be interesting to us as Rails developers? I'm going to tell you. Why did I ask that? I'm going
00:26:54.480
to tell you why; it's a rhetorical question. Let's talk about it. So,
00:27:00.880
yes, I am nervous. What is one use case for C extensions?
00:27:06.720
There is a use case for C extensions. Usually we're using C extensions to bind to native libraries —
00:27:12.240
like Nokogiri is a binding for libxml2. Of course,
00:27:17.760
there are many other C extensions. There is another use case, though, and that is releasing the GVL. In C
00:27:25.520
extensions, we actually have a way to release the GVL and do other work. A good example of this is the
00:27:31.360
bcrypt gem. The bcrypt gem is written in C. There are
00:27:36.720
three lines in the bcrypt gem that basically say: hey, I want to call this function, but I want
00:27:44.159
you to release the GVL, then call the function, and then acquire the GVL again. What this
00:27:51.120
means for us is that we're calculating password hashes without holding the GVL. Let's say, for example, you're using
00:27:57.440
Puma as your web server. Somebody goes to log into your application, and you start calculating that password hash.
00:28:04.159
While you're calculating the password hash, Puma is able to service another request in parallel. So we're able to
00:28:10.559
get CPU parallelism in C extensions by releasing the GVL.
00:28:16.000
What's nice about Ractors is we can kind of think of them as a "no GVL" block.
00:28:22.000
Let's imagine that we had bcrypt written in pure Ruby — pure Ruby, not C. We could write
00:28:28.480
something like this, where we create a new Ractor, calculate bcrypt, and return the value. And we can
00:28:35.120
think of this as a "no GVL" block where other threads can run. We can have our Puma web server, and it can serve
00:28:42.480
requests while we're calculating bcrypt — doing exactly the same thing the C extension did, but we were able to
00:28:48.399
write it in pure Ruby. Maybe not calculating bcrypt, though — a real-world
00:28:54.080
example might be parsing JSON. Maybe you have an API server and you're spending a lot of time
00:29:00.159
parsing JSON. Throw it in a Ractor, and all of a sudden you can handle other requests in parallel while you're
00:29:05.440
parsing that JSON. This seems like a very low-effort way to start introducing Ractors into your system and get some
00:29:12.000
parallelization without necessarily having a Ractor-based web server.
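A sketch of that "no GVL block" idea — parse_big_payload is a made-up name standing in for your own hot path, and the value call assumes the Ruby 3.5 API:

```ruby
require "json"

def parse_big_payload(json)
  # While this Ractor runs, it holds its OWN GVL — the main Ractor's
  # threads (e.g. Puma workers) keep running in parallel.
  Ractor.new(json) do |body|
    JSON.parse(body, freeze: true)  # frozen result: returned without a copy
  end
end

r = parse_big_payload('{"items": [1, 2, 3]}')
# ... the web server can service other requests here ...
data = r.value                      # collect the parsed (frozen) result
```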
00:29:17.679
So one thing I would like to pitch to all of you: when you upgrade to Ruby 3.5 in
00:29:22.720
the future — and you're going to do that right away, right? Yes. Yes. — try
00:29:28.320
wrapping your CPU-intensive code inside of a Ractor and see if that is able to help out your
00:29:33.679
parallelism. Of course, do this in Ruby 3.5. Another thing I want to talk about here is weird
00:29:39.360
bottlenecks that we've had to solve. We've been trying out Ractors, finding crashes, fixing
00:29:46.240
crashes, trying to improve the API — but I want to talk about one weird bottleneck, just because I think it's
00:29:52.720
very, very fun and interesting. Somebody filed an issue on the Ruby
00:29:58.399
bug tracker. I'm just going to give you a summary of the issue. The issue is: parsing
00:30:05.360
JSON is slower in Ractors than if I do it serially. If I try to parse JSON in parallel, it's actually slower than
00:30:11.840
if I just do one at a time. We've since
00:30:17.120
fixed this bug. But I want to tell you the problem, because I think it's a little bit surprising and interesting.
00:30:22.159
Yes, we fixed the bug. Nice. Well — John, not we. John fixed the bug. Great
00:30:28.880
job, John. So, here is the problem. When we parse this JSON here,
00:30:34.240
we get a hash back. And the key to the hash is a string. Interestingly, both of these strings are
00:30:42.000
exactly the same object. They have the same object ID. They get deduplicated when we
00:30:48.720
parse the JSON. And we have to do
00:30:54.080
this deduplication when we're doing it inside of a Ractor as well. So when
00:30:59.600
we parse the JSON here and check that key's object ID, we'll see: oh, this is indeed the
00:31:04.640
same object. The way this is implemented internally in Ruby is with something called the fstring table.
00:31:10.480
I think it stands for "frozen strings," because these strings are frozen. This fstring table is
00:31:17.120
essentially a global within Ruby, and we have to consult this global hash table in order to resolve those keys to
00:31:24.399
be exactly the same object. So while we were running multiple Ractors, those Ractors had to take a lock on this
00:31:30.640
fstring hash. We ended up getting lock contention on this global hash
00:31:36.640
table within the internals of Ruby — and you wouldn't necessarily know this just from looking at the Ruby code itself.
00:31:43.039
The solution was to turn the fstring table into a lock-free hash, which John was able to do.
00:31:50.000
And once he did that, parallel JSON parsing got 12x faster. So we should — yes, round of
00:31:55.600
applause.
00:32:01.600
Do people here parse JSON? Is that a thing?
00:32:07.760
Now, unfortunately, the fstring table is not the only thing like this that we've found within Ruby internals.
00:32:13.679
We also have a global ID table. This is for symbols, so symbols
00:32:18.880
get resolved into the same object — we had contention there. There's a thing called the CC table, which is for inline
00:32:24.880
caches; whenever we needed to create an inline cache or look one up,
00:32:29.919
we would end up locking. Another example is the encoding table, used when we're looking up string encodings. So,
00:32:36.080
internally to CRuby, we have all these global data structures that you wouldn't necessarily know are global
00:32:42.000
while you're writing Ruby code. Anyway, the point is that our team is trying to find these bottlenecks and
00:32:48.080
fix them — figure out how we can remove them so that when you upgrade, we can actually start using Ractors for
00:32:54.320
real in production. I'm going to move on to the next topic
00:32:59.440
now, which is ZJIT. ZJIT is a new JIT compiler that we're
00:33:06.080
going to be shipping with Ruby 3.5. I want to talk a little bit about the work that we're doing on it, and I want
00:33:12.880
to talk about what JIT compilers are in general, because people have asked me this. We'll look
00:33:19.360
at the differences between YJIT and ZJIT, and then I want to give some tips for Rails developers on how you can
00:33:25.519
write more JIT-friendly code in your application. First, before we get to the meat of this JIT
00:33:32.399
section, I want to define a term that was confusing to me when I first started working on JIT
00:33:39.279
compilers. Ruby's virtual machine is called YARV, and it's what we call a
00:33:44.320
bytecode interpreter. When we compile your Ruby code, we turn it into bytecode, and that bytecode gets
00:33:51.200
interpreted — we have a virtual machine that interprets that bytecode. So if anybody, namely me on this stage,
00:33:58.640
refers to something as "the interpreter," what they mean is this virtual machine that's interpreting that
00:34:04.000
bytecode. So what is a JIT? I think this is an important question to ask, because I think
00:34:11.040
there are a lot of definitions for what a JIT compiler could be. But to me, a JIT compiler is something that assembles
00:34:16.720
machine code at runtime, and is usually as lazy
00:34:21.760
as possible. So here's an example of something that could be considered a JIT compiler. Here we're saying, hey,
00:34:28.720
we're not going to define the attr accessor right now. We're going to wait until method_missing gets called, and
00:34:33.760
when method_missing is called, we're going to define the attr accessor. So here we'll say, ah yeah, we'll
00:34:39.839
just define that, and the next time the method is called we don't hit method_missing anymore, because we
00:34:45.040
generated this code at runtime. So we could kind of think of this as a JIT compiler. Of course, we wouldn't
00:34:51.599
actually write this in our code, right? Please don't.
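But for reference, it looks something like this — a reconstruction of the idea, not the exact slide code. Please don't ship it:

```ruby
class Lazy
  def method_missing(name, *args)
    # "Compile" the accessor only when someone first asks for it...
    self.class.send(:attr_accessor, name.to_s.chomp("=").to_sym)
    # ...then re-dispatch; future calls skip method_missing entirely.
    send(name, *args)
  end
end

obj = Lazy.new
obj.color = "green"   # first call: method_missing defines the accessor
p obj.color           # later calls go straight to the generated method
```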
00:34:58.560
But this is an example of something that could be considered one. So: we generate code at
00:35:04.560
runtime. I think of JITs as generating code at runtime. They're usually lazy — we try to be lazy about it, meaning
00:35:10.560
they're late. They do it later. And another important aspect is that they
00:35:16.960
should speed up our program. Unfortunately, I have written JIT
00:35:22.800
compilers that do not speed up programs. So yeah, I think this is an
00:35:28.560
important aspect. Another question people ask me is: how can a JIT speed up our code? Like,
00:35:36.079
how can it do that? The JIT compiler has to generate code that does exactly the same thing as our program
00:35:42.160
did originally. If it's doing exactly the same thing, how can it possibly be any faster than our bytecode
00:35:47.920
interpreter? I want to talk about that a little bit. The JIT and the interpreter must match: as I was saying,
00:35:54.960
whatever code the JIT compiler produces must have exactly the same behavior as the interpreter. So how can
00:36:01.760
we speed anything up? One thing we can do with a JIT
00:36:08.240
compiler is eliminate the interpreter overhead. The interpreter has overhead, and typically,
00:36:13.520
when we're running a program in Ruby, we'll have something that looks like this. We have our CPU, we have
00:36:19.440
YARV, which is our bytecode interpreter, and on top of that we're running our Ruby code. The interpreter is running
00:36:26.160
that Ruby code. YARV is a C program, and that C program is running on your
00:36:33.280
actual CPU. The way I think of a JIT compiler is that it's taking
00:36:38.800
this and getting rid of the YARV step. It's basically promoting your code to running directly
00:36:45.520
on the CPU. You can kind of think of it like this: you're running your code inside of a Docker container, but
00:36:51.680
somehow you're able to escape that, and instead of running inside of a container, you're running on the bare
00:36:57.520
metal itself. The JIT compiler is just doing that at runtime for you, automatically. The other way we can
00:37:03.520
speed up programs is by caching values. Here's an example —
00:37:08.880
again, back to our Fibonacci sequence, because that's what we do: speed up Fibonacci. We have an example here of
00:37:15.359
the number 35, and you can see, if we look at the YARV bytecode, we have a literal 35
00:37:20.400
in there. We have that number. What we can do in the JIT compiler is say, hey, I'm going to embed that
00:37:26.960
number directly into the machine code. So when we compile this, if you were to disassemble the machine code, you might
00:37:32.480
see something like this, where we have the number 35 directly in there. So that's another way
00:37:39.040
we can speed things up: rather than loading that 35 from memory, we already have it in the machine code.
00:37:45.520
Another thing we can do is speculate. We can speculate on values, and we're able to do this in a way that
00:37:51.359
the interpreter cannot. Let's say we have code that looks like this: sum = 2 + 5. Our JIT compiler might say,
00:37:58.720
"Hey, you know what? I don't think anybody is going to monkey patch plus —
00:38:03.839
hopefully." So in advance, it can say, "I'm just going to add those two
00:38:09.359
numbers together, get the number 7, keep that there, and return it rather
00:38:14.720
than executing the plus call itself." But a cool thing a JIT compiler can do that our interpreter
00:38:21.680
can't do is deoptimize. Let's say somebody does monkey patch plus after we did the
00:38:27.119
constant folding. Our compiler can detect that and say, "Oh, you
00:38:33.280
know what, I messed up — my speculation was wrong. I'm going to fall back to this
00:38:38.800
particular implementation." It just goes back to calling plus on the number and doing whatever the interpreter
00:38:44.160
would have done.
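You can actually see the literal sitting in the bytecode yourself with RubyVM::InstructionSequence (a real API) — the JIT just goes one step further and bakes it into machine code:

```ruby
# Dump the YARV bytecode for the addition example.
puts RubyVM::InstructionSequence.compile("sum = 2 + 5").disasm
# The output includes something like:
#   putobject  2
#   putobject  5
#   opt_plus   <calldata!mid:+, argc:1, ...>
# The JIT can fold this to 7 up front — and deoptimize if + is redefined.
```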
00:38:51.200
Another thing we can do is eliminate type checking. Yeah! Wait — who likes type checking? Yeah, type checking. Well, we
00:38:57.200
don't have to do it. We can eliminate it. Now, this is a different type —
00:39:02.320
a different type of type checking. I'm going to explain. When we have code like this —
00:39:07.760
our friend Fibonacci again — when it runs, we have to check: okay, when we do
00:39:14.640
this comparison, is n an integer? The interpreter has to check this. Is n an integer? If so, I'm going to do an
00:39:20.079
integer comparison. And then when we do minus, it's like, oh, is n an integer? If so, I'm going to do minus. In both of
00:39:26.880
these cases, in our compiler, we can say: well, I know that n is an
00:39:33.040
integer from the first check. I did that test, I was able to do that type check, and now there's no reason for me to do
00:39:38.320
it again later in the method, because I know for a fact that n is an integer. So it's able to eliminate those two
00:39:44.880
particular type checks. So these are different ways a JIT compiler can speed up code that an
00:39:51.520
interpreter cannot do. So let's take a look at yjit and zjit uh and how they
00:39:58.240
work and the differences between them. Who's who is using widget in produ production? Anyone? Yeah. Oh my gosh, so
00:40:04.400
many people. That's awesome. Thank you. That's great. All right. Uh wit is a
00:40:11.119
lazy basic lazy basic block versioning compiler. LBBV. Uh, of course, this
00:40:17.119
already ships with Ruby. You all are using it. Many many many of you are using it in production. Uh, widget uses
00:40:22.720
a technique that was pioneered by Maxim, the author of the the author of the JIT compiler in her PhD thesis on lazy basic
00:40:30.640
block version. So, I'm going to describe how this compiler works and then uh move
00:40:35.839
on to Zjit, the differences between that and Zjit. So, L this is how LBBV works in in action. LBBV ah this compiler
00:40:47.760
discovers basic blocks lazily and what a basic block is is a straight line of
00:40:53.760
code that has no jumps in it. So no if statements nothing like that. So it's just code until we find an if statement
00:41:00.960
that is a basic block. So in this particular example when we compile the add method what widget will do is it'll
00:41:07.520
go here and it'll say hey I'm going to compile y equals z. all of a sudden it gets to the if statement and it's like oh this is the end of a basic block
00:41:13.920
because it could jump between uh one of these branches. So it compiles that basic block and then it waits. It
00:41:21.040
basically waits and it says which side of this if statement am I going to execute?
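The method looks roughly like this — a reconstruction; the exact branch condition on the slide may differ:

```ruby
def add(a, b)
  y = 0                # basic block 1: compiled first, then YJIT waits
  if a.is_a?(Integer)  # end of block 1: execution could go either way
    y = a + b          # basic block 2: compiled only when this side runs
  else
    y = a.to_i + b     # never executed below, so never compiled
  end
  y                    # basic block 3: just "return y"
end

add(1, 2)              # only blocks 1, 2, and 3 get compiled
```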
00:41:26.720
Let's say it executes the top side of the if statement. In that case it'll compile that side of the if
00:41:32.400
statement — it'll create a new basic block until there's another jump — and after that we'll
00:41:39.280
create a third basic block down here that's just "return y". So we end up with something that looks like this. We
00:41:44.880
compile like that. And one thing to notice is that this code right here —
00:41:50.480
the untaken branch — never got compiled. We didn't compile it because we didn't actually use it. What this means
00:41:57.599
is YJIT gives us really, really fast warm-up, because we're compiling as
00:42:02.720
little code as possible. Another nice thing is that we have low memory overhead — again, because we're
00:42:08.880
compiling as little code as possible — and we also have low overhead on type discovery. What I mean by this is,
00:42:15.440
when YJIT pauses at those particular locations, it's able to look at all the values on the stack and say, "Oh, n —
00:42:22.480
that's an integer. I know that's an integer," and then generate code that's specific to integers.
00:42:28.480
Some of the downsides of YJIT, though: register allocation is a little bit harder. Whenever we're
00:42:36.079
dealing with variables in our program, we want the compiler to keep those variables in registers, and that's
00:42:41.680
because registers are faster to access than memory. So we really, really want to keep those values
00:42:47.839
inside of registers. Unfortunately — ah, I went too far — registers are a finite
00:42:53.920
resource. So we can't just put all of our variables in registers. We can only put some of them in registers some
00:42:59.359
of the time. So here's an issue. Let's say we're compiling this code with YJIT.
00:43:05.440
I've rewritten the code a little bit here, but the functionality is exactly the same. We have y0 at the top, we're defining
00:43:12.720
a variable y1 in both branches of the if statement, and then we return
00:43:17.839
y1 at the end.
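Something like this — again a reconstruction:

```ruby
def add(a, b)
  y0 = 0
  if a.is_a?(Integer)
    y1 = a + b       # if this side puts y1 in register x0...
  else
    y1 = a.to_i + b  # ...the other side must put y1 in x0 too
  end
  y1
end
```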
00:43:23.119
So let's say YJIT compiled this and turned it into machine code, and we decided that the value y0 should go into the x0 register on my crappy ARM
00:43:37.280
machine. So we put it in there; that's where we decide to put it. Now,
00:43:43.280
when YJIT goes to compile the other branch — let's say we execute this code again and we want to compile the
00:43:50.560
else branch right here — the question is, which register do we use? We have to
00:43:58.000
maintain the knowledge that the other side of the if statement used x0 as the register for the value of y1. And this is
00:44:05.359
actually a very, very difficult problem. This particular case — this if statement — is a big problem in
00:44:11.200
compilers, and one of the main things that folks focus on. The way we handle this in YJIT is we
00:44:16.800
just end up writing these local variables to essentially known locations in memory. They're written to
00:44:23.599
known locations. Rather than keeping track of all the registers, we basically treat it as if it has an
00:44:30.720
infinite number of registers, where those registers are essentially just memory. I'm not going to get into the
00:44:36.319
details, but as we discussed, reading memory is not as fast as reading registers. We'd prefer to keep things
00:44:41.760
in registers. Comparing this with a method-based compiler — a method-based compiler is a little different. It
00:44:48.319
says, "Oh, okay, I'm just going to look at the entire method," and it'll compile both sides of the if statement.
00:44:53.680
And because it has the context of all the variables used on either side of those if statements, it's able to figure
00:44:58.720
out: oh, okay, on one side we stored y1 in x0, so we're going to do
00:45:05.200
the same thing on the other side. Then all of our code works together. We're able to keep those values in registers 100% of the time, rather than
00:45:10.720
spilling them to memory a lot of the time. Another nice thing
00:45:16.960
about method-based compilers is that it's a little easier to fold constants. As I said, we can do more
00:45:23.200
efficient register usage. The other thing is, it's a little easier to learn. A lot of the compiler
00:45:29.200
documentation out there focuses on method-based compilers, so you can use traditional resources on
00:45:35.520
compiler theory with a method-based compiler. Now, the downside is that it might compile too much
00:45:42.000
code. As we saw earlier, if we don't take one side of that if statement, a method-based compiler is going to compile
00:45:47.359
it anyway, where YJIT would have done nothing. Another downside is that tracking types
00:45:52.560
is a little harder. We're not pausing in the
00:45:58.720
actual code with a method-based JIT, so what we have to do is start tracking types as the
00:46:06.319
interpreter is executing. We had to add some infrastructure for tracking types. So I guess
00:46:12.640
the idea with ZJIT was, we thought: what if we could
00:46:18.400
take what we had learned from YJIT — how to actually put a JIT
00:46:24.640
into CRuby, how to deploy it, how to do development on a JIT compiler, how to
00:46:30.400
make it low-overhead — what if we could take all of those techniques and apply them to a method-based compiler? And
00:46:35.440
that's where we came up with the idea for ZJIT, a next-generation JIT compiler,
00:46:41.520
which is essentially our method-based JIT. Now, I am very, very excited about this work. However,
00:46:49.040
I would like to temper your expectations a little bit. It is very, very new. So, if I can make a
00:46:56.240
humble request: lower your expectations of this JIT compiler. A little bit lower —
00:47:02.560
there, that's probably good. It's not rock solid at the moment. If you go
00:47:08.160
check out Ruby edge and try to run it in production — like all of you are going to do after this
00:47:16.000
presentation, right? Yeah. Yes. — you can use ZJIT if you run ruby
00:47:21.359
--zjit. But it's probably not going to be as fast as YJIT. Wait — no, not
00:47:27.440
about like what what can I do today? Like what can I do today to write JIT friendly code?
00:47:34.319
So as a as a Rails developer, why do I care about this and what can I do about it? Now, me working on compiler stuff, I
00:47:43.200
really want to say to you, write whatever code that you want because the idea is like you should be able to write
00:47:49.839
any code that you want to and then we'll just figure out how to speed it up so that you don't have to change anything.
00:47:56.079
But, uh that is theoretically how things should work. But the the truth is that
00:48:01.200
some patterns are easier to speed up than others. So, otherwise like why you
00:48:06.960
know why bother profiling your code? just wait for the newer version of Ruby. But sometimes we do we do want to speed
00:48:12.319
things up. Uh there's an opportunity cost. So you can wait for us to improve the JIT compiler or you can write code
00:48:19.680
that is a little bit more friendly and have those speed ups today. So do you want to wait or do you want to have
00:48:24.880
speed ups today? And I'm going to give you one. I said I was going to give you pro tips, but I'm going to give you just
00:48:31.040
one pro tip. We are going to monomorphize these pro tips.
00:48:38.319
Thank you, Uffuk. Yes. So, uh,
00:48:44.559
we're gonna talk. Thanks. We're going to talk about monomorphizing call sites.
00:48:49.599
This is a a very dumb compiler joke. I'm so sorry. Um, in order to describe to
00:48:55.440
you what this is, first I want to describe polymorphic polymorphic call sites. We're all familiar with
00:49:00.559
polymorphism, right? We use polymorphism every day. Yes. Yes. We write an O.
00:49:06.079
Polymorphism is great. Here's an example of a polymorphic call site. So we're we have two classes A and B. Uh and we're
00:49:13.599
passing instances into call fu. We call foo on the thing. This is what we call a polymorphic call site. And it we call it
00:49:20.480
polymorph polymorphic because it sees different types. So um computer vision is getting really
00:49:27.680
good. It's able to see see them. Uh yeah. So here we have an
00:49:33.119
instance of A and an instance of B both passed to the same method and this thing calls a method on two different types.
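Something like this — class and method names reconstructed from the description:

```ruby
class A
  def foo = "it's an A"
end

class B
  def foo = "it's a B"
end

def call_foo(obj)
  obj.foo            # polymorphic: this call site sees both A and B
end

call_foo(A.new)
call_foo(B.new)

def call_foo_mono(obj)
  obj.foo            # monomorphic: this call site only ever sees A
end

call_foo_mono(A.new)
call_foo_mono(A.new)
```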
00:49:40.480
Now we'll compare this with monomorphic. All "monomorphic" means is that it sees just one type. It's one versus
00:49:46.960
many. In this case we only see one type here. It's a monomorphic call site, because it only ever sees that
00:49:53.680
one A type. If you have a polymorphic call site and you JIT compile that
00:50:00.960
code, the JIT code is going to end up with a whole bunch of tests: okay, is this an instance of
00:50:08.000
class A? Is it an instance of class B? Is it an instance of class C? Etc., etc. The more types you
00:50:13.920
see there, the longer it's going to take to find that particular method and call it. So if you can monomorphize that call
00:50:21.119
site — make it see only one type — you can actually speed up that call site very nicely. And I have a real-world example of it today. Here
00:50:28.400
we had a pull request for visitor methods in Prism, and we were
00:50:34.800
able to see a 13% speed improvement over Ruby 3.4 by
00:50:40.480
monomorphizing call sites inside of Prism. And this isn't even
00:50:45.839
with the JIT compiler — this is just with the normal interpreter. So this particular pro tip will work for you whether you're
00:50:51.119
using YJIT or not. The other thing is, it's not just about which
00:50:57.760
class you use. It's about instance variables as well. We can think of this as a polymorphic instance
00:51:03.680
variable read, because depending on the branch you take in initialize, you'll end up with two different object shapes.
00:51:09.920
So the order in which instance variables are set actually matters.
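For example — a sketch with made-up ivar names:

```ruby
class Slow
  def initialize(flag)
    if flag
      @a = 1; @b = 2   # shape: [@a, @b]
    else
      @b = 2; @a = 1   # shape: [@b, @a] — same data, different shape!
    end
  end
end

class Fast
  def initialize(flag)
    @a = 1             # always set ivars in the same order:
    @b = 2             # every instance gets one shape
  end
end
```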
00:51:16.559
The thing that I want to convey to you is — I don't want to say "don't use polymorphism, polymorphism is terrible." I think it's
00:51:21.839
great. I really think it's great. What I want to convey is: use useful types, and set instance
00:51:30.240
variables in a consistent order. Try to set all of your ivars in a consistent
00:51:35.359
order; don't set them randomly. When I say "useful types," what does that mean? What is a useful type?
00:51:41.920
When I say useful, I mean useful to your application — something that actually provides you with value. For example,
00:51:48.000
maybe you have multiple different credit card processors, and you want some sort of
00:51:55.040
strategy object that changes depending on the payment processor backend that
00:52:01.440
you're using. In this case we have an example where polymorphism is really helping out our application. It's
00:52:07.280
encoding business logic. It's making the code easier to understand and easier to maintain. This is something that
00:52:14.079
should be celebrated, something we should be doing in our code. When I say low-value polymorphism,
00:52:20.880
what I'm talking about is code that looks kind of like this, where we've got a cache and we're passing a key to the
00:52:26.800
cache, and we're just calling to_s on that thing. The reason we're calling to_s on it is that we want to take
00:52:32.480
either strings or symbols as our inputs, and we need them to be consistent when we look them up in
00:52:38.400
the cache. So we're just calling to_s. Now — this is my
00:52:45.040
very hot take on stage today — I don't think this type of code is very valuable, because what are we
00:52:51.599
actually getting out of this? You could just change the caller to use a string instead. We could say, "Oh, instead
00:52:57.119
of using a symbol here, let's be consistent and always use a string." Or maybe you're like,
00:53:03.359
"No, no, no, Aaron. I need to use a symbol there. We don't know where the data is coming from. It could be a symbol; we
00:53:10.079
need to handle that." So maybe instead of doing the to_s inside the lookup method, we could change the caller and
00:53:15.839
say: let's call to_s at the call site. If you do that, you end up with two monomorphic call sites. And
00:53:23.359
of course, since it's always a consistent type going into the lookup method, we can just remove the to_s and
00:53:28.640
say: now we only have a single monomorphic call site.
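A sketch of that before and after — the cache and key names here are hypothetical:

```ruby
cache = {}

# Before: lookup accepts Strings or Symbols, so key.to_s is a
# polymorphic call site.
def lookup(cache, key)
  cache[key.to_s]           # to_s here sees String AND Symbol
end
lookup(cache, "users")
lookup(cache, :users)

# After: normalize at the call sites instead. Each to_s is monomorphic,
# and the one inside the lookup method disappears entirely.
def lookup2(cache, key)
  cache[key]                # key is always a String now
end
lookup2(cache, "users")
lookup2(cache, :users.to_s) # this to_s only ever sees a Symbol
```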
00:53:36.000
This is the type of — in my opinion — low-value polymorphism that we should be removing from our applications. A famous person once said: keep
00:53:44.720
useful polymorphism; remove useless polymorphism.
00:53:51.599
I love quoting myself. All right, let's wrap this up. We talked about
00:53:56.640
Ractors. We talked about parallelism. We talked about weird bottlenecks, which I thought were very, very fun. We
00:54:03.119
talked about how Ractor.new can be thought of as a "no GVL" block. We talked about JIT compilers — how and why
00:54:10.960
they work. We talked about the differences between ZJIT and YJIT, as well as writing JIT-friendly code
00:54:16.720
by monomorphizing our code. So I want all of you to please, please upgrade to
00:54:22.000
Ruby 3.5 now. Right now. I'll wait. I know we have a party coming. Oh, don't —
00:54:28.720
no, don't get up — hold on, one more thing. I have a question for all of you. Do
00:54:34.880
you allocate objects in your Rails app?
00:54:40.319
Yeah! Object allocation. Yes, let's cheer on object allocation. Woo.
00:54:45.599
I love asking questions I know the answer to. Yes, we allocate. Woo.
00:54:53.920
Allocations are much, much faster in Ruby 3.5, which I hope is a good
00:54:59.200
reason for all of you to upgrade. I want to show you a benchmark. We made them faster. We have
00:55:04.559
a User class here, and we're going to instantiate it 500,000 times. We
00:55:11.359
actually made this 70% faster on Ruby 3.5 than it is in Ruby 3.4.
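The benchmark is roughly this shape — the exact slide code may differ:

```ruby
require "benchmark"

class User
  def initialize(name, email)
    @name  = name
    @email = email
  end
end

# Allocate 500,000 User objects and time it.
puts Benchmark.measure {
  500_000.times { User.new("gorby", "gorby@example.com") }
}
```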
00:55:24.160
So, please, please, please upgrade. It's been an honor to be here with you this year. I'm so happy I could come to
00:55:29.520
Amsterdam. Oh, wait. I have one more joke. So,
00:55:34.880
I was going out to dinner here in Amsterdam, and I was really, really worried about whether or not they
00:55:41.200
would split the bill for me. But it turns out that everybody here is dining Dutch.
00:55:49.839
Oh, come on. All right. Thank you.