
Closing Keynote

Aaron Patterson • September 05, 2025 • Amsterdam, Netherlands • Keynote

Closing Keynote at Rails World 2025

Aaron Patterson, a long-standing Ruby and Rails core team member and Senior Staff Engineer at Shopify, delivered the closing keynote at Rails World 2025. The primary focus was on innovations Shopify’s Ruby & Rails Infrastructure team is contributing to Ruby core, specifically around enabling better parallelism (via Ractors) and introducing a new method-based JIT compiler (ZJIT). Patterson also shared actionable advice for Rails developers on optimizing their code for JIT performance and parallelism.

Main Theme

The main topic addressed was improving Ruby’s concurrency, parallelism, and runtime performance, making Ruby and Rails applications faster and more scalable for production use cases.

Key Points

  • Shopify’s Infrastructure Work:

    • The team’s goal is to improve machine utilization and throughput without increasing application latency.
    • Challenges with serving a mix of IO-bound and CPU-bound requests on multicore machines and Ruby’s historical process-based parallelism limitations were discussed.
  • Parallelism and Web Servers:

    • Demonstrated the limits of Ruby threads and fibers for CPU-bound work, noting that only processes (and now Ractors) provide true CPU parallelism.
    • Described a theoretical model for a load-aware web server that uses HTTP/2 for back pressure.
  • Ractors in Ruby 3.5:

    • Patterson explained Ractors as Ruby’s actor-style concurrency solution, providing true parallelism by assigning each Ractor its own GVL (Global VM Lock).
    • Emphasized current experimental status but shared ongoing efforts to stabilize and improve them in Ruby 3.5.
    • Provided examples on using Ractors, rules for passing immutable data, and communication via ports.
    • Noted design trade-offs: Ractors are harder to use than threads, but they eliminate the need for mutexes and avoid common issues like deadlocks.
    • Illustrated a production bottleneck (JSON parsing contention) and how Shopify’s team removed global lock contention, significantly speeding up parallel JSON parsing.
  • JIT Compilers: YJIT and ZJIT:

    • Explained what JIT compilers are and how they can improve performance by removing interpreter overhead, caching values, speculating on types, and eliminating type checks.
    • Compared YJIT (lazy basic block versioning) with the new ZJIT (method-based JIT). YJIT compiles code on-demand and is efficient for warm-up and memory usage, but ZJIT can produce better register allocation and performance in some cases.
    • ZJIT is new and experimental but aims to build on YJIT’s foundations.
  • Pro Tips for Rails Developers:

    • Advised on writing JIT-friendly, performance-oriented code:
      • Monomorphize call sites (use consistent types to help the compiler).
      • Set instance variables in a consistent order.
      • Avoid unnecessary, low-value polymorphism.
    • Real-world examples demonstrated how small adjustments can produce double-digit performance gains, even without a JIT.

Conclusions & Takeaways

  • Ruby 3.5 will introduce both faster object allocations and substantial improvements to parallelism and JIT compilation.
  • Developers are encouraged to experiment with Ractors and wrap CPU-intensive work in Ractors to utilize all CPU cores efficiently.
  • Upgrading to Ruby 3.5 is recommended to achieve these performance benefits in Rails apps.

Relevant Examples & Anecdotes

  • Demonstrated benchmarks for request handling, Fibonacci calculations using threads/fibers/Ractors, and the effect of Ractor improvements on JSON parsing.
  • Discussed real-world issues Shopify’s team faced and resolved in Ruby internals related to global shared tables.
  • Used relatable programming sketches, jokes, and compiler metaphors to explain complex concepts accessibly.

Audience

This talk targets Ruby and Rails developers interested in concurrency, application optimization, and upcoming Ruby core features.

Closing Keynote
Aaron Patterson • Amsterdam, Netherlands • Keynote

Date: September 05, 2025
Published: Sat, 13 Sep 2025 00:00:00 +0000
Announced: Tue, 20 May 2025 00:00:00 +0000

In the #RailsWorld Closing Keynote, Aaron Patterson (Ruby core team member since 2009, Rails core since 2011, and Senior Staff Engineer at Shopify) talks about the work that Shopify’s Ruby & Rails Infrastructure team is tackling in Ruby core, including Ractors for better parallelism and a new method-based JIT compiler, ZJIT, and shares some pro tips for Rails developers on writing JIT-friendly code.

Rails World 2025

00:00:16.240 welcome you to Rails World. Um, yesterday I learned something really
00:00:21.439 interesting and that is that if you use a specific color like a green color as
00:00:27.599 the background on your slides, it becomes transparent like this. Yeah.
00:00:33.200 Isn't that cool? Yeah. Yeah. So, my display up here shows the slide like
00:00:38.800 this, but it's got a green background. So, they actually key it out for you. I wanted to do this in my slides
00:00:43.920 today because uh I wanted to be uh completely
00:00:49.600 transparent with all of you. Yes. Thank you. Yeah. Yeah.
00:00:57.920 Uh h happy Friday. Happy Friday everybody. Say happy Friday please. Yes.
00:01:04.640 Uh it is always Friday somewhere and today it is Friday here and I'm very happy that it is actually Friday. The
00:01:11.040 sad thing is though today I had planned out my keynote in advance, done all this
00:01:16.799 work. I was going to talk to all of you about uh system tests today
00:01:22.880 using my Macintosh.
00:01:29.040 But but since I can't do that today, uh today I'm going to talk to you about David's keynote. Uh
00:01:36.240 I really really enjoyed David's keynote. Uh Now we all know how much uh David thinks
00:01:43.520 about the Roman Empire. Uh I'm also I'm also really excited about the new framework action push.
00:01:50.159 That seems really exciting. Right. Right. Yes. It's very exciting. Uh but
00:01:55.600 I'm going to let you all in on a little bit of inside baseball. It wasn't actually originally called Action Push. It was actually originally called Action
00:02:02.479 Push Native. Like that was the original name. But uh the rest of us on the core team, we didn't really like that name so
00:02:09.679 much. Uh so we gave David some active push back. Um but the thing is when it
00:02:16.239 comes to active push, when active push comes to action shove.
00:02:22.000 David can be a reasonable person. So he renamed it and I appreciate that. Um I I
00:02:28.239 also enjoyed David's presentation about Omarchy, and I hate to correct him on stage. Wait, no, never
00:02:34.879 mind. I love to correct him on stage. Um, it's
00:02:40.720 actually GNU/Omarchy, just
00:02:52.239 anyway. Um, I I really didn't have David growing a neck beard on my Rails World
00:02:58.160 bingo card this year. Uh, yes. I was also excited about 30,000 assertions
00:03:06.239 in 2 minutes. That's really wild, right? But I mean, come on, please. I know how to beat this. I can easily beat
00:03:12.560 this. So, I want to I want to give a demonstration to you. This is this is intense. Uh, right here, we're going to
00:03:18.000 beat that number. I have a test right here. Let's go. Um, so we're going
00:03:23.760 to do 30,001 assertions. Yeah, look at that. Using mega test.
00:03:31.680 Look at that. 10 milliseconds. 10 milliseconds. Yes.
00:03:42.000 And this was all done on my slow old MacBook Air M4. Terrible.
00:03:49.599 Anyway, my name is Aaron Patterson. Hello everybody. On the internet, my name is Tenderlove. Um,
00:03:55.760 I've been on the Ruby core team since 2009 and I've been on the Rails core team since 2011. Uh, I don't know how
00:04:03.840 many of you are here this morning, but it is true I do speak Japanese. And I want to give like a very short lesson
00:04:10.879 like I'm going to teach you all a handy phrase today. Uh, the phrase is omachi kudasai.
00:04:19.199 And uh, what it means is, like, can you wait a little bit, please? And you'll hear this all the
00:04:24.639 time. What David didn't know is that omachi is like super duper popular in Japan. Like really popular. You you'll
00:04:31.120 hear this phrase when you go to restaurants, when you go to hotels, everywhere. Uh so I want to teach you
00:04:37.120 another phrase, too. Like, what if you're going up to somebody and you say, "Hey, I'd like to have a little Omarchy,
00:04:42.639 please. Can I have that? I would like to try out this operating system." You can say exactly the same
00:04:48.160 thing: omachi kudasai. That is how you would say
00:04:54.880 it. Um, I work as a Senior Staff Engineer at a mom-and-pop startup called Shopify.
00:05:01.199 Small company. Hopefully you've heard of us. We use Ruby and Ruby on Rails. Uh I
00:05:07.280 think we run probably the biggest Rails app in the world. Now unfortunately I
00:05:13.280 didn't really know how to measure that. Like how do you measure what is the biggest app in the world? So the way I
00:05:19.039 decided to measure it was by font size, and indeed we have the
00:05:25.919 biggest Rails app in the world. Um, since today is Friday and this is the last
00:05:32.000 talk of the conference, I thought we would try to have some fun and talk about some very, very light
00:05:38.080 topics. So I hope you're all excited for that. Uh, just I'm just
00:05:43.680 kidding. We are not we are not going to do that today. Today we are going to have a very very technical presentation and I apologize. I know for some of you
00:05:51.120 you're happy because this means the end of the pun section of my presentation and onto the technical section of my
00:05:57.600 presentation. But like why why am I doing a technical presentation?
00:06:03.919 Uh the reason is mainly because I love programming. I love programming a lot.
00:06:10.000 Uh, I like to do it as my hobby and I also get to do it as my job and I just
00:06:15.039 really love it and I'm excited about the things that I work on. I'm very excited. Yeah, slide. There we go. I'm very
00:06:21.759 excited and I'm really excited to share the stuff that uh I've been working on with my team at work. So, that's what I
00:06:28.000 want to talk about today is mainly uh work stuff. So today I'm going to talk about work stuff, specifically the stuff
00:06:34.240 that my team has been working on at Shopify and how it's going to improve our lives and your lives, as well as
00:06:40.800 some pro tips for Rails developers. Uh at Shopify, I'm on the Ruby and Rails
00:06:47.840 infrastructure team. And you may be surprised to learn this, but we work on
00:06:53.759 Ruby and Rails and infrastructure. So we are the Ruby and Rails
00:06:59.199 infrastructure team. Our team works on
00:07:05.120 a lot of stuff, but I want to give you some context around the work that we do, and a big goal of
00:07:12.479 ours is to improve machine utilization at work so we want to work on performance and that helps us improve
00:07:18.960 machine utilization I want to make that a little bit more concrete and give you an example from work of the like kind of
00:07:25.280 things we're focusing on and I want to do this to provide context for the work that we do inside of uh Ruby and Rails.
00:07:32.720 Uh so one of the things that we want to do is increase, and when I say
00:07:37.759 improve, what I mean is: we want to increase the amount of parallel
00:07:43.919 work that we do on a machine, but we don't want to increase latency. So for example, we want to
00:07:49.520 service more requests but we don't want to destroy latency for anyone. So we want to improve the amount of parallel
00:07:54.879 work that we're able to handle anywhere. Uh, and I'm careful not to say requests necessarily because we're really talking
00:08:00.879 about parallel work. It could be web servers, test suites, whatever. Uh, we we just want to get more work done on a
00:08:06.800 machine, but we don't want to degrade latency for anybody. And to make this these terms a little bit more concrete,
00:08:13.840 uh, we have a very large application at work uh, that has a very very unpredictable workload. And
00:08:19.520 unfortunately what this means is when we get a request coming in, we don't know whether that request is going to be IO
00:08:25.199 bound or whether it's going to be CPU-bound. We don't really know that. And what this means is,
00:08:32.399 let's say we have a process-based web server with four cores. This is a mom-and-pop
00:08:40.399 startup, so we can only afford four-core machines. Um, let's say there's
00:08:47.680 four cores, and we know that some of the requests are going to be IO-bound and some of them are going to be CPU-bound, and we want to
00:08:54.320 utilize this machine as much as possible. So what we'll do is we'll fork off, say, 1.5 times the
00:09:00.080 number of cores. So we'll have six processes on our four-core machine. We'll pre-fork them. Each process can
00:09:07.680 only handle one request at a time, so we can handle six requests in parallel. But I want to consider two extreme
00:09:15.360 cases. Let's say we get six requests coming in and they're all doing IO-bound
00:09:20.959 work. So if we if that happens, all of these processes are basically like doing IO stuff. I don't know, writing to a
00:09:27.519 database, doing whatever. And our CPU utilization, our CPUs aren't doing anything at all. This is kind of a
00:09:33.200 bummer because our web server could take on more work, but uh it's not. we're
00:09:39.040 we're not processing any more requests. Now let's consider the other end of the spectrum. Let's say we get six processes
00:09:46.720 coming in or six CPUbound requests coming in. Now we have six processes that are all fighting over four CPUs. So
00:09:55.440 unfortunately this impacts our latency, and we don't like that either. This ends up increasing latency
00:10:02.320 because we have noisy neighbors. They all want to get some time on the CPU. So some of them have to be scheduled off
00:10:07.440 the CPU and that's just not good for our end users. And a side note like I want to go on a little bit of a side note.
00:10:13.200 This impacts all web servers. This isn't just process-based web servers. It's also like Falcon, Puma, whatever. And
00:10:20.959 the reason this happens is because we only have one construct in Ruby for doing CPU-bound parallelization, and
00:10:28.160 that's processes. We can only run code in parallel with processes. It's the only way we can do it. And I want
00:10:34.959 to show a little bit of a demonstration here. So let's say we've got two examples. Uh we're going to calculate a
00:10:40.560 Fibonacci sequence because that's what we do at work.
00:10:46.160 Fibonacci sequence. There's going to be a lot of this in the presentation. We're going to do it sequentially versus threads. So we'll do
00:10:53.760 one just straight line and we'll compare it to threads. If we do this on my lowly
00:10:59.279 MacBook M4, we get it done in about 2 seconds. If we do the same example using threads, unfortunately, it
00:11:06.480 takes the same amount of time, 2 seconds. So, we didn't get any parallelization, zero.
00:11:11.760 Now, let's compare that again to fibers, for example, using the async framework. Again, 2 seconds is our
00:11:18.560 baseline. If we run this example with async again 2 seconds, we're not doing any better than before.
00:11:25.760 fibers will take the same amount of time. Let's do this again, but this time we'll use processes. Of
00:11:31.200 course, run linearly it takes two seconds. If we run this with processes on my machine, it was about 480
00:11:37.200 milliseconds, around there. So we're actually seeing some real parallelization here.
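(For reference, a minimal sketch of the kind of benchmark being described; the fib workload and counts are illustrative, not the exact slide code, and the process case assumes a fork-capable platform.)

```ruby
# Threads share one GVL, so CPU-bound work gains nothing; forked processes
# each get their own GVL and can use all the cores. Illustrative sketch.
def fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

require "benchmark"

Benchmark.bm(10) do |x|
  x.report("serial")    { 4.times { fib(30) } }

  # CPU-bound work on threads: still serialized by the GVL.
  x.report("threads")   { 4.times.map { Thread.new { fib(30) } }.each(&:join) }

  # One process per job: true CPU parallelism.
  x.report("processes") { 4.times { fork { fib(30) } }; Process.waitall }
end
```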
00:11:45.920 So, let's return to web servers now for a minute. What we'd really like to have is a load-aware web server
00:11:52.079 where uh for example let's say our first four requests come in they get scheduled
00:11:57.120 to these different processes. Let's say all of them are IO-bound; they come in, and now another
00:12:03.279 request comes in. We can take that on; we'll just spin up another process and
00:12:09.040 take it on, because our machine isn't that busy; we can take on more. Now let's compare that to, say, we get a
00:12:15.920 CPU-bound request coming in. So the CPU-bound request will come in, we start getting load on the CPU. Eventually
00:12:22.720 we start using up all four of our CPUs and then when the next request comes in, uh we'll say, "Yeah, you know, we can't
00:12:29.040 take that. Let's not do that one right now. We're busy. Can you send that off to another machine? We're
00:12:35.839 currently at capacity." But the question is, how do you do this? And this is going to be a little bit handwavy here, but an idea that
00:12:42.399 we've talked about is to provide back pressure using HTTP/2. The reason we are thinking about
00:12:49.600 doing this is because H2 can send information upstream
00:12:54.959 asynchronously. So we can say, like, "Oh hey, proxy, I'm busy." We can set max
00:13:00.880 concurrent streams for example and say like I can't take any more streams right now. Please like send data somewhere
00:13:07.120 else. So we can actually provide back pressure to the proxy. This solves the
00:13:12.240 communication gap between the router, the proxy, and the web server, and we should be able to load balance stuff better. And this is, by the
00:13:18.800 way this is all theoretical. This is what we would like to get to. So let's say we had this uh let's say this
00:13:25.360 actually existed. What if it did exist? Now when a request comes in, I showed
00:13:30.639 this before like what do we do in this particular case? We have to create a new process, right? we have to spin up a new
00:13:37.360 process. Now, how do we do that? What is the code that we write to do that? One idea is,
00:13:44.480 well, we could fork. We could just fork a new process in our web server and then take on that request. But the question
00:13:49.920 is, can we fork fast enough? We don't know if that's true. Uh, another potential
00:13:55.920 answer is we could say Thread.new or Fiber.new: create a new thread, create a new fiber. But as
00:14:01.600 we saw in the previous benchmarks, those can only handle IO-bound requests. They can't handle CPU-bound requests. So what
00:14:08.720 could we do? And the answer is Ractor.new. This is where Ractors come into play. We can absolutely allocate
00:14:15.440 new Ractors fast enough, and they allow us to handle CPU-bound parallelization.
00:14:21.120 So this is kind of the context for our team. This is what we've been working on. uh the stuff that I've described
00:14:27.680 these production problems are the things we've been thinking about, and what I want to discuss today is,
00:14:34.720 from the language level, where we're trying to attack these problems. We're trying to attack them on
00:14:40.480 two different fronts. The first front is multi-CPU performance, so
00:14:46.000 multi-core performance, and that would be working with Ractors. We're hoping that Ractors
00:14:53.199 will allow us to make the most efficient usage of all CPUs on the machine at the same time. But this doesn't address
00:15:00.560 single-core performance. For single-core performance, we're working on a new JIT compiler called
00:15:07.600 ZJIT. And I want to talk about both of these efforts today, Ractors and ZJIT. So first let's
00:15:16.880 discuss Ractors. Uh this year my team has been working on improving Ractor speed and usability in Ruby 3.5. John
00:15:23.279 Hawthorn, if you've met him here at the conference, he is leading the project, and our team has been working
00:15:29.519 really closely with Koichi Sasada, who is the original author of Ractors. So we're
00:15:35.600 uh we're working on these now you might be asking we get this question all the time we have a lot of concurrency
00:15:42.240 choices in Ruby which one is the best one we have threads we have fibers we have processes we have ractors which one
00:15:47.600 is the best and the problem is if you ask you know talking heads like me what
00:15:53.839 the answer is to this, they'll say "it depends," but I think this is a small dog answer.
00:16:04.240 I think the big dog answer is Ractors. You just always use Ractors.
00:16:15.279 Also, uh, John made a really great logo for Ractors, which I want to
00:16:29.199 all right. So, uh, if you're not familiar, let's discuss Ractors. Ractors are Ruby actors,
00:16:35.839 and that's where the name comes from. So, it's Ruby actors: Ractors. They are basically an actor style of
00:16:42.720 parallelism in Ruby, and Ractors give us true parallelism. Now,
00:16:48.320 unfortunately, well, actually, no, it doesn't matter. Why did I say unfortunately? There is still a GVL. Ruby still has a GVL, but the
00:16:56.000 way that we've designed the system is such that each Ractor has its own independent GVL, and they can
00:17:02.880 all run independently of each other. And that means, oh, let's do the slide. Each Ractor has its own GVL that can run
00:17:08.559 independently of each other. And that means that we can get true parallelism out of all of them. So let's do our
00:17:15.520 Fibonacci sequence test again. If we run this serially, of
00:17:20.720 course it's two seconds, like you saw before. If we try this with Ractors, we'll hit 480 milliseconds. So we're
00:17:26.720 able to do true parallelization with Ractors. And just to refresh
00:17:33.520 your memory, I've got a benchmark here where we're checking our base measurement versus threads, fibers,
00:17:39.039 processes, and Ractors. And you'll see that the first three are around 2 seconds,
00:17:44.400 whereas if we use processes or Ractors we can actually get down to 480 milliseconds. So again, true
00:17:51.919 parallelism: the only way we can get that in Ruby is via processes or Ractors, and we want to focus on the Ractor solution.
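(A hedged sketch of the Ractor variant; Ractor is experimental, and the result-reading API differs across Ruby versions.)

```ruby
# Each Ractor runs under its own GVL, so four Ractors can use four cores
# at once. Illustrative sketch, not the exact slide code.
def fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

ractors = 4.times.map { Ractor.new { fib(30) } }

# Ruby 3.5's port-based API exposes Ractor#value; on older Rubies,
# read the result with Ractor#take instead.
p ractors.map(&:value)
```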
00:17:59.600 Unfortunately, if you use Ractors today, you'll
00:18:04.960 see this warning. This warning comes out and it is a big scary warning,
00:18:10.160 and it says Ractor is experimental and the behavior
00:18:15.440 may change. And unfortunately that's totally true. It absolutely has changed. APIs are changing, and part of the
00:18:22.960 work that we want to do on the team is stabilize that API, make sure that it works well and fast. And the
00:18:30.799 other thing that it says is that there are many implementation issues, and that is true as well. There are
00:18:36.720 many implementation issues. So we are trying to take on those behavioral changes as
00:18:43.600 well as the implementation issues, make sure that the behavior is stable and that the implementation
00:18:50.000 issues are fixed. And we're hoping that if we can do enough work on that, we can actually get rid of this message, so that
00:18:56.640 people will feel a lot more comfortable using Ractors in the future. So I want to talk a little bit about the behavior
00:19:02.000 changes, or how to use Ractors. And this is in Ruby 3.5. So you need
00:19:09.760 to be using Ruby either wait for Ruby 3.5 or don't. You can build it from
00:19:14.799 edge. Please do that. Um and we'll also
00:19:19.919 talk about some implementation issues. So Ractors are very similar to threads. They're like threads but harder to use,
00:19:27.360 but in a good way. And I'm going to explain that in a minute here. So Ractors have kind of a rule with
00:19:33.200 them. The rule is that you cannot share mutable objects between Ractors. That's not
00:19:39.840 allowed. So we don't allow that. Ractors will copy mutable objects, and
00:19:45.520 I have an example of that here. So here we have two code examples. On the left is a thread example and on the
00:19:51.919 right is our Ractor example. And all we're doing here is we're
00:19:57.360 getting a mutable string. And this is important: we've got a mutable string here. We're
00:20:02.480 popping the string off of a queue. So in thread world, we'll pop it off a queue. In
00:20:07.760 Ractor world, we'll pop it off a queue, but this is actually our default mailbox, what we call the mailbox for the Ractor.
00:20:14.000 After that, we'll print out the object ID of the string along with
00:20:19.760 the string itself. One nice thing about Ractors is we don't need to make a
00:20:25.440 queue. They already have default queues. So we can just push onto that default queue. If we run this code, we'll see
00:20:33.039 here on the threaded side, the object IDs are identical. There was no copy.
00:20:38.559 But if we look on the Ractor side, we'll see that the object IDs changed. And that's because we actually duped the
00:20:44.320 string when it crossed the boundary. So when that string went from one Ractor to the other, we ended up copying the
00:20:49.600 string. So we're not allowed to share mutable objects between Ractors. If
00:20:55.919 we change this code, so we just add, where is it? frozen_string_literal:
00:21:02.000 true at the top, the thing I was complaining about this morning. If we add that at the top, we'll see that the object IDs are identical.
00:21:08.240 So as long as the objects are immutable, we can pass references between Ractors and it doesn't matter.
00:21:13.840 It's fine. There's no copy.
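(A small sketch of that rule; on Rubies before 3.5, read the result with Ractor#take instead of Ractor#value.)

```ruby
# Mutable objects are copied when they cross a Ractor boundary; frozen
# (immutable) objects are shared by reference. Illustrative sketch.
mutable = String.new("hello")
frozen  = "hello".freeze

r = Ractor.new(mutable, frozen) { |m, f| [m.object_id, f.object_id] }
m_id, f_id = r.value

p m_id == mutable.object_id # => false: the mutable string was duped
p f_id == frozen.object_id  # => true: shared, with no copy
```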
00:21:20.400 So how do we make immutable data? I already showed one example: we can use frozen string literals, or we can freeze an object. But unfortunately
00:21:26.720 that's not going to work if we have a deeply nested object. Let's say for example we parse some JSON and we got a
00:21:32.159 big old JSON hash out of it. In that case we can use Ractor.make_shareable, and that will deeply freeze the
00:21:38.960 object's data structure or like some libraries like JSON will provide APIs
00:21:44.159 that give you back frozen data structures already. So in this case you can pass freeze: true into
00:21:49.600 JSON.parse, and the data structure that comes back out of JSON.parse will already be frozen. So you can pass that
00:21:55.520 between your Ractors without copying.
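(A quick sketch of those two approaches to deeply frozen, shareable data.)

```ruby
# Two ways to get deeply frozen (and therefore Ractor-shareable) data,
# as described above. Illustrative sketch.
require "json"

config = { "db" => { "host" => "localhost" } }
Ractor.make_shareable(config)   # deep-freezes the nested structure
p Ractor.shareable?(config)     # => true

data = JSON.parse('{"a":{"b":1}}', freeze: true)
p Ractor.shareable?(data)       # => true: safe to pass without a copy
```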
00:22:00.640 So, communication: how does that work? We know how to make immutable data, but how do we communicate between Ractors? Ractors use what is called a
00:22:06.080 port, and this is in Ruby 3.5. Again, ports are basically a queue, and we
00:22:15.200 use them like this. Every Ractor has a default port. These two bits of code are actually exactly
00:22:21.120 identical, except that rather than calling Ractor.current.default_port and
00:22:27.120 receiving on that, we can just call Ractor.receive. So one is shorthand for the other; these two chunks of
00:22:33.679 code do exactly the same thing. Ports are just like queues. However, they have to abide by two rules. One is
00:22:39.840 that any Ractor can write to a port. And here
00:22:45.120 is the mind-bending rule: only the Ractor that
00:22:50.960 created the port can read from it. So this kind of takes a second to
00:22:56.080 wrap your head around: only the creating Ractor can read. And I'm showing an example here on the left. This
00:23:01.840 example works fine. We create a port in the main Ractor. We pass it to a child Ractor, and we're allowed
00:23:08.080 to write to that port, and the main Ractor is allowed to read from that port because it created it. Now, on the
00:23:13.840 other hand, if we created a port on the main Ractor, but we pass it to a child Ractor and we try
00:23:19.679 to read from it, you'll get an exception. It doesn't work.
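(A sketch of those two rules using the Ruby 3.5 Ractor::Port API, which is experimental and may change.)

```ruby
# Any Ractor may write to a port; only the Ractor that created it may read.
port = Ractor::Port.new            # created by the main Ractor

Ractor.new(port) do |port|
  port << "hello from a child"     # writing from another Ractor: allowed
  # port.receive here would raise: this Ractor didn't create the port
end

p port.receive                     # reading in the creating Ractor: allowed
```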
00:23:25.039 So these are kind of the rules that we have to play by. Of course, if you're coming from the threaded world, this may require some mental shifts. I know
00:23:31.679 it did for me. And here's an example of threads. These threads are
00:23:37.280 all sharing references to exactly the same queue. So, we have many threads referring to the same queue, and
00:23:43.760 each child thread is trying to read from that queue. But if only one Ractor is allowed to read from
00:23:49.280 a port, how can you write this code example? How can you do this? It's a little bit more
00:23:54.720 complicated in the Ractor world, and I want to show you how we can accomplish this. In the Ractor
00:24:00.480 world we create a rendezvous point. I don't know the right name for it, but I call it kind of a rendezvous point
00:24:07.120 or a coordination point for producers and consumers. So first we have a coordination point. This Ractor is in
00:24:14.480 charge of collecting work from producers and handing that work out to consumers. So in this case we'll have a
00:24:19.919 coordinator. It takes the work in and then it hands that work out to the consumers.
00:24:25.360 The biggest difference between this and the queue-based system is that all of the worker Ractors have to specifically
00:24:32.159 ask for work rather than pulling the work. So here our coordinator
00:24:37.760 says, "Hey, I'm going to wait for the next Ractor that wants work, and as soon as I get that Ractor, I'm going to give
00:24:44.320 that Ractor work." So each of the child Ractors has to ask for work. So,
00:24:50.159 here are what the workers look like. Our workers say, "Hey, coordinator, I need work.
00:24:56.320 I need work. Please give me something to do." Uh, the coordinator will give it work. It sits here and waits until the
00:25:02.799 coordinator gives it work and then it does whatever it needs to do. So, it's a little bit more complicated setup than
00:25:08.480 the queuing system, but what's nice is there are no locks or mutexes. We didn't have to write synchronize anywhere.
00:25:14.080 That's not a thing. There is no synchronization; there are no deadlocks. And what's even
00:25:20.480 better is that we get CPU parallelism in pure Ruby code.
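(A hedged sketch of that coordinator pattern, assuming the Ruby 3.5 Ractor::Port API described above.)

```ruby
# Workers ask the coordinator for work instead of pulling from a shared
# queue. Illustrative sketch; the port API is experimental.
requests = Ractor::Port.new        # workers announce themselves here

workers = 4.times.map do
  Ractor.new(requests) do |requests|
    reply = Ractor::Port.new       # only this worker can read from it
    loop do
      requests << reply            # "I need work, answer on this port"
      job = reply.receive
      break if job == :done
      puts job * job               # do the work
    end
  end
end

# The main Ractor coordinates: wait for the next hungry worker, hand it a job.
(1..10).each { |job| requests.receive << job }
workers.size.times { requests.receive << :done }
workers.each(&:join)
```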
00:25:26.799 One more feature I want to show, and I think this particular feature is especially interesting to Rails developers. So I was talking about how
00:25:34.799 mutable data gets copied. Let's say for example we have two Ractors; one Ractor creates a string and wants to pass
00:25:41.679 that string to a second Ractor. When that happens, it copies the data. Now
00:25:46.799 there's one exception to this, which I think is a very interesting exception, and that is that the Ractor return value
00:25:53.039 is not copied, but only once. And I'm going to show you an example to make this a little bit more clear. Let's say we have a Ractor like this. We're
00:25:59.840 creating a Ractor r. This Ractor allocates object o. We're going to
00:26:05.919 print out the object ID and the frozen state of the object. r returns the object o, and then our main
00:26:13.679 Ractor will try to read that object, get it, and then it's going to print out the
00:26:19.360 we do that, we'll see that these objects are identical. Oh no. Oh no, it's the same object. All
00:26:26.240 right, I'm going to do the next slide. I'm sad my little thing didn't line up. Anyway, um if you run this code, you'll
00:26:33.200 see that neither of them are frozen. They're not frozen. However, the object IDs are identical. So that means that uh
00:26:41.679 we are able to pass an a mutable object between two ractors without doing a copy. So why is this interesting? Why
00:26:49.120 why would any of this be interesting to us as Rails developers? I'm going to tell you why did I ask that? I'm going
00:26:54.480 to tell you why. It's a rhetorical question. Let's talk let's talk about it. So
00:27:00.880 yes, I am nervous. Um what is one use case for C extensions?
00:27:06.720 There there is a use case for C extensions. Usually we're using C extensions to bind to native libraries
00:27:12.240 like libxml: Nokogiri is a binding for libxml. Uh of course
00:27:17.760 there are many other C extensions. There is another use case and that other use case is for releasing the GVL. In C
00:27:25.520 extensions we actually have a way where we can release the GVL and do other work. So a good example of this is the
00:27:31.360 bcrypt gem. The bcrypt gem is written in C. We have
00:27:36.720 these three lines in the bcrypt gem, and basically what they're saying is: hey, I want to call this function, but I want
00:27:44.159 you to release the GVL, then call the function, and then acquire the GVL again. So what this
00:27:51.120 means for us is that we're calculating password hashes without holding the GVL. So let's say for example you're using
00:27:57.440 Puma as your web server. Uh somebody goes to log into your application you start calculating that password hash.
00:28:04.159 while you're calculating the password hash, Puma is able to service another request in parallel. So, we're able to
00:28:10.559 we're able to get CPU parallelism in C extensions by releasing the GVL.
00:28:16.000 What's nice about uh Ractors is we can kind of think of them as a no GVL block.
00:28:22.000 So, let's imagine that we had BCrypt written in pure Ruby. Let's imagine that was pure Ruby and not C. We could write
00:28:28.480 something like this, where we create a new Ractor, we calculate bcrypt, and we return the value of bcrypt. And we can
00:28:35.120 think of this as kind of just a no-GVL block where other threads can run. We can have our Puma web server, and it can serve
00:28:42.480 up requests while we're calculating bcrypt, doing exactly the same thing that the C extension did, but we were able to
00:28:48.399 write it in pure Ruby. So, maybe not calculating bcrypt; a real-world
00:28:54.080 example might be just parsing JSON. Maybe you have an API server and you're spending a lot of time
00:29:00.159 parsing JSON: throw it in a Ractor, and now all of a sudden you can do other requests in parallel while you're
00:29:05.440 parsing that JSON. So this seems like a very low-effort way to start introducing Ractors into your system and get some
00:29:12.000 parallelization, but without necessarily having a Ractor-based web server.
00:29:17.679 So one thing I would like to pitch to all of you is: when you upgrade to Ruby 3.5 in
00:29:22.720 the future, and you're going to do that right away, right? Yes. Yes. Try
00:29:28.320 wrapping your CPU-intensive code inside of a Ractor and see if that is able to help out your
00:29:33.679 parallelism. Of course, do this in Ruby 3.5. So another thing I want to talk about here is weird
00:29:39.360 bottlenecks that we've had to solve. So, we've been trying out Ractors, finding and fixing
00:29:46.240 crashes, uh trying to improve the API, but I want to talk about a weird bottleneck just because I think it's
00:29:52.720 very very fun and interesting. So, somebody filed a an issue on the Ruby
00:29:58.399 bug tracker. And the issue, like I'm just going to give you a summary of the issue. The issue is uh parsing parsing
00:30:05.360 JSON is slower in Ractors than if I do it serially. So, if I try to parse JSON in parallel, it's actually slower than
00:30:11.840 if I just tried to do one at a time. And uh we've fixed this bug. We've since
00:30:17.120 fixed this bug. But I want to tell you the problem because I think it's a little bit surprising and interesting.
00:30:22.159 Uh yes, we fixed the bug. Nice. Nice. Yes. John, not we. John fixed the bug. Great
00:30:28.880 job, John. So, here is the problem. When we parse this JSON here,
00:30:34.240 we get a hash back. And the key to the hash is a string. And interestingly, both of these strings are
00:30:42.000 exactly the same object, right? They're exactly the same. They have the same object ID. They get deduplicated when we
00:30:48.720 parse the JSON. And we have to do
00:30:54.080 this deduplication when we're doing it inside of a Ractor as well. So when we parse the JSON here, we
00:30:59.600 check that key's object ID, and we'll see: oh, this is indeed the
00:31:04.640 same object. And the way that this is implemented internally to Ruby is we have something called an fstring table.
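(You can see the deduplication the fstring table provides from plain Ruby.)

```ruby
# Identical JSON keys come back as the very same string object, resolved
# through the global fstring table described above. Illustrative sketch.
require "json"

a = JSON.parse('{"name":1}').keys.first
b = JSON.parse('{"name":2}').keys.first
p a.object_id == b.object_id # => true: one shared, deduplicated key
```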
00:31:10.480 I think it stands for frozen strings, because these strings are frozen. But this fstring table is
00:31:17.120 essentially a global within Ruby. And we have to consult this global hash table in order to resolve those keys to
00:31:24.399 be exactly the same object. So while we were doing multiple Ractors, these Ractors had to take a lock on this
00:31:30.640 fstring hash. So we ended up getting lock contention on this global hash
00:31:36.640 table within the internals of Ruby. And you wouldn't know this necessarily just from looking at the Ruby code itself.
00:31:43.039 The solution to this was to turn the fstring table into a lock-free hash, which John was able to do.
00:31:50.000 And once he did that uh JSON parsing got 12x faster. So we should Yes. round of
00:31:55.600 applause.
00:32:01.600 Do people do people here parse JSON? Is that a thing?
00:32:07.760 Now, unfortunately, the fstring table is not the only thing like this. We've found we found within Ruby internals.
00:32:13.679 Uh, also we have a global table, an ID table. This is for symbols, so symbols
00:32:18.880 get resolved into the same object; we had contention there. A thing called the CC table, which is for inline
00:32:24.880 caches: whenever we needed to create an inline cache or look up an inline cache,
00:32:29.919 we would end up locking. Another example is the encoding table, when we're looking up string encodings. So
00:32:36.080 internally to C Ruby, we have all these these global data structures that you wouldn't necessarily know are global
00:32:42.000 while you're writing Ruby code. Uh anyway, the point is that our our team is trying to find these bottlenecks and
00:32:48.080 fix them. Figure out how we can remove these bottlenecks so that when you upgrade, we can actually start using Ractors for
00:32:54.320 real in production. Uh the next thing I want to move on to I'm going to move on to the next topic
00:32:59.440 now which is Zjget. Uh Zjget is a new compiler that we're
00:33:06.080 going to be shipping with Ruby 3.5. Uh so I want to talk a little bit about the work that we're doing on that and I want
00:33:12.880 to talk about what like what are what are JIT compilers in general because people have asked me this. uh we'll look
00:33:19.360 at the differences between yjit and zjit and then I want to give some like tips for rails developers on how you can
00:33:25.519 write more JIT friendly code in your application. So the first thing is that uh before we get to the meat of this JIT
00:33:32.399 section uh I kind of want to define a term which was confusing to me when I first started working on uh JIT
00:33:39.279 compilers. Ruby's virtual machine is called Yarve and it's what we call a
00:33:44.320 bite code interpreter. So it's interpreting the bite code. When we compile your Ruby code, we turn that into bite code. That bite code gets
00:33:51.200 interpreted. Uh and we have a virtual machine that interprets that bite code. So if anybody, namely me on the stage
00:33:58.640 here, refers to something as the interpreter, what they mean is this virtual machine that's interpreting that
00:34:04.000 bite code. Uh so what like what is a JIT? I think this is an important question to ask because there I think
00:34:11.040 there are a lot of definitions for what a JIT compiler could be. But to me, a JIT compiler is something that assembles
00:34:16.720 code at runtime and assembles machine code at runtime and is usually as lazy
00:34:21.760 as possible. So here's here's an example of something that could be considered a JIT compiler. Here we're saying, hey, uh
00:34:28.720 we're not going to define the attr accessor right now. We're going to wait until method_missing gets called, and
00:34:33.760 then when method_missing is called, we're going to define the attr accessor. So here we'll say, ah yeah, we'll
00:34:39.839 just define that, and then the next time the method is called, we don't call method_missing anymore, because we
00:34:45.040 generated this code at runtime. So we could kind of think of this as a JIT compiler. Of course, we wouldn't
00:34:51.599 actually write this in our code, right? Please don't.
00:34:58.560 Um but this is an example that could be considered one. So we generate code at
00:35:04.560 runtime. I I think of them as generating code at runtime. They're usually lazy. We try to be lazy about it. Meaning that
00:35:10.560 they're late. They're kind of late. They do it later. Uh and the another important aspect is that they they
00:35:16.960 should speed up our program. This is uh unfortunately I have written JIT
00:35:22.800 compilers that do not speed up programs. So yeah, this is I think this is an
00:35:28.560 important aspect. Another question people ask me is, you know, how can a JIT speed up our code? Like,
00:35:36.079 how can it do that? The JIT compiler has to generate code that does exactly the same thing as our program
00:35:42.160 did originally. If it's doing exactly the same thing, how can it possibly be any faster than our byte code
00:35:47.920 interpreter? So, I want to talk about that a little bit. Uh, the JIT and the interpreter must match. As I was saying,
00:35:54.960 whatever code the JIT compiler produces, it must have exactly the same behavior as the interpreter. So, how can
00:36:01.760 we fix anything? One thing that we can do with a JIT
00:36:08.240 compiler is that we can eliminate the interpreter overhead. So, this interpreter has overhead and typically
00:36:13.520 when we're running a program in Ruby, we'll have something that looks like this. So we have our CPU uh we have
00:36:19.440 YARV, which is our bytecode interpreter, and on top of that we're running our Ruby code. The interpreter is running
00:36:26.160 that Ruby code on the CPU. YARV is a C program, and that C program is running code on your
00:36:33.280 actual CPU. The way I think of a JIT compiler is it's taking
00:36:38.800 this and getting rid of this YARV step here. It's basically promoting your code to running directly
00:36:45.520 on the CPU. You can kind of think of this as like, well, you're running your code inside of a Docker container, but
00:36:51.680 somehow you're able to escape that and run on instead of running inside of a container, you're running on the bare
00:36:57.520 metal itself. And the JIT compiler is just doing that at runtime for you automatically. The other way that we can
00:37:03.520 speed up programs is by caching values. So here's an example,
00:37:08.880 again back to our Fibonacci sequence, because that's what we do: speed up Fibonacci. We have an example here of
00:37:15.359 the number 35, and you can see if we look at the YARV bytecode, we have a literal 35
00:37:20.400 in there. We have that number. What we can do in the JIT compiler is we can say, hey, I'm going to actually embed that
00:37:26.960 number directly into the machine code. So when we compile this, if you were to disassemble the machine code, you might
00:37:32.480 see something like this, where we have the number 35 directly in there. So that's another way
00:37:39.040 that we can speed this up: rather than loading that 35 from memory, we already have it in the machine code.
00:37:45.520 another thing that we can do is speculate. We can speculate on values and we're able to do this in a way that
00:37:51.359 the interpreter cannot do. So let's say we have code that looks like this: sum = 2 + 5. Our JIT compiler might say,
00:37:58.720 "Hey, you know what? I don't think anybody is going to monkey patch plus
00:38:03.839 hopefully." So in advance, what it can do is it can say, "I'm just going to add those two
00:38:09.359 numbers together. We're going to get the number seven and we're just going to keep that there and return that rather
00:38:14.720 than executing the plus method itself." But a cool thing a JIT compiler can do that our interpreter
00:38:21.680 can't do is it can deoptimize this. So let's say somebody does monkey patch plus: it did the
00:38:27.119 constant folding, and then all of a sudden somebody monkey patches plus. Our compiler can detect that and say, oh, you
00:38:33.280 know what, I messed up, my speculation was wrong, I'm going to fall back to this
00:38:38.800 particular implementation. So I'll just go back to calling plus on the number and then do whatever the interpreter
00:38:44.160 would have done.
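(The hazard the deopt path guards against looks like this in plain Ruby; a contrived, illustrative sketch.)

```ruby
# Why speculation needs a deopt path: folding 2 + 5 into 7 is only valid
# until someone redefines Integer#+.
def sum = 2 + 5
p sum # => 7; a JIT could constant-fold this

class Integer
  def +(other) = 42 # please never do this
end
p sum # => 42; the folded code is now wrong, so the JIT must deoptimize
```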
00:38:51.200 Another thing we can do is eliminate type checking. Yeah, everyone. Wait, who likes type checking? Well, we
00:38:57.200 don't have to do it. We can eliminate this type checking. Although this is a different type
00:39:02.320 of type checking, and I'm going to explain this here. When we have code like this,
00:39:07.760 our friend Fibonacci again, uh when we have code like this, when it runs, we have to check like, okay, uh when we do
00:39:14.640 this comparison, is n an integer? The interpreter has to check this. Is n an integer? If so, I'm going to do an
00:39:20.079 integer comparison. And then when we do minus, it's like, oh, is n an integer? If so, I'm going to do minus. In both of
00:39:26.880 these cases, in our compiler, we can say, well, I know that n is an
00:39:33.040 integer on the first one. I did that test. I was able to do that type check, and now there's no reason for me to do
00:39:38.320 it later on in the program because I know for a fact that n is an integer. So, it's able to eliminate these two
00:39:44.880 particular type checks. Uh so these are different ways that a compiler can speed up the code that an
00:39:51.520 interpreter cannot do. So let's take a look at yjit and zjit uh and how they
00:39:58.240 work and the differences between them. Who is using YJIT in production? Anyone? Yeah. Oh my gosh, so
00:40:04.400 many people. That's awesome. Thank you. That's great. All right. YJIT is a
00:40:11.119 lazy basic block versioning compiler: LBBV. Of course, this
00:40:17.119 already ships with Ruby. You all are using it. Many, many of you are using it in production. YJIT uses
00:40:22.720 a technique that was pioneered by Maxime, the author of the JIT compiler, in her PhD thesis on lazy basic
00:40:30.640 block versioning. So, I'm going to describe how this compiler works and then move
00:40:35.839 on to ZJIT and the differences between them. So, this is how LBBV works in action. This compiler
00:40:47.760 discovers basic blocks lazily. What a basic block is, is a straight line of
00:40:53.760 code that has no jumps in it. So no if statements nothing like that. So it's just code until we find an if statement
00:41:00.960 that is a basic block. So in this particular example, when we compile the add method, what YJIT will do is it'll
00:41:07.520 go here and it'll say, hey, I'm going to compile y = z. All of a sudden it gets to the if statement and it's like, oh, this is the end of a basic block,
00:41:13.920 because it could jump between one of these branches. So it compiles that basic block and then it waits. It
00:41:21.040 basically waits and it says which side of this if statement am I going to execute?
00:41:26.720 Let's say it executes this side of the if statement, the top one. In that case it'll compile that side of the if
00:41:32.400 statement; it'll create a new basic block until there's another jump again, and after that we'll
00:41:39.280 create a third basic block down here that's just return y. So we end up with something that looks like this. So we
00:41:44.880 compile like that. And one thing to notice is that this code right here
00:41:50.480 never got compiled. We didn't compile it because we didn't actually use it. That never happened.
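(The method in the walkthrough is shaped roughly like this; an illustrative sketch with the basic-block boundaries marked in comments.)

```ruby
def add(z, flag)
  y = z       # basic block 1: straight-line code, compiled first
  if flag     # a jump ends the block; YJIT stops here and waits
    y += 1    # basic block 2: compiled only when this side actually runs
  else
    y -= 1    # never compiled if this branch is never taken
  end
  y           # basic block 3: the return
end
```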
00:41:57.599 What this means is YJIT gives us really, really fast warm-up, because we're compiling as
00:42:02.720 little code as possible. Another nice thing is that we have low memory overhead, again because we're
00:42:08.880 compiling as little code as possible and we also have low overhead on type discovery. So what I mean by this is
00:42:15.440 when YJIT pauses at those particular locations, it's able to look at all the values on the stack and say, "Oh, n
00:42:22.480 that's an integer. I know that's an integer." and then generate code that's specific for that integer.
00:42:28.480 Some of the downsides of YJIT, though, are that register allocation is a little bit harder. Whenever we're
00:42:36.079 dealing with variables in our program, we want the compiler to keep those variables in registers and that's
00:42:41.680 because registers are faster to access than memory. So we would really want to keep those values
00:42:47.839 inside of registers. Unfortunately, registers are a finite
00:42:53.920 resource. So we can't just put all of our variables in registers. We can only put some of them in registers some
00:42:59.359 of the time. So an issue is, let's say we're compiling this code with YJIT.
00:43:05.440 I've rewritten the code a little bit here, but the functionality is exactly the same. Let's take a look at this. We have y0 at the top. Uh we're defining
00:43:12.720 two other variables y1 in both branches of the if statement and then we return
00:43:17.839 y1 at the end. So let's say YJIT compiled this and we
00:43:23.119 turned this into machine code, and we decided that this value y0 should go into the x0 register on my crappy ARM
00:43:31.680 machine. So we put it in there, and that's where we decide to put it. Now
00:43:37.280 when YJIT goes to compile the other branch, let's say we execute this code again and we want to compile the
00:43:43.280 else statement right here. The question is, which register do we use? We have to
00:43:50.560 maintain the knowledge that the other side of the if statement used x0 as the register for the value of y1. And this is
00:43:58.000 actually a very difficult problem. This particular case, this if statement, is a big problem in
00:44:05.359 compilers, and one of the main things that folks focus on. So the way that we handle this in YJIT is we actually
00:44:11.200 just end up writing these local variables into essentially known locations in memory. They're written to
00:44:16.800 known locations. So rather than keep track of all the registers, we just basically treat it as if it has an
00:44:23.599 infinite number of registers, where those registers are essentially just memory. So I'm not going to get into the
00:44:30.720 details uh but as we discussed reading memory is not as fast as reading registers. So we'd prefer to keep things
00:44:36.319 in registers. So comparing this with a method-based compiler, a method-based compiler is a little bit different. it
00:44:41.760 can say like, "Oh, okay. I'm just going to look at the entire method and it'll compile both sides of the if statement."
00:44:48.319 And because it has the context of all the variables used in either side of those if statements, it's able to figure
00:44:53.680 out like, oh, okay, in one side we stored y1 in x0, so we're going to do
00:44:58.720 the same thing on the other side. And then all of our code works together. So, we're able to keep those values in registers 100% of the time, rather than
00:45:05.200 spilling them to memory a lot of the time. So another nice thing
00:45:10.720 with method-based compilers is it's a little bit easier for us to fold constants. As I said, we can do more
00:45:16.960 efficient register usage. The other thing is it's a little bit easier to learn. A lot of the compiler
00:45:23.200 documentation out there focuses on method-based compilers. You can use traditional resources for
00:45:29.200 compiler theory on a method-based compiler. Now the downside, though, is that it might compile too much
00:45:35.520 code. As we saw earlier, if we don't take one side of that if statement, a method-based compiler is going to compile
00:45:42.000 it, where YJIT would have done nothing. Another downside is that tracking types
00:45:47.359 is a little bit harder. We're not pausing in the
00:45:52.560 actual code with a method-based JIT. So what we have to do is start tracking types as the
00:45:58.720 interpreter is executing. So we had to add some infrastructure for tracking types. So I guess
00:46:06.319 the idea with ZJIT was, we thought: what if we could
00:46:12.640 take what we had learned from YJIT, like how to actually put a JIT
00:46:18.400 into CRuby, how to deploy it, how to do development on a JIT compiler, how to
00:46:24.640 make it low overhead? What if we could take all of those techniques and apply them to a method-based compiler? And
00:46:30.400 that's where we came up with the idea for ZJIT, a next-generation JIT compiler
00:46:35.440 which is essentially our method-based JIT. Now, I am very excited about this work. However,
00:46:41.520 I would like to temper your expectations a little bit. It is very, very new. So, if I can make a
00:46:49.040 humble request that you lower your expectations of this JIT compiler: a little bit lower,
00:46:56.240 this is probably good. It's not rock solid at the moment. If you go
00:47:02.560 check out Ruby edge and you try to run Ruby edge in production, like all of you are going to do after this
00:47:08.160 presentation, right? Yeah. Yes. You can use ZJIT if you do ruby
00:47:16.000 --zjit. But it's probably not going to be as fast as YJIT. Wait, no, not
00:47:21.359 probably. It will not be as fast as YJIT, but we are going to get it there. So with that said, I want to talk
00:47:27.440 about, what can I do today? What can I do today to write JIT-friendly code?
00:47:34.319 So as a Rails developer, why do I care about this and what can I do about it? Now, me working on compiler stuff, I
00:47:43.200 really want to say to you, write whatever code that you want because the idea is like you should be able to write
00:47:49.839 any code that you want to and then we'll just figure out how to speed it up so that you don't have to change anything.
00:47:56.079 But that is theoretically how things should work. The truth is that
00:48:01.200 some patterns are easier to speed up than others. Otherwise, why
00:48:06.960 bother profiling your code? Just wait for the newer version of Ruby. But sometimes we do want to speed
00:48:12.319 things up. Uh there's an opportunity cost. So you can wait for us to improve the JIT compiler or you can write code
00:48:19.680 that is a little bit more friendly and have those speed ups today. So do you want to wait or do you want to have
00:48:24.880 speed ups today? And I'm going to give you one. I said I was going to give you pro tips, but I'm going to give you just
00:48:31.040 one pro tip. We are going to monomorphize these pro tips.
00:48:38.319 Thank you, Ufuk. Yes. So,
00:48:44.559 thanks. We're going to talk about monomorphizing call sites.
00:48:49.599 This is a very dumb compiler joke. I'm so sorry. In order to describe to
00:48:55.440 you what this is, first I want to describe polymorphic call sites. We're all familiar with
00:49:00.559 polymorphism, right? We use polymorphism every day. Yes. We write OO code.
00:49:06.079 Polymorphism is great. Here's an example of a polymorphic call site. We have two classes, A and B, and we're
00:49:13.599 passing instances into call_foo, which calls foo on the thing. This is what we call a polymorphic call site, and we call it
00:49:20.480 polymorphic because it sees different types. So, computer vision is getting really
00:49:27.680 good; it's able to see them. Yeah. So here we have an
00:49:33.119 instance of A and an instance of B both passed to the same method, and this thing calls a method on two different types.
00:49:40.480 Now we'll compare this with monomorphic. All monomorphic means is that it sees just one type. So it's just one versus
00:49:46.960 many. In this case we only see one type here. It's a monomorphic call
00:49:53.680 site, because it only ever sees that one A type.
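Based on that description, the slide's example presumably looked something like this (the class and method names are assumed):

```ruby
class A
  def foo
    "a"
  end
end

class B
  def foo
    "b"
  end
end

# Polymorphic: this call site sees instances of both A and B.
def call_foo(obj)
  obj.foo
end

call_foo(A.new)
call_foo(B.new)

# Monomorphic: this call site only ever sees A.
def call_foo_mono(obj)
  obj.foo
end

call_foo_mono(A.new)
call_foo_mono(A.new)
```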
00:50:00.960 So if you have a polymorphic call site and you JIT compile that code, the JIT code is going to end up with a whole bunch of tests saying: okay, is this an
00:50:08.000 instance of class A? Is it an instance of class B? Is it an instance of class C? Etc. So the more types that you
00:50:13.920 see there, the longer it's going to take to find that particular method and call it. So if you can monomorphize that call
00:50:21.119 site, make it only one type, you can actually speed up that call site very nicely.
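Continuing the sketch above, a rough model of what that means (illustrative Ruby, not actual JIT output):

```ruby
# Illustrative only: after seeing classes A and B at the call site,
# the compiled code behaves roughly like a chain of class checks
# guarding each cached target.
def dispatch_foo(obj)
  klass = obj.class
  if klass == A
    A.instance_method(:foo).bind_call(obj) # cached hit for A
  elsif klass == B
    B.instance_method(:foo).bind_call(obj) # cached hit for B
  else
    obj.send(:foo) # slow path: full method lookup
  end
end
```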
00:50:28.400 And I have a real-world example of it today. Here we had a pull request for visitor methods in Prism. And we were
00:50:34.800 able to see a 13% speed improvement over Ruby 3.4 by
00:50:40.480 monomorphizing call sites inside of Prism. And this isn't even
00:50:45.839 with the JIT compiler; this is just with the normal interpreter. So this particular pro tip will work for you whether you're
00:50:51.119 using YJIT or not.
00:50:57.760 The other thing is that it's not just about types, like which class you use. It's also about instance
00:51:03.680 variables. We can think of this as a polymorphic instance variable read, because depending on the branch
00:51:09.920 that you take in initialize, you'll end up with two different shapes. So the order in which instance variables are set actually matters.
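A minimal sketch of that situation, with a hypothetical class:

```ruby
# Hypothetical example: the same ivars assigned in two different
# orders give instances two different object shapes.
class Config
  def initialize(fast)
    if fast
      @a = 1
      @b = 2
    else
      @b = 2 # @b is set first on this branch...
      @a = 1 # ...so these instances get a different shape
    end
  end

  def sum
    @a + @b # this read site now sees two shapes
  end
end
```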
00:51:16.559 So the thing that I want to convey to you is not "don't use polymorphism, polymorphism is terrible." I think it's
00:51:21.839 great. I really think it's great. But what I want to convey is: use useful types, and set instance
00:51:30.240 variables in a consistent order. Try to set all of your ivars in a consistent
00:51:35.359 order; don't set them randomly. When I say useful types, what does that mean? What is a useful type?
00:51:41.920 When I say useful, I mean useful to your application: something that actually provides you with value. For example,
00:51:48.000 maybe you have multiple different credit card processors, and you want some sort of
00:51:55.040 strategy object that changes depending on the payment processor backend that
00:52:01.440 you're using. In this case we have an example where polymorphism is really helping out our application. It's
00:52:07.280 encoding business logic. It's making the code easier to understand and easier to maintain. This is something that
00:52:14.079 should be celebrated and something that we should be doing in our code.
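As a sketch of that kind of high-value polymorphism (the processor classes and method names are invented for illustration):

```ruby
# Hypothetical payment-processor strategy objects: each backend
# implements the same charge interface, so the polymorphism here
# encodes real business logic.
class StripeProcessor
  def charge(amount_cents)
    puts "charging #{amount_cents} cents via Stripe"
  end
end

class PayPalProcessor
  def charge(amount_cents)
    puts "charging #{amount_cents} cents via PayPal"
  end
end

class Checkout
  def initialize(processor)
    @processor = processor
  end

  def purchase(amount_cents)
    # Polymorphic, but high value: the type IS the business rule.
    @processor.charge(amount_cents)
  end
end

Checkout.new(StripeProcessor.new).purchase(1299)
```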
00:52:20.880 When I say low-value polymorphism, what I'm talking about is code that looks kind of like this, where we've got a cache, we're passing a key to the
00:52:26.800 cache, and we're just calling to_s on that thing. And the reason we're calling to_s
00:52:32.480 is that we want to take either strings or symbols as our inputs, and we need them to be consistent when we look them up in
00:52:38.400 the cache. So we're just calling to_s on the key.
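A minimal sketch of the pattern being described, with a hypothetical Cache class:

```ruby
# Hypothetical cache that normalizes keys with to_s so callers
# can pass either a String or a Symbol.
class Cache
  def initialize
    @store = {}
  end

  def store(key, value)
    @store[key.to_s] = value
  end

  def lookup(key)
    @store[key.to_s] # key.to_s sees String and Symbol: polymorphic
  end
end

cache = Cache.new
cache.store(:user, "aaron") # Symbol key
cache.lookup("user")        # String key => "aaron"
```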
00:52:45.040 This is my very hot take on stage today: I don't think this type of code is very valuable, because what are we
00:52:51.599 actually getting out of this? You could just change the caller to use a string instead. So we could say, "Instead
00:52:57.119 of using a symbol here, let's be consistent and always use a string." Or even better, you're like,
00:53:03.359 "No, no, no, Aaron. I need to use a symbol there. I've got to use a symbol. We don't know where the data is coming from. It could be a symbol; we
00:53:10.079 need to make it a symbol." So maybe instead of doing the to_s here, we could change the caller and
00:53:15.839 call to_s there instead. If you do that, you end up with two monomorphic call sites. And
00:53:23.359 of course, since it's always a consistent type going into the lookup method, we can just remove the to_s, and
00:53:28.640 now we only have a single monomorphic call site.
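Continuing the hypothetical Cache sketch from above, the refactor might look like this:

```ruby
# Step 1: make each caller normalize its own key...
cache.lookup(:user.to_s) # this to_s only ever sees a Symbol
cache.lookup("user")     # and lookup now only ever sees Strings

# Step 2: since every caller passes a String already, reopen the
# hypothetical Cache and drop the conversion entirely.
class Cache
  def lookup(key)
    @store[key] # a single monomorphic call site, no conversion
  end
end
```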
00:53:36.000 This is the type of, in my opinion, low-value polymorphism that we should be removing from our applications. A famous person once said: keep
00:53:44.720 useful polymorphism, remove useless polymorphism.
00:53:51.599 Love quoting myself. All right, let's wrap this up. We talked about
00:53:56.640 Ractors. We talked about parallelism. We talked about weird bottlenecks, which I thought were very, very fun. We
00:54:03.119 talked about how Ractor.new can be thought of as a "no GVL" block. We talked about JIT compilers, how and why
00:54:10.960 they work. We talked about the differences between ZJIT and YJIT, as well as writing JIT-friendly code
00:54:16.720 by monomorphizing our code. So I want all of you to please, please upgrade to
00:54:22.000 Ruby 3.5 now. Right now. I'll wait. I know we have a party coming.
00:54:28.720 Oh, hold on, one more thing. I have a question for all of you. Do
00:54:34.880 you allocate objects in your Rails app?
00:54:40.319 Yeah. Yes, let's cheer for object allocation. Woo.
00:54:45.599 I love asking questions where I already know the answer. Like, yes, we allocate.
00:54:53.920 Allocations are much, much faster in Ruby 3.5, which I hope is a good
00:54:59.200 reason for all of you to upgrade. I want to show you a benchmark. We made allocations faster. We have
00:55:04.559 a User class here, and we're going to instantiate it 500,000 times. We
00:55:11.359 actually made this 70% faster on Ruby 3.5 than it is in Ruby 3.4.
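A minimal sketch of that kind of benchmark (the User class and its fields are assumptions, not the exact slide code):

```ruby
require "benchmark"

# Hypothetical stand-in for the User class on the slide.
class User
  def initialize(name, email)
    @name  = name
    @email = email
  end
end

# Allocate 500,000 users and time it: the claim is that this kind
# of allocation-heavy loop is ~70% faster on Ruby 3.5 than on 3.4.
puts Benchmark.measure {
  500_000.times { User.new("aaron", "aaron@example.com") }
}
```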
00:55:24.160 So, please, please, please upgrade. It's been an honor to be here with you this year. I'm so happy I could come to
00:55:29.520 Amsterdam. Oh, wait. I have one more joke.
00:55:34.880 I was going out to dinner here in Amsterdam, and I was really, really worried about whether or not they
00:55:41.200 would split the bill for me. But it turns out that everybody here is dining Dutch.
00:55:49.839 Oh, come on. All right. Thank you.