Summarized using AI

ZJIT: Building a Next Generation Ruby JIT

Maxime Chevalier-Boisvert • April 17, 2025 • Matsuyama, Ehime, Japan • Talk

In her talk titled "ZJIT: Building a Next Generation Ruby JIT" at RubyKaigi 2025, Maxime Chevalier-Boisvert discusses the development and challenges of the new JIT (Just-In-Time) compiler for Ruby. The presentation covers the history, status, and future expectations for the ZJIT project, comparing it to its predecessor, YJIT.

Key Points Discussed:

  • Background and Motivation:

    • The YJIT project, initiated in partnership with Shopify, successfully improved Ruby's performance, delivering significant speed-ups on various benchmarks. However, YJIT is reaching a performance plateau, highlighting the need for a new compiler to continue enhancing Ruby's efficiency.
  • Development History of YJIT:

    • Maxime recounts the creation of YJIT, its benchmarks, and notable achievements, such as a 3x speedup over the Ruby 2.7 interpreter on Optcarrot, Ruby's most important benchmark.
    • YJIT saw widespread deployment in systems like Shopify, Discourse, and Mastodon, with reported speed increases of up to 30%.
  • Introduction to ZJIT:

    • ZJIT aims to overcome the limitations of YJIT by adopting a new architecture, focusing on maintainability, extensibility, and long-term viability in Ruby.
    • The design shifts from YJIT's lazy basic block versioning to a method-based JIT architecture, a more standard approach with a proven track record in compiler design.
  • Core Architectural Changes:

    • ZJIT will incorporate its own Intermediate Representation (IR), moving away from directly compiling Ruby's YARV bytecode into machine code, allowing for better optimization.
    • Introduction of features such as fast JIT-to-JIT calls using CPU call instructions for improved performance in method calls, aiming to drastically reduce overhead and increase throughput.
    • Plans for saving and reusing compiled code to enhance warm-up times and recompile cycle efficiency.
  • Current and Future Status:

    • Development for ZJIT commenced just months prior but has already made significant progress, including essential features like fast method calls.
    • A timeline anticipates further development throughout the year, leading up to Ruby 3.5, which will maintain backward compatibility with YJIT while introducing ZJIT.
    • Maxime encourages collaboration and invites engineers to join the Ruby and Rails infrastructure team at Shopify.

Conclusions:

Maxime emphasizes the critical need to enhance Ruby's performance not only to thrive but also to keep pace in a rapidly evolving technical landscape. The transition from YJIT to ZJIT represents an essential evolution to fulfill Ruby's long-term goals of speed and efficiency.


Date: April 17, 2025
Published: May 27, 2025

RubyKaigi 2025

00:00:16.560 So, hi everybody, my name is Maxime Chevalier, I work at Shopify as part of the
00:00:22.160 Ruby and Rails infrastructure team, and today I'm here to tell you about ZJIT, a project to build a
00:00:29.039 next-generation Ruby JIT compiler. In this talk I'll give a short history of YJIT, I'll talk a
00:00:36.719 little bit about why we want to build a new JIT, the limitations of YJIT,
00:00:41.920 two key design changes that we want to make with this new compiler, two new
00:00:47.440 major features that we want to include in ZJIT, the current status of the
00:00:52.800 project, and what to expect for Ruby 3.5. So first, a short and incomplete
00:00:59.680 history of YJIT. I joined Shopify in 2020 and formed a team with two other
00:01:06.000 engineers, and we built what was first called MicroJIT, which was kind of a prototype JIT compiler. We had a
00:01:13.680 time budget of 3 months. We wanted and got up to a 10% speedup on smaller
00:01:19.360 benchmarks with this prototype, and we managed to outperform vanilla Ruby by
00:01:24.400 1% on the liquid template rendering benchmark. But unfortunately this prototype underperformed on real
00:01:30.960 benchmarks, because it spent too much time entering and leaving JIT code. Still, Ruby and Rails infrastructure
00:01:37.520 was kind enough to give us the green light to work on a real JIT,
00:01:43.439 and this time we had nine months to deliver double-digit speedups on Optcarrot, single-digit speedups on railsbench,
00:01:48.560 and a low warm-up time. We ended up having better success than expected; it worked even better
00:01:55.920 than the objectives we had initially set. We got some clear double-digit speedups
00:02:01.200 on realistic benchmarks, sometimes over 20%, and we were able to run all of
00:02:06.640 Shopify's storefront rendering infrastructure and serve real web
00:02:12.239 requests. I first presented YJIT at RubyKaigi Takeout 2021
00:02:18.879 with some great performance numbers, and following this, Takashi Kokubun
00:02:24.560 invited us to open a ticket to upstream YJIT into
00:02:31.160 CRuby. This was very well received: Matz saw it and approved it, and
00:02:37.599 shortly after, it was merged, and so YJIT went on to be part of Ruby 3.1.
00:02:45.040 Over the years we've had better and better performance numbers, particularly on Optcarrot, which is the most
00:02:51.680 important Ruby benchmark. With Ruby 3.3 we managed to reach a 3x speedup
00:02:57.920 over the Ruby 2.7 interpreter. In all of my talks I like to
00:03:05.360 have these slides with the people who have contributed to YJIT, just to remind everyone that it's a team effort, it's
00:03:12.159 not just me, and the team has had many people contributing over the years. Eventually Takashi Kokubun
00:03:19.519 joined us and made some amazing contributions to YJIT as well, and the
00:03:25.120 team had more and more contributors at different points in time, and in 2025 one slide doesn't even suffice, so
00:03:32.000 we have new amazing contributors helping us make our Ruby
00:03:37.159 better. We've also seen many deployments of YJIT. The first big deployment was at Shopify, on our
00:03:44.080 storefront rendering infrastructure. Initially we had something like a 10% end-to-end speedup, so this is the
00:03:49.760 total time that it takes to serve the web request, including the database and I/O, and now those speedups are closer
00:03:56.959 to 20%. YJIT has also been deployed at Mastodon, at Discourse, and
00:04:04.720 at many other tech companies over the years, with people reporting
00:04:10.400 excellent results, sometimes all the way up to 30%. And even today, in 2025, there are still
00:04:16.880 people enabling YJIT and getting great performance results. But I would
00:04:22.320 say for me something clicked the last time, when I was in Okinawa for RubyKaigi 2024.
00:04:31.600 Someone from Zendesk came to talk to me and said: we just enabled
00:04:37.280 YJIT at Zendesk right before leaving for RubyKaigi, and we're getting
00:04:44.280 20% improvements just by flipping a switch, this is amazing, thank you. But
00:04:49.680 the thing that struck me is that they said they were running Ruby 3.3.1, and at Shopify we were also running the
00:04:56.160 latest Ruby. I knew GitHub was also running the latest Ruby, because they had deployed YJIT, and
00:05:03.199 this made me go, wow: when I joined Shopify I remember people being several Ruby versions
00:05:08.560 behind, and we were kind of struggling to get people to upgrade to the latest Ruby,
00:05:14.000 but now, all of a sudden, if you can tell people, hey, if you get the latest Ruby your software is going to
00:05:19.680 run 15% faster, you don't have to ask twice. In fact, maybe you don't even have to ask at all. So, as of today, we
00:05:28.479 track the performance of YJIT on speed.yjit.org, and we've got some
00:05:34.639 pretty great results on benchmarks. We can see, on the liquid render benchmark, with YJIT 3.4 we
00:05:42.960 get about a 2.8x speedup over the interpreter. On average we get about 2x
00:05:48.560 across our benchmarks. These are all benchmarks that are meant to be representative of real-world
00:05:54.479 workloads that people would use Ruby for, primarily web workloads. On railsbench we're over 2x faster, and there's
00:06:01.759 something like a 6 to 8% performance difference between YJIT 3.4
00:06:07.520 and the latest YJIT development version. So, congratulations, we did
00:06:14.400 it, we transformed the Ruby performance landscape, which is great.
00:06:20.080 But, and there's always a but, I feel like we're
00:06:26.240 hitting a plateau with the performance of YJIT, to be honest. What I mean by that is that the first version of YJIT had
00:06:32.639 20% speedups over the interpreter, and then the next version after that we had like a 15% speedup on top of that 20%
00:06:40.080 speedup, and then with YJIT 3.4 we're getting maybe a six
00:06:47.039 or eight percent speedup. But it's not uniformly distributed: there are some benchmarks where it can be like 20%,
00:06:54.000 but there are some benchmarks where we're not really seeing that much of a speedup. Someone reported they upgraded
00:06:59.919 from Ruby 3.3 to 3.4 and got about the same performance with 3.4 as with
00:07:07.680 3.3. Well, maybe that's not entirely our fault, because, you know, web apps
00:07:14.240 have database requests, and obviously YJIT cannot make your database requests run any faster, it just
00:07:22.479 can't do that, because it's not a database JIT. But still, there's also
00:07:28.000 this microbenchmark that went around Twitter, which is like a billion nested loop iterations, and, well, this
00:07:35.360 is actually Ruby without YJIT. With YJIT it's about the same speed
00:07:40.720 as PHP, which, you know, is nice: with YJIT we can double the speed of
00:07:47.919 this microbenchmark, and we can say, haha, take that, Python. But
00:07:54.680 if you compare even YJIT to the JavaScript JITs, even
00:08:02.080 if Ruby can do it in 10 seconds, it's like an order of magnitude difference. And this is what the
00:08:08.560 microbenchmark looks like: it's two nested loops, and it does something with an array. Obviously you can look at this
00:08:13.759 benchmark and say, well, that's nothing like Rails, it's not a real benchmark, you
00:08:21.520 don't run this in production, obviously. You know, this microbenchmark is
00:08:27.120 not at all representative, but at the same time, if this benchmark is so basic, why do we perform so poorly?
00:08:34.640 Because obviously it's a very simple benchmark; we should be able to run this faster, and
00:08:40.880 probably your real application has loops inside of it too. So if we could speed up code like this, then
00:08:47.279 probably we could make all Ruby code run faster. So we could do better, and
00:08:53.360 yeah, I really want us to do better. I'm sorry
00:08:58.560 we couldn't make those microbenchmarks run faster, so I asked ChatGPT how to say sorry in Japanese, and it said...
00:09:06.839 just kidding. So, obviously JavaScript is much
00:09:13.760 faster on this microbenchmark. Why couldn't Ruby match JavaScript's speed on those simple loops? We should make that
00:09:18.839 happen. That's not possible everywhere, right? Even if we make a better Ruby JIT, we can't guarantee a
00:09:25.040 10x speedup on everything, because, for example, how much faster can you make string concatenation
00:09:31.120 with a JIT? I think you can actually make it run somewhat faster, but you can't make memcpy run 10x faster. But
00:09:38.000 still, what if we could double YJIT's peak performance? There are many obvious optimizations that we're currently not
00:09:43.839 doing, and those speedups should and would translate into real workloads too, because real workloads use loops as well.
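The nested-loop microbenchmark being discussed is presumably something like the following sketch; the exact code and constants from the slide are assumptions, and the loop count is scaled down here so it finishes quickly:

```ruby
# Sketch of the "billion nested loop iterations" style benchmark:
# two nested while loops doing trivial array work, scaled down.
def nested_loops(n)
  a = Array.new(n, 0)
  i = 0
  while i < n
    j = 0
    while j < n
      a[j] += 1  # trivial work in the inner loop
      j += 1
    end
    i += 1
  end
  a[0]
end

puts nested_loops(1_000)  # => 1000: each element is incremented n times
```

Loops like this stress exactly the paths a JIT can speed up: fixnum arithmetic, comparisons, and array reads and writes.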
00:09:51.240 I think performance is a feature. I've already talked about how people were so motivated to upgrade Ruby to
00:09:57.680 get those performance improvements. History tells us we can never have enough compute. I also think, to some
00:10:03.200 extent, this is important because it's a matter of survival: if you want Ruby to survive long term,
00:10:08.720 performance definitely matters. Ruby is a cool language, it's a great language, but if your language is
00:10:13.760 not fast, I think it will eventually die, just because of the relentless march of
00:10:18.959 progress. There's a relentless push to always optimize performance, always reduce costs, and in a world where
00:10:26.640 single-core performance is also hitting a plateau, we need to remove
00:10:32.399 all bottlenecks. So let's not let Ruby ever be a bottleneck; let's try to build a truly world-class Ruby
00:10:40.200 JIT. So, introducing ZJIT. For the last two and a half months, the YJIT team has been working on ZJIT. It's not
00:10:47.839 just a better YJIT, it's a prototype of a next-generation Ruby JIT that incorporates the learnings from the last
00:10:53.519 four and a half years of YJIT work, all of the hundreds of experiments that we've done, both on benchmarks and
00:11:00.000 also using production data, all the experience that the YJIT team has. I have tremendous confidence in the
00:11:07.480 YJIT and ZJIT team, and this JIT is going to be designed to be more
00:11:12.720 maintainable and extensible, hopefully a JIT that could last for maybe the next 20-plus years of
00:11:18.880 CRuby, so that, you know, we don't need to keep rewriting better JITs;
00:11:24.320 we can build something that we can maintain and extend for the long term.
00:11:31.040 So first, the most important question: how do you pronounce it? Some people say "zee-jit"; personally I prefer "zed-jit",
00:11:37.200 because it sounds more like ZJIT, you know. So let's talk about ZJIT's core
00:11:44.440 architecture. First, YJIT's architecture: YJIT directly compiles
00:11:49.760 the YARV bytecode into machine code. It's a relatively simple architecture, which was good for us to get something
00:11:56.079 working quickly. Like I said earlier, originally YJIT was built with a very
00:12:01.279 limited time budget, you know, three people, nine months. We grew YJIT
00:12:07.600 incrementally over time. We added a cross-platform assembler, which allowed us to support both x86-64 and ARM64, so that
00:12:15.600 makes it possible for YJIT to run on new MacBook M1, M2, M3 laptops,
00:12:22.399 for example. We've added some basic inlining for very small methods, like
00:12:28.000 if you have a method that returns a constant we can inline that. We added custom codegen for core C methods, we
00:12:34.480 added context metadata compression to make it use less memory. But we're getting to a point, I think, where we're
00:12:40.480 hitting the limits of YJIT's current architecture. It's getting difficult to extend and improve this
00:12:47.720 architecture, which is why I think we need to make some changes.
00:12:53.440 Engineering is all about tradeoffs, finding the best trade-off given different constraints. So it's kind of
00:12:59.120 like the classic, you know, fast, good, and cheap: pick two. YJIT was based on lazy
00:13:04.399 basic block versioning, which is a JIT architecture based on my PhD research, with the goal of building a JIT
00:13:10.000 with good performance quickly; that was the aim of my research. But for ZJIT, I want us to build
00:13:18.160 a method-based JIT compiler that's not going to be based on lazy basic block versioning, and this is a deliberate choice.
00:13:25.800 There's kind of a question: could we design a more advanced JIT based on lazy basic block versioning?
00:13:30.880 I have some ideas on how we could maybe do that, but that's
00:13:36.240 kind of a research project, in a way, right? Lazy basic block versioning came from my PhD research, but
00:13:43.040 if we wanted to try to extend this kind of exotic compiler architecture into something more, we'd
00:13:50.880 be taking some amount of risk, because there are many unknowns, and I don't want to impose that risk on
00:13:56.959 the team, and Shopify, and the Ruby community, because Ruby is not my personal research project. So for
00:14:03.920 ZJIT I want us to make a safer bet and go with a traditional, established way to build a JIT, an architecture that
00:14:10.800 has a known design, which helps us to minimize risk and means that we
00:14:16.000 have a very high likelihood of success. The reason we think we have a very high likelihood of success is
00:14:21.120 that most JITs are method-based JITs, and we know it works; there's
00:14:26.160 no reason why it wouldn't work. So ZJIT is going to have a design that
00:14:32.480 I want to be more standard, something you would read about in a compiler textbook,
00:14:38.320 with more standard foundations, which doesn't mean that we can't build cool stuff on top. Method-based JITs can
00:14:44.560 make for an architecture that's very modular and extensible, with, you know, different parts that we can swap in
00:14:50.240 and out as we improve the design.
00:14:55.600 The benefits would be low risk, because it's a proven design that we know works, and I think
00:15:02.160 this would also make ZJIT more accessible to newcomers. A
00:15:08.959 problem that we have right now, I think, is that the knowledge of how YJIT works exists mostly at Shopify
00:15:17.600 and GitHub, and we've had very few commits from Ruby core contributors. So I think if we had a JIT
00:15:24.959 with a more standard design, it might make it easier, for example, for the rest of the Ruby community to also
00:15:32.160 participate in the development of this JIT, which is good for Ruby long
00:15:37.320 term. Standard doesn't have to mean boring: I think once we have stable foundations we can do many cool
00:15:43.519 experiments. My PhD advisor Marc Feeley and his PhD student Olivier Melançon
00:15:48.880 published about static basic block versioning, which is kind of an offshoot of my PhD work, and I think this could be
00:15:54.240 done in a method-based JIT context later, if we want to, but it can be a
00:16:00.399 self-contained and very low-risk experiment, as opposed to part of the core foundation of the JIT compiler
00:16:06.440 itself. The next significant design change for ZJIT is that it's going to
00:16:12.800 have its own intermediate representation. YJIT compiles YARV
00:16:18.639 bytecode directly into machine code, which makes for a very simple architecture, but this is not at all
00:16:24.480 optimal for a JIT compiler. YARV bytecode is designed for the MRI interpreter.
00:16:31.040 If you want to maximize the performance of an interpreter, basically you want bigger instructions, because you want to
00:16:37.839 minimize the dispatch overhead in your interpreter loop, right? So you don't want lots of tiny instructions, you want bigger instructions that do more
00:16:44.360 work. You also want those bigger instructions because you can give your C compiler, assuming the interpreter is written in
00:16:50.560 C, bigger chunks of code to optimize all at once, right? So bigger instructions
00:16:55.600 are better for the interpreter, and you end up with a very CISC-like design, so to speak. But this is kind of the
00:17:02.000 opposite of what you want for a JIT compiler. JIT compilers typically have
00:17:07.280 what's called an intermediate representation: this is the way the compiler internally represents code, the compiler's internal
00:17:14.280 language, and what you want from a JIT IR is to decompose complex semantics into
00:17:19.839 small, composable primitives that enable kind of algebraic
00:17:25.679 operations and transformations on code. You want those instructions to have as little internal control flow as
00:17:31.760 possible, because that makes the code easier to reason about and to optimize. So you want something
00:17:38.080 that's more like a minimalistic, RISC-like instruction set. If any of you have seen this amazing documentary,
00:17:44.240 the key takeaway was that RISC is good. So ZJIT is going to have an SSA-based
00:17:50.720 IR; SSA means static single assignment. This is something I learned about in
00:17:56.960 university in a compiler class. It was originally developed at IBM in the 1980s, and you can read about it in
00:18:04.160 many compiler textbooks. It's basically the most widely used compiler IR, both for JIT compilers and
00:18:11.120 also for many ahead-of-time compilers. It's kind of a de facto standard, used by LLVM and GCC, and it's proven to
00:18:18.559 be flexible and robust. So again, like I talked about having a standard design, this also goes
00:18:25.679 strongly in that direction. This is the original paper from POPL 1988
00:18:32.320 introducing the formalism of the SSA IR. So, just to show you what that
00:18:38.240 might look like, here's a very small example: a recursive factorial microbenchmark. The YARV
00:18:46.799 bytecode for this microbenchmark looks something like this, and then this is a dump of
00:18:53.600 ZJIT's IR. The IR is divided into blocks, not to be confused with basic
00:18:59.919 block versioning, these are just control-flow-graph blocks, and it has instructions like FixnumLt,
00:19:06.720 fixnum less-than, so this is checking if v5 is less than v2. We have IfFalse, which is
00:19:11.919 a branching instruction: basically, if v8 is false, jump to bb1.
00:19:18.000 FixnumSub would be fixnum subtraction. SendWithoutBlockDirect
00:19:23.200 is a recursive send, where the factorial calls itself. Also, something you can see is that all of the
00:19:30.080 instructions are annotated with type information. We also have patch points,
00:19:36.000 which allow us to deoptimize code (Kokubun talked a little bit about this sort of thing in his talk yesterday), and
00:19:42.240 we have guards: basically here, if v12 is not a fixnum, exit to the
00:19:48.919 interpreter. Now I want to talk a little bit about some key features that are going to be part of
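As a concrete sketch of the example being walked through (the slide's exact factorial code is an assumption), here is a recursive factorial together with CRuby's built-in way to dump the YARV bytecode a JIT starts from; `RubyVM::InstructionSequence` is a standard CRuby API:

```ruby
# Recursive factorial, similar to the microbenchmark on the slide.
def fact(n)
  return 1 if n < 2
  n * fact(n - 1)
end

puts fact(10)  # => 3628800

# Dump the YARV bytecode that a JIT like ZJIT consumes as its input.
puts RubyVM::InstructionSequence.of(method(:fact)).disasm
```

The disassembly shows the large, interpreter-oriented instructions the talk describes, which ZJIT then lowers into its own small, typed SSA instructions.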
00:19:55.880 ZJIT. So, one of the things that's challenging about CRuby and
00:20:01.120 about YJIT is that method calls are very expensive, because they have many corner cases that they need to
00:20:07.400 handle. The Ruby VM also has two stacks: a VM stack, or value stack,
00:20:12.799 and a CFP stack, so there's lots of work involved in setting up a
00:20:17.919 complex CFP object. YJIT can make that run faster; actually, YJIT can
00:20:23.600 make microbenchmarks that are basically only calls run like 8 to 12x faster than interpreted
00:20:28.720 CRuby, but this is still probably only about half of the peak theoretical throughput that's possible. And
00:20:35.919 also, we have an issue where we're generating tons of machine code, and even though we can make this
00:20:43.039 run really fast on microbenchmarks, massive code size is not good for your instruction cache, it's not good for
00:20:49.360 instruction decoding, it slows things down elsewhere. So we need to do something about this. Other
00:20:54.880 languages like JavaScript have much faster calls, so Ruby should too. So in
00:21:00.799 ZJIT we want to have what we call fast JIT-to-JIT calls. YJIT's calls
00:21:06.080 are suboptimal for multiple reasons: one is that they use jump instructions, and they use a very long
00:21:12.880 code sequence to set up the VM and CFP stacks. To get maximum performance we want to use the CPU's
00:21:19.120 call and return instructions, because those are better optimized, and we want to directly use the C stack to
00:21:25.760 make those calls, as opposed to the VM and CFP stacks. So we're designing ZJIT to directly use the C stack for
00:21:32.799 JIT-to-JIT calls. That means when a jitted function calls another jitted function, it's going to use
00:21:38.640 the C stack, but when it needs to exit to the interpreter,
00:21:44.080 it needs to pay a little bit of a performance penalty to do this side exit. So again, engineering
00:21:49.919 is all about trade-offs, right? The Shinkansen can go really, really fast, but it can't make sharp turns, but I
00:21:57.280 think we can all agree that high-speed trains are still a very good idea. So, we have some early
00:22:03.360 experimental results: Takashi Kokubun got fast JIT calls working in ZJIT, using
00:22:10.080 call and return instructions, as of just a few days ago, and we got the
00:22:16.880 recursive Fibonacci benchmark working. This is not how you
00:22:22.880 should compute Fibonacci numbers, it's just an exercise to test the performance of method calls in
00:22:29.280 ZJIT. So this is what the microbenchmark looks like, and as you can see here,
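The microbenchmark on the slide is presumably the classic doubly-recursive Fibonacci, something like this sketch (exact slide code assumed):

```ruby
# Doubly-recursive Fibonacci: nearly all of its cost is method calls,
# which is why it is used to exercise JIT-to-JIT call performance.
def fib(n)
  return n if n < 2
  fib(n - 1) + fib(n - 2)
end

puts fib(20)  # => 6765
```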
00:22:35.200 YJIT is about 10 times faster than the interpreter on this microbenchmark,
00:22:40.559 and then ZJIT is another 60% faster. I think we can even make it run faster, and we might be able to be
00:22:46.880 twice as fast as YJIT eventually. So this is very, very encouraging. The
00:22:53.919 second key new feature of ZJIT: JIT compilation is kind of a balancing act. Your JIT compiler has to juggle
00:23:01.360 multiple trade-offs. One is the performance of the generated code, another is the warm-up time, and then there's the memory overhead, how much
00:23:07.440 memory the JIT compiler uses. Your JIT compiler runs at the same time as your program, which means the JIT compiler is
00:23:13.600 kind of competing with the running program for resources. So JIT compilers effectively operate in a
00:23:20.240 resource-constrained environment, where you need to minimize the amount of memory and CPU time that your JIT uses,
00:23:27.600 because otherwise you're taking those resources away from the running program. But by optimizing to reduce memory and
00:23:34.400 CPU usage, you're limiting how effective your JIT compiler can be.
00:23:40.000 And then we have another problem: Shopify has bazillions of servers running
00:23:45.520 in large data centers, and we're redeploying code, sometimes multiple times a day, on these bazillions of
00:23:51.400 servers. But at the same time, when you're redeploying code in production, probably 99.99% of the code
00:23:57.679 that you're deploying has not changed. You know, maybe someone made a pull request and they
00:24:04.960 changed the variable foo to be named bar, and so it's like, okay, let's
00:24:10.440 redeploy all this code on lots of servers, and recompile everything on all of those servers,
00:24:17.120 which just seems really inefficient, right? So what if we could save and reuse compilation work? What if we could
00:24:23.200 serialize and persist machine code? If we could do that, then we could maybe hit the ground running and warm up much
00:24:28.720 faster than we can with YJIT, which also means potentially we could
00:24:34.000 spend more time compiling code and enable higher optimization levels, because if you can JIT once and reuse
00:24:40.559 that work many, many times later, then it becomes worthwhile to spend more time
00:24:45.840 jitting in the first place. There are some challenges here: we need to save not just code but also metadata, we
00:24:52.080 need to save the optimization information, and compiled code can contain pointers to things like inline caches or Ruby strings,
00:24:59.039 so we need some kind of mini linker to handle these things. So it's not trivial, but it is possible;
00:25:05.919 there are JITs that do this already, and if we could make this work in ZJIT, this could again be kind of a
00:25:13.000 game-changer. So, what is the current status of ZJIT? Well, we started
00:25:18.960 development just two and a half months ago. We've already implemented our custom SSA IR; we have comparisons and fixnum
00:25:25.360 operations working, control flow working, fast JIT-to-JIT calls, constant and type propagation, and dead code elimination.
00:25:32.720 After two and a half months we can run simple microbenchmarks like the ones I've just shown you, and we're faster
00:25:37.919 than the interpreter, and even faster than YJIT on some of those microbenchmarks, but obviously this is still very
00:25:44.080 early innings at this point. We've also opened a proposal to upstream
00:25:50.240 ZJIT, to make development easier. We've discussed this with the Ruby
00:25:55.440 core members, who were very receptive. Matz has expressed that
00:26:00.720 he has complete confidence in the ZJIT team, and so he's in favor of
00:26:06.840 upstreaming, so thank you, Matz, for the vote of confidence. Upstreaming should probably happen in
00:26:13.440 the next three to five weeks, would be my expectation. We have a command-line
00:26:18.559 option to enable ZJIT, as with YJIT, so ruby --zjit. If you're interested, you
00:26:24.720 should keep in mind that right now ZJIT is not at all in a usable state for production or any real workloads. I
00:26:31.760 would expect that it's going to become more usable around the second half of this year, because right
00:26:36.799 now we're just in the process of laying down the foundations, but we're making really rapid
00:26:41.880 progress. So, what to expect for Ruby 3.5? The plan is for YJIT to still be
00:26:48.000 available in Ruby 3.5, because we want to guarantee that Ruby 3.5 ships with a working JIT, so we're not
00:26:55.600 going to remove YJIT until we're confident that ZJIT is faster and just as stable.
00:27:00.640 But for now, we're putting YJIT into maintenance mode, so that we can dedicate all of our
00:27:08.159 mental energy and all of the resources that we have to building this much better, much more
00:27:14.720 forward-looking JIT. When Ruby 3.5 comes out, you may or may not need to
00:27:22.159 run configure with --enable-zjit to build it; that's going to depend on the maturity level of ZJIT around the time
00:27:28.480 of the Ruby 3.5 release, so we're going to make that call closer to the end of the year. We believe it should be
00:27:35.760 possible to have both YJIT and ZJIT in the same binary, to be determined, but we think it's possible, which means
00:27:42.080 for Ruby 3.5 you should be able to toggle both, you know, ruby --zjit and ruby --yjit. So basically
00:27:50.240 you shouldn't need to change anything in your production deployment, but you should also be able to try ZJIT, kind of in a beta
00:27:58.399 state, maybe. We're hoping for ZJIT to match YJIT's performance for this release.
00:28:05.120 I think it's very likely; my expectation is that we're going to start to beat YJIT on more and more
00:28:12.159 benchmarks as the year progresses, and hopefully, you know, wipe the
00:28:18.000 floor at the end of the year, but obviously we can't necessarily guarantee the pace
00:28:23.120 at which things are going to go, so beating YJIT on larger, more realistic benchmarks will take a little bit of
00:28:28.399 time. The next steps: fast JIT calls are already working, thanks to
00:28:34.720 Kokubun. After that, we're going to add the ability to side-exit to the interpreter in the next few weeks, which
00:28:40.320 Kokubun explained in his talk. That's going to enable us to more extensively test ZJIT, run the test suites,
00:28:47.440 and also run benchmarks on speed.yjit.org. What I want, for the second half of this year, is to
00:28:54.080 start tracking, you know, ZJIT is outperforming YJIT on this many benchmarks out of this many,
00:29:00.720 kind of gamify the thing a little bit for the team, make it fun. We're going to gradually grow the feature set
00:29:07.200 of what ZJIT supports, which is a lot of work, but, you know, we'll get
00:29:13.279 through it, and we're going to measure and optimize the compilation speed, to make sure that it runs fast, and
00:29:18.640 tune the compiler. So that's it for me, thank you for listening. If you want
00:29:24.799 to learn more, if you want to follow our work, you can subscribe to the Rails at Scale blog or check it
00:29:31.279 periodically. You can also come talk to us after this talk. I should also note that Shopify's Ruby and Rails
00:29:37.200 infrastructure team is hiring. We're hiring compiler experts for ZJIT, we're hiring
00:29:42.480 C and Rust systems programmers, as well as world-class Ruby and Rails experts. If you're interested in
00:29:49.840 working as part of Ruby and Rails infrastructure, you can use our QR
00:29:56.880 code to apply. And, yeah, that's it for me, thank you.