Summarized using AI

ZJIT: Building a Next Generation Ruby JIT

Maxime Chevalier-Boisvert • April 17, 2025 • Matsuyama, Ehime, Japan • Talk

In her talk titled "ZJIT: Building a Next Generation Ruby JIT" at RubyKaigi 2025, Maxime Chevalier-Boisvert discusses the development and challenges of the new JIT (Just-In-Time) compiler for Ruby. The presentation covers the history, status, and future expectations for the ZJIT project, comparing it to its predecessor, YJIT.

Key Points Discussed:

  • Background and Motivation:

    • The YJIT project, initiated in partnership with Shopify, successfully improved Ruby's performance, delivering significant speed-ups on various benchmarks. However, YJIT is reaching a performance plateau, highlighting the need for a new compiler to continue enhancing Ruby's efficiency.
  • Development History of YJIT:

    • Maxime highlights the creation of YJIT, its benchmarks, and notable achievements, such as a 3x speed-up over the Ruby 2.7 interpreter on Optcarrot, Ruby's most important benchmark.
    • YJIT saw widespread deployment in systems like Shopify, Discourse, and Mastodon, with reported speed increases of up to 30%.
  • Introduction to ZJIT:

    • ZJIT aims to overcome the limitations of YJIT by adopting a new architecture, focusing on maintainability, extensibility, and long-term viability in Ruby.
    • The design shifts from lazy basic block versioning to a method-based JIT architecture, a more standard approach with a proven track record in compiler design.
  • Core Architectural Changes:

    • ZJIT will incorporate its own Intermediate Representation (IR), moving away from directly compiling Ruby's YARV bytecode into machine code, allowing for better optimization.
    • Introduction of features such as fast JIT-to-JIT calls using CPU call instructions for improved performance in method calls, aiming to drastically reduce overhead and increase throughput.
    • Plans for saving and reusing compiled code to enhance warm-up times and recompile cycle efficiency.
  • Current and Future Status:

    • Development for ZJIT commenced just months prior but has already made significant progress, including essential features like fast method calls.
    • A timeline anticipates further development throughout the year, leading up to Ruby 3.5, which will maintain backward compatibility with YJIT while introducing ZJIT.
    • Maxime encourages collaboration and invites talents to join the Ruby and Rails infrastructure team at Shopify.

Conclusions:

Maxime emphasizes the critical need to enhance Ruby's performance in order to thrive and keep pace in a rapidly evolving technical landscape. The transition from YJIT to ZJIT represents an essential evolution toward Ruby's long-term goals of speed and efficiency.


Date: April 17, 2025
Published: May 27, 2025

RubyKaigi 2025

00:00:16.560 So, hi everybody, my name is Maxime Chevalier-Boisvert, I work at Shopify as part of the
00:00:22.160 Ruby and Rails infrastructure team, and today I'm here to tell you about ZJIT, a project to build a
00:00:29.039 next-generation Ruby JIT compiler. In this talk I'll give a short history of YJIT, I'll talk a
00:00:36.719 little bit about why we want to build a new JIT, the limitations of YJIT,
00:00:41.920 two key design changes that we want to make with this new compiler, two new
00:00:47.440 major features that we want to include in ZJIT, the current status of the
00:00:52.800 project, and what to expect for Ruby 3.5. So first, a short and incomplete
00:00:59.680 history of YJIT. I joined Shopify in 2020 and formed a team with two other
00:01:06.000 engineers, and we built what was first called MicroJIT, which was kind of a prototype JIT compiler. We had a
00:01:13.680 time budget of 3 months. We wanted, and we got, up to a 10% speed-up on smaller
00:01:19.360 benchmarks with this prototype, and we managed to outperform vanilla Ruby by
00:01:24.400 1% on the liquid template rendering benchmark. But unfortunately this prototype underperformed on real
00:01:30.960 benchmarks, because it spent too much time entering and leaving JIT code. But still, Ruby and Rails infrastructure
00:01:37.520 was kind enough to give us the green light to work on a real JIT,
00:01:43.439 and this time we had nine months to deliver double-digit speed-ups on Optcarrot, single-digit speed-ups on Rails-
00:01:48.560 bench, and a low warm-up time. And we ended up having better success than expected; it worked even better
00:01:55.920 than the objectives we had initially set. We got some clear double-digit speed-ups
00:02:01.200 on realistic benchmarks, sometimes over 20%, and we were able to run all of
00:02:06.640 Shopify's storefront rendering infrastructure and serve real web
00:02:12.239 requests. I first presented YJIT at RubyKaigi Takeout 2021
00:02:18.879 with some great performance numbers, and following this, Takashi Kokubun
00:02:24.560 made an invitation for us to open a ticket to upstream YJIT into
00:02:31.160 CRuby. This was very well received: Matz saw it and approved it, and
00:02:37.599 shortly after it was merged, and so YJIT went on to be part of Ruby 3.1.
00:02:45.040 Over the years we've had better and better performance numbers, particularly on Optcarrot, which is the most
00:02:51.680 important Ruby benchmark; with Ruby 3.3 we managed to reach a 3x speed-up
00:02:57.920 over the Ruby 2.7 interpreter. And in all of my talks I like to
00:03:05.360 have these slides with the people who have contributed to YJIT, just to remind people that it's a team effort, it's
00:03:12.159 not just me, and the team has had many people contributing over the years. Eventually Takashi Kokubun
00:03:19.519 joined us and made some amazing contributions to YJIT as well, and the
00:03:25.120 team had more and more contributors at different points in time, and in 2025 one slide doesn't even suffice, so
00:03:32.000 we have new amazing contributors helping us make Ruby
00:03:37.159 better. We've also seen many deployments of YJIT. The first big deployment was at Shopify on our
00:03:44.080 storefront rendering infrastructure. Initially we had something like a 10% end-to-end speed-up, so this is the
00:03:49.760 total time that it takes to do the web request, including database and network I/O, and now those speed-ups are closer
00:03:56.959 to 20%. YJIT has also been deployed at Mastodon, at Discourse, and
00:04:04.720 at many other tech companies over the years, with people reporting
00:04:10.400 excellent results, sometimes all the way up to 30%. And even today, in 2025, there are still
00:04:16.880 people enabling YJIT and getting great performance results. But I would
00:04:22.320 say, for me, something clicked the last time I was here in Okinawa, for RubyKaigi 2024.
00:04:31.600 Someone from Zendesk came to talk to me and they said, we just enabled
00:04:37.280 YJIT at Zendesk right before leaving for RubyKaigi, and we're getting
00:04:44.280 20% improvements just by flipping a switch, this is amazing, thank you. But
00:04:49.680 the thing that struck me is that they said they were running Ruby 3.3.1, and at Shopify we were also running the
00:04:56.160 latest Ruby. I knew GitHub was also running the latest Ruby, because they had deployed YJIT, and
00:05:03.199 this made me go, wow. You know, when I joined Shopify I remember people being several Ruby versions
00:05:08.560 behind, and we were struggling to get people to upgrade to the latest Ruby,
00:05:14.000 but now all of a sudden, if you can tell people, hey, if you get the latest Ruby your software is going to
00:05:19.680 run 15% faster, you don't have to ask twice; in fact, maybe you don't even have to ask at all. So as of today, we
00:05:28.479 track the performance of YJIT on speed.yjit.org, and we've got some
00:05:34.639 pretty great results on benchmarks. We can see, on the liquid render benchmark, with YJIT 3.4 we
00:05:42.960 get about a 2.8x speed-up over the interpreter; on average we get about 2x
00:05:48.560 across our benchmarks. These are all benchmarks that are meant to be representative of real-world
00:05:54.479 workloads that people would use Ruby for, primarily web workloads. Railsbench runs over 2x faster, and there's
00:06:01.759 something like a 6 to 8% performance difference between YJIT 3.4
00:06:07.520 and the latest YJIT development version. So congratulations, we did
00:06:14.400 it, we transformed the Ruby performance landscape, which is great.
00:06:20.080 But, and there's always a but, I feel like we're
00:06:26.240 hitting a plateau with the performance of YJIT, to be honest. What I mean by that is that the first version of YJIT had
00:06:32.639 20% speed-ups over the interpreter, and then with the next version after that we had like a 15% speed-up on top of that 20%
00:06:40.080 speed-up, and then with YJIT 3.4 we're getting maybe a six
00:06:47.039 or eight percent speed-up, and it's not uniformly distributed: there are some benchmarks where it can be like 20%,
00:06:54.000 but there are some benchmarks where we're not really seeing that much of a speed-up. Tammy reported they upgraded
00:06:59.919 from YJIT 3.3 to 3.4 and got about the same performance with 3.4 as with
00:07:07.680 3.3. So, well, maybe that's not entirely our fault, because, you know, web apps
00:07:14.240 have database requests, and obviously YJIT cannot make your database requests run any faster. It just
00:07:22.479 can't do that, because it's not a database JIT. But still, there's also
00:07:28.000 this microbenchmark that went around Twitter, which is like a billion nested loop iterations, and, well, this
00:07:35.360 is actually Ruby without YJIT; with YJIT it's about the same speed
00:07:40.720 as PHP, which, you know, is nice. With YJIT we can double the speed of
00:07:47.919 this microbenchmark, and we can say, haha, take that, Python. But
00:07:54.680 if you compare even YJIT to the JavaScript JITs,
00:08:02.080 even for what Ruby can do in 10 seconds, it's like an order of magnitude difference. And this is what the
00:08:08.560 microbenchmark looks like: two nested loops, doing something with an array. So obviously you can look at this
00:08:13.759 benchmark and say, well, that's nothing like Rails, it's not a real benchmark, you
00:08:21.520 don't run this in production, obviously. You know, this microbenchmark is
00:08:27.120 not at all representative, but at the same time, if this benchmark is so basic, why do we perform so poorly?
00:08:34.640 Obviously it's a very simple benchmark; we should be able to run it faster, and
00:08:40.880 probably your real application has loops inside of it too. So if we could speed up code like this, then
00:08:47.279 probably we could make all Ruby code run faster. So we could do better, and
00:08:53.360 yeah, I really want us to do better. I'm sorry
00:08:58.560 we couldn't make those microbenchmarks run faster. So I asked ChatGPT how to apologize in Japan, and it said...
00:09:06.839 ...just kidding. So obviously JavaScript is much
00:09:13.760 faster on this microbenchmark. Why couldn't Ruby match JavaScript's speed on those simple loops? We should make that
00:09:18.839 happen. Now, that's not possible everywhere, right? Even if we make a better Ruby JIT, we can't guarantee a
00:09:25.040 10x speed-up on everything, because, for example, how much faster can you make string concatenation
00:09:31.120 with a JIT? I think you can actually make it run faster, but you can't make memcpy run 10x faster. But
00:09:38.000 still, what if we could double YJIT's peak performance? There are many obvious optimizations that we're currently not
00:09:43.839 doing, and those speed-ups should and would translate into real workloads too, because real workloads use loops as well.
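A runnable sketch of that kind of loop-heavy code (an illustrative stand-in for the viral benchmark described above; the talk doesn't show the exact source, and the iteration counts here are shrunk from the original's roughly one billion so it finishes instantly) might look like:

```ruby
# Illustrative stand-in for the viral nested-loop microbenchmark:
# two nested loops doing trivial arithmetic on an array. Nearly all
# of the time goes to loop dispatch and fixnum arithmetic, which is
# exactly where the JavaScript JITs pull far ahead of Ruby.
def nested_loop_bench(outer, inner)
  a = Array.new(inner, 0)
  outer.times do
    inner.times { |i| a[i] += i % 10 }
  end
  a.sum
end

nested_loop_bench(1_000, 1_000)
```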
00:09:51.240 I think performance is a feature. I've already talked about how people were so motivated to upgrade Ruby
00:09:57.680 to get performance improvements. History tells us we can never have enough compute. I also think, to some
00:10:03.200 extent, this is important because it's a matter of survival: if you want Ruby to survive long term,
00:10:08.720 performance definitely matters. Ruby is a cool language, it's a great language, but if your language is
00:10:13.760 not fast, I think it will eventually die, just because of the relentless march of
00:10:18.959 progress. There's a relentless push to always optimize performance, always reduce costs, and in a world where
00:10:26.640 single-core performance is also hitting a plateau, we need to remove
00:10:32.399 all bottlenecks. So let's not let Ruby ever be a bottleneck; let's try to build a truly world-class Ruby
00:10:40.200 JIT. So, introducing ZJIT. For the last two and a half months the YJIT team has been working on ZJIT. It's not
00:10:47.839 just a better YJIT, it's a prototype of a next-generation Ruby JIT that incorporates the learnings from the last
00:10:53.519 four and a half years of YJIT work, all of the hundreds of experiments that we've done, both on benchmarks and
00:11:00.000 also using production data, all the experience that the YJIT team has. I have tremendous confidence in the
00:11:07.480 YJIT and ZJIT team, and this JIT is going to be designed to be more
00:11:12.720 maintainable and extensible, hopefully a JIT that could last for maybe the next 20-plus years of
00:11:18.880 CRuby, so that, you know, we don't need to keep rewriting better JITs;
00:11:24.320 we want something that we can build and maintain and extend for the long term.
00:11:31.040 So first, the most important question I can ask: how do you pronounce it? Some people say ZJIT one way; personally I prefer the other,
00:11:37.200 because it sounds more like ZJIT, you know. So let's talk about ZJIT's core
00:11:44.440 architecture. Why a new architecture? YJIT directly compiles
00:11:49.760 the YARV bytecode into machine code. It's a relatively simple architecture, which was good for us to get something
00:11:56.079 working quickly. Like I said earlier, originally YJIT was built with a very
00:12:01.279 limited time budget, you know, three people, nine months. We grew YJIT
00:12:07.600 incrementally over time. We added a cross-platform assembler, which allowed us to support both x86-64 and ARM64, which
00:12:15.600 makes it possible for YJIT to run on new MacBook M1, M2, M3 laptops,
00:12:22.399 for example. We've added some basic inlining for very small methods; like,
00:12:28.000 if you have a method that returns a constant, we can inline that. We added custom codegen for core C methods. We
00:12:34.480 added context metadata compression to make it use less memory. But we're getting to a point, I think, where we're
00:12:40.480 hitting the limits of YJIT's current architecture. It's getting difficult to extend and improve this
00:12:47.720 architecture, which is why I think we need to make some changes.
00:12:53.440 Engineering is all about trade-offs, finding the best trade-off given different constraints. So it's kind of
00:12:59.120 like the classic fast, good, and cheap: pick two. YJIT was based on lazy
00:13:04.399 basic block versioning, which is a JIT architecture based on my PhD research, with the goal of building a JIT
00:13:10.000 with good performance, quickly; that was the aim of my research. But for ZJIT, I want us to build a
00:13:18.160 method-based JIT compiler that's not going to be based on lazy basic block versioning, and this is a deliberate choice.
00:13:25.800 There's kind of a question: could we design a more advanced JIT based on lazy basic block versioning?
00:13:30.880 I have some ideas on maybe how we could do that, but that's
00:13:36.240 kind of a research project, in a way, right? Lazy basic block versioning came from my PhD research, but
00:13:43.040 if we wanted to try to extend this kind of exotic compiler architecture into something more, we'd
00:13:50.880 be taking some amount of risk, because there are many unknowns, and I don't want to impose that risk on
00:13:56.959 the team and Shopify and the Ruby community, because Ruby is not my personal research project. So for
00:14:03.920 ZJIT I want us to make a safer bet and go with a traditional, established way to build a JIT, an architecture that
00:14:10.800 has a known design, which helps us to minimize risk and means that we
00:14:16.000 have a very high likelihood of success. The reason we think we have a very high likelihood of success is
00:14:21.120 because most JITs are method-based JITs, and we know it works; there's
00:14:26.160 no reason why it wouldn't work. So ZJIT is going to have a design that
00:14:32.480 I want to be more standard, like something you would read about in a compiler textbook,
00:14:38.320 with more standard foundations, which doesn't mean that we can't build cool stuff on top. Method-based JITs can
00:14:44.560 make for an architecture that's very modular and extensible, with different parts that we can swap in
00:14:50.240 and out as we improve the design.
00:14:55.600 The benefits would be low risk, because it's a proven design that we know works, and I think
00:15:02.160 this would also maybe make ZJIT more accessible to newcomers. A
00:15:08.959 problem that we have right now, I think, is that the knowledge of how YJIT works exists mostly at Shopify
00:15:17.600 and GitHub, and we've had very few commits from Ruby core contributors. So I think if we had a JIT
00:15:24.959 that had a more standard design, it might make it easier, for example, for the rest of the Ruby community to also
00:15:32.160 participate in the development of this JIT, which is good for Ruby long
00:15:37.320 term. Standard doesn't have to mean boring: I think once we have stable foundations we can do many cool
00:15:43.519 experiments. My PhD adviser Marc Feeley and his PhD student Olivier Melançon
00:15:48.880 published about static basic block versioning, which is kind of an offshoot of my PhD work, and I think this could be
00:15:54.240 done in a method-based JIT context later if we want to, but it can be a
00:16:00.399 self-contained and very low-risk experiment, as opposed to part of the core foundation of the JIT compiler
00:16:06.440 itself. The next significant design change for ZJIT is that it's going to
00:16:12.800 have its own intermediate representation. YJIT compiles YARV
00:16:18.639 bytecode directly into machine code, which makes for a very simple architecture, but this is not at all
00:16:24.480 optimal for a JIT compiler. YARV bytecode is designed for the MRI interpreter.
00:16:31.040 If you want to maximize the performance of an interpreter, basically you want bigger instructions, because you want to
00:16:37.839 minimize the dispatch overhead in your interpreter loop, right? So you don't want lots of tiny instructions, you want bigger instructions that do more
00:16:44.360 work. You also want those bigger instructions because you can give your C compiler, assuming it's written in
00:16:50.560 C, bigger chunks of code to optimize all at once, right? So bigger instructions
00:16:55.600 are better for the interpreter, and you end up with a very CISC design, so to speak. But this is kind of the
00:17:02.000 opposite of what you want for a JIT compiler. JIT compilers typically have
00:17:07.280 what's called an intermediate representation; this is the way the compiler internally represents code, it's the compiler's internal
00:17:14.280 language. And what you want from a JIT IR is to decompose complex semantics into
00:17:19.839 small composable primitives that enable kind of algebraic
00:17:25.679 operations and transformations on code. You want those instructions to have as little internal control flow as
00:17:31.760 possible, because that makes the code easier to reason about and to optimize. So you want something
00:17:38.080 that's more like a minimalistic RISC-like instruction set. If any of you have seen this amazing documentary,
00:17:44.240 the key takeaway was that RISC is good. So ZJIT is going to have an SSA-based
00:17:50.720 IR. SSA means static single assignment. This is something I learned about in
00:17:56.960 university, in a compiler class. It was originally developed at IBM in the 1980s, and you can read about it in
00:18:04.160 many compiler textbooks. It's basically the most widely used compiler IR, both for JIT compilers and
00:18:11.120 also for many ahead-of-time compilers. It's kind of a de facto standard that's used by LLVM and GCC, and it's proven to
00:18:18.559 be flexible and robust. So again, like I talked about having a standard design, this also goes
00:18:25.679 strongly in that direction. This is the original paper, from POPL 1988,
00:18:32.320 including the formalism of the SSA IR. So just to show you what that
00:18:38.240 might look like, here's a very small example: a recursive factorial microbenchmark. The YARV
00:18:46.799 bytecode for this microbenchmark looks something like this, and then this is a dump of
00:18:53.600 ZJIT's IR. The IR is divided into blocks, not to be confused with basic
00:18:59.919 block versioning, just control-flow-graph blocks, and it has instructions like FixnumLe, fixnum
00:19:06.720 less-than, so this is checking if v5 is less than v2. We have IfFalse, which is
00:19:11.919 a branching instruction: basically, if v8 is false, jump to bb1.
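The factorial example can be reproduced in plain Ruby, and MRI's `RubyVM::InstructionSequence` API shows the YARV bytecode that ZJIT starts from. (The SSA names like `FixnumLe`, `v5`, and `bb1` are ZJIT-internal and appear here only as comments; the exact YARV opcodes in the disassembly vary by Ruby version.)

```ruby
# Recursive factorial, the kind of microbenchmark whose YARV bytecode
# and ZJIT SSA dump the talk walks through.
def fact(n)
  return 1 if n <= 1  # lowered to a fixnum comparison plus a conditional branch
  n * fact(n - 1)     # a fixnum subtraction plus a recursive send
end

# Dump the YARV bytecode for the method (MRI-specific API).
puts RubyVM::InstructionSequence.of(method(:fact)).disasm
```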
00:19:18.000 FixnumSub would be fixnum subtraction. SendWithoutBlockDirect
00:19:23.200 is a recursive send, where the factorial calls itself. And something else you can see is that all of the
00:19:30.080 instructions are annotated with type information. We also have patch points,
00:19:36.000 which allow us to deoptimize code; Kokubun talked a little bit about this sort of thing in his talk yesterday. And
00:19:42.240 we have guards: basically here, if v12 is not a fixnum, exit to the
00:19:48.919 interpreter. Now I want to talk a little bit about some key features that are going to be part of
00:19:55.880 ZJIT. One of the things that's challenging about CRuby and
00:20:01.120 about YJIT is that method calls are very expensive, because they have many corner cases that they need to
00:20:07.400 handle. The Ruby VM also has two stacks: a VM stack, or value stack,
00:20:12.799 and a CFP stack, so there's lots of work involved in setting up a
00:20:17.919 complex CFP object. YJIT can make that run faster; actually, YJIT can
00:20:23.600 make microbenchmarks that are basically only calls run like 8 to 12x faster than in
00:20:28.720 CRuby, but this is still probably only about half of the peak theoretical throughput of what's possible. And
00:20:35.919 we also have an issue where we're generating tons of machine code, and even though we can make this
00:20:43.039 run really fast on microbenchmarks, massive code size is not good for your instruction cache, it's not good for
00:20:49.360 instruction decoding; it slows things down elsewhere. So we need to do something about this. Other
00:20:54.880 languages like JavaScript have much faster calls, so Ruby should too. So in
00:21:00.799 ZJIT we want to have what we call fast JIT-to-JIT calls. YJIT's calls
00:21:06.080 are suboptimal for multiple reasons: one is that they use JMP, the jump instruction, and they use a very long
00:21:12.880 code sequence to set up the VM and CFP stacks. But to get the maximum performance we want to use the CPU's
00:21:19.120 call and return instructions, because those are better optimized, and we want to directly use the C stack to
00:21:25.760 make those calls, as opposed to the VM and CFP stacks. So we're designing ZJIT to directly use the C stack for
00:21:32.799 JIT-to-JIT calls. That means that when a jitted function calls another jitted function, it's going to use
00:21:38.640 the C stack, but when it needs to exit to the interpreter,
00:21:44.080 it needs to pay a little bit of a performance penalty to do this side exit. So again, engineering
00:21:49.919 is all about trade-offs, right? The Shinkansen can go really fast, but it can't make sharp turns. But I
00:21:57.280 think we can all agree that high-speed trains are still a very good idea. So we have some early
00:22:03.360 experimental results. Takashi Kokubun got fast JIT calls working in ZJIT using
00:22:10.080 call and return instructions as of just a few days ago, and we got the
00:22:16.880 recursive Fibonacci benchmark working. This is not how you
00:22:22.880 should compute Fibonacci numbers; it's just an exercise to test the performance of method calls in
00:22:29.280 ZJIT. So this is what the microbenchmark looks like, and as you can see here,
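The slide's exact code isn't reproduced in the transcript, but the classic naive version of this benchmark looks like the following: nearly all of its work is call/return machinery, which is exactly what fast JIT-to-JIT calls are meant to speed up.

```ruby
# Naive recursive Fibonacci: a bad way to compute Fibonacci numbers,
# but a good stress test of method-call overhead, since fib(n)
# performs on the order of 2^n method calls.
def fib(n)
  return n if n < 2
  fib(n - 1) + fib(n - 2)
end

fib(30)
```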
00:22:35.200 YJIT is about 10 times faster than the interpreter on this microbenchmark,
00:22:40.559 and then ZJIT is like another 60% faster. I think we can even make it run faster, and we might be able to be
00:22:46.880 twice as fast as YJIT eventually. So this is very encouraging. The
00:22:53.919 second key new feature of ZJIT: JIT compilation is kind of a balancing act. Your JIT compiler has to juggle
00:23:01.360 multiple trade-offs. One is the performance of the generated code; another is the warm-up time; then there's the memory overhead, how much
00:23:07.440 memory the JIT compiler uses. Your JIT compiler runs at the same time as your program, which means the JIT compiler is
00:23:13.600 kind of competing with the running program for resources. So JIT compilers effectively operate in a
00:23:20.240 resource-constrained environment, where you need to minimize the amount of memory and CPU time that your JIT uses,
00:23:27.600 because otherwise you're taking those resources away from the running program. But by optimizing to reduce memory and
00:23:34.400 CPU usage, you're limiting how effective your JIT compiler can be.
00:23:40.000 And then we have another problem: Shopify has bazillions of servers running
00:23:45.520 in large data centers, and we're redeploying code sometimes multiple times a day on these three bazillion
00:23:51.400 servers. But at the same time, when you're redeploying code in production, probably 99.99% of the code
00:23:57.679 that you're deploying has not changed. You know, maybe someone made a pull request and they
00:24:04.960 changed the variable foo to be named bar, and so it's like, okay, let's
00:24:10.440 redeploy all this code on lots of servers, and recompile everything on all of those servers,
00:24:17.120 which just seems really inefficient, right? So what if we could save and reuse compilation work? What if we could
00:24:23.200 serialize and persist machine code? If we could do that, then we could maybe hit the ground running and warm up much
00:24:28.720 faster than we could with YJIT, which also means potentially we could
00:24:34.000 spend more time compiling code and enable higher optimization levels, because if you can JIT once and reuse
00:24:40.559 that work many, many times later, then it becomes worthwhile to spend more time
00:24:45.840 jitting in the first place. There are some challenges here: we need to save not just code but also metadata, we
00:24:52.080 need to save the optimization information, and also compiled code can contain pointers to iseqs or Ruby strings,
00:24:59.039 so we need some kind of mini linker to handle these things. So it's not trivial, but it is possible;
00:25:05.919 there are JITs that do this already, and if we could make this work in ZJIT, this could again be kind of a
00:25:13.000 game changer. So what is the current status of ZJIT? Well, we started
00:25:18.960 development just two and a half months ago. We've already implemented our custom SSA IR; we have comparisons and fixnum
00:25:25.360 operations working, control flow working, fast JIT-to-JIT calls, constant and type propagation, dead code elimination.
00:25:32.720 After two and a half months we can run simple microbenchmarks like the ones I've just showed you, and we're faster
00:25:37.919 than the interpreter, and even faster than YJIT on some of those microbenchmarks. But obviously this is still very
00:25:44.080 early innings at this point. We've also opened a proposal to upstream
00:25:50.240 ZJIT, to make development easier. We've discussed this with the Ruby
00:25:55.440 core members, who were very receptive. Matz has expressed that
00:26:00.720 he has complete confidence in the ZJIT team, and so he's in favor of
00:26:06.840 upstreaming; thank you, Matz, for the vote of confidence. Upstreaming should probably happen in
00:26:13.440 the next three to five weeks, would be my expectation. We have a command-line
00:26:18.559 option to enable ZJIT, as with YJIT, so ruby --zjit. If you're interested, you
00:26:24.720 should keep in mind that right now ZJIT is not at all in a usable state for production or any real workloads. I
00:26:31.760 would expect that it's going to become more usable around the second half of this year, because right
00:26:36.799 now we're just in the process of laying down the foundations, but we're making really rapid
00:26:41.880 progress. So, what to expect for Ruby 3.5: the plan is for YJIT to still be
00:26:48.000 available in Ruby 3.5, because we want to guarantee that Ruby 3.5 ships with a working JIT. So we're not
00:26:55.600 going to remove YJIT until we're confident that ZJIT is faster and just as stable,
00:27:00.640 but for now we're putting YJIT into maintenance mode, so that we can dedicate all of our
00:27:08.159 mental energy and all of the resources that we have to building this much better, much more
00:27:14.720 forward-looking JIT. When Ruby 3.5 comes out, you may or may not need to
00:27:22.159 run configure with --enable-zjit to build it; that's going to depend on the maturity level of ZJIT around the time
00:27:28.480 of the Ruby 3.5 release, so we're going to make that call closer to the end of the year. We believe it should be
00:27:35.760 possible to have both YJIT and ZJIT in the same binary; to be determined, but we think it's possible, which means that
00:27:42.080 for Ruby 3.5 you should be able to toggle both, you know, ruby --zjit and ruby --yjit. So basically
00:27:50.240 you shouldn't need to change anything in your production deployment, but you should also be able to try ZJIT, kind of in a beta
00:27:58.399 state, maybe. We're hoping for ZJIT to match YJIT's performance for this release.
00:28:05.120 I think it's very likely; my expectation is that we're going to start to beat YJIT on more and more
00:28:12.159 benchmarks as the year progresses, and hopefully, you know, wipe the
00:28:18.000 floor at the end of the year. But obviously we can't necessarily guarantee the pace
00:28:23.120 at which things are going to go, so beating YJIT on larger, more realistic benchmarks will take a little bit of
00:28:28.399 time. Next steps: fast JIT-to-JIT calls are already working, thanks to
00:28:34.720 Kokubun. After that, we're going to add the ability to side-exit to the interpreter in the next few weeks, which
00:28:40.320 Kokubun explained in his talk. That's going to enable us to more extensively test ZJIT by running the test suites
00:28:47.440 and also to run benchmarks on speed.yjit.org. What I want, for the second half of this year, is to
00:28:54.080 start tracking, you know, ZJIT is outperforming YJIT on this many benchmarks out of this many, you know,
00:29:00.720 kind of gamify the thing a little bit for the team, make it fun. We're going to gradually grow the feature set
00:29:07.200 of what ZJIT supports, which is a lot of work, but, you know, we'll get
00:29:13.279 through it, and we're going to measure and optimize the compilation speed, to make sure that it runs fast, and
00:29:18.640 tune the compiler. So that's it for me, thank you for listening. If you want
00:29:24.799 to learn more, if you want to follow our work, you can subscribe to the Rails at Scale blog or check it
00:29:31.279 periodically. You can also come talk to us after this talk. And I should also note that Shopify's Ruby and Rails
00:29:37.200 infrastructure team is hiring: we're hiring compiler experts for ZJIT, we're hiring
00:29:42.480 C and Rust systems programmers, as well as world-class Ruby and Rails experts. If you're interested in
00:29:49.840 working as part of Ruby and Rails infrastructure, you can use our QR
00:29:56.880 code to apply, if you're interested. And, yeah, that's it for me, thank you.