Summarized using AI

Ruby Internals: A Guide For Rails Developers

Matheus Richard • July 08, 2025 • Philadelphia, PA • Talk

Introduction

This talk, delivered by Matheus Richard at RailsConf 2025, explores the internal workings of the Ruby programming language, specifically aimed at Rails developers. Without requiring C experience, it demystifies how Ruby interprets, parses, and executes code, highlighting how these processes impact Rails applications.

Key Points

  • Origins and Necessity

    • Ruby, created in the mid-1990s, is the foundation of Rails. Understanding Ruby’s internals helps Rails developers write better, more efficient code.
  • How Ruby Reads Code

    • The process mirrors how humans understand text: tokenization (lexing), parsing (structuring relationships), and finally evaluation for meaning. Similar stages exist in Ruby’s code processing pipeline.
  • Building an Interpreter (Step-by-Step)

    • Demonstration of constructing a simplified interpreter with:
    • Tokenization: Breaking input into tokens using regex.
    • Parsing: Constructing an Abstract Syntax Tree (AST) by analyzing tokens.
    • Interpretation: Traversing the AST to produce a result by evaluating nodes recursively.
    • Example walkthroughs illustrate tokenizing and parsing numeric expressions, constructing ASTs, and interpreting them for results like a simple calculator.
  • From Tree-Walking to Virtual Machine

    • Through Ruby 1.8, the interpreter walked the AST directly. That design is easy to build but slow, because tree nodes are scattered across memory and that hurts CPU cache efficiency.
    • Ruby 1.9 introduced compilation to bytecode, interpreted by a stack-based Virtual Machine (VM), improving performance 2-4 times.
    • Compilation flattens trees into sequential instruction arrays, which the VM executes more efficiently using a stack.
  • Real-World Examples and Insights

    • Examples from the Ruby 1.8 source highlight similarities between the simple interpreter model and Ruby’s real rb_eval C function, demonstrating how constructs (like logical operators) handle control flow.
    • Fast-path optimizations: Ruby has specialized bytecode instructions for common operations (e.g., pushing the constants 0 and 1, or adding two numbers) to further accelerate execution.
  • The Importance of Parsers in the Ecosystem

    • Rails, RuboCop, editors, and other tools each use different Ruby parsers, leading to fragmentation and lag in adopting new Ruby features.
    • The Prism project aims to unify Ruby parsing, providing a single parser (now default in Ruby 3.4 and used by multiple implementations and tools) for consistency across the ecosystem.
  • Impact of Ruby’s Evolving Internals

    • Ruby keeps evolving: several JIT compilers have been built, and with YJIT pure Ruby can outperform C extensions for certain workloads (evidenced by Shopify’s benchmarks and by core methods rewritten from C to Ruby that match or beat the older C code).
    • The migration toward more Ruby-written internals makes contribution easier and code more maintainable.

Conclusion

  • Understanding Ruby’s internals—tokenization, parsing, bytecode compilation, and VMs—not only illuminates Rails’ capabilities but helps developers write more efficient, maintainable applications.
  • Ecosystem initiatives like Prism and advances in JITs indicate a future with faster, more unified Ruby tools.
  • Developers are encouraged to explore beyond frameworks and gems, deepening their language knowledge for performance debugging and optimal code choices.

References and Further Learning

  • Cited the book "Crafting Interpreters" as a practical resource.
  • All the demo code is shared via a public GitHub repository.


Date: July 08, 2025
Published: July 23, 2025

Ever wondered how Ruby takes your code and runs it? It’s time to go on a journey into Ruby internals designed especially for Rails developers! We’ll learn about all the parts of an interpreter, and how their inner-workings affect our Rails apps! No experience with C is required!

RailsConf 2025

00:00:17.039 Hello, RailsConf. How y'all doing?
00:00:21.039 Great.
00:00:22.880 So, my name is Matheus. I work for a
00:00:25.279 company called thoughtbot. You might
00:00:26.720 know us from our open source work, our
00:00:29.279 blog. If you use FactoryBot, that's us.
00:00:33.680 We are at a conference about Rails. In
00:00:35.840 fact, the last RailsConf. But way
00:00:38.640 before Rails existed, in the middle of
00:00:41.280 the '90s, a Japanese man had this divine
00:00:44.000 revelation and created this language, and
00:00:46.800 we call it Ruby. Without Ruby, there
00:00:49.280 would be no Rails. So as a way to
00:00:51.920 celebrate Rails' history, we'll dive
00:00:54.399 into Ruby today and understand its
00:00:56.879 internal workings and how they affect our
00:00:59.280 lives.
00:01:00.879 There will be a fair amount of code in
00:01:02.960 these slides. Don't worry about it. I'll
00:01:05.199 share the slides later if you need them.
00:01:07.280 Just think about the big picture.
00:01:10.720 Before we start looking at how Ruby
00:01:13.760 understands our code, let's think
00:01:16.159 about how we process text.
00:01:19.200 Take this sentence as an example. When I
00:01:22.080 see this sentence, my brain kind of
00:01:24.320 divides it into different tokens. And
00:01:28.159 for example, when I see the token like
00:01:29.920 the colon here, I know that what comes
00:01:32.640 after the colon is explaining what's
00:01:35.280 before the colon or when I see the
00:01:37.840 apostrophe s here, I know that the thing
00:01:40.320 after that belongs to the thing before
00:01:42.560 that. So my brain separates a list of
00:01:46.240 tokens and we kind of get the
00:01:49.920 relationships between those tokens as
00:01:51.840 well. And after this process, which is
00:01:54.880 very fast in our brains, we try to
00:01:56.479 understand the meaning of it, if there's
00:01:58.719 any meaning, because not everything
00:02:01.360 that's parsable has a meaning. Take this
00:02:04.240 sentence as an example: I don't know
00:02:06.719 what it means.
00:02:09.039 And with Ruby it's the same thing. Ruby
00:02:11.760 reads our code,
00:02:13.920 it splits it into a list of tokens, then
00:02:16.959 it builds a structure with the
00:02:19.200 relationships between those tokens.
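Both steps can be observed from Ruby itself. The sketch below uses Ripper, a parser interface that ships with CRuby's standard library. It is not the parser the talk discusses later, and it is used here only because it is widely available. The exact shape of the tree it prints is an implementation detail, so only its root is shown.

```ruby
require "ripper"

source = "1 + 2"

# Step 1: split the source into tokens (whitespace counts as a token here).
tokens = Ripper.tokenize(source)
p tokens # => ["1", " ", "+", " ", "2"]

# Step 2: build a tree that records how the tokens relate to each other.
tree = Ripper.sexp(source)
p tree.first # => :program
```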
00:02:22.640 And again, not everything that's
00:02:24.400 parsable makes sense. So for example, in
00:02:27.280 this example, the method
00:02:29.520 definition is correct and the call is
00:02:31.440 correct grammar-wise, but it doesn't have
00:02:34.080 a meaning at runtime.
00:02:36.319 So to understand all of this more
00:02:38.080 deeply, we'll create an interpreter here
00:02:40.160 right now. It's a very simple one, but
00:02:42.640 it does everything that a normal
00:02:44.560 interpreter would do.
00:02:47.040 So I'll start by presenting the language.
00:02:49.200 And the language is very simple for now,
00:02:51.360 but it will get increasingly more
00:02:53.120 complex.
00:02:54.720 At this point, our
00:02:56.480 programs are just numbers. And by
00:02:58.400 number, I mean a character from 0 to 9.
00:03:01.040 That's the only thing that our
00:03:02.959 language understands.
00:03:05.680 So let's start lexing, which is
00:03:08.480 also called tokenizing. Tokenizing is
00:03:12.080 taking the code and outputting a list
00:03:14.159 of tokens. Because the language is so
00:03:17.120 simple, the tokenizer here will be
00:03:19.760 very simple too. All we do is use a
00:03:22.159 regex to get all the words and numbers
00:03:25.599 and special characters like the plus sign
00:03:27.920 and minus sign from the string.
00:03:31.680 And we'll also create this Language
00:03:33.360 module, which is the entry point for our
00:03:35.440 language. So whenever we want to run
00:03:37.599 our language, we'll call Language.call
00:03:39.680 with some code. For now, all we do is
00:03:43.040 tokenize it. So again, the tokenizer gets
00:03:46.000 some code and outputs a list of tokens
00:03:48.879 out.
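A minimal sketch of a regex-based tokenizer like the one described. The slide code is not part of the transcript, so the module names `Tokenizer` and `Language` and the exact regex are assumptions made for illustration.

```ruby
# Hypothetical names: the talk's slide code is not in the transcript.
module Tokenizer
  # A token is a run of word characters (words, numbers)
  # or any single non-space symbol such as + or -.
  TOKEN = /\w+|[^\s\w]/

  def self.call(code)
    code.scan(TOKEN)
  end
end

# Entry point for the toy language. For now it only tokenizes.
module Language
  def self.call(code)
    Tokenizer.call(code)
  end
end

p Language.call("1 + 2 - 3") # => ["1", "+", "2", "-", "3"]
```

Whitespace is dropped here because neither alternative in the regex matches it, so later stages never see it.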
00:03:50.799 Even though we have the tokens, we
00:03:52.879 are not ready to understand their
00:03:55.280 meaning yet. Take this expression
00:03:58.159 as an example. Because of math, we know
00:04:01.360 that we have to do the
00:04:04.400 left subtraction first and then the
00:04:06.159 addition. Because if we did it the
00:04:08.319 reverse way, we would get a different
00:04:10.640 result. So the order of operations
00:04:13.439 matters, and we need some kind of
00:04:15.120 structure that tells us the order of
00:04:17.199 operations. Building it is called parsing.
00:04:21.280 So the parser is similar to our
00:04:24.400 tokenizer, but it receives tokens
00:04:26.240 instead of the raw string and it doesn't
00:04:28.800 work.
00:04:30.400 And again, this is the language that we
00:04:32.080 are working with, and we'll try to
00:04:34.639 make our parser code look exactly like
00:04:37.360 the grammar. So we have the call method,
00:04:40.160 and it has a program method, and the
00:04:42.160 program is a number, and inside this
00:04:44.960 number method we'll parse a number. So the
00:04:48.639 first thing we do is we advance to get a
00:04:50.800 token, and by advance I just mean
00:04:52.639 grabbing the first token from the list.
00:04:55.600 Then we check if it's nil, and if so we'll
00:04:58.160 raise an error, because our language
00:05:00.000 requires a number. Then we check
00:05:04.080 if it doesn't match the regex for a
00:05:06.080 digit; then it's not a digit, so we raise
00:05:08.160 an error again. All we care about is numbers.
00:05:11.600 But if it is indeed a number, then we
00:05:14.320 will return a node, which in this case is
00:05:16.639 just a hash with the type number, and the
00:05:19.280 value is that token converted to an
00:05:21.680 integer. So that's it. In 10 lines of
00:05:24.400 code, we made a significant design
00:05:26.800 decision for our language. All numbers
00:05:28.720 are integers. If we had for example
00:05:31.759 division, it would be division for
00:05:34.080 integer numbers.
00:05:36.720 Okay. So we add our parser now to our
00:05:39.440 language module. And let's make this
00:05:42.160 language a little more complex by adding
00:05:44.479 addition. So the grammar is now like
00:05:47.280 this. A program is a term and a term is
00:05:50.160 either a number like we had before or a
00:05:53.280 number a plus sign and another number or
00:05:56.160 maybe a number a plus sign another
00:05:57.840 number a plus sign another number and
00:05:59.440 you know where this is going. So because
00:06:01.440 we can have arbitrarily long
00:06:04.160 additions, we can write the grammar
00:06:05.840 like this: a term is a number,
00:06:08.800 optionally followed by any number of plus signs and
00:06:12.160 other numbers.
00:06:14.479 So we'll reflect the grammar in the
00:06:16.560 code. So now program calls term and
00:06:20.160 inside term we try to parse a number
00:06:22.400 like we did before with the number
00:06:24.319 method. Then we check: does the next
00:06:27.520 token match
00:06:29.919 a plus sign?
00:06:32.080 If it doesn't match it's just a number
00:06:33.600 we return it. But if it matches a plus
00:06:35.680 sign then we'll try to parse an addition
00:06:37.600 here. So we advance again and that will
00:06:40.880 return the operator which is the plus
00:06:42.720 sign in this case. And then we try to
00:06:45.120 parse a second number.
00:06:47.600 And now instead of returning a single
00:06:49.759 number node, we return a node of the
00:06:51.840 type binary. The binary node has the
00:06:54.479 operator, just the plus sign, with
00:06:57.039 the first number on the left side and,
00:06:59.039 on the right side, the second
00:07:00.800 number.
00:07:02.880 We could make this language more complex
00:07:04.880 by adding subtraction as well. And to
00:07:07.919 support it, it's pretty simple. Now we
00:07:10.000 just look for a plus sign or a
00:07:12.800 minus sign.
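The grammar so far (a program is a term; a term is a number followed by any number of plus or minus signs and further numbers) can be sketched as a small recursive-descent parser. The slide code is not in the transcript, so the class name `Parser`, the `ParseError` class, and the hash keys are assumptions. The sketch takes an already tokenized array and uses a multi-digit regex for convenience.

```ruby
# Hypothetical reconstruction: names and node layout are assumptions.
class Parser
  class ParseError < StandardError; end

  def self.call(tokens)
    new(tokens).program
  end

  def initialize(tokens)
    @tokens = tokens.dup
  end

  # program -> term
  def program
    term
  end

  # term -> number (("+" | "-") number)*
  def term
    left = number
    while ["+", "-"].include?(@tokens.first)
      operator = advance
      right = number
      # Each new operator wraps everything parsed so far as its left side.
      left = { type: :binary, operator: operator, left: left, right: right }
    end
    left
  end

  # number -> one or more digits
  def number
    token = advance
    raise ParseError, "expected a number, got end of input" if token.nil?
    raise ParseError, "expected a number, got #{token.inspect}" unless token.match?(/\A\d+\z/)

    { type: :number, value: token.to_i }
  end

  private

  # Take the next token off the front of the list.
  def advance
    @tokens.shift
  end
end

ast = Parser.call(["1", "+", "2", "-", "3"])
puts ast[:operator]        # prints -
puts ast[:left][:operator] # prints +
```

Because the loop keeps wrapping the tree built so far as the new left side, `1 + 2 - 3` groups as `(1 + 2) - 3`, which is the order of operations the talk asks for.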
00:07:14.880 So to understand this more deeply, let's
00:07:17.039 walk step by step through how we tokenize and
00:07:20.160 parse this expression. So
00:07:23.680 everything starts in the Language
00:07:25.280 module: we receive the code and we call
00:07:27.599 tokenize with that string, and it will
00:07:30.800 return the tokens for us,
00:07:33.440 and then we parse that. So inside parser
00:07:36.880 we initialize the tokens with a list of
00:07:39.520 tokens that we received. We call program
00:07:42.560 then term and inside term we call
00:07:45.120 number. And now inside number we advance
00:07:48.080 to get a token. So we got a token.
00:07:51.520 It's not nil. It matches the
00:07:54.639 regex for a digit. So we'll now return
00:07:57.199 a node of type number with that value
00:08:00.240 converted to an integer. So there you
00:08:02.319 go. We get the first expression. Now we
00:08:05.199 check does the next token match a plus
00:08:07.360 sign or a minus sign. Yes. So we enter
00:08:10.319 the while loop. We advance. We get the
00:08:12.720 operator. In this case the operator is plus.
00:08:15.440 And then we try to parse a second
00:08:17.039 number.
00:08:18.720 And we run the code again. But now
00:08:20.879 we receive the number two.
00:08:24.160 And we'll now return the binary node.
00:08:28.240 Just replace the values here. The
00:08:30.800 operator is plus, the first expression is
00:08:33.440 the number one, and the second expression
00:08:35.839 is the number two, and that's it. We
00:08:39.440 just built that tree, that structure that
00:08:42.479 I mentioned that has the order of
00:08:44.000 operations.
00:08:45.680 We call this an abstract syntax tree,
00:08:48.640 also called AST for short. But how do we
00:08:52.080 execute it? How do we run this language?
00:08:55.519 Well, we add another step to our
00:08:57.839 language. We tokenize the code. Then we
00:08:59.839 parse it. Now we're going to interpret
00:09:01.839 it. And to interpret the AST, we receive
00:09:06.160 the node and we check its type. If
00:09:09.360 it's a number, then we just return its
00:09:11.920 value. But if it's a binary node, then
00:09:15.360 what we do is we interpret the left side
00:09:18.000 recursively. So that gets us the left
00:09:21.040 side, and then we interpret the right
00:09:23.120 expression.
00:09:24.640 And lastly, we'll do the send call using
00:09:27.279 the left side and passing as arguments
00:09:29.519 the operator and the right side. And
00:09:32.160 that's it. We did it. We just built a
00:09:34.880 very simple interpreter. But again,
00:09:37.360 let's walk step by step through how this
00:09:39.920 is interpreted. So using that binary
00:09:43.120 node that we received from parsing,
00:09:45.360 let's walk step by step. Here we check
00:09:47.680 the node type. Is it a number? No,
00:09:50.560 it's a binary expression. So we go to
00:09:52.800 the binary expression branch and then
00:09:55.200 now we try to interpret the left side.
00:09:57.519 So this will call the function
00:09:59.120 recursively. So we are now interpreting
00:10:02.000 the left side. We check its type. It's a
00:10:05.519 number now. So we'll return its value in
00:10:07.920 this case one. So on the left side we
00:10:10.160 have one. Now we interpret the right
00:10:12.640 side and it's the same thing. And we'll
00:10:14.720 get two back. And now we do that send
00:10:18.079 expression. So this is basically one
00:10:21.600 send plus with two. Because addition is a
00:10:25.519 method in Ruby, we can use send to call
00:10:27.760 it, and if you do that, your language will
00:10:31.040 return three.
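The tree-walking interpreter just described can be sketched as follows. The module name `Interpreter` and the hash-based node layout are assumptions that match the description rather than the actual slide code.

```ruby
# Hypothetical reconstruction of the tree-walking interpreter.
module Interpreter
  def self.call(node)
    case node[:type]
    when :number
      node[:value]
    when :binary
      left = call(node[:left])   # interpret the left side recursively
      right = call(node[:right]) # then the right side
      # Operators are ordinary methods in Ruby, so 1.send("+", 2) is 1 + 2.
      left.send(node[:operator], right)
    else
      raise ArgumentError, "unknown node type: #{node[:type].inspect}"
    end
  end
end

ast = {
  type: :binary,
  operator: "+",
  left: { type: :number, value: 1 },
  right: { type: :number, value: 2 }
}

puts Interpreter.call(ast) # prints 3
```

In code that handles untrusted input, `public_send` would be the safer choice, since `send` can also reach private methods. `send` is kept here because it is what the talk uses.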
00:10:33.760 And you might be thinking at this point,
00:10:35.839 is that really it? It seems too simple to be
00:10:39.120 true. But that's actually how Ruby worked
00:10:42.079 until version 1.8. So for over 10
00:10:44.800 years, Ruby worked exactly like
00:10:47.040 that. And to show you, let's see how Ruby
00:10:49.920 used to interpret the number 42,
00:10:53.200 only that. There will be some code now, and
00:10:56.800 it's C. So if you haven't seen C
00:10:59.839 code before, it's fine.
00:11:02.959 So Ruby had this rb_eval function. It
00:11:07.279 received a node,
00:11:09.600 and the first thing it does is define this
00:11:11.920 label. Labels in C are like checkpoints
00:11:14.880 in the code. We'll see how this works
00:11:17.519 later.
00:11:19.360 The first thing we do is check the
00:11:21.920 node type here. So there's this big
00:11:24.399 switch statement with several cases, and
00:11:27.120 for the number 42 we fall into the NODE_LIT
00:11:29.680 case.
00:11:31.920 NODE_LIT here is similar to our
00:11:34.399 number type. So we get the result
00:11:38.000 from the nd_lit attribute
00:11:41.360 and we break, basically returning from
00:11:44.000 the switch case, and that's it. It's
00:11:47.279 not too different from what we did
00:11:48.959 before with the case statement and the
00:11:51.360 number node and the value. So let's try
00:11:54.640 something a little more complex. Let's
00:11:56.480 try this "and" expression. So again we are
00:12:00.079 inside the rb_eval function. We switch
00:12:03.680 on the node type. There's a bunch of
00:12:05.519 cases, but eventually we hit NODE_AND,
00:12:09.120 and the first thing after entering the
00:12:12.000 branch is that we eval the nd_1st attribute.
00:12:14.959 That's similar to our left attribute on
00:12:17.680 a hash. So this calls the function
00:12:19.760 recursively and interprets the left side.
00:12:23.519 Then Ruby checks if the left side is
00:12:27.440 either nil or false; if so, break, stop
00:12:30.880 executing. But if it's truthy, then grab
00:12:34.160 the right side, nd_2nd here, and
00:12:37.600 goto again. Remember that label that I
00:12:39.920 talked about? In C, when
00:12:43.040 you use goto with a label, you basically make the
00:12:45.600 code start executing where you pointed
00:12:48.480 at. So the code will go back to the
00:12:50.240 again label and start switching on the
00:12:53.440 node type again, but now we are
00:12:54.959 interpreting the right side. So this is
00:12:58.079 why in a language like Ruby even though
00:13:00.079 you have something on the right side
00:13:01.760 that would be an error like summing a
00:13:04.160 number and a string because the left
00:13:07.120 side is falsy we don't even execute
00:13:09.680 that. So there's no errors happening
00:13:11.920 here. And now you know how this is
00:13:13.440 implemented.
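The short-circuit behaviour is easy to check from Ruby. In this small sketch the right-hand side would raise a TypeError if it ever ran, and the variable names are only illustrative.

```ruby
# The left side is falsy, so the right side is never evaluated
# and the TypeError from 1 + "a" never happens.
skipped = false && (1 + "a")
p skipped # => false

# With a truthy left side, the right side does run and raises.
raised = false
begin
  value = true && (1 + "a")
rescue TypeError
  raised = true
end
p raised # => true
```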
00:13:15.120 What's good about this interpreter is that
00:13:17.680 it's very easy to build one. That's why
00:13:20.720 Matz chose this architecture.
00:13:23.760 But what's bad about it is that it's
00:13:26.079 super slow, and it's slow because of how
00:13:29.519 we represent this data. If you've been
00:13:31.839 to the caching talk before this one,
00:13:35.279 you probably know why. CPUs like
00:13:38.480 sequential data, and when we store our
00:13:41.680 data in that hash, that tree, our data is
00:13:44.800 all spread out in memory, and that really
00:13:46.800 hurts performance. To give you an
00:13:49.279 example, accessing the L1 cache in your
00:13:52.160 CPU should take about one nanosecond.
00:13:55.680 For the L2 cache it's about four nanoseconds,
00:14:00.160 and to access the RAM, which is where our
00:14:02.320 data lives, it's over 100 nanoseconds. So it's
00:14:06.399 pretty slow. So that's why in Ruby 1.9,
00:14:10.079 after 14 years of this first
00:14:12.320 architecture, Ruby changed, and it
00:14:15.360 changed its interpreter to a compiler.
00:14:17.839 Now the compiler receives the AST and
00:14:20.639 compiles it into bytecode, which is a
00:14:23.199 list of instructions, and those
00:14:25.199 instructions are run by a VM. This
00:14:28.480 made Ruby two to four times faster on
00:14:31.279 average.
00:14:33.120 So instead of a tree like this, we need
00:14:36.720 a data structure that is more sequential,
00:14:39.199 where the data is packed sequentially,
00:14:41.519 and that is a lot easier for CPUs
00:14:44.240 to run.
00:14:46.000 So we need to flatten that tree and to
00:14:48.480 do that it's fairly simple. We'll walk
00:14:51.600 this tree starting at the root. We
00:14:54.000 first evaluate the left branch and we'll
00:14:56.399 generate an instruction for that. Then
00:14:59.199 we'll go to the right branch and
00:15:01.440 generate another instruction for that.
00:15:04.000 And lastly, for the root node we generate
00:15:06.240 an instruction, and that's how we'll go
00:15:08.480 from a tree to an array. So instead of
00:15:12.480 interpreting the code right away, now
00:15:14.399 we'll compile it, and to compile we'll
00:15:16.800 create a compiler. Don't worry.
00:15:20.160 I know compilers are scary, but this one is
00:15:22.560 very simple. We'll receive the AST from
00:15:24.959 the parsing phase and we'll return
00:15:27.680 instructions, an array of instructions.
00:15:30.320 And to create those instructions, again
00:15:32.079 we check the AST node type, and for a
00:15:35.760 number, for example, we generate a
00:15:38.639 putobject instruction with the value of
00:15:40.959 that number node. But if it's a binary
00:15:43.839 node, you might have guessed it: we'll call this
00:15:46.399 function recursively on the left side,
00:15:48.560 generating instructions for the left side.
00:15:51.199 Then we generate instructions for the
00:15:53.040 right side. And lastly, we'll generate
00:15:55.600 this send instruction with the operator,
00:15:58.880 and that will give us an array like
00:16:01.120 this.
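A sketch of that compiler, flattening the tree into an array of instructions. The module name `Compiler` and the `[name, argument]` instruction layout are assumptions. The instruction names `putobject` and `send` follow the talk.

```ruby
# Hypothetical reconstruction: each instruction is a [name, argument] pair.
module Compiler
  def self.call(node)
    case node[:type]
    when :number
      [[:putobject, node[:value]]]
    when :binary
      # Left operand first, then right operand, then the operation itself.
      call(node[:left]) + call(node[:right]) + [[:send, node[:operator]]]
    else
      raise ArgumentError, "unknown node type: #{node[:type].inspect}"
    end
  end
end

ast = {
  type: :binary,
  operator: "+",
  left: { type: :number, value: 1 },
  right: { type: :number, value: 2 }
}

p Compiler.call(ast) # => [[:putobject, 1], [:putobject, 2], [:send, "+"]]
```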
00:16:02.800 But how do we run this now? Like I said,
00:16:05.759 we will use a virtual machine, and like
00:16:08.160 Ruby, we'll use a stack-based virtual
00:16:11.120 machine. So after compiling the code,
00:16:14.399 we'll run it, and we'll call this the VM.
00:16:18.560 So the VM receives the instructions, and
00:16:21.279 like I said, it's a stack-based VM. So we
00:16:23.920 initialize a stack. Ruby doesn't have a
00:16:26.079 stack data structure, but we can just use
00:16:28.480 an array. We do some work with
00:16:31.759 the instructions, and the last step would
00:16:34.240 be popping the last value from the stack and
00:16:36.959 returning that. Let's check the middle
00:16:39.519 bit. So for each instruction, we check
00:16:43.759 it with a case expression, and if it's a
00:16:46.639 putobject instruction, we push that
00:16:49.199 value onto the stack. But if it's a send
00:16:52.800 instruction, we'll do something
00:16:54.160 different. We pop a value from the stack;
00:16:56.880 that will be the right side. Then
00:16:59.600 we pop a second value from the stack.
00:17:01.519 That's the left side. Then we do that
00:17:04.000 send operation that we had before again
00:17:06.319 but now we push that value back onto the
00:17:09.360 stack.
00:17:11.280 So once more let's walk step by step
00:17:13.760 with a real example. These are the
00:17:16.079 instructions for 1 + 2. So we check
00:17:18.880 the instruction type. The first
00:17:20.720 instruction is putobject one, and
00:17:24.640 then we push its value onto the stack.
00:17:26.799 So the stack now contains the number
00:17:28.720 one. The second instruction is putobject
00:17:31.280 with two. Again we check its type, it's
00:17:33.919 putobject, we grab its value, and we push
00:17:36.720 the value onto the stack. So now the
00:17:38.559 stack contains one and two. The third and
00:17:41.679 last instruction is the send
00:17:43.440 instruction. So we check, and its branch
00:17:47.039 is the send branch. We get the operator.
00:17:50.320 We pop a value from the stack. So we get
00:17:52.559 the number two in the variable right.
00:17:54.960 We pop a second value from the stack.
00:17:56.880 That's the left side.
00:17:59.120 And we calculate the result using the
00:18:02.000 send method, and we get three back, and we
00:18:06.559 push three back onto the stack. So this
00:18:09.520 stack now contains the number three and
00:18:12.720 like I said before we pop the last value
00:18:15.039 from the stack as the last step and if
00:18:17.760 you do that it will pop three and our
00:18:20.080 language still returns three.
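The stack-based VM from this walkthrough can be sketched in a few lines. The module name `VM` and the `[name, argument]` instruction pairs are assumptions that mirror the description, not the slide code itself.

```ruby
# Hypothetical reconstruction of the toy stack-based VM.
module VM
  def self.call(instructions)
    stack = [] # a plain Array works as a stack via push/pop

    instructions.each do |name, argument|
      case name
      when :putobject
        stack.push(argument)
      when :send
        # Operands come off in reverse order: right first, then left.
        right = stack.pop
        left = stack.pop
        stack.push(left.send(argument, right))
      else
        raise ArgumentError, "unknown instruction: #{name.inspect}"
      end
    end

    # Whatever is left on top of the stack is the program's result.
    stack.pop
  end
end

puts VM.call([[:putobject, 1], [:putobject, 2], [:send, "+"]]) # prints 3
```

Popping the right operand before the left matters for non-commutative operators such as subtraction.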
00:18:24.000 What's cool about Ruby is that you can
00:18:26.080 see these instructions for yourself. If
00:18:28.880 you call ruby with some code and the
00:18:31.520 --dump=insns option, you will get
00:18:34.559 this. If I clean it up a little bit and
00:18:37.360 zoom in, you get this. These are the
00:18:39.360 instructions for 10.upto(20).
00:18:44.000 The first instruction is putobject with 10.
00:18:46.640 Then putobject with 20. Then we have
00:18:49.360 this opt_send_without_block with the
00:18:52.720 upto argument. And lastly, leave, which
00:18:55.919 basically returns.
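The same listing can be produced from inside a Ruby program with `RubyVM::InstructionSequence`. This API is specific to CRuby, and the exact text it prints varies between Ruby versions, so no output is reproduced here.

```ruby
# CRuby only: RubyVM is not available on JRuby or TruffleRuby.
iseq = RubyVM::InstructionSequence.compile("10.upto(20)")

# disasm returns a human-readable listing of the bytecode.
listing = iseq.disasm
puts listing
```

On the command line, `ruby --dump=insns -e '10.upto(20)'` prints a similar listing.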
00:18:58.720 So in a nutshell, that's how Ruby has worked
00:19:00.960 since 1.9,
00:19:03.360 until we introduced JIT compilers. And at
00:19:07.280 this point, there are a lot of them.
00:19:10.240 There have been four JIT compilers at this point
00:19:12.559 in Ruby's history.
00:19:15.280 So it's similar to what we had
00:19:18.000 before. Instead of just having a
00:19:20.559 compiler that compiles the AST into
00:19:22.880 bytecode, now we have a second compiler that
00:19:25.520 compiles the bytecode into machine
00:19:27.919 code. So in addition to the compiler that
00:19:30.720 we created, we also have a compiler for
00:19:33.679 assembly, which is kind of like this.
00:19:37.679 And I know what you might be thinking at
00:19:39.600 this point. Yeah, this is cool and
00:19:41.440 all, but I'm a Rails dev. I'm not
00:19:44.240 writing a compiler. Why should I care
00:19:46.960 about parsers?
00:19:49.280 Well, one, because it's fun. But parsers
00:19:52.240 aren't just for compilers. If you use
00:19:54.480 Rails, and I assume you do, or IRB or
00:19:58.000 RuboCop, Standard, if you use the VS Code
00:20:01.200 extension for Ruby or really any of
00:20:03.360 these gems, you are using a parser. In
00:20:06.799 fact, you are using several different
00:20:08.559 parsers. And that difference affects us
00:20:12.000 in our daily work. You know when Ruby
00:20:14.640 adds a new syntax and then RuboCop
00:20:17.039 doesn't understand it right away, or your
00:20:20.320 editor thinks it's a grammar error? It's
00:20:23.280 because of this. Each tool has a
00:20:25.440 different parser. So when Ruby adds new
00:20:27.760 features, everyone has to catch up with
00:20:29.440 the new thing. This is also true if you
00:20:32.080 are using a different Ruby
00:20:33.280 implementation like TruffleRuby or
00:20:35.120 JRuby. Everyone has to catch up to
00:20:37.600 CRuby. That's why they created this new
00:20:40.799 parser. You might have heard about it:
00:20:42.799 Prism. The idea of Prism was to be a
00:20:46.320 single parser that could handle it all.
00:20:50.000 It's a parser that is used in the Ruby LSP
00:20:53.200 extension,
00:20:54.799 and since last year, actually, it's the
00:20:57.280 default parser for CRuby. So if you are
00:20:59.440 using Ruby 3.4, you are using Prism. So
00:21:03.120 now we have this one parser used in
00:21:05.840 CRuby, but at this point it's already
00:21:07.360 used in JRuby, TruffleRuby, Natalie, Opal,
00:21:11.360 and several gems, because you can use it from C and you
00:21:14.240 can use it from Ruby. It's very powerful.
00:21:16.720 So maybe one day, instead of this
00:21:18.320 fractured ecosystem, we'll just have one
00:21:21.440 single parser, and whenever Ruby adds a
00:21:24.080 new feature, everyone can benefit from
00:21:26.000 that right away.
00:21:28.640 Okay, but why should I care about a VM
00:21:31.039 then? Well, take this example. These are
00:21:34.559 the instructions for the expression
00:21:36.880 2.send(:+, 3), like 2 + 3,
00:21:40.799 and these are the instructions for 0 +
00:21:43.360 1. Can we spot the difference?
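The two listings can be reproduced on CRuby with `RubyVM::InstructionSequence`. The sketch assumes the first expression on the slide is the explicit `send` form, as the wording suggests. Exact listings differ between Ruby versions, so only the presence of the relevant instruction names is checked.

```ruby
# CRuby only. The variable names are illustrative.
generic = RubyVM::InstructionSequence.compile("2.send(:+, 3)").disasm
fast = RubyVM::InstructionSequence.compile("0 + 1").disasm

puts generic
puts fast

# The explicit send goes through a generic method-call instruction,
# while the literal addition uses the specialized opt_plus instruction.
puts generic.include?("opt_plus") # prints false
puts fast.include?("opt_plus")    # prints true
```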
00:21:50.880 The first
00:21:53.200 expression has more
00:21:55.120 instructions, and the instructions all
00:21:57.600 have arguments, while in the second
00:22:00.080 expression, for example, we don't use
00:22:01.919 putobject for zero and one. You have
00:22:04.080 this putobject_INT2FIX_0_,
00:22:07.919 INT2FIX_1_. These are instructions
00:22:10.960 specifically designed to put one and
00:22:13.679 zero on the stack. Instead of using send,
00:22:16.799 Ruby uses the opt_plus instruction. And
00:22:19.840 Ruby does that because we
00:22:21.919 deal with zeros and ones, and summing
00:22:24.159 numbers, and those are very common operations,
00:22:26.080 so it has optimized instructions just
00:22:28.880 for that. It's called a fast-path
00:22:32.080 optimization, and we'll do the same thing
00:22:34.559 to our VM to understand how it works. So
00:22:37.919 in our compiler, specifically in the
00:22:40.080 binary branch, we used to have this.
00:22:43.120 Now let's say we want to make addition
00:22:45.039 faster for some reason. So we check if
00:22:48.400 the operation is an addition and we have
00:22:50.640 a number on the left side and a number
00:22:52.559 on the right side. Then instead of doing
00:22:55.919 what we did before, we do this: we add the
00:22:58.559 putobject instruction, but we'll sum the
00:23:01.280 values right away in the compiler and
00:23:03.760 put that on the stack. So in a way we
00:23:07.200 are interpreting inside the compiler
00:23:08.960 now. But if it's not an addition of
00:23:12.159 numbers, we'll do what we did before. And
00:23:15.200 if you benchmark this, you'll see that
00:23:17.440 now addition is 20% faster than
00:23:21.200 subtraction. What I'm trying to say
00:23:23.840 here is that how you write code matters.
00:23:27.760 There's a difference between using
00:23:29.840 method_missing and define_method. There's a
00:23:32.880 difference between while true and using
00:23:34.960 loop. And because of object shapes,
00:23:38.559 there's a penalty if you use too much
00:23:40.960 memoization in your code.
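The addition fast path added to the toy compiler a moment ago can be sketched as below. The module name `FoldingCompiler` and the instruction layout are assumptions. Note that this sketch sums the operands at compile time, which is a different trick from the specialized instructions CRuby uses.

```ruby
# Hypothetical reconstruction: the toy compiler with an addition fast path.
module FoldingCompiler
  def self.call(node)
    case node[:type]
    when :number
      [[:putobject, node[:value]]]
    when :binary
      left = node[:left]
      right = node[:right]

      if node[:operator] == "+" && left[:type] == :number && right[:type] == :number
        # Fast path: both operands are known, so add them while compiling
        # and emit a single instruction instead of three.
        [[:putobject, left[:value] + right[:value]]]
      else
        # General path: same as before.
        call(left) + call(right) + [[:send, node[:operator]]]
      end
    else
      raise ArgumentError, "unknown node type: #{node[:type].inspect}"
    end
  end
end

one = { type: :number, value: 1 }
two = { type: :number, value: 2 }

p FoldingCompiler.call({ type: :binary, operator: "+", left: one, right: two })
# => [[:putobject, 3]]
p FoldingCompiler.call({ type: :binary, operator: "-", left: one, right: two })
# => [[:putobject, 1], [:putobject, 2], [:send, "-"]]
```

The addition now costs one instruction at run time while the subtraction still costs three, which is where a measured speed difference would come from.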
00:23:43.679 I'm not saying that
00:23:45.919 you shouldn't use those features.
00:23:48.559 You can write whatever you want, but
00:23:51.280 know the trade-offs that you are
00:23:53.039 choosing. Understanding the Ruby VM
00:23:55.520 helps you make those decisions.
00:23:59.039 I guess no one would ask who cares
00:24:01.360 about JIT, just because of YJIT. YJIT
00:24:03.919 makes your Rails code 10 to 30% faster
00:24:06.720 and you don't have to do anything. So
00:24:08.880 yeah, it's very welcome. It's even
00:24:11.279 enabled in Rails by default at this
00:24:13.440 point. But what I like about YJIT is
00:24:16.559 that its impact is much more than just
00:24:18.799 performance.
00:24:20.559 So Shopify did this benchmark where they
00:24:23.120 benchmark parsing GraphQL queries in
00:24:25.760 Ruby. So they compare pure Ruby with C
00:24:28.880 extension with pure Ruby with YJIT, all
00:24:31.440 the variations that you could have. And
00:24:33.840 what they found out was that writing
00:24:36.720 pure Ruby with YJ was faster than a C
00:24:40.240 extension with YJI.
00:24:42.480 So in a way Ruby is faster than C and we
00:24:46.880 started seeing PRs like this where they
00:24:48.880 rewrote parts of Ruby from C to Ruby.
00:24:53.200 So we used to have this method for
00:24:56.320 Integer#times and it got replaced with this
00:24:59.919 which is not only as fast and much
00:25:02.640 smaller but a lot closer to the kind of
00:25:05.200 work that we do on a day-to-day basis
00:25:08.799 or this other example which is more
00:25:10.480 recent. They rewrote Pathname mostly in
00:25:13.520 Ruby and not only was it twice as fast
00:25:16.960 but it was a third of the code.
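The slide code is not captured in this transcript, so the following is only a rough idea of what a pure-Ruby Integer#times can look like. The helper times_in_ruby is hypothetical, is not the code from the Ruby pull request, and covers only the block form.

```ruby
# Hypothetical helper, not the code from the Ruby pull request.
# It only covers the block form; the real Integer#times also returns an
# Enumerator when no block is given.
def times_in_ruby(count)
  i = 0
  while i < count
    yield i
    i += 1
  end
  count # Integer#times returns the receiver
end

seen = []
result = times_in_ruby(3) { |i| seen << i }
```

A plain while loop plus yield is the kind of code a JIT can optimize together with the caller's block, which is part of why such rewrites can keep up with the C version.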
00:25:20.240 So I believe that the future is more
00:25:22.080 Ruby because Ruby is easier. It's easier
00:25:25.679 to read. It's easier to understand. It's
00:25:28.400 easier to write. And I believe that it
00:25:30.799 will be easier to contribute to because
00:25:33.840 it will be written in Ruby, a language
00:25:35.760 that we use every day. So
00:25:39.600 yeah, Ruby is easy, but at the same
00:25:41.760 time, I hope you noticed that it's
00:25:44.000 complex too. It has a parser, it has a
00:25:46.320 compiler, several compilers at this
00:25:48.240 point. So to put it in Matz's words,
00:25:52.159 the Ruby creator, Ruby is simple in
00:25:54.720 appearance but very complex on the inside,
00:25:57.520 just like the human body.
00:26:00.320 So to give you an example of this
00:26:02.480 complexity, the Prism compiler has 8,000
00:26:06.159 lines of C code. The Prism parser
00:26:10.799 has 16,000 lines of C. YJIT itself has
00:26:14.880 25,000 lines. And this was the last
00:26:17.200 time I checked, months ago. And if you
00:26:19.919 count all files inside the Ruby
00:26:21.679 repository, you have over 1.5 million
00:26:24.320 lines of code. That's the work of all
00:26:27.279 these individuals, almost 400 people.
00:26:30.240 And I just wanted to take a moment here
00:26:32.320 for all of us to thank the contributors
00:26:34.480 of Ruby. So give it up for the
00:26:36.320 contributors.
00:26:44.400 Yeah, they work really hard to make our
00:26:46.400 lives easier. So to answer again this
00:26:49.679 question, why should you care about all
00:26:51.679 of this? Well, the most important thing
00:26:54.720 is as you go through your path in
00:26:58.240 development, you eventually have to go
00:27:01.279 beyond what you do now. You have to go
00:27:04.080 into these deep technical topics.
00:27:06.159 Sometimes you have to read the gem
00:27:07.760 source code because there is no
00:27:09.760 documentation. You have to
00:27:12.000 go beyond tutorials and blog posts and
00:27:14.080 even official docs. You write your first
00:27:16.480 gem and maybe one day you have to debug
00:27:19.200 a performance issue and you have to dump
00:27:21.440 the C instructions. So I hope this talk
00:27:25.360 is the first step in that journey for
00:27:27.679 you. I hope it helps in unveiling the
00:27:30.640 magic behind Ruby. And if you like this
00:27:33.520 topic, you can take this further.
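One small way to start is to dump the VM instructions for a snippet, as mentioned above. This sketch assumes CRuby, where RubyVM::InstructionSequence is available; the variable names are illustrative and the exact listing differs between Ruby versions.

```ruby
# RubyVM::InstructionSequence is specific to CRuby (MRI); the exact
# instructions printed vary between Ruby versions.
source = "1 + 2"
iseq = RubyVM::InstructionSequence.compile(source)
listing = iseq.disasm # human-readable bytecode listing as a String
puts listing
```

Comparing the listings for two ways of writing the same logic is a quick way to see what the VM is actually asked to do.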
00:27:35.679 There's this wonderful book called
00:27:37.679 Crafting Interpreters. You will create
00:27:40.159 two interpreters for the same language.
00:27:42.559 The first one is, just like we did
00:27:44.880 initially, a tree walker. And then
00:27:47.039 the second one is a VM. You create
00:27:49.360 everything from scratch. No libraries,
00:27:51.520 no nothing. So it's very fun. And if you
00:27:54.799 want to play with the code that I showed
00:27:56.559 you, it's all here in this repository.
00:27:58.960 It's a GitHub repo at tbot.io/math-interpreter
00:28:03.760 and you can add something to
00:28:06.080 your language. Do whatever you want with
00:28:07.840 it. It's yours now.
00:28:10.320 Yeah. And I hope this has helped you to
00:28:13.279 see how Rails benefits from Ruby, how
00:28:15.840 Ruby made Rails possible. And if it
00:28:19.120 wasn't for Ruby, we wouldn't be here
00:28:20.799 today. And that's what I had for today.
00:28:23.279 Thank you everyone.
00:28:30.159 Do we have the time for questions?
00:28:34.559 One question. They asked why did I
00:28:37.919 want to learn about this?
00:28:40.640 Well, I started messing with this during
00:28:43.360 the pandemic. So, I had some free time
00:28:46.320 and I found that book, Crafting
00:28:49.039 Interpreters. You can read the whole
00:28:50.640 book online for free. And once I
00:28:54.480 started doing it, I just couldn't
00:28:56.080 stop. It was fun to build my own
00:28:59.679 language and to add whatever I
00:29:01.840 wanted. So that was
00:29:04.399 it for me. That's it. Thank you
00:29:06.880 everyone.