Ruby Internals: A Guide For Rails Developers

Matheus Richard

#ruby-internals

#abstract-syntax-tree-ast

#virtual-machine

#just-in-time-jit

Matheus Richard • July 08, 2025 • Philadelphia, PA • Talk

Introduction

This talk, delivered by Matheus Richard at RailsConf 2025, explores the internal workings of the Ruby programming language, specifically aimed at Rails developers. Without requiring C experience, it demystifies how Ruby interprets, parses, and executes code, highlighting how these processes impact Rails applications.

Conclusion

Understanding Ruby’s internals—tokenization, parsing, bytecode compilation, and VMs—not only illuminates Rails’ capabilities but helps developers write more efficient, maintainable applications.
Ecosystem initiatives like Prism and advances in JITs indicate a future with faster, more unified Ruby tools.
Developers are encouraged to explore beyond frameworks and gems, deepening their language knowledge for performance debugging and optimal code choices.

References and Further Learning

The talk cites the book "Crafting Interpreters" as a practical resource.
All the demo code is shared in a public GitHub repository.

Date: July 08, 2025
Published: July 23, 2025

Ever wondered how Ruby takes your code and runs it? It’s time to go on a journey into Ruby internals designed especially for Rails developers! We’ll learn about all the parts of an interpreter, and how their inner workings affect our Rails apps! No experience with C is required!

RailsConf 2025

00:00:17.039 Hello, RailsConf. How y'all doing?

00:00:21.039 Great.

00:00:22.880 So, my name is Matheus. Uh, I work for a

00:00:25.279 company called thoughtbot. You might

00:00:26.720 know us from our open source work, our

00:00:29.279 blog. If you use FactoryBot, that's us.

00:00:33.680 We are on a conference about Rails. In

00:00:35.840 fact, the last RailsConf. But way

00:00:38.640 before Rails existed in the middle of

00:00:41.280 the '90s, a Japanese man had this divine

00:00:44.000 revelation and created this language and

00:00:46.800 we call it Ruby. Without Ruby, there

00:00:49.280 would be no Rails. So as a way to

00:00:51.920 celebrate Rails' history, we'll dive

00:00:54.399 into Ruby today and understand its

00:00:56.879 internal workings and how it affects our

00:00:59.280 lives.

00:01:00.879 There will be a fair amount of code in

00:01:02.960 these slides. Don't worry about it. I'll

00:01:05.199 share the slides later if you need to.

00:01:07.280 Just think about the big picture.

00:01:10.720 Before we start looking at how Ruby

00:01:13.760 understands our code, let's think

00:01:16.159 about how we process text.

00:01:19.200 Take this sentence as an example. When I

00:01:22.080 see this sentence, my brain kind of

00:01:24.320 divides it into different tokens. And

00:01:28.159 for example, when I see the token like

00:01:29.920 the colon here, I know that what comes

00:01:32.640 after the colon is explaining what's

00:01:35.280 before the colon or when I see the

00:01:37.840 apostrophe s here, I know that the thing

00:01:40.320 after that belongs to the thing before

00:01:42.560 that. So my brain separates a list of

00:01:46.240 tokens and we kind of get the

00:01:49.920 relationships between those tokens as

00:01:51.840 well. And after this process which is

00:01:54.880 very fast in our brains, we try to

00:01:56.479 understand the meaning of it if there's

00:01:58.719 any meaning because not everything

00:02:01.360 that's parsable has a meaning. So this

00:02:04.240 sentence as an example, I don't know

00:02:06.719 what it means.

00:02:09.039 And with Ruby it's the same thing. Ruby

00:02:11.760 reads our code,

00:02:13.920 it splits it into a list of tokens, then

00:02:16.959 it builds a structure with the

00:02:19.200 relationships between those tokens.

00:02:22.640 And again, not everything that's

00:02:24.400 parsable makes sense. So for example, in

00:02:27.280 this example, the method

00:02:29.520 definition is correct and the call is

00:02:31.440 correct grammar-wise, but it doesn't have

00:02:34.080 a meaning at runtime.

00:02:36.319 So to understand all of this more

00:02:38.080 deeply, we'll create an interpreter here

00:02:40.160 right now. It's a very simple one, but

00:02:42.640 it does everything that a normal

00:02:44.560 interpreter would do.

00:02:47.040 So I'll start presenting the language.

00:02:49.200 And the language is very simple for now,

00:02:51.360 but it will get increasingly more

00:02:53.120 complex.

00:02:54.720 At this point, our language, our

00:02:56.480 programs are just numbers. And by

00:02:58.400 number, I mean a character from 0 to 9.

00:03:01.040 That's the only thing that our

00:03:02.959 language understands.

00:03:05.680 So let's start with lexing, which is

00:03:08.480 also called tokenizing. Tokenizing is

00:03:12.080 taking the code and outputting the list

00:03:14.159 of tokens. So because the language is so

00:03:17.120 simple, the tokenizer here will be

00:03:19.760 very simple too. All we do is we use a

00:03:22.159 regex to get all the words and numbers

00:03:25.599 and special characters like plus sign

00:03:27.920 minus sign from the string.

00:03:31.680 And we'll also create this language

00:03:33.360 module which is the entry point for our

00:03:35.440 language. So whenever we want to run

00:03:37.599 our language we'll call Language.call

00:03:39.680 with some code. For now, all we do is

00:03:43.040 tokenize it. So again the tokenizer gets

00:03:46.000 some code and outputs a list of tokens

00:03:48.879 out.
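The tokenizer and entry point described above might look like the following minimal Ruby sketch. The names Tokenizer and Language and the exact regex are assumptions made for illustration; the code on the speaker's slides may differ.

```ruby
# Hypothetical sketch: module names and the regex are illustrative assumptions.
module Tokenizer
  # A token is a run of word characters (letters, digits, underscore)
  # or any single non-space symbol such as "+" or "-".
  TOKEN = /\w+|[^\s\w]/

  def self.call(code)
    code.scan(TOKEN)
  end
end

module Language
  # Entry point: for now it only tokenizes the code.
  def self.call(code)
    Tokenizer.call(code)
  end
end

p Language.call("1 + 2 - 3") # => ["1", "+", "2", "-", "3"]
```

Whitespace never becomes a token here, so "1+2" and "1 + 2" produce the same list.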

00:03:50.799 Even though we have the tokens out we

00:03:52.879 are not ready to parse to understand its

00:03:55.280 information yet. So take this expression

00:03:58.159 as an example. Because of math, we know

00:04:01.360 that we have to do

00:04:04.400 the left subtraction first and then the

00:04:06.159 addition. Because if we did it the

00:04:08.319 reverse way, we would get a different

00:04:10.640 result. So the order of operations

00:04:13.439 matters and we need some kind of

00:04:15.120 structure that tells us the order of

00:04:17.199 operations and that is called parsing.

00:04:21.280 So, parsing. The parser is similar to our

00:04:24.400 tokenizer, but it receives tokens

00:04:26.240 instead of the raw string and it doesn't

00:04:28.800 work.

00:04:30.400 And again this is the language that we

00:04:32.080 are working with and we'll try to

00:04:34.639 make our parser code look exactly like

00:04:37.360 the grammar. So we have the call method

00:04:40.160 and it has a program method and the

00:04:42.160 program is a number and inside this

00:04:44.960 number method we'll parse a number. So

00:04:48.639 first thing we do is we advance to get a

00:04:50.800 token and by advance I just mean

00:04:52.639 grabbing the first token from the list

00:04:55.600 and then we check if it's nil then we'll

00:04:58.160 raise an error because our language

00:05:00.000 requires a token, a number. Then we check

00:05:04.080 if it doesn't match the regex for a

00:05:06.080 digit then it's not a digit so we raise

00:05:08.160 an error again. All we care about is numbers.

00:05:11.600 but if it indeed is a number then we

00:05:14.320 will return a node which in this case is

00:05:16.639 just a hash with a type number and the

00:05:19.280 value is that token converted to an

00:05:21.680 integer. So that's it. In 10 lines of

00:05:24.400 code, we made a significant design

00:05:26.800 decision for our language. All numbers

00:05:28.720 are integers. If we had for example

00:05:31.759 division, it would be division for

00:05:34.080 integer numbers.

00:05:36.720 Okay. So we add our parser now to our

00:05:39.440 language module. And let's make this

00:05:42.160 language a little more complex by adding

00:05:44.479 addition. So the grammar is now like

00:05:47.280 this. A program is a term and a term is

00:05:50.160 either a number like we had before or a

00:05:53.280 number a plus sign and another number or

00:05:56.160 maybe a number a plus sign another

00:05:57.840 number a plus sign another number and

00:05:59.440 you know where this is going. So because

00:06:01.440 we can have additions of any length,

00:06:04.160 we can write the grammar

00:06:05.840 like this. A term is a number and

00:06:08.800 optionally any number of plus signs and

00:06:12.160 other numbers.

00:06:14.479 So we'll reflect the grammar in the

00:06:16.560 code. So now program calls term and

00:06:20.160 inside term we try to parse a number

00:06:22.400 like we did before with the number

00:06:24.319 method. Then we check does the next

00:06:27.520 token match

00:06:29.919 a plus sign.

00:06:32.080 If it doesn't match it's just a number

00:06:33.600 we return it. But if it matches a plus

00:06:35.680 sign then we'll try to parse an addition

00:06:37.600 here. So we advance again and that will

00:06:40.880 return the operator which is the plus

00:06:42.720 sign in this case. And then we try to

00:06:45.120 parse a second number.

00:06:47.600 And now instead of returning a single

00:06:49.759 number node, we return a node of the

00:06:51.840 type binary. And the binary node has the

00:06:54.479 operator just the plus sign. And on the

00:06:57.039 left side the first number, and on the

00:06:59.039 right side, what would be the second

00:07:00.800 number.

00:07:02.880 We could make this language more complex

00:07:04.880 by adding subtraction as well. And to

00:07:07.919 support it, it's pretty simple. Now we

00:07:10.000 just look for a plus sign or a

00:07:12.800 minus sign.
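The extended grammar, term := number (("+" | "-") number)*, can be sketched as follows. This is an illustrative version under assumptions: the peek helper, the ParseError constant, and the node layout (operator, left, right keys) are hypothetical names chosen to match the description.

```ruby
# Hypothetical sketch of a parser for: term := number (("+" | "-") number)*
class Parser
  ParseError = Class.new(StandardError)

  def initialize(tokens)
    @tokens = tokens.dup
  end

  def call
    program
  end

  private

  def program
    term
  end

  def term
    node = number
    # Keep folding "operator number" pairs into the left side,
    # so 1 - 2 + 3 groups as (1 - 2) + 3.
    while ["+", "-"].include?(peek)
      operator = advance
      right = number
      node = { type: :binary, operator: operator, left: node, right: right }
    end
    node
  end

  def number
    token = advance
    raise ParseError, "expected a number, got end of input" if token.nil?
    raise ParseError, "expected a number, got #{token.inspect}" unless token.match?(/\A\d\z/)

    { type: :number, value: token.to_i }
  end

  # Look at the next token without consuming it.
  def peek
    @tokens.first
  end

  def advance
    @tokens.shift
  end
end

tree = Parser.new(%w[1 - 2 + 3]).call
p tree[:operator]        # => "+"
p tree[:left][:operator] # => "-"
```

The loop is what makes the tree lean to the left, which encodes the order of operations discussed earlier.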

00:07:14.880 So to understand this more deeply, let's

00:07:17.039 walk step by step on how we tokenize and

00:07:20.160 parse this language, this expression. So

00:07:23.680 everything starts on the language

00:07:25.280 module. We receive the code and we call

00:07:27.599 tokenize with that string and it will

00:07:30.800 return for us the tokens

00:07:33.440 and then we parse that. So inside parser

00:07:36.880 we initialize the tokens with a list of

00:07:39.520 tokens that we received. We call program

00:07:42.560 then term and inside term we call

00:07:45.120 number. And now inside number we advance

00:07:48.080 to get a token. So we got a token.

00:07:51.520 It's not nil. It is uh matching the

00:07:54.639 regex for a digit. So we'll return now

00:07:57.199 a node of type number with that value

00:08:00.240 converted to an integer. So there you

00:08:02.319 go. We get the first expression. Now we

00:08:05.199 check does the next token match a plus

00:08:07.360 sign or a minus sign. Yes. So we enter

00:08:10.319 the while loop. We advance. We get the

00:08:12.720 operator. In this case, the operator is plus.

00:08:15.440 And then we try to parse a second

00:08:17.039 number.

00:08:18.720 And we run the code again. But now

00:08:20.879 we receive the number two.

00:08:24.160 And we'll return now the node binary.

00:08:28.240 And just replace the values here. The

00:08:30.800 operator is plus. The first expression is

00:08:33.440 the number one and the second expression

00:08:35.839 is the number two and that's it. Uh we

00:08:39.440 just built that tree that structure that

00:08:42.479 I mentioned that had the order of

00:08:44.000 operations.

00:08:45.680 We call this an abstract syntax tree

00:08:48.640 also called AST for short. But how do we

00:08:52.080 execute it? How do we run this language?

00:08:55.519 Well, we add another step to our

00:08:57.839 language. We tokenize the code. Then we

00:08:59.839 parse it. Now we're going to interpret

00:09:01.839 it. And to interpret the AST, we receive

00:09:06.160 the node and we check its type. If

00:09:09.360 it's a number then we just return its

00:09:11.920 value. But if it's a binary node then

00:09:15.360 what we do is we interpret the left side

00:09:18.000 recursively. So that gets us the left

00:09:21.040 side and then we interpret the right

00:09:23.120 expression.

00:09:24.640 And lastly, we'll do the send call using

00:09:27.279 the left side and passing as arguments

00:09:29.519 the operator and the right side. And

00:09:32.160 that's it. We did it. Uh we just built a

00:09:34.880 very simple interpreter. But again,

00:09:37.360 let's walk step by step of of how this

00:09:39.920 is interpreted. So using that binary

00:09:43.120 node that we received from parsing,

00:09:45.360 let's walk step by step. Here we check

00:09:47.680 the node type. Uh is it a number? No,

00:09:50.560 it's a binary expression. So we go to

00:09:52.800 the binary expression branch and then

00:09:55.200 now we try to interpret the left side.

00:09:57.519 So this will call the function

00:09:59.120 recursively. So we are now interpreting

00:10:02.000 the left side. We check its type. It's a

00:10:05.519 number now. So we'll return its value in

00:10:07.920 this case one. So on the left side we

00:10:10.160 have one. Now we interpret the right

00:10:12.640 side and it's the same thing. And we'll

00:10:14.720 get two back. And now we do that send

00:10:18.079 expression. So this is basically one

00:10:21.600 send plus with two because addition is a

00:10:25.519 method in Ruby we can use send to call

00:10:27.760 it and if you do it your language will

00:10:31.040 return three
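The tree-walking step just described can be sketched like this. The module name Interpreter and the hash layout are assumptions carried over from the description; the AST is written by hand so the example stands on its own.

```ruby
# Hypothetical sketch of the tree-walking interpreter.
module Interpreter
  def self.call(node)
    case node[:type]
    when :number
      node[:value]
    when :binary
      left = call(node[:left])   # interpret the left side recursively
      right = call(node[:right]) # then the right side
      # "+" and "-" are ordinary methods on Integer, so send can call them.
      left.send(node[:operator], right)
    else
      raise ArgumentError, "unknown node type: #{node[:type].inspect}"
    end
  end
end

# Hand-written AST for 1 + 2.
ast = {
  type: :binary,
  operator: "+",
  left: { type: :number, value: 1 },
  right: { type: :number, value: 2 }
}

p Interpreter.call(ast) # => 3
```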

00:10:33.760 And you might be thinking at this point:

00:10:35.839 is that really it? It seems too simple to be

00:10:39.120 true, but that's actually how Ruby worked

00:10:42.079 until version 1.8. So for over 10

00:10:44.800 years of Ruby it worked exactly like

00:10:47.040 that. And to show you let's see how Ruby

00:10:49.920 used to interpret the number 42,

00:10:53.200 only that. And there will be some code now;

00:10:56.800 it's C. I don't know if you've seen C

00:10:59.839 code before. It's fine.

00:11:02.959 So Ruby had this rb_eval function. It

00:11:07.279 received a node

00:11:09.600 and the first thing it does is define this

00:11:11.920 label. Labels in C are like checkpoints

00:11:14.880 in the code. We'll see how this works

00:11:17.519 later.

00:11:19.360 Um first thing we do is we check the

00:11:21.920 node type here. So there's this big

00:11:24.399 switch statement with several cases and

00:11:27.120 for the number 42 we fall into the

00:11:29.680 NODE_LIT case,

00:11:31.920 and NODE_LIT here is similar to our

00:11:34.399 number type. Uh so we get the result

00:11:38.000 from the nd_lit attribute

00:11:41.360 and we break, basically returning from

00:11:44.000 the switch statement, and that's it. It's

00:11:47.279 not too different from what we did

00:11:48.959 before with the case statement and the

00:11:51.360 number node and the value. So let's try

00:11:54.640 something a little more complex. Let's

00:11:56.480 try this "and" expression. So again we are

00:12:00.079 inside the rb_eval function. We switch

00:12:03.680 on the node type. There's a bunch of

00:12:05.519 cases but eventually we hit NODE_AND,

00:12:09.120 and the first thing after entering the

00:12:12.000 branch, we eval the nd_1st attribute,

00:12:14.959 that's similar to our left attribute on

00:12:17.680 our hash. So this calls the function

00:12:19.760 recursively interprets the left side.

00:12:23.519 Then Ruby checks if the left side is

00:12:27.440 either nil or false, then break, stop

00:12:30.880 executing. But if it's truthy then grab

00:12:34.160 the right side, nd_2nd here, and

00:12:37.600 goto again. Remember that label that I

00:12:39.920 talked about? In C, when

00:12:43.040 you use goto with a label, you basically make the

00:12:45.600 code start executing where you pointed

00:12:48.480 at. So the code will go back to the

00:12:50.240 again label and start switching on the

00:12:53.440 node type again but now we are

00:12:54.959 interpreting the right side. So this is

00:12:58.079 why in a language like Ruby even though

00:13:00.079 you have something on the right side

00:13:01.760 that would be an error like summing a

00:13:04.160 number and a string because the left

00:13:07.120 side is falsy we don't even execute

00:13:09.680 that. So there's no errors happening

00:13:11.920 here. And now you know how this is

00:13:13.440 implemented.
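The short-circuit behavior described above is easy to observe from plain Ruby. A small illustration (the variable names are arbitrary):

```ruby
# The right-hand side would raise TypeError if it were evaluated,
# but a falsy left side means it never runs.
result = false && (1 + "a")
p result # => false

# With a truthy left side the right side does run, and the error surfaces.
error = nil
begin
  value = true && (1 + "a")
rescue TypeError => e
  error = e
end
p error.class # => TypeError
```

This mirrors the C logic from the talk: evaluate the left node, stop if it is nil or false, otherwise continue with the right node.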

00:13:15.120 What's good about this interpreter is that

00:13:17.680 it's very easy to build one. That's why

00:13:20.720 Matz chose this architecture.

00:13:23.760 But what's bad about it is that it's

00:13:26.079 super slow and it's slow because of how

00:13:29.519 we represent this data. If you've been

00:13:31.839 to the caching uh talk before this one,

00:13:35.279 you probably know why. CPUs like

00:13:38.480 sequential data and when we store our

00:13:41.680 data in that hash, that tree, our data is

00:13:44.800 all spread out in memory and that really

00:13:46.800 hurts performance. To give you an

00:13:49.279 example to access the L1 cache in your

00:13:52.160 CPU should take about one nanosecond,

00:13:55.680 for the L2 cache it's about four nanoseconds,

00:14:00.160 and to access the RAM, which is where our

00:14:02.320 data lives, it's over 100 nanoseconds. So it's

00:14:06.399 pretty slow. So that's why in Ruby 1.9

00:14:10.079 after 14 years of this first

00:14:12.320 architecture, Ruby changed, and it

00:14:15.360 changed its interpreter to a compiler,

00:14:17.839 and now the compiler receives the AST and

00:14:20.639 compiles it into bytecode, which is a

00:14:23.199 list of instructions and those

00:14:25.199 instructions are run by a VM and this

00:14:28.480 made Ruby two to four times faster on

00:14:31.279 average.

00:14:33.120 So instead of a tree like this, we need

00:14:36.720 a data structure that is more sequential

00:14:39.199 where the data is packed sequentially

00:14:41.519 and that is a lot easier for CPUs

00:14:44.240 to run.

00:14:46.000 So we need to flatten that tree and to

00:14:48.480 do that it's fairly simple. We'll walk

00:14:51.600 this tree starting at the root. Uh we

00:14:54.000 first evaluate the left branch and we'll

00:14:56.399 generate an instruction for that. Then

00:14:59.199 we'll go to the right branch and

00:15:01.440 generate another instruction for that.

00:15:04.000 And lastly for the root node we generate

00:15:06.240 an instruction and that's how we'll go

00:15:08.480 from a tree to an array. So instead of

00:15:12.480 interpreting the code right away now

00:15:14.399 we'll compile it and to compile we'll

00:15:16.800 create a compiler. Don't worry.

00:15:20.160 Oh compilers are scary but this one is

00:15:22.560 very simple. We'll receive the AST from

00:15:24.959 the parsing phase and we'll return

00:15:27.680 instructions an array of instructions.

00:15:30.320 And to create those instructions again

00:15:32.079 we check the AST node type, and for a

00:15:35.760 number, for example, we generate a

00:15:38.639 putobject instruction with the value of

00:15:40.959 that number node. But if it's a binary

00:15:43.839 node, you might have guessed it, we'll call this

00:15:46.399 function recursively on the left side.

00:15:48.560 Generate instructions for the left side.

00:15:51.199 Then we generate instructions for the

00:15:53.040 right side. And lastly, we'll generate

00:15:55.600 this send instruction with the operator

00:15:58.880 and that will give us an array like

00:16:01.120 this.
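The flattening step described above can be sketched as a small compiler that turns the AST hash into an array of instructions. The module name Compiler and the instruction format ([name, argument] pairs) are assumptions for illustration.

```ruby
# Hypothetical sketch: compile the AST into a flat list of instructions.
module Compiler
  def self.call(node)
    case node[:type]
    when :number
      [[:putobject, node[:value]]]
    when :binary
      # Left branch first, then right branch, then the operation itself.
      call(node[:left]) + call(node[:right]) + [[:send, node[:operator]]]
    else
      raise ArgumentError, "unknown node type: #{node[:type].inspect}"
    end
  end
end

# Hand-written AST for 1 + 2.
ast = {
  type: :binary,
  operator: "+",
  left: { type: :number, value: 1 },
  right: { type: :number, value: 2 }
}

p Compiler.call(ast) # => [[:putobject, 1], [:putobject, 2], [:send, "+"]]
```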

00:16:02.800 But how do we run this? Now like I said,

00:16:05.759 we will use a virtual machine and like

00:16:08.160 Ruby, we'll use a stack based virtual

00:16:11.120 machine. So after compiling the code,

00:16:14.399 we'll run it and we'll call this VM.

00:16:18.560 So the VM receives the instructions and

00:16:21.279 like I said it's a stack-based VM. So we

00:16:23.920 initialize a stack. Ruby doesn't have a

00:16:26.079 stack data structure but we can use just

00:16:28.480 an array. We do some work with

00:16:31.759 the instructions and the last step would

00:16:34.240 be popping the last value from the stack

00:16:36.959 returning that. Let's check the middle

00:16:39.519 bit. So for each instruction, we check

00:16:43.759 it in a case expression, and if it's a

00:16:46.639 putobject instruction, we push that

00:16:49.199 value onto the stack. But if it's a send

00:16:52.800 instruction, we'll do something

00:16:54.160 different. We pop a value from the stack

00:16:56.880 that will be the right side. Then

00:16:59.600 we pop a second value from the stack.

00:17:01.519 That's the left side. Then we do that

00:17:04.000 send operation that we had before again

00:17:06.319 but now we push that value back onto the

00:17:09.360 stack.

00:17:11.280 So once more let's walk step by step

00:17:13.760 with a real example. So these are the

00:17:16.079 instructions for 1 + 2. So we check

00:17:18.880 the instruction type. The first

00:17:20.720 instruction is putobject 1 and

00:17:24.640 then we push its value onto the stack.

00:17:26.799 So the stack now contains the number

00:17:28.720 one. The second instruction is putobject

00:17:31.280 with two. Again we check its type, grab

00:17:33.919 its value, it's putobject, and we push

00:17:36.720 the value onto the stack. So now the

00:17:38.559 stack contains one and two. Third and

00:17:41.679 last instruction is the send

00:17:43.440 instruction. So we check uh its branch

00:17:47.039 is the send branch. We get the operator.

00:17:50.320 We pop a value from the stack. So we get

00:17:52.559 the number two on the variable right. Uh

00:17:54.960 we pop a second value from the stack.

00:17:56.880 That's the left side.

00:17:59.120 and we calculate the result using the

00:18:02.000 send method and we got three back and we

00:18:06.559 push three back onto the stack. So this

00:18:09.520 stack now contains the number three and

00:18:12.720 like I said before we pop the last value

00:18:15.039 from the stack as the last step and if

00:18:17.760 you do that it will pop three and our

00:18:20.080 language still returns three.
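A minimal sketch of the stack-based VM walked through above. The module name VM and the [name, argument] instruction format are assumptions; the instruction list is written by hand so the example runs on its own.

```ruby
# Hypothetical sketch of a tiny stack-based virtual machine.
module VM
  def self.call(instructions)
    stack = [] # a plain Array used as a stack

    instructions.each do |name, argument|
      case name
      when :putobject
        stack.push(argument)
      when :send
        right = stack.pop # pushed last, so it comes off first
        left = stack.pop
        stack.push(left.send(argument, right))
      else
        raise ArgumentError, "unknown instruction: #{name.inspect}"
      end
    end

    # Whatever is left on top of the stack is the program's result.
    stack.pop
  end
end

instructions = [[:putobject, 1], [:putobject, 2], [:send, "+"]]
p VM.call(instructions) # => 3
```

The pop order matters for non-commutative operators: popping right before left keeps 1 - 2 from turning into 2 - 1.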

00:18:24.000 What's cool about Ruby is that you can

00:18:26.080 see these instructions for yourself. If

00:18:28.880 you call Ruby with some code and the

00:18:31.520 --dump=insns option, you will get

00:18:34.559 this. If I clean it up a little bit and

00:18:37.360 zoom in, you get this. These are the

00:18:39.360 instructions for 10.upto(20).

00:18:44.000 The first instruction is putobject with 10.

00:18:46.640 Then putobject with 20. Then we have

00:18:49.360 this opt_send_without_block with the

00:18:52.720 upto argument. And lastly, leave, which

00:18:55.919 basically returns.
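The same listing can be produced from inside a Ruby program. This sketch assumes CRuby, since RubyVM::InstructionSequence is specific to that implementation, and the exact output format varies between Ruby versions. From a shell, the equivalent is roughly ruby --dump=insns -e '10.upto(20)'.

```ruby
# Assumes CRuby: RubyVM::InstructionSequence is not available on every implementation.
iseq = RubyVM::InstructionSequence.compile("10.upto(20)")

# disasm returns a human-readable listing of the bytecode as a String.
listing = iseq.disasm
puts listing
```

The listing should contain the putobject instructions, a send-style instruction mentioning upto, and a final leave, matching the walkthrough above.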

00:18:58.720 So in a nutshell, that's how Ruby has worked

00:19:00.960 since 1.9

00:19:03.360 until we introduced JIT compilers. And at

00:19:07.280 this point, there's a lot of them.

00:19:10.240 There have been four JIT compilers at this point

00:19:12.559 in Ruby's history.

00:19:15.280 Uh so it's similar to what we had

00:19:18.000 before. So instead of just having a

00:19:20.559 compiler that compiles the AST into

00:19:22.880 bytecode, now we have a second compiler that

00:19:25.520 compiles the bytecode into machine

00:19:27.919 code. So instead of this compiler that

00:19:30.720 we created, we also have a compiler for

00:19:33.679 assembly, which is kind of like this.

00:19:37.679 And I know what you might be thinking at

00:19:39.600 this point. Uh yeah, this is cool and

00:19:41.440 all, but I'm a I'm a Rails dev. I'm not

00:19:44.240 writing a compiler. Why should I care

00:19:46.960 about parsers?

00:19:49.280 Well, one, because it's fun. But parsers

00:19:52.240 aren't just for compilers. If you use

00:19:54.480 Rails, and I assume you do, or IRB or

00:19:58.000 RuboCop, Standard, if you use the VS Code

00:20:01.200 extension for Ruby or really any of

00:20:03.360 these gems, you are using a parser. In

00:20:06.799 fact, you are using several different

00:20:08.559 parsers. And that difference affects us

00:20:12.000 on our daily work. You know when Ruby

00:20:14.640 adds a new syntax and then RuboCop

00:20:17.039 doesn't understand it right away or your

00:20:20.320 editor thinks it's a grammar error. It's

00:20:23.280 because of this. Each tool has a

00:20:25.440 different parser. So when Ruby adds new

00:20:27.760 features, everyone has to catch up with

00:20:29.440 a new thing. This is also true if you

00:20:32.080 are using a different Ruby

00:20:33.280 implementation like TruffleRuby or

00:20:35.120 JRuby. Everyone has to catch up to

00:20:37.600 CRuby. That's why they created this new

00:20:40.799 parser. You might have heard about it.

00:20:42.799 Prism. The idea of Prism was to be a

00:20:46.320 single parser that could handle it all.

00:20:50.000 It's the parser that is used in the Ruby LSP

00:20:53.200 extension

00:20:54.799 and since last year actually it's the

00:20:57.280 default parser for CRuby. So if you are

00:20:59.440 using Ruby 3.4 you are using Prism. So

00:21:03.120 now we have this one parser used in

00:21:05.840 CRuby, but at this point it's already

00:21:07.360 used in JRuby, TruffleRuby, Natalie, Opal,

00:21:11.360 and several gems, because you can use it in C, you

00:21:14.240 can use it in Ruby. It's very powerful.

00:21:16.720 So maybe one day instead of this

00:21:18.320 fractured ecosystem we'll just have one

00:21:21.440 single parser and whenever Ruby adds a

00:21:24.080 new feature everyone can benefit from

00:21:26.000 that right away.

00:21:28.640 Okay, but why should I care about a VM

00:21:31.039 then? Well, take this example. These are

00:21:34.559 the instructions for the expression

00:21:36.880 2.send(:+, 3), like 2 + 3,

00:21:40.799 and these are the instructions for 0 +

00:21:43.360 1. Can we spot the difference?
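The two listings being compared can be reproduced with a short script. This is a sketch that assumes CRuby (RubyVM::InstructionSequence is implementation-specific) and assumes the first expression was written as 2.send(:+, 3); instruction names and formatting can differ between Ruby versions.

```ruby
# Assumes CRuby; the exact listings depend on the Ruby version.
generic = RubyVM::InstructionSequence.compile("2.send(:+, 3)").disasm
specialized = RubyVM::InstructionSequence.compile("0 + 1").disasm

puts generic     # a generic method call: operands are pushed, then a send-style instruction
puts specialized # literal 0 + 1: look for specialized instructions such as opt_plus
```

Comparing the two outputs side by side shows which instructions carry arguments and which are specialized for common cases.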

00:21:50.880 The first

00:21:53.200 expression has more

00:21:55.120 instructions and the instructions all

00:21:57.600 have arguments, while in the second

00:22:00.080 expression, for example, we don't use

00:22:01.919 putobject for zero and one. You have

00:22:04.080 these putobject_INT2FIX_0_ and

00:22:07.919 putobject_INT2FIX_1_. These are instructions

00:22:10.960 specifically designed to put one and

00:22:13.679 zero on the stack. Instead of using send,

00:22:16.799 Ruby uses the opt_plus instruction, and

00:22:19.840 Ruby does that because we

00:22:21.919 deal with zeros and ones and summing

00:22:24.159 numbers those are very common operations

00:22:26.080 so it has optimized instructions just

00:22:28.880 for that it's called a fast path

00:22:32.080 optimization and we'll do the same thing

00:22:34.559 to our VM to understand how it works so

00:22:37.919 in our compiler specifically in the

00:22:40.080 binary branch, we used to have this.

00:22:43.120 now let's say we want to make addition

00:22:45.039 faster for some reason So we check if

00:22:48.400 the operation is an addition and we have

00:22:50.640 a number on the left side and a number

00:22:52.559 on the right side. Then instead of doing

00:22:55.919 what we did before, we do this: we add the

00:22:58.559 putobject instruction, but we'll sum the

00:23:01.280 values right away in the compiler and

00:23:03.760 put that on the stack. So in a way we

00:23:07.200 are interpreting inside the compiler

00:23:08.960 now. But if it's not an addition of

00:23:12.159 numbers we'll do what we did before. And

00:23:15.200 if you benchmark this, you'll see that

00:23:17.440 now addition is 20% faster than

00:23:21.200 subtraction. What I'm trying to say

00:23:23.840 here is that how you write code matters.

00:23:27.760 There's a difference between using

00:23:29.840 method_missing or define_method. There's a

00:23:32.880 difference between while true and using

00:23:34.960 loop. And because of object shapes,

00:23:38.559 there's a penalty if you use too much

00:23:40.960 memoization in your code.

00:23:43.679 I'm not saying that

00:23:45.919 you shouldn't use those features.

00:23:48.559 You can write whatever you want, but

00:23:51.280 know the trade-offs that you are

00:23:53.039 choosing and understanding the Ruby VM

00:23:55.520 helps you to make those decisions.
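The compile-time fast path for addition described earlier can be sketched by adding one branch to the toy compiler. The module name Compiler and the instruction format are illustrative assumptions; the general technique of evaluating constant operands at compile time is commonly called constant folding.

```ruby
# Hypothetical sketch: the toy compiler with a fast path for number + number.
module Compiler
  def self.call(node)
    case node[:type]
    when :number
      [[:putobject, node[:value]]]
    when :binary
      left = node[:left]
      right = node[:right]

      if node[:operator] == "+" && left[:type] == :number && right[:type] == :number
        # Fast path: both operands are literals, so add them while compiling
        # and emit a single instruction carrying the result.
        [[:putobject, left[:value] + right[:value]]]
      else
        # General path: same as before.
        call(left) + call(right) + [[:send, node[:operator]]]
      end
    else
      raise ArgumentError, "unknown node type: #{node[:type].inspect}"
    end
  end
end

one = { type: :number, value: 1 }
two = { type: :number, value: 2 }

p Compiler.call({ type: :binary, operator: "+", left: one, right: two })
# => [[:putobject, 3]]
p Compiler.call({ type: :binary, operator: "-", left: one, right: two })
# => [[:putobject, 1], [:putobject, 2], [:send, "-"]]
```

The VM ends up with one instruction instead of three for additions of literals, which is where the speedup over subtraction in this toy language comes from.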

00:23:59.039 I guess no one would ask who cares

00:24:01.360 about JIT, just because of YJIT. YJIT

00:24:03.919 makes your Rails code 10 to 30% faster

00:24:06.720 and you don't have to do anything. So

00:24:08.880 yeah, it's very welcome. It's even

00:24:11.279 enabled in Rails by default at this

00:24:13.440 point. But what I like about YJIT is

00:24:16.559 that its impact is much more than just

00:24:18.799 performance.

00:24:20.559 So Shopify did this benchmark where they

00:24:23.120 benchmarked parsing GraphQL queries in

00:24:25.760 Ruby. So they compared pure Ruby, a C

00:24:28.880 extension, pure Ruby with YJIT, all

00:24:31.440 the variations that you could have. And

00:24:33.840 what they found out was that writing

00:24:36.720 pure Ruby with YJIT was faster than a C

00:24:40.240 extension with YJIT.

00:24:42.480 So in a way Ruby is faster than C and we

00:24:46.880 started seeing PRs like this where they

00:24:48.880 rewrote parts of Ruby from C to Ruby.

00:24:53.200 So we used to have this method for

00:24:56.320 Integer#times and it got replaced with this,

00:24:59.919 which is not only as fast and much

00:25:02.640 smaller but a lot closer to the kind of

00:25:05.200 work that we do on a day-to-day basis,

00:25:08.799 or this other example which is more

00:25:10.480 recent. They rewrote Pathname mostly in

00:25:13.520 Ruby, and not only was it twice as fast

00:25:16.960 but it was a third of the code.

00:25:20.240 So I believe that the future is more

00:25:22.080 Ruby because Ruby is easier. It's easier

00:25:25.679 to read. It's easier to understand. It's

00:25:28.400 easier to write. And I believe that it

00:25:30.799 will be easier to contribute to because

00:25:33.840 it will be written in Ruby, a language

00:25:35.760 that we use every day. So while Ruby,

00:25:39.600 yeah, Ruby is easy, but at the same

00:25:41.760 time, I hope you noticed that it's

00:25:44.000 complex too. Has a parser, has a

00:25:46.320 compiler, several compilers at this

00:25:48.240 point. So to put it in Matz's words,

00:25:52.159 the Ruby creator: Ruby is simple in

00:25:54.720 appearance but very complex on the inside

00:25:57.520 just like the human body.

00:26:00.320 So to give you an example of this

00:26:02.480 complexity, the Prism compiler has 8,000

00:26:06.159 lines of CRuby code. The Prism parser

00:26:10.799 has 16,000 lines of C. YJIT itself has

00:26:14.880 25,000 lines. And this was the last

00:26:17.200 time I checked months ago. And if you

00:26:19.919 count all files inside the Ruby

00:26:21.679 repository, you have over 1.5 million

00:26:24.320 lines of code. That's the work of all

00:26:27.279 these individuals, almost 400 people.

00:26:30.240 And I just wanted to take a moment here

00:26:32.320 for all of us to thank the contributors

00:26:34.480 of Ruby. So give it up for the

00:26:36.320 contributors.

00:26:44.400 Yeah, they work really hard to make our

00:26:46.400 lives easier. So to answer again this

00:26:49.679 question, why should you care about all

00:26:51.679 of this? Well, the most important thing

00:26:54.720 is as you go through your path in

00:26:58.240 development, you eventually have to go

00:27:01.279 beyond what you do now. You have to go

00:27:04.080 into these deep technical topics.

00:27:06.159 Sometimes you have to read the gem

00:27:07.760 source code because there is no

00:27:09.760 documentation. You have to

00:27:12.000 go beyond tutorials and blog posts and

00:27:14.080 even official docs. You write your first

00:27:16.480 gem and maybe one day you have to debug

00:27:19.200 a performance issue and you have to dump

00:27:21.440 the C instructions. So I hope this talk

00:27:25.360 is the first step in that journey for

00:27:27.679 you. I hope it helps in unveiling the

00:27:30.640 magic behind Ruby. And if you like this

00:27:33.520 topic, you can take this further.

00:27:35.679 There's this wonderful book called

00:27:37.679 Crafting Interpreters. You will create

00:27:40.159 two interpreters for the same language.

00:27:42.559 The first one is, just like we did

00:27:44.880 initially, a tree walker. And then the

00:27:47.039 second one is a VM. You create

00:27:49.360 everything from scratch. No libraries,

00:27:51.520 no nothing. So it's very fun. And if you

00:27:54.799 want to play with the code that I showed

00:27:56.559 you, it's all here in this repository.

00:27:58.960 It's a GitHub repo at tbot.io/math-interpreter

00:28:03.760 and you can add something to

00:28:06.080 your language. Do whatever you want with

00:28:07.840 it. It's yours now.

00:28:10.320 Yeah. And I hope this has helped you to

00:28:13.279 see how Rails benefits from Ruby, how

00:28:15.840 Ruby made Rails possible. And if it

00:28:19.120 wasn't for Ruby, we wouldn't be here

00:28:20.799 today. And that's what I had for today.

00:28:23.279 Thank you everyone.

00:28:30.159 Do we have the time for questions?

00:28:34.559 One question. They asked why I

00:28:37.919 wanted to learn about this.

00:28:40.640 Well, I started messing with this during

00:28:43.360 the pandemic. So, I had some free time

00:28:46.320 and I found that book, Crafting

00:28:49.039 Interpreters. You can read the whole

00:28:50.640 book online for free. And once I

00:28:54.480 started doing it, I just couldn't

00:28:56.080 stop. Uh it was fun to like build my own

00:28:59.679 language and like to add whatever I

00:29:01.840 wanted. Uh yeah. So that was

00:29:04.399 it for me. That's it. Thank you

00:29:06.880 everyone.
