00:00:17.039
Hello, RailsConf. How y'all doing?
00:00:21.039
Great.
00:00:22.880
So, my name is Matias. I work for a
00:00:25.279
company called thoughtbot. You might
00:00:26.720
know us from our open source work, our
00:00:29.279
blog. If you use FactoryBot, that's us.
00:00:33.680
We are at a conference about Rails. In
00:00:35.840
fact, the last RailsConf. But way
00:00:38.640
before Rails existed, in the middle of
00:00:41.280
the '90s, a Japanese man had this divine
00:00:44.000
revelation and created this language, and
00:00:46.800
we call it Ruby. Without Ruby, there
00:00:49.280
would be no Rails. So as a way to
00:00:51.920
celebrate Rails' history, we'll dive
00:00:54.399
into Ruby today and understand its
00:00:56.879
internal workings and how it affects our
00:00:59.280
lives.
00:01:00.879
There will be a fair amount of code in
00:01:02.960
these slides. Don't worry about it. I'll
00:01:05.199
share the slides later if you need to.
00:01:07.280
Just think about the big picture.
00:01:10.720
Before we start looking at how Ruby
00:01:13.760
understands our code, let's think
00:01:16.159
about how we process text.
00:01:19.200
Take this sentence as an example. When I
00:01:22.080
see this sentence, my brain kind of
00:01:24.320
divides it into different tokens. And
00:01:28.159
for example, when I see the token like
00:01:29.920
the colon here, I know that what comes
00:01:32.640
after the colon is explaining what's
00:01:35.280
before the colon or when I see the
00:01:37.840
apostrophe s here, I know that the thing
00:01:40.320
after that belongs to the thing before
00:01:42.560
that. So my brain separates a list of
00:01:46.240
tokens and we kind of get the
00:01:49.920
relationships between those tokens as
00:01:51.840
well. And after this process which is
00:01:54.880
very fast in our brains, we try to
00:01:56.479
understand the meaning of it if there's
00:01:58.719
any meaning because not everything
00:02:01.360
that's parsable has a meaning. So this
00:02:04.240
sentence as an example, I don't know
00:02:06.719
what it means.
00:02:09.039
And with Ruby it's the same thing. Ruby
00:02:11.760
reads our code,
00:02:13.920
it splits it into a list of tokens, then
00:02:16.959
it builds a structure with the
00:02:19.200
relationships between those tokens.
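As an aside that is not part of the talk, the two steps described above can be observed with Ripper, a parser library that ships with CRuby's standard library. This is a minimal sketch, assuming a CRuby installation where `ripper` is available; the exact shape of the tree may vary between Ruby versions.

```ruby
require "ripper"

code = "1 + 2"

# Step 1: split the source into a flat list of tokens (whitespace included).
p Ripper.tokenize(code) # => ["1", " ", "+", " ", "2"]

# Step 2: build a nested structure describing how the tokens relate.
tree = Ripper.sexp(code)
pp tree
```

The token list is flat, while the second result is nested: the nesting is what carries the relationships between tokens.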
00:02:22.640
And again, not everything that's
00:02:24.400
parsable makes sense. So for example, in
00:02:27.280
this example, the method
00:02:29.520
definition is correct and the call is
00:02:31.440
correct grammar-wise, but it doesn't have
00:02:34.080
a meaning at runtime.
00:02:36.319
So to understand all of this more
00:02:38.080
deeply, we'll create an interpreter here
00:02:40.160
right now. It's a very simple one, but
00:02:42.640
it does everything that a normal
00:02:44.560
interpreter would do.
00:02:47.040
So I'll start presenting the language.
00:02:49.200
And the language is very simple for now,
00:02:51.360
but it will get increasingly more
00:02:53.120
complex.
00:02:54.720
At this point, our language, our
00:02:56.480
programs are just numbers. And by
00:02:58.400
number, I mean a character from 0 to 9.
00:03:01.040
That's the only thing that our
00:03:02.959
language understands.
00:03:05.680
So let's start with lexing, which is
00:03:08.480
also called tokenizing. Tokenizing is
00:03:12.080
taking the code and outputting the list
00:03:14.159
of tokens. So because the language is so
00:03:17.120
simple, the tokenizer here will be
00:03:19.760
very simple too. All we do is use a
00:03:22.159
regex to get all the words and numbers
00:03:25.599
and special characters, like the plus sign
00:03:27.920
and minus sign, from the string.
00:03:31.680
And we'll also create this language
00:03:33.360
module which is the entry point for our
00:03:35.440
language. So whenever we want to run
00:03:37.599
our language, we'll call Language.call
00:03:39.680
with some code. For now, all we do is
00:03:43.040
tokenize it. So again the tokenizer gets
00:03:46.000
some code and outputs a list of tokens
00:03:48.879
out.
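A minimal sketch of the tokenizer and entry point described above. The module names `Tokenizer` and `Language` and the exact regex are assumptions for illustration; the speaker's actual code may differ.

```ruby
# Hypothetical names: Tokenizer, Language and TOKEN are illustrative only.
module Tokenizer
  # A token is either a run of word characters (digits, letters)
  # or a single non-space symbol such as + or -.
  TOKEN = /\w+|[^\s\w]/

  def self.call(code)
    code.scan(TOKEN)
  end
end

module Language
  # Entry point: for now, running the language only tokenizes the code.
  def self.call(code)
    Tokenizer.call(code)
  end
end

p Language.call("1 + 2 - 3") # => ["1", "+", "2", "-", "3"]
```

`String#scan` returns every match in order, so whitespace is dropped simply because the regex never matches it.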
00:03:50.799
Even though we have the tokens out we
00:03:52.879
are not ready to understand their
00:03:55.280
information yet. So take this expression
00:03:58.159
as an example. Because of math, we know
00:04:01.360
that we have to do the left
00:04:04.400
subtraction first and then the
00:04:06.159
addition. Because if we did it the
00:04:08.319
reverse way, we would get a different
00:04:10.640
result. So the order of operations
00:04:13.439
matters, and we need some kind of
00:04:15.120
structure that tells us the order of
00:04:17.199
operations. Building that structure is called parsing.
00:04:21.280
So the parser is similar to our
00:04:24.400
tokenizer, but it receives tokens
00:04:26.240
instead of the raw string, and it does
00:04:28.800
some work.
00:04:30.400
And again this is the language that we
00:04:32.080
are working with, and we'll try to
00:04:34.639
make our parser code look exactly like
00:04:37.360
the grammar. So we have the call method
00:04:40.160
and it has a program method and the
00:04:42.160
program is a number and inside this
00:04:44.960
number method we'll parse a number. So
00:04:48.639
first thing we do is we advance to get a
00:04:50.800
token and by advance I just mean
00:04:52.639
grabbing the first token from the list
00:04:55.600
and then we check if it's nil then we'll
00:04:58.160
raise an error because our language
00:05:00.000
requires a token, a number. Then we check
00:05:04.080
if it doesn't match the regex for a
00:05:06.080
digit; then it's not a digit, so we raise
00:05:08.160
an error again. All we care about is numbers.
00:05:11.600
But if it is indeed a number, then we
00:05:14.320
will return a node which in this case is
00:05:16.639
just a hash with a type number and the
00:05:19.280
value is that token converted to an
00:05:21.680
integer. So that's it. In 10 lines of
00:05:24.400
code, we made a significant design
00:05:26.800
decision for our language. All numbers
00:05:28.720
are integers. If we had for example
00:05:31.759
division, it would be division for
00:05:34.080
integer numbers.
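The number-only parser described above could look roughly like this. It is a sketch under assumptions: the class name `Parser`, the error messages, and the hash keys `:type` and `:value` are illustrative choices, not the speaker's confirmed code.

```ruby
# Hypothetical sketch of a parser for the grammar: program -> number
class Parser
  def self.call(tokens)
    new(tokens).program
  end

  def initialize(tokens)
    @tokens = tokens.dup
  end

  # program -> number
  def program
    number
  end

  private

  # "Advancing" just means taking the first token off the list.
  def advance
    @tokens.shift
  end

  def number
    token = advance
    raise "Expected a number but reached the end of the input" if token.nil?
    raise "Expected a number but got #{token.inspect}" unless token.match?(/\A\d\z/)

    # Design decision: every number becomes an Integer.
    { type: :number, value: token.to_i }
  end
end

p Parser.call(["7"]) # => {type: :number, value: 7} (inspect format varies by Ruby version)
```

Because the node stores `token.to_i`, any arithmetic added later inherits Ruby's integer semantics, which is the design decision the talk points out.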
00:05:36.720
Okay. So we add our parser now to our
00:05:39.440
language module. And let's make this
00:05:42.160
language a little more complex by adding
00:05:44.479
addition. So the grammar is now like
00:05:47.280
this. A program is a term and a term is
00:05:50.160
either a number like we had before or a
00:05:53.280
number a plus sign and another number or
00:05:56.160
maybe a number a plus sign another
00:05:57.840
number a plus sign another number and
00:05:59.440
you know where this is going. So because
00:06:01.440
we can have arbitrarily long
00:06:04.160
additions, we can write the grammar
00:06:05.840
like this. A term is a number and,
00:06:08.800
optionally, any number of plus signs and
00:06:12.160
other numbers.
00:06:14.479
So we'll reflect the grammar in the
00:06:16.560
code. So now program calls term and
00:06:20.160
inside term we try to parse a number
00:06:22.400
like we did before with the number
00:06:24.319
method. Then we check does the next
00:06:27.520
token match
00:06:29.919
a plus sign.
00:06:32.080
If it doesn't match it's just a number
00:06:33.600
we return it. But if it matches a plus
00:06:35.680
sign then we'll try to parse an addition
00:06:37.600
here. So we advance again and that will
00:06:40.880
return the operator which is the plus
00:06:42.720
sign in this case. And then we try to
00:06:45.120
parse a second number.
00:06:47.600
And now instead of returning a single
00:06:49.759
number node, we return a node of the
00:06:51.840
type binary. And the binary node has the
00:06:54.479
operator just the plus sign. And on the
00:06:57.039
left side the first number, and on the
00:06:59.039
right side, what would be the second
00:07:00.800
number?
00:07:02.880
We could make this language more complex
00:07:04.880
by adding subtraction as well. And to
00:07:07.919
support it, it's pretty simple. Now we
00:07:10.000
just look for a plus sign or a
00:07:12.800
minus sign.
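Putting the grammar "a term is a number followed by any number of plus or minus signs and numbers" into code might look like the sketch below. The class and key names (`Parser`, `:binary`, `:operator`, `:left`, `:right`) are assumptions chosen to match the description, not the speaker's confirmed code.

```ruby
# Hypothetical sketch: program -> term, term -> number (("+" | "-") number)*
class Parser
  def self.call(tokens)
    new(tokens).program
  end

  def initialize(tokens)
    @tokens = tokens.dup
  end

  def program
    term
  end

  private

  def term
    left = number
    # Keep folding "operator number" pairs into the tree, left to right.
    while ["+", "-"].include?(@tokens.first)
      operator = advance.to_sym
      right = number
      left = { type: :binary, operator: operator, left: left, right: right }
    end
    left
  end

  def number
    token = advance
    raise "Expected a number but reached the end of the input" if token.nil?
    raise "Expected a number but got #{token.inspect}" unless token.match?(/\A\d\z/)

    { type: :number, value: token.to_i }
  end

  def advance
    @tokens.shift
  end
end

pp Parser.call(%w[1 - 2 + 3])
```

Each pass through the loop makes the tree built so far the left child of a new node, so `1 - 2 + 3` groups as `(1 - 2) + 3`, which is the order of operations the talk asks for.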
00:07:14.880
So to understand this more deeply, let's
00:07:17.039
walk step by step on how we tokenize and
00:07:20.160
parse this language, this expression. So
00:07:23.680
everything starts on the language
00:07:25.280
module. We receive the code and we call
00:07:27.599
tokenize with that string and it will
00:07:30.800
return for us the tokens
00:07:33.440
and then we parse that. So inside parser
00:07:36.880
we initialize the tokens with a list of
00:07:39.520
tokens that we received. We call program
00:07:42.560
then term and inside term we call
00:07:45.120
number. And now inside number we advance
00:07:48.080
to get a token. So we got a token.
00:07:51.520
It's not nil. It matches the
00:07:54.639
regex for a digit. So now we'll return
00:07:57.199
a node of type number with that value
00:08:00.240
converted to an integer. So there you
00:08:02.319
go. We get the first expression. Now we
00:08:05.199
check does the next token match a plus
00:08:07.360
sign or a minus sign. Yes. So we enter
00:08:10.319
the while loop. We advance. We get the
00:08:12.720
operator. This case the operator plus.
00:08:15.440
And then we try to parse a second
00:08:17.039
number.
00:08:18.720
And we run the code again. But now
00:08:20.879
we receive the number two.
00:08:24.160
And now we'll return the binary node.
00:08:28.240
And just replace the values here. The
00:08:30.800
operator is plus, the first expression is
00:08:33.440
the number one and the second expression
00:08:35.839
is the number two, and that's it. We
00:08:39.440
just built that tree, that structure that
00:08:42.479
I mentioned that had the order of
00:08:44.000
operations.
00:08:45.680
We call this an abstract syntax tree
00:08:48.640
also called AST for short. But how do we
00:08:52.080
execute it? How do we run this language?
00:08:55.519
Well, we add another step to our
00:08:57.839
language. We tokenize the code. Then we
00:08:59.839
parse it. Now we're going to interpret
00:09:01.839
it. And to interpret the AST, we receive
00:09:06.160
the node and we check its type. If
00:09:09.360
it's a number then we just return its
00:09:11.920
value. But if it's a binary node then
00:09:15.360
what we do is we interpret the left side
00:09:18.000
recursively. That gets us the left
00:09:21.040
side and then we interpret the right
00:09:23.120
expression.
00:09:24.640
And lastly, we'll do the send call using
00:09:27.279
the left side and passing as arguments
00:09:29.519
the operator and the right side. And
00:09:32.160
that's it. We did it. Uh we just built a
00:09:34.880
very simple interpreter. But again,
00:09:37.360
let's walk step by step through how this
00:09:39.920
is interpreted. So using that binary
00:09:43.120
node that we received from parsing,
00:09:45.360
let's walk step by step. Here we check
00:09:47.680
the node type. Is it a number? No,
00:09:50.560
it's a binary expression. So we go to
00:09:52.800
the binary expression branch and then
00:09:55.200
now we try to interpret the left side.
00:09:57.519
So this will call the function
00:09:59.120
recursively. So we are now interpreting
00:10:02.000
the left side. We check its type. It's a
00:10:05.519
number now. So we'll return its value in
00:10:07.920
this case one. So on the left side we
00:10:10.160
have one. Now we interpret the right
00:10:12.640
side and it's the same thing. And we'll
00:10:14.720
get two back. And now we do that send
00:10:18.079
expression. So this is basically
00:10:21.600
1.send(:+, 2). Because addition is a
00:10:25.519
method in Ruby, we can use send to call
00:10:27.760
it, and if you do it your language will
00:10:31.040
return three
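The tree-walking interpreter walked through above can be sketched as follows. The module name `Interpreter` and the node layout are assumptions that follow the description in the talk, not the speaker's confirmed code.

```ruby
# Hypothetical sketch of a tree-walking interpreter over hash nodes.
module Interpreter
  def self.call(node)
    case node[:type]
    when :number
      node[:value]
    when :binary
      left = call(node[:left])   # interpret the left side recursively
      right = call(node[:right]) # then the right side
      # Operators are ordinary methods in Ruby, so send can dispatch them.
      left.send(node[:operator], right)
    else
      raise "Unknown node type: #{node[:type].inspect}"
    end
  end
end

ast = {
  type: :binary,
  operator: :+,
  left: { type: :number, value: 1 },
  right: { type: :number, value: 2 }
}

p Interpreter.call(ast) # => 3
```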
00:10:33.760
And you might be thinking at this point,
00:10:35.839
is that really it? It seems too simple to be
00:10:39.120
true. But that's actually how Ruby worked
00:10:42.079
until version 1.8. So for over 10
00:10:44.800
years of Ruby, it worked exactly like
00:10:47.040
that. And to show you, let's see how Ruby
00:10:49.920
used to interpret the number 42,
00:10:53.200
only that. There will be some code now, and
00:10:56.800
it's C. If you haven't seen C
00:10:59.839
code before, it's fine.
00:11:02.959
So Ruby had this rb_eval function. It
00:11:07.279
received a node,
00:11:09.600
and the first thing it does is define this
00:11:11.920
label. Labels in C are like checkpoints
00:11:14.880
in the code. We'll see how this works
00:11:17.519
later.
00:11:19.360
The first thing we do is check the
00:11:21.920
node type here. So there's this big
00:11:24.399
switch statement with several cases, and
00:11:27.120
for the number 42 we fall into the NODE_LIT
00:11:29.680
case.
00:11:31.920
NODE_LIT here is similar to our
00:11:34.399
number type. So we get the result
00:11:38.000
from the nd_lit attribute
00:11:41.360
and we break, basically returning from
00:11:44.000
the switch statement, and that's it. It's
00:11:47.279
not too different from what we did
00:11:48.959
before with the case statement and the
00:11:51.360
number node and the value. So let's try
00:11:54.640
something a little more complex. Let's
00:11:56.480
try this "and" expression. So again we are
00:12:00.079
inside the rb_eval function. We switch
00:12:03.680
on the node type. There's a bunch of
00:12:05.519
cases, but eventually we hit NODE_AND,
00:12:09.120
and the first thing after entering the
00:12:12.000
branch is we eval the nd_1st attribute.
00:12:14.959
That's similar to our left attribute on
00:12:17.680
a hash. So this calls the function
00:12:19.760
recursively and interprets the left side.
00:12:23.519
Then Ruby checks if the left side is
00:12:27.440
either nil or false, then break, stop
00:12:30.880
executing. But if it's truthy, then grab
00:12:34.160
the right side, nd_2nd here, and
00:12:37.600
goto again. Remember that label that I
00:12:39.920
talked about? In C, when
00:12:43.040
you use goto, you basically make the
00:12:45.600
code start executing where you pointed
00:12:48.480
at. So the code will go back to the
00:12:50.240
again label and start switching on the
00:12:53.440
node type again but now we are
00:12:54.959
interpreting the right side. So this is
00:12:58.079
why in a language like Ruby even though
00:13:00.079
you have something on the right side
00:13:01.760
that would be an error like summing a
00:13:04.160
number and a string because the left
00:13:07.120
side is falsy we don't even execute
00:13:09.680
that. So there's no errors happening
00:13:11.920
here. And now you know how this is
00:13:13.440
implemented.
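The short-circuit behavior described above can be mimicked in a few lines of Ruby. This is only an illustration of the idea, not a translation of the C code: the `evaluate` method and the `:literal`, `:and`, and `:boom` node types are made up for this sketch.

```ruby
# Hypothetical mini evaluator showing why the right side of "and"
# is never executed when the left side is nil or false.
def evaluate(node)
  case node[:type]
  when :literal
    node[:value]
  when :and
    left = evaluate(node[:left])
    return left unless left # nil or false: stop here, skip the right side
    evaluate(node[:right])  # truthy: the result is whatever the right side gives
  when :boom
    raise "the right side was evaluated"
  else
    raise "Unknown node type: #{node[:type].inspect}"
  end
end

falsy_left = { type: :and, left: { type: :literal, value: false }, right: { type: :boom } }
p evaluate(falsy_left) # => false

# Plain Ruby behaves the same way: 1 + "a" would raise, but it never runs.
left = false
p(left && (1 + "a")) # => false
```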
00:13:15.120
What's good about this interpreter is that
00:13:17.680
it's very easy to build one. That's why
00:13:20.720
Matz chose this architecture.
00:13:23.760
But what's bad about it is that it's
00:13:26.079
super slow and it's slow because of how
00:13:29.519
we represent this data. If you've been
00:13:31.839
to the caching talk before this one,
00:13:35.279
you probably know why. CPUs like
00:13:38.480
sequential data and when we store our
00:13:41.680
data in that hash that tree our data is
00:13:44.800
all spread out in memory and that really
00:13:46.800
hurts performance. To give you an
00:13:49.279
example to access the L1 cache in your
00:13:52.160
CPU should take about one nanosecond.
00:13:55.680
For the L2 cache it's about four nanoseconds,
00:14:00.160
and to access the RAM, which is where our
00:14:02.320
data lives, it's over 100 nanoseconds. So it's
00:14:06.399
pretty slow. So that's why in Ruby 1.9,
00:14:10.079
after 14 years of this first
00:14:12.320
architecture, Ruby changed. It
00:14:15.360
changed its interpreter to a compiler,
00:14:17.839
and now the compiler receives the AST and
00:14:20.639
compiles it into bytecode, which is a
00:14:23.199
list of instructions, and those
00:14:25.199
instructions are run by a VM. And this
00:14:28.480
made Ruby two to four times faster on
00:14:31.279
average.
00:14:33.120
So instead of a tree like this, we need
00:14:36.720
a data structure that is more sequential,
00:14:39.199
where the data is packed sequentially,
00:14:41.519
and that is a lot easier for CPUs
00:14:44.240
to run.
00:14:46.000
So we need to flatten that tree and to
00:14:48.480
do that it's fairly simple. We'll walk
00:14:51.600
this tree starting at the root. Uh we
00:14:54.000
first evaluate the left branch and we'll
00:14:56.399
generate an instruction for that. Then
00:14:59.199
we'll go to the right branch and
00:15:01.440
generate another instruction for that.
00:15:04.000
And lastly for the root node we generate
00:15:06.240
an instruction and that's how we'll go
00:15:08.480
from a tree to an array. So instead of
00:15:12.480
interpreting the code right away now
00:15:14.399
we'll compile it and to compile we'll
00:15:16.800
create a compiler. Don't worry.
00:15:20.160
Oh, compilers are scary, but this one is
00:15:22.560
very simple. We'll receive the AST from
00:15:24.959
the parsing phase and we'll return
00:15:27.680
instructions, an array of instructions.
00:15:30.320
And to create those instructions, again
00:15:32.079
we check the AST node type, and for a
00:15:35.760
number, for example, we generate a putobject
00:15:38.639
instruction with the value of
00:15:40.959
that number node. But if it's a binary
00:15:43.839
node, you might guess it, we'll call this
00:15:46.399
function recursively on the left side.
00:15:48.560
Generate instructions for the left side.
00:15:51.199
Then we generate instructions for the
00:15:53.040
right side. And lastly, we'll generate
00:15:55.600
this send instruction with the operator
00:15:58.880
and that will give us an array like
00:16:01.120
this.
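The compiler described above, which flattens the tree into an array of instructions, might be sketched like this. The module name `Compiler` and the instruction format (two-element arrays such as `[:putobject, 1]`) are assumptions for illustration.

```ruby
# Hypothetical sketch: flatten an AST into a list of stack instructions.
module Compiler
  def self.call(node, instructions = [])
    case node[:type]
    when :number
      instructions << [:putobject, node[:value]]
    when :binary
      call(node[:left], instructions)  # instructions for the left side first
      call(node[:right], instructions) # then for the right side
      instructions << [:send, node[:operator]] # and finally the operation itself
    else
      raise "Unknown node type: #{node[:type].inspect}"
    end
    instructions
  end
end

ast = {
  type: :binary,
  operator: :+,
  left: { type: :number, value: 1 },
  right: { type: :number, value: 2 }
}

p Compiler.call(ast) # => [[:putobject, 1], [:putobject, 2], [:send, :+]]
```

Visiting left, then right, then the node itself is a post-order traversal, which is why the operands always appear before the operation that consumes them.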
00:16:02.800
But how do we run this? Now like I said,
00:16:05.759
we will use a virtual machine and like
00:16:08.160
Ruby, we'll use a stack based virtual
00:16:11.120
machine. So after compiling the code,
00:16:14.399
we'll run it and we'll call this VM.
00:16:18.560
So the VM receives the instructions and
00:16:21.279
like I said it's a stack-based VM. So we
00:16:23.920
initialize a stack. Ruby doesn't have a
00:16:26.079
stack data structure but we can use just
00:16:28.480
an array. We do some work with
00:16:31.759
the instructions and the last step would
00:16:34.240
be popping the last value from the stack
00:16:36.959
returning that. Let's check the middle
00:16:39.519
bit. So for each instruction, we check
00:16:43.759
it with a case expression, and if it's a
00:16:46.639
putobject instruction, we push that
00:16:49.199
value onto the stack. But if it's a send
00:16:52.800
instruction, we'll do something
00:16:54.160
different. We pop a value from the stack
00:16:56.880
that will be the right side. Then
00:16:59.600
we pop a second value from the stack.
00:17:01.519
That's the left side. Then we do that
00:17:04.000
send operation that we had before again
00:17:06.319
but now we push that value back onto the
00:17:09.360
stack.
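A stack-based VM along the lines described above could be sketched as follows. The module name `VM` and the instruction format are assumptions that match the earlier compiler sketch, not the speaker's confirmed code.

```ruby
# Hypothetical sketch of a stack-based VM for [:putobject, value]
# and [:send, operator] instructions.
module VM
  def self.call(instructions)
    stack = [] # a plain Array works as a stack via push and pop

    instructions.each do |name, argument|
      case name
      when :putobject
        stack.push(argument)
      when :send
        right = stack.pop # the most recently pushed value is the right side
        left = stack.pop
        stack.push(left.send(argument, right))
      else
        raise "Unknown instruction: #{name.inspect}"
      end
    end

    stack.pop # the last value left on the stack is the result
  end
end

p VM.call([[:putobject, 1], [:putobject, 2], [:send, :+]]) # => 3
```

Popping the right operand first matters for non-commutative operators such as subtraction.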
00:17:11.280
So once more let's walk step by step
00:17:13.760
with a real example. So these are the
00:17:16.079
instructions for 1 + 2. So we check
00:17:18.880
the instruction type. The first
00:17:20.720
instruction is putobject 1, and
00:17:24.640
then we push its value onto the stack.
00:17:26.799
So the stack now contains the number
00:17:28.720
one. The second instruction is putobject
00:17:31.280
with 2. Again we check its type, grab
00:17:33.919
its value, it's putobject, and we push
00:17:36.720
the value onto the stack. So now the
00:17:38.559
stack contains one and two. Third and
00:17:41.679
last instruction is the send
00:17:43.440
instruction. So we check it: its branch
00:17:47.039
is the send branch. We get the operator.
00:17:50.320
We pop a value from the stack. So we get
00:17:52.559
the number two in the variable right. We
00:17:54.960
we pop a second value from the stack.
00:17:56.880
That's the left side.
00:17:59.120
and we calculate the result using the
00:18:02.000
send method and we got three back and we
00:18:06.559
push three back onto the stack. So this
00:18:09.520
stack now contains the number three and
00:18:12.720
like I said before we pop the last value
00:18:15.039
from the stack as the last step and if
00:18:17.760
you do that it will pop three and our
00:18:20.080
language still returns three.
00:18:24.000
What's cool about Ruby is that you can
00:18:26.080
see these instructions for yourself. If
00:18:28.880
you call Ruby with some code and the
00:18:31.520
--dump=insns option, you will get
00:18:34.559
this. If I clean it up a little bit and
00:18:37.360
zoom in, you get this. These are the
00:18:39.360
instructions for 10.upto(20).
00:18:44.000
The first instruction is putobject with 10.
00:18:46.640
Then putobject with 20. Then we have
00:18:49.360
this opt_send_without_block with the upto
00:18:52.720
argument. And lastly, leave, which
00:18:55.919
basically returns.
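The same listing can also be produced from inside a Ruby program. This is a small sketch assuming CRuby, where `RubyVM::InstructionSequence` is available; other implementations such as JRuby do not provide it, and the exact text of the output differs between Ruby versions.

```ruby
# Compile a snippet to bytecode and print the human-readable listing.
iseq = RubyVM::InstructionSequence.compile("10.upto(20)")
listing = iseq.disasm

puts listing
```

From the command line, `ruby --dump=insns -e '10.upto(20)'` prints an equivalent listing without running the code.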
00:18:58.720
So in a nutshell, that's how Ruby has worked
00:19:00.960
since 1.9,
00:19:03.360
until we introduced JIT compilers. And at
00:19:07.280
this point, there's a lot of them.
00:19:10.240
There have been four JIT compilers at this point
00:19:12.559
in Ruby's history.
00:19:15.280
So it's similar to what we had
00:19:18.000
before. But instead of just having a
00:19:20.559
compiler that compiles the AST into
00:19:22.880
bytecode, now we have a second compiler that
00:19:25.520
compiles the bytecode into machine
00:19:27.919
code. So besides the compiler that
00:19:30.720
we created, we also have a compiler for
00:19:33.679
assembly, which is kind of like this.
00:19:37.679
And I know what you might be thinking at
00:19:39.600
this point. Yeah, this is cool and
00:19:41.440
all, but I'm a Rails dev. I'm not
00:19:44.240
writing a compiler. Why should I care
00:19:46.960
about parsers?
00:19:49.280
Well, one, because it's fun. But parsers
00:19:52.240
aren't just for compilers. If you use
00:19:54.480
Rails, and I assume you do, or IRB or
00:19:58.000
RuboCop, Standard, if you use the VS Code
00:20:01.200
extension for Ruby or really any of
00:20:03.360
these gems, you are using a parser. In
00:20:06.799
fact, you are using several different
00:20:08.559
parsers. And that difference affects us
00:20:12.000
on our daily work. You know when Ruby
00:20:14.640
adds a new syntax and then RuboCop
00:20:17.039
doesn't understand it right away or your
00:20:20.320
editor thinks it's a grammar error. It's
00:20:23.280
because of this. Each tool has a
00:20:25.440
different parser. So when Ruby adds new
00:20:27.760
features, everyone has to catch up with
00:20:29.440
a new thing. This is also true if you
00:20:32.080
are using a different Ruby
00:20:33.280
implementation like TruffleRuby or
00:20:35.120
JRuby. Everyone has to catch up to
00:20:37.600
CRuby. That's why they created this new
00:20:40.799
parser. You might have heard about it.
00:20:42.799
Prism. The idea of Prism was to be a
00:20:46.320
single parser that could handle it all.
00:20:50.000
It's the parser used in the Ruby LSP
00:20:53.200
extension,
00:20:54.799
and since last year it's actually the
00:20:57.280
default parser for CRuby. So if you are
00:20:59.440
using Ruby 3.4 you are using Prism. So
00:21:03.120
now we have this one parser used in
00:21:05.840
CRuby, but at this point it's already
00:21:07.360
used in JRuby, TruffleRuby, Natalie, Opal,
00:21:11.360
and several gems, because you can use it from C, you
00:21:14.240
can use it from Ruby. It's very powerful.
00:21:16.720
So maybe one day, instead of this
00:21:18.320
fractured ecosystem, we'll just have one
00:21:21.440
single parser and whenever Ruby adds a
00:21:24.080
new feature everyone can benefit from
00:21:26.000
that right away.
00:21:28.640
Okay, but why should I care about a VM
00:21:31.039
then? Well, take this example. These are
00:21:34.559
the instructions for the expression
00:21:36.880
2.send(:+, 3), like 2 + 3,
00:21:40.799
and these are the instructions for 0 +
00:21:43.360
1. Can we spot the difference?
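The two listings being compared can be reproduced with the sketch below. It assumes CRuby, where `RubyVM::InstructionSequence` exists; instruction names and formatting depend on the Ruby version, so the output may not match the slides exactly.

```ruby
# Disassemble both expressions so the listings can be compared side by side.
with_send = RubyVM::InstructionSequence.compile("2.send(:+, 3)").disasm
with_plus = RubyVM::InstructionSequence.compile("0 + 1").disasm

puts with_send
puts "-" * 40
puts with_plus
```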
00:21:50.880
The first
00:21:53.200
expression has more
00:21:55.120
instructions, and the instructions all
00:21:57.600
have arguments. In the second
00:22:00.080
one, for example, we don't use
00:22:01.919
putobject for zero and one; you have
00:22:04.080
this putobject_INT2FIX_0_ and
00:22:07.919
putobject_INT2FIX_1_. These are instructions
00:22:10.960
specifically designed to put one and
00:22:13.679
zero on the stack. Instead of using send,
00:22:16.799
Ruby uses the opt_plus instruction. And
00:22:19.840
Ruby does that because we
00:22:21.919
deal with zeros and ones, and summing
00:22:24.159
numbers, those are very common operations,
00:22:26.080
so it has optimized instructions just
00:22:28.880
for that. It's called a fast path
00:22:32.080
optimization and we'll do the same thing
00:22:34.559
to our VM to understand how it works. So
00:22:37.919
in our compiler, specifically in the
00:22:40.080
binary branch, we used to have this.
00:22:43.120
Now let's say we want to make addition
00:22:45.039
faster for some reason. So we check if
00:22:48.400
the operation is an addition and we have
00:22:50.640
a number on the left side and a number
00:22:52.559
on the right side. Then instead of doing
00:22:55.919
what we did before, we do this: we add the
00:22:58.559
putobject instruction, but we'll sum the
00:23:01.280
values right away in the compiler and
00:23:03.760
put that on the stack. So in a way we
00:23:07.200
are interpreting inside the compiler
00:23:08.960
now. But if it's not an addition of
00:23:12.159
numbers we'll do what we did before. And
00:23:15.200
if you benchmark this, you'll see that
00:23:17.440
now addition is 20% faster than
00:23:21.200
subtraction. What I'm trying to say
00:23:23.840
here is that how you write code matters.
00:23:27.760
There's a difference between using method_missing
00:23:29.840
or define_method. There's a
00:23:32.880
difference between while true and using
00:23:34.960
loop. And because of object shapes,
00:23:38.559
there's a penalty if you use too much
00:23:40.960
memoization in your code.
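The addition fast path described a moment ago, where the compiler sums two literal numbers itself, could be sketched like this. The names `Compiler` and `foldable_addition?` and the instruction format are assumptions for illustration; this is the toy language's optimization, not what CRuby's `opt_plus` does internally.

```ruby
# Hypothetical sketch: the toy compiler with a fast path for number + number.
module Compiler
  def self.call(node, instructions = [])
    case node[:type]
    when :number
      instructions << [:putobject, node[:value]]
    when :binary
      if foldable_addition?(node)
        # Fast path: do the addition at compile time and emit one instruction.
        instructions << [:putobject, node[:left][:value] + node[:right][:value]]
      else
        # Regular path: left side, right side, then the send.
        call(node[:left], instructions)
        call(node[:right], instructions)
        instructions << [:send, node[:operator]]
      end
    else
      raise "Unknown node type: #{node[:type].inspect}"
    end
    instructions
  end

  def self.foldable_addition?(node)
    node[:operator] == :+ &&
      node[:left][:type] == :number &&
      node[:right][:type] == :number
  end
end

one = { type: :number, value: 1 }
two = { type: :number, value: 2 }

p Compiler.call({ type: :binary, operator: :+, left: one, right: two })
# => [[:putobject, 3]]
p Compiler.call({ type: :binary, operator: :-, left: one, right: two })
# => [[:putobject, 1], [:putobject, 2], [:send, :-]]
```

The addition compiles to a single instruction while the subtraction still needs three, which is where the speed difference in the toy benchmark would come from.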
00:23:43.679
I'm not saying that
00:23:45.919
you shouldn't use those features.
00:23:48.559
You can write whatever you want, but
00:23:51.280
know the trade-offs that you are
00:23:53.039
choosing and understanding the Ruby VM
00:23:55.520
helps you to make those decisions.
00:23:59.039
I guess no one would ask who cares
00:24:01.360
about JIT, just because of YJIT. YJIT
00:24:03.919
makes your Rails code 10 to 30% faster
00:24:06.720
and you don't have to do anything. So
00:24:08.880
yeah, it's very welcome. It's even
00:24:11.279
enabled in Rails by default at this
00:24:13.440
point. But what I like about YJIT is
00:24:16.559
that its impact is much more than just
00:24:18.799
performance.
00:24:20.559
So Shopify did this benchmark where they
00:24:23.120
benchmarked parsing GraphQL queries in
00:24:25.760
Ruby. So they compared pure Ruby, a C
00:24:28.880
extension, pure Ruby with YJIT, all
00:24:31.440
the variations that you could have. And
00:24:33.840
what they found out was that writing
00:24:36.720
pure Ruby with YJIT was faster than a C
00:24:40.240
extension with YJIT.
00:24:42.480
So in a way Ruby is faster than C and we
00:24:46.880
started seeing PRs like this where they
00:24:48.880
rewrote parts of Ruby from C to Ruby.
00:24:53.200
So we used to have this method for
00:24:56.320
Integer#times and it got replaced with this,
00:24:59.919
which is not only as fast and much
00:25:02.640
smaller but a lot closer to the kind of
00:25:05.200
work that we do on a day-to-day basis.
00:25:08.799
Or this other example which is more
00:25:10.480
recent. They rewrote Pathname mostly in
00:25:13.520
Ruby, and not only was it twice as fast,
00:25:16.960
but it was a third of the code.
00:25:20.240
So I believe that the future is more
00:25:22.080
Ruby because Ruby is easier. It's easier
00:25:25.679
to read. It's easier to understand. It's
00:25:28.400
easier to write. And I believe that it
00:25:30.799
will be easier to contribute to because
00:25:33.840
it will be written in Ruby, a language
00:25:35.760
that we use every day. So,
00:25:39.600
yeah, Ruby is easy, but at the same
00:25:41.760
time, I hope you noticed that it's
00:25:44.000
complex too. It has a parser, it has a
00:25:46.320
compiler, several compilers at this
00:25:48.240
point. So to put it in Matz's words,
00:25:52.159
the Ruby creator, Ruby is simple in
00:25:54.720
appearance but very complex on the inside,
00:25:57.520
just like the human body.
00:26:00.320
So to give you an example of this
00:26:02.480
complexity, the Prism compiler has 8,000
00:26:06.159
lines of C code. The Prism parser
00:26:10.799
has 16,000 lines of C. YJIT itself has
00:26:14.880
25,000 lines. And this was the last
00:26:17.200
time I checked, months ago. And if you
00:26:19.919
count all files inside the Ruby
00:26:21.679
repository, you have over 1.5 million
00:26:24.320
lines of code. That's the work of all
00:26:27.279
these individuals, almost 400 people.
00:26:30.240
And I just wanted to take a moment here
00:26:32.320
for all of us to thank the contributors
00:26:34.480
of Ruby. So give it up for the
00:26:36.320
contributors.
00:26:44.400
Yeah, they work really hard to make our
00:26:46.400
lives easier. So to answer again this
00:26:49.679
question, why should you care about all
00:26:51.679
of this? Well, the most important thing
00:26:54.720
is as you go through your path in
00:26:58.240
development, you eventually have to go
00:27:01.279
beyond what you do now. You have to go
00:27:04.080
into these deep technical topics.
00:27:06.159
Sometimes you have to read the gem
00:27:07.760
source code because there is no
00:27:09.760
documentation. You have to
00:27:12.000
go beyond tutorials and blog posts and
00:27:14.080
even official docs. You write your first
00:27:16.480
gem and maybe one day you have to debug
00:27:19.200
a performance issue and you have to dump
00:27:21.440
the C instructions. So I hope this talk
00:27:25.360
is the first step in that journey for
00:27:27.679
you. I hope it helps in unveiling the
00:27:30.640
magic behind Ruby. And if you like this
00:27:33.520
topic, you can take this further.
00:27:35.679
There's this wonderful book called
00:27:37.679
Crafting Interpreters. You will create
00:27:40.159
two interpreters for the same language.
00:27:42.559
The first one is, just like we did
00:27:44.880
initially, a tree walker. And then the
00:27:47.039
second one is a VM. You create
00:27:49.360
everything from scratch. No libraries,
00:27:51.520
no nothing. So it's very fun. And if you
00:27:54.799
want to play with the code that I showed
00:27:56.559
you, it's all here in this repository.
00:27:58.960
It's a GitHub repo at tbot.io/math-interpreter,
00:28:03.760
and you can add something to
00:28:06.080
your language. Do whatever you want with
00:28:07.840
it. It's yours now.
00:28:10.320
Yeah. And I hope this has helped you to
00:28:13.279
see how Rails benefits from Ruby, how
00:28:15.840
Ruby made Rails possible. And if it
00:28:19.120
wasn't for Ruby, we wouldn't be here
00:28:20.799
today. And that's what I had for today.
00:28:23.279
Thank you everyone.
00:28:30.159
Do we have the time for questions?
00:28:34.559
One question. They asked why I
00:28:37.919
wanted to learn about this.
00:28:40.640
Well, I started messing with this during
00:28:43.360
the pandemic. So, I had some free time
00:28:46.320
and I found that book, Crafting
00:28:49.039
Interpreters. You can read the whole
00:28:50.640
book online for free. And once I
00:28:54.480
started doing it, I just couldn't
00:28:56.080
stop. It was fun to build my own
00:28:59.679
language and to add whatever I
00:29:01.840
wanted. Yeah. So that was
00:29:04.399
it for me. That's it. Thank you
00:29:06.880
everyone.