00:00:17.560
Hi everybody, I'm Konstantin. I'm going to talk about Message in a Bottle, but
00:00:24.720
before I start I wanted to thank Engine Yard, who are
00:00:30.720
sponsoring my talk here. They're basically paying for the trip and hotel, so
00:00:36.079
without them I wouldn't be here right now. So thanks, Engine Yard. And yes, as I said, I'm
00:00:43.520
Konstantin. I'm on GitHub as rkh. I have this extremely long Twitter
00:00:49.000
handle; it's probably easier to go to my
00:00:55.920
blog and click on Twitter than to type that, in case you want to follow me or just check out my stream or
00:01:01.920
something. I'm currently a student. I just graduated at the Hasso Plattner
00:01:07.360
Institute down there, that's in Germany near Berlin, and I'm going back there to continue with my
00:01:13.880
master's program this year. I'm currently the maintainer of Sinatra.
00:01:20.400
Don't know if you guys have heard of it, but it's quite handy. And together with
00:01:26.159
Alan Harris I'm also working on a book about Sinatra, which
00:01:31.759
is supposed to be released rather soon. I'm a member of the Rack
00:01:40.240
core team, and I'm currently working on Rubinius together with
00:01:49.240
Brian Ford from Portland, and that's an internship sponsored, again, by
00:01:56.799
Engine Yard. Okay, in this
00:02:02.079
talk I want to go into detail about what's going on under the hood, what makes Ruby
00:02:08.360
tick: how do those Ruby implementations actually run
00:02:16.280
Ruby? Well, here's the knowledge I'm
00:02:23.040
going to drop: 1.8 is slow because it's interpreted, and surprise,
00:02:28.440
surprise, so is 1.9. Okay, that was my talk,
00:02:38.959
thanks. I'm just kidding. So, Ruby,
00:02:44.239
right, and Message in a Bottle. If you haven't figured it out,
00:02:51.120
the message in the bottle is actually a metaphor for sending a method call to
00:02:57.519
an object, and that's what I'm going to talk about. There will be a lot of code in this
00:03:03.560
presentation, and I'll try to make sure that I don't lose anyone. Feel free to ask
00:03:08.799
questions right away rather than at the end.
00:03:24.000
Yes, thanks. So for some reason there are a
00:03:29.879
lot of Ruby implementations out there. I counted 58, or actually 56, or
00:03:36.560
maybe 55, depending on what you count as an
00:03:42.280
implementation. However, I'm going to focus on the three large ones: JRuby,
00:03:49.720
MRI 1.9 and Rubinius. And here's a short overview: I'll
00:03:56.760
talk about method dispatch and execution while looking at
00:04:02.400
MRI, I'll talk about inline caches and JITting while talking about Rubinius, and
00:04:08.760
I'll talk about invokedynamic while talking about JRuby. That doesn't mean that only those implementations do that.
00:04:16.479
Okay, only JRuby is doing invokedynamic, but both Rubinius and JRuby have a JIT, for
00:04:22.120
instance. I had some help for this talk from Koichi and a lot of help from
00:04:28.560
Brian and also from Charles Nutter. All three of those guys are rather
00:04:35.240
active in the specific Ruby implementations, and all three are here, so buy them a beer or
00:04:42.720
something. Okay, this is what I'm going to talk about:
00:04:49.360
42, just for the example. To make sure you all have an idea what I'm
00:04:55.039
talking about, most of the examples will be about this code, specifically
00:05:00.320
about what happens at this point if we call this
00:05:06.080
method. It's basically just checking if the value is 42. You can
00:05:14.400
pass in a string or an integer. Well, all the Ruby
00:05:19.720
implementations (you're not supposed to understand all that right now), all the Ruby implementations compile
00:05:26.360
to bytecode, and the basic idea of my talk is to start from the bytecode
00:05:31.560
and then look at what's going on there. That's the bytecode, or somewhat the
00:05:38.360
bytecode, for value.to_s for MRI, Rubinius and JRuby. For JRuby you can
00:05:46.160
get different bytecode, but I'll come to that. So the plan is: if you want to
00:05:52.120
execute some Ruby code, some method call, you first have to search for the method
00:05:57.680
and then you have to execute the method. But you also want to have that fast, so how do we get that fast?
00:06:04.639
Well, we search faster and we execute faster. It's easy, right? Okay, we're done.
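The two steps named here, search and then execute, can be sketched in plain Ruby. This is only an illustration of the idea, not code from any of the implementations: the `Answer` class, `ultimate?`, `search_method` and `dispatch` are made-up names, and the sketch ignores singleton classes, visibility and `method_missing`.

```ruby
# Hypothetical example method, modelled on the "is it 42?" check from the talk.
class Answer
  def ultimate?(value)
    value.to_s == "42"
  end
end

# Step 1: search. Module#ancestors lists the class, its included modules and
# its superclasses in lookup order; take the first one that defines the method.
def search_method(klass, name)
  klass.ancestors.each do |mod|
    if mod.instance_methods(false).include?(name) ||
       mod.private_instance_methods(false).include?(name)
      return mod.instance_method(name)
    end
  end
  nil
end

# Step 2: execute. Bind the method that was found to the receiver and call it.
def dispatch(receiver, name, *args)
  method = search_method(receiver.class, name)
  raise NoMethodError, "undefined method #{name}" unless method
  method.bind(receiver).call(*args)
end

dispatch(Answer.new, :ultimate?, 42)   # => true
dispatch(Answer.new, :ultimate?, "23") # => false
dispatch(42, :to_s)                    # => "42"
```

Every call repeats the walk over the ancestors, which is the cost that the caches discussed in the rest of the talk try to avoid.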
00:06:11.240
But how do we search faster? The first thing we can do, or one thing we
00:06:16.400
can do, is inline caches (I'll explain that), which is basically: in the code, store where we're going. Then lookup
00:06:23.960
caches, which is: at the object, or at the class, store where to find
00:06:30.120
methods that are inherited. And we can do inlining to avoid a lookup or a method
00:06:35.759
call at all. And we can speed up the execution by
00:06:41.039
reducing the operations. I think MRI is about to do that: instead of having one operation that does one thing, have
00:06:49.039
one operation that does three things, so you don't have to have three opcodes after each other. You can do
00:06:55.720
just-in-time compilation to machine code, you can do inlining, and, well, you can
00:07:02.479
speed up search, because then your execution is faster, since the main part of the execution is just looking for
00:07:09.440
methods. All right, MRI. If you call a method in MRI there
00:07:17.080
are different ways, but basically you almost always end up in the C method called
00:07:24.080
rb_call0. That's the method, and what is
00:07:32.120
important about this method is, here's the search... oh no, I'm sorry,
00:07:38.199
I'm on the wrong slide. Here's the search and here's the execute, just like
00:07:43.400
I promised you. Okay, now how does the search work?
00:07:50.360
Well... did I say there's a lot of code in the slides? It's basically the...
00:07:57.479
oh dang, I messed
00:08:06.280
up. It's basically the method dispatch we're all used to: just start from the
00:08:12.199
class of the object, go through all the
00:08:19.120
superclasses here and check if the method exists. Well, and superclasses in that
00:08:27.159
case also includes included modules, or iclasses, as they're called in MRI. Well, and
00:08:34.039
that's the basic dispatch. Now that we have the method, all we have to do is execute that
00:08:40.240
method. This is done by this method. You don't have to get all that, I just
00:08:46.000
wanted to show you. I always put comments where the important stuff is.
00:08:51.399
Here it's actually going to execute, depending on whatever the state of the current context is. And what that does is an
00:09:00.160
extremely long method where I couldn't find important parts to show you, but basically
00:09:07.440
there's an instructions.def file where,
00:09:12.839
for every bytecode, there is a chunk of code that will be executed when MRI,
00:09:19.519
or YARV in that case, is interpreting the bytecode, and that looks like this. Oh, I
00:09:26.000
always mess this up.
00:09:32.320
I'm sorry, I have the new ShowOff with the presenter view and I'm not used to that.
00:09:39.959
So I just wanted to show you the code so that you get a feel for what the implementations look like. This is
00:09:46.760
basically the code for the bytecode for sending a method, and it's taking
00:09:54.399
some arguments from the stack, and basically down here it's calling
00:10:01.640
the method, which will again get you to the method dispatch I just
00:10:09.240
showed. Okay, so there was a lot of code. Let's talk about
00:10:16.760
Rubinius. Well, in Rubinius you have basically this
00:10:22.600
structure, which is similar to what all the Ruby implementations have. So every
00:10:27.959
module (and Module is the superclass of Class and, you know, IncludedModule) has a method table, which is basically
00:10:33.959
like a hash, but a more low-level construct that you can access more easily
00:10:39.800
from C++. And for every method there is a
00:10:45.480
bucket, which holds some metadata, and it references a compiled method, which is
00:10:52.440
basically the bytecode representation of a method. And the fun thing about Rubinius
00:10:59.000
is that you can actually play with that from within Ruby. And this is basically
00:11:07.000
the same lookup logic I showed you, just implemented in Ruby. Note that we
00:11:13.519
have to use direct_superclass down there to make sure we are also
00:11:20.000
dispatching through all the included modules. And if you get a hold of that
00:11:25.920
method object, that compiled method object, you can do some fun things
00:11:33.160
with it. But first we have to get a hold of it with the lookup method, and
00:11:38.720
that will only give us the bucket, so we have to ask for the compiled method, and it will return a compiled
00:11:45.079
method object, and we can decode that to take a look at the bytecode.
00:11:50.560
We can also get the opcodes. The decode
00:11:57.480
display is more like semantic names you can understand, and this is
00:12:03.560
what is actually stored in memory, or on file if you save the compiled output to a
00:12:10.079
file. And it's referencing literals in there (I'll come to that on the next
00:12:15.760
slide) that are stored in the literals tuple. A tuple is kind of like a fixed-size
00:12:24.279
array. Okay, let's try to figure out what this does. So in the literals tuple
00:12:29.680
there's to_s, "42" and ==, and we
00:12:35.000
also have the locals that are used in there, which is
00:12:40.240
value, which in our case is 42. And then we have this tuple of bytes that are the
00:12:47.639
method. If you take a close look at those, we
00:12:52.800
start off with the bytecode for pushing a local onto the stack. This part
00:12:58.639
here is the stack, so we have 42 on the stack. Then this is the bytecode for
00:13:05.360
sending a method, which basically takes two arguments: the first argument is the index of the method name in the literals
00:13:13.040
tuple (up here, to_s has the index zero) and the second one is the number of
00:13:18.680
arguments, which is zero. And then we have the string "42" on the stack. After that we
00:13:25.680
push the literal with the index one onto the stack, which is
00:13:32.800
"42", so we have "42" twice on the stack. And then Rubinius has to duplicate that, since
00:13:39.639
if it doesn't and someone calls, for instance, gsub! on that
00:13:45.320
string, then this will be modified. If you
00:13:51.399
replace it, for instance, with 55, then next time you have to pass 55 to the method to get it to return true, which actually
00:13:57.160
works. If you go to the repository on GitHub, I have a fun script there which modifies the
00:14:03.800
literals. And after that it's sending ==, which has a special bytecode
00:14:11.199
so it can do some optimizations, and we have to return, since on the bytecode level there's no implicit
00:14:19.399
return. Okay, but we want to get inline caches into that, so we don't have to
00:14:26.720
do a lookup whenever it's doing the send, because, you remember, I told you I was talking about this to_s
00:14:35.680
mainly. So what Rubinius does is take the bytecode and replace the to_s call with
00:14:42.240
an inline cache object. If you don't believe me, here's
00:14:47.399
the code, or at least the code which is using it as an inline cache object. This is
00:14:54.320
basically again from the opcode definitions, from the instructions.def
00:15:00.399
from Rubinius, and this is, like I showed you before, the send bytecode. I mainly put
00:15:09.440
that in here because in my talk description I said I'm going to have C++ code, and I made my slides and figured I
00:15:16.360
don't have C++ code, so I figured let's put that slide in there. C++, ladies and
00:15:25.279
gentlemen. Okay, the inline cache actually has a reference to something
00:15:32.920
called the VM method, and the compiled method also has this reference. So
00:15:39.560
for instance the inline cache in this example would be referencing
00:15:47.240
the VM method of to_s. Now the fun thing is that the
00:15:53.519
inline cache can actually, during its lifetime, change
00:15:59.920
what method it is referring to, or basically
00:16:05.480
anything. Oh yeah, first of all, the caching part is that that VM method is
00:16:10.639
already referencing the to_s code, so once the lookup is done and it's
00:16:16.360
in the inline cache, it doesn't have to do the lookup again when visiting
00:16:21.720
that point in the bytecode, unless the type changes. So there's a
00:16:27.399
simple check where it checks that nothing has changed. If someone defined to_s, then it will do the lookup again,
00:16:34.480
or if it's a different class that's coming in there, it will do the lookup again. But if nothing changed,
00:16:42.000
then we use that method, and yeah, it can run that
00:16:49.199
method. And as you can kind of see there, you can specialize, you can
00:16:56.279
change such a VM method. But why would you do that? Well,
00:17:04.120
the standard version of such a method is to just have the bytecode stored in there and execute that bytecode with the VM,
00:17:11.760
basically interpret that bytecode. But that isn't really fun, except
00:17:18.799
that you don't have to do a lookup. You can actually have bytecode with
00:17:24.039
breakpoints in there, which has the advantage that you can run a debugger and it will only slow down the method where you have
00:17:30.840
set a breakpoint. That's why Rubinius can do debugging without slowing down the
00:17:37.200
Ruby program, which MRI can't. And you can specialize for
00:17:42.880
arguments, like doing some optimization if it's a string you get in there, or
00:17:48.880
whatever. That's currently under development, I think. And last but not
00:17:54.000
least, you can JIT that code. That means that instead of referencing bytecode you will interpret, you
00:17:59.720
can reference machine code that will, well, run directly on your
00:18:05.559
CPU. And it works like this: you take the Rubinius bytecode, Rubinius translates
00:18:11.760
that into the intermediate representation for LLVM and then just lets LLVM figure
00:18:18.280
out how to compile that to native code. It does all the inlining and optimization stuff that LLVM is known for, and that
00:18:25.600
way speeds up execution.
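For contrast with the native-code path, here is a toy version of what "interpreting the bytecode" from the earlier walkthrough amounts to. It is a sketch under several assumptions: the opcode names only echo the ones mentioned in the talk, instructions are symbolic arrays rather than a tuple of bytes, the special `==` instruction is replaced by a generic send, and `interpret` is an invented helper, not Rubinius code.

```ruby
# Literals of the example method: index 0 is :to_s, 1 is "42", 2 is :==.
LITERALS = [:to_s, "42", :==].freeze

BYTECODE = [
  [:push_local, 0],       # push local 0 (value) onto the stack
  [:send_method, 0, 0],   # call literal 0 (:to_s) with 0 arguments
  [:push_literal, 1],     # push literal 1 ("42")
  [:string_dup],          # copy it so the literal itself is never handed out
  [:send_method, 2, 1],   # call literal 2 (:==) with 1 argument
  [:ret]                  # explicit return: nothing is implicit at this level
].freeze

# A minimal stack machine: one case branch per opcode, run in a loop.
def interpret(bytecode, literals, locals)
  stack = []
  bytecode.each do |op, *operands|
    case op
    when :push_local   then stack.push(locals[operands[0]])
    when :push_literal then stack.push(literals[operands[0]])
    when :string_dup   then stack.push(stack.pop.dup)
    when :send_method
      name_index, argc = operands
      args = stack.pop(argc)          # arguments sit on top of the receiver
      receiver = stack.pop
      stack.push(receiver.public_send(literals[name_index], *args))
    when :ret then return stack.pop
    end
  end
end

interpret(BYTECODE, LITERALS, [42])   # => true
interpret(BYTECODE, LITERALS, ["23"]) # => false
```

Each instruction goes through the `case` once per execution, which is the overhead that compiling to machine code removes.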
00:18:30.919
All right, any questions so far? No?
00:18:42.720
Yes. Yes, so the inline cache tracks how often it's called, it tracks whether
00:18:49.600
it's always the same class coming in, and there's a heat value. Basically, if
00:18:56.039
you call the method once, then the heat value is increased by one; if the
00:19:03.080
method does a loop, then the heat value is increased by one; and I think once it hits 4,000 or
00:19:08.360
something, then it will compile it to native code. Because basically, if you
00:19:13.440
just execute the method once or twice, it's way more expensive to compile it than to just interpret
00:19:20.559
it. It's doing that at runtime, just like HotSpot. Okay, and JRuby is
00:19:28.840
basically doing something similar, because that's why it's using the JVM, amongst
00:19:35.000
other things. The JVM is known for doing that well. Okay, so let's talk about JRuby. And
00:19:42.360
lately, if people talk about JRuby and about performance and about method dispatch, what they talk about is
00:19:53.919
invokedynamic. Okay, there's a lot of buzz about it, but do you guys think you know
00:19:59.559
what it's doing? No? All right, then let's try to figure
00:20:05.280
that out. So there are two different kinds of bytecode that the current JRuby trunk (master; it's Git,
00:20:14.799
so it's master) can produce, depending on what JVM version
00:20:19.840
you're on. The one is the one you also get with the older JRuby versions, which
00:20:28.280
is using invokevirtual and a custom call site object (well, invokedynamic is using a custom call site object too),
00:20:34.039
and the new one is using invokedynamic. This is
00:20:43.240
again the extract from the method I showed you at the beginning that is calling
00:20:51.280
to_s. So invokevirtual is the normal method invocation bytecode for the JVM,
00:20:59.120
and invokedynamic is the new hot thing they added. Let's take a look at
00:21:05.840
those. invokevirtual looks like this: it uses the custom runtime call
00:21:11.919
site, which actually encodes the dispatch logic for JRuby, and whenever you
00:21:19.840
come across that point it will go to that call site and say: hey, I want to call that JRuby method. And it
00:21:26.279
hands over three arguments if you don't send any arguments: the first is the thread
00:21:31.320
context it's running in, the next is the caller and the third one is the
00:21:38.000
receiver. And then it returns, again, a Ruby object, as you would expect.
00:21:44.279
The fun thing about invokedynamic is you could actually put any name you
00:21:51.080
want (oh, you don't see that) where the call is, and invokedynamic would
00:21:58.080
be able to resolve that name, or your logic would be able to do that. But what
00:22:05.279
JRuby is doing instead at the moment (I was told that might change) is using
00:22:11.960
"call" just as an identifier, whatever, and
00:22:17.600
it pushes the method name onto the stack, so it
00:22:24.720
will actually be passed as an argument to the method.
00:22:30.520
And yeah, again it returns a Ruby object. But with only that, the JVM would
00:22:37.679
not know what to do, so there's a signature down
00:22:42.919
here, which looks like this. Fun. Okay, so in JRuby there is this
00:22:52.679
invocation linker, which is taking care of setting up invokedynamic properly.
00:22:59.440
And the important thing about that linker is it has a bootstrap method
00:23:05.200
and a fallback method, and it returns a call site, and
00:23:10.919
that call site is kind of like the inline cache in Rubinius:
00:23:17.000
it will be placed at that point in the bytecode. And how it's
00:23:24.840
supposed to be used is: in the bootstrap, search for the method with
00:23:32.480
your method lookup logic and place it there, and if that fails, or if a guard
00:23:39.640
fails (you can place a guard in front of it), then please call the fallback
00:23:44.960
method. Now the interesting thing about the bootstrap in JRuby
00:23:50.960
is that it's not looking for the method at all,
00:23:56.000
because it doesn't know what argument will be passed as the method name. So
00:24:04.200
all it does is place a call site there that will, on the first call, call the
00:24:10.960
fallback, independent of whatever. So there's no lookup logic in
00:24:18.360
here, and then in the fallback there's actually the lookup logic.
00:24:30.960
The lookup logic
00:24:36.880
is "search with cache" here, and if it doesn't find it, it
00:24:43.399
actually has to place a method_missing in there, and then it is actually updating the call site to point to the
00:24:52.159
method it has found, and then calls the method. So
00:24:58.440
all the JVM optimizations can start from the second time you visit that point in
00:25:04.760
the bytecode, or they probably happen whenever you place it there, and then
00:25:10.760
the JVM can do all that inlining, hotspotting and so on way
00:25:17.120
easier, because it's treated equally, or somewhat equally, to a
00:25:22.440
normal JVM method call.
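The bootstrap and fallback flow described here can be simulated in plain Ruby to make the order of events visible. This is a sketch of the idea only: it does not use the JVM's invokedynamic machinery, the `CallSite` class and its `lookups` counter are invented for illustration, and the `method_missing` case mentioned in the talk is left out.

```ruby
class CallSite
  attr_reader :lookups

  def initialize
    @lookups = 0
    # "Bootstrap": no method lookup happens here. The site simply starts
    # out pointing at the fallback, so the first call always lands there.
    @target = method(:fallback)
  end

  def call(receiver, name, *args)
    @target.call(receiver, name, *args)
  end

  private

  # "Fallback": do the real lookup, relink the call site, then make the call.
  def fallback(receiver, name, *args)
    @lookups += 1
    klass = receiver.class
    found = klass.instance_method(name)   # raises NameError if undefined
    @target = lambda do |recv, meth, *rest|
      if recv.class.equal?(klass) && meth == name   # guard
        found.bind(recv).call(*rest)                # fast path, no lookup
      else
        fallback(recv, meth, *rest)                 # guard failed: start over
      end
    end
    found.bind(receiver).call(*args)
  end
end

site = CallSite.new
site.call(42, :to_s)    # first visit: fallback runs the lookup and relinks
site.call(7, :to_s)     # guard holds: the linked method is called directly
site.call("42", :to_s)  # different class: guard fails, fallback runs again
site.lookups            # => 2
```

From the second visit on, the site behaves like a direct call behind a cheap check, which is the shape an optimizing runtime can inline.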
00:25:28.960
Yes, I kind of rushed through that, did I?
00:25:34.760
Yes, that was
00:25:40.799
it. Oh wow, man, I'm really sorry for rushing
00:25:47.080
through that, because, I know, I have this new presenter view from ShowOff and it
00:25:54.600
indicates the color here, and I was under the impression that I have time issues.
00:26:00.240
Okay, should I explain something in more detail? Because I was going through a lot of code
00:26:07.240
in not that much time, I think. Yes? I have a specific question
00:26:13.360
about the... about, um, so the difference between
00:26:18.600
invoke and invokedynamic. It sounds like what you said
00:26:23.720
invokedynamic does is: the first time it tries to call a method, it automatically fails and has to do a fallback? No, that...
00:26:31.120
yeah, so the first time it visits that invokedynamic call, the bootstrap will be called to find the method that
00:26:39.880
should be placed there. That it's going to the fallback is specific to the implementation JRuby is using, because
00:26:47.120
the issue is, in the bootstrap you do not actually have the arguments, you only
00:26:52.360
have the classes of the arguments passed to it, because in the JVM, as in Java, the lookup
00:26:59.279
and method signatures are all based on the number and the classes of the
00:27:04.480
arguments. So that's what it uses as a kind of key to fill up the caches and as a key for
00:27:10.760
doing inlining. So with invokedynamic you basically have to map dispatches that do not
00:27:19.399
work in that way onto those signatures. What expires the caches that are
00:27:24.440
there? So there are guards. I think both Rubinius and JRuby are down to
00:27:30.559
actually one single check for normal method dispatch, which is basically a
00:27:35.960
true/false: if it returns true then everything's still fine, and if it returns false then we have to do it all
00:27:41.240
over again. And it's basically like a kind of number check or something, where if someone
00:27:47.840
defines a new method, that value
00:27:53.240
changes for that class, and then it will be invalidated.
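The "one single check" guard can be sketched in Ruby with a per-class number that changes whenever a method is defined. Everything here is a made-up illustration rather than how Rubinius or JRuby implement it: `Versioned`, `serial`, `InlineCache` and `Checker` are hypothetical names, the numbers are drawn from one global counter so that a single comparison covers both "different class" and "class was changed", and subclasses are not invalidated when a superclass changes.

```ruby
# Hypothetical per-class numbers, all drawn from one global counter.
module Versioned
  @counter = 0

  def self.next_serial
    @counter += 1
  end

  # The class's current number, assigned lazily.
  def serial
    @serial ||= Versioned.next_serial
  end

  # Ruby calls this hook on the class whenever an instance method is defined.
  def method_added(_name)
    @serial = Versioned.next_serial
  end
end

class InlineCache
  attr_reader :misses

  def initialize(name)
    @name = name
    @serial = nil
    @method = nil
    @misses = 0
  end

  def call(receiver, *args)
    serial = receiver.class.serial
    unless @serial == serial            # the guard: one comparison
      @misses += 1
      @serial = serial
      @method = receiver.class.instance_method(@name)   # full lookup
    end
    @method.bind(receiver).call(*args)
  end
end

class Checker
  extend Versioned

  def check(value)
    value.to_s == "42"
  end
end

cache = InlineCache.new(:check)
cache.call(Checker.new, 42)     # miss: first visit does the lookup
cache.call(Checker.new, "42")   # hit: the number is unchanged

class Checker
  def check(value)              # redefining the method changes the number
    value.to_s == "55"
  end
end

cache.call(Checker.new, 55)     # miss: guard fails, lookup runs again
cache.misses                    # => 2
```

A real VM keeps this bookkeeping inside the runtime instead of in a `method_added` hook, but the check at the call site has the same shape.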
00:27:59.399
Yes? In your opinion, what's the number one thing that MRI could add or
00:28:06.200
change? To do what, become faster? To make
00:28:11.440
it faster, yeah. It could add, for instance... so what MRI is doing is
00:28:19.880
trying to add superinstructions. If you have, like, in the
00:28:25.679
bytecode,
00:28:33.200
in the bytecode up here, if you have... oh, I'm always off by one
00:28:39.679
slide. If you have getlocal and send, then it has to go through the VM loop
00:28:46.720
two times, and what they are trying to do is, for some cases, for instance
00:28:52.480
in that case, you could do a getlocal-and-send opcode. You would only have
00:28:58.760
one opcode and encode all the logic in there. The issue with that is that
00:29:04.799
you're not able to
00:29:10.000
change the complexity class. So if you
00:29:15.159
double the number of opcodes, the best thing you can do is cut the
00:29:23.039
time it needs in half, but that's not realistic. So it will only scale linearly, it will not
00:29:31.000
change the performance overall, whereas JITting and inlining can actually
00:29:36.240
change it overall, because you enable a new, much bigger world of optimizations,
00:29:42.240
and even the CPU engineering people can help you run your stuff faster, basically.
00:29:48.279
So that would be one thing, I think. Yes? I saw on the slide that there was a
00:29:54.799
string dup. Yes. That seems like it has the potential to add a lot of overhead. Do
00:30:01.159
you have any idea... Okay, so the thing is,
00:30:06.640
allocation, yes. You have to... let me show that. Because the string in the
00:30:14.159
literals tuple there is actually one string, and if you always hand out the same string and someone modifies it,
00:30:20.559
then it will be modified for all of them. It might be easier to understand if you take a look at
00:30:26.720
this. Yes, so I load the example code, go to the
00:30:36.279
literals tuple of the method and place 55 in there, and all of a sudden 55
00:30:42.200
is the ultimate answer. And if you would not duplicate that string, then
00:30:48.000
anyone could take that string that's handed to them from the method and do a
00:30:55.240
replace, pass 55 to it, and then you could do really evil things by
00:31:13.000
accident. So you don't have to do that with symbols, because you can't modify those.
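The reason for the copy can be reproduced in plain Ruby without touching any VM internals. In this sketch the constant `LITERAL` merely stands in for the entry in a method's literals tuple, and the two helper methods are hypothetical.

```ruby
# One shared String object, standing in for the entry in the literals tuple.
LITERAL = String.new("42")

# Without a copy, every caller receives the very same object.
def literal_without_dup
  LITERAL
end

# With a copy, callers get their own String and the original stays intact.
def literal_with_dup
  LITERAL.dup
end

literal_with_dup.replace("55")
LITERAL   # => "42", only the copy was changed

literal_without_dup.replace("55")
LITERAL   # => "55", the shared literal itself was changed
```

Symbols avoid the problem because they offer no mutating methods such as `replace` or `gsub!`.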
00:31:18.080
Did that answer your question? Okay, so try to use symbols where possible? Yeah,
00:31:24.519
but the bad thing about symbols is that they're not garbage collected, so if you use, for
00:31:31.519
instance, symbols for user-generated data or something, then they'll just fill up
00:31:36.639
your memory and they'll never go
00:31:45.760
away. Feel free to ask
00:31:52.200
more. Okay, you don't want to? Yes? I think we have a lot of time for
00:31:58.760
discussion. Why are symbols not garbage collected? Because that's what Ruby
00:32:05.559
does. Because symbols are supposed to be...
00:32:10.760
it's way cheaper, and they're used for, like, the method handles, for
00:32:15.840
interning. And also, if they were garbage collected, then they would be created all
00:32:21.760
over. Basically all the literals... so if you define the method in Rubinius,
00:32:29.159
for instance, that string "42" is never garbage collected because it's referenced by that method, so it will
00:32:34.880
only be garbage collected if the method is garbage collected. And it would extremely complicate literals, for
00:32:40.840
instance, if you would garbage collect symbols. But
00:32:46.200
you might want to talk to... oh yeah, the big thing is a symbol is the same
00:32:52.880
object everywhere so you have to have a reference to it somewhere you can't create a symbol Be
00:32:58.760
object. Yes, some languages, some Lisps for example, do have a function
00:33:05.559
called unintern, so you can remove, like, user-generated ones, but it does not make sense to
00:33:11.639
do it
00:33:17.480
for yes there was another question back there I was I was nice meatball here just H all apart
00:33:24.880
but if somebody were interested in trying to understand and piece together
00:33:30.200
or explore these virtual machines, like you've done in this study, how would you... how did you approach this?
00:33:38.080
I grabbed a hold of Brian Ford and told him: explain that to me,
00:33:44.679
for Rubinius. I mean, is there a particular, say, you know, class, C file, a starting
00:33:50.240
point? Yes. It depends on
00:33:55.320
what you want to look at. I found for myself that both
00:34:03.240
the Rubinius code and the JRuby code are pretty easy to read, but it's a ton of
00:34:09.399
code. If you go into the C Ruby code, it depends on where
00:34:16.960
you go. I would recommend starting with the search_method method, because that's small code,
00:34:23.839
it's pretty understandable, and all the related methods are right below it, which is not always the case with the other
00:34:29.839
methods. And then, okay, what I mean by how to understand this...
00:34:39.040
wow, this is actually from the VM
00:34:44.679
methods, one of them. They all have a pretty complex workflow, and this doesn't
00:34:50.240
have the most complex workflow, but still it has the pattern of having a
00:34:56.040
label in front of a switch, then that switch is nested, there are more switches in that switch, and from time to time it
00:35:02.800
says goto again. And that just might make sense,
00:35:08.960
but it's really hard to understand if you just want to follow the
00:35:14.119
program flow. And yeah, I had the advantage of
00:35:20.320
just asking Koichi about that code or something.
00:35:28.520
Yes, and there's the book picture. Any more questions? Yes, over there again. Do you
00:35:35.240
think there's any chance or any possibility that the entire state of the execution stack could be serialized for
00:35:42.480
any of these? I don't believe any implementation supports it. MagLev does, kind
00:35:49.280
of, because it's image based. Apart from that, no proper Ruby
00:35:57.000
implementation supports that. I'm not sure if that's hard or anything. Feel free to talk to all the people that are
00:36:03.079
here, and make sure... I'm with you, because I know that there have been discussions
00:36:10.119
about this from time to time, but there's no implementation on any other apart from
00:36:18.000
MagLev. The MagLev people are over there.
00:36:25.520
Okay, in that case, thank you all for listening to my talk.