00:00:16.119
so maybe this is the time to start the my presentation
00:00:22.640
so today's title my my present titles is implementation details of Ruby 2.0
00:00:28.640
virtual machine so I'm Co from hu so this is a uh seven times uh
00:00:37.960
disclaimer so this is my time luik but I can uh my English is not gross so if you
00:00:48.440
cannot understand my my English speaking please read uh the slide I wrot all of
00:00:56.359
uh my message to the slides so if you you can't uh read the
00:01:01.760
these characters please go the uh front of this this
00:01:09.720
SE okay so let me introduce myself so my name is
00:01:16.119
k SAS from H at M
00:01:21.159
gy and uh I I am a full time Comm uh CV committer and he is he is nakasan
00:01:30.159
he's also full-time committer uh this is a business time so he's making a ruby C
00:01:36.520
Ruby 2.0 he he don't make any uh presentation
00:01:51.960
uh i wrot v machine and I also helped uh rby 2.0
00:01:59.360
release so I organized the uh feature request and 20 z uh last week Z are 200
00:02:07.880
ticket left but uh now we uh we only
00:02:13.800
have under 100 tickets but I sent an email to the Rubik
00:02:20.360
and ruid Maring list over the 150 emails so I'm sorry so so spamming
00:02:29.000
Okay so I want to at first I want to talk about the Ruby 2.0 so m m is uh Ruby 2 uh Ruby
00:02:39.840
2.0 is the 20 of Ruby program
00:02:45.319
language so our this time this time our policy is uh
00:02:52.319
keeping the compatibility at Ru level and uh so
00:02:58.480
feature request for the usability or performance uh it is not high PRI
00:03:06.360
priority so this is a uh load map of Ruby
00:03:11.959
2.0 so this is a goal of uh release the ne
00:03:18.159
uh next February we we we need to release so we need uh we will release
00:03:23.680
the uh several times uh in this time we
00:03:29.879
we are working on the releasing the uh 2.0 preview one just now just now and we
00:03:37.239
will release the preview two at the the first December and we will close uh read the
00:03:47.040
the the code at this Christmas next Christmas and release the Le caned one
00:03:54.159
and two so this means that we only have two month uh
00:04:00.760
there are no two months but only about two months to introduce a new uh codes
00:04:06.959
so if you have any requests so ask much
00:04:13.360
so so next the features features of lby 2.0 I want to introduce all of
00:04:27.520
features so you can read a n file to read uh uh new features of Ruby 2.0 so
00:04:35.240
there are many many features or changes so please check it check it
00:04:40.960
out so I finish about rup 2.0 so I
00:04:47.360
changed my title I want to change my
00:05:05.960
so let's just re about the how to make a
00:05:13.600
interpreter the lecture see number three so this is a revie SL so to uh to
00:05:24.639
attend the this course so you need to uh finish the the Language basic of course
00:05:31.400
you know of course you finished and this course uh I
00:05:37.479
use uh Ruby program language as you know Ruby is very
00:05:42.600
popular language of course you know and I use shui to to learn the how to make
00:05:49.520
an interpreter so shui has their own B Machine written by one
00:05:56.600
Japanese and the so this is this is also a review slide
00:06:02.600
so uh to make a b machine we need to execute the instruction of B machine so
00:06:08.960
the program counter pointed the instruction how which we need to uh
00:06:17.080
execute and uh usually we use the stock machine uh v
00:06:23.960
machine suck machine architecture to to implement the uh
00:06:30.240
uh virtu machine so all on the stock machine virtu machine all of bodies are
00:06:36.479
on the stock and stack top is pointed by the stock pointer uh shortly uh sp
00:06:45.240
sp and we already discussed the uh the comparison with the register machines
00:06:51.120
viral machines and St machine vir machines so there are several advantage
00:06:56.479
and disadvantage uh I eliminate this uh
00:07:02.120
discussion so this is a uh figure of how to execute uh St
00:07:11.280
machine B machine so program counter point is the the first of the program uh
00:07:17.400
uh instruction and stock pointer show uh pointed the the top of
00:07:24.440
Stack then uh uh doing the this
00:07:30.280
instruction gets the the value from uh local variable B and push
00:07:36.280
the uh value on the stock and the stock pointer
00:07:44.680
incremented and next uh same thing so push the the value of local varable
00:07:53.400
C and last uh execute the plus method
00:08:00.360
method so push the two vales from the stocktop and uh calculat the the B plus
00:08:07.759
C and then uh Lo uh at last pop the value of the St toop and push the uh set
00:08:16.400
the the local variable a this is a simple uh execution
00:08:23.960
sample so we uh I already uh introduced the uh
00:08:30.039
several uh optimizing techniques for the stock machine B machines so people optimization at the compile time and
00:08:36.680
make Maring section and D sting and St cing so there are many many uh
00:08:42.599
optimization techniques and also the the B Machine
00:08:49.200
generator is helps us the building a
00:08:57.440
flexible B Machine building so back to the today's topic we use the
00:09:04.279
uh this example the one line method one line sentence so rece do Sor ar ar
00:09:13.560
rece means reer and selector means the method IDE or method
00:09:19.200
name and argue one and AR two
00:09:26.200
arguments before mthod dispatch making meth method thisp we need to evaluate
00:09:31.640
the thece and evaluate the AR one and AR
00:09:36.920
two and push then on the stacks so uh after the the procedure
00:09:44.839
number two then the receiver and ar1 and ar2 are pushed on
00:09:53.760
stock so after that we uh we dispar the meth methods so we have uh this is the
00:10:00.959
overview of uh method dispat process so
00:10:07.959
uh there are several uh processes
00:10:14.880
so at first uh getting the class of receiver and searge the method body from
00:10:22.920
the class and this part method with checking something and the St the
00:10:29.959
resistance and building a local environment and set a program counter so
00:10:35.480
at first method so we need to meth sear from uh reer
00:10:41.480
cross so in objectoriented langage the the there are hierarchy of classes so in
00:10:50.760
rubby Ruby's case the basic uh basic object are inherited
00:10:56.040
and the inheritors here is z uh this and
00:11:01.880
we we need to search the uh methods method for
00:11:08.399
selector for C2 and C1 and object and can and B object when we can find a uh
00:11:16.079
Sor at C1 we we we can get the the method
00:11:21.760
body uh in the case if there are no uh method uh found then uh method missing
00:11:29.920
will be dispatched and checking uh means how
00:11:37.839
many uh how many argument the method have expects so
00:11:44.880
compare with the given argument number and expect argument number of course it is different than uh C the argument
00:11:53.519
error and checking visibility uh Ruby has three Vis visibility uh public
00:12:00.360
private and protected so I can't I can't say the which is what
00:12:09.360
I need to see the the source code of rby v machine to understand what is which is
00:12:15.079
one which is what so maybe you you you
00:12:20.440
know and we need to build a local environment so each method have a local
00:12:27.519
variables so we need to store the local variable and get the store the local
00:12:32.920
variable so uh we prepare the local VAR variable space from at the stack so
00:12:40.600
extend the stack and prepare the uh local variable space so we we show we say uh this space
00:12:50.399
as local environment or environment shortly in short so especially the parameters are
00:12:58.680
also in environment so this is a figure before the stock method dispatch stock
00:13:06.440
is uh receive and AR one AR two the this order and after the method dispatch we
00:13:13.360
prepare the local variable space to with using extending the St stock and the top
00:13:21.720
of environment are pointed by the environment pointer shortly
00:13:27.519
EP and E can uh point the every uh local
00:13:33.399
values for example using the uh 0 index
00:13:38.480
0 is local Val two and so
00:13:43.560
on uh this index uh index of local variables are uh can be calculated at
00:13:50.600
compile time so again uh this this slide show
00:13:57.800
again uh this is the overview so there are about seven steps so it's as you can
00:14:04.680
see it is a very simple but a bit slow you you maybe you you
00:14:10.839
think so so this is the lecture so I want to
00:14:16.440
ask the quiz so I show the the seven steps overview but Ruby is more complex
00:14:25.199
Ruby in Ruby case it is more complex so what do you think about how many uh
00:14:31.880
steps uh Ruby need to uh make a this me dispatch what do you think about eight
00:14:39.040
St eight steps uh 12 steps
00:14:45.399
oh 60 steps oh most of people select the 60 steps so
00:14:53.720
the answer is the 20s
00:15:03.959
so in luid case we need to uh make a uh such a uh uh complex method disp
00:15:14.279
processes so next Quiz is uh we have uh
00:15:19.759
uh we we we use several method method type for
00:15:26.600
example the the most of the case you the method is defined by a def keyword
00:15:32.639
in rubby script so this is one type and
00:15:38.279
also uh you can you can load a c extension Library written in C
00:15:44.279
programming language and uh the method type there is a method type uh by
00:15:52.000
defined by sh function the sh function method is only call the sh function
00:15:59.519
so there are at least two types at least two types but there are more uh
00:16:07.399
uh method types we implemented so the
00:16:12.639
this is next uh quiz so what do you think about the H so what's the number
00:16:18.519
number what do you think about so three types three
00:16:23.720
types six types oh uh nine
00:16:30.639
types oh thank you most of people understand this
00:16:43.000
types 11 types uh meth 11 method types so such a things at defined at the
00:16:51.720
uh the method do H in C Shu so there are many many types
00:16:59.600
different method dispatch procedure we implemented so we use the dispatch
00:17:04.640
switch case statement in C language at every uh method
00:17:13.880
dispatch so this is the last quiz so I introduced the B registers program
00:17:20.280
counter stock pointer and environment pointer
00:17:25.839
so so but uh Ruby has more uh virtual machine virtual registers so what do you
00:17:33.240
think about the H so so if you please raise your hand if you
00:17:39.360
think it is a four registers six
00:17:44.720
registers six registers oh ever nine registers n reg oh most of
00:17:53.360
people understand the system so all right so um what do you think
00:17:59.760
about there how many 11
00:18:12.440
registers so this means that uh we need to uh store the 11 regist 11 registers
00:18:21.039
for each method dispatch so it is very uh cost for for we need to reduce the
00:18:26.600
register number so at 1 193 we use 11 11 registers but uh
00:18:35.840
at the rubby 2.0 uh we use only uh 10 registers but it is uh big number so we
00:18:44.440
need to reduce and we uh introduced the control frame SE to St the
00:18:52.880
register uh to maintain the method frames so each method frames uh V uh
00:19:00.559
control uh suchal resistance so it mean that this is a figure of the structure
00:19:08.000
of uh value stack and control frame Stacks so this is a method uh we need to
00:19:17.120
build a method me control frame frame for each method so we we St the uh the
00:19:26.640
such 11 register and and point the the such
00:19:33.600
things okay and also Ruby has many many complex
00:19:40.600
features like a parameter so Ruby has a
00:19:45.720
uh mandatory mandatory parameters or optional parameters and post parameters
00:19:51.960
and rest parameter and block parameter and uh after Ruby 2.0 we introduce
00:19:59.000
keyword parameters so we need to uh uh process this for the this uh
00:20:08.520
complex
00:20:14.159
parameters so uh we need to uh every um
00:20:21.440
at every method dispatch we execute such a uh complex processes
00:20:32.799
so this figure shows how method dispatch consumes the uh uh the consume time so
00:20:41.520
at the pon Benchmark it's a micro Benchmark the method dispatch consumes
00:20:47.080
the quarter of time quter of one thir time of all of execution time and the
00:20:54.679
pom solving uh puzzle uh we need uh 10 about 10% method dis per time so we need
00:21:02.559
to reduce the uh method dispatch time so method reduce
00:21:09.799
the method dispatch time is important so this is a
00:21:15.559
homework so please report to uh method dispatch speed up techniques and analyze
00:21:21.600
the method dispatch overhead over your uh your favorite application and Survey method dis speed of techniques and
00:21:27.240
propose your own optimizing techniques to improve the method performance and to implement techniques and evalate Z
00:21:34.279
performance the deadline is z and this report is important for your grade all
00:21:42.919
right so lecture was
00:21:48.600
finished no no no no no presentation is not
00:21:57.320
finished so so this is a report of this uh homework so optimizing techniques so
00:22:04.720
Ruby meth dispatch so we implement the three kind
00:22:10.400
uh three new optimiz techniques to speed up the method dispatch so this is the main topic of
00:22:17.919
this today's presentation so uh summarize the method
00:22:23.159
dispatch overhead so there are several uh processes so we need to skip or
00:22:29.480
eliminate the such processes so at first the we use
00:22:34.799
specializ instruction for example The Fix N plus Fix N plus is uh is very
00:22:41.080
frequent maybe you use frequently so we need to eliminate reduce the uh the
00:22:48.600
execution execution time of fix and plus so we use the special B machine
00:22:53.679
instruction to to do uh uh fix number
00:23:01.520
and we use plus minus or times or uh some other meth special
00:23:09.640
methods so this code shows the the specializ instruction s so
00:23:16.960
code F code in Ruby so check checking the uh receiver as is either is it is it
00:23:24.840
a fixed number or object is a fixed number and the Fix Plus is not redefined
00:23:30.240
then we can use uh Fix Plus optimized method without building uh
00:23:38.600
building uh method frame so it is very fast but if this condition is false then
00:23:45.720
uh using using the uh orinal General method dispatch
00:23:52.039
code so there is a advantage and disadvantages so eliminate all dispatch
00:23:57.559
cost so this is a this technique has very effective for only a few methods so
00:24:06.840
if we make uh each instruction for the
00:24:12.080
the C method then uh it reduce the v machine performance so it is a very uh
00:24:19.840
limited method and if the uh we use a plus
00:24:26.640
method for other classes for for example the fix now then
00:24:33.240
we need we need uh additional overhead uh the checking
00:24:40.440
overhead so this is advantage and disadvantages and we use the method
00:24:47.360
caching I explained the the na uh simple method search
00:24:55.640
algorithm uh from this uh class hierarchy but
00:25:01.600
the uh we can cat the uh result of method search so we use the two types uh
00:25:09.120
methods uh two types method uh
00:25:14.480
caching so one is the inline method cach uh this
00:25:19.880
this has only one cach entry for each for each
00:25:26.360
uh uh call site and rby process has has
00:25:32.320
a uh one Global method cach table it has
00:25:37.840
uh hundreds of entries so we use two types
00:25:46.600
cashing so uh this technique are implemented at from the 1 n so here uh
00:25:54.440
this is a uh from Ruby 2.0 technique
00:26:07.120
so the next technique is the cing the checking result uh I explain that we
00:26:13.760
need to check the ality or
00:26:19.159
visibility but this check uh can be eliminated can be skipped uh after the
00:26:27.000
fast checking uh this means that the if I the if there is a loop and the same
00:26:34.679
call site uh at same call site and the previous the first method uh first
00:26:41.919
method called this part uh if it is skipped
00:26:47.000
then we can eliminate the checking of this uh
00:26:52.240
checking so the so after after the checking the
00:26:59.200
first time then cat the result of the such a checking checking result at into the
00:27:07.080
inline method cach entry so if inline method cach are are HED so we don't need
00:27:14.200
to search uh search methods then we can uh skip the this uh
00:27:24.760
checking so this is the main contribution of 2.0 for method part
00:27:30.200
speed up and we use we uh prepare the framess
00:27:37.840
C function uh framework so introduce a
00:27:42.919
frameless meth C fun methods so several methods such as a string length uh don't
00:27:49.559
need building a a new method frame so if the method is a frame C function then we
00:27:56.919
eliminate the meth eliminates the the building frame
00:28:03.159
so we can skip these processes for the some methods like a
00:28:09.679
string length so there is so it is a similar
00:28:15.799
technique to for with uh to uh similar
00:28:21.039
technique to uh specialized instruction but uh compare with specialized
00:28:26.640
instruction uh we can use uh ultimate number of uh frame method but it is a
00:28:34.039
bit slower than the specialized instruction and I show you the uh
00:28:40.000
evaluation result after that and but this technique
00:28:45.240
are not included and I prepar the special part
00:28:52.440
for the sent of method missing from 2.0 uh actually we prep uh
00:28:59.559
we you we make a Technique we implemented the the technique for the S
00:29:07.279
method but we refactoring and speed up are given
00:29:13.080
so before this technique this optimization we you we call the send
00:29:18.640
method send C function methods and then she send method uh call the the F method
00:29:26.080
but after this optimization if the send we call the send F then uh
00:29:35.360
search the F directory
00:29:40.600
not we call the method who directory and return this method
00:29:49.240
code so this is a micro Benchmark results this this graph shows the
00:29:55.880
uh how far faster than the this date uh
00:30:01.080
October 13th October 1
00:30:07.360
so um because I I implement the optimize
00:30:12.960
technique from this date so this is a daily date daily build to the from the
00:30:19.679
October 13th to the the yesterday October
00:30:26.559
31 so we can we can see the uh several uh performance
00:30:32.919
improvements uh this is a uh uh meth me
00:30:38.320
simple method sending so we we can get 40% of 50% and
00:30:45.519
there are a typical uh optimization result you can see uh uh we can see the
00:30:54.039
speed up results and for the uh
00:31:00.679
application sever application has uh 20% performance Improvement and sever
00:31:06.760
application doesn't uh get the speed up so we need
00:31:12.600
to we need to make more
00:31:17.799
effort so future also as you you can see
00:31:23.720
the uh we need to one more effort to speed up the method dispatch we uh we um
00:31:31.760
I want to restructure the method frame so 11 or 10 uh registers it is very huge
00:31:39.279
number of register so we need to reduce them and improve the E performance I
00:31:45.159
only uh improve the performance of uh Methodist but not e block impation so we
00:31:51.720
need to reduce the RO invocation performance
00:32:01.840
so the conclusion of the this report methodes by speed up we need to make uh
00:32:08.960
so method R method dispatch is very very very complex so but we can see the
00:32:16.600
15% uh speed up for simple meth dispatch
00:32:21.679
with uh new uh rby 2.0 uh from rby 2.
00:32:31.120
and we need more effort to achieve the performance to improve so this is a conclusion of this
00:32:38.639
method disp report and we introduce
00:32:43.880
the the some other methods uh some other optimization techniques into from Ruby
00:32:51.639
2. Z yesterday I
00:32:58.519
make practice this presentation I I consume the 40 minutes from uh to this
00:33:05.480
right so I want to say there are no time to say so I want to skip but there are
00:33:12.799
10 more minutes so I want to introduce the several optimiz techniques so we
00:33:20.159
introduce a Chron technique for the uh 64 CP operating systems to improve the
00:33:28.679
the floating number performance calculation performance and we prepare the right
00:33:35.399
weight back uh capturing so if you ra on
00:33:40.760
L then the back uh we need to build a back to see the back uh to print out the
00:33:49.320
back but most of case we don't need to get the
00:33:54.679
back if you if you don't see the bace
00:34:00.120
so uh we uh make a a lightweight
00:34:06.200
method and restructure a small uh small restructuring of v machine stocks and
00:34:13.599
instruction secet data and maybe most of people want to know
00:34:20.560
the bit marking garbage cure and require performance Improvement by other people
00:34:27.079
so by but um I don't have a presentation slide to explain that so please ask
00:34:34.520
please ask me after this
00:34:43.040
presentation so this is this needs a long time so I want to skip the the uh
00:34:51.960
details of the technique but we can see the some bemark uh
00:34:58.880
performance improvements so most of uh micro bench uh micro benchmarks don't
00:35:05.760
use uh floating number calculation so some some uh benchmarks us
00:35:12.839
uses some benchmarks use the floating number so we can see the performance
00:35:18.520
improvement with this optimization so I want to skipe the the
00:35:27.079
detail Tech technique of this optimization uh so if you want to know
00:35:34.040
the how to implement this so uh please run the uh itop
00:35:40.640
754 floting number uh
00:35:46.200
layout uh the simply explanation we
00:35:51.920
uh this technique uh Ed the the value 64 bits floating
00:36:01.920
number to 64 bits uh value
00:36:09.760
size and this is a explanation of right with back capturing so back as you know
00:36:16.480
back is string array of string objects so file name method and blah blah blah
00:36:22.839
but uh capture only instruction and
00:36:27.920
translate to string uh only when it is access access so it it uh can
00:36:36.720
uh so cap only capture the instruction sequence data array of instruction
00:36:41.920
sequence data and then when the back are accessed then translate this data
00:36:49.040
structure to the uh the orinal VOR array stting with array
00:36:57.800
so I have uh five time
00:37:03.040
minutes so we we will release uh Ruby
00:37:09.240
2.0.0 next February so we need we need
00:37:14.640
more effort I think so the this graph shows the performance
00:37:21.839
over uh performance overhead analysis the same same graph I explained but this graph is for
00:37:30.400
the aroc and L application uh I'm not sure what what
00:37:37.040
kind of RA application but this in this kind so method dispatch
00:37:42.440
overhead is 10% but most of uh overhead is most of time are consumed at the uh
00:37:52.160
garbage coru and Sh function so like string manipulation or something
00:37:58.160
and at ra application the garbage production time is
00:38:04.160
here here and there
00:38:10.960
uh uh sh so string manipulation or some other uh performance uh some other part
00:38:19.079
consumes many many time so we need more and more other
00:38:26.680
techniques so as you can see the on and uh actual practical application v
00:38:33.280
machine is not on bot so but I think we the mathematic of symol computation the
00:38:39.800
v machine is a matter so I will introduce some other
00:38:46.359
techniqu such as a compil framework or something after the littleit point0 and uh garbage corlection as you
00:38:54.680
know the generational garbage corlection are required but introducing the G generational
00:39:00.800
garage collection makes a uh huge huge number of bugs and we need to the in
00:39:07.680
health so we need more dependable way so I I have an idea so I want to try the
00:39:14.359
the to implement the r uh generational garbage corlection after 2.0
00:39:20.640
released and Par paration so we don't I don't have any
00:39:27.040
time or space to discuss about that so please ask me if you want to know about
00:39:32.640
that so conclusion so there are many many tasks left after even after Ruby
00:39:40.040
2.0 so conclusion is our challenge is has just become so thank you very much