Summarized using AI

Implementation Details of Ruby 2.0 VM

Koichi Sasada • November 01, 2012 • Denver, Colorado • Talk

In this presentation at RubyConf 2012, Koichi Sasada discusses the implementation details of Ruby 2.0's virtual machine (VM), outlining the significant internal changes aimed at enhancing performance while maintaining backward compatibility. The major focus is on the changes in VM data structures, method invocations, and optimization techniques that contribute to faster execution of Ruby programs.

Key Points:

  • Overview of Ruby 2.0: Ruby 2.0, planned for release in February, aims to introduce several new features without compromising the existing behavior of the language. The speaker emphasizes the importance of maintaining compatibility at the Ruby level as they progress through the development cycle.

  • Roadmap and Feature Requests: The timeline for Ruby 2.0 includes milestones for previews and the final release, as well as an open call for usability and performance feature requests.

  • Method Dispatch Process: The presentation dives into the complexities of the method dispatching process, illustrating how method calls are handled through various steps including class lookup, method resolution, and local variable preparation. Sasada emphasizes that this process is more intricate than it might appear, often involving numerous checks and state management.

  • Optimization Techniques: Sasada details specific strategies implemented in Ruby 2.0 to improve performance, particularly:

    • Specialized Instructions: Introduction of specialized instructions for common operations (like Fixnum additions) to reduce overhead.
    • Method Caching: Implementing inline caching and method caches to speed up method lookups and reduce dispatch costs.
    • Frameless Methods: Discussing methods which do not require creating a new method frame, thereby saving on setup time during execution.
  • Performance Evaluation: The improvements noted show promising metrics, with specific benchmarks indicating substantial performance enhancements resulting from these optimizations. However, Sasada acknowledges ongoing challenges, particularly with managing method dispatch overhead and garbage collection.

  • Future Work and Challenges: The final part of the presentation addresses future challenges, including enhancing floating-point performance, general performance optimizations, and dealing with garbage collection complexities.

Conclusion:

Koichi Sasada concludes that while Ruby 2.0 introduces significant improvements and maintains its broad usability, the evolution of Ruby continues to face intricacies that require further innovations to enhance speed and efficiency. He emphasizes the balance between maintaining existing language features and making necessary forward-looking optimizations.

In summary, the presentation not only outlines the body's engineering improvements but also hints at a continuing commitment to identifying avenues for enhancements in subsequent Ruby versions.

Implementation Details of Ruby 2.0 VM
Koichi Sasada • Denver, Colorado • Talk

Date: November 01, 2012
Published: March 19, 2013
Announced: unknown

As a member of Matz's team in Heroku, I'm working Ruby 2.0 to be released in next February. Ruby 2.0 doesn't have drastic changes in behavior except adding some new features. However, an internal of Ruby 2.0 VM is changing to improve performance! For example, we changed VM data structures and method invocation processes to achieve fast execution. In this presentation, I will introduce the internal changes and its performance.

RubyConf 2012

00:00:16.119 so maybe this is the time to start the my presentation
00:00:22.640 so today's title my my present titles is implementation details of Ruby 2.0
00:00:28.640 virtual machine so I'm Co from hu so this is a uh seven times uh
00:00:37.960 disclaimer so this is my time luik but I can uh my English is not gross so if you
00:00:48.440 cannot understand my my English speaking please read uh the slide I wrot all of
00:00:56.359 uh my message to the slides so if you you can't uh read the
00:01:01.760 these characters please go the uh front of this this
00:01:09.720 SE okay so let me introduce myself so my name is
00:01:16.119 k SAS from H at M
00:01:21.159 gy and uh I I am a full time Comm uh CV committer and he is he is nakasan
00:01:30.159 he's also full-time committer uh this is a business time so he's making a ruby C
00:01:36.520 Ruby 2.0 he he don't make any uh presentation
00:01:51.960 uh i wrot v machine and I also helped uh rby 2.0
00:01:59.360 release so I organized the uh feature request and 20 z uh last week Z are 200
00:02:07.880 ticket left but uh now we uh we only
00:02:13.800 have under 100 tickets but I sent an email to the Rubik
00:02:20.360 and ruid Maring list over the 150 emails so I'm sorry so so spamming
00:02:29.000 Okay so I want to at first I want to talk about the Ruby 2.0 so m m is uh Ruby 2 uh Ruby
00:02:39.840 2.0 is the 20 of Ruby program
00:02:45.319 language so our this time this time our policy is uh
00:02:52.319 keeping the compatibility at Ru level and uh so
00:02:58.480 feature request for the usability or performance uh it is not high PRI
00:03:06.360 priority so this is a uh load map of Ruby
00:03:11.959 2.0 so this is a goal of uh release the ne
00:03:18.159 uh next February we we we need to release so we need uh we will release
00:03:23.680 the uh several times uh in this time we
00:03:29.879 we are working on the releasing the uh 2.0 preview one just now just now and we
00:03:37.239 will release the preview two at the the first December and we will close uh read the
00:03:47.040 the the code at this Christmas next Christmas and release the Le caned one
00:03:54.159 and two so this means that we only have two month uh
00:04:00.760 there are no two months but only about two months to introduce a new uh codes
00:04:06.959 so if you have any requests so ask much
00:04:13.360 so so next the features features of lby 2.0 I want to introduce all of
00:04:27.520 features so you can read a n file to read uh uh new features of Ruby 2.0 so
00:04:35.240 there are many many features or changes so please check it check it
00:04:40.960 out so I finish about rup 2.0 so I
00:04:47.360 changed my title I want to change my
00:05:05.960 so let's just re about the how to make a
00:05:13.600 interpreter the lecture see number three so this is a revie SL so to uh to
00:05:24.639 attend the this course so you need to uh finish the the Language basic of course
00:05:31.400 you know of course you finished and this course uh I
00:05:37.479 use uh Ruby program language as you know Ruby is very
00:05:42.600 popular language of course you know and I use shui to to learn the how to make
00:05:49.520 an interpreter so shui has their own B Machine written by one
00:05:56.600 Japanese and the so this is this is also a review slide
00:06:02.600 so uh to make a b machine we need to execute the instruction of B machine so
00:06:08.960 the program counter pointed the instruction how which we need to uh
00:06:17.080 execute and uh usually we use the stock machine uh v
00:06:23.960 machine suck machine architecture to to implement the uh
00:06:30.240 uh virtu machine so all on the stock machine virtu machine all of bodies are
00:06:36.479 on the stock and stack top is pointed by the stock pointer uh shortly uh sp
00:06:45.240 sp and we already discussed the uh the comparison with the register machines
00:06:51.120 viral machines and St machine vir machines so there are several advantage
00:06:56.479 and disadvantage uh I eliminate this uh
00:07:02.120 discussion so this is a uh figure of how to execute uh St
00:07:11.280 machine B machine so program counter point is the the first of the program uh
00:07:17.400 uh instruction and stock pointer show uh pointed the the top of
00:07:24.440 Stack then uh uh doing the this
00:07:30.280 instruction gets the the value from uh local variable B and push
00:07:36.280 the uh value on the stock and the stock pointer
00:07:44.680 incremented and next uh same thing so push the the value of local varable
00:07:53.400 C and last uh execute the plus method
00:08:00.360 method so push the two vales from the stocktop and uh calculat the the B plus
00:08:07.759 C and then uh Lo uh at last pop the value of the St toop and push the uh set
00:08:16.400 the the local variable a this is a simple uh execution
00:08:23.960 sample so we uh I already uh introduced the uh
00:08:30.039 several uh optimizing techniques for the stock machine B machines so people optimization at the compile time and
00:08:36.680 make Maring section and D sting and St cing so there are many many uh
00:08:42.599 optimization techniques and also the the B Machine
00:08:49.200 generator is helps us the building a
00:08:57.440 flexible B Machine building so back to the today's topic we use the
00:09:04.279 uh this example the one line method one line sentence so rece do Sor ar ar
00:09:13.560 rece means reer and selector means the method IDE or method
00:09:19.200 name and argue one and AR two
00:09:26.200 arguments before mthod dispatch making meth method thisp we need to evaluate
00:09:31.640 the thece and evaluate the AR one and AR
00:09:36.920 two and push then on the stacks so uh after the the procedure
00:09:44.839 number two then the receiver and ar1 and ar2 are pushed on
00:09:53.760 stock so after that we uh we dispar the meth methods so we have uh this is the
00:10:00.959 overview of uh method dispat process so
00:10:07.959 uh there are several uh processes
00:10:14.880 so at first uh getting the class of receiver and searge the method body from
00:10:22.920 the class and this part method with checking something and the St the
00:10:29.959 resistance and building a local environment and set a program counter so
00:10:35.480 at first method so we need to meth sear from uh reer
00:10:41.480 cross so in objectoriented langage the the there are hierarchy of classes so in
00:10:50.760 rubby Ruby's case the basic uh basic object are inherited
00:10:56.040 and the inheritors here is z uh this and
00:11:01.880 we we need to search the uh methods method for
00:11:08.399 selector for C2 and C1 and object and can and B object when we can find a uh
00:11:16.079 Sor at C1 we we we can get the the method
00:11:21.760 body uh in the case if there are no uh method uh found then uh method missing
00:11:29.920 will be dispatched and checking uh means how
00:11:37.839 many uh how many argument the method have expects so
00:11:44.880 compare with the given argument number and expect argument number of course it is different than uh C the argument
00:11:53.519 error and checking visibility uh Ruby has three Vis visibility uh public
00:12:00.360 private and protected so I can't I can't say the which is what
00:12:09.360 I need to see the the source code of rby v machine to understand what is which is
00:12:15.079 one which is what so maybe you you you
00:12:20.440 know and we need to build a local environment so each method have a local
00:12:27.519 variables so we need to store the local variable and get the store the local
00:12:32.920 variable so uh we prepare the local VAR variable space from at the stack so
00:12:40.600 extend the stack and prepare the uh local variable space so we we show we say uh this space
00:12:50.399 as local environment or environment shortly in short so especially the parameters are
00:12:58.680 also in environment so this is a figure before the stock method dispatch stock
00:13:06.440 is uh receive and AR one AR two the this order and after the method dispatch we
00:13:13.360 prepare the local variable space to with using extending the St stock and the top
00:13:21.720 of environment are pointed by the environment pointer shortly
00:13:27.519 EP and E can uh point the every uh local
00:13:33.399 values for example using the uh 0 index
00:13:38.480 0 is local Val two and so
00:13:43.560 on uh this index uh index of local variables are uh can be calculated at
00:13:50.600 compile time so again uh this this slide show
00:13:57.800 again uh this is the overview so there are about seven steps so it's as you can
00:14:04.680 see it is a very simple but a bit slow you you maybe you you
00:14:10.839 think so so this is the lecture so I want to
00:14:16.440 ask the quiz so I show the the seven steps overview but Ruby is more complex
00:14:25.199 Ruby in Ruby case it is more complex so what do you think about how many uh
00:14:31.880 steps uh Ruby need to uh make a this me dispatch what do you think about eight
00:14:39.040 St eight steps uh 12 steps
00:14:45.399 oh 60 steps oh most of people select the 60 steps so
00:14:53.720 the answer is the 20s
00:15:03.959 so in luid case we need to uh make a uh such a uh uh complex method disp
00:15:14.279 processes so next Quiz is uh we have uh
00:15:19.759 uh we we we use several method method type for
00:15:26.600 example the the most of the case you the method is defined by a def keyword
00:15:32.639 in rubby script so this is one type and
00:15:38.279 also uh you can you can load a c extension Library written in C
00:15:44.279 programming language and uh the method type there is a method type uh by
00:15:52.000 defined by sh function the sh function method is only call the sh function
00:15:59.519 so there are at least two types at least two types but there are more uh
00:16:07.399 uh method types we implemented so the
00:16:12.639 this is next uh quiz so what do you think about the H so what's the number
00:16:18.519 number what do you think about so three types three
00:16:23.720 types six types oh uh nine
00:16:30.639 types oh thank you most of people understand this
00:16:43.000 types 11 types uh meth 11 method types so such a things at defined at the
00:16:51.720 uh the method do H in C Shu so there are many many types
00:16:59.600 different method dispatch procedure we implemented so we use the dispatch
00:17:04.640 switch case statement in C language at every uh method
00:17:13.880 dispatch so this is the last quiz so I introduced the B registers program
00:17:20.280 counter stock pointer and environment pointer
00:17:25.839 so so but uh Ruby has more uh virtual machine virtual registers so what do you
00:17:33.240 think about the H so so if you please raise your hand if you
00:17:39.360 think it is a four registers six
00:17:44.720 registers six registers oh ever nine registers n reg oh most of
00:17:53.360 people understand the system so all right so um what do you think
00:17:59.760 about there how many 11
00:18:12.440 registers so this means that uh we need to uh store the 11 regist 11 registers
00:18:21.039 for each method dispatch so it is very uh cost for for we need to reduce the
00:18:26.600 register number so at 1 193 we use 11 11 registers but uh
00:18:35.840 at the rubby 2.0 uh we use only uh 10 registers but it is uh big number so we
00:18:44.440 need to reduce and we uh introduced the control frame SE to St the
00:18:52.880 register uh to maintain the method frames so each method frames uh V uh
00:19:00.559 control uh suchal resistance so it mean that this is a figure of the structure
00:19:08.000 of uh value stack and control frame Stacks so this is a method uh we need to
00:19:17.120 build a method me control frame frame for each method so we we St the uh the
00:19:26.640 such 11 register and and point the the such
00:19:33.600 things okay and also Ruby has many many complex
00:19:40.600 features like a parameter so Ruby has a
00:19:45.720 uh mandatory mandatory parameters or optional parameters and post parameters
00:19:51.960 and rest parameter and block parameter and uh after Ruby 2.0 we introduce
00:19:59.000 keyword parameters so we need to uh uh process this for the this uh
00:20:08.520 complex
00:20:14.159 parameters so uh we need to uh every um
00:20:21.440 at every method dispatch we execute such a uh complex processes
00:20:32.799 so this figure shows how method dispatch consumes the uh uh the consume time so
00:20:41.520 at the pon Benchmark it's a micro Benchmark the method dispatch consumes
00:20:47.080 the quarter of time quter of one thir time of all of execution time and the
00:20:54.679 pom solving uh puzzle uh we need uh 10 about 10% method dis per time so we need
00:21:02.559 to reduce the uh method dispatch time so method reduce
00:21:09.799 the method dispatch time is important so this is a
00:21:15.559 homework so please report to uh method dispatch speed up techniques and analyze
00:21:21.600 the method dispatch overhead over your uh your favorite application and Survey method dis speed of techniques and
00:21:27.240 propose your own optimizing techniques to improve the method performance and to implement techniques and evalate Z
00:21:34.279 performance the deadline is z and this report is important for your grade all
00:21:42.919 right so lecture was
00:21:48.600 finished no no no no no presentation is not
00:21:57.320 finished so so this is a report of this uh homework so optimizing techniques so
00:22:04.720 Ruby meth dispatch so we implement the three kind
00:22:10.400 uh three new optimiz techniques to speed up the method dispatch so this is the main topic of
00:22:17.919 this today's presentation so uh summarize the method
00:22:23.159 dispatch overhead so there are several uh processes so we need to skip or
00:22:29.480 eliminate the such processes so at first the we use
00:22:34.799 specializ instruction for example The Fix N plus Fix N plus is uh is very
00:22:41.080 frequent maybe you use frequently so we need to eliminate reduce the uh the
00:22:48.600 execution execution time of fix and plus so we use the special B machine
00:22:53.679 instruction to to do uh uh fix number
00:23:01.520 and we use plus minus or times or uh some other meth special
00:23:09.640 methods so this code shows the the specializ instruction s so
00:23:16.960 code F code in Ruby so check checking the uh receiver as is either is it is it
00:23:24.840 a fixed number or object is a fixed number and the Fix Plus is not redefined
00:23:30.240 then we can use uh Fix Plus optimized method without building uh
00:23:38.600 building uh method frame so it is very fast but if this condition is false then
00:23:45.720 uh using using the uh orinal General method dispatch
00:23:52.039 code so there is a advantage and disadvantages so eliminate all dispatch
00:23:57.559 cost so this is a this technique has very effective for only a few methods so
00:24:06.840 if we make uh each instruction for the
00:24:12.080 the C method then uh it reduce the v machine performance so it is a very uh
00:24:19.840 limited method and if the uh we use a plus
00:24:26.640 method for other classes for for example the fix now then
00:24:33.240 we need we need uh additional overhead uh the checking
00:24:40.440 overhead so this is advantage and disadvantages and we use the method
00:24:47.360 caching I explained the the na uh simple method search
00:24:55.640 algorithm uh from this uh class hierarchy but
00:25:01.600 the uh we can cat the uh result of method search so we use the two types uh
00:25:09.120 methods uh two types method uh
00:25:14.480 caching so one is the inline method cach uh this
00:25:19.880 this has only one cach entry for each for each
00:25:26.360 uh uh call site and rby process has has
00:25:32.320 a uh one Global method cach table it has
00:25:37.840 uh hundreds of entries so we use two types
00:25:46.600 cashing so uh this technique are implemented at from the 1 n so here uh
00:25:54.440 this is a uh from Ruby 2.0 technique
00:26:07.120 so the next technique is the cing the checking result uh I explain that we
00:26:13.760 need to check the ality or
00:26:19.159 visibility but this check uh can be eliminated can be skipped uh after the
00:26:27.000 fast checking uh this means that the if I the if there is a loop and the same
00:26:34.679 call site uh at same call site and the previous the first method uh first
00:26:41.919 method called this part uh if it is skipped
00:26:47.000 then we can eliminate the checking of this uh
00:26:52.240 checking so the so after after the checking the
00:26:59.200 first time then cat the result of the such a checking checking result at into the
00:27:07.080 inline method cach entry so if inline method cach are are HED so we don't need
00:27:14.200 to search uh search methods then we can uh skip the this uh
00:27:24.760 checking so this is the main contribution of 2.0 for method part
00:27:30.200 speed up and we use we uh prepare the framess
00:27:37.840 C function uh framework so introduce a
00:27:42.919 frameless meth C fun methods so several methods such as a string length uh don't
00:27:49.559 need building a a new method frame so if the method is a frame C function then we
00:27:56.919 eliminate the meth eliminates the the building frame
00:28:03.159 so we can skip these processes for the some methods like a
00:28:09.679 string length so there is so it is a similar
00:28:15.799 technique to for with uh to uh similar
00:28:21.039 technique to uh specialized instruction but uh compare with specialized
00:28:26.640 instruction uh we can use uh ultimate number of uh frame method but it is a
00:28:34.039 bit slower than the specialized instruction and I show you the uh
00:28:40.000 evaluation result after that and but this technique
00:28:45.240 are not included and I prepar the special part
00:28:52.440 for the sent of method missing from 2.0 uh actually we prep uh
00:28:59.559 we you we make a Technique we implemented the the technique for the S
00:29:07.279 method but we refactoring and speed up are given
00:29:13.080 so before this technique this optimization we you we call the send
00:29:18.640 method send C function methods and then she send method uh call the the F method
00:29:26.080 but after this optimization if the send we call the send F then uh
00:29:35.360 search the F directory
00:29:40.600 not we call the method who directory and return this method
00:29:49.240 code so this is a micro Benchmark results this this graph shows the
00:29:55.880 uh how far faster than the this date uh
00:30:01.080 October 13th October 1
00:30:07.360 so um because I I implement the optimize
00:30:12.960 technique from this date so this is a daily date daily build to the from the
00:30:19.679 October 13th to the the yesterday October
00:30:26.559 31 so we can we can see the uh several uh performance
00:30:32.919 improvements uh this is a uh uh meth me
00:30:38.320 simple method sending so we we can get 40% of 50% and
00:30:45.519 there are a typical uh optimization result you can see uh uh we can see the
00:30:54.039 speed up results and for the uh
00:31:00.679 application sever application has uh 20% performance Improvement and sever
00:31:06.760 application doesn't uh get the speed up so we need
00:31:12.600 to we need to make more
00:31:17.799 effort so future also as you you can see
00:31:23.720 the uh we need to one more effort to speed up the method dispatch we uh we um
00:31:31.760 I want to restructure the method frame so 11 or 10 uh registers it is very huge
00:31:39.279 number of register so we need to reduce them and improve the E performance I
00:31:45.159 only uh improve the performance of uh Methodist but not e block impation so we
00:31:51.720 need to reduce the RO invocation performance
00:32:01.840 so the conclusion of the this report methodes by speed up we need to make uh
00:32:08.960 so method R method dispatch is very very very complex so but we can see the
00:32:16.600 15% uh speed up for simple meth dispatch
00:32:21.679 with uh new uh rby 2.0 uh from rby 2.
00:32:31.120 and we need more effort to achieve the performance to improve so this is a conclusion of this
00:32:38.639 method disp report and we introduce
00:32:43.880 the the some other methods uh some other optimization techniques into from Ruby
00:32:51.639 2. Z yesterday I
00:32:58.519 make practice this presentation I I consume the 40 minutes from uh to this
00:33:05.480 right so I want to say there are no time to say so I want to skip but there are
00:33:12.799 10 more minutes so I want to introduce the several optimiz techniques so we
00:33:20.159 introduce a Chron technique for the uh 64 CP operating systems to improve the
00:33:28.679 the floating number performance calculation performance and we prepare the right
00:33:35.399 weight back uh capturing so if you ra on
00:33:40.760 L then the back uh we need to build a back to see the back uh to print out the
00:33:49.320 back but most of case we don't need to get the
00:33:54.679 back if you if you don't see the bace
00:34:00.120 so uh we uh make a a lightweight
00:34:06.200 method and restructure a small uh small restructuring of v machine stocks and
00:34:13.599 instruction secet data and maybe most of people want to know
00:34:20.560 the bit marking garbage cure and require performance Improvement by other people
00:34:27.079 so by but um I don't have a presentation slide to explain that so please ask
00:34:34.520 please ask me after this
00:34:43.040 presentation so this is this needs a long time so I want to skip the the uh
00:34:51.960 details of the technique but we can see the some bemark uh
00:34:58.880 performance improvements so most of uh micro bench uh micro benchmarks don't
00:35:05.760 use uh floating number calculation so some some uh benchmarks us
00:35:12.839 uses some benchmarks use the floating number so we can see the performance
00:35:18.520 improvement with this optimization so I want to skipe the the
00:35:27.079 detail Tech technique of this optimization uh so if you want to know
00:35:34.040 the how to implement this so uh please run the uh itop
00:35:40.640 754 floting number uh
00:35:46.200 layout uh the simply explanation we
00:35:51.920 uh this technique uh Ed the the value 64 bits floating
00:36:01.920 number to 64 bits uh value
00:36:09.760 size and this is a explanation of right with back capturing so back as you know
00:36:16.480 back is string array of string objects so file name method and blah blah blah
00:36:22.839 but uh capture only instruction and
00:36:27.920 translate to string uh only when it is access access so it it uh can
00:36:36.720 uh so cap only capture the instruction sequence data array of instruction
00:36:41.920 sequence data and then when the back are accessed then translate this data
00:36:49.040 structure to the uh the orinal VOR array stting with array
00:36:57.800 so I have uh five time
00:37:03.040 minutes so we we will release uh Ruby
00:37:09.240 2.0.0 next February so we need we need
00:37:14.640 more effort I think so the this graph shows the performance
00:37:21.839 over uh performance overhead analysis the same same graph I explained but this graph is for
00:37:30.400 the aroc and L application uh I'm not sure what what
00:37:37.040 kind of RA application but this in this kind so method dispatch
00:37:42.440 overhead is 10% but most of uh overhead is most of time are consumed at the uh
00:37:52.160 garbage coru and Sh function so like string manipulation or something
00:37:58.160 and at ra application the garbage production time is
00:38:04.160 here here and there
00:38:10.960 uh uh sh so string manipulation or some other uh performance uh some other part
00:38:19.079 consumes many many time so we need more and more other
00:38:26.680 techniques so as you can see the on and uh actual practical application v
00:38:33.280 machine is not on bot so but I think we the mathematic of symol computation the
00:38:39.800 v machine is a matter so I will introduce some other
00:38:46.359 techniqu such as a compil framework or something after the littleit point0 and uh garbage corlection as you
00:38:54.680 know the generational garbage corlection are required but introducing the G generational
00:39:00.800 garage collection makes a uh huge huge number of bugs and we need to the in
00:39:07.680 health so we need more dependable way so I I have an idea so I want to try the
00:39:14.359 the to implement the r uh generational garbage corlection after 2.0
00:39:20.640 released and Par paration so we don't I don't have any
00:39:27.040 time or space to discuss about that so please ask me if you want to know about
00:39:32.640 that so conclusion so there are many many tasks left after even after Ruby
00:39:40.040 2.0 so conclusion is our challenge is has just become so thank you very much
Explore all talks recorded at RubyConf 2012
+46