Summarized using AI

Why JRuby Works

Charles Nutter and Tom Enebo • November 01, 2012 • Denver, Colorado • Talk

The talk titled "Why JRuby Works" discusses the performance and capabilities of JRuby, a Ruby implementation on the Java Virtual Machine (JVM). The presenters, Charles Nutter and Tom Enebo, emphasize the importance of JRuby as a drop-in replacement for MRI (Matz's Ruby Interpreter) while maintaining compatibility with Ruby as a whole. Throughout the presentation, the speakers highlight several technical advantages of JRuby stemming from its integration with the JVM, including effective garbage collection (GC), threading, and access to Java libraries.

Key points discussed in the talk include:
- Compatibility: Ensuring that JRuby is compatible with various Ruby versions to facilitate a seamless transition for developers. Several limitations were noted, such as incomplete coverage of some POSIX APIs and the removal of native C extension support as of JRuby 1.7.
- Performance Improvements with JVM: By running on the JVM, JRuby benefits from the efficient garbage collection mechanisms and optimization strategies employed by JVM, which are crucial for performance, especially in large-scale Ruby applications.
- Threading: JRuby uses native JVM threads directly, allowing true parallel execution of Ruby threads across multiple CPU cores, which leads to improved performance compared to MRI, which serializes Ruby execution behind a Global Interpreter Lock (GIL).
- Integration with Other Languages: The JVM enables JRuby to interface with libraries and code from multiple languages, promoting a polyglot programming environment. This extended flexibility allows Ruby developers to leverage Java’s rich library ecosystem.
- Tooling and Monitoring: JRuby can utilize JVM profiling and monitoring tools, such as JMX and VisualVM, providing deep insights into application performance and allowing for better diagnostics.
- Garbage Collection Efficiency: Comparisons demonstrated that JRuby experiences significantly lower GC times versus MRI, which can struggle with garbage collection in object-heavy applications.
- Invoke Dynamic: The incorporation of JVM's Invoke Dynamic feature allows JRuby to optimize method calls at runtime, leading to performance boosts in applications.
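The invokedynamic point above can be illustrated with a small sketch (not from the talk; class names are made up for illustration): every Ruby method call is a dynamic call site, and on JRuby with invokedynamic the JVM can cache the method lookup and inline the target once it observes which receiver classes actually arrive.

```ruby
# Illustrative only: polymorphic dispatch of the kind invokedynamic optimizes.
class Circle
  def initialize(r); @r = r; end
  def area; 3.14159265 * @r * @r; end
end

class Square
  def initialize(s); @s = s; end
  def area; @s * @s; end
end

shapes = [Circle.new(1.0), Square.new(2.0)]
# Each `area` call below is resolved per receiver at runtime -- exactly the
# kind of call site the JVM can cache and inline under invokedynamic.
total = shapes.map(&:area).reduce(:+)
puts total
```

The same source runs unchanged on MRI; the difference is purely in how JRuby compiles these call sites down to JVM bytecode.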

In conclusion, the speakers advocate for JRuby as a powerful alternative for Ruby developers, particularly those focusing on larger applications where performance is critical. They encourage attendees to test their applications on JRuby and leverage community feedback to further improve this implementation, providing resources for learning and deployment related to JRuby. Overall, JRuby is positioned as a robust, performant choice for Ruby development in environments benefiting from JVM optimizations.
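A minimal sketch of the kind of measurement behind the GC comparisons discussed above: count how many collections an allocation-heavy workload triggers using the standard `GC.stat` API. The workload shape and sizes here are illustrative, not the benchmark from the talk, and the resulting counts differ wildly between MRI and JRuby.

```ruby
# Count GC runs around a burst of short-lived ("young") allocations.
before = GC.stat[:count]

100_000.times do
  Array.new(16) { 'x' * 32 }   # garbage that dies almost immediately
end

gc_runs = GC.stat[:count] - before
puts "GC runs during workload: #{gc_runs}"
```

Running this under both engines (and with `GC::Profiler` or the JVM's GC logging enabled) is a simple way to reproduce the flavor of the talk's numbers on your own application.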

Why JRuby Works
Charles Nutter and Tom Enebo • Denver, Colorado • Talk

Date: November 01, 2012
Published: March 19, 2013
Announced: unknown

There's a lot to love about JRuby. It performs better than just about any other implementation. It has solid, reliable, parallel threading. Its extensions are managed code and won't segfault or interfere with threads or GC. And it gives Ruby access to anything that can run on a JVM, including code written in other languages.

But how does it work? More importantly, why does it work so well?

This talk will dig into the nitty-gritty details of how JRuby takes advantage of JVM optimization, threading, GC, and language integration. We'll show why Ruby works so well on top of the JVM and what it means for you. And we'll explore the future of the JVM and how it will make JRuby even better (and JRubyists even happier).

RubyConf 2012

00:00:15.760 six or seven years, and a few years before that as a hobby, so it's a project
00:00:22.000 close to our hearts. Before we start, I'd like to underscore one point: JRuby is
00:00:30.670 Ruby. We're going to be talking about differences, but we have to underscore
00:00:36.550 the fact that it's very important to us that we're as compatible as we can be with the various versions of MRI, because
00:00:45.250 we want it to be a drop-in replacement. If you go and write a Ruby program, try running it in JRuby, and you run into
00:00:52.629 a bug, you'll probably say "ah, screw JRuby" and not try it for another year, or maybe never again. So it's very important
00:01:01.210 that we try to be as compatible with Ruby as we can. If any of you saw Brian's talk in the last session in here:
00:01:08.070 Ruby means all of the implementations, you know, and that's what we aim for. We want to be a Ruby language
00:01:14.549 implementation that matches as much as possible what the Ruby language does. So
00:01:20.590 that may mean MRI, if we don't have a good specification, or it may mean the more abstract Ruby, but JRuby is Ruby,
00:01:27.400 just like MRI is. We'll be honest: we're a compatibility project, and we have some
00:01:35.740 holes. There's some POSIX APIs that we can't quite cover. We did start the jnr-posix
00:01:42.460 library several years back, and we've been slowly adding missing pieces, so
00:01:48.550 that will continue to get better. We basically dropped native C extension
00:01:54.010 support for 1.7, mostly just because no one's really stood up to maintain it, so
00:02:00.190 if you really have a strong interest in seeing this continue forward, you should come and talk to us later. But it was
00:02:06.790 also kind of a confusing picture, because people would go and install a native C extension for performance or speed, and
00:02:14.190 we would load it, and, Brian was talking about this earlier, we do actually put
00:02:19.959 a lock around calls up to the native C extension, because we don't want two
00:02:25.030 threads trying to call into C space that's not thread-safe at the same time, and that kind of has lackluster
00:02:32.620 performance, and we have to do all the same tricks to pretend that the objects that are floating around are real pointers when they're not actually real
00:02:39.010 pointers. So you end up losing a lot of the performance gain you would have gotten from going to a C extension anyway, and then we'd see a tweet saying
00:02:44.739 "JRuby is incredibly slow", womp womp. And the truth is, there was a
00:02:52.209 JRuby-native equivalent out there that performed well. So C extensions were good for loading certain things that we
00:02:59.319 would have been able to do without C extension support, but it just kind of muddled the picture a bit. There's a few
00:03:05.500 things we'll never be able to do: the JVM is never going to allow us to fork, and we're only
00:03:10.959 going to be able to support a very small portion of what callcc can do. We've
00:03:16.030 also made some decisions explicitly not to follow MRI: we default ObjectSpace to
00:03:22.810 mostly off. So if you go and look, ObjectSpace still keeps track of modules and
00:03:28.569 classes, but not every object in the system. You can turn that on if you want to, but for ninety-nine percent of the
00:03:35.260 use cases you have no need to. We'll talk a little bit later about why we made some of those sorts of decisions;
00:03:40.419 you'll see the numbers. So, JRuby is Ruby, and then some, and the rest
00:03:46.870 of the talk is going to be about the "and then some". So, we should make an announcement:
00:03:52.689 last week we finally put out JRuby 1.7.0.
00:04:01.020 It's been, I think, over a year and a half since 1.6.0; we forked a year and
00:04:06.850 a half ago. We only have two major bullets here: 1.9.3 by default and
00:04:12.900 production ready, and invokedynamic support baked in if you run the right
00:04:18.760 JVM. But a year and a half is a lot of commits, and we probably didn't backport
00:04:26.500 half of them. Yeah, thousands and thousands of changes and improvements in 1.7. If
00:04:32.680 you've never used JRuby, all you have to do is make sure you have Java installed, and you can just install it with rvm.
00:04:39.480 Simple. Try it out, use it, love it, whatever. So, for an open source project
00:04:48.270 in the same mold as most open source projects, we have a group of developers
00:04:54.010 that are committing to our code base, and as an open source project we have
00:04:59.160 dependencies. Well, we have a huge dependency, and it's called the JVM. But
00:05:05.140 another way to look at this is: we might have a contributor like Yoko making
00:05:10.710 commits to our code base to fix an embedding API, but then you might have a
00:05:17.050 JVM engineer like John Rose adding a new optimization to the JVM, so in a
00:05:23.380 sense we have a much broader team. If we look in contrast at MRI, there's
00:05:33.900 a lot more flexibility in that they can make their own subsystems in
00:05:39.130 their own runtime, but we kind of like the idea of letting other people do it
00:05:44.800 for us. And what's great is, by building
00:05:50.050 atop the JVM, we don't have to make our own garbage collectors, we don't have to worry about getting to native code on
00:05:56.470 all the platforms we support, tooling, native threading was built in from
00:06:01.650 the start in Java, and that's not easy to add, and so on and so forth. I
00:06:10.880 mentioned John Rose, so I thought I'd put his picture up there; he's a pretty attractive guy. He's been working on
00:06:18.479 HotSpot for 15 years. A cool fact I just learned is that he invented a programming
00:06:24.270 language, I think it's called C*. I don't know how the star is pronounced, but he made it with Guy Steele. How cool
00:06:31.919 is that? And he's also been deep in the invokedynamic work; I'll talk about invokedynamic later. So we
00:06:41.220 decided to have a thought experiment: what if, in 2007 or 2008, we decided we're just
00:06:47.449 not going to develop JRuby anymore, and we're not going to let anyone else do it? Let's see what happens. That
00:06:55.680 would be JRuby 1.0, basically. Yeah, so JRuby 1.0.3 takes about 23 seconds to go and
00:07:03.479 run red-black tree with Java 1.4. Now, today it's about three times faster with
00:07:11.460 Java 7. This is pretty stunning: we just sit back and basically let the JVM get
00:07:17.340 faster, and JRuby keeps getting faster. Yeah, give me another margarita. But
00:07:23.940 then we thought, oh, what if 1.8.6 had stopped developing today, or back at that
00:07:29.370 time? And amazingly, we actually caught up to 1.8.6 by doing nothing. So I think
00:07:35.909 this underscores the power of relying on the JVM. Let's go a little bit deeper
00:07:42.500 through some of the things mentioned on the previous slide. It's cross-platform: it
00:07:49.110 literally runs everywhere. Standard platforms you've come to expect, but OpenVMS,
00:07:57.560 AS/400... some guy was running Ruby on Rails on an AS/400 talking to Microsoft
00:08:05.729 SQL Server. Or was it Access? It might have been Access. Someone in the audience says
00:08:12.570 yep. We are, people do this. We enable, yes, we enable people to do wonderful things. It's expanded to
00:08:21.130 Android; Android's done a great job of pushing Java back into the mobile space.
00:08:26.280 Actually, who here even knows what an AS/400 is? OK, that's surprising. Only at
00:08:32.919 this conference do I get that many hands; most of the time there's a bunch of blank stares. Here's our open bugs for OpenVMS support. If you want to dive in
00:08:40.810 and fix some of the OpenVMS support, go for it. Or you can just tell us how to get a copy of... oh yeah, we don't even know
00:08:48.370 how to put anything out to test this. I thought we'd get access to an OpenVMS system; I think you need a wheelbarrow full of
00:08:54.490 cash to get it. Of all the old systems that I have in my basement, there's no VAXes or anything else like
00:09:00.040 that, I have no Alphas; I mean, I don't know what I'd even run it on. So we depend on our OpenVMS users. The second
00:09:07.960 great decision of the JVM is that they compiled to a neutral Java bytecode format, so you compile things once and
00:09:14.110 then you can run them in many places. So if you look at this list, these
00:09:19.330 are the Java native extensions that exist right now, and you don't have to
00:09:25.360 compile them when you install the gem. Now, why is this important? Any
00:09:31.000 Windows users out there? It's tough, it's tough installing native C
00:09:37.360 extensions on Windows. Has anybody ever gone to production and had a C extension installation fail? A few
00:09:45.130 people, yeah. Or locally, or, you know, any time: there's all sorts of things that can go wrong in the process
00:09:51.070 of building a C extension, and these all basically just distribute binaries that drop into your gem directory.
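The practical upshot of shipping precompiled Java extensions can be sketched like this (illustrative commands; the exact gem chosen here is just an example of one that ships a Java-extension variant):

```shell
# On JRuby, gems that need native speed ship their Java extension as
# precompiled .jar/.class files inside the gem itself, so installation
# is identical on every platform -- no compiler toolchain required.
jruby -S gem install json
jruby -e "require 'json'; puts JSON.generate('ok' => true)"
```

Compare this with a C extension on Windows or a locked-down production host, where the install step itself has to run a compiler and can fail.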
00:09:57.630 There's 47,000 Java libraries. Now, a lot of you might think, oh yeah, there's
00:10:03.430 45,000 rubygems, but a lot of those 45,000 rubygems are one-offs somebody
00:10:10.060 released once; that sort of pattern doesn't actually happen in Maven, so it's a much more impressive number.
00:10:17.320 There's a lot of joke gems out there as well; you
00:10:23.010 could have a lot of fun with those, and a lot of
00:10:28.230 useful gems out there too. There's a lot of languages out there. When I
00:10:33.450 put this together I kind of threw the COBOL thing in as a joke, but then I realized, well, crap, there's a lot of
00:10:39.360 legacy apps out there; this is a big deal. And polyglot in general is a big deal. I
00:10:47.690 love writing stuff in Ruby, it's a great language, but is it the best language for
00:10:53.610 absolutely everything? Of course not. So use the right tool for the job. The other
00:10:59.910 great thing about the JVM running multiple languages is that they run within the same VM, so you get this holistic
00:11:05.190 view, because language A and language B are both based on the same
00:11:14.250 basic object, same garbage collector, same tools to look at the performance and how much memory is being used, and so forth.
00:11:20.100 And another nice side benefit of polyglot is that every single language likes to go and
00:11:26.700 reinvent their own wheel; well, some of those wheels are awesome, maybe more awesome than the Ruby wheel, so you just
00:11:34.110 have another dimension of choices. All right, I'll take it to the next one: GC.
00:11:40.590 So GC does matter, and it makes a
00:11:45.900 tremendous difference for the performance of an application. Anybody that runs a significant Ruby application on
00:11:51.680 MRI is eventually going to see some GC issues that they have to deal with or tune. I mean, this was basically the
00:11:57.540 primary sell behind Ruby Enterprise Edition: the GC was tuned a little bit better, you could make your
00:12:03.360 own tuning to it, and it was designed to fit a forking process model better. GC is a huge, huge issue for a lot of
00:12:10.770 applications, and Ruby itself is very object-heavy: if you've ever done any object counts of an application running,
00:12:16.590 it's a tremendous amount, like massive numbers of strings, massive numbers of arrays, all over the place. A lot of
00:12:23.250 people say, oh well, you know, it's okay, we'll just keep our processes small and have a bunch of them. Well, if you've got ten processes all spending ten percent of
00:12:30.480 their time on GC, you're wasting 10 x 10 percent of all of the CPU time in the system;
00:12:37.050 you're wasting those resources by having to split it up. So I believe that any application
00:12:44.280 that gets above a significant size is going to have issues with GC running on MRI; there's just no way to avoid it at
00:12:50.400 this point. What we have with the JVM is a wide variety of options, ranging from
00:12:56.070 small applications up to very large applications with large heaps, many, many cores, many threads running at the same
00:13:01.500 time. It scales up extremely well. There are JVM applications out there that are
00:13:06.990 running terabytes of memory, terabyte-size heaps, and having no major issues
00:13:12.540 with it. And these really are the best GCs in the world: if you go back and look at GC research over the past ten
00:13:18.390 years, ninety percent of it is talking about how this would be implemented on the JVM, or implementing it for the JVM.
00:13:25.020 So it's all there and available for free if you run on top of JRuby. Now, just
00:13:30.840 on OpenJDK, which is the one that most people are going to end up touching: there's a parallel collector that uses
00:13:37.080 multiple cores; it stops the world, but it uses multiple cores to clean up stuff as quickly as possible and keep the pause
00:13:42.300 time short. There's a concurrent collector that still stops the world, in parallel, for the young generation, but any old objects out
00:13:48.990 there it'll collect concurrently with the application executing, so you don't see pauses. G1 is intended to be concurrent
00:13:55.380 for young and old. There's also a serial, single-threaded collector if you just want straight throughput as fast as possible
00:14:01.020 for a smaller application. And there's other GCs from other VMs, like the C4 continuously concurrent
00:14:09.360 collector. Crazy. I don't know what the other Cs are, but I mean, it's a real zero-pause garbage collector that
00:14:16.680 you can just drop a JRuby application into and never see a GC pause. That's the
00:14:22.080 sort of thing that we've got. And Tom mentioned it briefly, but the fact that we have the homogeneity of all objects
00:14:27.810 in all libraries and all languages that run alongside JRuby means that they all take advantage of the same GC. We don't
00:14:34.740 have separate memory models, we don't have separate extensions that have their own way of managing memory; it all fits into the same world, and so
00:14:41.160 every piece of code that you're running with a JRuby application shares that same excellent GC. So I've got
00:14:47.970 some demos and numbers here to talk a little bit about GC on the JVM versus other options. This first one is
00:14:55.080 basically doing a lot of heavy GC with a mix of old and young objects, but the
00:15:00.390 heap is steadily growing over time, so we can see what effect a larger heap has on GC times. This is the meat of it:
00:15:07.920 we've got a couple loops, inner loops and outer loops. The inner loops are creating young garbage that's going to be
00:15:14.220 collected fairly soon, if not immediately; the next loop out is creating older garbage, older data in
00:15:22.140 memory, as basically a chain of objects. Credit to Evan Phoenix
00:15:27.899 for coming up with this, for testing young-generation and old-generation stuff. And so I went to run this and get some
00:15:34.440 information on what the GC times and number of GC runs look like on regular
00:15:41.790 MRI versus JRuby, and this is the first result that I got. I actually had to go logarithmic with the vertical
00:15:48.180 scale here to even have JRuby show up, because MRI was spending so much time, and doing so many GC runs, that it
00:15:55.470 completely buried the JRuby numbers. If we look at the amount of time being spent, I
00:16:00.630 didn't have to quite go logarithmic here, but still we're talking about orders of magnitude more time spent on GC in MRI
00:16:07.709 versus on the JVM. And so then I thought, okay, let's look at individual GC times. Ruby 1.9.3 and higher,
00:16:16.140 and JRuby with 1.7, have a reporting feature that will show all the collections, what the heap sizes were, and
00:16:22.980 how long they took, and this is the sort of graph that you get. You can basically look from left to
00:16:29.279 right here and say: this is the maturity of an application, or this is the amount of data an application is going to have
00:16:35.310 to process, and this is the exponential increase in how much time you're going to waste on garbage collection with that
00:16:41.220 same application. Well, JRuby on the JVM is all staying flat along the bottom: however much data you have in
00:16:47.010 memory has much less effect on what GC is going to need to do. And you know, this makes people unhappy:
00:16:54.660 when an application gets larger and more popular and becomes Twitter or becomes somebody else, it starts to have all
00:17:01.199 these pauses and delays in it. And so this is also a graph of the heap size of your application versus the unhappiness
00:17:07.740 of your users, and that's not a good curve. The eyes are off-screen, but you can see the
00:17:13.620 frowns. So, you know, some people say, okay, well, it's not so bad, we're talking in this previous
00:17:19.919 graph about two hundred and some milliseconds for this size heap. Well, what if your heap gets larger? And
00:17:26.370 so, I've got eight gig on my machine, I carried it further, and I even went to Ruby 2.0, which has a bunch of garbage
00:17:32.370 collector improvements, and it just continues to get worse from there. This is now counting in seconds per
00:17:38.070 GC pause: if you have a gig heap you're at about a half second, and then it continues to go up from there. So you
00:17:44.550 split it up, you say, okay, well, I'm going to split it up into multiple processes; well, you've got each of those processes doing one quarter or one-fifth or one
00:17:51.540 tenth of the same amount of work, essentially. So, I mean, the findings when
00:17:57.540 I started running through this were kind of what I expected: with JRuby you get lower, more uniform GC times, because
00:18:03.990 the GC doesn't do a lot of unnecessary work; reduced or eliminated pauses, depending on which type of GC you want
00:18:09.809 to run; and you can go to massive heaps with still-consistent application performance, consistent GC times. Any
00:18:18.660 application that gets above a certain size is basically going to need a VM that has a garbage collector like this, and JRuby is
00:18:24.480 not a bad option. So the next one will be threading, again a thing we mostly get
00:18:29.880 for free from the JVM. There are things that we've had to do to improve Ruby threading and figure out what it means
00:18:34.980 to have concurrent execution of Ruby code, but with JRuby a Ruby thread is a JVM thread is a native thread on pretty
00:18:42.150 much every platform. So you do Thread.new, and you can run that on another core, completely in parallel with the main
00:18:48.990 thread or with other threads that you've got running in the system. So one process, one JRuby process, can actually use all
00:18:54.960 the cores in the system, saturate all the cores in the system, and you don't have to spin up and manage a dozen processes
00:19:00.240 to make that happen. This also means that one server, one server instance, can handle all of
00:19:05.430 the requests for your site. We've got people that run applications on JRuby pushing tens of thousands of requests on
00:19:11.430 eight-core, eight-way systems, with one process, one JRuby process, that maybe is 200 meg, and that's the entire
00:19:17.850 application. A much better way to run applications than having 30 instances of
00:19:22.920 a process, having to manage all of them, and then wasting all the resources across them. So, another benchmark: this is
00:19:29.850 the non-parallel version of it: it's basically just creating a big array and then walking over it a bunch of times. If
00:19:36.000 we parallelize this, just to see what we're actually getting as far as parallel performance using all the cores
00:19:41.190 in the system, we'll just change that from a 10.times loop to 10 threads that each loop individually, and
00:19:47.160 then join them all and see what we've got. So, as expected, with the unthreaded,
00:19:53.880 just the single version, the direct 10.times loop, we get one core that basically saturates, and it's using all
00:20:00.570 the CPU on that one core. If we go to threaded, it's still basically using almost exactly the same amount of CPU
00:20:06.900 time, because none of those threads are allowed to run in parallel: we get the same one hundred percent of one core,
00:20:12.720 just sprinkled across all the cores, because they are native threads, but they can't run concurrently. Now JRuby: pretty
00:20:19.770 much the same, you'll see that we've got one core that's close to being saturated; we've got other cores that are actually doing a little bit more work than in the
00:20:25.980 1.9 case; that's very likely the parallel GC running off to the side while the application does its work. And then, I
00:20:32.520 mean, this is even better than forking; this is actually getting the hyper-threaded cores on the machine as well,
00:20:37.740 saturating an entire CPU just by spinning up a few Ruby threads. This is
00:20:42.840 actually possible to do. If we look at another benchmark, there's a threaded reverse benchmark that we use with JRuby: it basically walks a bunch of strings and
00:20:49.770 reverses them manually, rather than using the C or Java code behind the scenes to do it, and you get a pretty good
00:20:56.640 improvement every time you add threads. We've got various reasons why it's not linear, but you can reduce the
00:21:03.560 runtime of a benchmark, or an application, or a big data processing system, by using threads
00:21:11.550 in JRuby, which you can't do right now in MRI. So, you notice that was nonlinear; it kind of tails off.
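The threaded-reverse benchmark described here can be sketched roughly like this (a hedged reconstruction: the loop shape, sizes, and helper name are illustrative, not the exact script from the talk). On JRuby the worker threads run on separate cores; under MRI's GIL the Ruby-level work stays serialized, so the timings barely improve.

```ruby
# Split the same string-reversing work across N threads and time it.
require 'benchmark'

STRINGS = Array.new(1_000) { 'rubyconf2012' * 50 }

def reverse_all(strings, n_threads)
  slices = strings.each_slice((strings.size / n_threads.to_f).ceil).to_a
  # One thread per slice; Thread#value joins the thread and returns its result.
  slices.map { |slice| Thread.new { slice.map(&:reverse) } }
        .flat_map(&:value)
end

t1 = Benchmark.realtime { reverse_all(STRINGS, 1) }
t4 = Benchmark.realtime { reverse_all(STRINGS, 4) }
puts format('1 thread: %.4fs  4 threads: %.4fs', t1, t4)
```

Run the same file under MRI and JRuby to see the difference the speakers are describing; the speedup is sublinear for the reasons they go into next.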
00:21:17.870 Well, there's various reasons for it. One of them is that we can only get memory from the system at a certain
00:21:24.570 speed; we can only garbage collect objects at a certain speed, because that's also going to be memory access.
00:21:30.150 And it's a proven law (Amdahl's law) that there's only so much parallelism we can get out of a given application: depending on how
00:21:36.660 much of the app you can parallelize, you're only going to be able to get up to a certain amount of speedup, no matter how
00:21:42.570 many cores you throw at the problem. I would love to have a 65,000-core machine, but obviously it's not going to help if
00:21:50.490 I can't parallelize enough of my application. Anyway, okay: tooling. This
00:21:57.450 came up in Brian's talk; this has been a constant pain point for folks that work on Ruby and need to be able to monitor
00:22:03.180 and manage their applications, diagnose problems, profile stuff. So we get all the
00:22:08.250 stuff that the JVM has for free: we get all the profiling tools that are available, and since JRuby is just
00:22:13.770 written in Java and in some Ruby, and your Ruby code eventually compiles down
00:22:19.080 to JVM bytecode, these profilers basically do work with Ruby too. So there's lots of choices for profiling
00:22:24.360 JRuby applications that are essentially just JVM tools. We also have some built-in features like --profile,
00:22:31.140 which does a basic flat profile just out of the box; --profile.graph, which will do more of a graph form; there's HTML
00:22:38.610 output, JSON output, and so on. If you pass the reify-classes flag to JRuby,
00:22:43.980 every Ruby class that you define will be defined as a Java class, or a JVM class,
00:22:49.080 and so heap profiles, memory profiles, and all those tools will show Ruby objects right alongside the JVM objects, Scala
00:22:56.490 objects, whatever is running in the background, so you can get a full profile of memory. And of course there's all
00:23:02.160 sorts of command-line profilers and all sorts of other little tools. Along with profiling we've got monitoring: we've got
00:23:07.680 an application that we have in production, we want to see what's going on; I want to be able to monitor the actual application performance, monitor
00:23:14.640 if there's any problems. Again, we get JMX for free. There are gems that run on Ruby
00:23:20.520 that allow you to define JMX entry points, so that you can push monitoring information, push
00:23:25.720 metrics up to a server. We have tools like jconsole and jvisualvm that can
00:23:31.360 connect to an existing running application VM, a JRuby VM, and give you
00:23:37.240 all sorts of information about it; I'll show that in a moment. Most servers have additional tools on top of that, so
00:23:42.580 depending on which way you deploy, Trinidad or TorqueBox or whatever you use, there are usually other metrics
00:23:48.760 that are exposed as well to monitor the health of the application. And of course New Relic and all the other application
00:23:54.610 monitoring services have JVM support along with Ruby support, so you can get
00:23:59.890 the full stack of monitoring just by using JRuby and hooking up to the existing tools you're used to. So VisualVM
00:24:06.940 is the cool one that I wanted to show a little bit of. It does the basics like CPU, memory, and thread monitoring, showing
00:24:12.940 you what's going on with the application. It can also do CPU and memory profiling of a running application, either by
00:24:19.210 actually instrumenting the code and getting accurate counts, or by sampling and just kind of guessing whereabouts in
00:24:24.370 the code you are, what methods are being called a lot, what objects are being allocated a lot. It has another plug-in
00:24:30.010 called Visual GC that actually lets you see, live, over time, what the different
00:24:35.230 heaps in the JVM are doing, and you can see that they'll go up, they'll increase in size, they'll get collected, they'll
00:24:41.560 increase in size, objects will get promoted to another generation. And heap analysis: doing memory dumps and seeing
00:24:48.040 where the objects came from, why I have so many of these objects, and digging into the actual in-memory structure of
00:24:53.710 your application, all built in. And this is a standard tool that comes with OpenJDK. So here is the first screen
00:25:00.040 that you see: it basically shows the basics of what JVM we're running on, what the command line was, what the command line
00:25:05.770 to JRuby was, some settings, additional JVM arguments and properties and whatnot.
00:25:12.480 This is the basic monitoring page, which has CPU information, shows you the
00:25:19.270 overall heap size and the garbage collection, and you can see the memory occupation that's
00:25:24.310 going up and down there, and you can perform a GC just by hitting the button. Yep, you can perform a GC and see if that
00:25:29.410 solves some of your problems, or reduces it down if you want to do a heap dump, for example. It's monitoring loaded JVM
00:25:35.440 classes, monitoring the threads that are available; this particular run probably just had the 10 or so JVM
00:25:41.529 built-in threads in there. In the JRuby one, you can monitor what all the different threads are doing; the only Ruby thread
00:25:47.350 here is main, which is the second from the bottom. You can see JRuby has one JIT
00:25:52.450 thread that's running there; the JVM itself has a few threads for doing remote connections, for doing signal
00:25:59.139 management, finalization, things like that. But you can see whether they're active, whether they're idle, you can see what they're blocked on, you can see if
00:26:05.230 they're doing I/O or if they're waiting for a lock; all of that information will show up in here. This is the Visual GC
00:26:11.470 pain so this is the standard parallel collector on the JDM for which there's
00:26:17.200 in eden space where new objects are created to survivor spaces where they're promoted as they get older and then the
00:26:23.169 old generation where you expect objects to sit for a long time and be ignored for the most part by the garbage collector but you can see all of the the
00:26:30.789 CPU information you can see how occupied it is and if there's a leak you'll see it steadily increase over time and maybe
00:26:38.080 it's old generation data maybe you've got a problem in the young generation maybe there's something classes that are being loaded all of that information is
00:26:44.259 available here too and then the final couple here I mentioned that you could do heap dumps and analyze them this is
00:26:51.070 with that reify flag so we can see normal Ruby objects here showing up alongside regular Java objects I've
00:26:57.399 filtered out the Java stuff but this is actually a Rails application that was just booted and most of the
00:27:04.389 objects that are in memory are from Ruby gems at that point Rails does a pretty good job of lazily standing up the rest
00:27:09.879 of the system but there are about 180 gem specifications that have been created for this app looks like it
00:27:16.090 might be running two instances of JRuby in this particular JVM and you can just go right down the line see all of the
00:27:22.029 Ruby objects in memory figure out where they're coming from and maybe fix a problem if there's too many similarly
00:27:28.750 profiling this is a CPU profile it's a little bit cut off on the side but we've got an AbstractController view
00:27:34.990 paths class method something or other a set-cookie headers call in there now these kind of get sort of mangled names from
00:27:42.100 JRuby but they're still fairly readable what we're hoping to do is possibly do a plug-in that will unmangle all these
00:27:47.910 as well but the tooling is available and you can profile a JRuby application with
00:27:53.190 all the standard JVM stuff well there is fasthat yeah there is another project that's working on adding that
00:27:59.340 sort of demangling a little bit more a really specific layer on top of these JVM tools so more of that coming all
00:28:08.580 right well um up to this point we've only actually talked about the JVM so
00:28:13.860 really wasn't it everyone loves talking about the JVM messaging is awesome WORA
00:28:19.290 write once run anywhere yes of course that still gives me nightmares um so one
00:28:26.160 of the main features of JRuby is that you can actually script Java classes as if they are Ruby classes and work with
00:28:32.760 them so we're just going to port a few snippets of this JavaFX clock
00:28:40.260 application JavaFX is sort of like a pimped version of Swing it has vector
00:28:47.940 graphics for this particular clock app the dials go around if you dragged
00:28:53.250 it larger it would draw larger and wouldn't get jaggy so there's three
00:28:59.250 snippets here and I'm really hoping that oh poop well you can there's an mh dot
00:29:09.630 and then oh god you can't see the center okay well I'll describe
00:29:15.750 this because there's nothing we can really do about it especially since it's
00:29:21.270 like 10 slides the top snippet here is the code for actually
00:29:27.390 drawing the minute hand so we're creating a path we're setting its fill
00:29:33.000 to black and then we have some relative drawing commands if anyone's ever
00:29:38.670 written PostScript it's almost identical to PostScript and so this is the code
00:29:44.880 that knows how to draw a minute hand in the middle we define an event handler this event handler is actually a Java
00:29:51.870 interface and we're implementing it as a Ruby class the actual handle method is
00:29:58.560 just calling this method called refresh which updates all the hands on the clock because that's the only part of the clock that
00:30:05.260 moves and at the bottom we set up the timeline on the first line we say we
00:30:11.560 want the timeline to go on forever this is kind of the temporal part of the application on the next part we say once
00:30:18.910 a second send an event to the handler I provided which we know will go and call
00:30:24.280 refresh then we play it so if we look at this code you know unfortunately I can
00:30:30.730 see this better than you that first line is setFill and those next lines are gets and they're camelCased it really does
00:30:40.110 mostly look like Java at this point it's kind of a mutant but we provide a whole
00:30:47.500 bunch of shortcuts to actually get rid of this artifact the first thing we do
00:30:52.690 is allow you to get rid of get on getters so now instead of getElements it's just elements and we also do the
00:31:01.060 same thing for setters so now we can do fill equals black and also camelCase is
00:31:09.520 really ugly in the Ruby space unless it's a class name for some reason um but
00:31:14.650 we can snake case it so cycleCount is now cycle_count again it's looking more and more like Ruby we
00:31:24.400 have a special feature that if the last argument to a method expects an
00:31:30.250 interface you can just pass a proc in and that proc will say okay I'm that interface so now we got rid of that
00:31:36.820 entire class and well how many characters is that it's ten characters it's looking a lot more like
00:31:44.650 Ruby now another thing that we do is we add common adornments to basic Java
00:31:50.890 classes the data elements is actually an ArrayList or well it's actually a
00:31:57.010 java.util.List we alias add to be << to make it behave more
00:32:02.140 like an Array we define an each method on it we include Enumerable into it you'd
00:32:07.300 have a hard time figuring out whether you were actually using a Ruby Array or a Java
00:32:12.509 List a thing I really like about this example is that every single thing on here is a Java method call it's
00:32:20.219 calling into not Ruby code but everything looks like Ruby code it all feels right like Ruby code and it works
00:32:26.789 pretty much like you'd expect you actually can write Ruby code that calls into any of these libraries without any extra magic and it still feels like Ruby
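The getter, setter, and snake_case shortcuts described here can be sketched in plain Ruby, runnable under MRI with no JVM. `JavaBean` and `RubyishProxy` are hypothetical stand-ins for a real Java object and for JRuby's built-in name translation, which JRuby performs natively on Java objects rather than through a proxy like this:

```ruby
# A plain-Ruby sketch (an illustration, not JRuby's actual mechanism)
# of the name translation JRuby applies when calling Java objects.
class JavaBean
  def getCycleCount; 42; end
  def setFill(color); @fill = color; end
  def getFill; @fill; end
end

class RubyishProxy
  def initialize(target)
    @target = target
  end

  # fill = x      -> setFill(x)
  # fill          -> getFill
  # cycle_count   -> getCycleCount (snake_case to camelCase)
  def method_missing(name, *args)
    base = name.to_s.sub(/=\z/, '').gsub(/_([a-z])/) { $1.upcase }
    cap  = base[0].upcase + base[1..-1]
    if name.to_s.end_with?('=')
      @target.send("set#{cap}", *args)
    elsif @target.respond_to?("get#{cap}")
      @target.send("get#{cap}")
    else
      @target.send(base, *args)
    end
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

bean = RubyishProxy.new(JavaBean.new)
bean.fill = :black     # calls setFill(:black)
bean.fill              # reads back via getFill
bean.cycle_count       # calls getCycleCount
```

The real implementation lives inside JRuby's Java integration layer and also handles the proc-to-interface conversion mentioned above; this sketch only shows the naming side of it.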
00:32:33.899 so the truth of the matter is some Java APIs are much more amenable to just using shortcuts and looking okay many of
00:32:40.979 them look like crap still because they're based off of Java idioms that just don't look nice in Ruby because we
00:32:47.519 have things like closures here's a project that I wrote called Purugin it
00:32:54.329 allows you to make modifications to the Minecraft game it's based on a Java API
00:32:59.849 called Bukkit anybody play Minecraft here Minecrafters yeah do it and
00:33:06.839 write plugins you can script it in Ruby if you want but the idiomatic code
00:33:11.999 in Bukkit is pretty gross it's really powerful and it's a really nice set of APIs but doing it the Ruby
00:33:19.469 way versus doing it the Java way for this particular library is pretty stark
00:33:24.529 this spring I actually made a Logo interpreter for Minecraft and I said
00:33:30.359 what the hell let's show that instead oh that was a good
00:33:37.649 choice yeah it's more centered but the top's gone okay what can you do the
00:33:44.249 important stuff is in the middle so this draws a pyramid at the very top we see
00:33:49.949 this layer do that's defining an action after this is defined whenever you go
00:33:55.949 and specify layer it just executes what's in it if we actually look at that layer
00:34:01.799 we can see four times we go forward a certain amount and then we turn 90
00:34:07.949 degrees so if you visualize this it's going to go and draw a square
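The `layer` / `forward` / `turn` walkthrough can be sketched in plain Ruby. This is a simplified stand-in, not the real interpreter: the actual project drives Minecraft through the Bukkit API, while this hypothetical `Turtle` just records the path it would trace on a 2D grid:

```ruby
# Minimal plain-Ruby sketch of the Logo-style DSL being described.
class Turtle
  attr_reader :points

  def initialize
    @x, @y   = 0, 0
    @heading = 0               # degrees, 0 = facing along +x
    @points  = [[0, 0]]        # positions visited so far
  end

  # Define a named action, like `layer do ... end` in the talk.
  def action(name, &block)
    define_singleton_method(name, &block)
  end

  def forward(distance)
    radians = @heading * Math::PI / 180
    @x = (@x + distance * Math.cos(radians)).round
    @y = (@y + distance * Math.sin(radians)).round
    @points << [@x, @y]
  end

  def turn(degrees)
    @heading = (@heading + degrees) % 360
  end
end

turtle = Turtle.new
turtle.action(:layer) do
  4.times { forward(3); turn(90) }   # four sides: traces a square
end
turtle.layer
turtle.points   # visits (3,0), (3,3), (0,3) and returns to (0,0)
```

`define_singleton_method` runs the block with the turtle as `self`, which is one simple way to get the "define an action, then just say `layer`" behavior the slide shows.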
00:34:13.609 everybody's done turtle Logo at some point right has anyone not seen Logo or maybe they don't know what it is okay
00:34:20.429 sweet yeah Logo really is a popular children's programming language
00:34:26.140 pivot is crazy I added some more primitives for doing absolute
00:34:32.600 directions and I'm not going to walk through it but it kind of goes and positions you up at the next tier and if
00:34:39.740 we look at the bottom we just kind of loop through several times we draw a layer we pivot up and we draw another
00:34:45.020 layer sweet oh it's video time there we
00:34:50.270 go now Logo normally uses a turtle but there are no turtles in Minecraft yeah
00:34:55.850 there's a bug here you can kind of see that the chicken's drawing it but it's kind of stuck in it yeah it used to be on
00:35:02.360 top of it and then I forgot I installed this bouncing plugin so but there you go
00:35:08.870 that bit of Ruby code basically did all that so if you want to write a Ruby script or a Logo script that
00:35:16.610 creates your entire farm you certainly could do that and if we really want to tie this back to scripting the point is
00:35:22.280 there's some fairly gross Java code underneath this and it took like two
00:35:28.310 hundred lines to make the Logo interpreter and that's probably only using about fifty or a hundred lines of Ruby code on top of this
00:35:35.630 complicated Java library so it's very very easy to do all right and it's
00:35:42.350 fun okay another goodie is Java native
00:35:50.090 extensions I hate all native extensions I wish everyone only wrote in Ruby if
00:35:56.900 you replace shooting with writing and penguins with C this one makes sense
00:36:06.040 you should always be asking yourself this question if you can write it in Ruby and it's not like a hundred or a
00:36:11.960 thousand times faster screw that just keep it in Ruby if it's something you
00:36:17.120 just can't do in Ruby then maybe you can script it in Java if you're in JRuby but
00:36:22.340 then that's not compatible so maybe you should use a late-binding thing like FFI but then that probably won't work
00:36:27.530 on Windows so you have to make some decisions and if none of that works then make a native extension just
00:36:36.230 briefly on native extensions again this object isn't some imaginary thing by
00:36:42.110 having it in Java we can totally see it with our tools and understand what's going on we're not just calling this opaque entry point into C and saying
00:36:48.800 what the hell happened hope it worked how much memory am I using okay so the last
00:36:55.820 major section here my favorite let's talk a little bit about the performance of JRuby today and hopefully where we're going to
00:37:02.750 be able to take it we showed you this before we could have just sat back and let the JVM get faster
00:37:07.910 and JRuby would have gotten faster right along with it and this is cool we're not going to just do that but it's cool
00:37:13.580 that we get this all for free but you look at where we are with Java 7 there and JRuby 1.0.3 and then we look and
00:37:20.930 compare across Ruby implementations over time or JRuby versions over time and so
00:37:25.940 there's our 1.0.3 which was faster than Ruby 1.8 on the far left and then a massive improvement in the 1.1 series
00:37:33.260 lots of incremental improvements along the way so we've got this severe performance bottleneck what is that all
00:37:39.230 about why did we have that well if we turn on some GC logging to see what's
00:37:44.870 actually happening we'll see tons of GC running for the 1.0.3 version of JRuby a
00:37:50.360 lot more than we would expect in fact there's even a full GC in there that actually has to clean the entire heap out if we run with JRuby 1.7 we get one
00:37:58.430 collection of about seven milliseconds there ten milliseconds eight milliseconds and obviously it's much faster the JVM
00:38:05.420 isn't wasting as much time JRuby isn't wasting as much time and now there's another outlier another anomaly here if
00:38:11.870 we pull the 1.0.3 number off of there so we have a better graph well this is actually the JVM feature that's giving
00:38:18.050 us this this is invokedynamic that's helping us do better so taking a step back performance-wise JRuby
00:38:25.670 compiles Ruby down to JVM bytecode the JVM compiles that bytecode down to native code and it's some of the best JIT
00:38:32.390 technology in the world it's been optimized for years to be able to optimize the code to the fastest
00:38:37.580 possible native instructions and invokedynamic makes it even better so if we want to apply this
00:38:43.580 to Ruby let's look at how to optimize Ruby first of all we need to do less work this is mentioned in Koichi's
00:38:50.030 talk it's mentioned in other implementers' talks do less work don't spend as much
00:38:55.089 time setting up call frames reduce the overhead of doing dispatch caching methods avoiding hash hits and things
00:39:02.470 like that reducing memory overhead internally try not to allocate as much data in memory if you can optimize it at
00:39:08.349 the VM level so that Ruby code doesn't create as many objects and then once you've got that sort of thing
00:39:14.140 try to find more static patterns in the code an application that runs long enough is going to be mostly static in
00:39:19.660 how it behaves and we can take advantage of that to optimize if we look at what the JVM does it's very similar it's
00:39:26.500 going to profile code watch how it executes and then optimize based on what the hot paths are look for the hot
00:39:33.099 paths through the code the branches that are followed versus the branches that aren't and make sure that the
00:39:38.500 hot code is the fastest most optimized code it does that by inlining code together it does that by optimizing
00:39:46.089 larger sections of code and inlining calls into one another it does that through escape analysis where it can
00:39:51.369 eliminate a lot of transient objects and this is newer but getting better with every release so let's walk through a
00:39:57.609 whole contrived example here we've got a little script that's going to loop a few times and call the invoker method which
00:40:02.920 calls foo and foo creates an object now what do we want to happen as far as optimizing this well first of all we
00:40:09.010 want foo to inline into invoker if that's the only foo that's ever being called there it should essentially boil down to
00:40:14.770 this code right well we're calling invoker inside the loop so we want invoker to also inline and basically
00:40:21.160 just have that object construction inside the loop and now it turns out that we're constructing this object but it's not doing anything no one uses that
00:40:27.970 object so that should be able to just go away right that's what we want the JVM to be able to optimize okay so now we've
00:40:34.750 got this loop that does no work other than incrementing i all the way up to 10,000 well this is the code that we
00:40:42.369 actually should have at this point the loop doesn't do anything and it should optimize down to this but then no one reads i either after this point because
00:40:49.390 the loop is gone so this is what we want that code to optimize to and this is
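The contrived script being walked through can be reconstructed roughly as follows (a reconstruction from the spoken description; the exact names and shape on the slide may differ):

```ruby
# Reconstruction of the contrived inlining example: a loop calls
# invoker, invoker calls foo, and foo allocates an object nobody uses.
def foo
  Object.new          # transient object, a candidate for escape analysis
end

def invoker
  foo                 # we want foo inlined into invoker...
end

i = 0
while i < 10_000      # ...and invoker inlined into the loop body, after
  invoker             # which the dead allocation, then the empty loop,
  i += 1              # then the unread counter can all be eliminated
end
```

Under the optimization story described above, a sufficiently smart JIT can reduce the whole program to a no-op, which is exactly why the benchmark is interesting.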
00:40:55.210 what we can get with the JVM with invokedynamic now this is why it's such a big
00:41:00.819 deal for us so this is a Java 7 feature it's getting better with each release it's better in the Java 8 builds now too it basically
00:41:08.390 allows us to teach the JVM about Ruby and how it works dynamic calls get inlined constants become really constant
00:41:14.780 and all of these optimizations that you get for Java apply to Ruby too so does it work we released JRuby 1.7 with invokedynamic
00:41:22.099 support so we'll go back to our little script we've changed it a little bit we took the Object.new out to
00:41:28.070 simplify it we've got a loop to make sure we're running this code and letting it get optimized some of the many flags
00:41:35.510 for the JVM are PrintInlining and PrintCompilation let's just see the JVM actually compiling the code that we've
00:41:41.810 got let's just see how it inlines the code and what decisions it makes about how to optimize it so we go back to our
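An invocation along these lines would surface the output being discussed. This is an illustrative sketch, not a command from the talk: `bench.rb` is a placeholder, JRuby forwards `-J`-prefixed options to the underlying JVM, and on many HotSpot builds `PrintInlining` is a diagnostic flag that also requires `UnlockDiagnosticVMOptions`:

```shell
# Hypothetical invocation: enable invokedynamic in JRuby 1.7 and ask
# HotSpot to report its compilation and inlining decisions.
jruby -Xcompile.invokedynamic=true \
      -J-XX:+UnlockDiagnosticVMOptions \
      -J-XX:+PrintCompilation \
      -J-XX:+PrintInlining \
      bench.rb
```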
00:41:47.570 little progression here this is the interesting part of the inlining output from the JVM and since this is a little
00:41:54.710 difficult to follow we'll walk through it step by step here so this block zero at the bottom is basically that
00:42:01.040 times block running a certain number of times in the JVM and this is the base point from which the JVM is optimizing
00:42:06.980 our stuff we can see the less-than operation RubyFixnum op_lt and
00:42:13.490 we see the isTrue for the loop that's being inlined as well it's hot it's got to be in there it's got to be
00:42:18.890 optimized together we can see the invoker call actually gets inlined like we expect it to and then under the
00:42:25.160 invoker call we can see that the foo call also is getting inlined and this is all going to be generating the
00:42:30.470 one native piece of code here so there's foo that's all pulling that code in and
00:42:35.540 then finally the last operation inside this block is doing the increment of i and this is also getting
00:42:42.200 inlined everything here inlines and becomes one piece of native code that's sent to the processor and we'll see a
00:42:48.230 lot of these optimizations more and more of them as they make invokedynamic optimize better and better so it does
00:42:54.500 actually help we're inlining stuff we're doing what we think we should be doing but do we get the performance we want what we're hearing from more and
00:43:01.010 more people is that JRuby 1.7 is definitely faster even without invokedynamic it's faster but invokedynamic just makes it
00:43:06.890 far and away the fastest we've got right now so let's see here are a bunch of
00:43:13.160 benchmarks base64 is just base64 encoding Richards is kind of a VM simulator it's things that are difficult
00:43:19.109 for VMs to do like polymorphic dispatch neural is a little CS101 neural network
00:43:24.210 Mandelbrot is just going to generate a Mandelbrot fractal in memory and then
00:43:30.269 red-black is just a red-black tree it creates a big tree walks it deletes elements and so on so this is JRuby
00:43:36.059 on Java 6 without invokedynamic and we're pretty good you know this is compared to Ruby 1.9.3 and faster in most cases for
00:43:44.460 these particular benchmarks and if we add in invokedynamic just with what
00:43:49.710 we've done today just what we've got in JRuby 1.7 we're multiple times faster on some scripts five to ten times
00:43:56.190 faster without doing anything other than using a newer JVM and turning invokedynamic on smoothsort was one that came
00:44:04.470 up recently Chuck Remes implemented a port of a C++ version of Dijkstra's
00:44:10.109 smoothsort and so we looked at it and we did some minor changes in JRuby tried to optimize it to make sure it runs fast I
00:44:17.130 ran it against Ruby 2.0 and this is built today so this should have at least all of the
00:44:22.230 optimizations that are in Ruby trunk today and again it's significantly
00:44:27.539 faster in the JRuby case all right so what about Rails that's the big one that
00:44:33.029 everyone cares about and we're running close on time here so here's one
00:44:38.720 testimonial that it runs Rails a lot faster than 1.9.3 here's some of the TorqueBox guys they've done some
00:44:45.029 measurements where they show latency is lower throughput is better other metrics
00:44:50.279 like CPU usage you're using less CPU time you've got more free memory this stuff actually does work all right so
00:44:58.190 wrapping up it's your turn we want you to try your apps on JRuby try your
00:45:03.779 libraries on JRuby if you're not testing your library on Travis you should be and if you're not testing it on all the
00:45:10.049 implementations you should be so please turn on JRuby turn on Rubinius turn on whatever other implementation you can get access
00:45:15.660 to let us know what you think and help us improve JRuby if there's edge cases that we don't support or if there's
00:45:21.239 problems tell us we want to fix them we want to work with you and is anybody interested in a BoF this evening
00:45:27.089 that'll just be like questions and answers and office hours kind of stuff all right there's a few folks so we'll hang out
00:45:33.300 in room C which I guess is equivalent to track 3 at the far end of the
00:45:40.260 rooms whatever we'll be in there around six-thirty and just hang out until seven-thirty or until we feel like we're
00:45:46.230 done all right the last few plugs there is a Using JRuby book we're going to be updating it but it's still an
00:45:51.720 excellent book right now for getting into JRuby and getting the details of it Deploying with JRuby if you want to take
00:45:57.420 the next step and put an application in production excellent book covers a lot of different ways of deploying applications and that's all we have and
00:46:05.400 I don't think I have time for questions if you want to grab us in the hall or come to the BoF we'll be here all week
00:46:11.310 so thanks very much
Explore all talks recorded at RubyConf 2012