00:00:14.900
my name is Bryan Liles I'm from Baltimore and I'm here representing Thunderbolt Labs but I'm not talking
00:00:22.439
about Thunderbolt Labs today we're going to talk about Ruby simulators and um
00:00:28.199
to get started with this this is a talk about writing simulators
00:00:34.320
in Ruby now you might say to yourself who would want to write a simulator in Ruby I mean it sounds pretty
00:00:40.860
Preposterous Ruby is not known as a great scientific or mathematical
00:00:46.379
language a lot of top Minds who are actually creating this kind of software don't use Ruby or really they don't even
00:00:52.559
use a lot of general purpose programming languages anyways they use things like Mathematica or crazy things like R but
00:00:59.280
you know what we're crazy people so we're going to write a simulator in Ruby or at least talk about writing
00:01:04.440
simulators in Ruby so why Ruby well the first thing this is RubyConf I'm sure everyone in here loves Ruby I love it
00:01:11.880
it's actually one of my favorite languages probably it's like one two with my favorite languages so I like to
00:01:18.840
code in Ruby because you know Ruby is very expressive I have not found anything ever that I've tried to code
00:01:26.700
that I could not actually just sit down and Hammer out in Ruby I'm actually I've done a lot of java and I've done a lot
00:01:32.759
of other languages and sometimes I scratch my head trying to figure out exactly how how would I actually codify
00:01:39.420
this idiom in code another thing I like about Ruby is there's no run compile loop I mean you
00:01:46.020
write the code and you run it and if it breaks you fix it and you run it again
00:01:51.079
there's not a lot of setup you don't there's no Linker there's nothing like that and you don't need I mean if you're
00:01:56.579
using MRI and you just have Ruby on your machine or using jruby you don't need much else to get Ruby up and running
00:02:03.060
so the next thing about Ruby is everything is an object and I really just enjoy this
00:02:10.380
fact that I'm going to model the world in OO and I'm just going to apply my OO hammer everywhere I can because OO is
00:02:17.520
the best way of doing things and I'm just kidding here so let's get into past the introductions
00:02:24.060
and talking about the building blocks of simulations so when you build a when you're building
00:02:29.520
a simulator um what we're actually doing is taking these Concepts called models and we're
00:02:35.819
giving them inputs and they're going to spit out outputs and we're going to use multiple models to actually
00:02:41.239
model or actually build um an effect and we're going to actually reason on the effect that was built so
00:02:48.120
before we talk about that there's some vocabulary words I hope you guys brought pencils and internet so you can actually
00:02:54.060
look up these words the first word is deterministic and I thought the best way to show what like a deterministic model
00:03:01.080
would be by actually writing some Ruby code that has that actually isn't a real model
00:03:06.840
um we've all written code like this what this is is a model of the world and
00:03:12.599
and it has a method answer to life and what makes this deterministic is that no matter how many times you run the answer
00:03:18.480
to life method on world model you're always going to get 42. very simple example here and continuing on models so
00:03:26.819
models can have inputs like I said so in this case we have a triangle class and we want to solve for the hypotenuse and
00:03:32.940
I hope everybody in here knows the Pythagorean theorem so you know hopefully you can check my math
00:03:38.580
so a squared plus b squared yes equals c squared and you notice I'm I'm giving
00:03:44.519
two inputs um the length of a the length of B and I'm taking the square root after adding them up and you will
00:03:51.540
always once again get the same answer so once again this is deterministic
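The slides themselves are not in the transcript, so here is a minimal sketch of the two deterministic models as they are described. The class and method names (WorldModel, answer_to_life, Triangle, hypotenuse) are guesses based on the spoken description, not the original slide code.

```ruby
# Hypothetical reconstruction of the two deterministic models described above.
# Names are assumptions; the original slide code is not in the transcript.
class WorldModel
  # No inputs and no randomness: the result never changes.
  def answer_to_life
    42
  end
end

class Triangle
  def initialize(a, b)
    @a = a
    @b = b
  end

  # Pythagorean theorem: c = sqrt(a^2 + b^2).
  def hypotenuse
    Math.sqrt(@a**2 + @b**2)
  end
end

puts WorldModel.new.answer_to_life
puts Triangle.new(3, 4).hypotenuse
```

Calling either method again with the same inputs gives the same result, which is what makes both models deterministic.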
00:03:57.659
so here's another example um and to talk about so one of the
00:04:03.599
things that I do with with modeling is um we are actually building models of
00:04:09.420
infections and things like that but infections are boring and gross but you
00:04:14.519
know what um I bet when you were small cooties was fun and and it's funny and
00:04:20.160
I'll tell a little story here so I gave this I gave another version of this talk in Belgium where you know everybody speaks Flemish and not English and I put
00:04:28.199
a I put a slide up there it said cooties on it and they said who and actually people were Googling while
00:04:33.720
I was talking to figure out what cooties were you know I'm glad I'm back in the United States where you guys actually know what cooties are so um
00:04:41.160
so once again we're talking about deterministic models models that there's no Randomness in these models you give
00:04:46.440
them input if you give them the same input you will always get the same output and and we're actually talking more about our simulation now so there's
00:04:53.400
a cost for cooties so you know what if this side of the room um actually gets cooties you know it's
00:04:59.639
going to cost like ten dollars to get rid of them and give everybody their cootie shots so these are things that we um
00:05:05.520
want to model so um so our models won't be deterministic there's another word
00:05:11.400
another vocabulary word and I hope I spelled this right I'm sure I did it's stochastic and stochastic means that these are
00:05:17.280
models that have a little bit of Randomness in them and I only have one slide on this because
00:05:22.979
um I think I can illustrate this pretty succinctly here so what we have to do is
00:05:29.220
that so if this guy right here in the front row and this other guy here in the front row the guy on the left with the
00:05:34.860
gray sweatshirt on here has cooties what is the percentage chance this guy
00:05:41.400
here that's two seats away from him is going to get cooties I'm sure it's pretty high but according to my model
00:05:46.860
here um we actually have a rand so every time you run this there's actually a chance where you won't get it and
00:05:52.259
there's a chance that you will get it notice the chance is one tenth of one percent so it's not very high but he's
00:05:57.360
been looking at him the whole entire time so I'm sure there'll be lots of chances for transmission
00:06:04.680
so um now we are experts in models and and I want to say this very very simply
00:06:10.440
um models we just modeled the world what we are doing is just coding what we see and what we know and mathematicians will
00:06:17.039
actually have large amounts of differential equations and they use Mathematica and it takes from what I hear it takes minutes and minutes and
00:06:23.160
minutes to run this but they don't have to be that they don't have to be that hard and like I said Ruby's expressive
00:06:29.039
everyone in this room even if you didn't really understand Ruby per se or you're a ruby like a like
00:06:35.460
a a neophyte you understand what's kind of what's going on here and like I said the expressiveness is the win
00:06:42.479
so let's talk about the Ruby that I like um MRI is it's great
00:06:48.000
um with 1.9.3 and the latest releases of 1.9.3 you have fast run cycles for your tests you have a lot of gems out there but the
00:06:55.740
problem with MRI is I just don't understand its garbage collector um
00:07:01.440
and I just can't get my head around threading and a couple other things so let's talk about jruby so what do I like
00:07:07.740
about jruby it's fast and you know I have to give kudos to the jruby team over there um 1.7 the release
00:07:14.759
it is way faster I mean startup is kind of slow still but you guys know this but when it gets up and humming it cooks and
00:07:22.199
here's some proof because you know we all like to use micro benchmarks to prove um all of our cases here
00:07:28.620
so um actually this slide is borrowed from um actually a slide later in the
00:07:34.080
talk where I actually showed the code of big arrays so what's really what's going on here is um you'll notice that I've
00:07:39.180
actually run it twice and the three so the arrow and the three is actually my prompt the three means I'm using MRI on
00:07:44.460
1.9.3 and the little diamond means I'm using jruby I'm using RVM to switch between rubies and the first one and the
00:07:51.960
first one you'll see that the first line of the run is actually populating I think it's a million arrays and then
00:07:58.919
querying it randomly a million times and the second time is properly populating a million hashes and querying it randomly
00:08:05.819
a million times so you notice up top it's 42 and 141 but notice at the bottom
00:08:10.979
it's 41 and 73. um I you know benchmarks micro benchmarks do lie but come on that's
00:08:17.819
twice as fast almost so I mean that's a big deal and that's why we are actually pursuing jruby for
00:08:24.300
this exercise and I know everybody likes pictures I kind of will explain what this picture means and and later on in the talk but
00:08:31.379
look at the slope of this line This slope of the line is um actually an old version of our simulator and you'll
00:08:37.020
notice and it's actually because our simulator has iterations so these are actually tracking the time of the
00:08:42.060
iterations and notice the slope goes kind of up and there's some outliers so I drew a little um trend line so you can
00:08:47.580
actually see the slope and line and you notice towards the end I was actually getting into Ruby's garbage collection
00:08:52.740
so times were getting so that's why times are skewed off the line so same thing with jruby first thing you'll
00:09:00.660
notice is that the slope is much lower one thing you'll notice at the on the absolute left part of the graph what
00:09:07.440
that actually is does anyone know what that is take a guess of what that is on the why there's a lot of um dots on the
00:09:14.339
left side and they go up why they're not on the trend line does anyone know what that might be
00:09:19.920
what did you say that is the JIT warming up so notice after the JIT warms up it cooks I mean it
00:09:26.820
really does move quickly and you know having a nice JIT and I'll and I'll be
00:09:32.100
frank with you I have not tried this in Rubinius um no no there's there's real there's
00:09:37.140
technical reasons why I haven't tried this in Rubinius but I think even with Rubinius with a real working JIT I mean
00:09:43.440
we are getting some real performance gains and I'll tell you this code and this code was actually the same exact
00:09:49.620
code just one was running with jruby and one was run with MRI so another thing I like about
00:09:55.019
um jruby is the jvm the jvm is a lot of smart guys over a lot of years writing a
00:10:02.100
lot of neat code I don't understand it I don't understand all the ins and outs of HotSpot I don't understand all the ins and outs of garbage collection I do know
00:10:08.820
that um it uses all the memory you have um
00:10:14.160
so um right here I have one of these newfangled MacBook Pro retinas and I got it with the 16 gigs of memory I actually
00:10:20.640
can I've never in my life said I'm going to run a process on my Mac that will use all the memory on my box I just wrote
00:10:27.420
One so and another thing is um it uses all the cores and not to say that Rubinius
00:10:33.839
and MRI I'm not going to talk about Rubinius anymore because I'm not picking on Rubinius I'm not and I'm not picking on MRI but MRI can kind of use all the
00:10:41.040
cores but um good old um global interpreter lock kind of limits you to one core so let's
00:10:48.540
let's dig into that so whenever you have um things to execute on MRI it kind of
00:10:54.180
looks like this so you got so each one of those orange blocks is a new instruction so what
00:11:00.540
happens when you run Thread.new well not quite what you would hope so what happens is it does actually allow for
00:11:07.680
parallel execution it's just on one core and who here actually has a one core machine that you code on
00:11:14.339
right so it's just a waste of money so with jruby um same thing orange things are the
00:11:20.339
blocks of execution and you run Thread.new and hey look you potentially could be run on multiple cores I mean we
00:11:27.779
don't know this because the operating system is smarter than this but the potential is there but you know I don't want to be I don't
00:11:33.779
want to poop all over MRI so um I want to actually give a solution so if you want to run on multiple cores on who
00:11:40.560
knows how to do multiple cores who knows how to break the global interpreter lock in
00:11:45.899
MRI in a C extension well
00:11:51.540
actually no it's it's actually not that hard um so you just have to write a little bit of c
00:11:57.480
um and what and what this function does so this is C this rb_thread_blocking_region the first argument the second
00:12:02.760
argument are the first argument is the name is the actual method and the second one are the what you're going to pass
00:12:08.880
into it so whatever you run there can use all the cores that code will actually run outside of the global
00:12:14.640
interpreter lock so I mean we I mean there are ruby gems that use this what seeks and use this but you know it's not
00:12:21.060
readily accessible you know we get this for free for easy without having to write C extensions in jruby
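As a rough illustration of the Thread.new point, the sketch below splits a CPU-bound job across four threads. The helper count_primes and the chunk boundaries are made up for this example. The same code runs on both implementations: on JRuby the threads may be scheduled on separate cores, while on MRI the global interpreter lock lets only one thread execute Ruby code at a time, so no speed-up should be expected there.

```ruby
# Counts primes in a range by trial division (deliberately CPU-bound).
# count_primes is a hypothetical helper written for this illustration.
def count_primes(range)
  range.count { |n| n > 1 && (2..Math.sqrt(n)).none? { |d| (n % d).zero? } }
end

# Four independent chunks of work, one per thread.
chunks = [1..25_000, 25_001..50_000, 50_001..75_000, 75_001..100_000]

# Thread#value joins the thread and returns the block's result.
threads = chunks.map { |chunk| Thread.new { count_primes(chunk) } }
total = threads.map(&:value).sum

puts "primes up to 100000: #{total}"
```

Timing this script under each implementation is a quick way to see the difference the talk describes, though as the speaker says of micro benchmarks, the numbers should be read with care.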
00:12:27.060
so enough about jvm so who here statistics who like statistics who know
00:12:32.700
statistics so everybody who has their hand up probably knows more than I do um but you know what I can still share
00:12:39.180
so we have a bunch of numbers and why do we use statistics we use statistics because we want to actually reason about
00:12:44.240
output and data so we have this we have this list of 10 numbers and they actually they are random so and I
00:12:50.700
graphed them using numbers on the Mac and it looks kind of like this and if you look at this you have no idea
00:12:55.920
exactly how these how this data correlates to each other so um the simple things like the simple simple
00:13:01.740
tenets of Statistics are let's look at the mean the mean is the middle value it's not the median not the middle value
00:13:07.980
but the value that would be in the middle of all of them so the mean is .42 and I shouldn't say dot 42 so
00:13:14.579
say 42 hundredths or 43 hundredths and then we look at the max and then we also we look
00:13:19.980
at the min and the most important thing is we look at the standard deviation because what we're curious about is how
00:13:25.200
much so if you're running a so if you're running a stochastic simulation and your numbers are all over the place maybe your
00:13:31.860
model is not tuned correctly so we always look at the standard deviation to make sure that the data that's coming
00:13:36.959
out at least the numbers are similar so maybe there's an accepted amount of error and inside and I'm actually surprised
00:13:44.160
that Ruby doesn't include this but there's a nice gem called descriptive um statistics that you can install and
00:13:50.399
what it allowed you to do and I actually don't don't do it this there's there's actually two ways to do this you can
00:13:55.740
actually install this gem and then you can actually require descriptive statistics
00:14:01.019
and what it does is it actually extends core extensions and I know we don't like that so um what you can do is actually
00:14:07.980
require descriptive statistics safe and you can actually say so I have an array
00:14:13.200
a I can actually go a extend descriptive statistics and I actually get those methods like standard deviation min max
00:14:20.100
averages and all the things that array does not already include
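To show the idea of extending a single array rather than patching Array globally, here is a plain-Ruby sketch that does not depend on the gem. The module name SimpleStats is hypothetical and the method set is much smaller than what the gem offers. The standard deviation here is the population form (dividing by n), which is a choice made for this example.

```ruby
# Hypothetical stand-in for the gem's "safe" mode: statistics methods live in
# a module and are mixed into one array, leaving the Array class untouched.
module SimpleStats
  def mean
    sum.to_f / size
  end

  # Population standard deviation: square root of the mean squared deviation.
  def standard_deviation
    m = mean
    Math.sqrt(sum { |x| (x - m)**2 } / size)
  end
end

samples = [2, 4, 4, 4, 5, 5, 7, 9]
samples.extend(SimpleStats) # only this one object gains the methods

puts samples.mean
puts samples.standard_deviation
puts samples.min
puts samples.max
```

Other arrays are unaffected, which is the reason the talk prefers this route over the version that extends core classes.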
00:14:25.260
so another neat thing about statistics are distributions and so I was writing some Ruby
00:14:32.100
and first of all this is not Ruby so one thing you're going to learn when you're writing statistics or writing
00:14:38.160
simulations is that Ruby just does not give you everything you need actually this is does anybody know what language
00:14:44.160
this is yeah it is R and you know what you wouldn't normally see it like this you
00:14:49.260
would probably see it like this this would have actually given it away really quickly this is R what this does is it
00:14:55.680
generates something that looks like this um what is this does anyone know what
00:15:01.740
this is it's a normal distribution so we'll use normal distribution so
00:15:07.019
normal distribution um I guess the canonical example is the um your professor in college you know someone
00:15:13.199
had to get an A you know the class was hard and someone had to get an A so what he will do is he would actually he would
00:15:18.839
actually redistribute everybody's grades on this bell curve so most people are getting C's and only the top few are
00:15:24.540
getting A's no matter how bad their grades were so what we do is we use distributions to actually model our
00:15:30.420
numbers so they are something that we can expect and another thing we do with my and we're talking about standard deviation
00:15:36.360
earlier so actually you can actually model um I can actually with R draw the standard deviation so I wanted it to be
00:15:43.320
um 0.5 and so it's actually one so I wanted to actually see so if I was actually
00:15:48.660
um examining this this graph for um see what output was I would actually just
00:15:54.360
only look in the gray block and the cool thing about this is that this code right here you can't do this with Ruby right
00:15:59.459
now um there's there's a project out there called Protovis which was actually I
00:16:04.560
don't know if it's still going on um I think so you think they encourage everyone no there's a project called
00:16:10.380
Protovis so Ruby people you know we get projects and we name them Rubyvis and it can actually generate graphs like this
00:16:16.620
but the problem is the people who are doing Protovis actually retired that project and created something great
00:16:22.019
called d3.js but we'll talk about that later so um here's another thing so here's
00:16:28.560
more here's actually another way to generate a distribution or generate a graph in um in R and right this this
00:16:36.180
right here is a beta distribution and beta distributions take um two values the two is actually the alpha so if you
00:16:43.440
look on Wikipedia and you look up beta distribution it's going to take input two inputs the two is Alpha the five is
00:16:48.839
Beta And depending on the on the um those two numbers it actually does draw a different graph or it actually does
00:16:56.220
draw different distribution so notice this one actually is more to the left and I don't know all the fancy technical
00:17:01.740
words for that so I don't want to confuse anybody so moving on there's also other types of
00:17:08.819
distributions actually there's a whole list of distributions and this one here is the wible and I just I only put this
00:17:13.980
in the slides because I like how Weibull sounds I just want to say Weibull all day long so Weibull with the shape one
00:17:20.880
actually generates the graph looks like this so how is this useful actually you know it um someone told me like a few
00:17:27.900
minutes ago what this is useful for and I already forgot so just know that you can do this
00:17:33.780
so um going back to Ruby because we are talking we are at RubyConf we are talking about doing simulations in Ruby
00:17:41.220
um there's actually a gem out there called distribution and you can gem install distribution and this is how you
00:17:46.500
would use it so remember that graph that I had where it was the beta distribution it kind of went up on the left side and
00:17:51.780
then came and it slipped back down to the right um we can actually generate that distribution in Ruby and it's actually
00:17:57.419
really simple code I just put in I just put in 0.2 and notice I have a 2 and the five there and it actually generates
00:18:03.660
that number 2.73 so what I'm saying is that whenever X on the graph whenever X
00:18:09.539
is 0.2 the value is going to be 2.37 and as you notice I drew a little arrow
00:18:14.880
there to actually show you that so how would we generate a graph like this from Ruby
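The slide code using the distribution gem is not in the transcript, so the sketch below evaluates the same beta density directly from its closed form with Ruby's built-in Math.gamma. The function name beta_pdf is an assumption for this example and is not the gem's API. For alpha 2 and beta 5 the closed form gives about 2.46 at x = 0.2, so the figure read out in the recording may have been computed differently or misheard.

```ruby
# Beta density: f(x) = x^(a-1) * (1-x)^(b-1) / B(a, b),
# where 1 / B(a, b) = gamma(a + b) / (gamma(a) * gamma(b)).
# beta_pdf is a hypothetical helper, not part of the distribution gem.
def beta_pdf(x, alpha, beta)
  norm = Math.gamma(alpha + beta) / (Math.gamma(alpha) * Math.gamma(beta))
  norm * x**(alpha - 1) * (1 - x)**(beta - 1)
end

# Density at a single point, using the alpha and beta values from the talk.
puts beta_pdf(0.2, 2, 5)

# An array of densities across [0, 1], which is the raw material for a graph.
xs = (0..10).map { |i| i / 10.0 }
curve = xs.map { |x| beta_pdf(x, 2, 5) }
puts curve.map { |y| y.round(3) }.inspect
```

The curve rises quickly on the left and tails off to the right, matching the shape described for the R plot.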
00:18:19.980
so let's see more code so you're going to notice a little um a little thing about this talk is that I put a lot of
00:18:26.460
code in it and if you I just I just like looking at pretty color code so this is a lot of code in this talk so actually
00:18:33.120
what I'm doing here in in this right here is I'm actually generating an array that that includes all the values of the
00:18:39.000
distribution and I'm actually sampling it 1000 times so because we're using Ruby we have to
00:18:44.940
actually write we have to have to generate our graphs in Ruby and using um state-of-the-art Ruby technology I get a
00:18:50.400
graph that looks like this so remember the pretty um GNU R pretty R graph that you know went up and
00:18:57.660
down um yeah I'm just not getting this actually what this is this is um spark written by I think by Zach Holman at
00:19:04.140
GitHub and in this right here actually was is actually on the console
00:19:09.360
so it's actually ASCII and I just colored it so so this is just the um this is state of the art right here so
00:19:15.240
don't tell anybody this stuff this is I mean this is new stuff right here so once again another distribution and
00:19:22.740
actually right here what we're so what we have right here is actually
00:19:28.020
there's there's a slight type of um so when we have distributions um there's there's the there's the um
00:19:34.740
the PDF which is the distribution that I showed you before but there's also something called the CDF which is the um
00:19:41.580
the cumulative distribution and what it what a good example of that would be so
00:19:46.620
when a woman a woman goes 40 weeks for having a baby so actually somebody could create um create an actual probability
00:19:52.440
distribution for when a when a lady is going to have a baby what the percentage is but you notice that the graph so if
00:19:58.200
we use our graph from before um notice that right here notice that it goes up and down what a cumulative
00:20:04.860
distribution does it says that you can never really go down so actually as you
00:20:10.020
near your due date the graph will actually go up and once again um using Ruby state-of-the-art technology um I
00:20:17.160
created a graph to show you that any questions about that graph
00:20:23.820
so I mentioned spark earlier and if you're on a Mac and you have Homebrew
00:20:29.160
you can brew install spark um it's actually it's neat you can actually just pass it a list of numbers
00:20:35.280
and it'll just create a graph for you so and here's what I was talking about
00:20:40.620
pregnancies before and little sample code here
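The sample code on the slide is not recoverable, so the following is a sketch of the same idea under stated assumptions: it builds a cumulative distribution from a sampled density and draws both as console sparklines. The sparkline helper is a small Ruby imitation written for this example, not the spark script itself, and the beta(2, 5) density is only a stand-in for whatever pregnancy distribution the slide used.

```ruby
TICKS = %w[▁ ▂ ▃ ▄ ▅ ▆ ▇ █].freeze

# Maps each value onto one of eight block characters, scaled to the data's
# own min and max. Hypothetical helper in the spirit of spark.
def sparkline(values)
  min, max = values.minmax
  span = (max - min).to_f
  values.map do |v|
    idx = span.zero? ? 0 : ((v - min) / span * (TICKS.size - 1)).round
    TICKS[idx]
  end.join
end

# Stand-in density (beta with alpha 2, beta 5), from its closed form.
def beta_pdf(x, alpha, beta)
  norm = Math.gamma(alpha + beta) / (Math.gamma(alpha) * Math.gamma(beta))
  norm * x**(alpha - 1) * (1 - x)**(beta - 1)
end

xs  = (0..20).map { |i| i / 20.0 }
pdf = xs.map { |x| beta_pdf(x, 2, 5) }

# Running total of the density, rescaled so the final value is 1.
running = 0.0
cdf = pdf.map { |p| running += p }
cdf = cdf.map { |v| v / cdf.last }

puts "PDF #{sparkline(pdf)}"
puts "CDF #{sparkline(cdf)}"
```

The PDF line goes up and comes back down, while the CDF line only ever climbs, which is the contrast the talk is drawing.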
00:20:46.080
so another thing I want to talk about is sampling distribution so what in a lot
00:20:51.840
of cases with our stochastic models we're going to actually want to sample a distribution we're going to just want to say I want some kind of random variable
00:20:58.380
out of so my distribution describes some kind of Randomness and I want to actually just pull a variable out there
00:21:04.140
and what I did is I wrote this gem called Vose and what it does is instead of so imagine you're rolling a die and
00:21:10.679
the die is not loaded so what's the percentage so you roll a die and there's a percentage so you have you have six
00:21:16.559
things that you have six outcomes um so what this does is similar but the um the die is loaded not everything is
00:21:23.039
the same so what my Vose alias method will allow you to do is allow you to sample a distribution in in constant
00:21:29.400
time and coming in to find out so when you write web code for years and years and years you tend to not think about things like constant times like I got
00:21:36.299
caching who cares whenever you do things like this there's no caching so what this thing right here does is it
00:21:41.760
actually samples it actually just samples its distribution 100 times but it uses the Vose alias method and notice
00:21:47.100
that it's just a little simple DSL on actually rolling a die that is based on that distribution
00:21:53.400
so let's talk about big data I mean because you know this is actually how I got my talk accepted here I think I put
00:21:58.440
big data in the talk thing so but you know what I really thought that I could
00:22:03.480
come bigger than that so let's talk about huge data and this is actually the new thing that's going to come and um so
00:22:09.840
let me talk about our simulation and you'll notice that I haven't really talked about our simulation because before when I was explaining this I
00:22:14.880
actually just jumped right in and threw a lot of code at people and I'm like I didn't tell people all the building blocks so they could actually understand
00:22:20.820
how awesome I am for writing these kind of simulations so um let's say our simulation has 100 people and the
00:22:27.600
simulation goes for 360 or 3650 iterations which will be for our for
00:22:34.020
example 100 or 10 years so if we do a little bit of multiplication we notice that we have 365 000 actions
00:22:41.220
so because we are big data and we're using active record this is what I thought we would do we would just create
00:22:46.919
an active record class called Observation and every time we every time we created one of those um what's that
00:22:53.880
number um 365 000 actions we'll just create a new observation so you know um that
00:23:00.240
didn't work that well so what the problem is is what if we actually have three billion actions and
00:23:06.600
this is what will happen if we run 100 000 people over a hundred years we can't actually run
00:23:12.840
um active record create three billion times I mean we just can't do it I mean it's slow
00:23:19.440
enough doing it one time so um one of the issues here is now either think so we have all this data
00:23:25.380
and actually um if I actually turn on all the logging out of the system it actually generates
00:23:31.140
over two gigabytes of data in two seconds and just think about that I mean this is something running on this box
00:23:36.480
this is not even it's not even big metal it's and it actually makes my SSD kind of wine
00:23:42.179
so it's actually goes from it actually does 400 megabits for like two seconds and you say this is kind of crazy so we
00:23:48.240
really can't use active record so you know um since this is RubyConf um I figured we would just do the easy
00:23:53.520
thing maybe we'll dump it to and we'll dump it to Couch and before I started I had one problem where I was
00:23:59.039
trying to create a simulation if I dumped it to or dumped it to Couch now I had like five problems trying to figure out is my data
00:24:05.159
actually even there so um so I you know I'm I'm outside of
00:24:10.919
the box thinker so I said I'll think outside the box so anyone familiar with these two databases VoltDB and Druid
00:24:17.640
um so these are like these new um newfangled OLAP all in-memory databases that have they're like really awesome
00:24:24.240
but the problem is is that um you should look at these and I only put these in my slides because I actually want to show I
00:24:29.700
I don't want to hear MySQL Postgres blah blah all the time people need to look at these new these new type of
00:24:35.280
databases that are actually can do real-time data and they can actually do real-time transactions on real-time data
00:24:41.220
I'm talking about if you're an ad serving provider and you're doing let's say you're doing 100 000 Impressions per second these databases can handle it
00:24:49.500
so um so I said you know what um I'm going to be a Luddite here
00:24:54.659
I'm just going to insert it into postgres like this so actually here's a little um there's just a little rails
00:24:59.940
thing um you can never never you can't create a thousand active record objects
00:25:05.059
easily so what you what I always do is I actually go right down to the um the connection and I execute this myself and
00:25:11.340
I just build these I just build this up myself so the next problem with um simulations
00:25:17.280
is you got to worry about memory management and um once again rails made me soft um once upon a time I was
00:25:23.280
actually a C and assembler programmer and we actually had to think about memory management but with Ruby you're like nah screw it I'm just gonna I'm
00:25:30.000
just gonna do things like this I'm going to create a billion people and then I'm going to put a billion people in an
00:25:35.460
array because you know what's the worst that can happen so um so I have this listener here and
00:25:42.120
this actually is just the example this listener is an observer so you actually see that um the the new person is
00:25:47.940
actually a callback so when the simulation runs this actually there's actually callbacks so every time a new person is created or born the p a person
00:25:54.779
gets appended to the people array seems seems perfectly fine to me so this is what happens
00:26:00.720
um the simulator actually runs and then you pass the listener and then you do um
00:26:05.820
you do you actually run the simulator and then you can actually look in the listener and you can inspect the people so what happens if we do this again so
00:26:13.440
let's say because I can't run the simulator one time I actually run a simulator maybe 10 to 15 times so I can actually get a good amount of sample
00:26:19.500
data so if I do this again guess what happens and actually these next few slides are going to show you something
00:26:25.440
of another one of the reasons why you should use jruby over MRI and I'm not here just advocating it but these next few slides
00:26:31.860
are awesome so anybody know what this app is and I'm sorry it's really fuzzy because it was a small image what is it
00:26:37.740
it's VisualVM and what this is actually showing and if you can just look at the look at the on the left side the y-axis
00:26:44.580
and this is actually running on this Mac of how much memory this thing uses um that was actually this is actually
00:26:50.580
only two runs of a simulation I was actually trying to do 10 runs so notice it actually caps off at the first run it
00:26:56.159
hit seven gigs or seven seven gigabytes of memory on the Heap and then garbage collection came through
00:27:02.460
and it cleaned up a whole bunch of stuff but the second time it went up to almost 11 or actually went up to 11 gigabytes
00:27:08.220
and after that um so what happens when jruby runs out of memory is it does something really really cool so if you
00:27:13.980
have four cores in your box JRuby will be like you know what you're out of memory now I'm gonna you so um
00:27:20.760
what it does is um since the garbage collector I believe the garbage collector runs in another thread and it says well you know what that garbage
00:27:26.159
collector is too slow I'm gonna run something in another thread all of a sudden your machine is screaming and all the CPUs are pegged and you can't
00:27:31.620
Control-C the application anymore you just have to wait so um let's let's look at this so This
00:27:36.960
is actually the other side this actually is the same image I actually just I couldn't fit it all because it's wide but you notice that um if you look on
00:27:44.039
the bottom there's a lot of GC on the bottom and what's happening there is um the people actually what's happening is
00:27:50.820
it's actually a memory leak and I'm surprised no one called that out so what's happening is I'm actually populating an array
00:27:57.419
and then creating another object but I'm never releasing anything in that array so JRuby's like I'll keep it
00:28:04.740
so what I did is actually a very simple thing is that after we run the listener after we actually do a
00:28:10.080
simulator run we just call reset and we set people to an empty array and we do the same thing and we get
00:28:16.980
something more sane here so notice same code only change was that reset line and
00:28:22.500
notice that what happens is so whenever it runs it just uses less memory and it actually gets rid of all the people that
00:28:27.659
it never uses why persist things that we aren't going to be using it's only used for calculations so just a little just a
00:28:33.600
little reminder this is a reminder for Ruby code we can leak memory like crazy
00:28:38.640
in Ruby code Rails proves it every single day so once again the other side of that
00:28:45.000
graph so um I've been talking for a while and I haven't talked about building a simulator yet
00:28:54.419
oh because I because I passed Dash yes because by default it's um 500 megs yeah
00:29:01.320
that doesn't work yeah it actually if I want to troll myself it yes
00:29:12.000
right you know what and it's not and it's not a fault of the language it's the fault
00:29:18.120
of me the programmer I'm leaking memory that's oh it's okay okay I'm holding
00:29:23.220
memory you know what I'm going to retract what I just said from what Ryan said I am not leaking memory I'm
00:29:28.380
actually holding on stuff that I don't need which actually makes a lot of sense
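The difference between holding memory and releasing it can be sketched in a few lines of Ruby. This is only an illustration under assumptions: `Simulator`, `Person`, `run`, and the infection rule are hypothetical names made up here, and only the idea of a `reset` that sets people back to an empty array comes from the talk.

```ruby
# Hypothetical stand-ins for the objects the talk's simulator creates.
Person = Struct.new(:id, :infected)

class Simulator
  attr_reader :people

  def initialize
    @people = []
  end

  # One run: build a population, then return only the summary number.
  def run(population_size)
    population_size.times { |i| @people << Person.new(i, (i % 10).zero?) }
    @people.count(&:infected)
  end

  # Drop the references so the garbage collector can reclaim the population.
  def reset
    @people = []
  end
end

sim = Simulator.new
results = Array.new(3) do
  infected = sim.run(1_000)
  sim.reset # without this line @people grows by 1,000 objects on every run
  infected
end
```

Only the small `results` array survives between runs; the per-run population is used for the calculation and then let go.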
00:29:33.960
so um now on to building a simulator so um let's talk about this simulator so in my simulator I have a group of people
00:29:39.779
and I have eight people here and what the simulator does is it actually runs over a period of time so what happens is
00:29:46.320
this girl gets with this guy this girl and then this guy gets this girl this girl gets with this guy this guy gets
00:29:52.020
with this guy because that's how my simulator rolls and we actually um and we actually try to figure out
00:29:57.299
what happens and how cooties are being spread so just to show you a little example of why we're actually doing this
00:30:03.539
I prepared a short video
00:30:11.159
but I didn't want him to think I didn't trust him I didn't know I could catched on the playground we were in love so I didn't
00:30:18.600
think about it all I did is trade Lunchables every year two million kids are infected
00:30:25.980
with cooties
00:30:33.720
and two other kids Cody I just wanted to play tag I never
00:30:39.899
thought I'd be it and the numbers are growing
00:30:46.080
blame myself
00:30:54.260
and even though a vaccine is available
00:31:03.360
children never you may have cooties
00:31:14.000
speak to your kids about cooties Cody speaks to them first
00:31:19.700
what do I do now oh
00:31:29.159
you're getting a little example of my passion for actually solving or actually being able to tackle this epidemic of
00:31:35.760
cooties so back to our simulation so our simulation is actually a big loop you
00:31:40.980
could actually just think about it everything every day we just increment the day and we just run it again
00:31:46.260
so what do we do in each day so there's a there's a group of people that are actually alive in our simulation what we
00:31:53.220
do is we look at each person and we determine hey people who is ready to actually transmit cooties every day
00:32:00.000
or actually who's ready to pair up and transmit cooties so what we do is we find people who are compatible and this is what the simulation does and then we
00:32:06.960
um we group people together and then what we do is we do some complex calculations to see if cooties are
00:32:12.000
actually shared and that's a technical diagram right there so and that's actually all the simulator
00:32:17.700
does and you know what I do have code I'm just waiting until later on this afternoon
00:32:23.700
um we're gonna we're actually going to I have an unembargoed version of the simulator that I think I can share with
00:32:29.640
you guys I will put it up on GitHub so you can actually see what a simulator in Ruby looks like
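The daily loop described above can be sketched roughly as follows. This is not the speaker's code: the class name `CootiesSimulation`, the 50% "ready to pair up" rule, the random pairing, and the `transmission_chance` parameter are all assumptions chosen to keep the example small.

```ruby
Person = Struct.new(:id, :infected)

# Hypothetical day-by-day simulation; every rule below is an assumption.
class CootiesSimulation
  attr_reader :people, :day

  def initialize(people, transmission_chance: 0.1, rng: Random.new)
    @people = people
    @transmission_chance = transmission_chance
    @rng = rng
    @day = 0
  end

  # The whole simulation is one big loop over days.
  def run(days)
    days.times { step }
    self
  end

  def step
    @day += 1
    # Decide who is ready to pair up today (assumed: a coin flip each).
    ready = @people.select { @rng.rand < 0.5 }
    # Group the ready people into pairs; an odd person out stays alone.
    ready.shuffle(random: @rng).each_slice(2) do |a, b|
      next if b.nil?
      transmit(a, b)
    end
  end

  def infected_count
    @people.count(&:infected)
  end

  private

  # The "complex calculation": one infected partner may infect the other.
  def transmit(a, b)
    return unless a.infected ^ b.infected
    return unless @rng.rand < @transmission_chance

    a.infected = b.infected = true
  end
end

people = Array.new(8) { |i| Person.new(i, i.zero?) }
sim = CootiesSimulation.new(people, rng: Random.new(42)).run(30)
puts "day #{sim.day}: #{sim.infected_count} of #{people.size} infected"
```

Passing the random number generator in through `rng:` keeps a run repeatable, which matters later when the model has to be tested.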
00:32:34.860
um so moving on so before you can write this like I was talking about earlier um
00:32:40.140
Ruby just does not like you putting a thousand or no actually not a thousand a million items in an array it just says
00:32:46.740
you just shouldn't do that and actually you know what we really should not um be putting a million items in an array it's just there's
00:32:53.520
um our computer science classes our data structures classes told us that there's just much better ways of storing things
00:32:59.159
and the same thing with the hash so earlier I was talking about that um Benchmark that that I showed the output
00:33:05.340
for this is actually the code to make it simple once again I just populate
00:33:11.159
um an array put a million items in it and then sample and then actually um query it randomly a million times and
00:33:17.039
the same thing with the hash and to recap we notice that JRuby is faster
00:33:22.679
than MRI so um but you know what that doesn't really mean something it doesn't really
00:33:27.960
it's like benchmarks microbenchmarks are bad doesn't really mean anything in the grand scheme of things so you notice
00:33:33.120
that um this is my actually notice I got three up there this is actually a run of and probably the version of the the
00:33:39.120
simulator that I'm going to share um this is actually with a thousand people over 100 years
00:33:44.240
36,500 days and you notice it ran in 129 seconds but if you notice
00:33:50.519
um JRuby ran at about the same time so just because your microbenchmarks are faster
00:33:55.559
does not mean that it will actually double the speed of your app there's just other things going on
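A microbenchmark of the kind described, populating an array and a hash and then querying each at random, might look like the sketch below. The slide's actual code is not in the transcript, so the method name `lookup_benchmark` and the fixed seed are assumptions; `Benchmark.bm` is from Ruby's standard `benchmark` library.

```ruby
require "benchmark"

# Fill an array and a hash with n items, then read n random entries from each.
def lookup_benchmark(n)
  array = Array.new(n) { |i| i }
  hash = {}
  n.times { |i| hash[i] = i }
  rng = Random.new(1) # fixed seed so every run does the same lookups

  Benchmark.bm(6) do |bm|
    bm.report("array") { n.times { array[rng.rand(n)] } }
    bm.report("hash")  { n.times { hash[rng.rand(n)] } }
  end
end

lookup_benchmark(1_000_000)
```

Running the same file under MRI and under JRuby gives the kind of comparison shown on the slide, with the caveat from the talk that such numbers say little about a whole application.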
00:34:01.380
so um next thing I want to do is talk about algorithms in Ruby Ruby is just missing
00:34:07.860
a whole bunch of neat algorithms we don't have a real Heap in Ruby in the
00:34:12.899
standard lib we don't have priority queues and actually those are things that we could actually use and let me
00:34:18.240
show you a demonstration of that so every day we have an event or when we have we actually keep a track of events
00:34:24.659
and every and if we put events into my um into this
00:34:30.960
into this array and we pop it off you know that that's that's kind of cool but that's not describing what we want to do
00:34:36.540
actually what we're really doing is having a priority queue so really what I want to do is I want to say add this new
00:34:42.720
event at priority 10 and then the next thing I have to do I don't have to actually go through my array to figure out what I'm doing I actually have to do
00:34:49.080
is say hey event pop off the next item and it will pop off the thing in this case with the lowest priority which will
00:34:55.500
be 10. so that's cool and all so I like to benchmark and I benchmark all the time
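Since the standard library has no priority queue, here is a minimal binary min-heap sketch of the interface described: add an event at a priority, then pop the one with the lowest priority. `EventQueue` and its method names are hypothetical and are not the algorithms gem's API.

```ruby
# Minimal binary min-heap priority queue (illustrative only).
class EventQueue
  def initialize
    @heap = [] # each entry is [priority, event]
  end

  def push(event, priority)
    @heap << [priority, event]
    sift_up(@heap.size - 1)
    self
  end

  # Removes and returns the event with the lowest priority number.
  def pop
    return nil if @heap.empty?

    top = @heap.first
    last = @heap.pop
    unless @heap.empty?
      @heap[0] = last
      sift_down(0)
    end
    top[1]
  end

  def empty?
    @heap.empty?
  end

  private

  # Move a newly added entry up until its parent is not larger.
  def sift_up(i)
    while i > 0
      parent = (i - 1) / 2
      break if @heap[parent][0] <= @heap[i][0]

      @heap[parent], @heap[i] = @heap[i], @heap[parent]
      i = parent
    end
  end

  # Move the root entry down until both children are not smaller.
  def sift_down(i)
    loop do
      left = 2 * i + 1
      right = left + 1
      smallest = i
      smallest = left if left < @heap.size && @heap[left][0] < @heap[smallest][0]
      smallest = right if right < @heap.size && @heap[right][0] < @heap[smallest][0]
      break if smallest == i

      @heap[smallest], @heap[i] = @heap[i], @heap[smallest]
      i = smallest
    end
  end
end

events = EventQueue.new
events.push(:recess, 30).push(:lunch, 10).push(:nap, 20)
first = events.pop # the priority-10 event, :lunch
```

Both `push` and `pop` do work proportional to the height of the heap, so nothing has to scan the whole array to find the next event.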
00:35:00.839
so here's a cool thing um I actually have an array and then I have an array where I'm inserting at a position and
00:35:07.380
then I have the implementation of this priority queue that comes out of that algorithms gem you know it's nice for
00:35:14.400
the nice DSL and the nice ability to do this but you notice look how much slower it actually is because this
00:35:20.940
implementation of the heap and the priority queue is actually coded in Ruby and this was actually a Ruby Summer of Code project I don't remember who did it
00:35:27.540
but I mean it's a great effort but look we don't in Ruby we just don't have very
00:35:33.060
fast data structures um on a tangent um Python which is another language that some of us love to
00:35:38.700
hate has NumPy they have pandas they have so many nice things Ruby um Python
00:35:44.160
because a lot of scientific communities use it actually puts an effort on making real fast data structures
00:35:50.760
so moving on so once you have a simulator another thing you're going to think
00:35:57.060
about is so you're going to build a simulator and even me um supreme coder up here on the stage right now when I
00:36:02.700
build simulators the first time I run it they always have there are they're actually wrong so I don't maybe I don't
00:36:08.700
have the right amount of Randomness you know so like I said a model is in and out in a little box here and um what we
00:36:15.960
need to do is we need to be able to train our simulator like train the values but inside of our our simulator to actually
00:36:22.500
return the right values because we already observed this or we just know that these to be the right empirical
00:36:28.200
values so um here's a graph here and what this is is
00:36:34.260
um here's what I because I've observed this this is what I expect my data to look like so infections per 1000 people
00:36:40.260
over time should look like the graph the slope of the graph like this so what we would do and and
00:36:46.339
unfortunately I really wish I could share this code with you guys but I'm just going to talk about it um but one
00:36:52.020
of the things and this is one of the value adds that Thunderbolt is working on um we're actually working on a machine
00:36:57.180
learning project in Ruby um I don't hear a lot of people doing machine learning like actual machine learning in Ruby so
00:37:02.700
what you're doing is actually building ways to train so we can say okay
00:37:07.859
I input this I expected this but I actually got this now what I'm going to do is actually create a large Bayesian
00:37:14.099
network and do something similar to like what SpamAssassin does or what Google does to your spam and we're actually going to traverse through the
00:37:19.619
network to see is this the right value is this the right value up if this is this value we'll just return this number
00:37:25.500
and so what we're actually doing is the computer is actually learning that whenever it has error it actually looks
00:37:31.260
at a standard deviation of the error and actually uses that to rationalize what the next value possibly could be hey
00:37:36.900
machine learning in Ruby it's not fast but it is machine learning in Ruby and you know if we need to make it faster
00:37:43.320
because we're using JRuby we'll just write it we'll just write it in Java and moving on so the last part about a
00:37:50.520
simulators you got to have them visualizations um these are graphs I showed earlier these graphs are this graph and this
00:37:56.700
graph were created with this kind of R um but we also um use gnuplot you
00:38:03.480
should learn gnuplot to create a graph like this all you would need to do is create a Digraph that looks like this and you would just run
00:38:10.920
um gnuplot on that and like I was also talking about earlier we definitely
00:38:15.960
are using D3.js I'm not ready to show this code but this lets you know that if you're doing visualizations now and I
00:38:22.140
don't care what platform you are and you're using something where you can use web really have a nice look at D3.js
00:38:28.440
so um coming towards the end of this talk um we learned a lot of lessons
00:38:33.540
um Ruby lets you iterate very quickly but the problem is Ruby is not very
00:38:38.820
quick so what do you do you write the slow parts in C++ and Java which is also the win of having JRuby
00:38:46.280
take advantage of JRuby and the JVM and only polyglots I mean if you're just
00:38:52.500
going to be Ruby Ruby Ruby Ruby Ruby only you're not going to be able to do this you're really going to have to
00:38:58.380
learn more than one language to do this correctly another thing that I want to say is that TDD is hard I don't know if any
00:39:04.440
people here have tried to TDD an implementation of an algorithm that you found on Wikipedia whenever they're
00:39:10.140
using sigmas and alphas and all that stuff that's hard just write it and then
00:39:15.240
write the test second but just test is there don't don't beat yourself over trying to be a good
00:39:21.540
developer just because you are trying to follow some kind of standard whenever all you're really trying to do is
00:39:27.300
Implement a proof that's already working you just want to make just write the test second don't even I beat myself over over this constantly
00:39:35.940
um another thing is that uh Ruby lacks stats and science libraries for Ruby to be
00:39:41.220
taken seriously in this kind of space um we just need this so I mean we need the NumPy we need matplotlib that Python
00:39:47.520
has I mean I'm not moving this project to Python I'm really dedicated to doing this on Ruby so we will
00:39:54.240
actually we we will try to build what we can and of course we will open source what we can but we have to acknowledge
00:40:00.599
that Ruby is just not great in this space even as a general purpose language so um that's it I plan on going 40
00:40:07.320
minutes 45 minutes and it's been 44 minutes so thank you guys for not doing it all during the talk so that's it
00:40:21.599
so any questions if they're hard I will not answer them I promise you right here so you're saying that we
00:40:27.839
don't have a lot of these libraries and the problem is that we then have to implement them in either C++ or
00:40:33.420
Java or whatever so as we have different runtimes we're going to end up duplicating a lot of work like the
00:40:39.420
advantage of just using Ruby is that generally the VMs are pretty compatible so so how about this if there was a
00:40:45.300
great C extension for algorithms and some statistical stuff that I was doing I probably would not have looked at
00:40:50.760
JRuby and the reason I'm using JRuby is because actually the real simulator's written in Java and because it's just
00:40:57.060
it's just that much faster now and I can at least take advantage of all the numerical stuff that's on the JVM so
00:41:03.180
that's so we can look at the libraries
00:41:09.440
that you have being so much slower is because the code is slow not because
00:41:15.000
no um what'd you say and what Ryan said is it's it's not that Ruby sucks inherently what he's saying
00:41:21.119
is that the reason that it's slow is because it's probably the code that's in the algorithms gem it's just not very good and you know something it's hard
00:41:27.900
for whenever you code for money and like you're not coding for fun to actually sit and replace things whenever whenever
00:41:33.780
you're actually under a deadline so that that's kind of hard so Jesse so um there's been a lot of movement on side
00:41:39.960
Ruby lately um
00:41:48.359
Foundation um so they're working at basically doing the this kind of stuff like giving us
00:41:55.380
these tools so out of SciRuby the distribution gem comes out of SciRuby there's a statistics gem that comes out
00:42:01.680
of SciRuby and what I was talking about Rubyvis is part of SciRuby what I'm saying is that
00:42:07.440
um I'm glad that and I do know that it's speeding up again it's just that it's not there yet so it's not a complaint
00:42:13.619
it's just saying this is what it is right now we will try something different so
00:42:25.700
so in a stochastic model there's there's a couple ways you can do this so he's asking me how do I test a model that has
00:42:31.800
Randomness in it um there's two easy ways the first easy way is whenever you create the model you
00:42:37.260
whenever you initialize whenever you initialize the object and at the second create an options hash and in there put
00:42:44.760
a key called rand and pass in your own random number generator that can actually only return five so you know
00:42:51.119
what the randomness is always going to be the second way you can do it is you can run it a whole bunch of times and
00:42:56.700
then sample it to see that it's actually within and like if you're using minitest like there's the within is that
00:43:02.220
what it is within you can actually use within and actually determine if it's close enough and actually that's how I use it in the
00:43:08.579
Vose alias method I just actually run it 10,000 times and just sample and make sure that it's close enough
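Both approaches can be sketched together. Everything named here is an assumption for illustration: `TransmissionModel`, `FixedRand`, the `:chance` option, and the 0.05 tolerance are invented, and the only idea taken from the answer above is an options hash with a `rand` key plus a run-many-times check.

```ruby
# Hypothetical model: options[:rand] is any object that responds to #rand.
class TransmissionModel
  def initialize(options = {})
    @rand = options.fetch(:rand) { Random.new }
    @chance = options.fetch(:chance, 0.25)
  end

  def transmits?
    @rand.rand < @chance
  end
end

# Stub generator that always returns the same value.
class FixedRand
  def initialize(value)
    @value = value
  end

  def rand(*)
    @value
  end
end

# Way one: deterministic, because the "random" number is known in advance.
always = TransmissionModel.new(rand: FixedRand.new(0.0))
never  = TransmissionModel.new(rand: FixedRand.new(0.99))

# Way two: run it many times and check the observed rate is close enough.
model = TransmissionModel.new(chance: 0.25)
trials = 10_000
observed = trials.times.count { model.transmits? } / trials.to_f
close_enough = (observed - 0.25).abs < 0.05
```

In a minitest suite the last check would usually be written with `assert_in_delta 0.25, observed, 0.05`, which is probably the "within" style of assertion being reached for above.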
00:43:17.180
that sets the random seed to 42 so I always know that I've got the same
00:43:22.800
sequence of numbers coming out that's a that's actually another good way as well sometimes has problems
00:43:29.099
across versions and another reason why I always stray from that because sometimes you'll do that and
00:43:35.819
then you'll call random somewhere else and you don't realize what the next value is and now you're off and that's
00:43:40.859
the only reason why I've been bitten by it before because I wasn't careful
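The pitfall with seeding can be shown with Ruby's own `Random` class and `srand`. A small sketch, with arbitrary variable names: a private `Random.new(42)` instance keeps its own sequence, while the global generator seeded with `srand(42)` is shifted by any stray `rand` call elsewhere.

```ruby
# A private generator: nothing else can advance its sequence.
rng_a = Random.new(42)
rng_b = Random.new(42)

first_run = Array.new(5) { rng_a.rand(100) }
Kernel.rand # an unrelated call on the global generator
second_run = Array.new(5) { rng_b.rand(100) }

same_sequence = (first_run == second_run) # same seed, same sequence

# Global seeding is fragile: any extra rand call shifts what follows.
srand(42)
expected = rand(100)
srand(42)
rand                # a stray call somewhere else in the code
shifted = rand(100) # no longer the first draw after seeding
```

The exact numbers are deliberately not shown, since the sequence for a given seed can differ across Ruby versions and implementations, as noted above.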
00:43:46.020
right so any other questions all right well thank you guys you guys
00:43:51.240
been great
00:44:24.480
thank you