00:00:20.000
I love the 80s EAA all right if you're outside and you
00:00:25.080
haven't made your mind up just go and grab someone you know physical violence that
00:00:31.000
works for us all right I say we I say we kick this
00:00:36.360
bad boy off um welcome to uh to the conference I I hope you've enjoyed the
00:00:41.680
warm-up uh for the last day and a half it's been good if you could fill me in I missed it
00:00:47.879
writing slides um I uh just to the the lady doing the the hand things um if you
00:00:54.800
could do them in Australian I'm not sure if more kangaroo gestures or something
00:01:03.000
I um so uh I just want to introduce me introduce this topic and uh and and the
00:01:09.720
t-shirt that I'm wearing so uh over five years ago I I put a blog post up because there was a there was this thing it was
00:01:16.600
silly it was called Twitter and uh and they one of them complained a lot about Ruby and Rails and blah blah blah and uh
00:01:25.000
and so I wrote a blog post and I released a little bit of code which basically said look I don't want to
00:01:30.119
make a big deal of it but I've solved all of your scaling problems um it was
00:01:36.240
silly you'd never use it but as you see at the bottom I said I wonder if they'll give me a free Twitter account I didn't
00:01:41.680
have I didn't know what it was for and uh or at least a t-shirt um and actually
00:01:46.920
so at the next RailsConf later in the year Alex Payne actually gave me this t-shirt um and I completely forgot about it
00:01:53.680
except there is a Twitter bot that on your 5-year anniversary apparently tells you congratulations you've had a Twitter
00:02:00.119
account for 5 years how's it going for you um so I that you know I thought I should get this do you know if you go to
00:02:05.680
Twitter now you cannot get this t-shirt I mean it's mine so off
00:02:11.080
um so uh uh more about me my background um it turns out that I have been
00:02:17.440
thinking about distributed systems for a while whilst at the same time not caring at all about them very confusing state
00:02:24.400
of Mind to be in um but my actually remember my topic was something along
00:02:29.560
the lines of how do you change distributed applications at runtime um in a sort of enterprise scenario where
00:02:35.080
there's lots of different participants and and it's awkward and and how do you let that application do it all itself and I highly recommend not reading uh
00:02:42.159
that 200 pages of quality text uh mostly because it was written when my one of my
00:02:47.560
PhD supervisors gave me this top tip on how do you write a PhD thesis and she
00:02:53.040
said The more boring the better you can imagine why they're not
00:02:59.159
selling uh with O'Reilly or Pragmatic Programmers um so but making it red was
00:03:05.720
about my only sort of daringness uh at the end so that was a long time ago um I
00:03:12.000
for a couple of years did Rails consulting but I did the app bit I made the thing I didn't make it pretty and I
00:03:17.879
didn't make it run I just made it someone else did the other bits of of doing Ops um but then I don't know I
00:03:25.159
became more interested in the space of of production apps and uh so I came to
00:03:30.280
this country whichever one it is and uh when in 4 days time it could be
00:03:36.120
different so who knows um remember to
00:03:42.080
vote whatever um doesn't matter what you do you're going to keep killing foreigners with drones so you know I'd
00:03:50.080
like to say it was safer to be here away from the drones but then everyone's got a gun
00:03:57.239
um actually if you noticed if you go to the front doors not now um right at the
00:04:03.079
bottom is the little sign little like no smoking sign that has no guns and no knives um I'm not sure how short people
00:04:10.840
are that have guns would notice that but so Engine Yard if you don't know we uh one of the you know we do the
00:04:17.639
the hosting of the Ruby apps that you may be having one or two of um specifically mostly around businesses
00:04:23.440
people that have actual you know big apps they're doing big things and so um it's really fun and interesting to hang
00:04:29.440
out with those those people and I've learned more than I mean honestly I've learned stuff and everything I learned I
00:04:35.080
didn't want to learn I didn't come to Engine Yard to learn DevOps and system administration and cloud stuff that's I
00:04:42.240
just like writing apps so in the last two years I've accidentally learned things and unfortunately due to your
00:04:48.320
choice of session you're going to now learn some of those things and uh so I I will
00:04:54.880
attempt to at least make it interesting so when I uh how I came to be thinking about this particular topic which we'll
00:05:01.039
get into was that last year I tried to convince a large number of people that you should use JRuby and Rubinius um
00:05:08.479
and I tried to focus specifically on one aspect which was threading because last year there was a lot of hoo-ha about how
00:05:14.720
much fun it would be to do JavaScript and evented programming um fortunately
00:05:19.960
we've all got over that it was all a bit silly um and now we've gotten back to writing
00:05:26.440
apps that are both neither evented nor concurrent uh um so uh but I tried to make it really
00:05:33.919
simple I said you know you still do want evented programming and you still want threads and fortunately web apps really
00:05:40.520
easy put the the procedural stuff wrap it up in an implicit thread around the request and then put evented at the
00:05:47.000
front uh and use nginx and uh that was it you know what I thought it was a
00:05:52.919
really compelling argument no one changed their behavior based on that um and so I started to
00:05:59.919
think about this idea that uh that that choosing JRuby choosing Rubinius was
00:06:05.440
perhaps more of an operations choice you were going to choose it for perhaps business reasons like using fewer
00:06:11.240
resources um and uh uh perhaps getting better debugging out of a production system and things like that and perhaps
00:06:18.560
developers didn't make those choices or didn't want to or they knew they wanted to but just didn't for some mystical
00:06:25.240
reason and that became really interesting to me which is now kind of where this this talk starts off um so I
00:06:33.280
guess unfortunately there is another slide about me I've done lots of stuff there is a website that actually tells
00:06:38.479
you approximately how many projects you've uh well I don't know if you're dog urinated on um how many you've
00:06:46.240
participated sorry that's the word um and uh I've a lot and uh but what I've
00:06:52.120
become interested in is the idea of of resilient production systems but also ones that I could just ignore because
00:06:58.520
I've gotten really good at ignoring static systems I have a GitHub repo full of them easy to ignore you just turn off
00:07:05.680
the notifications easy as you like um but production systems don't seem to be as ignorable um and so here's my uh sort
00:07:13.879
of fun challenge for you to spot the difference between 300 code bases and
00:07:19.000
300 apps all right now I don't know if you spot it so this is my metaphor there um
00:07:25.319
sort of 300 books on a shelf versus 300 people in a company the books are a lot more fun all right people suck and
00:07:34.000
uh me look around you're all ugly and uh
00:07:39.280
no that's not true you know all ugly and um but you know so static things are a
00:07:44.479
lot easier to to to live with and think about and uh um and so it's kind of in
00:07:50.560
the nature of our profession as engineers we're at a Ruby conference DevOps conference we'd like to write code
00:07:57.159
and we like to write code we like to test code and it's just a really sad unfortunate sub part of our profession
00:08:04.879
that for most of us it unfortunately goes into production we don't talk about it that's
00:08:11.599
just just you know it's just annoying yes it's running and yes there's a dot com and look at my code and how pretty
00:08:18.199
it is that's the important part and uh look how fast my tests are look at the
00:08:24.039
no we don't talk about the production stuff very much I mean it's a sad indictment on Ruby that sad yeah sad I
00:08:30.879
mean Engine Yard Heroku and those sorts of companies came into existence created concepts like PaaS no one else
00:08:37.479
needed to solve these problems but we did that's how bad it was to try and run Ruby
00:08:43.080
apps so this is this this idea that that as a group we have so optimized around
00:08:48.760
our own developer happiness that I'd like you to start thinking that as your production code
00:08:54.760
bases get bigger start getting more traffic you are going to need to give up some of that you're going to need to
00:09:00.160
start to think more about production happiness because if you don't you won't
00:09:05.279
be a happy developer actually I can't prove that it just true that's the
00:09:10.880
quality uh argument you're going to get here in the next 30 minutes trust me all
00:09:16.360
right it sucks so you need to do both all right let's go through this uh just
00:09:22.680
in case you've never had a production app that's what one looks like
00:09:30.200
um traffic comes from the left I don't know which way you're looking um
00:09:38.760
and uh but it's not not just easy constant traffic sometimes it comes in impulses sometimes it comes um in sort
00:09:44.120
of large stressful batches and uh keeping your app running in those situations is uh not something you've
00:09:50.240
probably ever done before they can be quite torturous to live with other things you're going to do with production systems is uh change them deploy things
00:09:57.519
which you may have seen in the title of the talk and I will come back to uh I'm going to spend a lot of time just talking about uh Tools in general around
00:10:04.399
thinking around this space and then specifically focus on the title of the talk which is what I think is the frontier of the next sort of thing we
00:10:10.880
need to be thinking about in deployment um because obviously there's a lot of arrows involved and we need to be
00:10:16.040
careful of them um and uh then we do that's called the magic move
00:10:21.560
feature of Keynote it's very cool I'm going to do it
00:10:27.959
again nope Bo look at that um so uh inside your app it's a bunch of
00:10:36.120
bits I know see this is tricky because you all think your app your Ruby app is like God's gift to you know if only you
00:10:44.040
could just put that on the homepage people would buy more stuff sadly it's just another
00:10:51.040
box surrounded by other boxes which do stuff and your app talks to those boxes
00:10:56.120
and if all things go well perhaps your users do stuff um but all the boxes need to keep
00:11:02.160
working and need to talk to each other um and uh if you start to scale and if you watch some of these other talks
00:11:08.200
people saying you know one web app box is not enough we need lots more web app boxes then uh it'll look like
00:11:15.480
this and you can only imagine this is getting simpler
00:11:20.760
um but this is popular at the moment the service-oriented architecture because Steve Yegge told us to because he worked
00:11:27.480
at Amazon that's kind of his logic um anyway no no
00:11:34.959
I don't add SOA I just want to say that when you do this something's going to happen uh not only does the code you
00:11:40.920
wrote and the things you control talk to other things that you have less control of um I guess my best example is um if
00:11:49.320
you watch the Travis and no disrespect to Travis but I've noticed this they and and Engine Yard's own blog Twitter
00:11:56.079
account so Travis's Twitter account our Twitter account often has to make comment about how our customers can't do
00:12:01.360
something because uh for example GitHub might be down or RubyGems might be down
00:12:07.600
so this is this idea that the external dependencies are part of our production system even if we don't control them and
00:12:13.240
uh um but nonetheless so even if you've never sort of had that picture in your head let me just give you the picture
00:12:18.519
that's actually in your head the dashboard of your brain the dashboard of
00:12:24.079
your brain pretty much says Ah something's wrong
00:12:29.399
or what it's thinking is they're just the normal
00:12:34.880
errors move
00:12:41.480
on no one needs to build that dashboard all right locked in um but I just want
00:12:47.800
to sort of say do some basic math B basic math um I'm going to use the water
00:12:53.920
if anyone else would like some it's just up here not enough cups to go around but I
00:12:59.839
think on the whole most of you are probably not going to come
00:13:08.000
up even though you were invited all right hopefully you've gone
00:13:14.079
through the sums I'm not sure if you call them sums but that's what they are the arithmetic um that if uh if you've
00:13:20.600
got um a sequence of of parts could be an app to a database an app to a cache to a database whatever an app to another
00:13:27.959
app to a database um if there's some sort of success rate which there is if
00:13:33.240
you're lucky it's 100% but if you've got 100% success rate that's probably because you've had one
00:13:39.079
request and it was successful and you're feeling very confident about yourself um
00:13:44.720
but assuming there's some you know number then if they're dependent on each other then as you go through and I've
00:13:50.199
used the same number 95 95 95 uh the basic math is you multiply it out and you get a smaller
00:13:56.360
number that's the problem so when you go back to that diagram uh as you add
00:14:01.480
bits and things talking to each other the math means that it's more likely something's going to be wrong at any
00:14:07.279
point in time uh both because of the traffic that's coming in normal traffic
00:14:12.440
impulse traffic stress traffic uh or when you change something so when you're doing deployment you're changing
00:14:19.000
something and there's a chance something will go wrong and the question is what else is going to be affected what bad
00:14:24.160
experience is your customer going to have and uh so yeah so this diagram is
00:14:29.240
not taken from any real production system just kind of looked
00:14:34.440
pretty couldn't figure out the math of that one um so this is kind of this is
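The multiplication described a moment earlier (a chain of parts at 95 95 95) can be written out as a minimal sketch. The 0.95 figures are the ones from the talk; the method name `compound_success_rate` and the app, cache, database labelling of the chain are assumptions made only for illustration.

```ruby
# Illustrative only: multiplies the success rates of parts that depend on
# each other, so a request succeeds only if every part in the chain succeeds.
# The method name is hypothetical, not something from the talk.
def compound_success_rate(rates)
  rates.reduce(1.0) { |product, rate| product * rate }
end

chain = [0.95, 0.95, 0.95] # e.g. app -> cache -> database
overall = compound_success_rate(chain)
puts format("%.1f%% of requests succeed end to end", overall * 100)
```

With three parts at 95% each, the end-to-end figure comes out at roughly 85.7%, which is the smaller number the talk is pointing at.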
00:14:39.839
kind of the high level idea I'd like you just to think about and take away is is
00:14:45.000
the things I'm talking about may not be relevant to you yet because you may not have a lot of traffic for your app
00:14:50.199
because if if say you've got a 95% or 85% uh success rate which is pretty low
00:14:55.240
so you know should fix that now um and you're only getting 100 requests a day then you're only going to have you
00:15:02.160
got five errors 15 errors and that might be okay as it gets to 1,000 requests
00:15:08.440
10,000 requests whatever the next one is in the sequence um you're going to be less than
00:15:14.120
impressed with what happens um we often talk about technical debt but when all you're doing is doing
00:15:21.680
support because you've just got errors coming at you make something up um
00:15:29.880
then you know you that's me I start to think of this as sort of operational debt your system is causing you so much
00:15:36.199
headache and pain that regardless of how pretty your Ruby code is especially when it's got syntax highlighting um life is
00:15:42.399
going to be not very fun and so as you get more traffic the requirement for you
00:15:48.040
is that you have to constantly be improving the success rate and definitely not making it worse um which
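The point about traffic can be put as a small sketch: the same success rate produces more failed requests as volume grows, so holding the number of errors steady means raising the success rate. The helper names and the error budget of five per day are assumptions chosen to match the 100 requests a day example, not figures from a real system.

```ruby
# Hypothetical helper: expected failed requests per day at a given success rate.
def expected_errors(requests_per_day, success_rate)
  (requests_per_day * (1.0 - success_rate)).round
end

# Hypothetical helper: success rate needed to stay within a fixed number of
# errors per day as traffic grows.
def required_success_rate(requests_per_day, max_errors)
  1.0 - (max_errors.to_f / requests_per_day)
end

[100, 1_000, 10_000].each do |requests|
  errors = expected_errors(requests, 0.95)
  needed = required_success_rate(requests, 5)
  puts "#{requests} requests/day: #{errors} errors at 95%, " \
       "need #{(needed * 100).round(2)}% to stay at 5 errors"
end
```

At 100 requests a day a 95% success rate is five errors; at 10,000 a day the same rate is 500, and staying at five errors would need about 99.95%.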
00:15:53.880
may or may not be what we think about and that's kind of this high level idea so uh the unfortunate part is point two
00:16:01.040
which I tried to hide in the middle and then pointed it out let's go
00:16:06.160
back I'm gonna hide it more better this all right all right you can't see it now oh
00:16:12.079
no it doesn't work um so uh in order to get to this we are going to have to think as as your company becomes more
00:16:18.120
successful as your product your thing gets more popular and success is probably traffic um you're going to have
00:16:24.319
to think more and more about how to make that system happy which mathematically well let's call it you know reducing our
00:16:29.800
error rate or increasing our success rate um and my hypothesis is that's going to make you less
00:16:36.079
happy so let's just get over that and let's get on the rest of the talk but moving on to point three uh and really
00:16:42.600
I'm really hoping is that the people who are primarily interested in production happiness which as a group aren't
00:16:51.199
here it's their responsibility our responsibility to give you
00:16:56.839
tools that help you do the right thing by the production system and that's that's this next frontier of
00:17:04.000
deployment how fast you can ship an app into production it's not very
00:17:09.559
interesting if all you're doing is is keeping the error rates the same if the traffic keeps going up you have to be
00:17:14.959
thinking about how do I keep reducing that error rate and and uh we may have to slow things down a little bit so I just
00:17:21.360
want to break this out and just just in case you've never wondered why you've got a job um how it is that someone else
00:17:28.319
can afford to give you money to do your profession I just thought I'd go through it for a bit um so uh um
00:17:35.640
so yes there's arrows let's go through the arrows first that's terrible animation actually I feel like fixing it
00:17:41.880
now um so the arrows so we make code just in case you haven't seen this before you make code you put it into
00:17:48.480
production and if you're lucky you get to be in business right there I don't want to
00:17:55.360
over summarize it but U um and unfortunately though the
00:18:00.480
priority of your users stakeholders and all that sort of stuff is that you're in business doing a service or a product
00:18:07.559
that's useful and valuable to people they are able to find it they are able to determine this is the thing they want
00:18:12.880
to use at the time that they're having the problem that's that's important
00:18:17.919
otherwise your pretty syntax highlighted Ruby code is irrelevant um so in order to do that this is the magical step
00:18:24.440
number two you need to have a production system whether you wrote the code or not
00:18:30.440
isn't important that your production system can provide the value to the people that want the value at the time
00:18:36.520
and place that they think they want the value that's important now for most of us we wrote the
00:18:42.360
code yet it's third on the list of important things so awkward uh
00:18:51.640
nonetheless so if we do number one and two well we get to have jobs to get to do number three um so uh you might think
00:18:59.400
well we should all care about all things well the fellow Mr Smith
00:19:05.159
um he had this idea that you know amongst other very good ideas you know three four five was it 500 years AG four
00:19:12.080
this is an old book 300 years ago of division of labor that if we specialize we can be better at our thing and it
00:19:18.520
turns out if our profession wasn't hard enough um probably best we do specialize in being developers not doing
00:19:24.720
development and maintaining production systems and trying to make money so I'm not trying to I would like you all to
00:19:31.480
care about production whilst writing code this slide is me admitting that
00:19:36.840
it's tough all right so how I would rather solve the problem is by um and I'm
00:19:43.480
over summarizing how I'd rather try and solve the problem is by pushing tools towards developers and educating you on
00:19:49.840
choices that will make the production lives better at perhaps a small cost to your developer life um so okay this is
00:19:56.840
how most businesses might split themselves out back in those three categories salespeople Ops people and Engineers um but and this is kind of the
00:20:04.120
I guess I already had this slide in my head so I should share it first um we all have different fundamental priorities we are paid and we get joy
00:20:11.360
out of adding features curating the app making it look nice doing all the sorts of things and certainly most of our
00:20:17.280
stakeholders think that's what we should be doing um but implicit in priority number two is that we have a system that
00:20:23.520
is up certainly from our users' perspective right that's that's kind of what they expect at the time they want
00:20:29.799
to use it it should be there um and the business priority is to make users happy which hopefully makes money um so hands
00:20:37.880
up if you think one of these columns is
00:20:44.799
correct all right actually let's start again read the
00:20:49.840
chart then we'll play the game that didn't go very well did
00:20:55.880
it when I tested this in my head everyone's a lot more
00:21:01.440
responsive all right um these these two lists are kind of what everyone thinks about you know it cares about and you
00:21:07.960
know summarized down I mean when I wrote it down it was a lot longer but I figured a few points in bigger font better than my handwritten scrawl um and
00:21:15.600
unfortunately they don't really go well together and we we all implicitly know this so um this is this is what I'm
00:21:21.880
about to now talk about is what can we do what can we give developers um from a deployment perspective because that's
00:21:27.200
kind of what I care about um where it's easier for you to do the right thing than not
00:21:32.840
to um choosing JRuby turns out to be a little bit hard for some reason I don't know you should just use it grow up um
00:21:41.559
and uh but I understand and I'll talk about the distinction why M why people do one versus the other um but this is
00:21:48.360
what I want to talk about I'm going to do a demo of of a piece of software that I think is really interesting when you get to scale uh I'd certainly like to
00:21:54.559
see Engine Yard have a product that's similar to it that's what we're working on um and so there's the uh there's this
00:22:00.400
idea that the tools that you've chosen I use tools to represent everything libraries languages tools um um they
00:22:08.559
were all built to solve a problem but they come with consequences and if you get too much in
00:22:14.679
love with the good part you perhaps ignore or or dismiss sort of in a DHH
00:22:20.760
fashion the consequences that wasn't a joke he does that all the time um
00:22:29.039
exist really good at it and you go yeah yeah that's that's right we should all own a car that drives 300 km an hour
00:22:34.600
traffic a lot faster he's never said that but I've thought it and
00:22:39.640
anyway that was just stupid so yeah so as I just said every tool has some purpose for why it was written and we're
00:22:45.960
going to go through with some examples and uh they come with a thought model that they're sort of imposing on you or
00:22:51.080
hoping you adopt in order to make the tool make sense um and uh but in in in
00:22:56.559
adopting the tool it makes something else harder um for example I thought no no
00:23:03.840
I'll come to that example it's in a second so here's an example I like Pivotal Tracker I think it's fantastic looks very pretty um it's you know this
00:23:10.200
idea of well look you know you've got that that that stakeholder that that owner that says yes we should do X comes
00:23:16.279
back three days later let's do y you say the letters of the alphabet that's stupid
00:23:22.440
um and uh so you know and and so the solution to some degree is to well let's write them all down let's put them in an
00:23:29.080
ordered list and let's just see where X and Y turn up and whether they're going to fit into the budget this is I think
00:23:35.600
you know agile in general I think is fantastic um getting your system into
00:23:40.960
production and finding what errors turn up and fixing them is just the right thing um but what's interesting of this
00:23:47.159
tool is there's not really a way of of talking about constant things it should do constant things it shouldn't
00:23:53.600
do like crash and another thing of list that I've apparently not put on the
00:24:00.200
slide I mean who's checking my slides you can't just have a list that
00:24:06.640
has one item and a comma um I've got to ad-lib this
00:24:13.559
um now I've done that stupid thing now I can't remember the rest of the list let me get my notebook out no anyway uh you
00:24:19.600
know it shouldn't crash it shouldn't destroy small planets it shouldn't um it shouldn't make your you know customers
00:24:25.880
cry and hate you and talk about you negatively there's lots of things but it's just as a tool it's hard to sort of mention that and and allocate time to
00:24:32.720
doing these things so why do we not do some of the things we're going to talk about and use the tools because perhaps
00:24:38.039
some of the tools don't infer that we should spend time that way um so you may need to take the tool subvert it
00:24:44.000
slightly and put in every week spend time as chores let's do some log looking
00:24:49.399
you know let's look at some logs see what we find let's go back through all our exceptions all four million of them
00:24:55.679
in you know Airbrake let's pick out a good one fix it um Chef Chef was built for a purpose it
00:25:03.320
was built by a group of people who had got sick and tired of provisioning you know static servers and how quickly and redoing them
00:25:10.000
those servers were there um ultimately Cloud came along and it became useful then but it couldn't provision anything
00:25:15.960
um but it turns out for all the greatness of Chef that it still it's you know you still need to know how to
00:25:22.000
configure MySQL and that's so what did Chef do it just hid that for a
00:25:28.039
bit so um and uh so what am I saying I'm
00:25:34.240
saying that Chef is great but it comes with you know there's it hides things there's still things you still need to do um what's another one Ruby
00:25:42.120
specifically um because I do want to make a comment JRuby um may never quite get off that bandwagon so Ruby was a
00:25:49.039
fantastic language right that's we're all here and uh I'm not going to play that game where I ask you to put your
00:25:54.159
hand up because you're terrible at it um got you've lost hand putting up
00:26:01.320
privileges stop it right bloody Pals um and uh so what was my point my
00:26:09.919
point is I mean Ruby was written and with a with a a thought model of we should be happy as we go about a
00:26:16.480
profession as we write our code and then someone came up with syntax highlighting and world was complete um but it's not
00:26:24.760
really was never really designed or had the thought process around Long running production
00:26:29.919
systems I mean it can't and no disrespect at all I mean it was that's you know um and so uh but there's also
00:26:37.799
other things that are part of the the community that we have the ecosystem the thought processes we all bring to code we all agree on code should be pretty
00:26:44.640
and I'll tell you what's ugly handling errors so let's not do that let's let
00:26:50.360
those bad boys go straight up to the top they you're very excited about that
00:26:56.720
um and uh I mean you know I think Matt Aimonetti mentioned it I mean I really only
00:27:02.679
thought about this when I saw Blake Mizerany do a talk on Go and sort of compared the sort of things from Go that you could bring to Ruby and one of the ideas
00:27:09.120
in Go as a language is dealing with errors immediately and uh when you start to
00:27:15.480
think about that is is exactly what you should do as you accumulate you start to
00:27:20.600
realize you know exactly why this error is here when you start throwing it up to the stack it doesn't know why the exception what the relevance is what it
00:27:26.360
should do about that how to handle it it's completely subverted the context and and said you know this P code now
00:27:32.520
you now need to know about what that was doing and that's not very nice encapsulation so dealing with errors locally helps manage error rates and
00:27:40.760
unfortunately it's going to make your code ugly so this I guess is my prime example
00:27:46.320
of the idea that as you care about production as you want to keep improving your success rates and reducing error
00:27:52.279
rates you're going to need to do some things that perhaps are not very Ruby like handle errors
00:27:59.679
so every day put another if statement
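Handling an error where it happens, rather than letting it travel up the stack, might look something like the sketch below. Everything named here (`fetch_recommendations`, the `client` callable, the empty-list fallback) is hypothetical and stands in for any call to another box; the point is only that the rescue sits next to the code that knows what the failure means.

```ruby
require "timeout"

# Hypothetical example: the caller knows recommendations are optional, so a
# failure of the remote service is handled right here with a sensible fallback
# instead of being raised to code that has no idea what was being attempted.
def fetch_recommendations(user_id, client:)
  client.call(user_id)
rescue Timeout::Error, IOError, SystemCallError => e
  warn "recommendations unavailable for user #{user_id}: #{e.class}"
  [] # the page can still render without recommendations
end

healthy = ->(id) { ["item-#{id}"] }
down    = ->(_id) { raise Errno::ECONNREFUSED }

p fetch_recommendations(7, client: healthy)
p fetch_recommendations(7, client: down)
```

The cost is the one the talk admits to: an extra rescue clause in a method that would otherwise be one line.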
00:28:04.760
in um I mean I I'm becoming not a fan of exceptions in general unfortunately all
00:28:10.440
the libraries use exceptions as a form of declaring that something went bad um Go I I kind of like that idea of
00:28:16.840
another you know Objective C you you can all right I mean it's uh
00:28:23.200
so yes I should finish that thought and then go back to just the idea of exceptions don't like so yes so the idea as I just said is to put your rescue
00:28:30.919
blocks explicitly in that context um there just still something about
00:28:36.080
exceptions that uh I'd rather they came through the standard medium of a method which was uh either the read write
00:28:43.120
parameters or the uh response so the idea of returning a real value and perhaps an error value if necessary I I
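Returning a real value and perhaps an error value, instead of raising, can be sketched in Ruby with a two-element array. This is only an approximation of the Go convention mentioned in the talk; `find_user` and the in-memory `users` hash are made up for the example.

```ruby
# Hypothetical lookup that returns [value, error] instead of raising,
# loosely in the style of Go's multiple return values.
def find_user(store, id)
  user = store[id]
  return [nil, "user #{id} not found"] if user.nil?

  [user, nil]
end

users = { 1 => "alice" }

user, err = find_user(users, 2)
if err
  puts "could not load user: #{err}"
else
  puts "hello #{user}"
end
```

The caller has to look at the error straight away because it arrives through the normal return path, which is the behaviour being argued for here.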
00:28:50.159
just think that is um it makes how do I put it actually the way Blake mentioned
00:28:55.399
it was he said the word exceptional or the word exception makes it sound like exceptions are exceptional they never
00:29:01.720
happen they happen all the time right when you do distributed programming you know the boxes talking
00:29:07.640
to each other the moment you add that second box congratulations you've got distributed programming um like a
00:29:13.000
database and it's not always there but that's not exceptional you know it's not
00:29:19.279
going to be there sometimes you know it sometimes it will be out and you know
00:29:25.279
that because you put your app on Amazon US East
00:29:33.440
so just not all have oh it's Amazon's fault no it's your
00:29:39.039
fault listen for that error which will happen again uh even if you move
00:29:44.960
somewhere else um and uh and make your app behave in an appropriate
00:29:50.720
fashion don't I mean yes you want to fix it and get back up and running but I mean you can't stop physical
00:29:56.679
Hardware having problems your code can handle it um all right so that's uh
00:30:02.559
really really rabbit on about that one for a long time um so from the Ruby on Rails website it says web development
00:30:08.760
that doesn't hurt no mention about production um so I think I've covered
00:30:14.960
this topic for a while but um no actually I haven't let's do it again so uh I heard a good tip for for dealing
00:30:21.840
with DBs as your DB as your SQL database scales you're going to think about sharding one of the things that makes sharding hard is when you do long joins
00:30:28.399
between tables so don't be you know be conscious of of of
00:30:33.600
the join as you start to get you know more production oriented start to scope down and think about your app in such a way that you don't have joins
00:30:39.600
across seven tables because you're not going to be able to shard that um but Rails
00:30:45.000
with its beautiful uh Active Record um syntax makes it really easy and fun you
00:30:50.360
want to join everything it's like you want to get back in a circle hey I'm back to the user model again I be a
00:30:55.960
prize um not very good for production so um uh platform as a service uh you we
00:31:04.320
you know make deployment really easy let you describe all your Universe things that I like a lot uh but not really a
00:31:10.200
lot of assistance in the context of apps talking to each other this idea that PaaS platforms should make this easier to
00:31:16.720
deal with failure of in in you know systems that talk to each other um so
00:31:21.880
you know just because you should still use Engine Yard no one else fixes this problem either so just use Engine Yard
00:31:30.120
you just keep sign languaging to the one person I at least want that person being a customer by the
00:31:40.919
end um all right all right so let's get back to production so there that was sort of my summary of tools in general
00:31:47.600
uh and thought process just to sort of get you to think about this idea um I do want to talk about deployment specifically I just don't want to scale
00:31:53.679
it down to this one topic um so as I mentioned before I don't know how to
00:31:58.840
prove or suggest or infer that if your production system is happy you'll be
00:32:03.960
happier um so we'll skip it and just go with it um so here's a couple of things
00:32:10.519
that you might like to choose which are going to affect your development life a little bit but at the benefit of having
00:32:17.880
a better production life uh JRuby being on the JVM has a whole bunch of really cool tools inspecting uh getting
00:32:25.159
snapshots of what was going on at the time that's blocked up or whatever um obviously threading but I don't want
00:32:30.559
to talk about threading that's not going to make you buy anything I found that out last year don't do threading don't
00:32:37.919
don't I dare you I dare you not to don't there I win it's like telling a dog that's already lying down lie stay
00:32:45.600
ah I'm an excellent owner um so uh but the cost of using JRuby or one of them is
00:32:52.720
that you know all that sort of just day-to-day usage of Ruby that your scripting usage it gets slower and
00:32:59.159
that's a little bit annoying and uh so this is sort of that main example that idea that something
00:33:05.159
that's good for production you as developers or let me put this another
00:33:10.440
way all the people around you because you've changed your mind already um you know choosing perhaps an
00:33:19.080
inferior production Choice perhaps um because because you'd
00:33:25.440
rather have the one that's good for you as a developer you want the fast one when you run your rake task but it may not be the good one for
00:33:31.360
production um so the other reason JRuby I mean this is an example if you've never seen a a screenshot of VisualVM a whole
00:33:37.840
bunch of interesting introspection of what's going on so as you put that chore
00:33:42.880
on your weekly list of things how we're going to make our system better throwing up VisualVM and watching for for
00:33:48.480
interesting data is is one thing um bunch of other cool tools um so queues
00:33:54.559
using queues rather than sort of blocking HTTP HTTP APIs are awesome I mean you can use
00:34:00.320
curl and play with it I love it um but blocking not good especially with your
00:34:05.880
non-concurrent apps that you're all writing so so perhaps investigate an
00:34:11.520
asynchronous queuing um what else uh logging there's an excellent especially
00:34:17.320
if you can have multiple systems which you already have because you have an app talking to database pulling all those logs
00:34:23.200
together initially will just be ugly and you'll go well what was the point of that well then go back to the app and
00:34:29.200
start to curate interesting useful logs that tell you a story so perhaps you can search for a user and watch what that
00:34:36.119
user was doing throughout the system um people uh so out of all the
00:34:41.839
sort of app service companies that have come out recently sorry I start that again Splunk is worth a truckload of money Splunk is anyone heard of
00:34:50.440
Splunk um cool I did that judge I did actually ask that question you're terrible at this uh put your hand up the
00:34:57.320
wrong time put it up the right time um spiteful that's what you are and so I
00:35:03.320
was at JavaOne and did a quick talk on Logstash and asked that same question and
00:35:08.480
uh no one had heard of Splunk so I I just told a joke to them and I'm going to
00:35:13.839
tell that same joke but I have to set it up and you answered
00:35:20.000
incorrectly but the joke was the reason that you couldn't uh you don't know about it is because you can't afford it
00:35:25.520
um Splunk is really really expensive and uh but Logstash and Kibana are open
00:35:30.560
source and uh um and and really quite interesting for you to look at but there's also Papertrail Loggly and and
00:35:36.680
use them start putting logs in there start to get a feel for the flow of your system you know you don't really have
00:35:42.560
perhaps in your mind a good mental picture of how your app behaves in production really and logging is is one
00:35:49.000
and watching all the events is one way to do that the trick is obviously you have to spend time looking at them um
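[Editor's sketch, not from the talk: "curating logs that tell a story" can be as simple as having every service emit one structured line per event with a shared user id, so the pooled logs can be filtered per user. The field names and the in-memory list standing in for a log aggregator are assumptions made for this illustration.]

```python
import json


def log_event(stream, service, user_id, event):
    """Append one JSON log line; `stream` is a list standing in for an aggregator."""
    stream.append(json.dumps({"service": service, "user_id": user_id, "event": event}))


def story_for(stream, user_id):
    """Return (service, event) pairs for one user, in the order they were logged."""
    entries = (json.loads(line) for line in stream)
    return [(e["service"], e["event"]) for e in entries if e["user_id"] == user_id]
```

[With real tooling the list would be replaced by whatever log shipper is in use; the point is only that a consistent field such as `user_id` is what makes the later search possible.]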
00:35:55.119
BOSH so BOSH is something I like I've liked and BOSH solves this interesting problem of of of deploying entire
00:36:00.960
systems and knowing what's happening as opposed to you know perhaps the Chef mentality of well a node came up and it
00:36:07.920
became something hopefully good um BOSH has this more declarative you I like the
00:36:14.640
sort of totalitarian Master of the Universe I will tell all the VMs what they are and they shall be happy for it
00:36:21.480
um and I I I've started to really appreciate the value the mental the mental thought that's gone into BOSH um
00:36:29.000
and uh for the lack of there being a commercial tool that I my company has produced I will keep talking about this
00:36:35.040
uh until we have one and then I'll have to stop um so I'm actually going to do a
00:36:40.240
demo of BOSH because I I think the thing I would like you to think about as you get bigger as a company this idea of being
00:36:47.400
absolutely declarative not having reducing the number of external dependencies of your production system
00:36:53.960
things that could go wrong because as we talked about we need to continually reduce those so BOSH may not be for you today
00:37:01.520
you might be running a couple of dynos on Heroku if you've got a couple of VMs with Engine Yard you're good right but as you
00:37:07.800
get bigger and you want you know you've got more and more requests you're going to this is one of the things have to think about um so uh let's let's have a
00:37:14.839
look so the example app is GitLab HQ um which is uh I chose because it's
00:37:21.560
kind of a rails app it's got a few moving Parts in fact here's my uh my my architectural diagram it's a rails app
00:37:27.599
right we've then also got Gitolite which uh does the sort of the git stuff um and
00:37:34.119
so we're going to deploy this as a sort of a complex system and it's it's going to be awesome look I even found an icon
00:37:41.319
that makes you want to click it but stay seated everyone don't fall
00:37:46.760
for the Trap of the large touchscreen monitor um all right so uh this this I'm not
00:37:53.800
even going to pretend to type for 10 minutes look no hands so what we do is we're going to uh deploy this so I have
00:38:00.400
already written a description of how you deploy GitLab HQ and uh it's a BOSH release now we can look at it for
00:38:06.240
anyone that's interested if we have time we can look at it but um um it has the source code to GitLab HQ the source to
00:38:13.920
Gitolite uh submoduled all the other dependencies you'll see are now being
00:38:19.000
downloaded now they're not being downloaded from some magical other place I've already downloaded them once from
00:38:24.880
the magical other place and they're in my S3 account because we're trying to reduce external
00:38:30.640
things that could go wrong um and uh and once I pull them all down and we create
00:38:35.680
a BOSH release and put it into my BOSH that's it that's the last time we go out to the rest of the world for the rest of
00:38:42.240
that app's life um unless we want more dependencies like you know a new version
00:38:48.680
of Ruby or something I haven't touched this for a while new version of Postgres anyway
00:38:56.040
let's move on stop looking at version number I feel like I want to type something
00:39:03.440
but so fake I should pretend oh no
00:39:09.240
anyway um you can make screencasts that go wrong really fast and don't have the
00:39:16.040
inconvenient and awkward pauses but then you don't learn anything so we'll have the inconvenient awkward pauses um so
00:39:22.079
what it's doing now it's downloaded all the assets that I need because we're not going to use apt-get
00:39:27.960
or you know those packages because they're external you could build this system using that concept of of you just
00:39:34.079
have your own uh you know apt repo um and so BOSH does it all itself
00:39:40.880
so uh now what we're going to do is I can't remember we are gonna is that
00:39:51.200
me sweet I'll be back
00:39:57.839
God Jesus Christ um okay so what we've done the context here is we're actually on um not
00:40:04.599
my machine but just a VM um a lot faster if you do it in the cloud and uh so we've pulled all these things down from
00:40:10.440
S3 we've turned them into a sort of a big tarball and now what we're doing is we're uploading it to BOSH BOSH is not
00:40:16.119
a like a command line just a pure command line it's a running service a bit like a PaaS might be and uh so what
00:40:23.160
we're doing is we're uploading it and now BOSH has the big tube of t-shirts
00:40:28.880
what stop it distracting me um it's unpacking those things now at this time
00:40:35.800
what we have is so it now knows about a release called GitLab HQ uh it and I've
00:40:41.920
it's there 71 five I mean seven sorry s 7.1 Dev um so now what we need to be
00:40:48.920
able to do is deploy it so there are these two main concepts a release and a deployment manifest a deployment
00:40:55.440
manifest is that is awkward all right I keep thinking it's something that's awkward and you're hearing it from
00:41:02.640
me so a deployment manifest is a big YAML file which is great because it's text and you can read it awkward because
00:41:08.839
you know I really would rather have that uh as much as I love YAML sometimes I wish there was a schema I could validate it
00:41:14.560
against um and you know probably should shouldn't say that out loud um but I
00:41:22.160
want XML back so uh we sorry I really talked through that so what we're doing now jobs uh the different moving parts
00:41:30.599
or sort of for the most part you can think of them as one vm's worth of work uh you can merge them into the same VM
00:41:36.880
but that's kind of the context and so you can sort of see them as those boxes I had Gitolite Redis GitLab and Resque
00:41:44.280
which isn't you know sorry Mike I I can't believe I haven't moved to Sidekiq I'm I'm insulted on your behalf
00:41:51.520
and um and then the last part which we skipped over is all the sort of the the parameters the arguments they're going
00:41:57.400
to go into all the templates so they're all in one place like a data bag if you're doing Chef and I'm not really a
00:42:03.960
Puppet person so um if you want to sort of tell me about Puppet relative to this uh that'd be great so now we've just
00:42:10.200
told the command line tool this is the deployment manifest of what I care about and the way BOSH does deployment is it
00:42:15.920
sort of says well what have you already got what do you want and I'll go get that for you the first time you've got
00:42:23.599
nothing so it looks like it does everything well it does right so first thing it's going to do is compile stuff
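[Editor's sketch, not from the talk and not BOSH's actual implementation: the "what have you already got, what do you want" idea can be pictured as diffing a desired job-to-instance-count map against the current one. The `plan` function, the job names, and the action tuples are hypothetical.]

```python
def plan(current, desired):
    """Compare two {job_name: instance_count} maps and list the changes needed."""
    actions = []
    for job in sorted(set(current) | set(desired)):
        have = current.get(job, 0)
        want = desired.get(job, 0)
        if want > have:
            actions.append(("create", job, want - have))
        elif want < have:
            actions.append(("delete", job, have - want))
    return actions
```

[On a first deploy `current` is empty, so every job shows up as a `create`; on later deploys only the differences appear, which is why a small change such as one extra worker produces a small plan.]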
00:42:29.960
this I think this is fantastic uh it comes built in with its own packaging binary packaging thing against whatever
00:42:38.079
the base operating system that you have got so in this case it's some version of Ubuntu but it's your version of Ubuntu
00:42:44.480
so there's no chance of packages having been built on a slightly different uh environment being applied to yours
00:42:50.599
they're going to be built in exactly the same environment um so as they're compiling here this is my uh Amazon
00:42:56.040
account you can see four VMs available for compilation um it finished and we moved
00:43:03.559
on that was terrible um oh so what we're doing now is while it's running because one of the packages was Perl and
00:43:10.960
whoever built Perl had a lot of spare time it takes like 30 something minutes
00:43:16.160
to build on an m1.medium um so this is a bit of BOSH's tooling you can see what
00:43:22.359
you know processes are running uh you can go on from another machine you can go and watch that process just as as if
00:43:29.200
you were doing it when you did it yourself so teams can watch a deployment um that's kind of interesting uh nearly
00:43:35.720
every action is is sort of uh a Resque job so to speak and you can watch it um and also all all the log information for
00:43:42.920
that is kept and you can go and get it so there's lots of different ways to look at
00:43:48.200
tasks let's just take a moment to appreciate that I didn't make you watch Perl being compiled for 30 minutes are
00:43:54.440
you getting why I did a screencast yeah because this stuff takes forever
00:43:59.480
um one of the things we have to give up so this is sort of just another way of looking at that data so this is raw data
00:44:04.800
that you could perhaps build a a nicer interface on top of should you wish to certainly one of the things I've been
00:44:10.480
working on for my own Amusement all right so now we've finished compiling packages now we're going to boot up some
00:44:16.119
VMs why are we booting VMs because that's where stuff runs and that's what we put in the the deployment manifest we
00:44:22.920
go back to it you'd get to see that that we had five VMs one for each of the different uh jobs BOSH jobs and so now
00:44:29.680
we're booting them up God this takes forever even when you cut it all out and make it faster I'm so
00:44:37.839
impatient and there's really no funny jokes to tell about FM's
00:44:51.520
booting your patience is I can get the cannon get the cannon
00:44:57.880
Squad of three and a decoy Squad go um so uh it takes a little while but eventually you know GitLab is running
00:45:03.880
and so I think this is really interesting I was able to deploy yeah I mean it's the Manifest is a bit icky but
00:45:10.000
at least you know it's an interface you could build a tool that made it easy to work with the YAML manifest this is the
00:45:15.480
VMs running um you can see on the right four of them have Elastic IPs assigned to them uh BOSH now does have
00:45:22.280
uh an internal DNS so I didn't really need all those Elastic IPs anymore but I haven't figured out how to make that
00:45:28.000
work yet um and ah so now what we're going to do is we're going to change it
00:45:34.200
the deployment in this case in uh there's a deploy in BOSH language and certainly the way I think about it is it
00:45:40.960
could be changing scale attributes bigger VMs more VMs it could be changing
00:45:46.280
uh configuration um or it could be changing the release so we may have cut out a new
00:45:52.680
release of some software which might be a new version of Postgres might be a new version of web app new version of
00:45:58.640
one of the things that's part of your system BOSH doesn't think of your Ruby code as all that
00:46:04.559
special uh just just to get the gist of it uh your Ruby code is just a thing that runs so uh here we're going to add
00:46:12.559
an extra Resque thing we got we've you know somehow the Resque workers weren't very uh productive and so the solution
00:46:20.720
for Resque is to just add more resources um or use Sidekiq and I'm sorry Mike
00:46:26.760
for for not putting a Sidekiq slide in there please in your mind put in a Sidekiq slide move to Sidekiq it's
00:46:33.680
good um so you can see it's sort of prompted this is the Delta do you want me to do this yeah good and off it goes
00:46:40.760
adding a new VM turning it into Resque and uh Bob's your uncle it does take a lot longer than
00:46:48.319
this now we get to see the extra VM look perhaps this doesn't impress you but if you've ever had to manage your own VMs
00:46:55.079
this is awesome um another tool it has is built-in SSH so it will go off to
00:47:01.520
that VM create a random username account for the purpose of that one session uh
00:47:06.960
you give it a a password for this session for sudo so you can change it each time if you want it doesn't do
00:47:14.079
auditing which is a nice sort of you know what did that person do whilst on the VM that would be a nice feature doesn't have that yet uh so you can see
00:47:20.920
that random uh username and what we're going to do is just look at the process list just to
00:47:26.520
see what why you know we still haven't quite got enough Resque workers oh look there's only one of them so whichever
00:47:32.559
genius wrote this on me um didn't put in enough Resque workers so now let's look
00:47:38.680
at some some some of a little bit of how this works so it's all on my GitHub repo if you want to go and play with it and have a
00:47:44.559
look around um so this instead of Chef it's sort of it's shell script Shell's
00:47:51.319
actually kind of good when you you know get over it so um here I'm running rake
00:47:56.800
and you can see I've only got one of them oh and look there's a FIXME to add more workers so anyway um I'm a
00:48:03.119
genius and that's that's that right I I find this the ability to describe
00:48:08.480
everything exactly is just removes all the possible things that go wrong it it
00:48:14.119
makes it makes understanding the system simpler and and the less things I have to think about the less the chance I'm
00:48:21.359
going to make a mistake somewhere um there may be other things that are similar and again it's hard to play with
00:48:27.480
everything uh and if you have a tool or a set of tool chain that fits into a similar model I'd love to talk to you
00:48:32.839
about it because I think this stuff is fascinating uh I mentioned this before but your Ruby app really is is is just
00:48:38.880
another process it's not special when it comes to Ops um it's
00:48:44.280
special in that it breaks all the time or bloats or does all sorts of other aberrant behavior but beyond that um so
00:48:51.839
the way this works is you know it's just processes and we're running on VMs uh a BOSH release is fundamentally two things
00:48:57.640
set of job descriptions which as you saw is sort of it's shell scripts because they're kind of easy to write and
00:49:03.760
templates so Chef as an example uh you programmatically say whether you want a
00:49:09.280
template or not you programmatically say whether you want a package or not you programmatically say what you know what to do next with BOSH no no no none
00:49:15.040
of that you just in a YAML file say what packages you want they'll be there you render all the
00:49:20.520
templates and why not and uh then use shell scripts to generate everything else through start stop on Monit packages as
00:49:27.680
we sort of talked about you go and get all the the source code for yourself you do your own uh installation scripts
00:49:34.680
they're all easy to do and once you've done them you never do them again so that compilation step never happens
00:49:40.680
again so we deserve these nice things then you can read the slide later
00:49:47.440
I do want for as much as I like BOSH there it's not you know perfect by any means uh and uh right what's the
00:49:53.920
highlight there you know getting started how you've got to go and run your own BOSH so if you're you know don't have a big
00:50:00.079
budget for running Amazon VMs or don't have a you know vSphere account or whatever uh you know sounds expensive so
00:50:06.640
it's not for everyone can't run on bare metal at the moment because it wants to manage VMs and attach disks and all that
00:50:11.920
sort of cool stuff um and it's a sort of single region so if you want multi-region you have to have separate BOSHes
00:50:18.280
but here's an idea if you've described everything absolutely and have no external dependencies theoretically you
00:50:25.559
might be able to answer this question with a different answer so when that fancy Enterprise person says well this
00:50:31.359
looks fantastic can I run it in my data center with this kind of tool the idea here is perhaps you could say
00:50:38.119
yes even if it's completely locked off from you and I think that's that's an interesting idea of the future of
00:50:43.599
deployment people have talked to people they say no I'd never want to sell to Enterprise you know it's hard enough running this one thing but what if it
00:50:49.960
wasn't hard to run your one thing what if deployment and management of production systems made it as easy to run a
00:50:55.400
thousand copies as it was to run one um so the summary of my talk really is this
00:51:01.280
math is is as you add more traffic you need to constantly be improving your error rates or your success rates um and
00:51:09.720
that you are going to need to you know change some of your behavior as you go along um and
00:51:16.599
you I don't want to say you're going to have to grow up except can't think of another way of
00:51:21.680
putting it um so what I think the job of ops people in general or the or the sort of
00:51:28.440
the people who care about production systems is to be constantly giving tools to to to us developers the engineers
00:51:34.760
that help us do the right thing that BOSH deploy anyone could do that once the packaging's done I mean I can figure
00:51:40.799
it out and I'm not very bright um so uh thank you very much now
00:51:46.079
the last thing is uh we are hiring because um we still have money left
00:51:52.839
and no no no um I mean we this is while we're not doing BOSH we are doing something very similar and it's uh it's
00:51:58.960
really exciting stuff uh if this sort of stuff tickles your fancy and you'd like to create the new frontier of tooling
00:52:04.880
for for production systems please uh make sure you come and work for us we'd love to have you uh we got the party on
00:52:10.599
tonight here are some snapshots of people you should look for and talk to us about how much alcohol is awesome and