Summarized using AI

Change your tools, change your outcome

Dr. Nic Williams • November 01, 2012 • Denver, Colorado • Talk

In his talk titled "Change your tools, change your outcome," Dr. Nic Williams discusses the complexities of deploying applications and managing production systems, especially within the realm of Ruby programming. He raises fundamental questions around the common practice of app-centric deployments and proposes exploring alternative perspectives such as environment-centric or system-centric deployments.

Key points from the talk include:

- Understanding Intent: Initially, developers often focus narrowly on deploying code updates without considering the broader implications for the production environment.

- Choice of Frameworks and Tools: He weighs frameworks such as Chef or Puppet against a more holistic, system-centric approach, questioning whether they adequately address the needs of deployment and system management.

- Importance of Deployment Processes: Dr. Williams argues for a shift in how we perceive deployment, stressing that the focus should include not just app updates but also the entire ecosystem of dependencies and services, which can introduce errors and outages.

- Error Management: As applications grow and users multiply, the error rates can increase if the underlying infrastructure is not robust enough. He presents a mathematical perspective on failure rates as systems become more complex and interconnected.

- Continuous Improvement: Developers must constantly address operational debt and enhance systems to maintain user satisfaction and application reliability, suggesting that production happiness should be prioritized alongside developer happiness.

- Holistic Tool Selection: The discussion highlights the need for tools that foster easier and more effective management of production systems, with the overarching goal of enabling smoother operations as application traffic increases.
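The "mathematical perspective on failure rates" is simple arithmetic, which the talk walks through with 95% success rates and 100 requests a day. The Ruby sketch below assumes failures at each hop are independent, and the method names are illustrative. Components called in series multiply their success rates, and the expected number of failed requests grows linearly with traffic.

```ruby
# Components called in series: a request succeeds only if every hop succeeds,
# so (assuming independent failures) the success rates multiply.
def chain_success_rate(rates)
  rates.reduce(1.0) { |acc, rate| acc * rate }
end

# Expected number of failed requests for a given success rate and volume.
def expected_errors(success_rate, requests)
  (1.0 - success_rate) * requests
end

chain = chain_success_rate([0.95, 0.95, 0.95])
puts format("app -> cache -> database at 95%% each: %.1f%% end-to-end", chain * 100)
# prints "app -> cache -> database at 95% each: 85.7% end-to-end"

# The same error rate hurts more as traffic grows.
[0.95, 0.85].each do |rate|
  [100, 1_000, 10_000].each do |requests|
    puts format("%.0f%% success, %6d requests: ~%.0f errors",
                rate * 100, requests, expected_errors(rate, requests))
  end
end
```

With 100 requests a day, 95% and 85% success mean roughly 5 and 15 errors. At 10,000 requests, the same rates mean roughly 500 and 1,500 errors, which is why the talk argues that the success rate has to keep improving as traffic grows.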

Dr. Williams concludes by emphasizing the significance of adapting both tools and mentality to ensure that production systems function optimally, thereby reducing errors and improving developer experience. Overall, the talk reflects a critical examination of traditional practices in application deployment and encourages a more informed approach to managing complex production environments.


Date: November 01, 2012
Published: March 19, 2013

When you type "cap deploy", "ey deploy", or "git push heroku master" your intent is to deploy your local application source to your running system on the Internet. That seems to be the point - you changed your code, and you want to Just Ship It. But what is your actual objective? Is it really to just "deploy app code changes"? Is this "app-centric" view and user experience satisfactory?

Code deployment is your intent on some occasions. On others you want to change your production environments for applications, or change scale attributes of your system, or change how applications and services within a system communicate with each other, or with remote services (such as Facebook).

Is "app-centric deployment" the best mental model and toolchain for shipping changes to production systems? Or is "environment-centric" or "node-centric", enabled with frameworks like Chef or Puppet, the most powerful & effective model of the system to allow you to deploy and manage change?

Or perhaps we should describe the entire system - all the apps, all the system dependencies, all the interconnections, all the scale attributes - and command it to come into existence? To command the system to go from nothing to v1 to v2 to v3, where each version includes changes in attributes of the system.
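One way to picture the system-centric idea is to treat the whole system as data and to compute each version change (v1 to v2) as the difference between two descriptions. The sketch below is a hypothetical illustration under that assumption. The component names, the attributes (`instances`, `release`, `connects_to`) and the `plan_changes` method are invented for this example and are not the API of Chef, Puppet or any other tool.

```ruby
# Hypothetical desired-state descriptions of a whole system.
# Component names and attributes are illustrative only.
SYSTEM_V1 = {
  "web"      => { instances: 2, release: "app-v1", connects_to: ["postgres"] },
  "postgres" => { instances: 1, release: "pg-9.1", connects_to: [] }
}.freeze

SYSTEM_V2 = {
  "web"      => { instances: 4, release: "app-v2", connects_to: ["postgres", "redis"] },
  "postgres" => { instances: 1, release: "pg-9.1", connects_to: [] },
  "redis"    => { instances: 1, release: "redis-2.6", connects_to: [] }
}.freeze

# Compute the changes needed to move from `current` to `desired`.
# Each change is [action, component_name, detail].
def plan_changes(current, desired)
  changes = []
  # Components that only exist in the desired version must be created.
  (desired.keys - current.keys).each { |name| changes << [:create, name, desired[name]] }
  # Components that no longer appear must be torn down.
  (current.keys - desired.keys).each { |name| changes << [:destroy, name, nil] }
  # Components in both versions: report each desired attribute whose value differs.
  (current.keys & desired.keys).each do |name|
    desired[name].each do |attr, new_value|
      old_value = current[name][attr]
      changes << [:update, name, { attr => [old_value, new_value] }] unless old_value == new_value
    end
  end
  changes
end

# v1 -> v2: create redis, scale web from 2 to 4, ship a new release, add a connection.
plan_changes(SYSTEM_V1, SYSTEM_V2).each { |change| p change }
```

This sketch only computes what differs. A real orchestrator would also have to decide the order in which to apply those changes and what to do when one of them fails partway through.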

Where should configuration/manifests/attributes go? Source code files in the config folder? PaaS configuration or environment variables? Or should components of a system dynamically discover information about themselves and configure themselves?
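The sketch below shows one possible answer to the configuration question. It assumes this lookup order: first an environment variable (the PaaS style), then a config file checked into the source tree, then a hard-coded default. `config_value` is a hypothetical helper, and the YAML keys are illustrative. The third option, self-discovery, is not shown.

```ruby
require "yaml"

# Hypothetical helper: look a setting up in the environment first, then in a
# config file from the source tree, then fall back to a default.
def config_value(key, env: ENV, file_config: {}, default: nil)
  env_key = key.to_s.upcase
  return env[env_key] if env.key?(env_key)
  return file_config[key.to_s] if file_config.key?(key.to_s)
  default
end

# Stand-in for the contents of a file such as config/settings.yml (hypothetical).
FILE_CONFIG = YAML.safe_load(<<~CONFIG)
  database_pool: 5
  redis_url: redis://localhost:6379
CONFIG

config_value(:database_pool, env: {}, file_config: FILE_CONFIG)                          # => 5
config_value(:database_pool, env: { "DATABASE_POOL" => "20" }, file_config: FILE_CONFIG) # => "20"
config_value(:log_level, env: {}, file_config: FILE_CONFIG, default: "info")            # => "info"
```

Environment variables always arrive as strings. The pool size therefore comes back as `"20"` from the environment but as the integer `5` from YAML, so callers would need to convert types consistently whichever source wins.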

Perhaps we need the benefits of a "system-centric" build toolchain, with an "app-centric" user/developer experience to trigger deploys, with a "node-centric" experience for sysadmins.

In this talk, we will reflect on the current state of deploying production systems, including build/deploy toolchains, and continuous deployment. We'll look at the attributes of a complete system, how we explicitly or implicitly describe them and their relationships, and how to orchestrate changes in the system - from app-centric, node-centric and system-centric views.

Let's discuss the difference between deployment in month 1 and living with your system for the next 59 months.

RubyConf 2012

00:00:20.000 I love the '80s. Yeah. All right, if you're outside and you
00:00:25.080 haven't made your mind up, just go and grab someone you know. Physical violence, that
00:00:31.000 works for us. All right, I say we kick this
00:00:36.360 bad boy off. Welcome to the conference. I hope you've enjoyed the
00:00:41.680 warm-up for the last day and a half. Has it been good? If you could fill me in, I missed it
00:00:47.879 writing slides. Just to the lady doing the hand things: if you
00:00:54.800 could do them in Australian. I'm not sure, maybe more kangaroo gestures or something.
00:01:03.000 So I just want to introduce me, introduce this topic, and the
00:01:09.720 t-shirt that I'm wearing. Over five years ago I put a blog post up because there was this thing, it was
00:01:16.600 silly, it was called Twitter, and one of them complained a lot about Ruby and Rails and blah blah blah,
00:01:25.000 and so I wrote a blog post and I released a little bit of code which basically said, look, I don't want to
00:01:30.119 make a big deal of it, but I've solved all of your scaling problems. It was
00:01:36.240 silly, you'd never use it, but as you see at the bottom, I said, I wonder if they'll give me a free Twitter account. I didn't
00:01:41.680 have one, I didn't know what it was for. Or at least a t-shirt. And actually,
00:01:46.920 at the next RailsConf later in the year, Alex Payne actually gave me this t-shirt, and I completely forgot about it,
00:01:53.680 except there is a Twitter bot that, on your five-year anniversary, apparently tells you: congratulations, you've had a Twitter
00:02:00.119 account for five years, how's it going for you? So, you know, I thought I should get this out. Do you know, if you go to
00:02:05.680 Twitter now, you cannot get this t-shirt. I mean, it's mine. So,
00:02:11.080 more about me, my background. It turns out that I have been
00:02:17.440 thinking about distributed systems for a while, whilst at the same time not caring at all about them. A very confusing state
00:02:24.400 of mind to be in. Actually, I remember my topic was something along
00:02:29.560 the lines of how do you change distributed applications at runtime, in a sort of enterprise scenario where
00:02:35.080 there are lots of different participants and it's awkward, and how do you let that application do it all itself. And I highly recommend not reading
00:02:42.159 those 200 pages of quality text, mostly because it was written when one of my
00:02:47.560 PhD supervisors gave me this top tip on how to write a PhD thesis, and she
00:02:53.040 said: the more boring, the better. You can imagine why they're not
00:02:59.159 selling with O'Reilly or the Pragmatic Programmers. So making it red was
00:03:05.720 about my only sort of daringness at the end. So that was a long time ago. I
00:03:12.000 did Rails consulting for a couple of years, but I did the app bit. I made the thing; I didn't make it pretty and I
00:03:17.879 didn't make it run, I just made it. Someone else did the other bits of doing ops. But then, I don't know, I
00:03:25.159 became more interested in the space of production apps, and so I came to
00:03:30.280 this country, whichever one it is, and in four days' time it could be
00:03:36.120 different, so who knows. Remember to
00:03:42.080 vote, whatever. Doesn't matter what you do, you're going to keep killing foreigners with drones. So, you know, I'd
00:03:50.080 like to say it was safer to be here, away from the drones, but then everyone's got a gun.
00:03:57.239 Actually, if you noticed, if you go to the front doors, not now, right at the
00:04:03.079 bottom is a little sign, like a no-smoking sign, that says no guns and no knives. I'm not sure how short the people
00:04:10.840 with guns would have to be to notice that. But anyway, Engine Yard, if you know us: we do the
00:04:17.639 hosting of the Ruby apps that you may have one or two of, mostly for businesses,
00:04:23.440 people that have actual big apps doing big things. And so it's really fun and interesting to hang
00:04:29.440 out with those people, and I've learned, I mean, honestly, I've learned stuff, and everything I learned I
00:04:35.080 didn't want to learn. I didn't come to Engine Yard to learn DevOps and system administration and cloud stuff. I
00:04:42.240 just like writing apps. So in the last two years I've accidentally learned things, and unfortunately, due to your
00:04:48.320 choice of session, you're now going to learn some of those things. So I will
00:04:54.880 attempt to at least make it interesting. How I came to be thinking about this particular topic, which we'll
00:05:01.039 get into, was that last year I tried to convince a large number of people that you should use JRuby and Rubinius,
00:05:08.479 and I tried to focus specifically on one aspect, which was threading, because last year there was a lot of hoo-ha about how
00:05:14.720 much fun it would be to do JavaScript and evented programming. Fortunately
00:05:19.960 we've all got over that, it was all a bit silly, and now we've gotten back to writing
00:05:26.440 apps that are neither evented nor concurrent. But I tried to make it really
00:05:33.919 simple. I said, you know, you still do want evented programming and you still want threads, and fortunately web apps are really
00:05:40.520 easy: put the procedural stuff in, wrap an implicit thread around the request, and then put evented at the
00:05:47.000 front, use nginx, and that was it. I thought it was a
00:05:52.919 really compelling argument. No one changed their behavior based on that. And so I started to
00:05:59.919 think about this idea that choosing JRuby, choosing Rubinius, was
00:06:05.440 perhaps more of an operations choice. You were going to choose it for perhaps business reasons, like using fewer
00:06:11.240 resources, and perhaps getting better debugging out of a production system and things like that. And perhaps
00:06:18.560 developers didn't make those choices, or didn't want to, or they knew they wanted to but just didn't for some mystical
00:06:25.240 reason. And that became really interesting to me, which is kind of where this talk starts off. So I
00:06:33.280 guess, unfortunately, there is another slide about me. I've done lots of stuff. There is a website that actually tells
00:06:38.479 you approximately how many projects you've, well, I don't know, dog-urinated on, how many you've
00:06:46.240 participated in, sorry, that's the word. And it's a lot. But what I've
00:06:52.120 become interested in is the idea of resilient production systems, but also ones that I could just ignore, because
00:06:58.520 I've gotten really good at ignoring static systems. I have a GitHub repo full of them. Easy to ignore: you just turn off
00:07:05.680 the notifications, easy as you like. But production systems don't seem to be as ignorable. And so here's my sort
00:07:13.879 of fun challenge for you: spot the difference between 300 codebases and
00:07:19.000 300 apps. All right, now, I don't know if you spotted it, so this is my metaphor there:
00:07:25.319 300 books on a shelf versus 300 people in a company. The books are a lot more fun. All right, people suck, and
00:07:34.000 I mean, look around, you're all ugly, and
00:07:39.280 no, that's not true, you're not all ugly. But, you know, static things are a
00:07:44.479 lot easier to live with and think about. And so it's kind of in
00:07:50.560 the nature of our profession as engineers. We're at a Ruby conference, not a DevOps conference. We like to write code,
00:07:57.159 and we like to test code, and it's just a really sad, unfortunate sub-part of our profession
00:08:04.879 that, for most of us, it unfortunately goes into production. We don't talk about it. That's
00:08:11.599 just, you know, it's just annoying. Yes, it's running, and yes, there's a .com, and look at my code and how pretty
00:08:18.199 it is, that's the important part. And look how fast my tests are, look at the,
00:08:24.039 no. We don't talk about the production stuff very much. I mean, it's a sad indictment on Ruby. Sad, yeah, sad. I
00:08:30.879 mean, Engine Yard, Heroku and those sorts of companies came into existence and created concepts like PaaS. No one else
00:08:37.479 needed to solve these problems, but we did. That's how bad it was to try and run Ruby
00:08:43.080 apps. So there's this idea that, as a group, we have so optimized around
00:08:48.760 our own developer happiness, and I'd like you to start thinking that as your production code
00:08:54.760 bases get bigger and start getting more traffic, you are going to need to give up some of that. You're going to need to
00:09:00.160 start to think more about production happiness, because if you don't, you won't
00:09:05.279 be a happy developer. Actually, I can't prove that; it's just true. That's the
00:09:10.880 quality of argument you're going to get here in the next 30 minutes. Trust me. All
00:09:16.360 right, it sucks, so you need to do both. All right, let's go through this, just
00:09:22.680 in case you've never had a production app. That's what one looks like.
00:09:30.200 Traffic comes from the left, you know which way you're looking.
00:09:38.760 But it's not just easy, constant traffic. Sometimes it comes in impulses; sometimes it comes in sort
00:09:44.120 of large, stressful batches. And keeping your app running in those situations is not something you've
00:09:50.240 probably ever done before. They can be quite torturous to live with. The other thing you're going to do with production systems is change them, deploy things,
00:09:57.519 which you may have seen in the title of the talk, and I will come back to that. I'm going to spend a lot of time just talking about tools in general, around
00:10:04.399 thinking in this space, and then specifically focus on the title of the talk, which is what I think is the frontier, the next sort of thing we
00:10:10.880 need to be thinking about in deployment. Because obviously there are a lot of arrows involved, and we need to be
00:10:16.040 careful of them. And then we do, that's called the Magic Move
00:10:21.560 feature of Keynote. It's very cool. I'm going to do it
00:10:27.959 again. Nope. Boom, look at that. So inside your app, it's a bunch of
00:10:36.120 bits. I know, see, this is tricky, because you all think your app, your Ruby app, is like God's gift to, you know, if only you
00:10:44.040 could just put that on the homepage, people would buy more stuff. Sadly, it's just another
00:10:51.040 box surrounded by other boxes which do stuff, and your app talks to those boxes,
00:10:56.120 and if all things go well, perhaps your users do stuff. But all the boxes need to keep
00:11:02.160 working and need to talk to each other. And if you start to scale, and you watch some of these other talks,
00:11:08.200 people saying, you know, one web app box is not enough, we need lots more web app boxes, then it'll look like
00:11:15.480 this. And you can only imagine this is getting simpler.
00:11:20.760 But this is popular at the moment, service-oriented architecture, because Steve Yegge told us to, because he worked
00:11:27.480 at Amazon. That's kind of his logic. Anyway, no, no,
00:11:34.959 I don't hate SOA. I just want to say that when you do this, something's going to happen: the code you
00:11:40.920 wrote, and the things you control, talk to other things that you have less control of. I guess my best example is, if
00:11:49.320 you watch Travis CI, and no disrespect to Travis, but I've noticed this, and Engine Yard's own blog and Twitter
00:11:56.079 account: Travis's Twitter account, our Twitter account, often have to comment about how our customers can't do
00:12:01.360 something because, for example, GitHub might be down or RubyGems might be down.
00:12:07.600 So this is the idea that external dependencies are part of our production system, even if we don't control them.
00:12:13.240 But nonetheless, even if you've never sort of had that picture in your head, let me just give you the picture
00:12:18.519 that's actually in your head, the dashboard of your brain. The dashboard of
00:12:24.079 your brain pretty much says: ah, something's wrong,
00:12:29.399 or what it's thinking is: they're just the normal
00:12:34.880 errors, move
00:12:41.480 on. No one needs to build that dashboard. All right, locked in. But I just want
00:12:47.800 to sort of do some basic math. Basic math. I'm going to use the water;
00:12:53.920 if anyone else would like some, it's just up here. Not enough cups to go around, but I
00:12:59.839 think, on the whole, most of you are probably not going to come
00:13:08.000 up, even though you were invited. All right, hopefully you've gone
00:13:14.079 through the sums. I'm not sure if you call them sums, but that's what they are, the arithmetic: if you've
00:13:20.600 got a sequence of parts, could be an app to a database, an app to a cache to a database, whatever, an app to another
00:13:27.959 app to a database, there's going to be some sort of success rate. If
00:13:33.240 you're lucky it's 100%, but if you've got a 100% success rate, that's probably because you've had one
00:13:39.079 request and it was successful and you're feeling very confident about yourself.
00:13:44.720 But assuming there's some number, then if they're dependent on each other, as you go through, and I've
00:13:50.199 used the same number, 95%, 95%, 95%, the basic math is you multiply it out and you get a smaller
00:13:56.360 number. That's the problem. So when you go back to that diagram, as you add
00:14:01.480 bits and things talking to each other, the math means that it's more likely something's going to be wrong at any
00:14:07.279 point in time, both because of the traffic that's coming in, normal traffic,
00:14:12.440 impulse traffic, stress traffic, or when you change something. So when you're doing deployment, you're changing
00:14:19.000 something, and there's a chance something will go wrong, and the question is: what else is going to be affected, what bad
00:14:24.160 experience is your customer going to have? And so, yeah, this diagram is
00:14:29.240 not taken from any real production system; it just kind of looked
00:14:34.440 pretty. Couldn't figure out the math of that one. So this is kind of
00:14:39.839 the high-level idea I'd like you just to think about and take away:
00:14:45.000 the things I'm talking about may not be relevant to you yet, because you may not have a lot of traffic for your app.
00:14:50.199 Because if, say, you've got a 95% or 85% success rate, which is pretty low,
00:14:55.240 so you should fix that now, and you're only getting 100 requests a day, then you're only going to have, you've
00:15:02.160 got five errors, 15 errors, and that might be okay. As it gets to 1,000 requests,
00:15:08.440 10,000 requests, whatever the next one is in the sequence, you're going to be less than
00:15:14.120 impressed with what happens. We often talk about technical debt, but when all you're doing is
00:15:21.680 support because you've just got errors coming at you, make something up,
00:15:29.880 then, you know, that's me, I start to think of this as sort of operational debt. Your system is causing you so much
00:15:36.199 headache and pain that, regardless of how pretty your Ruby code is, especially when it's got syntax highlighting, life is
00:15:42.399 going to be not much fun. And so as you get more traffic, the requirement for you
00:15:48.040 is that you have to constantly be improving the success rate, and definitely not making it worse, which
00:15:53.880 may or may not be what we think about. And that's kind of this high-level idea. So the unfortunate part is point two,
00:16:01.040 which I tried to hide in the middle and then pointed out. Let's go
00:16:06.160 back, I'm gonna hide it more better. All right, all right, you can't see it now. Oh
00:16:12.079 no, it doesn't work. So in order to get to this, we are going to have to think, as your company becomes more
00:16:18.120 successful, as your product, your thing, gets more popular, and success is probably traffic, you're going to have
00:16:24.319 to think more and more about how to make that system happy, which mathematically, well, let's call it reducing our
00:16:29.800 error rate or increasing our success rate. And my hypothesis is that's going to make you less
00:16:36.079 happy, so let's just get over that and get on with the rest of the talk. Moving on to point three: what
00:16:42.600 I'm really hoping is that the people who are primarily interested in production happiness, who as a group aren't
00:16:51.199 here, it's their responsibility, our responsibility, to give you
00:16:56.839 tools that help you do the right thing by the production system. And that's this next frontier of
00:17:04.000 deployment. How fast you can ship an app into production is not very
00:17:09.559 interesting if all you're doing is keeping the error rates the same. If the traffic keeps going up, you have to be
00:17:14.959 thinking about how do I keep reducing that [error rate], and we may have to slow things down a little bit. So I just
00:17:21.360 want to break this out, just in case you've never wondered why you've got a job, how it is that someone else
00:17:28.319 can afford to give you money to do your profession. I just thought I'd go through it for a bit.
00:17:35.640 So first, there are arrows. Let's go through the arrows first. That's terrible animation; actually, I feel like fixing it
00:17:41.880 now. So, the arrows: we make code, just in case you haven't seen this before. You make code, you put it into
00:17:48.480 production, and if you're lucky you get to be in business. Right there. I don't want to
00:17:55.360 oversummarize it. And unfortunately, though, the
00:18:00.480 priority of your users, stakeholders and all that sort of stuff is that you're in business doing a service or a product
00:18:07.559 that's useful and valuable to people. They are able to find it; they are able to determine this is the thing they want
00:18:12.880 to use at the time that they're having the problem. That's important.
00:18:17.919 Otherwise your pretty syntax-highlighted Ruby code is irrelevant. So in order to do that, this is the magical step
00:18:24.440 number two: you need to have a production system. Whether you wrote the code or not
00:18:30.440 isn't important; it's that your production system can provide the value to the people that want the value, at the time
00:18:36.520 and place that they think they want the value. That's what's important. Now, for most of us, we wrote the
00:18:42.360 code, yet it's third on the list of important things. So awkward.
00:18:51.640 Nonetheless, if we do numbers one and two well, we get to have jobs and get to do number three. So you might think,
00:18:59.400 well, we should all care about all things. Well, the fellow Mr. Smith,
00:19:05.159 he had this idea, amongst other very good ideas, three, four, five, was it 500 years ago? Well,
00:19:12.080 this is an old book, 300 years ago, of division of labor: that if we specialize, we can be better at our thing. And it
00:19:18.520 turns out, as if our profession wasn't hard enough, it's probably best we do specialize in being developers, not doing
00:19:24.720 development and maintaining production systems and trying to make money. So I'm not trying to, well, I would like you all to
00:19:31.480 care about production whilst writing code; this slide is me admitting that
00:19:36.840 it's tough. All right, so how I would rather solve the problem, and I'm
00:19:43.480 oversummarizing, is by pushing tools towards developers and educating you on
00:19:49.840 choices that will make production life better, at perhaps a small cost to your developer life. So, okay, this is
00:19:56.840 how most businesses might split themselves out, back in those three categories: salespeople, ops people and engineers. And this is kind of the,
00:20:04.120 well, I guess I already had this slide in my head, so I should share it first. We all have different fundamental priorities. We get paid for, and get joy
00:20:11.360 out of, adding features, curating the app, making it look nice, doing all those sorts of things, and certainly most of our
00:20:17.280 stakeholders think that's what we should be doing. But implicit in priority number two is that we have a system that
00:20:23.520 is up, certainly from our users' perspective. That's kind of what they expect: at the time they want
00:20:29.799 to use it, it should be there. And the business priority is to make users happy, which hopefully makes money. So, hands
00:20:37.880 up if you think one of these columns is
00:20:44.799 correct. All right, actually, let's start again. Read the
00:20:49.840 chart, then we'll play the game. That didn't go very well, did
00:20:55.880 it? When I tested this in my head, everyone was a lot more
00:21:01.440 responsive. All right, these two lists are kind of what everyone thinks about, what they care about, and, you
00:21:07.960 know, summarized down. I mean, when I wrote it down it was a lot longer, but I figured a few points in a bigger font were better than my handwritten scrawl. And
00:21:15.600 unfortunately they don't really go well together, and we all implicitly know this. So this is what I'm
00:21:21.880 now about to talk about: what can we do, what can we give developers from a deployment perspective, because that's
00:21:27.200 kind of what I care about, so that it's easier for you to do the right thing than not
00:21:32.840 to. Choosing JRuby turns out to be a little bit hard, for some reason. I don't know, you should just use it. Grow up.
00:21:41.559 But I understand, and I'll talk about the distinction, why people do one versus the other. But this is
00:21:48.360 what I want to talk about. I'm going to do a demo of a piece of software that I think is really interesting when you get to scale. I'd certainly like to
00:21:54.559 see Engine Yard have a product that's similar to it; that's what we're working on. And so there's this
00:22:00.400 idea that the tools you've chosen, and I use "tools" to represent everything, libraries, languages, tools, they
00:22:08.559 were all built to solve a problem, but they come with consequences. And if you get too much in
00:22:14.679 love with the good part, you perhaps ignore or dismiss, sort of in a DHH
00:22:20.760 fashion, the consequences. That wasn't a joke; he does that all the time.
00:22:29.039 He's really good at it, and you go, yeah, yeah, that's right, we should all own a car that drives 300 km an hour;
00:22:34.600 traffic would be a lot faster. He's never said that, but I've thought it. And
00:22:39.640 anyway, that was just stupid. So, yeah, as I just said, every tool has some purpose for why it was written, and we're
00:22:45.960 going to go through some examples. And they come with a thought model that they're sort of imposing on you, or
00:22:51.080 hoping you adopt, in order to make the tool make sense. But in
00:22:56.559 adopting the tool, it makes something else harder. For example, I thought, no, no,
00:23:03.840 I'll come to that example in a second. So here's an example: I like Pivotal Tracker. I think it's fantastic, looks very pretty. It's, you know, this
00:23:10.200 idea of, well, look, you've got that stakeholder, that owner, that says, yes, we should do X, comes
00:23:16.279 back three days later: let's do Y. You say the letters of the alphabet? That's stupid.
00:23:22.440 And so the solution, to some degree, is, well, let's write them all down, let's put them in an
00:23:29.080 ordered list, and let's just see where X and Y turn up and whether they're going to fit into the budget. This is, I think,
00:23:35.600 you know, agile in general, which I think is fantastic. Getting your system into
00:23:40.960 production and finding what errors turn up and fixing them is just the right thing. But what's interesting about this
00:23:47.159 tool is there's not really a way of talking about constant things: things it should constantly do, things it shouldn't
00:23:53.600 do, like crash, and another thing on the list that I've apparently not put on the
00:24:00.200 slide. I mean, who's checking my slides? You can't just have a list that
00:24:06.640 has one item and a comma. I've got to ad-lib this.
00:24:13.559 Now I've done that stupid thing, now I can't remember the rest of the list. Let me get my notebook out. No. Anyway, you
00:24:19.600 know, it shouldn't crash, it shouldn't destroy small planets, it shouldn't make your customers
00:24:25.880 cry and hate you and talk about you negatively. There are lots of things, but as a tool it's hard to sort of mention that and allocate time to
00:24:32.720 doing these things. So why do we not do some of the things we're going to talk about and use the tools? Because perhaps
00:24:38.039 some of the tools don't imply that we should spend time that way. So you may need to take the tool, subvert it
00:24:44.000 slightly, and every week put in time as chores: let's do some log looking,
00:24:49.399 you know, let's look at some logs, see what we find. Let's go back through all our exceptions, all four million of them
00:24:55.679 in Airbrake; let's pick out a good one, fix it. Chef. Chef was built for a purpose. It
00:25:03.320 was built by a group of people who had got sick and tired of provisioning you know static servers and how quickly and redoing them
00:25:10.000 those servers were there um ultimately Cloud came along and it became useful then but it couldn't provision anything
00:25:15.960 um but it turns out for all the greatness of Chef that it still it's you know you still need to know how to
00:25:22.000 configure my Sequel and that's so what did Chef do it just hid that for a
00:25:28.039 bit so um and uh so what am I saying I'm
00:25:34.240 saying that Chef is great but it comes with you know there's it hides things there's still things you still need to do um what's another one Ruby
00:25:42.120 specifically M because I do want to make a comment J Ruby um may never quite get off that bandwagon so Ruby was a
00:25:49.039 fantastic language right that's we're all here and uh I'm not going to play that game where I put ask you put your
00:25:54.159 hand up because you're terrible at it um got you've lost hand putting up
00:26:01.320 privileges stop it right bloody Pals um and uh so what was my point my
00:26:09.919 point is, Ruby was written with a thought model of: we should be happy as we go about our
00:26:16.480 profession, as we write our code. And then someone came up with syntax highlighting and the world was complete. But it
00:26:24.760 really was never designed around, or had the thought process around, long-running production
00:26:29.919 systems. I mean, it can't, and no disrespect at all. But there are also
00:26:37.799 other things that are part of the community we have, the ecosystem, the thought processes we all bring to code. We all agree code should be pretty,
00:26:44.640 and I'll tell you what's ugly: handling errors. So let's not do that; let's let
00:26:50.360 those bad boys go straight up to the top. You're very excited about that.
00:26:56.720 And I think Matt Amini mentioned it. I really only
00:27:02.679 thought about this when I saw Blake Mizerany do a talk on Go, comparing the sort of things from Go that you could bring to Ruby. One of the ideas
00:27:09.120 in Go as a language is dealing with errors immediately. And when you start to
00:27:15.480 think about that, it's exactly what you should do. As you do it, you start to
00:27:20.600 realize you know exactly why this error is here. When you start throwing it up the stack, the code up there doesn't know why the exception happened, what the relevance is, what it
00:27:26.360 should do about it, or how to handle it. It's completely subverted the context and said, "You, this other piece of code, now
00:27:32.520 need to know what that was doing," and that's not very nice encapsulation. So dealing with errors locally helps manage error rates, and
00:27:40.760 unfortunately it's going to make your code ugly. So this, I guess, is my prime example
00:27:46.320 of the idea that as you care about production, as you want to keep improving your success rates and reducing error
00:27:52.279 rates, you're going to need to do some things that perhaps are not very Ruby-like: handle errors.
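A minimal Ruby sketch of "handle errors locally": the rescue sits right next to the call that can fail, where the code still knows what a failure means and what to do about it. `AvatarFetcher`, its `client` object, and the default-avatar path are hypothetical names invented for illustration, not anything from the talk.

```ruby
require "timeout"

# Hypothetical example: AvatarFetcher wraps a remote image service. The rescue
# lives next to the call that can fail, where we still know what a failure
# means (no avatar) and what to do about it (show the default).
class AvatarFetcher
  DEFAULT_AVATAR = "/images/default-avatar.png"

  # `client` is any object responding to #fetch(user_id) (an assumption).
  def initialize(client)
    @client = client
  end

  def avatar_url(user_id)
    @client.fetch(user_id)
  rescue Timeout::Error, SystemCallError => e
    # Only failures we understand are handled here; anything else (a real
    # bug such as ArgumentError) still propagates up the stack.
    warn "avatar fetch failed for user #{user_id}: #{e.class}"
    DEFAULT_AVATAR
  end
end
```

The design choice is to rescue only the failure classes this method can give a meaningful answer for; everything else still escapes, so genuine bugs are not silently swallowed.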
00:27:59.679 So every day, put another if statement
00:28:04.760 in. I'm becoming not a fan of exceptions in general. Unfortunately, all
00:28:10.440 the libraries use exceptions as a way of declaring that something went bad. In Go, I kind of like that idea of
00:28:16.840 another... you know, Objective-C, you can... all right, I mean, it's...
00:28:23.200 So yes, I should finish that thought and then go back to the idea of exceptions I don't like. So the idea, as I just said, is to put your rescue
00:28:30.919 blocks explicitly in that context. There's just still something about
00:28:36.080 exceptions: I'd rather errors came through the standard medium of a method, which is either the read/write
00:28:43.120 parameters or the response. So the idea of returning a real value, and perhaps an error value if necessary, I
00:28:50.159 just think that is... how do I put it? Actually, the way Blake put it,
00:28:55.399 he said the word "exception" makes it sound like exceptions are exceptional, that they never
00:29:01.720 happen. They happen all the time, right? When you do distributed programming, you've got boxes talking
00:29:07.640 to each other. The moment you add that second box, congratulations, you've got distributed programming. Like a
00:29:13.000 database: it's not always there, but that's not exceptional. It's not
00:29:19.279 going to be there sometimes; sometimes it will be out, and you know
00:29:25.279 that because you put your app on Amazon US East.
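One way to sketch the "return a real value and perhaps an error value" idea in Ruby, assuming a hypothetical `db` object whose `recent_posts` call raises a hypothetical `DatabaseUnavailable` when the database is out. The boundary method turns that into a returned error, and the caller chooses a fallback instead of letting a 500 bubble up.

```ruby
# Hypothetical error type for "the database isn't there right now". A real
# adapter raises its own classes (e.g. PG::ConnectionBad), which you would map here.
class DatabaseUnavailable < StandardError; end

# A value plus an error, in the spirit of Go's (value, err) return pair.
Result = Struct.new(:value, :error) do
  def ok?
    error.nil?
  end
end

# `db` is any object responding to #recent_posts(limit) (an assumption).
def find_recent_posts(db, limit: 10)
  Result.new(db.recent_posts(limit), nil)
rescue DatabaseUnavailable => e
  Result.new(nil, e)
end

# The caller decides what "behave in an appropriate fashion" means: here,
# serve whatever is cached instead of failing the whole request.
def recent_posts_or_fallback(db, cache)
  result = find_recent_posts(db)
  result.ok? ? result.value : cache.fetch(:recent_posts, [])
end
```

Returning the error makes the "database is out" case an ordinary, visible branch at the call site rather than something a caller several frames up has to guess about.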
00:29:33.440 So don't just go, "Oh, it's Amazon's fault." No, it's your
00:29:39.039 fault. Listen for that error, which will happen again even if you move
00:29:44.960 somewhere else, and make your app behave in an appropriate
00:29:50.720 fashion. I mean, yes, you want to fix it and get back up and running, but you can't stop physical
00:29:56.679 hardware having problems. Your code can handle it. All right, so that's...
00:30:02.559 I really rabbited on about that one for a long time. So, from the Ruby on Rails website: "Web development
00:30:08.760 that doesn't hurt." No mention of production. So I think I've covered
00:30:14.960 this topic for a while... no, actually I haven't, let's do it again. I heard a good tip for dealing
00:30:21.840 with databases. As your SQL database scales, you're going to think about sharding, and one of the things that makes sharding hard is long joins
00:30:28.399 between tables. So be conscious of
00:30:33.600 the join. As you get more production-oriented, start to scope down and think about your app in such a way that you don't have joins
00:30:39.600 across seven tables, because you're not going to be able to shard that. But Rails,
00:30:45.000 with its beautiful ActiveRecord syntax, makes it really easy and fun. You
00:30:50.360 want to join everything; you go all the way around in a circle: "Hey, I'm back to the User model again, do I get a
00:30:55.960 prize?" Not very good for production. So, platform as a service: we,
00:31:04.320 you know, make deployment really easy and let you describe your whole universe, things that I like a lot. But there's not really a
00:31:10.200 lot of assistance in the context of apps talking to each other. Platforms should make it easier to
00:31:16.720 deal with failure in systems that talk to each other. So
00:31:21.880 just because... you should still use Engine Yard. No one else fixes this problem either, so just use Engine Yard.
00:31:30.120 You just keep sign-languaging to the one person. I at least want that person to be a customer by the
00:31:40.919 end. All right, all right, so let's get back to production. That was my summary of tools in general
00:31:47.600 and thought processes, just to get you thinking about this idea. I do want to talk about deployment specifically; I just don't want to scale
00:31:53.679 it down to this one topic. As I mentioned before, I don't know how to
00:31:58.840 prove, or suggest, or infer that if your production system is happy you'll be
00:32:03.960 happier, so we'll skip it and just go with it. So here are a couple of things
00:32:10.519 that you might like to choose which are going to affect your development life a little bit, but with the benefit of having
00:32:17.880 a better production life. JRuby, being on the JVM, has a whole bunch of really cool tools for inspecting, getting
00:32:25.159 snapshots of what was going on at the time things blocked up, or whatever. Obviously threading, but I don't want
00:32:30.559 to talk about threading; that's not going to make you buy anything. I found that out last year. Don't do threading,
00:32:37.919 don't... I dare you. I dare you not to. Don't. There, I win. It's like telling a dog that's already lying down, "Lie. Stay."
00:32:45.600 Ah, I'm an excellent owner. So, the cost of using JRuby, or one of them, is
00:32:52.720 that all that day-to-day usage of Ruby, your scripting usage, gets slower, and
00:32:59.159 that's a little bit annoying. So this is the main example of that idea that something
00:33:05.159 that's good for production... you as developers, or let me put this another
00:33:10.440 way, all the people around you, because you've changed your mind already, end up choosing perhaps an
00:33:19.080 inferior production choice, because you'd
00:33:25.440 rather have the one that's good for you as a developer. You want the fast one when you run your rake task, but it may not be the good one in
00:33:31.360 production. So the other reason for JRuby: this is an example, if you've never seen a screenshot of VisualVM. A whole
00:33:37.840 bunch of interesting introspection of what's going on. So as you put that chore
00:33:42.880 on your weekly list of things, "how are we going to make our system better?", throwing up VisualVM and watching for
00:33:48.480 interesting data is one thing. A bunch of other cool tools: queues.
00:33:54.559 Using queues rather than blocking HTTP. HTTP APIs are awesome; I mean, you can use
00:34:00.320 curl and play with it, I love it. But blocking is not good, especially with your
00:34:05.880 non-concurrent apps that you're all writing. So perhaps investigate
00:34:11.520 asynchronous queuing. What else? Logging is excellent, especially
00:34:17.320 if you have multiple systems, which you already do, because you have an app talking to a database. Pulling all those logs
00:34:23.200 together will initially just be ugly, and you'll go, "Well, what was the point of that?" Well, then go back to the app and
00:34:29.200 start to curate interesting, useful logs that tell you a story, so perhaps you can search for a user and watch what that
00:34:36.119 user was doing throughout the system. People... so, out of all the
00:34:41.839 sort of app service companies that have come out recently... sorry, let me start that again. Splunk is worth a truckload of money. Has anyone heard of
00:34:50.440 Splunk? Cool. I did actually ask that question. You're terrible at this: you put your hand up at the
00:34:57.320 wrong time, then put it up at the right time. Spiteful, that's what you are. And so I
00:35:03.320 was at a Java conference and did a quick talk on Logstash, and asked that same question, and
00:35:08.480 no one had heard of Splunk. So I told a joke to them, and I'm going to
00:35:13.839 tell that same joke, but I have to set it up, and you answered
00:35:20.000 incorrectly. But the joke was: the reason you don't know about it is because you can't afford it.
00:35:25.520 Splunk is really, really expensive. But Logstash and Kibana are open
00:35:30.560 source, and really quite interesting for you to look at. There's also Papertrail and Loggly, so
00:35:36.680 use them. Start putting logs in there; start to get a feel for the flow of your system. You don't really have,
00:35:42.560 perhaps, a good mental picture in your mind of how your app behaves in production, and logging,
00:35:49.000 watching all the events, is one way to get one. The trick, obviously, is that you have to spend time looking at them.
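A sketch of "curated" logs, assuming each line is shipped to a central store such as Logstash or Papertrail: every event is one JSON object carrying a `user_id` and `request_id`, so searching for one user shows what they did across the system. `StructuredLogger` and the field names are illustrative choices built on Ruby's standard `Logger`, not a specific product's format.

```ruby
require "logger"
require "json"
require "time"

# Hypothetical structured logger: one JSON object per line, so a central log
# store can index fields like user_id and request_id.
class StructuredLogger
  def initialize(io = $stdout)
    @logger = Logger.new(io)
    # `msg` is the Hash passed to #event below; merge in time and level.
    @logger.formatter = proc do |severity, time, _progname, msg|
      JSON.generate({ time: time.getutc.iso8601(3), level: severity }.merge(msg)) + "\n"
    end
  end

  # `name` is a short machine-friendly event name; `fields` are searchable context.
  def event(name, **fields)
    @logger.info({ event: name }.merge(fields))
  end
end
```

For example, `StructuredLogger.new.event("login", user_id: 42, request_id: "abc")` writes one JSON line; grepping or querying for `"user_id":42` then follows that user through every app that logs the same field.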
00:35:55.119 BOSH. So BOSH is something I like, I've liked, and BOSH solves this interesting problem of deploying entire
00:36:00.960 systems and knowing what's happening, as opposed to perhaps the Chef mentality of "well, a node came up and it
00:36:07.920 became something, hopefully good." BOSH is more declarative. I like the
00:36:14.640 sort of totalitarian Master of the Universe: "I will tell all the VMs what they are and they shall be happy for it."
00:36:21.480 And I've started to really appreciate the value, the mental thought that's gone into BOSH.
00:36:29.000 And for lack of there being a commercial tool that my company has produced, I will keep talking about this
00:36:35.040 until we have one, and then I'll have to stop. So I'm actually going to do a
00:36:40.240 demo of BOSH, because the thing I would like you to think about as you get bigger as a company is this idea of being
00:36:47.400 absolutely declarative, reducing the number of external dependencies of your production system,
00:36:53.960 things that could go wrong, because, as we talked about, we need to continually reduce those. So BOSH may not be for you today: if you're
00:37:01.520 running a couple of dynos on Heroku, or you've got a couple of VMs with Engine Yard, you're good, right? But as you
00:37:07.800 get bigger and you've got more and more requests, this is one of the things you're going to have to think about. So let's have a
00:37:14.839 look. So the example app is GitLab HQ, which I chose because it's
00:37:21.560 a Rails app with a few moving parts. In fact, here's my architectural diagram: it's a Rails app,
00:37:27.599 right? Then we've also got Gitolite, which does the git stuff, and
00:37:34.119 so we're going to deploy this as a complex system, and it's going to be awesome. Look, I even found an icon
00:37:41.319 that makes you want to click it. But stay seated, everyone; don't fall
00:37:46.760 for the trap of the large touchscreen monitor. All right, so this... I'm not
00:37:53.800 even going to pretend to type for 10 minutes. Look, no hands. So what we're going to do is deploy this. I have
00:38:00.400 already written a description of how you deploy GitLab HQ, and it's a BOSH release. We can look at it, for
00:38:06.240 anyone who's interested, if we have time. It has the source code to GitLab HQ and the source to
00:38:13.920 Gitolite, submoduled. All the other dependencies, you'll see, are now being
00:38:19.000 downloaded. Now, they're not being downloaded from some magical other place; I've already downloaded them once from
00:38:24.880 the magical other place and they're in my S3 account, because we're trying to reduce external
00:38:30.640 things that could go wrong. Once I pull them all down, and we create
00:38:35.680 a BOSH release and put it into my BOSH, that's it. That's the last time we go out to the rest of the world for the rest of
00:38:42.240 that app's life, unless we want more dependencies, like a new version
00:38:48.680 of Ruby or something. I haven't touched this for a while; there's a new version of Postgres. Anyway,
00:38:56.040 let's move on. Stop looking at version numbers. I feel like I want to type something,
00:39:03.440 but it's so fake. I should pretend... oh, no.
00:39:09.240 Anyway, you can make screencasts that go really fast and don't have the
00:39:16.040 inconvenient and awkward pauses, but then you don't learn anything, so we'll have the inconvenient, awkward pauses. So
00:39:22.079 what it's doing now: it's downloaded all the assets that I need, because we're not going to use apt-get
00:39:27.960 or those packages, because they're external. You could build this system using that concept of having
00:39:34.079 your own apt repo, but BOSH does it all itself.
00:39:40.880 So now what we're going to do is... I can't remember what we're going to do. Is that
00:39:51.200 me? Sweet, I'll be back.
00:39:57.839 God. Jesus Christ. Okay, so what we've done: the context here is we're actually on, not
00:40:04.599 my machine, but a VM; it's a lot faster if you do it in the cloud. So we've pulled all these things down from
00:40:10.440 S3, we've turned them into a big tarball, and now what we're doing is uploading it to BOSH. BOSH is not
00:40:16.119 just a pure command line; it's a running service, a bit like a PaaS might be. So what
00:40:23.160 we're doing is uploading it, and now BOSH has the big... tube of t-shirts?
00:40:28.880 What? Stop it, you're distracting me. It's unpacking those things now. At this point,
00:40:35.800 what we have is: it now knows about a release called gitlabhq, and it's
00:40:41.920 there, version 7.1-dev. So now what we need to be
00:40:48.920 able to do is deploy it. There are two main concepts: a release and a deployment manifest. A deployment
00:40:55.440 manifest is... that is awkward. All right, I keep thinking it's something that's awkward, and you're hearing it from
00:41:02.640 me. So a deployment manifest is a big YAML file, which is great because it's text and you can read it, and awkward because,
00:41:08.839 as much as I love YAML, sometimes I wish there was a schema I could validate it
00:41:14.560 against. I probably shouldn't say that out loud, but I
00:41:22.160 want XML back. So, sorry, I really talked through that. What we've got now is jobs, the different moving parts,
00:41:30.599 and for the most part you can think of each as one VM's worth of work. You can merge them into the same VM,
00:41:36.880 but that's kind of the context. So you can see them as those boxes I had: Gitolite, Redis, GitLab, and Resque,
00:41:44.280 which isn't... sorry, Mike, I can't believe I haven't moved to Sidekiq. I'm insulted on your behalf.
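Since the manifest is just YAML, a small hand-written check can stand in for the missing schema. The manifest below is a simplified, hypothetical shape covering the jobs named here; it is not the full BOSH manifest format, and `validate_manifest` is an illustrative helper rather than part of any BOSH tool.

```ruby
require "yaml"

# A simplified, hypothetical manifest; the real BOSH manifest format has more
# sections (networks, resource pools, update settings, ...) than shown here.
MANIFEST_YAML = <<~YAML
  name: gitlabhq
  jobs:
    - name: gitolite
      instances: 1
    - name: redis
      instances: 1
    - name: gitlab
      instances: 1
    - name: resque
      instances: 1
  properties:
    redis:
      port: 6379
YAML

# Returns a list of problems; an empty list means the manifest passed.
def validate_manifest(manifest)
  return ["manifest must be a mapping"] unless manifest.is_a?(Hash)

  errors = []
  name = manifest["name"]
  errors << "name must be a non-empty string" unless name.is_a?(String) && !name.empty?

  jobs = manifest["jobs"]
  return errors << "jobs must be a non-empty list" unless jobs.is_a?(Array) && !jobs.empty?

  jobs.each_with_index do |job, i|
    unless job.is_a?(Hash)
      errors << "jobs[#{i}] must be a mapping"
      next
    end
    errors << "jobs[#{i}].name is missing" unless job["name"].is_a?(String)
    count = job["instances"]
    errors << "jobs[#{i}].instances must be an integer >= 0" unless count.is_a?(Integer) && count >= 0
  end

  names = jobs.filter_map { |job| job["name"] if job.is_a?(Hash) }
  dupes = names.select { |n| names.count(n) > 1 }.uniq
  errors << "duplicate job names: #{dupes.join(', ')}" unless dupes.empty?

  errors << "properties must be a mapping" unless manifest.fetch("properties", {}).is_a?(Hash)
  errors
end
```

Usage: `validate_manifest(YAML.safe_load(MANIFEST_YAML))` returns `[]`; running a check like this before every deploy catches typos that YAML itself happily accepts.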
00:41:51.520 And then the last part, which we skipped over, is all the parameters, the arguments that are going
00:41:57.400 to go into all the templates, so they're all in one place, like a data bag if you're doing Chef. I'm not really a
00:42:03.960 Puppet person, so if you want to tell me about Puppet relative to this, that'd be great. So now we've just
00:42:10.200 told the command-line tool, "this is the deployment manifest of what I care about," and the way BOSH does deployment is it
00:42:15.920 says, "Well, what have you already got? What do you want? I'll go get that for you." The first time, you've got
00:42:23.599 nothing, so it looks like it does everything. Well, it does, right? So the first thing it's going to do is compile stuff.
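The "what have you got, what do you want" step can be sketched as a diff between two maps of job name to instance count. This is a simplification under that assumption; the real tool also compares packages, properties, and VM sizes, and `deployment_delta` and `GITLAB_JOBS` are illustrative names.

```ruby
# Desired state for the hypothetical four-job GitLab deployment used here.
GITLAB_JOBS = { "gitlab" => 1, "gitolite" => 1, "redis" => 1, "resque" => 1 }.freeze

# Compares what is running (`current`) with what the manifest asks for
# (`desired`), both as job name => instance count, and returns the steps needed.
def deployment_delta(current, desired)
  (current.keys | desired.keys).sort.each_with_object([]) do |job, plan|
    have = current.fetch(job, 0)
    want = desired.fetch(job, 0)
    if want > have
      plan << [:create, job, want - have]
    elsif want < have
      plan << [:delete, job, have - want]
    end
  end
end
```

On the first deploy `current` is `{}`, so every job comes back as `:create`, which is why it "looks like it does everything". Scaling Resque from one instance to two, `deployment_delta(GITLAB_JOBS, GITLAB_JOBS.merge("resque" => 2))` returns `[[:create, "resque", 1]]`: only the difference gets applied.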
00:42:29.960 I think this is fantastic: it comes with its own built-in binary packaging, against whatever
00:42:38.079 base operating system you have. In this case it's some version of Ubuntu, but it's your version of Ubuntu,
00:42:44.480 so there's no chance of packages built on a slightly different environment being applied to yours.
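Building every package against your exact base image also makes compiled output safely cacheable. A hypothetical sketch: key each compiled package by its source fingerprint, its dependencies' keys, and the stemcell name and version, and compile only on a cache miss. The hash shapes and the SHA-256 choice are assumptions for illustration, not how BOSH actually stores compiled packages.

```ruby
require "digest/sha2"

# Hypothetical sketch: a compiled package can be reused only when the package
# source, its dependencies and the exact base OS image (stemcell) all match.
class CompiledPackageCache
  def initialize
    @store = {}
  end

  # A package is a Hash like { name:, source_sha:, deps: [...] }; a stemcell is
  # { name:, version: }. Both shapes are assumptions made for this example.
  def key(package, stemcell)
    parts = [package[:name], package[:source_sha], stemcell[:name], stemcell[:version]]
    parts += package.fetch(:deps, []).map { |dep| key(dep, stemcell) }.sort
    Digest::SHA256.hexdigest(parts.join("\0"))
  end

  # Runs the block (the expensive compile) only on a cache miss.
  def fetch(package, stemcell)
    @store.fetch(key(package, stemcell)) { |missing| @store[missing] = yield }
  end
end
```

Because a new stemcell version or a changed dependency produces a different key, a package compiled for one environment is never reused in another, while an unchanged one is never compiled twice.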
00:42:50.599 They're going to be built in exactly the same environment. So as they're compiling, here's my Amazon
00:42:56.040 account; you can see four VMs available for compilation. It finished and we moved
00:43:03.559 on. That was terrible. So what we're doing now is watching it while it runs, because one of the packages was Perl, and
00:43:10.960 whoever built Perl had a lot of spare time; it takes like 30-something minutes
00:43:16.160 to build on an m1.medium. So this is a bit of BOSH's tooling: you can see what
00:43:22.359 processes are running. You can go on from another machine and watch that process just as if
00:43:29.200 you were the one running it, so teams can watch a deployment. That's kind of interesting. Nearly
00:43:35.720 every action is sort of a Resque job, so to speak, and you can watch it. And all the log information for
00:43:42.920 that is kept, and you can go and get it. So there are lots of different ways to look at
00:43:48.200 tasks. Let's just take a moment to appreciate that I didn't make you watch Perl being compiled for 30 minutes. Are
00:43:54.440 you getting why I did a screencast? Yeah, because this stuff takes forever.
00:43:59.480 One of the things we have to give up... So this is just another way of looking at that data; this is raw data
00:44:04.800 that you could perhaps build a nicer interface on top of, should you wish to. Certainly that's one of the things I've been
00:44:10.480 working on for my own amusement. All right, so now we've finished compiling packages, and now we're going to boot up some
00:44:16.119 VMs. Why are we booting VMs? Because that's where stuff runs, and that's what we put in the deployment manifest. If we
00:44:22.920 go back to it, you'd see that we had five VMs, one for each of the different BOSH jobs. And so now
00:44:29.680 we're booting them up. God, this takes forever, even when you cut it all out and make it faster. I'm so
00:44:37.839 impatient, and there are really no funny jokes to tell about VMs
00:44:51.520 booting. Your patience is... I can get the cannon. Get the cannon!
00:44:57.880 Squad of three and a decoy squad, go! So it takes a little while, but eventually GitLab is running,
00:45:03.880 and I think this is really interesting. I was able to deploy... yeah, I mean, the manifest is a bit icky, but
00:45:10.000 at least it's an interface; you could build a tool that made it easy to work with the YAML manifest. These are the
00:45:15.480 VMs running. You can see on the right, four of them have elastic IPs assigned to them. BOSH now does have
00:45:22.280 an internal DNS, so I didn't really need all those elastic IPs anymore, but I haven't figured out how to make that
00:45:28.000 work yet. So now what we're going to do is change
00:45:34.200 the deployment. There's a "deploy" in BOSH language, and the way I think about it is it
00:45:40.960 could be changing scale attributes, bigger VMs, more VMs; it could be changing
00:45:46.280 configuration; or it could be changing the release. So we may have cut a new
00:45:52.680 release of some software, which might be a new version of Postgres, a new version of the web app, a new version of
00:45:58.640 one of the things that's part of your system. BOSH doesn't think of your Ruby code as all that
00:46:04.559 special; just to get the gist of it, your Ruby code is just a thing that runs. So here we're going to add
00:46:12.559 an extra Resque instance. Somehow the Resque workers weren't very productive, and so the solution
00:46:20.720 for Resque is to just add more resources, or use Sidekiq. And I'm sorry, Mike,
00:46:26.760 for not putting a Sidekiq slide in there. Please, in your mind, put in a Sidekiq slide: "Move to Sidekiq, it's
00:46:33.680 good." So you can see it's prompted: "This is the delta, do you want me to do this?" Yeah, good. And off it goes,
00:46:40.760 adding a new VM, turning it into Resque, and Bob's your uncle. It does take a lot longer than
00:46:48.319 this. Now we get to see the extra VM. Perhaps this doesn't impress you, but if you've ever had to manage your own VMs,
00:46:55.079 this is awesome. Another tool it has is built-in SSH, so it will go off to
00:47:01.520 that VM and create a random username account for the purpose of that one session.
00:47:06.960 You give it a password for this session for sudo, so you can change it each time if you want. It doesn't do
00:47:14.079 auditing, "what did that person do whilst on the VM"; that would be a nice feature, but it doesn't have that yet. So you can see
00:47:20.920 that random username, and what we're going to do is just look at the process list to
00:47:26.520 see why we still haven't quite got enough Resque workers. Oh look, there's only one of them, so whichever
00:47:32.559 genius wrote this... me... didn't put in enough Resque workers. So now let's look
00:47:38.680 at a little bit of how this works. It's all on my GitHub repo if you want to go and play with it and have a
00:47:44.559 look around. So instead of Chef, it's shell scripts; shell's
00:47:51.319 actually kind of good once you get over it. So here I'm running rake,
00:47:56.800 and you can see I've only got one of them. Oh, and look, there's a FIXME to add more workers. So anyway, I'm a
00:48:03.119 genius, and that's that. Right, I find this ability to describe
00:48:08.480 everything exactly just removes all the possible things that can go wrong. It
00:48:14.119 makes understanding the system simpler, and the fewer things I have to think about, the less chance I'm
00:48:21.359 going to make a mistake somewhere. There may be other tools that are similar, and again, it's hard to play with
00:48:27.480 everything, so if you have a tool or a tool chain that fits into a similar model, I'd love to talk to you
00:48:32.839 about it, because I think this stuff is fascinating. I mentioned this before, but your Ruby app really is just
00:48:38.880 another process; it's not special when it comes to Ops. It's
00:48:44.280 special in that it breaks all the time, or bloats, or does all sorts of other aberrant behavior, but beyond that... So
00:48:51.839 the way this works is: it's just processes, and we're running on VMs. A BOSH release is fundamentally two things:
00:48:57.640 a set of job descriptions, which, as you saw, are shell scripts, because they're easy to write, and
00:49:03.760 templates. Take Chef as an example: you programmatically say whether you want a
00:49:09.280 template or not, you programmatically say whether you want a package or not, you programmatically say what to do next. With BOSH, none
00:49:15.040 of that. You just say in a YAML file what packages you want, and they'll be there; you render all the
00:49:20.520 templates, and why not; and then use shell scripts for everything else, start and stop under Monit. Packages, as
00:49:27.680 we talked about: you go and get all the source code yourself, and you write your own installation scripts.
00:49:34.680 They're all easy to do, and once you've done them you never do them again, so that compilation step never happens
00:49:40.680 again. So we deserve these nice things. You can read the slide later.
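Job templates are rendered from the properties in the manifest. Assuming ERB (Ruby's standard templating library) and a hypothetical `property("a.b")` lookup helper, rendering a config file might look like the sketch below; the class and template names are illustrative, and a missing property raises instead of producing a half-written config.

```ruby
require "erb"

# Hypothetical rendering context for a job template: templates ask for dotted
# property paths, and a missing property fails the render loudly instead of
# silently writing a broken config file.
class TemplateContext
  def initialize(properties)
    @properties = properties
  end

  def property(path)
    path.split(".").reduce(@properties) do |node, key|
      raise KeyError, "missing property #{path}" unless node.is_a?(Hash) && node.key?(key)
      node[key]
    end
  end

  def render(template)
    # `binding` exposes #property to the template.
    ERB.new(template).result(binding)
  end
end

REDIS_CONF_ERB = <<~ERB
  port <%= property("redis.port") %>
  bind <%= property("redis.address") %>
ERB
```

With properties `{ "redis" => { "port" => 6379, "address" => "0.0.0.0" } }`, rendering `REDIS_CONF_ERB` produces a two-line Redis config. Keeping the templates dumb and the values in one manifest is what makes the whole system describable in one place.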
00:49:47.440 As much as I like BOSH, it's not perfect by any means. So, what are the
00:49:53.920 highlights there? Getting started: you've got to run your own BOSH, so if you don't have a big
00:50:00.079 budget for running Amazon VMs, or don't have a vSphere account or whatever, that sounds expensive, so
00:50:06.640 it's not for everyone. It can't run on bare metal at the moment, because it wants to manage VMs and attach disks and all that
00:50:11.920 sort of cool stuff. And it's single-region, so if you want multi-region you have to have separate BOSHes.
00:50:18.280 But here's an idea: if you've described everything absolutely and have no external dependencies, theoretically you
00:50:25.559 might be able to give a different answer to this question. When that fancy enterprise person says, "Well, this
00:50:31.359 looks fantastic; can I run it in my data center?", with this kind of tool the idea is that perhaps you could say
00:50:38.119 yes, even if it's completely locked off from you. And I think that's an interesting idea for the future of
00:50:43.599 deployment. I've talked to people who say, "No, I'd never want to sell to enterprise; it's hard enough running this one thing." But what if it
00:50:49.960 wasn't hard to run that one thing? What if deployment and management of production systems made it as easy to run a
00:50:55.400 thousand copies as it was to run one? So the summary of my talk really is this
00:51:01.280 math: as you add more traffic, you need to constantly be improving your error rates, or your success rates, and
00:51:09.720 you are going to need to change some of your behavior as you go along. And
00:51:16.599 I don't want to say you're going to have to grow up, except I can't think of another way of
00:51:21.680 putting it. So what I think the job of Ops people in general, or the sort of
00:51:28.440 people who care about production systems, is: to constantly be giving tools to us developers, the engineers,
00:51:34.760 that help us do the right thing. That BOSH deploy: anyone could do that once the packaging's done. I mean, I can figure
00:51:40.799 it out, and I'm not very bright. So, thank you very much. Now,
00:51:46.079 the last thing is, we are hiring, because we still have money left,
00:51:52.839 and... no, no, no. I mean, while we're not doing BOSH, we are doing something very similar, and it's
00:51:58.960 really exciting stuff. If this sort of stuff tickles your fancy and you'd like to create the new frontier of tooling
00:52:04.880 for production systems, please come and work for us; we'd love to have you. We've got the party on
00:52:10.599 tonight. Here are some snapshots of people you should look for, and talk to us about how much alcohol is awesome, and
Explore all talks recorded at RubyConf 2012