Ruby Conf 2011 Release Early and Release Often: Reducing deployment friction


Summarized using AI


Andy Delcambre • September 29, 2011 • New Orleans, Louisiana • Talk

In this talk at RubyConf 2011, Andy Delcambre from Engine Yard discusses the importance of reducing "deployment friction" in software development and how to achieve a streamlined deployment process through automation and efficient practices. Delcambre emphasizes that the deployment process encompasses not just deploying code but the overall development workflow from coding to testing and merging. He shares insights into Engine Yard's practices around continuous integration, ticketing, and collaboration that help minimize friction throughout the development process. Key points include:

  • Efficient Ticketing System: At Engine Yard, work begins by filing tickets in YouTrack, a flexible ticketing system with a fully customizable workflow. Team members assign tickets to themselves when they are ready to work on them.
  • Use of Git: Each developer works on a branch named after the ticket ID, and commit messages are prefixed with the ticket ID, so git history and blame always link back to the ticket for context.
  • Pair Programming: Engine Yard works almost exclusively in pairs, aiming for 100% of development time and achieving roughly 80%, which enhances collaboration and knowledge sharing.
  • Continuous Integration (CI): The team developed an in-house CI system called Ensemble that integrates closely with their workflow, allowing them to continuously test and merge code efficiently. They rely on CI for comprehensive testing instead of running full test suites locally.
  • Automation of Deployment: The deployment process is designed to be frictionless, requiring only a single command to deploy changes once code is merged. The process includes checks to ensure code is "green" (error-free) before deployment.
  • Zero-Downtime Deploys: Unicorn-based zero-downtime deployments let them push updates without affecting users, and a staged approach to database migrations avoids maintenance windows.
  • Feedback Mechanisms: Notifications through tools like Campfire provide feedback on the deployment status and integrate alerts and issue tracking into the workflow.
  • Frequent Deploys: The team reached 44 deploys in a single month across all of its applications, deploying at least daily rather than batching changes into scheduled releases.
  • Conclusion: Delcambre emphasizes the need for each team to define their processes, focusing on how to connect and automate them to effectively reduce friction. This approach not only improves productivity but also enhances the overall quality of code deployments.

Overall, Delcambre’s insights provide a valuable framework for developers looking to improve their deployment practices and integrate CI into their workflows for better efficiency and success.

Ruby Conf 2011 Release Early and Release Often: Reducing deployment friction
Andy Delcambre • New Orleans, Louisiana • Talk

Date: September 29, 2011
Published: December 12, 2011

At Engine Yard, we release the main AppCloud code base at least once a day, many times more often than that. Yet we still have a fairly rigorous testing and release process. We have simply automated and connected as much of the process as possible. In this talk I will discuss how we do our deployments and how it ties in with our continuous integration service, and how we automated and tied it all together.

RubyConf 2011

00:00:18.279 started here we're still filtering in a little bit so uh I'm talking a little bit about
00:00:25.720 sort of reducing deployment friction and how the release process works to make that happen my name is Andy Delcambre
00:00:31.960 it's not phonetic it does rhyme with welcome this might be one of the few cities where that's not necessary to say but probably still is I'm adelcambre on
00:00:41.280 Twitter on GitHub on Flickr Etc most places on the internet you can find me there and I work for engine yard so we
00:00:48.879 uh I work on the engine yard Cloud product um we host mostly Ruby and rails applications on ec2 on the cloud and we
00:00:56.800 also have a managed hosting uh offering as well um I work on the cloud team but
00:01:01.879 I also work largely on internal projects so things for uh sales Finance support Etc but also the engineering team and
00:01:08.520 this is a lot about what I've been working on with regards to the engineering team tooling I live and work
00:01:14.040 in San Francisco um it's a beautiful city if you haven't visited before you should this is from near my apartment so
00:01:21.320 um the subtitle my talk that I submitted is reducing deployment friction and as I was sort of developing this talk I was
00:01:27.439 realizing that that the deployment bid is not is only part of the story it's not really just about deployment
00:01:32.799 friction it's about devel development friction as a whole it's it's not just
00:01:38.399 about the final step of deploying the code but it's about how you get there from all the way from writing the code
00:01:43.600 to merging it to getting it out to to um testing it Etc all these things so the
00:01:50.680 whole process to reduce friction in that to get to ease the transition or to ease getting code into production basically
00:01:57.159 so again like I said I work at Engine Yard and this is basically how we do it at Engine Yard and the point of this talk isn't so much that you should
00:02:03.799 do it like this it's more about defining the process discovering the process that you have and then automating it so that
00:02:09.959 way however you do it can be done faster and better and well uh with less
00:02:15.760 friction so um the first thing that you might do when you're sort of ready to
00:02:21.560 start working on a task is uh file a ticket at least an engine yard everything has to have a ticket to be
00:02:27.560 able to be worked on there's a few exceptions but in general you have to have a ticket before you start working
00:02:32.959 on things um we use a ticketing system called YouTrack how many of you guys have even heard of
00:02:39.959 YouTrack I really like it I had never heard of it until I started investigating ticketing systems it's pretty good it's
00:02:46.319 very flexible um it's uh kind of Enterprise it's a big Java thing but uh
00:02:52.040 it's really Snappy it has a good search interface um if you are looking for Tak system I would recommend looking at it
00:02:57.800 so this is a a sample ticket that you can kind of track through this process
00:03:03.720 the important things to note here are on that left side there all of those all of those columns are all those fields every
00:03:09.680 single one of those is customizable completely customizable you can add more custom ones you can remove any ones you don't like you can customize all of the
00:03:16.560 uh States in those fields the options in those fields um it's very flexible and
00:03:22.560 uh the other thing to note is that there's that links section there that says this one is required for another
00:03:28.480 ticket um we use this really heavily to associate tickets together so there's both bidirectional links uh like relates
00:03:35.040 to where just things are related things like this where it's uh one directional
00:03:40.080 so this is required for another one if you look at the other ticket it would say depends on this ticket and we use this a lot for what we call road map
00:03:45.720 tickets so we have a highle road map ticket that we schedule as far as bigger features and then all of the other
00:03:51.560 tickets are related to that one so you can look at that road map ticket and see what all the tickets that are associated with it are and how uh how how many of
00:03:58.519 them are completed or whatever so when you're starting a ticket the first thing you do is just assign it to
00:04:03.879 yourself nothing nothing else needs to happen you don't need to change the state or anything like that just assign it to yourself so then now that we're
00:04:10.599 having the ticket assigned to ourselves and we're ready to start working on the ticket we use git of course and we
00:04:16.400 always check out the branch name and the branch name is the same as the ticket ID so this allows us to uh see what's being
00:04:23.000 worked on very easily so when commits come in on a branch or when you're looking in CI and you see a list of the
00:04:28.320 branches that are in there you you can always see which tickets are being being worked on similarly when you commit code we
00:04:35.039 always tag the commit message with the ticket ID so this allows you to when you're looking through git history or if you do a git blame or something like that you
00:04:41.759 can see both the commit that was related to that and see what
00:04:47.400 you fixed and also go back and look at the ticket and see why it was a problem in the first place a lot of times that's not very clear from the commit message
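For illustration, a commit-msg hook along these lines could automate that tagging; this is a hypothetical sketch, not Engine Yard's actual tooling, and it assumes branches named after ticket IDs that look like "CLOUD-123":

```ruby
#!/usr/bin/env ruby
# .git/hooks/commit-msg (hypothetical): prepend the ticket ID, taken from the
# branch name, to each commit message so history and blame link back to the ticket.
message_file = ARGV[0]
branch = `git symbolic-ref --short HEAD`.strip

if branch =~ /\A[A-Z]+-\d+\z/              # branch is named after a ticket ID
  message = File.read(message_file)
  unless message.start_with?("[#{branch}]")
    File.write(message_file, "[#{branch}] #{message}")
  end
end
```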
00:04:52.479 so another thing that we do at Engine Yard almost exclusively is we pair
00:05:00.000 um we pair we try and aim for 100% of the time we usually uh get about 80%
00:05:06.520 coverage of pairing maybe a little bit more we use this so-called
00:05:12.320 tête-à-tête pairing setup that Josh Susser sort of came up with and blogged about at Pivotal this is Josh in the front there
00:05:18.400 with his back to you it's really good we've been using it pretty frequently pretty much all of our pairing setups are
00:05:23.720 like this now so the two displays are mirrored and so both people are looking at the same thing but you're you're sort
00:05:28.759 of square in front of computer it's easier to uh avoid distractions and you
00:05:34.039 can also look the person that you're pairing with in the face you can have more conversations about it and still be focused on the computer everybody sort
00:05:39.840 of has their own setup in front of them and it's also not as modal where you don't have to like look at the computer and then turn and have a
00:05:45.280 discussion it's more more fluid it works really well we we liked it a lot so um we don't do very much uh we
00:05:51.759 don't really do any uh formal code review because we do almost all pairing there's a handful of things that we kind
00:05:58.000 of do code reviews on if that somebody's not pairing for a little while but uh we
00:06:03.440 kind of do that ad hoc we don't actually have a system set up for
00:06:08.759 that so the other thing we do obviously everyone here should probably know this we always test drive our code uh
00:06:16.759 probably don't need to sell anyone on this but the point here I'm trying to make is that uh we run just the focus
00:06:22.560 specs locally so when we're developing we'll write new specs run those specs make sure they're red and then write the
00:06:27.960 code watch them go green test driving stuff but we don't ever run the full
00:06:33.199 Suite locally we always depend on CI for that the full Suite takes too long to run locally for it to be efficient to do
00:06:39.000 so typically we're just running Cycles around the one file or one set of specs that we're running on locally push the
00:06:44.520 whole thing to CI so we push to CI for the full run we push to the branch this is our CI system
00:06:51.919 uh something we wrote in house it's called Ensemble and this is sort of an example of how how this might work kind
00:06:58.319 of ignore that first one where something went wrong but that second one I pushed code it was uh I fixed a few specs
00:07:04.639 before I pushed but I knew that there would be more I pushed up to CI let CI run I found all the the the failures
00:07:10.199 that had happened in CI and I fixed them and then I pushed again and then it was green and I was ready to merge so all
00:07:16.840 of the branches are automatically built that happens via GitHub post-receive hooks as is sort of typical for a CI system they
00:07:22.720 all get brought in there automatically we have a bunch of different views where you can look at just a single Branch or a single application or all the all the
00:07:28.120 builds total um the other interesting thing here is that we shard our test setup in CI so there's all those
00:07:36.919 little dots each one of those is a shard we shard by file each set of files goes to one shard and uh there's
00:07:43.319 currently I think 28 units and this makes our test suite take between about 12 and 20 minutes depending on how
00:07:49.680 many things are currently running so it's not too bad right now if we didn't have it sharded I don't
00:07:55.000 actually know how long it would take but it would take a long time so uh we have a pretty good test Suite though so it
00:08:00.280 covers a lot of things and I would much rather have a slightly slower test Suite that was actually testing stuff than than fast test that didn't do
00:08:07.039 anything um we also have a rebalancer so we try and keep each unit the same length so it runs all the spec
00:08:12.759 files individually calculates how long each of them takes and then spreads them out across the shards
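As an illustration of that rebalancing idea (a sketch, not Ensemble's actual code), a greedy packer that always puts the next-slowest spec file onto the currently lightest shard gets you roughly even shards:

```ruby
# Hypothetical rebalancer: spread spec files across N shards so each shard
# takes roughly the same amount of time, using runtimes from a previous run.
def rebalance(file_runtimes, shard_count)
  shards = Array.new(shard_count) { { files: [], total: 0.0 } }

  file_runtimes.sort_by { |_file, seconds| -seconds }.each do |file, seconds|
    lightest = shards.min_by { |shard| shard[:total] }
    lightest[:files] << file
    lightest[:total] += seconds
  end

  shards
end

runtimes = {
  "spec/requests/deploys_spec.rb" => 310.7,
  "spec/models/account_spec.rb"   => 95.2,
  "spec/lib/billing_spec.rb"      => 42.1,
}
rebalance(runtimes, 2).each_with_index do |shard, i|
  puts "shard #{i}: ~#{shard[:total].round}s #{shard[:files].join(' ')}"
end
```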
00:08:19.080 the last thing to note here is that you'll notice that the red lines or the red builds have strike through and the
00:08:25.360 little dots are X's when they're red and circles when they're green this is because my boss Tammer is color blind and
00:08:31.680 he can't tell the difference between the red and the green which is not something you might think
00:08:36.760 about and so when you're writing the CI system if you're ever writing a UI for it you should probably keep in mind that
00:08:42.320 red and green are really poor choices to use for color blind people so now he can see when we when we
00:08:48.680 break the code so this UI is is almost entirely read only there's a few sort of settings
00:08:54.920 that you can change on the left there but all of the uh CI system and all all the all the actions you might take none
00:09:01.000 of that happens here and we have this whole other user interface that we use for our CI system that's all via
00:09:06.399 campfire so this is our uh Bots channel it's not you're not supposed to read it all but this is basically we have a
00:09:11.920 robots channel that all the notifications come into and then we can talk to the the bot in to have it do things for us so all all real
00:09:19.800 interaction of uh Ensemble is through campfire we get notifications in there um and then we can take some actions
00:09:26.120 like I said so here the black text it's kind of at the top it's uh Larry one of my co-workers rebuilding one of our uh
00:09:33.800 apps' master branch because it had gone red for some spurious reason or another so he was able to rebuild that branch
00:09:39.200 rebuilding is interesting because it only rebuilds the red units so if something goes wrong just in one of the units or something you don't have to rebuild all 28 which is really handy
00:09:45.640 sometimes doesn't happen very often but it does happen can also deploy from this which I'll talk about more later so we
00:09:52.360 have this bot um and it's a pretty good framework it's called eybot not very creative um so the core piece of the
00:10:00.800 bot itself is actually really simple it's just this uh event driven Chat thing that basically talks to campfire
00:10:06.760 and IRC and and proxies the messages back and forth and all of the logic that happens that for the um bot to do all
00:10:14.600 happens via webhooks so when a message comes into eybot it sends it out to all the webhooks on the back end and
00:10:20.920 they can take actions based on that so things like rebuild and then uh if you want to talk back to it there's a API
00:10:27.760 that you can hit on the bot that will send the message to Ensemble as well or to campfire as well so Ensemble it
00:10:34.959 doesn't really talk to the bot it has this bot component in it that that talks back and forth with the the campfire and
00:10:40.959 IRC stuff so it works pretty well um we have a bunch of these endpoints this is
00:10:46.120 kind of an example one we have this Library so this is this is an entire bot
00:10:51.839 endpoint basically if you wrote this code ran it somewhere and then pointed the config to point to it as one of the
00:10:58.200 back ends you would basically be able to say eybot ping and it would respond with hello world so it's
00:11:03.600 just a Rack app you run it as a Rack app every time it gets a new message it creates this message object which
00:11:09.079 has multiple methods on it body and from and what channel it was in things
00:11:14.760 like that and then you can also respond with the message.say command so we do this fairly frequently this is
00:11:20.800 sort of our normal setup where we case on the message body for different actions each app usually handles multiple commands sometimes
00:11:27.639 it's only one but most of the time it's multiple and then we take actions in each of the apps to do that and it works really well for us
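The slide with the endpoint code isn't reproduced in the transcript, so here is a rough sketch of the shape being described, with a hypothetical Message class standing in for the real library (body, from, channel, and say are assumptions based on the description above):

```ruby
# Hypothetical bot endpoint: a tiny Rack app that receives a chat message,
# cases on the body, and replies via message.say.
require 'rack'

class Message
  attr_reader :body, :from, :channel

  def initialize(params)
    @body, @from, @channel = params['body'], params['from'], params['channel']
  end

  # In the real setup this would POST back to the bot's API; here we just print.
  def say(text)
    puts "[#{channel}] #{text}"
  end
end

class PingEndpoint
  def call(env)
    message = Message.new(Rack::Request.new(env).params)

    case message.body
    when /ping\z/  then message.say("hello world")
    when /rebuild/ then message.say("rebuilding the red units...")
    end

    [200, { 'Content-Type' => 'text/plain' }, ['ok']]
  end
end

# config.ru:  run PingEndpoint.new
```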
00:11:32.720 so we have a bunch of endpoints like I said and many of those
00:11:38.720 endpoints handle multiple things so we here's sort of a handful of the endpoints that we use this is uh the
00:11:45.399 notifications that come from Ensemble this is a a code push and then the build notification that it started building
00:11:51.279 and then the build notification that was green these are all come directly from Ensemble this is an alert that happened
00:11:57.040 for one of our customers something something went strange cross some threshold for one of our customers um
00:12:03.200 this happens in the support channel that's Tyler he's one of our support guys and I was telling him to go look at
00:12:08.800 the admin interface for the alerts to see what was actually wrong this is one that we have for YouTrack
00:12:15.000 so every time a new triage ticket comes in we have a bucket called triage that all tickets from the company come into
00:12:20.880 so that way we can kind of look at them as engineers and put them in the right place so every time one comes in we have it go into our fireman
00:12:26.720 channel so people can look at it and know where to put it we also have some more fun things you
00:12:32.560 might have seen some of these if you've seen a GitHub presentation before about their bot so we have image me of course they have image me you
00:12:40.399 basically tell it to look for something it'll do a Google image search and return you a random image and we also
00:12:45.959 have Instagram integration so if anybody sort of in the company posts on Instagram it'll go directly into campfire which is actually really fun
00:12:52.199 for sort of community building team building stuff so after that aside about the bot we were
00:12:59.120 talking about continuous integration so I kind of talked a little bit about our CI system abstractly but I didn't really
00:13:04.639 talk about how we actually build the build the code so a CI system is kind of it's really
00:13:10.680 simple there's not very much to it to me personally there's a lot of projects out there and we wrote our own so a CI system
00:13:16.480 is really just a job Runner all it's doing is basically executing code storing the output and returning the result of that command that's all it
00:13:22.560 does right all the other interesting stuff is kind of the integration around that and what you do with that that information once you have it so it's not
00:13:29.120 really very hard it's not even even if you write it from scratch we wrote from scratch the the actual piece that builds
00:13:34.839 the code and and Returns the results is the really simple part of that it's not very interesting you can't really do
00:13:40.959 anything with it unless unless you write the code around it to do all of the Integrations with it but that those
00:13:46.680 Integrations are interesting so things like the stuff that we have with Ensemble where we get where we get those build units and and keep track of it all
00:13:53.759 show the red and green status all that stuff is interesting most CI systems provide that but there's also other interesting things so this is one
00:13:59.759 example so this is trying to deploy to production our main code base and it
00:14:05.040 says we can't because the the branch is not green so this could be common you could do this in your in your deploy
00:14:10.959 system to prevent this but this is all integrated right so it's all one system and it just won't let you that there's
00:14:16.920 there's only one system it's it basically looks at the latest Branch when it tries to tag it and says oh this is not green because it knows it's
00:14:22.120 already in there um our CI system is called Mason or the job runner
00:14:27.920 really it's not the CI system the CI system is sort of the whole thing but uh it's just built on top of Resque it's
00:14:33.120 basically one endpoint one rack endpoint you post to it uh Ensemble gives it a call back URL Mason builds the job uh
00:14:41.360 runs the command gets the output returns the status hits that callback URL and Ensemble then knows about it it's
00:14:47.759 really really simple um Ensemble keeps track of everything Mason doesn't know anything about what kind of job
00:14:53.399 this is whether it's a build unit or not Ensemble keeps track of all that stuff Mason just runs the jobs
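In the spirit of that description, a Resque-backed job runner could be as small as the sketch below; this is illustrative only, not Mason's actual source: run the command, capture the output, and report the result to the callback URL.

```ruby
# Hypothetical job-runner sketch: execute a command, capture its output,
# and POST the result back to the CI system's callback URL.
require 'resque'
require 'open3'
require 'net/http'
require 'uri'

class BuildJob
  @queue = :builds

  def self.perform(command, callback_url)
    output, status = Open3.capture2e(command)     # merged stdout and stderr

    Net::HTTP.post_form(URI(callback_url),
                        'output'  => output,
                        'success' => status.success?.to_s)
  end
end

# Enqueued by the CI system when a push comes in, for example:
# Resque.enqueue(BuildJob, "bundle exec rspec spec", "http://ci.example.com/builds/42/callback")
```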
00:15:02.160 so now that our branch is green and we pushed it to CI what do we do the
00:15:09.800 obvious answer is we merge it to master right so merge to master um if if this m
00:15:16.120 merge ends up being a fast forward merge like if you've rebased um then the uh if
00:15:24.680 the SHA is the same basically Ensemble knows not to rebuild that it knows that it's the same code and so it
00:15:30.519 won't rebuild it and will mark it as green already second you mark the ticket as
00:15:35.720 merged this is the important piece for the integration steps automation will
00:15:41.199 use this later in the process so it's important to mark it as merged for the release notes and things like
00:15:46.240 that uh finally when we push to master we have uh what we call Edge it's our uh
00:15:52.279 it's always running whatever is green on top of Master so we basically do continuous deployment to our staging server or Edge server um it's really
00:15:59.040 handy for testing so if you're pushing something out that maybe you want to give one last pair of eyes on on in
00:16:04.920 before pushing to actually to production just something that yeah you you know it works but you want to make sure that in
00:16:10.880 production will look right you can let it deploy to Edge check it and then before you deploy the
00:16:16.040 production so finally now that we have it merged the obvious answer is we ship it right we always get code out fast
00:16:23.319 nothing sits on Master we always deploy code as soon as we get as we merge it we don't ever wait for someone else to deploy there's no like deployed day or
00:16:29.560 anything like that we want we want to this is sort of the whole point of this talk is that we want to make it as frictionless as possible so that way at
00:16:35.800 all times as soon as you merge it whenever you merge it you deploy it
00:16:41.560 immediately so it's not quite continuous deployment we still do have uh that
00:16:48.680 manual step involved it doesn't automatically deploy when it's green because it allows you to do the things I was saying with Edge you can you can
00:16:54.399 give it one final check before it goes to production which is actually useful to know to be able to to test the code
00:16:59.480 the exact branch that you're going to deploy before you deploy it so like I said Master is always
00:17:05.880 Deployable we always deploy immediately after merge so what did this lead to we had 44 deploys so far in September as of
00:17:12.520 yesterday when I took these stats thought it was a funny number but uh this is across all of our apps it's
00:17:18.559 pretty good I'd like it to be better it ends up being about 2 per weekday so far and that's across all
00:17:25.919 environments so we have up to a few production deploys per day we tend more
00:17:31.039 to be on the low end of that scale and I'd really like to be more on the high end of that scale in general but uh and
00:17:36.360 that's kind of what we're what we're working towards with these lower friction deploys definitely want it to
00:17:41.400 be better so how do we actually do this deploy well there's only one command you
00:17:48.320 just run again through eybot you talk to the bot it's just one command and it does the
00:17:53.679 entire process everything all the automation so the steps that it takes are these first it checks master is green
00:18:01.960 we talked about this earlier you won't you can't deploy code that's red some environments are forcible where you can
00:18:07.200 force red code to go out this is useful for things like a staging cloud where you want to check on
00:18:15.720 a real deploy what is actually wrong with the code maybe something like that might happen we don't do it very often
00:18:21.039 but it does happen um but production is definitely not forcible you can't force a red build out to
00:18:27.039 production and we tag the release it's going to git tag and bump the version number things like that we always
00:18:32.760 version our our deploys uh we create a version field in
00:18:37.960 YouTrack and then assign all of the tickets that were merged into that field this is why it's important to mark tickets
00:18:43.280 merged earlier in the process if you don't mark it merged it won't end up in the bucket then we push that tag that we
00:18:49.880 created a second ago into the deploy branch so this always allows us to deploy from the deploy branch every time
00:18:55.640 we don't have to deploy from a different SHA every time we're always just deploying from the deploy branch we use a branch rather than tags because changing
00:19:02.039 tags is bad you shouldn't do that in git your peers won't get the new tag when they pull automatically
00:19:09.159 and then we basically do continuous deployment from that deploy branch so whenever we push
00:19:16.640 the deploy Branch it'll automatically kick off a production deploy so we have this continuous deployment sort of
00:19:22.480 infrastructure built into Ensemble and so that's what we just use to uh to to
00:19:27.720 get this to going we used to do this manually and we we discovered that it
00:19:33.360 didn't ever really come up like you could test on edge before you before you ship the code so there wasn't really a
00:19:38.400 need to to to have that pause and wait and then ship it there so we just do it
00:19:43.919 automatically again removing friction so then all those tickets that you put into
00:19:50.480 the version bucket in the beginning of this process you mark them as resolved now so you mark them as as deployed at
00:19:56.720 the end so that way basically we used to have a race condition here where we would Mark tickets uh deployed at the
00:20:03.720 beginning of the process and it was people would ping us and say hey this ticket got marked deployed is it out yet
00:20:09.520 it doesn't seem fixed and it's like well that deploy hasn't actually finished yet so now we don't mark them as deployed until they're actually deployed and on
00:20:15.679 the other hand if we did all it did it all at the end then if somebody pushed code and then merged or uh Market tiet
00:20:21.400 it merged between the beginning of the deploy and the end of the deploy things would get merched as deployed even
00:20:26.440 though they hadn't actually been in that release so this kind of solved that that uh that race condition
00:20:32.720 there so finally we send uh notifications and there's a bunch of
00:20:38.400 notifications that we do for this uh for every deploy first we send to air Brak
00:20:44.760 so they we can see all the exceptions since the most recent deploy we send to new relics so we can
00:20:50.840 get those nice deployment lines in the our uh stats we can see if they change anything and finally we sent out an
00:20:57.600 email with with release notes this is kind of a big deal for us this was is really useful inside the company so every time we deploy we send an email
00:21:04.039 out like this uh we link to the ticket view for all of the track tickets that got merged or got deployed we link to
00:21:10.760 the GitHub compare view between so we can see what code changed and then we put all the tickets directly in the email as well so this is the one from uh
00:21:18.080 yesterday I don't if you heard it uh we took JB out of beta now fully available
00:21:23.240 so this was the this was the deploy that did it this is the email that went out to the whole company and uh lot of
00:21:28.279 people in the company really like seeing these emails they they people that are non-technical in sales and and support
00:21:33.559 and things like that look at these emails to uh sort of see what's changing in the product so this whole process
00:21:40.200 that I just described it happens via Mason because it's just a job Runner right so there's no difference in in
00:21:46.039 this process as opposed to a CI process we still run the job capture the output and return the status that's it so we we
00:21:53.799 have this generic job Runner tool that we can we can use for these different different pieces which works really well
00:21:59.840 for us I guess you probably could use a CI system for that and kind of pretend
00:22:05.000 it's a build or something like that I'm not really sure how that would work but it doesn't it seems it seems same to
00:22:10.279 me so the step I kind of skipped in this in that previous section there uh I
00:22:16.960 didn't actually talk about the actual deploy process I kind of skipped that step so it's kind of because it's not
00:22:22.919 very interesting we have a fairly standard deploy process we use Capistrano for now we might change that
00:22:29.480 soon hopefully but uh we use a current releases and shared directory we don't do a GitHub-style moving around of the git
00:22:37.279 tree we just use the regular current symlink we use Unicorn which allows
00:22:43.720 us to do zero downtime deploys this is another big deal for removing friction in the deploy process you don't want you
00:22:49.279 don't want an engineer to have to say oh this is going to cause a few seconds of downtime do I want to do it now when it's a high-traffic time or am I going to want to
00:22:55.520 wait so now it's always zero-downtime deploys and we can always deploy basically whenever that's
00:23:03.880 the whole point removing the friction so I don't know if you know how the zero-downtime deploys in Unicorn work but
00:23:10.679 basically you put the new code in place you move the current symlink you trigger the restart in Unicorn all
00:23:16.720 without ever putting up a maintenance page and then the restart in Unicorn spins up new workers that start
00:23:22.960 serving requests out of the new directory it lets the current workers finish handling the requests of the old directory and those die so then
00:23:29.799 everything moves over all requests are served nothing is dropped it all just works works really well actually
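For reference, the standard zero-downtime pattern from Unicorn's documentation looks roughly like this (paths are placeholders, and this is the generic recipe rather than Engine Yard's actual config): you send the running master USR2, it re-executes itself against the new current symlink, and the before_fork hook asks the old master to quit once the new workers start coming up.

```ruby
# config/unicorn.rb (illustrative)
working_directory "/var/www/app/current"
pid               "/var/www/app/shared/pids/unicorn.pid"
preload_app true

before_fork do |server, worker|
  # After `kill -USR2 <old master pid>`, Unicorn renames the old pid file to
  # "<pid>.oldbin". Ask that old master to QUIT gracefully: its workers finish
  # their in-flight requests and exit while the new workers, loaded from the
  # new release directory, take over.
  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exist?(old_pid) && server.pid != old_pid
    begin
      Process.kill(:QUIT, File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
    end
  end
end
```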
00:23:36.559 we use Bundler for our deploys and for our gem management in general I
00:23:42.600 think this is probably an obvious thing for most people here but you should definitely use bundler it has basically solved the gem management headaches the
00:23:49.080 gem hell that we used to get into with versions on production we defer
00:23:54.760 to Bundler for sort of the deployment best practices there's this --deployment flag in Bundler that does
00:24:00.919 a couple of switches that are kind of the best practices for deployment how it handles the lock file and things like
00:24:06.480 that so yeah that's really good uh and then finally even if we have migrations
00:24:12.840 we still don't take down time this isn't really a uh Magic technique or it's not
00:24:19.880 really magic we don't do anything special it's just the technique that we use um again because we don't want to
00:24:25.799 prevent people from shipping code we don't want to prevent people from shipping code just because there's
00:24:31.520 a migration in it we don't want them to say oh this feature needs a migration so therefore I'll have to wait until the weekend and to to do this because we
00:24:37.000 need to take 10 minutes of down time or whatever so this technique for example
00:24:42.399 if you want to add a column it's pretty simple really you add a migration to add the column and then you ship that code
00:24:48.760 that's it you don't add any code that deals with that column yet so that this deploy can go out without
00:24:54.960 downtime right because there's no code that depends on that migration being run you deploy the code the migration runs
00:25:00.960 it gets the column added no downtime nothing needed right it just goes then
00:25:06.799 later you finish writing all the code that depends on the new column you deploy again this time there's no migration
00:25:12.960 the column's already there and you just deploy the code again no need for downtime the column's already there
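In Rails terms, those two releases might look like this minimal sketch (table and column names are made up for illustration):

```ruby
# Release 1: only the migration ships; no code touches the column yet,
# so this deploy needs no downtime.
class AddBillingEmailToAccounts < ActiveRecord::Migration
  def up
    add_column :accounts, :billing_email, :string
  end

  def down
    remove_column :accounts, :billing_email
  end
end

# Release 2: the code that uses the column ships, with no migration in the
# deploy; the column already exists, so again no downtime is needed.
class Account < ActiveRecord::Base
  def invoice_recipient
    billing_email || "billing@example.com"
  end
end
```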
00:25:18.679 so this means that we actually run migrations last so we deploy all the code get all our Bundler stuff set up
00:25:24.440 restart the app servers then run migrations so this allows us to speed
00:25:30.159 up the time until the new code is live on the server earlier in the process so
00:25:35.520 the migrations run at the very end after everything's already done it's a little bit counterintuitive but it works really
00:25:41.640 well for us um we like it a lot so a few more
00:25:48.399 examples this kind of gets a little bit hairy at times things like removing a column are
00:25:54.120 pretty simple it's kind of the same thing but opposite you deploy the code change that removes the the need for
00:26:00.039 that column and then you deploy the uh the migration second right so by the
00:26:06.679 time the migration runs all the code that needs that column is gone already something like renaming a column
00:26:13.640 gets much more complicated so this is something where maybe you would add the new column in One release then ship code
00:26:20.600 that uses the new column but also syncs everything so it maybe reads from the old column and writes to the new column and
00:26:26.799 then you can migrate all the data over while that code is out there so now all the data is in the new column and
00:26:31.960 then you ship code that reads and writes from the new column so you're not using the old column anymore and then
00:26:37.520 you can drop the old column and every single one of those steps can be done without taking any downtime
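One way to express the middle "sync" release of a rename is sketched below, with hypothetical column names; reads fall back to the old column while every write also fills the new one, so the backfill can run safely while this code is live.

```ruby
# Middle release of a zero-downtime rename: old column "email",
# new column "contact_email".
class Account < ActiveRecord::Base
  before_save :sync_contact_email

  def contact_email
    self[:contact_email] || self[:email]    # fall back until the backfill finishes
  end

  private

  def sync_contact_email
    self[:contact_email] = self[:email] unless self[:email].nil?
  end
end

# One-off backfill, run while the code above is deployed:
#   Account.where(contact_email: nil).find_each do |account|
#     account.update_column(:contact_email, account.email)
#   end
```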
00:26:42.760 and this is where taking downtime might not be the worst thing it's kind of a trade-off
00:26:47.919 at this point do you want do you want to take on that additional complexity to prevent yourself from having to worry
00:26:53.360 about when you deploy or do you want to just maybe write the code normally this might be not too bad but we've
00:26:58.600 definitely had situations where it was pretty complex to do it with zero
00:27:03.679 downtime and we opted to just take you know a few minutes of down time on the weekend have somebody do a deploy on a Saturday or something like that and we
00:27:09.880 also have done deploys where if we want to add an index to a really big table or something like that that clearly needs a
00:27:15.000 downtime it's not really possible to have a table locked for five
00:27:20.279 minutes and have it keep working there's no way to do that without downtime so uh this has been a
00:27:27.360 really good technique for us uh we use it pretty much all the time anytime we
00:27:32.399 run a migration I I think the last time we had to take downtime was one of those adding index things and that was a few
00:27:38.760 months ago I don't think I can't think of any recently that we've had to take downtime for migration so it works
00:27:45.399 really well so again this is all about just reducing friction to ship code you don't want to by using this technique
00:27:52.440 you don't have to worry about when you deploy anymore and that's kind of the whole point so all of this is a work in
00:27:58.840 progress sort of everything I've just talked about um it started it all started as a side project we didn't ever
00:28:05.519 decide that we were going to build this thing and uh there's still a lot more that we want to do uh we started with
00:28:12.720 Integrity as a CI system and it wasn't working well for us and we basically decided we wanted we wanted more of the
00:28:19.679 style of integration and we couldn't get it with integrity and we had a discussion about whether we should patch Integrity to make it work or write
00:28:26.200 something ourselves and it was uh it was a huge win to write it ourselves because
00:28:31.440 now the code and the project is basically exactly tied to our particular
00:28:37.399 development practices and it all works exactly how we want it to without having to hack into the system I'm I'm sure that it's easier that it has been easier
00:28:43.960 to work on The Ensemble system than it would have been to try and get Integrity to to do what we wanted it
00:28:51.039 to um we want to be able to get to faster deploys our deploys are still relatively slow partially because we do all this
00:28:57.960 migration stuff at the end um we want to get some more integration we want yeah more pieces to basically happen on
00:29:04.000 deploys we want more uh interactions to happen via automation so right now we
00:29:10.000 like I showed we have to manually merge code and mark all the tickets as merged there's no reason why we couldn't
00:29:15.240 have a merge command in eybot or in Ensemble that basically took that branch merged it to master and then marked all
00:29:20.960 the tickets as merged that are associated with it and again like I said at the beginning of the talk this
00:29:26.720 is all the workflow that we use at engine yard and my suggestion isn't that you use this workflow although you you
00:29:32.200 are welcome to I like it a lot but I'm not saying that this is a one-size-fits-all workflow it's more about defining the processes that you
00:29:38.960 use and how you tie them together and automate them so that way you can reduce
00:29:44.039 friction in your in your deployment uh infrastructure so that's pretty much all
00:29:49.360 I have thanks again I'm adelcambre on Twitter I work for Engine Yard that's
00:29:55.760 our website if you haven't looked at us before and we are hiring so if you want to work on stuff like this you should come and talk to me or anybody else here
00:30:03.159 and are there any questions feel free