Last winter John Allspaw joined Etsy to run tech ops. John has written several important books on web ops and is one of the experts in operating large scale web services. One of the first things John did when he arrived at Etsy is work with the dev and ops teams to put in place a continuous deployment system.
Continuous deployment is the idea that you push out changes to your code base all the time instead of doing large builds and pushing out big chunks of code. Here's an Eric Ries post on continuous deployment if you want to get a longer description of what it is and how it works.
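The idea can be sketched in a few lines of Python (all names here are hypothetical, not Etsy's actual tooling): every small change runs through the test gate on its own the moment it lands, instead of being batched into one big release.

```python
def run_tests(commit):
    """Stand-in for a real test suite; any commit tagged 'broken' fails here."""
    return "broken" not in commit

def deploy_to_production(commit, production):
    """Stand-in for the actual push; just record the commit as live."""
    production.append(commit)

def continuous_deploy(commits):
    """Push each small change through the gate as it lands, one at a time."""
    production = []
    for commit in commits:
        if run_tests(commit):
            deploy_to_production(commit, production)
    return production

# The broken change never reaches production; the others ship immediately.
print(continuous_deploy(["fix-typo", "broken-checkout", "tweak-css"]))
```

The payoff is that each push carries one small change, so when something breaks, the list of suspects is one commit long.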
At Etsy, they push out code about 25 times per day. It has worked out very well for Etsy and has led to faster cycles, improved morale, and a more stable and reliable web service. We were talking about all of this during the board meeting yesterday and Chad Dickerson, Etsy's CTO, invited me to push out some code after the board meeting.
I sat at Chad's computer, we pulled up the queue of code that had been reviewed and approved and was ready to push, I selected the code I was going to push out, I hit the "deploy" button, and then we watched as the code was pushed out into the production system. Then we watched all the key metrics to make sure I hadn't broken the service with my push. I pushed some small changes to the checkout system. We watched to see that checkouts continued to get processed. We looked at all the charts and graphs. Everything seemed fine. And we were done.
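That "watch the key metrics" step boils down to a simple comparison. Here's a hedged sketch (the metric, numbers, and 10% threshold are made up, not Etsy's actual monitoring): compare checkouts per minute before and after the deploy and flag the push if the rate drops too far.

```python
def push_looks_healthy(before, after, max_drop=0.10):
    """Return True if the post-deploy metric hasn't fallen more than max_drop."""
    if before == 0:
        return after >= 0
    return (before - after) / before <= max_drop

print(push_looks_healthy(200, 195))  # small wiggle: fine
print(push_looks_healthy(200, 90))   # checkouts halved: raise the alarm
```

Real systems compare many metrics against historical baselines rather than a single pair of numbers, but the logic per graph is roughly this.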
After we finished the deploy, I asked John (who took this picture) what would have happened if I did break the site. I asked how to roll back the changes. He said "we don't roll back, we fix the code." I asked what was the longest time it took to "fix the code" after a push had broken something. Kellan (who is on the right in the picture) told me that the longest fix for something that had taken a critical service down was about 4 minutes although they had seen longer periods of diminution of service.
I really like this model. Big changes create big problems. Little changes create little problems. I realize that this model doesn't work in desktop software and mobile apps. But cloud based web services can be operated this way and it really jibes with the whole culture of building and managing web apps.
The next time I am at Etsy, they are going to let me make a small change to the code base and then push that out. As Pete Miron said on Twitter yesterday, "so simple, a VC can do it."
Etsy is clearly one of the most efficient companies I’ve had the chance to work with!
Continuous deployment is a very cool idea and something that we are very excited about. But we have to do it a little differently because we are operating a more critical service. For most web services, a small mistake doesn’t have a significant effect. But if you are operating a critical service, like storage or banking, where people are depending on you never making a mistake, it is more risky.

So we are doing continuous deployment to our test environment. From there we still have to aggressively test changes before they go into deployment. But it allows us to integrate multiple developers’ work very quickly and get many of the benefits of pushing live immediately. Eventually I think we will move to continuous deployment directly to the live site for UI-related changes, while leaving backend changes to the test server, where they are rigorously tested before pushing to the live environment.
You should check out what the guys at Wealthfront (previously kaChing) are doing with their continuous deployment: http://www.vimeo.com/12849143

These guys do operate a service like banking: they handle real money, and are financially liable if they get a trade wrong. But they’re practicing continuous deployment. They don’t try to use a manual testing step to force quality; rigorous automated testing and other quality-focused practices happen earlier.
We have zillions of automated tests. That’s part of the definition of continuous deployment. The problem with automated tests is that you have to think of them upfront. So if something goes wrong that you did not consider, your automated tests won’t catch it. What that means is that automated tests are great at catching a regression, but not great at catching something you never considered. The point is that you can’t take experiential testing/usage out of the equation. Part of the idea of continuous deployment (including what Fred described) is that you put it out there and then see what happens and fix if needed. My point is that how much internal testing you need to do before you push things out to the public depends on how critical what you are doing is. Also, as I said, some parts are safer to continuously deploy than others. For example, I would not be hot patching my datastore on the fly, while doing that with UI is much safer. For this reason you don’t see many people automatically updating their MySQL databases in mission critical applications without rigorous testing first.
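Hank's point about regressions can be made concrete with a small illustrative example (the function and incident are invented, not from any company mentioned here): a regression test pins down behavior you already thought of, and it typically gets written only *after* production surprises you.

```python
def parse_price(text):
    """Parse a price string like '$1,250.00' into integer cents."""
    cleaned = text.replace("$", "").replace(",", "")
    return int(round(float(cleaned) * 100))

# Regression tests added *after* a hypothetical production incident with
# comma-grouped prices -- the suite could not have caught it beforehand,
# because nobody had thought to test that input shape.
assert parse_price("$1,250.00") == 125000
assert parse_price("$3.50") == 350
print("regression suite green")
```

The suite now guarantees that particular bug can never silently return, which is exactly the strength (and the limit) of automated testing.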
Automated tests are part of what it takes to do continuous deployment, but only part. The more important part is continuous improvement. Any issue found in manual testing or in production isn’t just a sign of a bug; it’s a sign that an upstream process created a bug. That’s what needs fixing, and it’s why IMVU treats the Five Whys as such an important part of their effort.

I agree that SQL servers are a problem with continuous deployment; they’re also a big barrier to effective automated tests. That’s unsurprising; they’re millions of lines of code implementing a product from the 1970s. I have hope that over the next 5 years or so we’ll see people reinvent storage, either via the NoSQL movement or elsewhere, to fit better with modern needs.
I didn’t claim that automated tests are all of what it takes to do continuous deployment. I just said “that’s part of the definition.”

Regarding databases, I don’t think the issue is that they are old. I think the problem is that they are mission critical. There are some things, like databases, that you can’t afford to be wrong, whether it’s NoSQL or SQL or whatever. A big part of our current work is a new database platform, so I have spent a fair amount of time thinking about this issue. Your database can’t give you a wrong answer, regardless of the technology style or age.

Regarding IMVU, they are great and smart. But they are a game company. I don’t think you could apply their technique at MySQL, or even with Linux, again because the code is so critical. You need to ease the code into deployment.

The point here is that CI is *great*, but every great concept doesn’t apply to every situation, and you need to understand the benefits and disadvantages and risks of every methodology and operate accordingly.
Alas, it seems like we’re talking past one another, which was not what I was hoping for. Good luck with your efforts!
I work at Wealthfront and can provide some extra context. Yes, we do a ton of up-front automated tests before code gets pushed, but Hank is right that they don’t obviously catch what you don’t test for. To handle these situations we have very rigorous data migration processes. For example, we have a very good methodology for pushing simultaneously backwards- and forwards-compatible changes to our databases and code. At the same time we have hundreds, if not thousands, of automated checks running against the production databases and systems to ensure that the system, machine, and business metrics have not deviated over time.

We also practice pushing canary services, where if we have a cluster of 3 instances of a service we’ll push the new code to just one instance, let it burn in for a period of time (minutes or hours, depending), and then if all is good according to our “immune system” we’ll roll the remaining instances. This practice is *highly* recommended, as you catch errors early and gain confidence in your newly released code while having the safety of a known “good state”.
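The canary push described above can be sketched like this (a minimal illustration with invented names, not Wealthfront's actual system): deploy the new version to one instance, check it against a health predicate standing in for the "immune system", and only then roll the rest.

```python
def canary_rollout(instances, new_version, healthy):
    """Push new_version to one instance first; roll the rest only if it burns in OK.

    instances: dict of instance name -> currently running version.
    healthy:   callable standing in for the 'immune system' checks.
    """
    versions = dict(instances)       # work on a copy of the cluster state
    canary = sorted(versions)[0]     # pick one instance as the canary
    versions[canary] = new_version
    if not healthy(canary):
        versions[canary] = instances[canary]  # revert just the canary
        return versions
    for name in versions:            # canary is healthy: roll everyone
        versions[name] = new_version
    return versions

cluster = {"api-1": "v41", "api-2": "v41", "api-3": "v41"}
print(canary_rollout(cluster, "v42", healthy=lambda name: True))
print(canary_rollout(cluster, "v42", healthy=lambda name: False))
```

The key property is the one the comment highlights: a bad release touches one instance, not three, and the cluster keeps a known good state throughout.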
Awesome, love it. We’re trying to do the same thing on our websites, mobile sites and apps. One thing I love about the Android Market is the auto-update. We push new features all the time on our app and just keep tweaking and fixing, especially as it is not realistic to try and test for *every* Android device out there.

Great post
i check auto update on every android app i have. i sort of wish it was standard (from a developer perspective) but i understand that many users don’t want that
It’s great to see the fun you’re having in that picture, Fred. :)

Pushing out small chunks of code is so much easier on the end-user as well as easier for the team to pinpoint problems. Doing 20 or 30 deployments a day is becoming more the norm now and it’s really the smart way to work in my opinion. Big commits bring bigger headaches and the possible need for rolling back to a previous version to allow time to fix it.

I like to see the obvious rapport you have with the guys. We know it’s there, but it’s fun to see the picture showing the sincerity. Good post.
I was like a kid in a candy store
It shows, Fred. And that’s what I really like about the post.
this is an awesome post, and an awesome place for the etsy engineering team to be. but the part about not rolling back goes against every fiber of my being. counting on being able to fix the code in real time is a luxury, and I think any one-click deployment system is incomplete without a one-click rollback system, which I’m sure the etsy team probably has, but it’s just not highlighted in your post.

for any kind of critical service, having an unbounded time to roll back the code is a bad bad bad thing, which is why I think the no-rollback thing is probably more convention than it is a technical limitation for etsy.
I am certain they can roll back if they have to
Rollback is a fantasy: it *almost never* occurs. Roll-forwards (i.e. fix it directly in prod, then move the change back into source control) are what happen when a deployment breaks production.

This is almost always due to the fact that database schema changes, once made to prod, are impossible* to undo.

*anything is possible, given sufficient time + resources. Production fixes have precious little avail time, tho
john, this really depends. we do a practice rollback in QA as part of every single release. you’re right that it’s not always possible, especially if schema changes aren’t additive, but for many cases it’s the best strategy for getting a time critical system back online quickly.in my experience you are right that rolling back rarely happens, but that doesn’t mean you should discount it. not that anyone here is doing that, I don’t mean to be making a straw man 🙂
From Kellan, our VP of Engineering: http://twitter.com/#!/kella…

“We don’t roll back, we roll forward.”

We have a number of methods that allow us to get back to a known good state, and we do a lot of scenario planning and rehearsing for major changes. We also launch features “dark” and push releases to small percentages of users so that the code is in production for some time before it becomes visible. This all limits the risk.

It’s important to note that we do schema changes very carefully and in batch once per week, so we are not continuously deploying schema changes. If anyone knows how to do that successfully and at scale, I’d love to hear about it.

@avi below: Using PHP makes this easier.
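The percentage rampup Chad mentions is usually driven by a feature flag that buckets users deterministically. A minimal sketch (the hashing scheme is illustrative, not Etsy's actual implementation):

```python
import zlib

def feature_enabled(feature, user_id, rollout_percent):
    """Deterministically bucket a user into [0, 100) and compare to the ramp.

    The same user always lands in the same bucket for a given feature, so
    their experience is stable as the rollout percentage grows.
    """
    bucket = zlib.crc32(f"{feature}:{user_id}".encode()) % 100
    return bucket < rollout_percent

# Ramp from "dark" (0%: code is deployed but nobody sees it) to fully
# launched (100%) without a new deploy -- just change the number.
print(feature_enabled("new-checkout", 12345, 0))    # dark launch
print(feature_enabled("new-checkout", 12345, 100))  # fully ramped
```

This is what makes roll-forward practical: if the 5% ramp looks bad on the graphs, you set the percentage back to zero rather than redeploying.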
No doubt, each env has its own special case of roll-back that may occur under some set of circumstances.

I’ve been inside lots of shops (world’s largest e-commerce sites etc.) where downtime is measured in million(s) of dollars of lost revenue per hour, so time really is the most important element in the equation.

It’s assumed that the team *can* fix things; the high-wire act comes into play when dollars are pouring out of the conversion bucket & falling onto the floor. This almost always means a live fix in prod.
Hey Chad,

Would love to hear what you are using under the covers as both infra and dev language, both of which impact how agile and continuous you can be. E.g. I have a much easier time seeing you do this using, say, JS on Node or Rails on Thin/Mongrel in a heavily parallel deployment environment than a Java-heavy monolithic ear/war environment. It is doable there, and has advantages, but harder.

I actually helped a client interview a director of tech ops a few weeks back. I asked him his thoughts on the differences between the two models of multi-application deployment (ear/war on a single Java server vs. multiple node/thin processes, virtualizing at the package level vs the process/POSIX level), to get some insight into his thought processes. Entertaining. :-)

So, please share.

Avi
In one shop I visited, in a year, one outage meant that the CIO didn’t get his bonus. Two outages and his daughter didn’t get tuition for college next year. Three outages and the CIO got a new job. So, they were very careful about making changes.
of course, usually this is just a matter of flipping symlinks. that’s why I was pretty sure it’s more convention than any limitation of their system. really enjoyed this post, and the picture is great. did you get a “golf clap” after you hit the button?
Would be great if NOK and MSFT did this to unify desktop to phone.
Hey Fred,

Etsy is clearly an example of a ‘tightly running team’; I’ve seen even CI fail at large organizations that have a substantial number of legacy-practices engineers. Retooling an organization as it gets past this point of no return is tricky (i.e. super expensive but not impossible).

I discuss my main reccos for companies in the post below; this applies to the smallest startups to the very largest. The single most important piece of advice I give teams is this: forget how you did it before. As a team, you’re at a distinct advantage if you’re truly greenfield & have nothing to ‘forget’.

http://commavee.com/2010/10…
Great post, Fred. Tangent: I believe this post informs why HTML5 mobile apps will win. What platforms allow dev teams to iterate and try new ideas in real time? App stores? Or already-deployed cloud based services that can run locally and be tweaked and tracked in real time. Innovative ideas spring from the CONSTANT recombination/reworking of pieces, and HTML5 (or derivatives that accomplish the same idea) allows this to happen.
It was awesome having you deploy, Fred. We’re hiring. ;-)

The one-button deploy tool you used was Deployinator, which was built by Erik Kastner (http://twitter.com/kastner) and described here on our engineering blog: http://codeascraft.etsy.com….

Mike Brittain (http://twitter.com/mikebrit…), who runs our core team that focuses on the guts of Etsy, also wrote about how we monitor and track releases: http://codeascraft.etsy.com….

We’ve invested considerably in the tooling and culture to make this possible.
i wish i had known all of that when i wrote the post. i would have loved to have called out Erik and Mike in the post.

thanks for stopping by and adding to the discussion Chad
Hey Chad,

Can you talk a little bit more about the culture there and how you invested in it (as a leader) to get your team feeling like they can rise above the typical entropic “just make it work”?

Any blog posts you might have already posted would be appreciated. I’m going through a period of change in the current organization I work with and there’s a great opportunity to make some changes in the culture.
seems like a cool office!
That’s what I was thinking. This would be my idea of an office, one big, open space.
Captains of industry used to christen ships with champagne, dig foundations with gold shovels, and cut ribbons at factories. Today they click “Deploy”.
But there isn’t Champagne crashing against the side of a computer, alas
next time make some craft work!
Great that you are shedding light on the modus operandi of an agile startup. What you have described is an agile development environment, probably an extreme form of it. That’s exactly how we operate as well. We push weekly, actually. Agile development goes hand in hand with agile companies.

The speed of finding bugs is directly proportional to the quality and quantity of inherent self-testing that you have. In an agile environment, you write the test before the code. And that helps you to pinpoint the bug in minutes, if not seconds. I have watched our CTO find a bug in less than a minute once he saw the failing test.

Agile environments accelerate the rate of innovation. Every time you push a new release or partial release, you’re in essence innovating and pushing your company forward at the same time. The only thing is that it creates a marketing challenge. You have to keep communicating the innovation and new stuff constantly. But that’s what product and company blogs are for!
Does Eqentia have any internal company blogs?
Not internal, but external yes.
My question is how extensive is their testing. If their biggest bug only took 4 minutes to fix, that’s pretty amazing. They must have a great QA team.
I’m sure their QA is excellent but the speed of fixing bugs like that comes more from the limited batch size which in turn limits the potential causes of the bug.
That was their biggest *showstopper* bug. As noted, they have had other bugs that caused degraded performance and ongoing issues for much longer than 4 minutes. Anyone does, of course. Etsy is complex and they’re dealing with migrating from a legacy platform.
Kaizen (Japanese), or continuous improvement, is the culture of accruing small aligned improvements yielding large gains over a period of time. It is a core component of the lean manufacturing philosophy introduced by the Japanese auto manufacturers and being more and more successfully adopted by industries all over. It is a proven success model.

Continuous deployment seems to be a software variant of Kaizen. If so, this model will definitely help identify small mistakes before they snowball into major issues.
That’s a very interesting way to look at it. Especially given that companies like IMVU apply kaizen to the continuous deployment process itself.
Continuous deployment does work for mobile *web* apps :)Yet another reason they’ll eventually dominate.
Ha… Still hacking after all these years…
Hey Fred,

You and your readers might be interested to know I wrote a book on the techniques behind continuous deployment last year: Continuous Delivery, published by Addison-Wesley: http://www.amazon.com/dp/03…

I’m currently working on a set of videos, which will include Eric Ries and John Allspaw, which will further explain this paradigm. More on my site: http://continuousdelivery.com/
PS John Allspaw got an early version of the book and his comment is on the Amazon page: “Humble and Farley illustrates what makes fast-growing web applications successful. Continuous deployment and delivery has gone from controversial to commonplace and this book covers it excellently. It’s truly the intersection of development and operations on many levels, and these guys nailed it.”
Nice, Jez. I just downloaded my Kindle copy. Continuous delivery of Continuous Delivery! Nice.

I spent a decade managing this type of stuff in Wall Street IT; always more to learn.
thanks. i will take a look.
it’s “Lean Manufacturing” of code – if you release 25 times a day, you will win…period
Fascinating post. What I find interesting is that even with small changes to the system, bugs are still being introduced. Sure, they may get fixed quickly, but it suggests that the developers still don’t completely understand their own system.
They have: “If you currently have a bunch of shell scripts that move everything in place, wrap those up with a single shell script.”

I’m big on scripts, and one reason is that you can drive those with “a single shell script”. And there are other good reasons to be big on scripts.

Generally I would have to believe that for a lot of software such rapid, easy changes would range from not possible down to too much work, compared with a single, well tested, big change. Indeed, long a ‘glass house’ standard was to take a new release of anything and run it on test systems for over a month before considering going on-line. But for a Web server farm, typically there has to be a lot of very loose coupling with just message passing, remote procedure calls, etc. I concede: With my server farm architecture, there is so much loose coupling that rapid changes would often be doable. Still, generally, I’d prefer ‘larger granularity’ in changes.

And if moving to a new release of, say, Windows Server or SQL Server, I wouldn’t even hope that rapid changes would work and will test offline.

That they are doing real time monitoring is good. It would also be good to have that real time data fed to some automated anomaly detection with some solid, known, ‘powerful’ statistical properties. Hint: Monitoring is essentially forced to be statistical hypothesis testing. Just how I’m going to do such monitoring I haven’t worked out yet, but Microsoft has some tools for software ‘instrumentation’.

Yes, I’m using Microsoft as my ‘platform’, and there are pros and cons there.

I’m more concerned about the whole site being able to have individual servers, network switches, routers, etc. go on/off line with essentially nothing noticed by the users.
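The "wrap those up with a single shell script" advice can be sketched in Python as well (the step scripts here are hypothetical): one entry point that runs each deploy step in order and stops at the first failure, so every push is the same repeatable sequence.

```python
import subprocess

# Hypothetical deploy steps, each an existing script in the repo.
DEPLOY_STEPS = [
    ["./run_tests.sh"],
    ["./build_assets.sh"],
    ["./sync_to_servers.sh"],
    ["./flip_symlinks.sh"],
]

def deploy(run=subprocess.run):
    """Run every step in order; 'run' is injectable so the sequence is testable.

    Stops at the first non-zero exit code so a failed test run never
    reaches the sync or symlink-flip steps.
    """
    for step in DEPLOY_STEPS:
        if run(step).returncode != 0:
            return "aborted at " + " ".join(step)
    return "deploy complete"
```

The symlink flip as the final step matches the observation elsewhere in this thread that the actual cutover is "just a matter of flipping symlinks".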
My architecture has lots of ‘parallelism’ so that whenever a server A needs something, there should always be several other servers it can get it from, so that the main question is just how server A will discover which other servers are now on-line or took the request and went off-line before completing it. There is likely a fast, easy solution. However, there are some curious dangers here: Once, among some parallel servers, one was sick, was throwing all its work into the ‘bit bucket’, thus was not very busy, and, thus, from the ‘least busy’ load leveling was receiving nearly all the work! Another danger is that, once a site gets quite busy, some of the ‘normal’ latency times might grow, leading to a lot of time-outs and, thus, lots of repeated work and a site that then is even busier. So, near its maximum capacity, the site can go unstable! So, one has to be a bit careful about various load leveling and parallelism schemes!

An advantage in my architecture, and maybe common in Web server farms, is that nearly all of my servers are essentially ‘memoryless’ or ‘stateless’ — that is, they just do ‘units of work’, are unchanged by doing such work, and have several ‘siblings’ that can also do such work.
Speaking of Etsy, I’m not a regular reader of its tribute site Regretsy, but my girlfriend is, and this post she brought to my attention a few days ago is pretty funny. It includes one of the better Downfall parodies I’ve seen.
Hadn’t heard of Regretsy. Hilarious.
the fact that regretsy exists is a testament to what etsy has achieved
A site that highlights poor quality content and mocks Etsy sellers is a ‘testament’? It’s not really a positive testament to anything, to me. The fact that I regularly encounter people who have equal name recognition for Regretsy and Etsy is a testament to the fact that Etsy needs to work on their marketing, like how about starting three years ago or so…
My girlfriend is both an Etsy customer as well as an avid Regretsy reader.
Perhaps this is true, but if it is, it’s only a testament to how thoroughly Etsy has failed artists and artisans.

What I mostly see in your post, Mr. Wilson, is the complete disconnect at Etsy between what it says it is and how it acts. It claims to be a community that connects those who make handmade goods with those who buy them, but very basic seller tools are still missing from the site. Search is still broken. Alchemy, upon which many Etsy sellers depended for their livings, has vanished.

Etsy was better for sellers before the VC money changed Etsy’s focus.

And Etsy has now totally betrayed its users by eviscerating the true community, the forums, and rolling out mass mutings of anyone who objects. I encourage anyone curious about what the people who truly helped build Etsy, the sellers, are saying to check out the Unofficial Etsy Forums.
VC money did not change Etsy’s focus

we’ve been an investor in Etsy since 2006
“I sat at Chad’s computer, we pulled up the queue of code that had been reviewed and approved and was ready to push, I selected the code I was going to push out, I hit the “deploy” button, and then we watched as the code was pushed out into the production system …. We looked at all the charts and graphs. Everything seemed fine. And we were done”

So exciting. I feel geeky. I would make a much more inappropriate remark if this wasn’t AVC…

Being at that stage myself feels so far away currently. As a friend recently said to me, keep shooting for the stars – and I shall
Just fantastic knowledge coming out in the comments. I would love to see a guest post from someone in your portfolio talking about scaling.
Continuous Deployment does work in Desktop Software, and IMVU and Google Chrome are proving it, among others. I wrote about why it could work and who was doing it back in early 2009: http://timothyfitz.wordpres…
“We don’t roll back, we fix the code” sounds confident but might not be the right attitude as the company scales
Just getting to this now, but there’s a slight misquote in Fred’s post. :)

I (and I think Kellan said it at the same time) said “…we don’t roll *back*, we roll *forward*…”

We do that because it’s simply faster and easier to roll forward than to roll back the entire deploy. Rolling forward means taking advantage of the percentage rampups of new code paths, and of feature and config flags to turn things off/on; even reverting the 5-line change is simpler than rolling it all back. 🙂
GOOD CODE BREAK BECAUSE IT HAVE MISTAKE. BAD CODE BREAK BECAUSE IT WRONG.MISTAKE EASY FIX. BUT BAD CODE HARD FIX, BECAUSE IT NEVER GOING TO WORK IN FIRST PLACE.BEST WAY FOR FIX BAD CODE AM FIRE BAD DEVS, HIRE GOOD ONES. THEN YOU NOT NEED ROLLBACK, JUST LIKE ETSY.
Thank you for the clarification
thanks for the clarification John.letting me push code or talk about code is always dangerous 😉
And in reading between the lines you can assume that all is well in the world of Etsy (i.e. the board meeting went so well that people stuck around after to play with the systems).

This in itself is no small feat considering the leadership changes Etsy has flipped through over the years.

So awesome on many fronts!
the leadership has come full circle
We’re working on this for Garious as well. I think it is definitely a great way to increase the reliability of your code-base and increase the speed of your iterations. I can’t wait to get there!
Very cool, Fred. Saw your tweet about this post before I read the post. For a moment there I did a double-take – has Fred switched his field of work or what? :)

BTW, Paul Graham wrote an article about using similar techniques (particularly related to fast and incremental deployment) while he was running Viaweb, and credited that as part of the reason for its success. Two relevant links to that:

“Beating the Averages”: http://paulgraham.com/avg.html

The “Beating the Averages” article has a link at the end of it, “More Technical Details”: http://lib.store.yahoo.net/…

The best part I like from that is (emphasis mine):

[ When one of the customer support people came to me with a report of a bug in the editor, I would load the code into the Lisp interpreter and log into the user’s account. If I was able to reproduce the bug I’d get an actual break loop, telling me exactly what was going wrong. Often I could fix the code and release a fix right away. And ***when I say right away, I mean while the user was still on the phone***.

Such fast turnaround on bug fixes put us into an impossibly tempting position. If we could catch and fix a bug while the user was still on the phone, it was very tempting for us to give the user the impression that they were imagining it. And so we sometimes (to their delight) had the customer support people tell the user to just try logging in again and see if they still had the problem. And of course when the user logged back in they’d get the newly released version of the software with the bug fixed, and everything would work fine. I realize this was a bit sneaky of us, but it was also a lot of fun. ]
that is fantastic. imagine an entire tech support team that could fix bugs in real time
Who is the jerk sending a text, all the while ignoring your big moment?
he’s not a jerk. he’s the COO of the company. someone has to keep the trains running and that is his job
Was joking, should have put a smiley. Although I’m glad to see he is on top of his job.
A few have wondered why anyone wouldn’t use continuous deployment, and I took a crack at answering that here: http://claylo.com/post/3253…

Fred, thanks for setting the bar higher. I will now expect anyone I take on as an investor in my new company to be able to push code!
On the same subject, this is how we do it at outbrain: http://prettyprint.me/2011/…
Boy, developers are really getting expensive if it’s cheaper to have your investor do it.
investors work for their companies for free!
This is a cool picture, nice job getting the Etsy luminaries all in one place! Who’s the guy next to Chad?
danny rimer, founding partner of Index Ventures
The last line is the best, again. 🙂
I saw something similar at Tumblr: four non-technical people hunched around an unnecessarily-large (and not color accurate) screen where the founder hemmed and hawed over how many pixels of border were appropriate for a button. I’m not quite sure how long they kept me waiting, but a tiny blonde woman did walk past me to grab the bathroom key — twice.Then, over a period of months, the features necessary to surface this bit of window dressing were gradually removed from the site.Funny, people don’t write books about this technique but it does seem far more common than this magic world where people push finance code live to the web, look at a couple graphs, then split for lunch at Shake Shack.Etsy is awesome, though.
Great reference to a company that is practicing continuous deployment. Always a few new tricks to learn. I’d love to see how the code is architected and operated in a pro shop.
Nice post, Fred. Great to peek behind the scenes to read about what your portfolio companies are doing right. I’d love to see posts like this on a more regular basis. Probably pretty cool for the portfolio companies, too.
It is the coolest thing I’ve seen in years in the space – a game changer in terms of technology and operations management. I got a tour of the office a couple of months back and I saw these big screens showing KPIs. On the graphs there were a series of vertical black lines every couple of hours – I asked what they were and I was told “they are code pushes – we mark when they take place and then watch the metrics to make sure all is ok”. So impressive – the way every shop should run.
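Marking deploys so they can be overlaid on metric graphs takes very little machinery. A hypothetical sketch of the idea (the function names and event shape here are mine, not Etsy's):

```python
from datetime import datetime, timezone

# In-memory event log; a real system would persist these markers
# somewhere the graphing tool can read them.
_deploy_log = []

def record_deploy(author, sha, when=None):
    """Record a deploy event so dashboards can draw a vertical marker at this time."""
    event = {
        "time": when or datetime.now(timezone.utc),
        "author": author,
        "sha": sha,
    }
    _deploy_log.append(event)
    return event

def deploys_between(start, end):
    """Return deploy events inside a time window, e.g. around a metric anomaly."""
    return [e for e in _deploy_log if start <= e["time"] <= end]
```

The payoff is the second function: when a graph dips, you ask which pushes landed in that window and get a short list of suspects instead of a mystery.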
Just curious, how many developers do you have that you can push 25 updates per day? Unless, of course, some of them are bug fixes and some changes are so minimal that they don’t require many developers.
Acceleration is the way to go.
It’s really cool to see well-established companies stay nimble. Small startups like mine, Talentopoly.com – shameless plug, don’t have the resources for large deployments (copious amounts of unit tests, QA dept, lengthy code review process) and it’s nice to know that doesn’t have to be the goal.
I’ve been teaching myself to code in Ruby on Rails for the past few months, and I develop the code locally and then “push” it live to the production server pretty much whenever I have a feature built.

Once or twice I’ve broken things by pushing new code, but then I just fix it quickly. I know a few startups locally that work on Rails and tend to push slightly larger batches of features, and take them from development, to stage, to production. Anyways, my method has been fun, and it’s good to see it in use “professionally.”
In complex and dynamic systems, all the pieces work together. That means that the smallest change can “break” everything, leading to “wobbly” overall behavior. This is chaos theory. I wonder if there exists a parallel Etsy system acting as the initial test site?
I love/hate the continuous deployment model for one other reason – speed of deployment often leads to breaking your SEO. The challenge is that SEO breakages don’t always become apparent until well after the release – sometimes weeks. In a rapid deployment situation, it’s not so easy to tie the SEO issue back to a specific release so you can diagnose the problem. So the SEO consultant gets a call.

If you are going to go the continuous deployment route, make sure you have a solid SEO quality process for every release to minimize the risk.
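One way to implement the kind of per-release SEO quality process this comment recommends is an automated smoke check over rendered pages. This is a minimal sketch, and the specific rules it checks are my assumptions, not a standard checklist:

```python
import re

def seo_smoke_check(pages):
    """Flag common SEO regressions in rendered HTML: a missing <title>,
    an accidental noindex tag, or a missing canonical link.

    `pages` maps URL -> rendered HTML. Returns a list of (url, problem) pairs,
    empty when everything passes.
    """
    problems = []
    for url, html in pages.items():
        if not re.search(r"<title>[^<]+</title>", html, re.I):
            problems.append((url, "missing or empty <title>"))
        if re.search(r"<meta[^>]+noindex", html, re.I):
            problems.append((url, "page is marked noindex"))
        if 'rel="canonical"' not in html:
            problems.append((url, "missing canonical link"))
    return problems
```

Run against a handful of representative pages on every deploy, a check like this turns a silent, weeks-later SEO regression into a failure you see minutes after the push.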
This is a fascinating way of deploying new code which I haven’t run across before. They seem to be operating on the edge of chaos and reaping the benefits, without much, if any downside. Not being able to roll back makes me a bit nervous – although not being able to go back has distinct benefits – sort of like Cortez burning his ships so his men could only go forward. Definitely something to learn here.
My company is interested in getting to a similar model, but we have a few impediments. The biggest challenge for us is that we have a SaaS product (DrupalGardens.com) which spins up Drupal websites for our customers. Drupal is a fairly complex CMS, and with all the bells and whistles we add, it is capable of pretty massive customization.

Continuous deployment is a bit scary because we’re not just updating one web product when we push, we’re updating thousands of live databases which could have been customized in countless ways. So staging that process is time consuming, and the potential for “fix it if it breaks” is lower. If you cause a site to be messed up or, even worse, cause data loss, the odds that you will be able to detect that are quite low. If you’re lucky, a very happy customer will report it quickly and be patient; if you’re not, several leave.

We have automated tests, continuous integration, and we smoke test several copies of production sites before each push, but it still seems necessary to do a bunch of manual QA on each release. Plus, the process of running an update can take several days.

So we do scrum and release every three weeks, with a 2-day dedicated freeze on the new code at the end for heavy QA. It isn’t ideal, but hopefully we can find a path to quicker releases and continuous releases in the future.
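A batched, canary-first rollout is one way to make “fix it if it breaks” less scary across thousands of customer sites: update a small batch first, verify health, and only then fan out. A hypothetical sketch (the `rolling_update` helper and its health-check hook are my invention, not how Drupal Gardens actually works):

```python
def rolling_update(site_ids, apply_update, is_healthy, canary=10):
    """Apply an update site by site, canary batch first.

    `apply_update(site)` performs the update; `is_healthy(site)` is any
    post-update check. Stops and reports the failing batch instead of
    blindly updating every site.
    """
    updated = []
    # Canary batch first, then the rest in larger chunks.
    batches = [site_ids[:canary]]
    rest = site_ids[canary:]
    batch_size = max(canary * 10, 1)
    batches += [rest[i:i + batch_size] for i in range(0, len(rest), batch_size)]

    for batch in batches:
        for site in batch:
            apply_update(site)
        failed = [s for s in batch if not is_healthy(s)]
        if failed:
            # Halt the rollout; only this batch needs fixing or restoring.
            return {"updated": updated, "failed": failed}
        updated.extend(batch)
    return {"updated": updated, "failed": []}
```

The design choice is that a bad push damages at most one batch, which bounds how many restores (and how many unhappy customers) a mistake can produce.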
This is a great way to do cloud based development!
There’s a very good chance that Deployinator will be open sourced soon (maybe before OSCon?).

We also just released a piece of our monitoring infrastructure: StatsD (http://github.com/etsy/statsd). A blog post about it will be going up in a few days.
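For context, StatsD speaks a simple plain-text protocol over UDP: counters as `name:value|c` and timers as `name:value|ms`. A minimal client sketch of that wire format (the `StatsdClient` class and its defaults are mine, not part of the Etsy release):

```python
import socket

class StatsdClient:
    """Minimal StatsD client speaking the plain-text UDP protocol:
    "name:value|c" for counters, "name:value|ms" for timers."""

    def __init__(self, host="127.0.0.1", port=8125):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, payload):
        # UDP is fire-and-forget: a lost packet never blocks the app,
        # which is why instrumenting hot paths this way is cheap.
        self.sock.sendto(payload.encode("ascii"), self.addr)

    def increment(self, name, count=1):
        self._send(f"{name}:{count}|c")

    def timing(self, name, ms):
        self._send(f"{name}:{ms}|ms")
```

Usage is a one-liner wherever something interesting happens, e.g. `StatsdClient().increment("checkout.completed")`, and the aggregated counts are what show up on the KPI dashboards described above.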
hi Erik

sorry i didn’t call you out in the blog post

excellent work on the deployinator

it was fun driving it on friday