Feb 19, 2008

My Head Is In The Cloud This Morning

Last week I posted the following to Twitter:

We’ve assembled 8 of the top developers we know and are talking about the future of the lamp stack

That was not entirely accurate. We put together an impromptu lunch last week and invited some of the top developers from our portfolio companies located in NYC to attend. We know a lot of great developers who didn’t attend the lunch. I wish they could have. Because it was fantastic.

We discussed the LAMP stack (and derivatives) that almost every single one of them runs their web app on. We discussed the scaling issues that they have all faced. And we discussed what we should be looking for in the future to make those scaling issues easier.

The first post (that I know of) that has resulted from that lunch is Alex Iskold’s cloud computing post on Read Write Web. For those of you who don’t know Alex, he is the founder and CEO of our portfolio company Adaptive Blue, he’s a great developer, and he also writes awesome blog posts at Read Write Web.

I don’t want to summarize Alex’s post here because I think it’s an excellent discussion of the scaling issues that the LAMP stack creates and how cloud computing can overcome some of them for some companies.

Alex is a huge fan of relying on others to deal with the scaling issues so his company can focus on the application itself. Last year at our portfolio company offsite, when we were having a similar discussion, he said:

think about it, if you can’t trust Amazon to be up, who can you trust?

Of course, the day after our lunch last week, S3 went down for what seemed like the entire morning, East Coast time. Alex acknowledges that risk in his post and quotes my partner Albert Wenger, who once told Alex:

We live in a stochastic world, but people fail to grasp it because all they experience is right now.

Alex argues that it’s time for startups to give up the scaling issues to the big guys and let them do what they do best. He says:

Every time we have an outage, like the one that happened on Friday, people sit back and think: How can I possibly rely on these guys? I bet I can just code this up myself and it will be fine! For decades the software industry has been suffering from the ‘I can do this better’ disease. We keep re-inventing programming languages, we keep on re-writing the APIs, and we keep thinking that we’re smarter than the guys who came before us. 99.9% of the time we are wrong. The truth is that we cannot do it better than Amazon. They spent a massive amount of money, talent and most importantly time, trying to solve this problem. To think that this can be replicated by a startup in a matter of months, assembled, be cost effective, and work properly is just absurd. Large-scale computing is an enormously complex problem, that takes even the best and brightest engineers years to get right.

I think Alex is directionally correct. We are going to see more and more companies build and host their web apps on someone else’s infrastructure. It’s not going to happen overnight, because I’ve never met a more control-oriented group than software engineers. But it will happen, and in the long run we’ll all be better off because of it.

#VC & Technology

Comments (Archived):

John Feb 19, 2008

While I’d agree that Alex is right, the counterargument is that when it does break, no one else is as motivated to fix it as you are. So the question “How can I possibly rely on these guys?” should really be “How can I make sure these guys are reliable?” and you should then ask “What if these guys let me down?”
1. WayneMulligan Feb 19, 2008
  
  Hey John, I think the way to look at this is: “What’s the difference between Amazon going down and RackSpace going down?” They were both down for a few hours over the last 4 months. Nobody seemed to go, “OMG, how can I rely on RackSpace, I’m ditching this solution for Cloud Computing!”, and yet when Amazon went down last week I can’t begin to tell you how many people msg’d me going, “See, Amazon’s solution sucks”. Alex makes a very good point – what makes you, me or any other entrepreneur think they can write a better solution than Amazon or Joyent? I’m a big believer in specialization of labor – perform the tasks that you have a comparative advantage at and outsource the rest (assuming it’s a commoditized process – and I truly feel that LAMP stack hosting is). Just my two pennies. -Wayne
  1. John Feb 19, 2008
    
    Wayne, yes, he makes a good point, which is why I agreed. However, when “outsourcing the rest” it’s not just a question of whether their technology is good (it probably is, otherwise you wouldn’t be outsourcing to them). You also need to factor in the additional risks you are now exposed to. For example: if your supplier has a problem, are you important enough to them that they’ll care about it? What if your supplier goes out of business? What if your supplier decides to stop offering the service? What if your supplier is taken over by a competitor of yours? What if your supplier suddenly increases their price? None may be likely, but you’d be a fool not to consider the risks. Regards, John
Dan Cornish Feb 19, 2008

I agree, except that many new services depend on technologies that Amazon or others do not provide. How do you get around that problem?
WayneMulligan Feb 19, 2008

FYI: Anyone interested in EC2/S3 but wrestling with scaling, backup and redundancy issues, take a look at RightScale. I haven’t used them yet but I’m currently evaluating them along with several other cloud computing solutions.
nickdavis Feb 19, 2008

So true. Our time is much better spent on dealing with Black Swan events rather than preventing Black Swan events. By their very nature, they are going to happen. And they will happen in some area we weren’t thinking about. So write your apps so that they fail gracefully, because they will fail. And take a deep breath. And maybe be thankful for the amount of time that S3 stays up!
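One way to picture "fail gracefully" is an app that serves the last value it saw when the remote store is down, instead of erroring out. The sketch below is only an illustration of that idea: `StorageDown`, `fetch_with_fallback`, and the `remote_get` callable are hypothetical names, not part of any Amazon API, and the cache is a plain in-memory dict.

```python
class StorageDown(Exception):
    """Raised by the (hypothetical) remote store when it is unavailable."""


def fetch_with_fallback(key, remote_get, cache, default=None):
    """Return (value, is_stale) for key.

    remote_get is any callable that fetches the value from the remote
    store and raises StorageDown when the store cannot be reached.
    cache is a dict holding the last value successfully fetched per key.
    """
    try:
        value = remote_get(key)
    except StorageDown:
        # The store is down: serve the last known value (or the default)
        # and tell the caller that the data may be stale.
        return cache.get(key, default), True
    # The store answered: remember the value for the next outage.
    cache[key] = value
    return value, False
```

The caller decides what a stale answer means for the user, for example showing a banner rather than a blank page.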
scott Feb 19, 2008

I agree. Large-scale computing is evolving in complexity to the point where most application developers will need to rely on systems like Amazon’s compute or storage services in order to be competitive. The up-front costs of developing these systems are staggering for any application developer, not to mention what it takes to maintain them. These systems are part of an evolution in infrastructure; I believe we will come to think of them as fundamental utilities in the support of webapps, much like bandwidth or electricity is now. The best and brightest engineers will want a service that is open. The best engineers will always be driven to pursue the best and are willing to re-engineer when they expose faults and weaknesses. Working at an early ISP, our biggest challenges were related to how CLOSED the telcos were when it came to improving their infrastructure to support the demand. We presented solutions as to how they could fix fundamental flaws in their infrastructure. It took years for us to convince local utilities to upgrade lines to support ADSL and DSL services. I think the recent outage shows how open Amazon will be about their system and its faults, which may attract or detract usage depending on how you look at it. I look at it as a blip in an otherwise successful run. I think the majority of early adopters expect blips. The biggest open question I hear about using Amazon S3 or similar systems is related to security. There are a whole host of industries that would benefit from the integration and use of these systems.
Tal Keinan Feb 19, 2008

I think that Alex said it correctly. Startups face enough challenges without trying to solve this massive problem as well. John: your argument is flawed, because I don’t think that Amazon is any less motivated to solve such problems. I can only imagine what type of pressure the guys at S3 had to deal with, and the negative PR it generated.
1. brooksjordan Feb 19, 2008
  
  Couldn’t agree more. Amazon is super motivated to solve problems that come up with their services. So, when it’s so early in the game of outsourcing infrastructure, they should get that chance. They know that they’ll only get a few opportunities to prove that they’re worthy of the trust people place in them to keep their businesses running. Salesforce.com faces the same challenge, has gone down several times, and each time has done something to improve the service like http://trust.salesforce.com/.
Jeffrey McManus Feb 19, 2008

Well, to be fair, most developers are control-oriented because their business managers, many of whom have never written a line of code in their lives, kick their butts whenever anything goes wrong.
1. Cam MacRae Feb 20, 2008
  
  Really? I reckon most developers regard their business managers with utter contempt. Perhaps closer to the bone is that developers are control oriented for the same delusional reason that lends itself to “not invented here” development: no one on the entire planet could possibly have solved this as well as I will.
awilensky Feb 19, 2008

You can’t scale thousands of small, temporary, transient joins even in a cluster of MySQL boxes, or, for that matter, Oracle or SQL Server farms. They all saturate on writes. You need a distributed file cluster keyed by user ID, with the RDBMS holding references to the clusters but not the data. The latest object databases from http://www.db4o.com/default… are an area that application developers had better get familiar with. Danga.com also has some good thoughts. We have inherited a generation of LAMP stack journeymen who are very bright, but have been lulled into a state of complacency regarding the utility of the relational database, to the point where scaling has become a cause celebre (Twitter).
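A rough sketch of the "cluster by user ID, keep only references in the RDBMS" idea, under stated assumptions: the shards here are plain dicts standing in for storage clusters, `directory` is a dict standing in for the relational table of references, and the names `shard_for` and `save_for_user` are made up for illustration. A stable hash is used so that a given user always maps to the same shard.

```python
import hashlib


def shard_for(user_id, shard_names):
    """Pick a shard for user_id with a stable hash (same answer every run)."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return shard_names[int(digest, 16) % len(shard_names)]


def save_for_user(user_id, payload, shard_data, directory):
    """Write payload to the user's shard and record only a reference.

    shard_data maps shard name -> {user_id: [payloads]} (the "clusters").
    directory maps user_id -> shard name (the part a relational database
    would hold: a pointer to the cluster, not the data itself).
    """
    shard = shard_for(user_id, sorted(shard_data))
    shard_data[shard].setdefault(user_id, []).append(payload)
    directory[user_id] = shard
    return shard
```

Simple modulo placement like this reshuffles users when the number of shards changes; schemes such as consistent hashing exist to limit that, but they are beyond this sketch.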
BillSeitz Feb 19, 2008

a related-but-different issue is whether this leads to a near mono-culture in infrastructure, with worrisome implications for macro net resilience
StartupAddict Feb 20, 2008

I couldn’t agree more. I did a post defending Amazon from the barrage of people stating “I’m dumping AWS”. Amazon continues to be far superior and lend tremendous economies of scale over conventional hosting for the little startup that could. I think it is naive to believe AWS is a failure due to a 2-hour outage. Amazon has proven it will continue to throw resources at this area, and it will certainly have cheaper, faster and simpler services 12 months from now, superior to the next infrastructure startup with the “I can do better” attitude. Enjoy and employ open source and run it on big pipes like AWS for a fraction of the cost of conventional hosting. I think an SLA offered to startups and developers is in Amazon’s future and will solidify AWS’s prowess. I’m not saying AWS is the end-all, be-all for web services, but a brand of that magnitude needs to be considered in any startup business plan.
chrisherbert Feb 20, 2008

@scott is right on with this sentence: “These systems are part of an evolution in infrastructure; I believe we will come to think of them as fundamental utilities in the support of webapps, much like bandwidth or electricity is now.” I’m trying to get through Nick Carr’s new book, The Big Switch, and his main thesis is around exactly what Alex and Fred are suggesting: in the future, IT infrastructure will become a commodified service that you purchase just like we do now with power or water. We can argue about SLAs and how responsive they are to customers (nobody ever calls their local water or power authority unless they have a problem, and you know how painful it is to reach a live person) but we’re going in this commodified direction without a doubt. Why is this shift a foregone conclusion? It’s simply more cost efficient. Like other market forces or economic trends like globalization, you can’t fight the economics of IT infrastructure becoming cheaper because of economies of scale. Uptime and security issues are valid concerns so I’m not disregarding them outright (and it won’t be simple to solve all of them) but these concerns will be ironed out. As Fred notes, in the long run, it will make everyone better because we specialize in our competitive advantage (our actual products or services) and leave the non-essential parts of the value chain to those who have economies of scale in providing IT infrastructure (like an Amazon).
adam Feb 20, 2008

While I personally railed against this sort of thing in the past (outsourcing vital services), I have come to understand I was incorrect. The beauty of treating vital services as a utility is that it frees up your business to work on the things that distinguish it. Everyone knows how to set up a webserver, everyone can set up DNS, etc, but not everyone (ideally) can set up your business. As a tech guy myself, it seemed like a lesser engineer would outsource vital services. However, it’s actually a lesser engineer who insists on managing the mundane themselves. That said, hosting providers go down, as does the entire internet, every so often. You can have the world’s coolest, most advanced setup, and yet when the power in your facility dies (or your upstream carrier dies) you are offline with the rest of us.
Wille Feb 21, 2008

Getting around the scaling issues of a LAMP stack is fundamentally hard, unless throwing more hardware at the problem is the solution for everything (a very expensive one, I should think). Without going into too much technical detail, Apache and most other web servers have a very bad design flaw, in that they pretty much spawn off one thread per incoming request, and threads are a precious commodity on any hardware. An alternative to this is to have fewer threads, but partition up the work required to fulfill a request-response among those threads, thus increasing concurrency (by a factor of 40-50; the concept is called “Staged Event Driven Architecture”, hardly a new thing, but surprisingly absent from the mainstream of computing, considering you could do with one server what tens of servers do today, on the same hardware). It’s not too common yet, but there are companies that are addressing the issue, such as “Zeus”. Just thought I’d throw this into the mix when it comes to scalability discussions – before this area is properly addressed and commoditized, most other types of optimizations for scalability will be like shooting mosquitoes and swallowing camels. (I am in no way affiliated with Zeus, nor have I ever used their products; I have only heard of them when researching implementations for a software project I was part of previously.)
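To make the contrast with thread-per-request concrete, here is a toy sketch of the staged idea: a fixed set of worker threads, one per stage, connected by queues, so the thread count depends on the number of stages rather than the number of requests. It is an assumption-laden illustration, not how Zeus or any real SEDA server is implemented; `run_stage`, `run_pipeline`, and the `STOP` sentinel are hypothetical names, and real systems add per-stage thread pools and load shedding.

```python
import queue
import threading

STOP = object()  # sentinel that tells each stage to shut down


def run_stage(handler, inbox, outbox):
    """Worker loop for one stage: take an item, process it, pass it on."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        outbox.put(handler(item))


def run_pipeline(requests, handlers):
    """Push requests through the stages; one thread per stage, not per request."""
    queues = [queue.Queue() for _ in range(len(handlers) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(h, queues[i], queues[i + 1]))
        for i, h in enumerate(handlers)
    ]
    for t in threads:
        t.start()
    for r in requests:
        queues[0].put(r)
    queues[0].put(STOP)

    # Drain the final queue until the shutdown sentinel arrives.
    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline(["a", "b"], [str.upper, lambda s: s + "!"])` handles any number of requests with just two worker threads.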
jeff Feb 21, 2008

uh oh, my jargon radar just perked up. watch out for stochastic, folks. although a long shot, it could join orthogonal, long-pole, and social graph as the buzzwords of 2008!
1. fredwilson Feb 21, 2008
  
  If it does, it started in the USV offices!
David Stone Feb 21, 2008

Absolutely, where you can decentralize control, and allow one of the big players to worry about it, do. I’m still surprised when startups waste engineers’ time fixing their email servers. Move it to Google Applications! Same with servers and EC2. Same with storage and S3. I’d rather have 50 odd top ops folk at Google/Amazon worry about my 99.9% uptime than the few on-site engineers…. I’m sure there are a few specific cases when you’d need to DIY, 80/20 rule.
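For a sense of scale, 99.9% uptime still leaves about 8.76 hours of downtime a year, so an outage of a couple of hours fits inside that budget. A small sketch of the arithmetic, assuming a 365-day year (the function name is made up for illustration):

```python
def allowed_downtime_hours(uptime_percent, period_hours=365 * 24):
    """Hours of downtime permitted over the period at the given uptime."""
    return period_hours * (100.0 - uptime_percent) / 100.0
```

Calling `allowed_downtime_hours(99.9)` gives roughly 8.76 hours per year.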
Daniel Weinreb Feb 21, 2008

Re: “If we can’t trust Amazon to stay up, who can we trust?” The entire US phone system. Just about every airline reservation system. The systems that monitor electrical power distribution. And, if you are limiting the field to large popular web sites, there’s always Google…
Fritz Schenk Dec 19, 2008

This is a very well thought out article on “cloud” computing, and the comments below (above) are awesome. How Microsoft may play in this arena with the Azure offering is yet unknown; they do have the money. Same case for Google and/or AOL.