The Bad Script Trip
I got some bad script the other day and it took me down for almost 36 hours.
But it was an interesting experience that opened my eyes to the risks of taking code from various places and running it in the same place (my blog).
Still, there is a downside to this massive experimentation with script of various kinds.
As Scott said in his comment about the page load problems:
It’s certainly to do with all of the "bling".
First, I did an HTTP trace on your main page. It requires well over 300 HTTP requests to load the whole thing. That, in itself, does not "cause" slowness, but it’s just the first point I wanted to make. But the fact is that those 300 HTTP requests took over 1 minute to complete on a 45 Mbps connection. That’s pretty slow for a single web page to load.
The second point is that most of those HTTP calls are distributed over numerous service providers and, by extension, "hosts". This means that each hostname must be looked up by the DNS client on the requesting computer (the user’s PC). DNS can be flaky and has been for a few sites recently. But even when operating optimally, each request takes a little bit of time. So each little bling item you add requires numerous extra queries by the client computer.
When even one of these name lookups fails or has a delay, it generally causes the whole page to load only partially or, in some extreme cases, not at all.
Each additional widget or bling introduces more chance of this happening — just like adding more disks to a system means one is more likely to have a failed disk.
By my count, my computer had to look up 37 different hosts just to load the first page. Once they have all been successfully looked up, the DNS client (and local DNS server, long story) on the user’s computer (or network) generally caches the results and subsequent requests can go to the web servers without having to do the lookup first. That explains why you don’t see any noticeable issues: you’ve already looked up all the hosts.
I’ve noticed some inefficiencies in the way publishers, ad services, partners and the like have set up their services, but there very well may be good infrastructure or service reasons they have done it, so I’ll reserve comment on that and just assume they know what they are doing.
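To make the host count in Scott’s comment a little more concrete, here is a minimal Python sketch of how one might count the distinct hostnames behind a list of request URLs. The URLs and the `unique_hosts` helper are made up for illustration; they are not from the actual trace of my page, and this is not necessarily how Scott did his count.

```python
from urllib.parse import urlsplit


def unique_hosts(urls):
    """Return the set of distinct hostnames referenced by a list of URLs."""
    hosts = set()
    for url in urls:
        # .hostname is lowercased and has any port stripped; None if absent.
        host = urlsplit(url).hostname
        if host:
            hosts.add(host)
    return hosts


# Hypothetical request list standing in for an HTTP trace of a page.
requests = [
    "http://blog.example.com/index.html",
    "http://blog.example.com/style.css",
    "http://widgets.example.net/badge.js",
    "http://ads.example.org/serve?id=1",
    "http://ADS.example.org:80/pixel.gif",
]

hosts = unique_hosts(requests)
print(len(requests), "requests,", len(hosts), "hosts to resolve")
```

Each distinct hostname is one more name that a cold DNS cache has to resolve before the browser can fetch anything from that host, which is why the host count matters separately from the raw request count.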
So while the web has made it easy for consumers to mash up web services directly on their own pages, there is a looming problem with the architecture of all of this. One bad piece of code can take down the whole shebang, like what happened to me last week.
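The "more widgets, more chance of trouble" argument can be put in rough numbers. The sketch below assumes that each third-party dependency fails independently with the same small probability on a given page load. Both the independence and the 1% figure are assumptions chosen for illustration, not measurements from my page, and `chance_of_any_failure` is a hypothetical helper name.

```python
def chance_of_any_failure(n_dependencies, p_fail):
    """Probability that at least one of n independent dependencies fails.

    Assumes every dependency fails independently with probability p_fail.
    """
    return 1.0 - (1.0 - p_fail) ** n_dependencies


# Assumed 1% failure chance per dependency (illustrative, not measured).
for n in (1, 10, 37):
    print(n, "dependencies:", round(chance_of_any_failure(n, 0.01), 3))
```

Under those assumptions, a page that depends on 37 hosts (the number in Scott’s count) would hit at least one failure on roughly three loads out of ten, even though each individual host looks very reliable on its own. It is the same arithmetic as the failed-disk comparison in his comment.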
And what if some bad code on the page impacts someone else’s code? My page stopped loading at the part of the left sidebar where the coComment widget started, so many people guessed that coComment was causing the problem. Taking down the coComment widget did fix the problem, but coComment could just as easily have been hit by a conflict with some other script on the page.
The coComment guys said in an email to me:
At one point last week, the finger pointing got pretty funny. It was like watching the phone company and the phone equipment company cast blame at each other. Here are some of the comments I got from various places on Thursday and Friday:
one of our developers traced the error back to a call to
To me, the slow part of the page seems to be:
Email this • Add to del.icio.us • View CC license • Subscribe to this
feed • Digg this • Take my survey • Advertise In This Feed • Reddit It
As my friend Brad Feld pointed out in an email, nearly all of my portfolio companies got blamed at some point in the debugging process!
The bottom line is I have a ton of other people’s script running on my page generating something like 153 errors on the page. That’s a bad script trip for sure. But I am not going to stop experimenting. It’s too much fun.