Murder by the Masses

So after all that load testing, what happened on the day of the launch? How could the site crash so badly and so fast? Our first thought was that marketing was just way off on their demand estimates. Perhaps the customers had built up anticipation for the new site. That theory died quickly when we found out that customers had never been told the launch date. Maybe there was some misconfiguration or mismatch between production and the test environment?

The session counts led us almost straight to the problem. It was the number of sessions that killed the site. Sessions are the Achilles’ heel of every application server. Each session consumes resources, mainly RAM. With session replication enabled (it was), each session gets serialized and transmitted to a session backup server after each page request. That meant the sessions were consuming RAM, CPU, and network bandwidth. Where could all the sessions have come from?
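To make the cost concrete, here is a rough back-of-envelope sketch of how session count and size turn into RAM and replication traffic. The function name, the dataclass, and every number are hypothetical and chosen only to show the arithmetic; none of them come from the actual site.

```python
from dataclasses import dataclass


@dataclass
class SessionLoad:
    ram_bytes: int                  # memory held by live sessions
    replication_bytes_per_sec: int  # traffic sent to the session backup server


def estimate_session_load(active_sessions: int,
                          avg_session_bytes: int,
                          page_requests_per_sec: int) -> SessionLoad:
    """Rough cost of sessions when every page request re-serializes its session."""
    ram = active_sessions * avg_session_bytes
    # With replication enabled, each page request ships one serialized session.
    replication = page_requests_per_sec * avg_session_bytes
    return SessionLoad(ram, replication)


# Hypothetical figures, for illustration only.
load = estimate_session_load(active_sessions=100_000,
                             avg_session_bytes=20_000,
                             page_requests_per_sec=500)
print(load.ram_bytes)                  # 2000000000
print(load.replication_bytes_per_sec)  # 10000000
```

The estimate ignores serialization CPU time and per-session overhead in the container, so real costs would likely be higher.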

Eventually, we realized that noise was our biggest problem. All of our load testing was done with scripts that mimicked real users with real browsers. They went from one page to another linked page. The scripts all used cookies to track sessions. They were polite to the system. In fact, the real world can be rude, crude, and vile.
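The gap between a polite test script and a rude client can be sketched with a toy session table. This is a simplified model, not the application server's real behavior: `SessionStore`, `polite_client`, and `rude_client` are hypothetical names, and the "session ID" stands in for whatever token a cookie would carry.

```python
import uuid
from typing import Optional


class SessionStore:
    """Toy in-memory session table, standing in for an app server."""

    def __init__(self) -> None:
        self.sessions = {}

    def handle_request(self, session_id: Optional[str]) -> str:
        # A request that presents no known session ID gets a brand-new session.
        if session_id is None or session_id not in self.sessions:
            session_id = uuid.uuid4().hex
            self.sessions[session_id] = {}
        return session_id


def polite_client(store: SessionStore, requests: int) -> None:
    """Keeps the session ID it was given, as a cookie-aware browser would."""
    session_id = None
    for _ in range(requests):
        session_id = store.handle_request(session_id)


def rude_client(store: SessionStore, requests: int) -> None:
    """Throws the session ID away, so every request looks like a new visitor."""
    for _ in range(requests):
        store.handle_request(None)


polite_store, rude_store = SessionStore(), SessionStore()
polite_client(polite_store, 100)
rude_client(rude_store, 100)
print(len(polite_store.sessions))  # 1
print(len(rude_store.sessions))    # 100
```

A load test built only from the first kind of client never exercises the second code path, which is the one that multiplies sessions.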

Things happen in production: bad things that you can’t always predict. One of the difficulties we faced came from search engines. Search engines drove something like 40 percent of visits to the site. Unfortunately, on the day of the switch, they drove customers to old-style URLs. The web servers were configured to send all requests for .html pages to the application servers (because of the application servers’ ability to track and report on sessions). That meant that each customer coming from a search engine was guaranteed to create a session on the app servers, just to serve up a 404 page.

The search engines noticed a change on the site, so they started refetching all the cached pages they had. That made a lot of sessions just for 404 traffic. (That’s just one reason not to abandon your old URL structure, of course. Another good reason is that people put links in reviews, blogs, and social media. It really sucks when those all break at once.) We lost a lot of SEO juice that day.

Another huge issue we found was with search engines spidering the new pages. We found one search engine that was creating up to ten sessions per second. That arose from an application-security team mandate to avoid session cookies and exclusively use query parameters for session IDs. (Refer back to Broken Authentication and Session Management for a reminder about why that was a bad decision.)

Then there were the scrapers and shopbots. We found nearly a dozen high-volume page scrapers. Many of these misbehaving bots were industry-specific search engines for competitive analysis. Some of them were very clever about hiding their origins. One in particular sent page requests from a variety of small subnets to disguise the fact that they were all originating at the same source. In fact, even consecutive requests from the same IP address would use different user-agent strings to mask their true origin. You can forget about robots.txt. First of all, we didn’t have one. Second, the shopbots’ cloaking efforts meant they would never respect it even if we did.

The American Registry for Internet Numbers (ARIN) can still identify the source IP addresses as belonging to the same entity, though.[82] These commercial scrapers actually sell a subscription service. A retailer wanting to keep track of a competitor’s prices can subscribe to a report from one of these outfits. It delivers a weekly or daily report of the competitor’s items and prices. That’s one reason why some sites won’t show you a sale price until you put the item in your cart. Of course, none of these scrapers properly handled cookies, so each of them was creating additional sessions.
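Before looking up registrations, one way to shortlist suspicious sources in an access log is to group requests by network prefix and count how many distinct user-agent strings each prefix presents. This is a stand-in heuristic, not the registry lookup described above, and it will miss a scraper spread across unrelated address blocks. The function name, the `(ip, user_agent)` input shape, and the /24 prefix length are assumptions; the sample addresses come from ranges reserved for documentation.

```python
import ipaddress
from collections import defaultdict


def summarize_by_subnet(requests, prefix_len=24):
    """Group (ip, user_agent) pairs by IPv4 network prefix.

    Returns {network: (request_count, distinct_user_agent_count)}.
    """
    counts = defaultdict(int)
    agents = defaultdict(set)
    for ip, user_agent in requests:
        # strict=False lets us pass a host address and get its enclosing network.
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[network] += 1
        agents[network].add(user_agent)
    return {net: (counts[net], len(agents[net])) for net in counts}


# Hypothetical log entries for illustration.
sample = [
    ("192.0.2.10", "Browser/1.0"),
    ("192.0.2.11", "Browser/2.0"),
    ("192.0.2.10", "Crawler/0.1"),
    ("198.51.100.5", "Browser/1.0"),
]
for network, (hits, agent_count) in summarize_by_subnet(sample).items():
    print(network, hits, agent_count)
```

A prefix with many requests and nearly as many distinct user agents is worth a closer look, though shared proxies and NAT gateways can produce the same pattern legitimately.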

Finally, there were the sources that we just called “random weird stuff.” (We didn’t really use the word “stuff.”) For example, one computer on a Navy base would show up as a regular browsing session, and then about fifteen minutes after the last legitimate page request, we’d see the last URL get requested again and again. More sessions. We never did figure out why that happened. We just blocked it. Better to lose that one customer than all the others.
