The war on Heisenbugs

In modern physics, Heisenberg's uncertainty principle loosely says that there is an absolute theoretical limit to how well a particle's position and velocity can be known. Regardless of how good a laboratory's measuring equipment gets, funny things will always happen when you try to pin things down too far.

Heisenbugs, loosely speaking, are subtle, slippery bugs that can be very hard to pin down. They only manifest under very specific conditions and may even fail to manifest when one attempts to investigate them (note that this definition is slightly different from the jargon file's narrower and more specific definition at http://www.catb.org/jargon/html/H/heisenbug.html, which specifies that attempting to measure a heisenbug may suppress its manifestation). This motive—of declaring war on heisenbugs—stems from Facebook's own woes and experiences in working at scale and seeing heisenbugs keep popping up. One thing that Pete Hunt mentioned, in not a flattering light at all, was a point where Facebook's advertisement system was only understood by two engineers well enough who were comfortable with modifying it. This is an example of something to avoid.

By contrast, looking at Pete Hunt's remark that he would "rather be predictable than right" is a statement that if a defectively designed lamp can catch fire and burn, his much, much rather have it catch fire and burn immediately, the same way, every single time, than at just the wrong point of the moon phase have something burn. In the first case, the lamp will fail testing while the manufacturer is testing, the problem will be noticed and addressed, and lamps will not be shipped out to the public until the defect has been property addressed. The opposite Heisenbug case is one where the lamp will spark and catch fire under just the wrong conditions, which means that a defect will not be caught until the laps have shipped and started burning customers' homes down. "Predictable" means "fail the same way, every time, if it's going to fail at all." "Right means "passes testing successfully, but we don't know whether they're safe to use [probably they aren't]." Now, he ultimately does, in fact, care about being right, but the choices that Facebook has made surrounding React stem from a realization that being predictable is a means to being right. It's not acceptable for a manufacturer to ship something that will always spark and catch fire when a consumer plugs it in. However, being predictable moves the problems to the front and the center, rather than being the occasional result of subtle, hard-to-pin-down interactions that will have unacceptable consequences in some rare circumstances. The choices in Flux and ReactJS are designed to make failures obvious and bring them to the surface, rather than them being manifested only in the nooks and crannies of a software labyrinth.

Facebook's war on the shared mutable state is illustrated in the experience that they had regarding a chat bug. The chat bug became an overarching concern for its users. One crucial moment of enlightenment for Facebook came when they announced a completely unrelated feature, and the first comment on this feature was a request to fix the chat; it got 898 likes. Also, they commented that this was one of the more polite requests. The problem was that the indicator for unread messages could have a phantom positive message count when there were no messages available. Things came to a point where people seemed not to care about what improvements or new features Facebook was adding, but just wanted them to fix the phantom message count. And they kept investigating and kept addressing edge cases, but the phantom message count kept on recurring.

The solution, besides ReactJS, was found in the flux pattern, or architecture, which is discussed in the next section. After a situation where not too many people felt comfortable making changes, all of a sudden, many more people felt comfortable making changes. These things simplified matters enough that new developers tended not to really need the ramp-up time and treatment that had previously been given. Furthermore, when there was a bug, the more experienced developers could guess with reasonable accuracy what part of the system was the culprit, and the newer developers, after working on a bug, tended to feel confident and have a general sense of how the system worked.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.222.19.111