By Ben Grinnell
When Netflix moved to the cloud, it released a “chaos monkey” to wreak havoc on its system. The chaos monkey is a script meant to mimic the behavior of “a wild monkey with a weapon in your data center (or cloud region) to randomly shoot down instances and chew through cables.”[1] The idea was to cause intentional failures in a controlled environment where Netflix’s experts could comfortably monitor and repair its infrastructure.
During lulls in operations, the Netflix team either increased the activity of the chaos monkey or tightened the tolerances on the many metrics it recorded on its apps and infrastructure. This proactive approach to identifying problems was not new. Toyota’s plant managers have been known to tighten tolerances when employee reports of assembly line problems drop, in order to detect ever smaller weaknesses. At Netflix, these self-inflicted disasters enabled the company to identify weak links and build up resiliency so it could continue serving customers when real failures occurred.
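The escalation loop described above can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not Netflix’s actual tool: the function names (`unleash_monkey`, `run_drill`), the `min_healthy` threshold and the 0.1 escalation step are all hypothetical, and “terminating” an instance here only means dropping a name from a list.

```python
import random


def unleash_monkey(instances, kill_probability, rng):
    """Return the instances that survive one round of random terminations."""
    # Each instance is independently "shot down" with the given probability.
    return [name for name in instances if rng.random() >= kill_probability]


def service_is_healthy(survivors, min_healthy):
    """Assumed health rule: the service is up if enough instances survive."""
    return len(survivors) >= min_healthy


def run_drill(instances, min_healthy, kill_probability, rounds, seed=0):
    """Run repeated drills, raising the chaos level after every quiet round.

    Each round starts from the full fleet (we assume failures are repaired
    between rounds). The drill stops at the first round the service fails.
    """
    rng = random.Random(seed)  # seeded so a drill can be replayed
    survivors = list(instances)
    for round_number in range(1, rounds + 1):
        survivors = unleash_monkey(instances, kill_probability, rng)
        if not service_is_healthy(survivors, min_healthy):
            # A weak link was found: report the chaos level that exposed it.
            return {
                "failed_round": round_number,
                "kill_probability": kill_probability,
                "survivors": survivors,
            }
        # Quiet round: increase the monkey's activity (hypothetical step).
        kill_probability = min(1.0, kill_probability + 0.1)
    return {
        "failed_round": None,
        "kill_probability": kill_probability,
        "survivors": survivors,
    }


if __name__ == "__main__":
    fleet = ["web-1", "web-2", "web-3", "web-4", "web-5"]
    print(run_drill(fleet, min_healthy=3, kill_probability=0.1, rounds=10))
```

The point of the sketch is the feedback loop rather than the numbers: when a round passes without incident, the drill becomes harsher, so the first failure it reports marks the level of disruption the system cannot yet absorb.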
Modern IT solutions must be able to continue functioning when their own wild animals break into the system. By working closely with other developers and service owners, developers build a deep understanding of how their applications and infrastructure perform through a culture of measurement, continuous improvement and learning. This is the only way to survive the inevitable disasters.
Innovation and constant improvement are essential to the long-term success of enterprise IT. Over the last couple of decades, however, many leaders decided that IT was not a core capability they wanted to worry about. They outsourced it, optimizing for cost and reliability. That decision built up an entire industry and led many cost-focused organizations to delay modernizing their requirements. IT fell behind, so when those initial contracts with business process outsourcing (BPO) providers were relet, the IT Refresh program was born.
In this program, the new tenders invited the industry to bid to refresh and then run an organization’s IT. Most prioritized cost. Organizations saw the price of IT hardware fall, and they assumed that IT expenses overall would drop with it. Tenders frequently required suppliers to maintain rates at existing levels and assumed that the supplier would recoup IT Refresh costs over the life of the new contract.
In the short term, this model appealed to many CIOs because it was an easy sell throughout the organization. Over the first 18 months of the contract, everyone got new technology and modernized services, and the cost of IT held steady. Unfortunately, this model didn’t account for the increasing complexity and advancement of IT, users’ growing expectations of basic services, and the rising IT Refresh costs that accompanied them. This created an awkward stop-and-start cycle. Contracts got longer and longer, until their duration outlasted the average tenure of CIOs.
Keeping the lights on
Companies with multiyear contracts, and the BPO providers who served them, could do little more than keep the lights on. Many BPO providers fell behind everywhere else. Keeping up with the basics was out of reach because even the volume demands of a basic corporate email system change significantly every year. Any efforts to meet new business needs stalled because organizations underestimated the rate at which software changes. Fixed-price contracts with tight service level agreements (SLAs) pushed proper repairs and overhauls past the contract end date. The BPO industry became expert at keeping systems crawling, touching nothing until it failed and designing the cheapest fixes possible.
The result was astoundingly diminished IT capabilities for some of the world’s largest organizations. “For companies who are now coming off five-year IT outsourcing contracts, it’s likely they’ve been frozen in time, during one of the most disruptive times in technology,” remarked cloud architecture strategist Adrian Cockcroft. Any years-old IT contract is inadequate today. Because many contracts were too amorphous, commercial teams trying to reduce expenditures were caught up in a game of corporate whack-a-mole, reducing costs in one area just to see them pop up elsewhere.
[Figure: Perception of change. Categories: change to keep the lights on; change to keep pace with the basics; change to meet the business need. Values shown: 20%, 30%, 30%, 20%; 80%, 20%.]
