This is the third in a series of four postings on some new frameworks and methodologies around Service Management. How do they work together, and what really matters?
There is an exciting new IT role coming out of Google. The need for service and site reliability in such a global player is obvious, and it has led to a specific role focused on expanding the operations space. The Site Reliability Engineer (SRE) is a skilled software developer driven to improve existing services, building support systems and tools well beyond the batch job realm of traditional operations.
Staff at Google have released a tome (500+ pages) on Site Reliability Engineering, a new role in IT that is fast growing in importance. The concept originated at Google, but is applicable to enterprises of all sizes. How does this new role fit in the service management world? How does it relate to ITIL practices?
The book itself is a heavy read. Some chapters are quite technical and specific to the Google environment. However, much of the material makes use of service management processes to ensure greater reliability.
A new player in the game
Service Management and Site Reliability Engineering
by Gary Percival