Reliability trends for DevOps

We are witnessing economies recover from the pandemic as we enter a promising digital era in which emerging technology is being integrated into many new areas. We are also in a state of uncertainty, with shortening lifespans for products and services and a volatile, chaotic world ahead. Amid a huge global onboarding to the digital ecosystem, we were forced to deploy and deliver digital solutions quickly, in the hope of making technology accessible to all. On one hand, this helped us immensely: we experimented with new approaches, steered new pathways for economic development, carried technology beyond the pandemic, and pocketed other benefits. On the other hand, it created an imbalance in some cases. Patchwork, quick fixes, and reactive approaches significantly reduced the incentives for faster deployment of software features. With recent cloud service provider (CSP) outages and security vulnerabilities surfacing, shifting resources to collective action on resiliency is slowly becoming a priority and an attractive business outcome for software companies.

We at the Canada DevOps Community of Practice initiated a crowd-chat, sponsored by Blameless, where top software reliability and resiliency experts came together to create a broader dialogue with the software user community, investors, and influencers. We touched on key trends in software practices and tools, and outlined the next steps towards a more reliable and resilient digital ecosystem. With investment in DevOps and SRE practices, tools, and platforms on the rise, the whitepaper created through the crowd-chat comes at the right time.

Demystifying the silos between DevOps & SRE

We crowdsourced some of the key questions at hand, such as the emerging dynamics between DevOps and SRE practices. The dialogue indicated long-term benefits from balancing initiatives in both dimensions.
While DevOps advocates embedding strong engineering practices into software delivery and deployment, SRE substantiates the benefits of those practices by focusing on the end-to-end customer journey. SRE practices have the potential to draw business stakeholders into a deeper look at business models better suited to the modern software ecosystem. One way forward is to renegotiate SLA-driven contractual engagements and move towards continuous relationships based on a baseline of reliability, established through SLIs, error budgets, and policies. This will not only help re-establish the rules of engagement but also support scaling DevOps to enterprise-level adoption.

According to the experts, SRE practices give us a better opportunity to integrate the business community into the modern software-based economic landscape. Making the best of both worlds would mean working closely to establish a universal language and treating reliability issues as new features or enhancements. It also means agreeing on the core principles, not only in terms of error budgets and policies but also by tying them back to OKRs in some way. The journey from SLAs to SLIs and SLOs, and eventually to OKRs, clearly needs more work, learning, and dialogue. The way forward is to explore further and give the practice time to mature before we advocate the next steps towards orchestration through processes and tools. SRE platforms are not a thing of the future; they are in the making, and through the crowd-chat we explored some new features of such a platform.

Coming back to the blurring boundaries between DevOps and SRE, it is evident that the relationship goes beyond what experts initially advocated when they simplified it in a way the Dev community could understand: “class SRE implements DevOps”. There are more steps in this convergence, and SRE practices have more to offer.
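
To make the relationship between SLIs, SLOs, and error budgets discussed above a little more concrete, here is a minimal sketch in Python. The 99.9% target, the request counts, and all names (Window, availability_sli, error_budget_remaining) are illustrative assumptions, not figures or tooling from the crowd-chat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Request counts over one SLO window (hypothetical data)."""
    total_requests: int
    good_requests: int


def availability_sli(window: Window) -> float:
    """SLI: fraction of requests that were served successfully."""
    if window.total_requests == 0:
        return 1.0  # no traffic, so nothing failed
    return window.good_requests / window.total_requests


def error_budget_remaining(window: Window, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if window.total_requests == 0:
        return 1.0  # no traffic, so no budget was consumed
    # The SLO target implies how many failures the window can tolerate.
    allowed_failures = (1.0 - slo_target) * window.total_requests
    actual_failures = window.total_requests - window.good_requests
    return 1.0 - actual_failures / allowed_failures


if __name__ == "__main__":
    # Assumed example: one million requests, 600 of them failed.
    w = Window(total_requests=1_000_000, good_requests=999_400)
    print(f"SLI: {availability_sli(w):.4%}")
    print(f"Budget remaining: {error_budget_remaining(w, 0.999):.0%}")
```

With a 99.9% target, one million requests tolerate roughly 1,000 failures, so 600 failures leave about 40% of the budget unspent. A single number like this is something both engineering and business stakeholders can read.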
We will continue by exploring the experts' answers to other key questions: if an organization has already invested in DevOps, how hard would it be to implement SRE practices, and is there a difference?

Implementing SRE practices for those already invested in DevOps

Organizations of varying sizes and scale have already invested in DevOps. Many have successfully taken the first steps towards DevOps onboarding, but some are caught in the middle in a confused state, and some are on the verge of a failed DevOps implementation. As a result, a new wave of practices has started to emerge, with SRE at the focal point. Implementing these SRE practices alongside DevOps requires coordinated effort, and the experts provided guidance on adapting to the convergence. Start by assessing the operations aspects, the Ops side of DevOps. This will not only help identify the strategic gaps but also set realistic shared goals for the organization. A critical next step is identifying resources, tools, and training for the people involved. Another important point is to baseline SLOs from a user experience perspective. Lastly, reflect on the investment and time required to improve reliability. It is important to identify small improvements, build momentum, and compound them into a significant shift. Blameless offers a detailed reflection on this very subject in an ebook. Check it out: https://info.blameless.com/bridging-the-gap-devops...

In the next section, we highlight the key components of a reliability engineering platform that can serve as the next steps in the reliability engineering journey.

Key components of a Reliability Engineering platform

One of the critical threads that ties everything together from a platform perspective is that the reliability engineering platform is for everyone. It should break down silos and bring coordinated approaches to setting error budgets and policies.
A platform-centric approach to SRE not only fosters new ways of collaboration but can also be critical for communication, connecting business users to technology through a common platform. Some of the key features outlined are as follows:

1. The SRE platform integrates the tools used across the pertinent teams, reducing the onboarding effort
2. SRE tools are orchestrated in a secure and reliable way
3. Systemic reliability through effectively collecting and cataloging information across all systems
4. Monitoring and optimizing SLIs, and performing system-generated reviews of indicators based on a data-driven approach
5. Event-driven orchestration of SLIs for cloud-native applications: monitoring SLIs and then automating actions based on them. Reference: https://keptn.sh/

Scope of Reliability Engineer

Current role boundaries are defined by historical organization design. The scope of a reliability engineer needs careful extrapolation based on modern software practices. The convergence of DevOps and SRE practices will produce a more pragmatically crafted scope for reliability engineering. Some key trends influencing the scope are as follows:

1. Integration of disciplines: as products and services become more multi-disciplinary, better systems thinking skills will be required to embrace the scope of a reliability engineer
2. Data capturing, curation, and data analytics
3. Ability to define, manage, and monitor interconnected and shared products and services
4. Collaboration with your new co-workers, machines and AI systems
5. Ability to customize the SRE platform to enable unique features while keeping shared SRE platform features standardized
6. Managing minimum viable products, the ability to make trade-off decisions (error budgets and policies), and flexible processes

Investment of time and embracing new skills through life-long learning will define the scope of reliability engineers.
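
Trade-off decisions on error budgets and policies, named among the scope trends above, can be written down as an explicit, shared policy. The sketch below shows one hypothetical policy in Python; the 25% threshold and the action names are assumptions for illustration only, and each organization would agree on its own.

```python
from enum import Enum


class Action(Enum):
    SHIP_FEATURES = "ship features at normal pace"
    PRIORITIZE_RELIABILITY = "slow releases and prioritize reliability work"
    FREEZE_RELEASES = "freeze feature releases until the budget recovers"


def policy_action(budget_remaining: float) -> Action:
    """Map the unspent fraction of the error budget to an agreed action.

    budget_remaining is 1.0 when nothing is spent and <= 0.0 when the
    budget is exhausted. The 0.25 threshold is a hypothetical value.
    """
    if budget_remaining <= 0.0:
        return Action.FREEZE_RELEASES
    if budget_remaining < 0.25:
        return Action.PRIORITIZE_RELIABILITY
    return Action.SHIP_FEATURES


if __name__ == "__main__":
    for remaining in (0.8, 0.1, -0.2):
        print(f"{remaining:+.0%} budget left: {policy_action(remaining).value}")
```

Keeping the policy in one small function makes the trade-off visible and reviewable by both engineering and business stakeholders, rather than leaving it to ad hoc judgment during an incident.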
In the long run, reliability engineering will require fundamental skills in programming, data science, and basic maths as the scope of data expands.

Future of SRE practices & key trends

As we progress into 2022, practitioners will come together to uplift SRE practices: building proactive resources for incidents (runbooks, etc.), using SLOs to avoid overreacting to incidents, and further integrating with data science. The field of cognitive systems engineering has a lot of insight into how to reduce cognitive load and make tools into good co-workers. Practitioners will assess the effects of heavy cognitive load, study the overload hypothesis for our systems, and create new pathways by advancing SRE practices in this direction. Limiting secondary tasks and profiling and reducing toil will remain primary features of SRE practices and will be integrated in some way into tools and platforms.

It is evident that the SRE space is getting interesting, and we will witness more collaboration in this space in the near future. While we watch for key developments, we conclude the whitepaper and encourage everyone to contribute and be part of the collective journey of SRE, a cultural shift towards a reliable digital future. We thank our experts for this powerful crowdsourcing event and for setting the perspective for the journey towards SRE.