Situation
Discover’s engineering teams were experiencing significant inefficiencies in AWS ECS (Elastic Container Service) provisioning processes. The existing workflows were causing unnecessary cloud resource consumption, leading to substantial operational costs across the enterprise infrastructure. Multiple engineering teams were struggling with complex provisioning procedures that resulted in over-allocation of resources and extended deployment times. The business identified this as a critical cost optimization opportunity requiring immediate attention.
Task
ROLE: Designer, Researcher, Facilitator
RESPONSIBILITIES: User research, Workflow analysis
COLLABORATORS: Director of Financial Operations, Engineering teams representing Cloud, Application, & Infrastructure
Led a comprehensive research initiative to identify the root causes of provisioning inefficiencies and design solutions that would optimize cloud resource utilization. My responsibilities included conducting user research with engineering teams, analyzing current provisioning workflows, understanding technical constraints, and developing actionable recommendations that could deliver measurable cost savings within a defined timeline.
Action
System Analysis & Design: Analyzed technical architecture and user journeys to pinpoint optimization opportunities. Collaborated with cloud infrastructure teams to understand the overarching challenges, technical constraints, and possibilities for workflow improvements.
Facilitating the abstraction laddering exercise to frame the challenge
Qualitative Research Phase: Conducted 45-minute one-on-one interviews and observational research with 10 participants across multiple engineering roles, including Senior Managers of Software/Cloud Engineering, Expert Infrastructure and Application Engineers, and Infrastructure Engineers. The interviews walked through each participant's step-by-step AWS provisioning approach, from thought process to execution, to surface pain points and workarounds. Mapped current-state processes and identified key inefficiency patterns.
“People aren’t building their systems and their components for performance stress tests up front, they’re waiting until pretty far down the road…And so at that point, they’re usually up against the deadline that’s coming…and they haven’t done performance tests and then they are rushing to get the testing done and then if it works, they’re just like, ok, great.”
Jobs to be Done Persona design
Cross-Functional Implementation: Worked closely with DevOps, platform engineering, and business stakeholders to ensure solution viability and smooth rollout across the organization.
GreenFinOps Team Collaboration
Worked with the Director of FinOps to frame research objectives and synthesize findings that would inform organizational provisioning processes.
Process Analysis & Insight Development
Discovered that the majority of participants lacked structured utilization review processes, relied on vendor best practices or arbitrary sizing (only Digital Payments used a customized approach), and had no approval or justification process, except within Infrastructure teams.
Strategic Recommendations Development
Developed the following recommendations:
Develop training around sizing & provisioning
Sizing guidelines customized based on performance testing
Emphasize optimization as part of the process, including resizing cadence requirements
Consolidate data views across disparate tooling, maximizing use of existing tools (Turbonomic, Datadog, Ansible)
Customize to be specific to application, environment
Partner with teams with structured process
Digital Payments for sizing
Infrastructure (CDPL) for education and approval process
Develop structured approval process
Check off during initial provisioning vs re-sizing
Implement auto-scaling where possible
Results
Post-implementation results:
Annual cost savings of $28M through optimized cloud resource utilization, along with identification of a 30% over-provisioning rate across applications
Development of a structured provisioning process, focusing on EC2 as the initial implementation, and establishment of a training program addressing education gaps identified in 50% of participants
Key insights revealed that “optimization is not a priority in the provisioning process” and “people aren’t building their systems and their components for performance stress tests up front,” leading to rushed decisions and arbitrary resource allocation. The research led to a partnership with the Digital Payments team on customized sizing approaches and with the Infrastructure team on structured approval processes.
The success of this project inspired the creation of a Cloud Center of Excellence, where I coached the team on development strategy and best practices using a framework I had adapted from a previous design center of excellence I created.
Facilitating engineering teams in developing a Cloud Center of Excellence