If you are a site reliability engineering leader ready to take the reins and drive impact, we’ve got an opportunity just for you.
As a Senior Director of Site Reliability Engineering at JPMorgan Chase within the Enterprise Technology, AI/ML & Data Platforms division, you will play a crucial role at both the business and firmwide levels. Your responsibilities will include inspiring your team and others to deliver robust and resilient products and services to our clients. You will develop company-wide reliability strategies and lead your team in implementing and executing them. Our team is committed to providing comprehensive data management solutions that make JPMorgan Chase's data discoverable, understandable, observable, reliable, accessible, and interoperable for authorized users, thereby accelerating BI and AI/ML initiatives.
Job responsibilities
Manages team members’ development by ensuring they have access to the resources needed for learning
Collaborates across the firm to align team members with mobility opportunities in line with their career aspirations
Applies a wide range of tactics and strategies to guide internal executive decisions and achieve substantial goals
Manages multiple stakeholders and complex projects involving large teams
Implements innovative methods, techniques, and evaluation criteria for projects and people working on highly complex business issues
Required qualifications, capabilities, and skills
Formal training or certification in site reliability/software engineering concepts and 10+ years of applied experience. In addition, 5+ years of experience leading technologists to manage, anticipate, and solve complex technical problems within your domain of expertise
Influences the team's culture by championing innovation and change for firmwide success
Expertise in monitoring tools (e.g., Prometheus, Grafana, Nagios) and logging systems (e.g., ELK stack, Splunk)
Ability to implement and manage observability practices to ensure system reliability
Proficiency in cloud platforms (e.g., AWS, Azure, Google Cloud) and their services
Experience implementing SRE principles and practices to improve system reliability and availability
Proficiency in SQL and NoSQL databases and in data warehousing solutions
Experience hiring, developing, and recognizing talent
Demonstrated experience influencing across highly matrixed, complex organizations and delivering value at scale
Experience leading complex projects supporting site reliability engineering design, scaling, resilience, and system performance assessments
Preferred qualifications, capabilities, and skills
Knowledge of data governance frameworks and best practices
Familiarity with data privacy regulations (e.g., GDPR, CCPA)
Skills in identifying and resolving performance bottlenecks
Experience with load testing and capacity planning