Equifax is where you can power your possible. If you want to achieve your true potential, chart new paths, develop new skills, collaborate with bright minds, and make a meaningful impact, we want to hear from you.
Site Reliability Engineering (SRE) at Equifax is a discipline that combines software and systems engineering for building and running large-scale, distributed, fault-tolerant systems. SRE ensures that internal and external services meet or exceed reliability and performance expectations while adhering to Equifax engineering principles.
SRE is also an engineering approach to building and running production systems – we engineer solutions to operational problems. Our SREs are responsible for overall system operation, and we use a breadth of tools and approaches to solve a broad set of problems. Our practices include limiting time spent on operational work, holding blameless postmortems, and proactively identifying and preventing potential outages.
Our SRE culture of diversity, intellectual curiosity, problem solving and openness is key to our success. Equifax brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to build an environment that provides the support and mentorship needed to learn, grow and take pride in our work.
What you’ll do
Troubleshoot and support the dev teams with their continuous integration and continuous deployment processes (CI/CD).
Assist in resolving complex issues arising from product upgrades, installations and configurations
Design and improve automation tools that integrate with: Docker, Kubernetes, Helm, Terraform, GitHub Actions, GCP
Develop and execute best practices, system hardening and security controls, and contribute to solution architectures and strategy
You will automate system scalability and continually work to improve system resiliency, performance and efficiency
Configure monitoring and APM tools such as Datadog, AppDynamics, Grafana and Prometheus
Partner with respective departments to develop practical automation solutions and participate in cross functional team meetings to collaborate and ensure successful execution
Diagnose and deploy complex systems that may involve coordination with external teams.
Maintain internal documentation that fully reflects all activity related to an application and environment to be used by applicable teams
Respond to and work incident tickets in ServiceNow for items such as service outages, infrastructure issues and zero-day vulnerability patching
Design and implement delivery pipelines, including test automation, security, and performance
Comply with all corporate and departmental privacy and data security policies and practices
You will influence and design infrastructure, architecture, standards and methods for large-scale systems
You will support services prior to production via infrastructure design, software platform development, load testing, capacity planning and launch reviews
You will maintain services during deployment and in production by measuring and monitoring key performance and service level indicators, including availability, latency, and overall system health
What experience you need
Bachelor's degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent job experience required
5+ years of experience developing and/or administering software in public cloud
5+ years of experience in languages such as Python, Bash, Java, Go, JavaScript and/or Node.js, or similar skills
5+ years of system administration experience, including automation and orchestration of Linux/Windows using Chef, Puppet, and/or containers (Docker, Kubernetes, etc.), or similar skills
Experience with build systems such as GitHub Actions, Jenkins
Experience with configuration management tools such as Chef, Ansible, PowerShell DSC
Experience with infrastructure-as-code technologies (Terraform, GCP Deployment Manager)
Experience with Kubernetes (GKE preferred)
What could set you apart
Technical knowledge about monitoring tools, Splunk, security controls, networking (firewalls, ingress/egress routing)
Experience with GCP, such as autoscaling, Google Cloud Functions, Google Cloud Dataflow, Google Cloud Pub/Sub, IAM
Experience with web servers such as Apache or Nginx
You have expertise designing, analyzing and troubleshooting large-scale distributed systems.
You take a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
You are passionate about automation, with a desire to eliminate toil whenever possible
You’ve built software or maintained systems in a highly secure, regulated or compliant industry
You thrive in a DevOps culture, with experience of and passion for working as part of a team
We offer a hybrid work setting, comprehensive compensation and healthcare packages, attractive paid time off, and organizational growth potential through our online learning platform with guided career tracks.
Are you ready to power your possible? Apply today, and get started on a path toward an exciting new career at Equifax, where you can make a difference!
Primary Location: IND-Trivandrum-Equifax Analytics-PEC
Function: Function - Tech Engineering and Service Ops
Schedule: Full time