Site Reliability Engineer
JOB CODE: 7391
Our site reliability engineers (SREs) focus on a rich feature set, high availability, and excellent performance to enable our users to complete their missions.
At Datapixels, we're seeking a DevOps Engineer to join our team. You'll be responsible for providing product updates, troubleshooting production issues, and developing integrations to fulfill our clients' needs. You will play a critical role in bridging the gap between development, quality assurance, and IT operations.
You'll seek to incorporate the routine tasks of software development, quality assurance, deployment, and integration into a single, continuous set of processes.
Objectives of this Role
- Automating IT operations by developing and integrating software solutions that increase the stability and scalability of organizational systems.
- Monitoring critical applications and related services to ensure availability during critical business hours.
- Specifying Service Level Indicators and Objectives.
- Incident Management and Disaster Recovery.
- On-Call Support and Issue Resolution.
- Facilitating Post-Incident Analysis.
Primary Responsibilities
- Ensuring that services are available, the underlying infrastructure is properly functioning, and other internal tools, processes, and systems are working as expected.
- Analyzing historical data and setting realistic objectives to meet Service Level Agreements (SLAs).
- Collaborating on high-priority incident tickets and ensuring system recovery within the SLA.
- Ensuring high-priority tickets are resolved quickly enough to meet the SLA by investigating, diagnosing, and resolving the underlying problem.
- Analyzing incidents to identify the root cause and prevent similar incidents from recurring.
Required Skills and Qualifications
- Bachelor’s degree in computer science or another highly technical or scientific discipline, or 5+ years of comparable experience.
- Software development experience in one or more languages such as Python, Java, TypeScript, or JavaScript.
- Experience with Docker, Kubernetes, and/or Terraform.
- Experience with GitHub Actions, CircleCI, Jenkins, or other continuous integration tooling.
- Experience with AWS, Google Cloud Platform, or Azure.
- High proficiency with source control, including Git.
- Proficiency with command line navigation.
- Experience with site performance profiling and tuning.
- Experience working within a service-oriented architecture.
Preferred Qualifications
- Experience implementing observability for GraphQL APIs.
- Experience managing workloads for applications written in JavaScript and TypeScript with a Node.js runtime.
- Experience facilitating blameless incident retrospectives.