Careers: Site Reliability Engineer

Site Reliability Engineer

JOB CODE: 7391

Our site reliability engineers (SREs) focus on a rich feature set, high availability, and excellent performance to enable our users to complete their missions.

At Datapixels, we're seeking a Site Reliability Engineer to join our team. You'll be responsible for providing product updates, troubleshooting production issues, and developing integrations to fulfill our clients' needs. You will play a critical role in bridging the gap between development, quality assurance, and IT operations.

You'll seek to incorporate the routine tasks of software development, quality assurance, deployment, and integration into a single, continuous set of processes.

Objectives of this Role

Automating IT operations: developing and integrating software solutions to increase the stability, automation, and scalability of organizational systems.
Monitoring critical applications and related services to ensure availability during critical business hours.
Specifying Service Level Indicators and Objectives.
Incident Management and Disaster Recovery.
On-Call Support and Issue Resolution.
Facilitating Post-Incident Analysis.

Primary Responsibilities

Ensuring that services are available, the underlying infrastructure is properly functioning, and other internal tools, processes, and systems are working as expected.
Analyzing historical data and setting realistic objectives to meet Service Level Agreements (SLAs).
Collaborating on high-priority incident tickets and ensuring system recovery within the SLA.
Ensuring high-priority tickets are resolved quickly enough to meet the SLA. The SRE investigates and diagnoses the problem, then resolves it.
Analyzing incidents to identify the root cause and determine how to prevent similar incidents from recurring.

Required Skills and Qualifications

Bachelor’s degree in computer science or another highly technical or scientific discipline, or 5+ years of comparable experience.
Software development experience in one or more languages such as Python, Java, TypeScript, and JavaScript.
Experience with Docker, Kubernetes, and/or Terraform.
Experience with GitHub Actions, CircleCI, Jenkins, or other Continuous Integration tooling.
Experience with AWS, Google Cloud Platform, or Azure.
High proficiency with source control, including Git.
Proficiency with command-line navigation.
Experience with site performance profiling and tuning.
Experience working within a service-oriented architecture.

Preferred Qualifications

Experience implementing observability for GraphQL APIs.
Experience managing workloads for applications written in JavaScript and TypeScript on a Node.js runtime.
Experience facilitating blameless incident retrospectives.
