Title: Site Reliability Engineer

Location: Riyadh, Saudi Arabia

Duration: Full-time, On-site

About the job

Enterprise wide-area networking is primed for a new paradigm with the introduction of software-defined networking architecture to deliver agility, performance, services and software innovations. The client is changing the networking industry, and you will be part of the charge to drive that evolution. You will collaborate with industry-leading engineers to build a development and deployment infrastructure for the best product portfolio in the industry.

The Team

As part of the client's engineering team, you will be responsible for developing the client's offerings and executing on the Product and Portfolio strategy.

The Work

Your primary responsibilities in this role will be to monitor and support the client service. This includes monitoring the client Portal as well as the client Backbone network. You will:

Take ownership of the underlying infrastructure that supports the production backbone, including compute, storage, networking, and platform layers

Ensure that servers, networks, and distributed systems are architected and managed to deliver maximum uptime, resiliency, and operational efficiency

Monitor and manage production workloads, ensuring that software and services run smoothly with minimal disruption

Quickly diagnose, triage, and resolve incidents, while performing root cause analyses to prevent recurrence

Design and implement observability solutions (logging, monitoring, alerting) to proactively detect and respond to performance issues or failures

Collaborate closely with other SREs, DevOps, and engineering teams to define and enforce SLAs, SLOs, and SLIs, and drive continuous improvement in system reliability

Contribute to automation efforts, including infrastructure-as-code (IaC), CI/CD pipelines, and self-healing capabilities for production environments

Competencies

Expertise in Programming: Essential for automating tasks and designing resilient systems.

Understanding of IT Operations: Vital to manage infrastructure, diagnose issues, and keep services running.

Leadership Skills: Necessary for guiding teams and influencing tech strategy.

Strategic Vision: Enables the SRE to anticipate challenges and steer the company towards reliability and scalability.

Experience

Bachelor’s Degree, or higher, in Computer Science or related technical field, or equivalent experience

1-3+ years of software development experience

Experience with Kubernetes

Proficient in C, C++, Java, Go, Node.js, PHP, Scala, Python or a similar language

1-3+ years of experience deploying and running services in a public or private cloud: AWS, Azure, GCP, etc.

1-3+ years of experience with service discovery tools: Kubernetes, ZooKeeper, HashiCorp Consul, or similar software

1-3+ years of experience with RPC technologies and messaging systems: Google Protocol Buffers (protobuf), Apache Thrift, ZeroMQ, RabbitMQ, Kafka or similar

1-3+ years of experience with SQL and NoSQL datastores: MySQL, MongoDB, Elasticsearch, InfluxDB, Redis, DynamoDB, Cassandra or similar

Experience with network management technologies is a plus: gNMI, NETCONF, SNMP, NetFlow, IPFIX

Site Reliability Engineer (Saudi Citizens Only)

Job Description