Site Reliability Engineer (Linux Platform Operations) (m/f/d)
As a Site Reliability Engineer, you operate and evolve Linux-based production platforms that power critical business services at scale. You focus on automation, reliability, and reducing operational overhead while enabling teams to work more independently.
Responsibilities
- Take full ownership of Linux-based production systems and keep them reliable, secure, and high-performing
- Automate operational tasks (e.g., patching, provisioning, deployments) to eliminate manual effort and improve efficiency
- Standardize and optimize deployment and configuration processes for scalability and consistency
- Lead incident response and drive root cause analysis and long-term fixes
- Manage and automate access and identity processes with a strong focus on security and auditability
- Maintain and improve core Linux infrastructure services essential for platform operations
- Collaborate with engineering teams to enhance observability and shared operational practices
- Analyze complex systems end to end and simplify them to improve reliability and performance
- Drive the modernization of operations towards automation, scalability, and self-service models
- Adapt quickly to changing environments and deliver pragmatic, effective solutions
Requirements
- 5+ years of experience in Linux-based production environments
- Strong expertise in Linux systems engineering, performance tuning, and lifecycle management
- Strong understanding of reliability concepts (SLOs, SLAs, performance, capacity)
- Solid scripting and automation skills (e.g., Bash, Python) with a continuous improvement mindset
- Hands-on experience with configuration management (e.g., Salt, Ansible) and Infrastructure as Code (e.g., Terraform)
- Experience with CI/CD tools (e.g., GitLab, Jenkins) and automated deployments
- Good knowledge of monitoring and observability tools (e.g., Zabbix, Grafana, ELK)
- Proven experience in incident management, root cause analysis, and postmortems
- Experience with security practices, including patching and access control
- Knowledge of core traffic services (DNS, load balancing, CDN)
- Basic experience with container and cloud technologies (Docker, Kubernetes, AWS)
Benefits
We value diversity and treat all applications equally – regardless of gender, background, age, religion, disability, or sexual orientation. Different perspectives enrich our team and make EVENTIM stronger.