System Software Engineer, Instances Site Reliability Engineering - M/F - Free - Rue

Job description

Company description

We started out as Online.net, a hosting provider serving several hundred thousand websites from our three datacenters. Today we are the second-largest Cloud Service Provider in Europe and one of the eight largest in the world. Founded in 2000 alongside the Free Internet access service, we are a wholly owned subsidiary of the Iliad group, focused on Cloud services, and we provide them to Internet businesses of every size worldwide.

The position

Description

**Description of the company**

Scaleway, the Cloud branch of the Iliad Group, offers a range of innovative Cloud infrastructures covering a full spectrum of services for professionals: Public Cloud services with Scaleway Elements, Private infrastructures and Colocation with Scaleway Datacenter and Bare Metal infrastructures with Online by Scaleway.

Are finding meaning in your work and personal fulfilment your priorities?

That's a good thing, because Scaleway places people at the heart of its ecosystem and has set up an organization that encourages responsibility, autonomy, influence and commitment from its employees. Our premises are open spaces, conducive to exchange and interaction between individuals.

Join a community of 250 passionate people and become part of a company rooted in the world of tomorrow that has transformation and disruption in its DNA.

**Context of the position**

We are looking for a System Software Engineer to join our Instances Site Reliability Engineering team. Your main mission will be to ensure we can reliably serve virtual machines for users around the world. We expect you to have a strong background in system administration, combined with solid software engineering skills. Our systems evolve all the time, and issues can recur and differ greatly from one another. You will need to be a resilient problem solver who is willing to collaborate and who knows how to use knowledge of system interactions to their advantage. Are you ready to look after our virtualisation system and strive to improve our users' daily lives? This is a unique opportunity to join Scaleway and ensure developers at any company get the high-quality virtual instance service they need.

**What you’ll be doing**

* Take on-call responsibilities, mitigate issues encountered in production and provide the best real-time response to our customers

* Troubleshoot high-impact issues working with multiple engineering teams (Storage, Network, Hardware)

* Optimise on-call processes, tools & documentation that will help identify, diagnose and remediate production incidents

* Ensure a high quality of service for our customers by leveraging observability and monitoring technologies

* Manage the lifecycle of hypervisors in production and take part in fleet-wide migration plans

* Empower your teammates to swiftly integrate and deploy software components of our virtualisation system

* Bootstrap new regions and availability zones in collaboration with the Platform & Network engineering team

* Help implement best practices in stability, resiliency, scalability, security and performance across our virtualisation system

**Technical stack & tools we use**

* Python

* RabbitMQ + Celery

* PostgreSQL + SQLAlchemy

* HAProxy, Nginx, REST APIs / Flask

* S3 API

* Sentry, Icinga, Prometheus, Grafana, Fluentd, Elasticsearch

* Ubuntu, Debian, CentOS

* SaltStack, Ansible, AWX, Foreman

* GitLab, Nexus, Jenkins

* Jira, Confluence, Slack

Desired profile

**What we expect from you**

* 5+ years of system administration experience including significant usage

* A great attitude and desire to work with a team

* Ability to make independent decisions, taking ownership for them

* Demonstrated ability to troubleshoot production systems failures

* Passion for incremental improvements to tooling and a love of all things automation

* Experience scripting with Bash and Python

* Experience with Linux systems: Ubuntu Server, QEMU/KVM

* Experience with infrastructure as code and continuous deployment

* Understanding of computer networks: TCP/IP, DNS, IPv6, BGP, load balancing and network virtualisation

**Nice to have**

* Experience dealing with physical hardware automation

* Experience with monitoring & logging systems

* Experience administering relational databases

* Experience with Python programming

* Knowledge of at least one cloud platform and related use cases

* Experience as an OSS contributor or maintainer

Do you recognize yourself in these lines and want to join a young, innovative, growing company that is a great place to work?

Then don't wait any longer and join us :)