Acquia empowers the world’s most ambitious brands to create digital customer experiences that matter. With open source Drupal at its core, the Acquia Digital Experience Platform (DXP) enables marketers, developers, and IT operations teams at thousands of global organizations to rapidly compose and deploy digital products and services that engage customers, enhance conversions, and help businesses stand out.
Headquartered in the U.S., Acquia is a Great Place to Work-Certified™ company in the U.K., is listed as one of the world’s top software companies by The Software Report, and is positioned as a market leader by the analyst community. We are Acquia. We are building for the future and we want you to be a part of it!
The Opportunity
The SRE engineer is responsible for designing and delivering secure and highly available solutions. You will be a critical part of a team focused on ensuring our services are ready and stress tested. You should be comfortable taking on new challenges, defining potential solutions, and implementing designs in a team environment. You will be working on a tech stack composed of Linux, Kubernetes, Ruby, Go, Python, PostgreSQL, MySQL, Redis, Jenkins, GitHub, and GCP.
You'll Spend Time:
- Partnering closely with Engineering and Support. SRE is responsible for the deployment and continuous operation of the Monsido platform.
- Making sure we automate as many tasks as possible to make diagnostics, scaling, healing and deployments a breeze.
- Working on a team responsible for a blend of architecture, automation, development, and application administration.
- Developing and deploying solutions across the infrastructure, network, and application layers on public cloud platforms.
- Ensuring our SaaS platform is available and performing well, and that we notice problems before our customers do.
- Collaborating with Support and Engineering on customer issues, as needed.
- Working with distributed data infrastructure, including containerization and virtualization tools, to enable unified engineering and production environments.
- Developing dashboards, monitors, and alerts to increase situational awareness of production issues, SLAs, and security incidents.
- Independently conceiving and implementing ways to improve development efficiency, code reliability, and test fidelity.
- Participating in the on-call rotation.
You'll be Successful if You:
- Are an expert in Unix/Linux OS administration (5-8 years)
- Are proficient with computer network setup and debugging
- Are proficient with at least one scripting language (Shell, Python, …)
- Are competent with deploying, tuning, and maintaining Linux-based, highly available, fault-tolerant platforms in public cloud providers such as GCP, AWS, or Azure
- Are competent with Kubernetes
- Are competent with application containerization
- Are competent with SQL and relational database administration (PostgreSQL, MySQL)
- Are competent with configuration management
Suggested Years of Experience:
- DevOps and/or build & release experience, including delivery: 3+ years
- Software Configuration Management tools: 2+ years
- DB/Data Platforms (CitusDB/PgSQL): 2+ years
- Application monitoring tools: 2+ years
- Experience with Kubernetes and containerization: 1+ year
Extra credit:
- Best practices in infosec.
- The ability to dig deep into infrastructure and code to solve problems.
- The drive to solve traditional operations problems through automation.
- High attention to detail.
Acquia is an equal opportunity employer. We hire without regard to age, colour, disability, gender (including gender identity), marital status, national origin, race, religion, sex, sexual orientation, or any other status protected by applicable law.