Senior Software Engineer, Observability Team
Department: Engineering
Acquia is an open source digital experience company. We provide the world's most ambitious brands with technology that allows them to embrace innovation and create customer moments that matter. At Acquia we believe in the power of community and collaboration - giving our customers the freedom to build tomorrow on their terms.
Headquartered in Boston, we have been named as one of North America’s fastest growing software companies as reported by Deloitte and Inc. Magazine, and have been rated a leader by the analyst community and named one of the Best Places to Work by the Boston Business Journal. We are Acquia. We are building for the future of the web, and we want you to be a part of it.
Acquia’s products run 100% on Amazon Web Services using EKS, EC2, CloudFormation, Terraform and various other technologies and best practices. Since each product is built and maintained by its own engineering team, the ideal candidate for this position would need to be proactive in familiarizing themselves with those services and have the ability to coordinate and collaborate with multiple teams.
About the Team: The Observability team plays a pivotal role in ensuring the smooth functioning and performance optimization of all our systems. We are a dynamic team of engineers dedicated to providing centralized Observability solutions to empower all teams within the company. We are seeking a highly skilled and experienced Senior Software Engineer to join our team. As a Senior Engineer, you will play a key role in designing, implementing, and maintaining systems and tools to ensure the reliability, performance, and scalability of our infrastructure and applications.
As a Senior Software Engineer, you will…
- Lead the design and implementation of observability solutions, including monitoring, logging, and tracing systems, along with a wide range of core internal systems. Work with your team to develop far-reaching modules that have scalability and availability at their core.
- Collaborate with cross-functional teams to decide on and develop integrations with other subsystems, and to define best practices for current and future infrastructure needs at scale.
- Develop and maintain monitoring dashboards and alerts that provide actionable insights into system health and performance.
- Automate the observability process to improve efficiency and scalability.
- Conduct in-depth performance analysis and troubleshooting to identify and resolve issues proactively, ensuring minimal impact on operations.
- Maintain an understanding of system functionality and architecture, with a strong focus on the operational aspects of the service (availability, performance, change management, emergency response, capacity planning, etc.).
- Stay abreast of industry trends and emerging technologies in observability, and recommend which ones to adopt to enhance our systems.
- Provide product support to internal and external stakeholders
- Work in a team environment where your team owns and operates the services you build
You’ll enjoy this role if you…
- Like solving complex challenges for scalable, low-latency systems
- Enjoy designing solutions for a cloud-native environment
- Enjoy collaborating with multiple stakeholders
- Have a passion for DevOps & SRE practices
What you’ll need to be successful…
- Have 5+ years of software development experience with time spent working on Cloud technologies (AWS, Google Compute, Azure) at large scale. AWS with Kubernetes is greatly preferred.
- Proficiency in programming languages such as Golang, PHP, Ruby or similar.
- Comfortable navigating and troubleshooting Unix/Linux-based operating systems.
- Strong understanding of monitoring and logging technologies, such as Prometheus, OpenTelemetry, Fluentd, Collectd, Grafana, ELK Stack, or similar.
- Familiarity with Sumo Logic, New Relic, Dynatrace, CloudWatch, Splunk, Nagios.
- Strong interest in building and operating distributed systems and/or service-oriented architectures.
- Passion for DevOps processes and tools (Jenkins), distributed configuration management systems (Ansible, Puppet), and maintaining infrastructure as code (Terraform, CloudFormation).
- Excellent problem-solving skills, attention to detail, and the ability to work both independently and as part of a team.
- Strong communication and collaboration skills, with the ability to interact effectively with stakeholders across different teams and levels.
Extra credit if you…
- Hold certifications in relevant technologies (AWS, CKAD, CKA, etc.)
- Have hands-on experience with Docker, K8s or equivalent
- Have a mindset to automate repetitive tasks
Acquia is an equal opportunity (EEO) employer. We hire without regard to age, color, disability, gender (including gender identity), marital status, national origin, race, religion, sex, sexual orientation, veteran status, or any other status protected by applicable law.