Platform

Keeping systems online and running 24/7!

Responsibilities

  • Manage, maintain and monitor all clusters 24/7
  • Manage, maintain and monitor global Lagoon infrastructure 24/7
  • Monitor all production sites 24/7
  • React to infrastructure alerts
  • React to outages reported by clients via the Client Support Team
  • Provide emergency phone support outside office hours
  • Continuously improve the amazee.io platform
  • Coordinate with external partners such as AWS, GCP, Azure and Fastly to ensure stable operations
  • Guarantee platform and website uptime SLAs
  • Coordinate with the Lagoon Team on Lagoon features, releases, and issues
  • Coordinate with and support the amazee.io security team
  • Monitor, analyze, and optimize infrastructure costs, drawing on knowledge from the Business Operations Team and tooling from the IT Team
  • Create and update statuspage entries during outages and maintenance
  • Write timely post-mortems for significant outages

Non-Responsibilities

  • 1st & 2nd level support
  • Direct communication with clients during outages (communication happens via statuspage)
  • Supporting client application issues or requests
  • Maintaining the Lagoon codebase
  • Creating reports on site uptime

Workstream

Roles

Current Staffing