Platform
Keeping systems online and running 24/7!
Responsibilities
- Manage, maintain and monitor all clusters 24/7
- Manage, maintain and monitor global Lagoon infrastructure 24/7
- Monitor all production sites 24/7
- React to infrastructure alerts
- React to outages reported by clients via the Client Support Team
- Provide emergency phone support outside office hours
- Continuously improve the amazee.io platform
- Coordinate with external partners such as AWS, GCP, Azure and Fastly to ensure stable operations
- Guarantee platform and website uptime SLAs
- Coordinate with the Lagoon Team on Lagoon features, releases, and issues
- Coordinate with and support the amazee.io security team
- Monitor, analyze, and optimize infrastructure costs with the help of knowledge from the Business Operations Team and tooling from the IT Team
- Create and update statuspage entries during outages and maintenance
- Write timely post-mortems for more significant outages
Non-Responsibilities
- 1st & 2nd level support
- Direct communication with clients during outages (communication happens via statuspage)
- Supporting client application issues or requests
- Maintaining the Lagoon codebase
- Creating reports on site uptime
Workstream
Roles
- Platform Lead
- Platform Engineer
Current Staffing
- Brittany Mitchell
- Glyn Davies
- Michael Schmid (Management Sponsor)
- Salvatore Pappalardo
- Tobi Nehrlich (Platform Lead)