What are the key concepts that define what we mean by a sustainable service on AWS?
- All required data is persisted.
- Containers can be restarted at any time without losing any essential data and without any disruption to the service
- Instances can be restarted at any time without losing any essential data and without any disruption to the service
- The infrastructure can be recreated at any time with automatic data recovery
- All needed logs can be seen without accessing the server
- All logs and metrics that indicate a user-affecting service problem trigger an alert
Reliable by Design
By designing for failure and persisting data, we can provide a sustainable service that looks after itself.
- All containers must set themselves up on startup, loading any persistent data required.
- All instances must set themselves up on startup, installing and configuring applications, containers, and persistent data as required.
- Data held on containers or instances is considered ephemeral. Persisted data can only be relied upon if it is stored in a reliable multi-zone database such as RDS, or in S3.
- Multiply redundant access can be maintained by load balancing instances through an ELB. The minimum requirements are:
  - 2 or more online instances in different availability zones
  - Proper health checks on the ELB that confirm the instances are truly available
  - The auto-scaling group uses the ELB health check
- Logging and monitoring data is held completely separately from the service:
  - all required logs and metrics are shipped to that separate location
  - no ssh access is required for additional information
  - all logs and metrics that indicate a service problem are monitored and alerted on
  - non-essential logs or metrics are performance data and do not generate critical alerts
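
The ELB health check requirement above asks that a check show an instance is truly available, not merely that a process is listening. A minimal sketch of one way to do this is shown below, assuming a hypothetical `/health` path and hypothetical dependency checks supplied by the service. The names `evaluate_health`, `make_handler`, and `serve` are illustrative and not part of any AWS API. The endpoint returns 200 only when every check passes and 503 otherwise, so a load balancer configured to expect 200 would take the instance out of rotation.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple

# Hypothetical dependency check: returns True when the dependency is usable.
Check = Callable[[], bool]


def evaluate_health(checks: Dict[str, Check]) -> Tuple[int, dict]:
    """Run every check and report 200 only if all of them pass."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises is treated as a failed dependency.
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, {"healthy": status == 200, "checks": results}


def make_handler(checks: Dict[str, Check]):
    """Build a request handler that serves the assumed /health path."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                self.send_error(404)
                return
            status, body = evaluate_health(checks)
            payload = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format, *args):
            # Suppress per-request stderr logging; health probes are frequent.
            pass

    return HealthHandler


def serve(checks: Dict[str, Check], port: int = 8080) -> None:
    """Serve the health endpoint until the process is stopped."""
    HTTPServer(("0.0.0.0", port), make_handler(checks)).serve_forever()
```

A service would call something like `serve({"database": can_query_db, "storage": can_read_bucket})`, where both check functions are placeholders for whatever the service really depends on. Pointing the ELB health check at this path, and having the auto-scaling group use the ELB health check, means an instance that cannot reach its data is replaced rather than left in service.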