Compute as a Service

An example of CaaS

This week, we were assigned to read the chapter "Compute as a Service" from Software Engineering at Google. Once code is production-ready, it is important to find a compute service that meets the organization's needs and follows best practices. These best practices include leveraging automation, handling different workload types, handling data appropriately, and applying distributed-system concepts.

The first best practice for a service is to leverage automation. Here, automation means handling failures, such as a crashing container or an unhealthy job or machine, without manual intervention. For services to stay healthy, containers and machines should be treated as "cattle" rather than "pets": they are interchangeable, so when one crashes, another can pick up the job and continue.
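To make the "cattle" idea concrete, here is a minimal sketch in Python. It is not how any real scheduler works; the names `make_worker`, `run_with_replacement`, and `WorkerCrashed` are hypothetical and exist only for illustration. The point is that a crashed worker is never repaired, the task simply moves to the next interchangeable one.

```python
class WorkerCrashed(Exception):
    """Raised when a (simulated) worker dies while running a task."""


def make_worker(worker_id, healthy):
    """Build a hypothetical worker: a function that runs a task or crashes."""
    def run(task):
        if not healthy:
            raise WorkerCrashed(f"worker-{worker_id} crashed")
        return f"worker-{worker_id} finished {task}"
    return run


def run_with_replacement(task, workers):
    """Try the task on each interchangeable worker until one succeeds."""
    for worker in workers:
        try:
            return worker(task)
        except WorkerCrashed:
            # No attempt to repair the failed worker: move on to the next one.
            continue
    raise RuntimeError("no healthy worker available")


# The first worker is unhealthy, so the second one picks up the job.
workers = [make_worker(0, healthy=False), make_worker(1, healthy=True)]
result = run_with_replacement("job-42", workers)
print(result)  # worker-1 finished job-42
```

In a real platform the replacement would be a freshly scheduled container rather than an entry in a list, but the caller's view is the same: the job finishes even though one worker did not.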

The next best practice is handling different types of workloads. A service should be designed to handle both batch jobs and serving jobs. Batch jobs are short-lived jobs that the system splits into small chunks and assigns to workers dynamically. These jobs focus on throughput, that is, the total work completed over time. Serving jobs are long-lived jobs where each request is independent and can run concurrently with other requests. These jobs focus on the latency of each individual request while the software is running, so they must be able to absorb spikes in traffic without slowing down.
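The batch side of this can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the "job" is summing squares, the workers are threads in one process, and the function names `split_into_chunks` and `run_batch` are made up for this post. Workers pull the next chunk from a shared queue whenever they are free, which is what dynamic assignment means here.

```python
import queue
import threading


def split_into_chunks(items, chunk_size):
    """Split a list of work items into chunks of at most chunk_size."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def run_batch(items, chunk_size=3, num_workers=2):
    """Process all items by letting workers pull chunks from a shared queue."""
    work = queue.Queue()
    for chunk in split_into_chunks(items, chunk_size):
        work.put(chunk)

    partial_sums = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                chunk = work.get_nowait()  # dynamic assignment: take what is left
            except queue.Empty:
                return  # no more chunks, so this worker is done
            partial = sum(x * x for x in chunk)
            with lock:
                partial_sums.append(partial)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial_sums)


total = run_batch(list(range(10)))
print(total)  # 285, the sum of squares of 0 through 9
```

Which worker handles which chunk does not matter, only that all chunks get done, which is why throughput is the metric that matters for this kind of job.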

Next, a service should be able to handle data appropriately. Services should be able to pull data from external storage or caches and write results back in batches. Batching reduces overhead because multiple records are written in a single operation instead of one operation per record.
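A rough sketch of batched writes, assuming the "storage" is just a Python list and that `BatchWriter` is a hypothetical helper written for this post (it is not from the book or any library). Records are buffered in memory and written together once the buffer is full.

```python
class BatchWriter:
    """Buffer records and write them to a sink in groups of batch_size."""

    def __init__(self, sink, batch_size):
        self.sink = sink              # stand-in for external storage
        self.batch_size = batch_size
        self.buffer = []
        self.write_calls = 0          # counts how many write operations happened

    def add(self, record):
        """Queue a record; flush automatically once the buffer is full."""
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write all buffered records in one operation."""
        if not self.buffer:
            return
        self.sink.extend(self.buffer)
        self.write_calls += 1
        self.buffer = []


storage = []
writer = BatchWriter(storage, batch_size=4)
for i in range(10):
    writer.add(i)
writer.flush()  # write the last, partially filled batch

print(writer.write_calls)  # 3 write operations instead of 10
```

The final `flush()` matters: without it, the last two records would stay in the buffer and never reach storage.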

Finally, a service should follow the concepts of a distributed system. One concept is service discovery, which helps apps find each other and connect automatically. Another is idempotency, which means that repeating an operation has the same effect as performing it once, so a service can safely retry after a failure.
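Idempotency is easier to see in code. The sketch below assumes each request carries a unique ID chosen by the client; `CounterService` and its `add` method are hypothetical names for this example. One common way to make retries safe, and the one shown here, is to remember which request IDs have already been applied.

```python
class CounterService:
    """A toy service whose add operation is safe to retry."""

    def __init__(self):
        self.total = 0
        self._seen = {}  # request_id -> result returned the first time

    def add(self, request_id, amount):
        """Apply the request once; a retry with the same ID changes nothing."""
        if request_id in self._seen:
            return self._seen[request_id]
        self.total += amount
        self._seen[request_id] = self.total
        return self.total


service = CounterService()
first = service.add("req-1", 5)
retry = service.add("req-1", 5)   # the client retried after a timeout
second = service.add("req-2", 2)

print(first, retry, second)  # 5 5 7
```

Without the `_seen` check, the retry would have added 5 a second time and the total would be wrong.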

By following these best practices, services can ensure long-term maintainability and earn an organization's trust in production systems.

Edward's CS Journey

CST 438 Week 6 Learning Journal
