CUDs on A2 with A100

Hello there

I wanted to understand how CUDs for GPUs should be handled in a GKE multi-AZ environment, specifically for A100 GPUs on A2 machine types.

What are the best practices for purchasing GPU commitments, and for handling workloads that shift dynamically between AZs, reservation-wise?

Is there any special configuration that needs to be made to GKE Node Pools to have them utilize the reservations?

When purchasing A2 commitments, should the GPU be purchased together with the machine, or separately?

When purchasing A2 commitments, are reservations required or not?

Thanks,
Elad.


First of all, there are two different concepts: the CUD, which gives you a discount because you commit to using a resource, and the reservation, which guarantees that a resource will be available for you at all times, within a zone.

Now, when purchasing a CUD for a VM equipped with a GPU, you always need to buy both. If you have a node pool in GKE, you would typically purchase a CUD + reservation for all the nodes (spread over the AZs) and make sure that your application balances the load (or use Kubernetes-native methods to do so) to use the resources you are paying for as efficiently as possible.
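As a rough sketch, a zonal reservation for A2 capacity could look like the following; the reservation name, zone, and VM count are placeholders, and the A100 GPUs come bundled with the A2 machine type, so no separate accelerator flag is set on the reservation itself:

# Hypothetical example: reserve two a2-highgpu-1g VMs in one zone;
# repeat per zone for the AZs your node pool spans.
gcloud compute reservations create a2-reservation-us-central1-a \
    --zone=us-central1-a \
    --machine-type=a2-highgpu-1g \
    --vm-count=2 \
    --require-specific-reservation   # optional: only consumable when targeted by name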

For details on how GKE consumes reservations, please see: https://cloud.google.com/kubernetes-engine/docs/how-to/consuming-reservations

You can purchase a commitment on a GPU without including vCPU and memory (https://cloud.google.com/compute/docs/instances/signing-up-committed-use-discounts#purchasecommitmen...).

I hope this answers your questions.

 

Thanks @jmbrinkman for your answer.

So this is the part that eludes me: "make sure that your application balances the load (or use Kubernetes-native methods to do so) to use the resources you are paying for as efficiently as possible".

The question is how, exactly, assuming workloads are dynamically shifting between AZs?

How can I ensure I don't have GPUs running without CUDs, and don't have CUDs that aren't being utilized?

Also, regarding how GKE consumes reservations:

I know the docs, but the question still remains: is there any special configuration that needs to be made to GKE Node Pools to have them utilize the reservations? That is, assuming I am not using the gcloud CLI to spin up my cluster and I don't have the ability to explicitly define the policy for using reservations, what is the default, i.e. what will I get?

 

Thanks,

Elad.

You can refer to the reservation when creating the GKE Node Pool by using reservation-affinity: "any" means any matching reservation can be consumed, "none" means no reservation is consumed, and "specific" together with a reservation name targets a specific reservation. If you create the same number of nodes as you have reserved (and leave node autoscaling disabled), you will never run a GPU machine without a CUD + reservation. The default is "any", by the way, so the node pool will consume any matching reservation; to target a named one you would need to use gcloud, the API, or Terraform.
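As a hedged example, a fixed-size node pool pinned to a specific reservation could be created like this; the cluster, zone, node count, and reservation names are placeholders carried over from the sketch above:

# Hypothetical example: a fixed-size A2 node pool that only consumes a
# named reservation; autoscaling is simply left disabled.
gcloud container node-pools create a2-reserved-pool \
    --cluster=my-cluster \
    --zone=us-central1-a \
    --machine-type=a2-highgpu-1g \
    --accelerator=type=nvidia-tesla-a100,count=1 \
    --num-nodes=2 \
    --reservation-affinity=specific \
    --reservation=a2-reservation-us-central1-a

Keeping --num-nodes equal to the reserved VM count is what guarantees you never run a GPU node outside the CUD + reservation.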

If you only run pods that require GPUs on this node pool, you should be good to go. To evenly distribute the pods over the nodes, use something like topology spread constraints.
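For illustration, a Deployment for the GPU workload could spread its pods across zones and nodes with topology spread constraints; everything below (names, image, replica count) is a placeholder sketch:

# Hypothetical example: spread GPU pods evenly across AZs and nodes.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-workload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gpu-workload
  template:
    metadata:
      labels:
        app: gpu-workload
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone   # balance across AZs
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: gpu-workload
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname         # balance across nodes
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: gpu-workload
      containers:
      - name: app
        image: us-docker.pkg.dev/my-project/my-repo/gpu-app:latest  # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1   # one A100 per pod on a2-highgpu-1g nodes
EOF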

If you also want to run other pods on these hosts, you could use something like pod priority to make sure the GPU workloads always run on these nodes first.
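A minimal sketch of that, with the class name and value as assumptions: create a high PriorityClass for the GPU workloads and reference it from their pod spec, so lower-priority filler pods get preempted when the GPU pods need the capacity:

# Hypothetical example: a priority class for GPU workloads.
kubectl apply -f - <<'EOF'
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: gpu-workload-priority
value: 1000000           # higher than the default (0) used by other pods
globalDefault: false
description: "Priority for GPU workloads on the reserved A2 node pool"
EOF

The GPU pods would then set priorityClassName: gpu-workload-priority in their spec, while the filler workloads keep the default priority.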