GCP Idle Model Charging

Hello!
I would like to deploy an ML model on GCP.

Most of the time the model will be idle. Occasionally I need to call it through an endpoint for a few seconds.
I don't want to pay for a full-time GPU instance, but I also need fast responses, without redeploying from scratch every time I need it.

Is this possible in GCP?


According to the Pricing for AutoML models documentation:

Pricing for AutoML models

For Vertex AI AutoML models, you pay for three main activities:

  • Training the model
  • Deploying the model to an endpoint
  • Using the model to make predictions

Vertex AI uses predefined machine configurations for Vertex AutoML models, and the hourly rate for these activities reflects the resource usage. ... You pay for each model deployed to an endpoint, even if no prediction is made. You must undeploy your model to stop incurring further charges. Models that are not deployed or have failed to deploy are not charged.

You pay only for compute hours used; if training fails for any reason other than a user-initiated cancellation, you are not billed for the time. You are charged for training time if you cancel the operation.
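As the quoted docs note, a model deployed to an endpoint bills continuously until you undeploy it. A minimal sketch of stopping those charges with the gcloud CLI (the endpoint and deployed-model IDs below are hypothetical placeholders):

```shell
# Look up the deployed model's ID on the endpoint (IDs here are made up).
gcloud ai endpoints describe 1234567890 --region=us-central1

# Undeploy it to stop the per-hour endpoint charges.
gcloud ai endpoints undeploy-model 1234567890 \
  --region=us-central1 \
  --deployed-model-id=9876543210
```

The model artifact itself remains in the Model Registry, so you can redeploy it later without retraining.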

Custom-trained models

Training

The tables Machine types and Accelerators provide the approximate price per hour of various training configurations. You can choose a custom configuration of selected machine types. To calculate pricing, sum the costs of the virtual machines you use.

If you use Compute Engine machine types and attach accelerators, the cost of the accelerators is separate. To calculate this cost, multiply the prices in the table of accelerators below by how many machine hours of each type of accelerator you use.
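As a back-of-the-envelope illustration of the "sum the machine costs, then add accelerator costs" calculation described above (the hourly rates here are made-up placeholders, not real prices — take the real figures from the current pricing tables):

```shell
vm_rate=0.22    # assumed USD/hour for the chosen machine type (placeholder)
gpu_rate=0.45   # assumed USD/hour for the attached accelerator (placeholder)
hours=10        # total training hours

# Total = machine-hours cost + accelerator-hours cost
awk -v h="$hours" -v vm="$vm_rate" -v gpu="$gpu_rate" \
    'BEGIN { printf "%.2f\n", h * vm + h * gpu }'   # prints 6.70
```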

For further information about pricing, please refer to the Vertex AI pricing documentation, or you can connect with our sales team to get a custom quote.

I think you would need to build this on Cloud Run instead of a Vertex AI AutoML deployment. Cloud Run can scale to zero when idle, so you only pay while requests are actually being served.
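If Cloud Run fits the use case (it runs containers, so the model server would have to be packaged as an image), a scale-to-zero deployment might look like the sketch below. The service, project, and image names are hypothetical:

```shell
# Deploy a containerized model server; with min-instances at 0 the service
# scales to zero while idle, so no compute charges accrue between requests.
gcloud run deploy my-model-service \
  --image=gcr.io/my-project/model-server \
  --region=us-central1 \
  --memory=4Gi \
  --min-instances=0 \
  --max-instances=1
```

The trade-off is cold-start latency on the first request after an idle period; raising `--min-instances` above 0 trades the scale-to-zero pricing for faster responses.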