GCP Vertex AI online prediction

I am using GCP Vertex AI online prediction to deploy a model with a custom container. The deployed model works fine with a smaller number (<50) of minimum nodes on n1-highmem-2.

But when I set a higher number of minimum nodes (>50), I get the following error:
Error Messages: model server container out of memory, please use a larger machine type for model deployment:
https://cloud.google.com/vertex-ai/docs/predictions/configure-compute#machine-types

I don't understand why increasing the minimum number of nodes causes an out-of-memory error. My understanding was that Vertex AI deploys the container (the custom container image from Artifact Registry) on each node and downloads the model from GCS. So if the same model works with a smaller number of minimum nodes, why does increasing the minimum nodes cause out of memory?
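For context, the deployment call looks roughly like this (a minimal sketch with the google-cloud-aiplatform Python SDK; project, region, and resource IDs are placeholders, not the exact values I use):

    # Sketch of the deployment configuration (placeholder project/model IDs).
    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    model = aiplatform.Model(
        "projects/my-project/locations/us-central1/models/MODEL_ID"
    )

    # n1-highmem-2 gives each node 2 vCPUs and 13 GB of RAM.
    endpoint = model.deploy(
        deployed_model_display_name="my-custom-container-model",
        machine_type="n1-highmem-2",
        min_replica_count=50,   # fine when this is below ~50, fails above
        max_replica_count=100,
        traffic_percentage=100,
    )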

Thanks in advance
Regards,
Anil

7 REPLIES

It seems like there might be some confusion regarding the relationship between the number of nodes and the memory allocation for your model in Google Cloud Vertex AI.

When you increase the number of nodes in Vertex AI, you are essentially scaling out your deployment to handle more concurrent requests. Each node runs a separate instance of your model container, which means if you have 50 nodes, you have 50 instances of your model running in parallel.

Now, the error message "model server container out of memory" suggests that one or more instances of your model container are running out of memory. This is independent of the number of nodes you have in your deployment.

Here are some possibilities to consider:

  • Each instance of your model container running on a node has its own memory allocation. If the memory allocation per container is insufficient for the workload it's handling, you'll encounter out-of-memory errors.
  • When you increase the number of nodes, the overall load on your deployment increases. This could lead to more memory usage overall, especially if the workload is memory-intensive.
  • It's possible that the memory footprint of your model increases with the number of concurrent requests it's handling. If the model is not memory-efficient, running more instances of it could exacerbate memory issues.

To troubleshoot this issue:

  • Review the memory allocation configuration for your model containers.
  • Monitor the memory usage of individual containers and nodes to identify which instances are running out of memory (see the monitoring sketch after this list).
  • Consider optimizing your model for memory efficiency.
  • If increasing the number of nodes exacerbates the issue, consider scaling up (using a larger machine type) instead of scaling out (increasing the number of nodes).
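For the monitoring step, here is a rough sketch using the Cloud Monitoring client library. The metric type string is an assumption based on Vertex AI's documented prediction metrics; confirm the exact name in Metrics Explorer for your project before relying on it.

    # Sketch: list recent memory usage for deployed models via Cloud Monitoring.
    # The metric type below is an assumption; verify it in Metrics Explorer.
    import time
    from google.cloud import monitoring_v3

    client = monitoring_v3.MetricServiceClient()
    project_name = "projects/my-project"  # placeholder

    now = int(time.time())
    interval = monitoring_v3.TimeInterval(
        {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
    )

    results = client.list_time_series(
        request={
            "name": project_name,
            "filter": 'metric.type = "aiplatform.googleapis.com/prediction/online/memory/bytes_used"',
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )

    for series in results:
        latest = series.points[0]  # points are returned newest first
        print(series.resource.labels, series.metric.labels, latest.value)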

By understanding how your model consumes memory and adjusting your deployment configuration accordingly, you should be able to resolve the out-of-memory errors.

Hi,
I'm seeking additional clarification regarding the number of nodes and containers in our deployment process. Specifically:
  1. While deploying, we can only set the number of nodes, so I want to know: what exactly is a node?
  2. Also, when we set the machine type, are we configuring those nodes?
  3. Also, as a follow-up to the original question: when we set 50 minimum nodes, are these nodes created inside one machine (n1-highmem-2)?
 

Hey, thanks for the reply!

@Poala_Tenorio, first of all, thanks for providing the detailed explanation. But I also have the same questions as posted in the reply above.
Can you please provide some clarification?
Thanks!

When you increase the minimum nodes, each node runs its own instance of the model server container. If your model consumes a significant amount of memory per instance, scaling up the nodes could exceed available memory resources. Try using a larger machine type or optimizing your model to reduce memory usage.
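For example, a redeploy onto a larger machine type might look roughly like this (a sketch with the Python SDK; resource names are placeholders, and n1-highmem-4 is just one possible step up):

    # Sketch: redeploy the model on a larger machine type (placeholder IDs).
    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    model = aiplatform.Model(
        "projects/my-project/locations/us-central1/models/MODEL_ID"
    )
    endpoint = aiplatform.Endpoint(
        "projects/my-project/locations/us-central1/endpoints/ENDPOINT_ID"
    )

    # n1-highmem-4 gives each node 4 vCPUs and 26 GB of RAM, roughly doubling
    # the per-container headroom compared to n1-highmem-2.
    model.deploy(
        endpoint=endpoint,
        machine_type="n1-highmem-4",
        min_replica_count=50,
        max_replica_count=100,
        traffic_percentage=100,
    )

    # Once the new deployment is serving, undeploy the old n1-highmem-2 one:
    # endpoint.undeploy(deployed_model_id="OLD_DEPLOYED_MODEL_ID")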

Thanks for the reply.
But if the same model container works on the same machine type (n1-highmem-2) with a smaller number of nodes, why does it go out of memory with a larger number of nodes?

 

It seems like increasing the minimum number of nodes is causing memory issues even though the machine type stays the same. This could be due to resource constraints on each node or the model's memory requirements exceeding the capacity of the chosen machine type. I recommend checking the model's memory footprint and ensuring it is compatible with the selected machine type. Additionally, review the resource utilization on the nodes during deployment to identify any potential bottlenecks. If the issue persists, consider reaching out to Google Cloud Support for detailed assistance in diagnosing and resolving the memory error.

Thanks for the reply.
But if the same model container works on the same machine type (n1-highmem-2) with a smaller number of nodes, why does it go out of memory with a larger number of nodes?