Does Vertex AI support multi-model endpoints?

We have hundreds of models, and deploying each one to its own endpoint is very expensive. We are looking for a way to deploy multiple models to a single endpoint. Our Docker image will contain all the models, and we will have custom logic to invoke the right model based on the request to the endpoint.

Similar functionality is available in AWS SageMaker.


I read the following Vertex AI documentation page:

https://cloud.google.com/vertex-ai/docs/general/deployment#models-endpoint

This page seems to say that we can deploy multiple models to the same endpoint. If I understand it correctly, you can then serve multiple models from the same endpoint nodes.

I think this means deploying multiple versions of the same model, not completely independent models.

Hi there,

You may deploy totally different models to the same endpoint on Vertex AI and split the traffic as you wish. There is no technical restriction. From a business point of view, you may prefer the models to share the same (or similar) targeting goals, so the results remain comparable when making decisions.
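To make the reply above concrete, here is a minimal sketch using the `google-cloud-aiplatform` Python SDK. The project, region, endpoint, and model resource names are all hypothetical placeholders, and the split-building helper is my own illustration rather than anything from the SDK:

```python
def even_split(deployed_model_ids):
    """Build a traffic_split dict giving each deployed model an
    (almost) equal share that sums to exactly 100."""
    share, remainder = divmod(100, len(deployed_model_ids))
    return {
        model_id: share + (1 if i < remainder else 0)
        for i, model_id in enumerate(deployed_model_ids)
    }


def deploy_two_models(project, region, endpoint_name, model1_name, model2_name):
    """Deploy two independent models to one Vertex AI endpoint,
    ending with a 50/50 traffic split (hypothetical resource names)."""
    # Deferred import so the sketch is readable without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=region)
    endpoint = aiplatform.Endpoint.create(display_name=endpoint_name)

    # First model takes all traffic initially.
    aiplatform.Model(model1_name).deploy(
        endpoint=endpoint,
        traffic_percentage=100,
        machine_type="n1-standard-2",
    )
    # Second, completely independent model; traffic_percentage=50
    # rebalances so each deployed model serves half the requests.
    aiplatform.Model(model2_name).deploy(
        endpoint=endpoint,
        traffic_percentage=50,
        machine_type="n1-standard-2",
    )
    return endpoint
```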

Hi, how would that work, though? If the endpoint is the same, how do we make sure that we request a prediction from a specific model? For example, if we deploy two different models, say model1 and model2, to the same endpoint with a 50% traffic split, then every request to this endpoint is routed to one of the two models with probability 0.5: sometimes we will be served by model1 and sometimes by model2. How do we make sure we are served by a specific model in this scenario?

You can actually deploy a multi-model endpoint, and to call a specific model, just add the argument `"TargetModel": "yourmodelname.tar.gz"`.

For more information, refer to this link: https://towardsdatascience.com/deploy-multiple-tensorflow-models-to-one-endpoint-65bea81c3f2f
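Note that the `TargetModel` argument described above belongs to SageMaker's runtime API, not to Vertex AI. A hedged boto3 sketch of that invocation pattern, with the endpoint name, model archive name, and payload all as placeholders:

```python
def invoke_specific_model(endpoint_name: str, target_model: str, payload: bytes) -> bytes:
    """Call one specific model hosted on a SageMaker multi-model
    endpoint by naming its archive via TargetModel."""
    # Deferred import so the sketch is readable without the AWS SDK installed.
    import boto3

    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,          # hypothetical endpoint
        ContentType="application/json",
        TargetModel=target_model,            # e.g. "model1.tar.gz"
        Body=payload,
    )
    return response["Body"].read()
```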

Yes, that option is available in Amazon SageMaker, as the article suggests. Is it also possible with Vertex AI?

Hi,

Could you please suggest how to do this using the Python API? (https://googleapis.dev/python/aiplatform/latest/aiplatform.html)

I have been trying, but when specifying a `traffic_split` dict, the keys of this dict have to be deployed model IDs, which makes no sense because the models are not yet deployed when calling `model.deploy()`.

Thank you!
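One possible workaround, sketched under the assumption that the SDK's `traffic_percentage` parameter and the special `"0"` key in `traffic_split` behave as their docstrings describe (the `"0"` key is documented as referring to the model being deployed in the current request, so no pre-existing deployed-model ID is needed). All resource names below are placeholders:

```python
def add_model_to_endpoint(model_name: str, endpoint_name: str, pct: int):
    """Sketch: deploy another model to an existing Vertex AI endpoint
    without knowing its deployed-model ID in advance."""
    # Deferred import so the sketch is readable without the SDK installed.
    from google.cloud import aiplatform

    endpoint = aiplatform.Endpoint(endpoint_name)
    model = aiplatform.Model(model_name)

    # Option 1: let the SDK rebalance. The new model gets `pct` percent,
    # and already-deployed models share the remainder.
    model.deploy(
        endpoint=endpoint,
        traffic_percentage=pct,
        machine_type="n1-standard-2",
    )

    # Option 2 (alternative): pass an explicit split, where the key "0"
    # stands for the model being deployed in this very request:
    # model.deploy(
    #     endpoint=endpoint,
    #     traffic_split={"0": pct, "EXISTING_DEPLOYED_MODEL_ID": 100 - pct},
    # )
    return endpoint
```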

Do these Google guys ever help with realistic solutions? I have the exact same problem, and there's absolutely no documentation on how to deploy multiple versions of the same model to the same endpoint! About time to learn from AWS, maybe?