Vertex AI platform: Error Messages: Model server exited unexpectedly.

Hi, 

I have multiple models that I want to deploy.

I created two resource pools, one for CPU and one for GPU.

I deployed two models to the CPU resource pool, and they work well.


But when I try to create an endpoint and attach it to the GPU resource pool, it fails.

I tried two different models, and neither of them works.

Both models do work if I deploy them with dedicated resources and a GPU.


Here's the error message I received by email:

Error Messages: Model server exited unexpectedly.


In short, when I click "Create endpoint", it keeps loading for several minutes and then the error appears.


I found this error in Logs Explorer:
(1) NOT_FOUND: Error executing an HTTP request: HTTP response code 404 with body '<?xml version='1.0' encoding='UTF-8'?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Details>No such object: caip-tenant-fc9d9d0b-17f4-4284-9823-401faaf96ac0/5044324052747943936-processed/tfeieOptimizedModel/20230812093241/1/variables/variables.data-00000-of-00001</Details></Error>

when reading gs://caip-tenant-fc9d9d0b-17f4-4284-9823-401faaf96ac0/5044324052747943936-processed/tfeieOptimizedModel/20230812093241/1/variables/variables.data-00000-of-00001
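To make the log above easier to act on, here is a minimal Python sketch (standard library only, Python 3.9+) that parses the XML error body and splits the missing object into bucket and path. The helper names `parse_gcs_error` and `describe_missing_object` are hypothetical, written just for this post. Run on the body from the log, it shows that the missing file is a TensorFlow SavedModel variables shard (shard 0 of 1) under the `-processed/tfeieOptimizedModel/` copy, which appears to live in a Google-managed `caip-tenant-...` bucket rather than in my own bucket.

```python
import re
import xml.etree.ElementTree as ET

# Matches a TensorFlow SavedModel variables shard such as
# "variables/variables.data-00000-of-00001" at the end of an object path.
SHARD_RE = re.compile(r"variables/variables\.data-(\d+)-of-(\d+)$")


def parse_gcs_error(body: str) -> dict[str, str]:
    """Return the child elements (Code, Message, Details) of a GCS XML error body."""
    # Parse as bytes so the UTF-8 XML declaration is handled by the parser.
    root = ET.fromstring(body.encode("utf-8"))
    return {child.tag: (child.text or "") for child in root}


def describe_missing_object(details: str) -> dict:
    """Split 'No such object: <bucket>/<object>' and flag SavedModel variable shards."""
    path = details.removeprefix("No such object: ")
    bucket, _, obj = path.partition("/")
    match = SHARD_RE.search(obj)
    return {
        "uri": f"gs://{bucket}/{obj}",
        "bucket": bucket,
        "object": obj,
        # (shard index, total shards) if this is a SavedModel variables file, else None.
        "shard": (int(match.group(1)), int(match.group(2))) if match else None,
    }


if __name__ == "__main__":
    body = (
        "<?xml version='1.0' encoding='UTF-8'?><Error><Code>NoSuchKey</Code>"
        "<Message>The specified key does not exist.</Message>"
        "<Details>No such object: caip-tenant-fc9d9d0b-17f4-4284-9823-401faaf96ac0/"
        "5044324052747943936-processed/tfeieOptimizedModel/20230812093241/1/"
        "variables/variables.data-00000-of-00001</Details></Error>"
    )
    error = parse_gcs_error(body)
    info = describe_missing_object(error["Details"])
    print(error["Code"])  # NoSuchKey
    print(info["uri"])
    print(info["shard"])  # (0, 1)
```

The input string is copied from the log line above; nothing in the sketch calls Google Cloud APIs.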

I tried importing the model without the optimized TensorFlow runtime option and got this error:

P_REQUIRES failed at xla_ops.cc:296 : UNIMPLEMENTED: Could not find compiler for platform CUDA: NOT_FOUND: could not find registered compiler for platform CUDA -- was support for that platform linked in?
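This second error comes from TensorFlow's XLA kernels (`xla_ops.cc`), which suggests the exported graph asks for XLA compilation on the GPU while the serving runtime has no XLA compiler registered for CUDA. As a rough check, assuming the model was exported as a SavedModel, the sketch below looks for the `_XlaMustCompile` attribute that TensorFlow records for functions traced with `tf.function(jit_compile=...)`. It is a heuristic only: a match does not prove XLA is required (the attribute can also be stored with a false value), and the helper name `saved_model_mentions_xla` is hypothetical.

```python
from pathlib import Path

# Attribute TensorFlow stores on functions traced with tf.function(jit_compile=...).
# Heuristic: its presence hints at XLA compilation but does not prove it, because
# it is also written (as false) when jit_compile=False is passed explicitly.
XLA_MARKER = b"_XlaMustCompile"


def saved_model_mentions_xla(export_dir: str) -> bool:
    """Return True if saved_model.pb in export_dir contains the XLA compile attribute.

    Raises FileNotFoundError if export_dir has no saved_model.pb.
    """
    graph_file = Path(export_dir) / "saved_model.pb"
    return XLA_MARKER in graph_file.read_bytes()
```

For example, `saved_model_mentions_xla("exported_model")` (a hypothetical local export directory) returns True when the marker appears. In that case, re-exporting the model without `jit_compile=True` might be worth trying before importing it again.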