Welcome to the

Google Cloud Community

Meet industry peers, ask questions, collaborate to find answers, and connect with Googlers who are making the products you use every day.

Bronze 1
Since 03-17-2024
04-11-2024

My Stats

  • 5 Posts
  • 0 Solutions
  • 0 Likes given
  • 6 Likes received

Yash2384's Bio

Badges Yash2384 Earned

Recent Activity

I was looking into the code:

    # Set docker and quantization for AWQ quantized models
    VLLM_DOCKER_URI = "us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20231127_0916_RC00"
    quantized_model_id = "TheBloke/Llama-2-70B-chat...
I am using this library to make a prediction request to the model deployed on Vertex AI. I am getting a timeout exception, and I'm not sure whether I need to increase the timeout or to what value. Also, what is the default value? I can find nothing in the do...
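Client-side timeouts like the one described above can be reasoned about independently of the SDK. Below is a minimal, library-agnostic sketch of enforcing a deadline around a blocking prediction call; `fake_predict` and the deadline values are illustrative stand-ins, not Vertex AI defaults (many Google Cloud Python client methods do accept a per-call `timeout=` argument, which is the simpler route when available):

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def predict_with_deadline(predict_fn, payload, timeout_s=30.0):
    """Run a blocking prediction call; raise TimeoutError past timeout_s."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(predict_fn, payload)
        try:
            return future.result(timeout=timeout_s)
        except FutureTimeout:
            raise TimeoutError(f"prediction exceeded {timeout_s}s deadline")

# Stand-in for the real client call (hypothetical, for illustration only).
def fake_predict(payload):
    return {"predictions": [payload["text"].upper()]}

result = predict_with_deadline(fake_predict, {"text": "hello"}, timeout_s=5.0)
```

Note one caveat of this pattern: on timeout the worker thread keeps running in the background, since Python cannot forcibly cancel a blocking call; only the caller stops waiting.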
I've integrated an LLM into the Model Registry using a custom Docker container. The model is hosted correctly, and I can consistently execute prediction requests. However, I occasionally encounter a '503 Service Unavailable' error. This issue be...
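Intermittent 503s are typically transient, and the usual client-side mitigation is to retry with exponential backoff and jitter. A minimal sketch of that pattern, with `TransientServerError` and `flaky_predict` as hypothetical stand-ins for the client library's error type and the real prediction call:

```python
import random
import time

class TransientServerError(RuntimeError):
    """Stand-in for the client library's 503-style error type."""

def call_with_retry(fn, max_attempts=4, base_delay=0.5):
    """Call fn(), retrying transient errors with exponential backoff + jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientServerError:
            if attempt == max_attempts:
                raise
            # Sleep base_delay * 2^(attempt-1), jittered to avoid retry storms.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5))

# Demo: a call that fails twice with a 503-like error, then succeeds.
calls = {"n": 0}
def flaky_predict():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientServerError("503 Service Unavailable")
    return "ok"

result = call_with_retry(flaky_predict, base_delay=0.01)
```

Retrying blindly is only safe for idempotent requests like predictions; capping attempts (here at 4) keeps a genuinely unavailable backend from tying up the caller indefinitely.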
I've deployed a container hosting a customized model in Vertex AI. I encounter connection timeout exceptions, particularly when there are 5 or more concurrent requests. I'm exploring an alternative approach that is cost-effective and capable of autosc...