Custom tuning of an LLM in Generative AI Studio - training time/parameters - costs

Hi,

I want to tune an LLM in Vertex AI Generative AI Studio (text-bison@001), which I know has 137B parameters. I investigated the costs, and they are:

Tuning jobs in us-central1 use eight A100 80GB GPUs. Tuning jobs in europe-west4 use 64 cores of the TPU v3 pod custom model training resource, available only upon request. Doing a quick calculation, eight A100 80GB GPUs will cost 40.22 USD/hour, and 64 TPU v3 cores, assuming the price is double that of 32 cores, will cost 64 USD/hour.
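As a quick sketch of that arithmetic (the hourly rates are the ones quoted above; the job duration is exactly the unknown this question is about, so the hours below are hypothetical):

```python
# Back-of-the-envelope tuning cost estimate.
# Rates come from the pricing quoted above; job duration is a guess,
# since Google does not publish expected tuning durations.

A100_8X_USD_PER_HOUR = 40.22   # eight A100 80GB GPUs (us-central1)
TPU_V3_64_USD_PER_HOUR = 64.0  # 64 TPU v3 cores (europe-west4), assumed 2x the 32-core rate

def tuning_cost(usd_per_hour: float, job_hours: float) -> float:
    """Estimated cost of one tuning job at a flat hourly rate."""
    return usd_per_hour * job_hours

# Example: if a 300-epoch job took anywhere from 1 to 10 hours on GPUs:
for hours in (1, 10):
    print(f"{hours:>2} h on 8x A100: ${tuning_cost(A100_8X_USD_PER_HOUR, hours):.2f}")
```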

I have a proper JSONL dataset with 1,000 to 52,000 examples and I want to train for 300 epochs.
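For reference, the documented supervised-tuning format for text-bison is one JSON object per line with input_text and output_text fields. A minimal sketch that writes such a file (the example contents are made up):

```python
import json

# Hypothetical examples in the documented text-bison tuning format:
# one JSON object per line, with "input_text" and "output_text" keys.
examples = [
    {"input_text": "Which regions support tuning jobs?",
     "output_text": "us-central1 and europe-west4."},
    {"input_text": "How many GPUs does a tuning job use?",
     "output_text": "Eight A100 80GB GPUs."},
]

with open("tuning_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```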

The issue here is that I need to know how long tuning text-bison@001 usually takes (1 hour? 10 hours?), or at least how many parameters will be tuned, to get an idea of the costs involved.

This information is not provided on the Vertex AI pricing page, nor in the Generative AI Studio language documentation. Should I use the regular Vertex AI pricing in the calculator? Maybe that is not the right approach, since 8 A100 GPUs will be used.

Thanks in advance


I solved the problem. Tuning a foundation model in Generative AI Studio means using PEFT (Parameter-Efficient Fine-Tuning), where not all weights are updated during training; instead, adapters (additional layers) are trained to tune the model for a specific task. "These techniques enable tuning of the model to specific tasks without having to rebuild the entire foundation model. Shared deployment of a foundation model can be quickly augmented with adapter weights that are specific to a particular task or domain at runtime."
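For intuition, here is a minimal PyTorch sketch of the bottleneck-adapter idea from the Houlsby et al. paper cited in the references below. This is illustrative only; Google has not published the actual adapter architecture used for text-bison:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter (Houlsby et al., 2019): a small down-project /
    up-project MLP with a residual connection, inserted into each
    transformer layer. Only these weights are trained; the base model's
    weights stay frozen."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the base model's behavior intact
        # when the adapter is near-identity.
        return x + self.up(self.act(self.down(x)))

# Trainable parameter count is tiny compared to a 137B base model:
adapter = Adapter(hidden_size=4096)
n_params = sum(p.numel() for p in adapter.parameters())
print(f"{n_params:,} trainable parameters per adapter")  # ~0.5M
```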

Once the adapter layers are trained, the weights are uploaded as a model to a bucket in the Vertex Prediction per-customer tenant project and deployed to an endpoint in the customer's project. Note that when serving a request on an endpoint, the adapter weights are loaded from the bucket, so you don't load all of the base model's weights from the bucket, just the adapter's.

So, in my experience, tuning a foundation model for 300 epochs takes approximately 40 minutes using 8 A100 GPUs; at the 40.22 USD/hour rate above, that works out to roughly 27 USD.
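For context, this is roughly how such a tuning job could be launched from the Vertex AI SDK at the time. This is a sketch based on the preview SDK: the project and bucket names are placeholders, parameter names may have changed since, and note that the SDK counts train_steps rather than epochs:

```python
import vertexai
from vertexai.preview.language_models import TextGenerationModel

# Placeholder project/bucket values; replace with your own.
vertexai.init(project="my-project", location="us-central1")

model = TextGenerationModel.from_pretrained("text-bison@001")

# Kicks off a PEFT tuning job on managed hardware.
model.tune_model(
    training_data="gs://my-bucket/tuning_data.jsonl",
    train_steps=300,
    tuning_job_location="us-central1",   # where the tuning job runs
    tuned_model_location="us-central1",  # where the tuned model is deployed
)
```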

References:

https://services.google.com/fh/files/misc/adaptation_of_foundation_models_whitepaper_google_cloud.pd...

https://arxiv.org/abs/1902.00751

Hi @rubenszmm! In case you still remember, how much data did you use in your 40-minute run? I am also trying to estimate my cost before running, but Google's information is a bit limited.

Has anyone got updates on how long it takes? Is there a way to estimate the time? What does it depend on? The number of epochs? The number of lines in the JSONL?

I ran it with 30 epochs (I don't even know what an epoch is) and 10 lines in the JSONL (the minimum), with the text-bison@002 base model. The part that uses the NVIDIA A100 80GB machines (a2-ultragpu-8g) lasted 47 minutes; the total was a little more than an hour. I haven't been billed for it yet; I will update when billed.
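For reference, plugging that 47-minute GPU stint into the hourly rate quoted earlier in the thread gives a rough estimate of the GPU portion of the bill (an estimate only; actual billing granularity and any non-GPU charges may differ):

```python
# Rough estimate of the GPU portion of the bill for the run above,
# using the 8x A100 80GB rate quoted earlier in the thread.
A100_8X_USD_PER_HOUR = 40.22
gpu_minutes = 47
print(f"~${A100_8X_USD_PER_HOUR * gpu_minutes / 60:.2f}")  # ~$31.51
```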

(Screenshot attached: Screenshot 2023-12-18 at 4.48.07 PM.png)