Understanding BigQuery Quota Limits

Hi,

I'm using BigQuery interactive queries (not scheduled queries, and not the API) to perform a task that requires many loops, and hence many temp table creations, table queries, and INSERT statements into an already created table. The sample_uc table has around 250,000 rows and 6 columns; the original uc table will have around 600,000 rows and 6 columns.

Things are running fine for now, since I'm limiting the loop to a maximum of 20 iterations. But what would happen if I don't limit the iterations, or when I use the original (600,000-row) table?

Will this trigger any errors due to quota limitations, whether for the INSERTs (around 5,600 INSERT statements would be required), for querying tables, or for the creation and dropping of temporary tables?

I'm using the query below for reference:

DECLARE unique_material ARRAY<STRING>;
DECLARE index INT64 DEFAULT 0;
DECLARE current_value STRING;
DECLARE part_names_query STRING;
DECLARE all_part_names STRING;

-- Part rows plus the UNIT_B0000 base-model rows
CREATE TEMP TABLE parts AS (
  SELECT * FROM `nebbiu-data-hrp.testing.sample_uc`
  WHERE Option__Variant_ NOT LIKE 'UNIT%' OR Option__Variant_ LIKE 'UNIT_B0000'
);

-- CREATE TEMP TABLE base_models AS (
--   SELECT * FROM `nebbiu-data-hrp.testing.sample_uc`
--   WHERE Option__Variant_ LIKE 'UNIT_B0000'
-- );

-- Variant model rows (UNIT% rows other than the base model)
CREATE TEMP TABLE variant_models AS (
  SELECT * FROM `nebbiu-data-hrp.testing.sample_uc`
  WHERE Option__Variant_ LIKE 'UNIT%' AND Option__Variant_ NOT LIKE 'UNIT_B0000'
);

-- Distinct materials to loop over
SET unique_material = (
  SELECT ARRAY_AGG(DISTINCT Material) FROM parts
);

-- Build the quoted, comma-separated list of pivot columns once
EXECUTE IMMEDIATE """
  SELECT STRING_AGG(DISTINCT CONCAT("'",Option__Variant_,"'")) FROM parts """
  INTO all_part_names;

-- WHILE index < ARRAY_LENGTH(unique_material) DO  -- SAFE_OFFSET is 0-based, so use <, not <=
WHILE index <= 9 DO  -- temporarily capped at 10 iterations
  SET current_value = unique_material[SAFE_OFFSET(index)];
 
  CREATE TEMP TABLE filtered_table AS (
    SELECT Option__Variant_,
           Future_Price,
           CONCAT(current_value, 'B0000') AS base_model,
           'B0000' AS option_code,
           NULL AS list_price_
    FROM parts
    WHERE Material = current_value
  );


  -- SET part_names = REGEXP_REPLACE(part_names, r'[./-]', '_');

  -- Pivot this material's rows into one wide row and append it
  EXECUTE IMMEDIATE FORMAT("""
    INSERT INTO `nebbiu-data-hrp.testing.processed`
    SELECT * FROM filtered_table
    PIVOT (SUM(Future_Price) FOR Option__Variant_ IN (%s)) """,
    all_part_names);

  DROP TABLE filtered_table;

  SET index = index + 1;
END WHILE;

-- Build a "NULL AS <part>" padding list matching the pivoted columns
EXECUTE IMMEDIATE """
  SELECT STRING_AGG(DISTINCT CONCAT("NULL AS ",Option__Variant_)) FROM parts """
  INTO part_names_query;

EXECUTE IMMEDIATE FORMAT("""
  INSERT INTO `nebbiu-data-hrp.testing.processed`
  SELECT REGEXP_REPLACE(VarKey, r'.UNIT.', '') AS base_model,
         REGEXP_REPLACE(Option__Variant_, r'UNIT_', '') AS option_code,
         Future_Price AS list_price_,
         %s
  FROM variant_models
  LIMIT 100 """,
  part_names_query);

-- Fill list_price from the base-model price where it is missing
CREATE OR REPLACE TABLE `nebbiu-data-hrp.testing.processed` AS
SELECT *,
  COALESCE(list_price_, UNIT_B0000) AS list_price
FROM `nebbiu-data-hrp.testing.processed`;

ALTER TABLE `nebbiu-data-hrp.testing.processed`
DROP COLUMN list_price_,
DROP COLUMN UNIT_B0000;

8 REPLIES

Yes, you are likely to run into quota limitations if you do not limit the iterations of your loop.

The following are the quotas that may be affected:

  • INSERT: There's a destination table update limit of 1,500 updates per table per day, which includes APPEND and OVERWRITE operations. Each INSERT statement counts as one update (a rough way to check your usage is sketched after this list).
  • Querying tables: BigQuery allows a maximum of 50 concurrent interactive queries per user. Make sure you're not running too many queries simultaneously.
  • Creation and dropping of temporary tables: BigQuery manages the life cycle of temporary tables automatically; they are deleted after the script completes. However, there's an overall limit on the amount of temporary storage available for query processing.
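
As a rough way to see how close you are to the per-table update limit, you can count the day's completed jobs that wrote to the destination table. This is a minimal sketch, not an exact quota counter; it assumes your jobs run in the `region-us` location and target the testing.processed table from your script, so adjust both to match your project:

SELECT COUNT(*) AS updates_today
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE DATE(creation_time) = CURRENT_DATE()
  AND destination_table.dataset_id = 'testing'
  AND destination_table.table_id = 'processed'
  AND state = 'DONE';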

To avoid hitting these quotas, you can:

  • Limit the number of iterations of your loop.
  • Use a smaller sample of the sample_uc table.
  • Check the BigQuery quotas documentation (https://cloud.google.com/bigquery/quotas) for more information.

Additional considerations:

  • The specific quotas that affect you may vary depending on your project's settings.
  • Quotas are enforced per project, so you may need to coordinate with other users of your project if you are close to a limit.
  • You can monitor your quota usage in the BigQuery console (a SQL approximation is sketched below).
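
Besides the console, you can approximate quota-relevant activity from SQL. A minimal sketch, again assuming the `region-us` location, that breaks down the last day's jobs by type, state, and priority:

SELECT job_type, state, priority, COUNT(*) AS jobs
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY job_type, state, priority
ORDER BY jobs DESC;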

Hi ms4446,

Many thanks for your quick response.

Referring to your response above, I have the following questions, if you could help me with them.

- What is meant by '50 concurrent interactive queries'? Does it mean the total number of queries run in a single query window/session? If yes, and my loop is doing 1 query per iteration, does this mean I cannot run my loop more than 50 times?

- Regarding the suggestions under 'you can':

Unfortunately, I cannot limit the iterations, since I'm required to run them as many times as there are unique materials (maybe I need to change the entire query logic for this; the PIVOT step is what's really causing me trouble).
Furthermore, sample_uc might run fine, but I'm required to run this on the actual (larger) dataset, which is what concerns me.

50 Concurrent Interactive Queries (Accepted Solution)

This means that you can have a maximum of 50 queries running at the same time in BigQuery. If you try to run a 51st query while 50 are already running, it will be placed in a queue and will not start until one of the other queries finishes.

If your loop is doing 1 query per iteration and you have 50 iterations, then you will be able to run your loop 50 times before you hit the quota. However, it's worth noting that if a user is running other queries outside of the loop, they could hit the limit even if the loop has fewer than 50 iterations.

Solutions

There are a few ways to get around the 50 concurrent interactive queries quota:

  • Use a batch job: Batch queries are queued and executed based on resource availability. They can run for longer periods of time and can handle more queries than interactive queries. To use a batch job, write a script that contains the queries you want to run; you can submit it at batch priority or schedule it with BigQuery's scheduled queries feature.
  • Optimize the query: You can try to optimize your query so that fewer statements run in total. This could involve partitioning or clustering tables, restructuring joins, or using BigQuery's built-in functions (see the set-based sketch after this list).
  • Use a smaller sample: If you're looking for a rough estimate or preliminary results, you could use a smaller sample of your data. However, for accurate results, you'll need to use the entire dataset.
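
On the "optimize the query" point: because PIVOT groups by all non-pivoted columns, the per-material loop in your script can often be collapsed into a single set-based statement. This is a minimal sketch, assuming the same parts temp table, all_part_names variable, and processed destination table as your script; it would replace the entire WHILE loop with one INSERT:

-- One pivot over all materials at once; PIVOT groups by base_model,
-- option_code, and list_price_, so each material still gets its own row.
EXECUTE IMMEDIATE FORMAT("""
  INSERT INTO `nebbiu-data-hrp.testing.processed`
  SELECT * FROM (
    SELECT Option__Variant_,
           Future_Price,
           CONCAT(Material, 'B0000') AS base_model,
           'B0000' AS option_code,
           NULL AS list_price_
    FROM parts
  )
  PIVOT (SUM(Future_Price) FOR Option__Variant_ IN (%s))
""", all_part_names);

Since this issues one INSERT rather than one per material, it also sidesteps the per-table update limit and the concurrency concern entirely.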

Hello, regarding the 50 concurrent queries per user: is it documented somewhere in the official BigQuery quotas/limits? I can't find it in the list. Where is this officially stated, please?

Thank you

Hi @zdenek_hanzal-1 

I apologize for the confusion. The information given in the previous post is outdated.

BigQuery now uses a queuing system for both batch and interactive queries, which is described in the official documentation: https://cloud.google.com/bigquery/quotas

Key Points:

  • BigQuery doesn't have the strict per-user concurrency limits that I stated. The number of queries that can actively run at a time depends on slot (resource) availability, not a hard limit per user.
  • The correct quotas/limits, as documented, are (a query for inspecting your own queue is sketched after this list):
    • Your project can queue up to 1,000 interactive queries.
    • Your project can queue up to 20,000 batch queries.
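
To see how much of the queue you are actually using, one option is to count PENDING (queued) versus RUNNING jobs over a recent window. A minimal sketch, assuming the `region-us` location:

SELECT state, COUNT(*) AS jobs
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
  AND state IN ('PENDING', 'RUNNING')
GROUP BY state;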

Hello, thank you for taking the time to respond.

I found more info in this thread, which seems up to date:
https://blog.fourninecloud.com/bigquery-query-queues-query-concurrency-in-your-control-c60411868997

In general, with the pay-as-you-go (on-demand) model, it looks like we won't know the potential number of concurrent queries, as it always depends on the slots available at a given moment;
it could be 10, it could be 300, and the rest goes into the queue, up to 1,000 items.

With a reservation, I noticed there is a parameter where we can indicate the concurrency we want, but we still need enough slots to make it happen.

Regards

Zdenek

Your understanding and summary of the information from the blog post about BigQuery's query concurrency are accurate:

On-demand (Pay as you go):

  • The amount of concurrency your project can handle depends entirely on the number of slots available at a given time in your on-demand slot pool, and this can fluctuate.
  • You can monitor the actual concurrent query usage and slot consumption in your project to get a sense of the variation you might experience (see the sketch after this list).
  • The queue can hold up to 1,000 interactive queries, so even if no slots are immediately available, your queries will eventually run.
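
One way to gauge slot consumption from SQL is to sum total_slot_ms over a window; dividing slot-milliseconds by the window length in milliseconds gives the average number of slots in use. A minimal sketch, assuming the `region-us` location:

SELECT SUM(total_slot_ms) / (1000 * 60 * 60) AS avg_slots_last_hour
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
  AND job_type = 'QUERY';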

Reservations:

  • With reservations, the "Target Job Concurrency" setting gives BigQuery the goal, or desired amount of concurrency, for your project. BigQuery will allocate slots as needed to try to meet that goal (a DDL sketch follows this list).
  • However, the key point is that even if you set a target concurrency, you still need a large enough slot commitment within your reservation to reach it. If you don't have enough slots allocated, BigQuery might not be able to meet the target concurrency, and queuing will occur.
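
For reference, the target can be set with reservation DDL, assuming the BigQuery Reservation API is enabled and you have reservation admin permissions. A minimal sketch; the admin project my-admin-project and the reservation name prod are hypothetical:

-- Hypothetical admin project and reservation name; adjust to your setup.
ALTER RESERVATION `my-admin-project.region-us.prod`
SET OPTIONS (target_job_concurrency = 10);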

Key Takeaway: BigQuery's dynamic concurrency and queuing mechanism is designed to let many queries run without strict per-user concurrency limits. The queuing system lets BigQuery manage workloads and avoid overloading the system, even during sudden spikes in query demand.

@ammar_hanif wrote: (the original question and script, quoted in full at the top of this thread)

Running a large number of iterations with temporary tables, INSERT statements, and queries in BigQuery can lead to several challenges. Some factors to consider are:

  1. Quota Limitations: BigQuery has various quotas and limitations, such as the number of concurrent queries, query execution time, and limits around temporary table usage. Running too many iterations may hit these limits and result in errors.

  2. Costs: Executing numerous queries and creating temporary tables can contribute to increased costs, especially if you are dealing with large datasets and performing complex operations.

  3. Performance: As the number of iterations increases, the overall performance of your queries may degrade. You may experience longer execution times and higher resource consumption.

  4. Error Handling: Without proper error handling mechanisms, your script might fail if any individual query or iteration encounters an issue.

  5. Code Complexity: The more iterations you have, the more complex your code becomes. This can make it harder to maintain, debug, and optimize.

To address these concerns:

  • Consider optimizing your queries to minimize the number of iterations, or batch operations together (the set-based PIVOT sketch earlier in this thread is one example).
  • Implement error handling to gracefully manage issues during execution (see the sketch after this list).
  • Monitor your resource usage and costs regularly.
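
In BigQuery scripting, error handling can be done with a BEGIN ... EXCEPTION block; @@error.message is the built-in variable available inside the handler. A minimal sketch that wraps one iteration's INSERT from the script above, so a single failure doesn't abort the whole run:

BEGIN
  -- The statement that might fail, e.g. one pivoted INSERT.
  EXECUTE IMMEDIATE FORMAT("""
    INSERT INTO `nebbiu-data-hrp.testing.processed`
    SELECT * FROM filtered_table
    PIVOT (SUM(Future_Price) FOR Option__Variant_ IN (%s)) """,
    all_part_names);
EXCEPTION WHEN ERROR THEN
  -- Log the failure and let the script continue.
  SELECT CONCAT('Iteration failed: ', @@error.message) AS warning;
END;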

Additionally, when dealing with large datasets, it's always advisable to test your scripts on a smaller scale before scaling up, to avoid unexpected issues.

Remember to review BigQuery documentation and monitor your project's usage to ensure you stay within the limits and quotas imposed by the service.