Datastream Not Working with BigQuery Table Partitioning and Clustering

zephschafer · 04-01-2024 10:38 AM

Hello!

I am trying to stream data into a partitioned BigQuery table using Datastream. Following this documentation page and this forum post, I created a partitioned table in BigQuery whose table ID matches that of the corresponding destination table in Datastream. Datastream continues to write data to the table successfully, and the table details confirm that the table is partitioned.

However, queries against the table do not consistently use the partition key. For some queries that do include the partition key in the WHERE clause, I observe that (a) the bytes billed correspond to a full table scan, and (b) the execution graph lists other "CDC_TABLE"s, which are otherwise invisible to me, as the sources for the query (e.g. "CDC_TABLE_f53e853384a119a0e2cd9c72840fc_my-table-name").

This behavior is intermittent, but it significantly increases the cost of our queries. Does anyone know how to resolve this issue?
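One way to reproduce the observation outside the console is to compare a query's dry-run estimate with the table's total size. The snippet below is only a rough sketch under stated assumptions: it uses the `google-cloud-bigquery` Python client, the 0.9 threshold is an arbitrary choice, and the helper names (`scan_fraction`, `looks_like_full_scan`, `check_pruning`) are hypothetical. A dry run reports estimated bytes processed, which may differ from the bytes actually billed, so it may not capture the intermittent behavior described above.

```python
"""Sketch: compare a query's dry-run bytes with the table's total size."""


def scan_fraction(bytes_processed: int, table_bytes: int) -> float:
    """Return the share of the table that a query is estimated to scan."""
    if table_bytes <= 0:
        raise ValueError("table_bytes must be positive")
    return bytes_processed / table_bytes


def looks_like_full_scan(bytes_processed: int, table_bytes: int,
                         threshold: float = 0.9) -> bool:
    """Flag a query whose scan size is close to the whole table.

    The 0.9 threshold is an assumption chosen for illustration only.
    """
    return scan_fraction(bytes_processed, table_bytes) >= threshold


def check_pruning(table_id: str, sql: str) -> bool:
    """Dry-run `sql` and report whether it resembles a full scan of `table_id`.

    Needs the google-cloud-bigquery package and application default
    credentials. The import is local so the helpers above stay usable
    without the client library installed.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    # dry_run estimates bytes without running (or billing) the query.
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)
    table = client.get_table(table_id)
    return looks_like_full_scan(job.total_bytes_processed, table.num_bytes)
```

It could be called as `check_pruning("my-project.my_dataset.my-table-name", "SELECT * FROM ... WHERE event_date = '2024-04-01'")`, where the project, dataset, and `event_date` partition column are placeholders rather than names from the setup described above.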
