Simba biquery driver not supporting different project other than billing project

Hi Folks,

We are using Simba BigQuery driver for one of our integration. Based on that we have two projects one which is the billing project and one where actually the dataset resides. I tried various options as per the doc https://storage.googleapis.com/simba-bq-release/jdbc/Simba%20Google%20BigQuery%20JDBC%20Connector%20... and tried to use QueryProperties and within that dataset_project_id=<dataset_proejct> however it didn't work.

Requesting veterans and experts to share suggestions how it can be defined for the driver.
Reference doc : https://developer.lumapps.com/datalake (Section : Connecting to the Data lake)

0 5 343
5 REPLIES 5

The Simba BigQuery JDBC driver has some known limitations when attempting to query datasets in a project other than the billing project. While there is no official solution to this issue, several workarounds:

Modify the Connection String: Some users have reported success by including the project ID of the dataset within the connection string. An example modification might look like this:

 

However, this approach may not work for everyone and should be tested thoroughly.

Driver Version Adjustment: There have been instances where users resolved issues by either updating to a newer version of the Simba driver or reverting to an older one. It's crucial to review the release notes for each version of the driver to understand the impact of any changes.

Utilize Alternative Client Libraries: For those who can work outside of JDBC, client libraries such as the Python client library for BigQuery offer more flexibility and do not share the same limitations. Here's a brief example of how to use it:

Driver Version Adjustment: There have been instances where users resolved issues by either updating to a newer version of the Simba driver or reverting to an older one. It's crucial to review the release notes for each version of the driver to understand the impact of any changes.

Utilize Alternative Client Libraries: For those who can work outside of JDBC, client libraries such as the Python client library for BigQuery offer more flexibility and do not share the same limitations. Here's a brief example of how to use it:

 

from google.cloud import bigquery

client = bigquery.Client(project='desired_project_id')

query = "SELECT * FROM dataset.table"

results = client.query(query).result()

for row in results:
    print(row)

Explore Third-Party Connectors: Connectors provided by third-party tools may offer alternative methods to connect to BigQuery datasets in different projects. Be aware that these connectors also come with their own set of limitations and should be evaluated for compatibility with your specific requirements.

It's important to note that these workarounds are not officially supported solutions and may not be suitable for all scenarios. They should be tested in a non-production environment to avoid any unintended consequences. 

Lastly, if you decide to implement any of these workarounds, ensure that you have a comprehensive backup and recovery strategy in place, and that you understand the potential implications for data integrity and billing.

The billing project can be set in Simba JDBC using the ProjectId property. You can specify additional data projects using AdditionalProjects. 

Thanks @dmedora , however this doesn't seem to be working for some reason. I have below parameters here. Can you please provide any further suggestions which i can try

  • The billing Project :  It is the project on which the compute power is billed by google.
  • The project hosting the data : It corresponds to your production environment where your data is located. 
  • The dataset hosting the data : external.

As per the suggestion tried to define the parameters in below options. Is there anything else that can be tried?
DefaultDataset : External
ProjectId : <Billing Project>
AdditionalProjects : <Dataset Project>

Also, i tried to define AdditionalProjects under QueryProperties : dataset_project_id=<Dataset Project>

 

Hi @akashburnwal , could you share more on what you mean by it's not working? Are there errors? What behaviour do you see?

@dmedora worked perfectly for me, thanks for sharing the AdditionalProjects parameter.
I set the ProjectId to the billing project id, and AdditionalProjects helped me get to the project holding the data sets I needed.