Best Practice: GCP Project Design for Looker Workloads

  • 17 October 2022
  • 0 replies
  • 98 views

These best practices reflect recommendations shared by a cross-functional team of seasoned Googlers. The practices are written to work for many GCP deployments, but as always use best judgment when implementing.

 

Problem

We are often asked for best practices for how to set-up projects in GCP to support scaling out different workloads running on BigQuery.  A common practice for new Data Analytics customers is to store enterprise data as well as data ingestion & analytic workloads in the same project. This practice has a few unintended consequences including: creating a bottleneck in terms of resources (as BigQuery Slots and Quotas are allocated at the project level) as well as making it more difficult to allocate costs to specific departments or applications. So, we’re here to help with some high-level recommendations for project set-up!

 

Create Separate Projects for Data and Consumption

We recommend creating a project that will be primarily used for storage. Often this project will hold the end result of a complex ETL/ELT process that ingests data from numerous sources that need to be consumed by multiple business units and applications. Having all of your “Enterprise Data Warehouse” (EDW) data in one project helps simplify applying permissions to your enterprise data assets. From there, we recommend creating projects for specific business units or workloads that will query data from the data project. Creating separate projects for data consumers is beneficial as each BigQuery on-demand project has access to their own separate pool of 2000 BigQuery slots, fresh project based quotas and project based billing. In the case of Looker, it can look like the image below:

 

Y3-5yUXd1q5eeNO3TSfUkWnxBGLFCso3EHdmR_BlnnDBQCWOui1htaBADcXcNvcwMVJCqJ0PaP_uueELmWFsKFFDPMjvCDz5tXBPTsPzZjL0v_UM7M4dd8BP4WmvCjZfc3iDG48nBsgly20u8ssqIb3tOQM7iwgCAj4lS55rFmTC1pG8moB2skdzAQM


The Set-up

In this case, our “Data” projects store the data and the “Consumption” projects will be used to issue queries. Again, benefits include slots / resources being assigned at the project level so your Looker instances are not competing against your ML pipelines, for example. So, how do we implement this?

  1. Create a project for data storage. Likely, this is already completed. See documentation for creating and managing projects here.

  2. Create project(s) for Looker workloads. We might need to segment out a different project for dev, test, prod or even for different business units and applications. This allows separation of billing and allocation of slots for Looker workloads, in an on-demand environment.

  3. Create a separate Looker Service Account in each of your “Looker Projects”. These projects will require the following roles:

    1. BigQuery JobUser to issue queries.

    2. BigQuery Data Editor for the use of persistent derived tables (PDTs).

    3. Note: For more information on BigQuery roles, see documentation here

uHDGnzwm0pPQIsvtOIqnSD6c53X77Ybr5qTvcAEiUw1b_qUpg6IKx9FDhlHeKbGRTofshggI6Gl-KLKNzbLAcLeByLJtTxjdXb88CVCiP4WG16VvEZypi3-emeJNTmCCn6J3qHckP8_PNAkgmVU-koaRPR256p8qDN7dhAZFZUBBkLmtf6mz4HLZeIw

  1. Grant each Looker Service Account DataViewer in “Data Project”. In the “Data Project”, grant the Looker Service Account BigQuery DataViewer to the required datasets or tables. 

    Note: BigQuery DataViewer can be applied at the project, dataset or table level. We always recommend the principle of least privilege when granting data access to resources.

TYQZjHwyctAQouXp3J4O-mKujNGcv6rOzQz3WJULvIBNB6V3xjVAXVTrmphC2pbcE2FbuWc0b1mnuo3i90-_F-WUVfGhb0L6zND7CxlLGI-nGVEvzUijSVFNm4blcO1W_KNYjjnr6juArXbJforpOR7NNjVkXibLVBmmGkIxC5gjJFZPHutKJ-aMWpo

  1. Optional: Create looker_scratch dataset in each “Looker Project”. If using Persistent Derived Tables (PDTs), we recommend placing the scratch schemas in the “Looker projects” as it contains all the compute resources as well application-specific workloads to that project. This has the advantage of simplifying permissions by only allowing Looker write access to its own project.  However, if PDTs are being used by other applications or workloads, a case could be made to right back to the “Data Project”. However, if this is the case, we would generally consider creating views in BigQuery instead of using PDTs. 

  2. Create Connection in Looker. 

    1. Project ID: “Looker Project”

    2. Dataset: “Data Project”

    3. Service Account: Service Account created in “Looker Project”

    4. Temp Dataset: Dataset created in “Looker Project”

0IYK9eTNc569ZlIdeVQl-OQw7ZBkdaJ6OknlB35cPN5bu8sKLSOj4Wt-b7dkk8pJhIwvOrK8wK3io9f1R2zIHusTtnsZ4ollvUdbYxkYKVd3WdeqWLaf3SBaoxgYfIeqIuNrMn8dQRX9OL3Bh87A9mbW52R0Or6MLuuDXEcWBhzu6RwliaNLiFoQL7o

  1. Test and connect! 

0 replies

Be the first to reply!

Reply