Looker Architecture

davidtamaki · ‎05-10-2022

Looker will not be updating this content and does not guarantee that everything in it is up to date.

Starting in Looker 7.12, you can deploy any Git commit SHA, tag, or branch to production with Advanced Deploy Mode. This helps consolidate repositories in multi-environment developer workflows, where each environment points to a different version of a codebase. It also gives greater control to one or a few developers or administrators over the changes that are deployed to production.

In another article, we discussed a multistage development framework in Looker, focusing on the pros and cons of running multiple Looker instances. This article covers the Looker architecture required for customer-hosted Looker deployments and the steps to create new instances.

Looker Architecture

Looker can be customer-hosted or hosted by Looker in an AWS VPC. Using a Looker-hosted instance greatly reduces the effort required to install, configure, and maintain the Looker application as all necessary IT functions related to the Looker application are handled for you. If, however, you have the necessary resources to host Looker on your own on-premises or cloud-based equipment, then the installation steps can be found here.

The architecture design of Looker consists of a Linux server that has the following connections:

Note that hosting the Looker application is independent of where your data resides; data always remains in-database and is not copied to the Looker instance (1). Further details on data security can be found here. Take note of the minimum server specifications listed in the installation steps, as well as the recommended Java memory to allocate to Looker. In addition, the Looker application requires outbound network access for authorization, backups, email relay, Git, and license checks (5).

By default, Looker uses a HyperSQL in-memory database as the application's internal database. On busy instances, this database can grow to be gigabytes in size, which can lead to performance issues. For these large deployments, customers should replace the HyperSQL database with a MySQL database backend (6).
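As a rough way to see whether an instance is approaching the sizes described above, the sketch below sums the on-disk size of the internal database files. It assumes the HyperSQL files live in a .db directory under the Looker directory; the 1 GiB threshold and the function names are illustrative assumptions, not Looker recommendations.

```python
import os
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file() and not path.is_symlink():
                total += path.stat().st_size
    return total


def should_consider_mysql(db_dir: Path, threshold_bytes: int = 1024 ** 3) -> bool:
    """Return True when the internal database directory has reached the threshold.

    The default threshold (1 GiB) is a hypothetical cutoff chosen for this
    example; pick a value that matches your own performance observations.
    """
    return directory_size_bytes(db_dir) >= threshold_bytes
```

For example, should_consider_mysql(Path("looker/.db")) could be run from a scheduled job, where looker/.db is a placeholder for the actual location on your server.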

Furthermore, customers can achieve high availability and efficient traffic control on Looker by deploying a cluster of Looker nodes behind a load balancer. For a production instance with a goal of 100% uptime, this is the recommended approach:

Clustered Looker configurations require a MySQL internal database. Please refer to our tutorial for the recommended method of creating a clustered Looker configuration.

Creating New Instances

For customer-hosted Looker deployments, the easiest way to set up additional instances is to compress and archive the entire Looker directory of your existing production instance (e.g. tar -czvf looker.tar.gz looker/). You'll need to do the following:

Ensure that the looker directory is the top-level directory in the archive.
Remove any extra JAR files (only the most current JAR file is required).
Include all hidden files/folders in the archive.
Check that .db is included, unless using a MySQL backend. If using a MySQL backend, clone your existing database and follow our instructions for setting up a MySQL database.
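The checklist above can be sketched in Python as an alternative to running tar by hand. This is a minimal sketch under stated assumptions: the function name, the skip_jars parameter, and the example file names are hypothetical, and you must decide yourself which JAR files count as outdated. It only uses the standard library tarfile module.

```python
import tarfile
from pathlib import Path
from typing import Iterable, List, Optional


def archive_looker_dir(
    looker_dir: Path,
    archive_path: Path,
    skip_jars: Iterable[str] = (),
    require_db: bool = True,
) -> List[str]:
    """Create a gzip-compressed tar archive with "looker" as the top directory.

    skip_jars lists file names (not paths) to leave out, for example older
    JAR files. archive_path should be outside looker_dir so the archive does
    not try to include itself.
    """
    looker_dir = Path(looker_dir)
    skip = set(skip_jars)

    # With the default HyperSQL backend, the internal database must be shipped.
    # Pass require_db=False when the instance uses a MySQL backend instead.
    if require_db and not (looker_dir / ".db").exists():
        raise FileNotFoundError(f"{looker_dir / '.db'} not found")

    def keep(info: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
        # Returning None tells tarfile to leave the entry out of the archive.
        if Path(info.name).name in skip:
            return None
        return info

    with tarfile.open(archive_path, "w:gz") as tar:
        # The recursive add includes hidden files and folders.
        tar.add(looker_dir, arcname="looker", filter=keep)

    # Re-open the archive and return its member names for inspection.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

The returned list of member names makes it easy to confirm that looker/.db and other hidden entries made it into the archive before copying it to the new server.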

If the production instance has a lot of unnecessary metadata not required in other environments, it may be preferable to create a new instance from scratch. In this case, follow our customer-hosted installation steps to create a new instance.

If you choose to create new instances from scratch, we recommend using the Looker API to store JSON configuration files which can be rapidly loaded in each environment. This is particularly useful for managing database connections, user groups, roles, user attributes, folders (called "Spaces" before Looker 6.20), schedules, and other metadata. Please note that each Looker instance will require a distinct license key.
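One possible shape for this workflow is sketched below: a JSON file holds the desired objects (here, anything with a "name" key, such as user groups), and a small function decides whether each one needs to be created or updated in the target environment. The file layout, the function names, and the create and update callables are all assumptions made for illustration; the callables stand in for whichever Looker API client calls you use and are not part of any Looker library.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def load_config(path: Path) -> List[dict]:
    """Read a JSON file that contains a list of objects, each with a "name" key."""
    with open(path, encoding="utf-8") as fh:
        items = json.load(fh)
    if not isinstance(items, list):
        raise ValueError("expected a JSON list of objects")
    return items


def apply_config(
    desired: Iterable[dict],
    existing_names: Iterable[str],
    create: Callable[[dict], None],
    update: Callable[[str, dict], None],
) -> Dict[str, List[str]]:
    """Create objects missing from the target instance and update the rest.

    create and update are hypothetical callables supplied by the caller;
    they should wrap the real API requests for the object type being synced.
    """
    existing = set(existing_names)
    result: Dict[str, List[str]] = {"created": [], "updated": []}
    for item in desired:
        name = item["name"]
        if name in existing:
            update(name, item)
            result["updated"].append(name)
        else:
            create(item)
            result["created"].append(name)
    return result
```

Keeping the API calls behind injected callables means the same JSON files and the same logic can be run against each environment, and the function can be exercised with stand-in callables before pointing it at a real instance.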

As mentioned in the associated article on multistage development frameworks, managing multiple instances will require a strong development team with knowledge of Git and the Looker API. If you are self-hosting the Looker instances, then further DevOps knowledge will be a requirement. We strongly recommend that you have a code deployment strategy in place before embarking on this path.
