Best Practice: Optimize Database Setup for Reading Data

  • 10 May 2022
  • 0 replies
  • 324 views

Userlevel 1

Looker will not be updating this content, nor guarantees that everything is up-to-date. 

 

These best practices reflect recommendations shared by a cross-functional team of seasoned Lookers. These insights come from years of experience working with Looker customers from implementation to long-term success. The practices are written to work for most users and situations, but as always use best judgment when implementing.

Optimizing Column Store Databases


Some recommendations for optimizing column store database setup for running SELECT statements, (reads) include:

  • Utilize column encoding and compression. This saves on space and improves processing speed. If column encoding/compression is not utilized, database resources will be more scarce than they need to be, limiting cluster performance and increasing storage costs.
  • Gather statistics on all tables regularly. Analyzing tables will update column-level metadata.
  • Vacuum tables on a regular basis. This reclaims space and resorts rows in either a specified table or all tables in the current database.
  • Limit the number of columns returned in queries in Looker. Queries will execute faster when operating on a relatively small subset of columns.
  • Maximize the use of filters and aggregation in queries. Issue queries that reduce many rows to relatively few in the result set (i.e., lots of filtering and aggregating). To ensure that users apply filters by default within Explores, include Explore-level filters such as the always_filter, conditionally_filter, or sql_always_where parameters. The sql_always_where parameter applies a filter condition to the underlying SQL on the back end, and is not exposed to users in a front-end filter.
     

Optimizing Row Store Databases

 

Some recommendations for optimizing row store database setup for running SELECT statements (reads) include:

  • Normalize database schemas as much as possible. Transactional databases perform best when redundancy is minimized. Organize data in normalized Snowflake schemas, in which all tables are as small as possible.
  • Restructure data to reduce the use of sparsely populated tables. A sparse data matrix can impact overall performance. Rather than storing data in a single, sparse table, separate out columns into separate, more populated tables wherever possible. For extremely sparse data tables, consider setting up an entity-attribute-value schema.
  • Consider handling some complex calculations within the database in advance, and storing aggregate tables to perform these calculations within the database. However, continue to utilize the raw data whenever possible, as this supports greater drill down capability and Explore flexibility.
  • Ensure that indexes are present on all tables within the database. Indexes are vital, because they enable the database to order columns so it can find different rows on the disk quickly. However, ensure that superfluous indexes aren't added, as indexes do take up space on disk.
  • Place indexes on foreign keys for commonly joined tables and on timestamps for time-ordered tables.
  • Lower the isolation level within the database. Lowering the isolation level decreases replication lag and moderates the database's use of locking conditions.

0 replies

Be the first to reply!

Reply