Question

Deprecation Notice: Redshift Distribution Style (3.36+)

  • 22 March 2016
  • 3 replies
  • 89 views

Userlevel 3

As of Looker 3.36, when users of Amazon Redshift create a persistent derived table, the Redshift distribution style will default to ALL instead of EVEN.


Description


Amazon Redshift allows database rows to be distributed in one of 3 ways:




  • All Distribution: All rows are fully copied to each node.


  • Even Distribution: Rows are distributed to different nodes in a round-robin fashion.


  • Key Distribution: Rows are distributed to different nodes based on unique values within a particular column.


Before 3.42, Looker defaulted to EVEN distribution, but now defaults to ALL if you do not specify a style using the distribution_style parameter, or the distribution parameter.


Going Back to an EVEN Distribution


If you would like to go back to using an EVEN distribution on a derived table, you can do so by using the distribution_style parameter as follows:


- view: customer_order_facts
derived_table:
sql: |
SELECT
customer_id,
COUNT(*) AS lifetime_orders
FROM
order
persist_for: 24 hours
distribution_style: EVEN

3 replies

sdhoover will there be a way to default to an EVEN distribution for our instance of Looker?


Many of our developers do not follow the Looker releases and don’t update their code according to new best practices - this release will cause the storage requirements on our Redshift instance to increase by a non-trivial factor. A lot of them rely heavily on derived tables, which now will use N times more storage, where N is the number of nodes in the cluster…

Is the assumption that most PDTs are “kinda big” but not very big, so the redshift reshuffle for a other than distribution key join is a more common and painful problem all your pdts change and use too much space?

Why is all the choice? I’m trying to understand why join optimization is the better choice vs space optimization?

Reply