Typically, the primary timestamp or date field of any given table will be the best candidate for your sortkey. This helps Redshift speed up queries that are sorted or limited by that column.
For tables that join in other tables with a great deal of frequency (such as fact tables in the star schema world), it’s best to set your distribution key to be the foreign key with the greatest cardinality. For tables that join into other tables (such as dimension tables in the star schema world), it is best to set the primary key to be the distribution key.
More about Sort & Distribution Keys
Redshift has a page on how to best choose sort and distribution setups depending on data configuration.
Analyzing Table Design
Redshift has a nice page with a script that you can run to analyze your table design. The script checks if you’ve got sort keys, distribution keys, and column compression dialed in.
As you update tables, it’s good practice to vacuum. This is because newly added rows will reside, at least temporarily, in a separate region on the disk. (You may be able to specify a
SORT ONLY VACUUM in order to save time)
To learn more about optimizing performance in Redshift, check out this blog post by one of our analysts.
Here is a view of that script for analyzing table design in case anyone wants it.
yes, I created a view on this also,
Thanks Scott, I’ll check those out. I have just started building some auditing views, so I don’t have much yet.