Question

Indexes on multiple columns in derived table


Userlevel 2

Hi,


The documentation is a bit unclear about indexes on multiple columns in a derived table. The Derived Tables page says:



add indexes on customer_id and first_order_date, like this:…
indexes: [customer_id, first_order_date]



On the other hand, the indexes page says:



with an index on customer_id and date:…
indexes: [customer_id, date]



Note the difference! According to the Derived Tables page, there are indexes on the two columns, one on each. According to the indexes page, there’s a single index on the two columns. This is a significant difference (in MySQL at least); specifically, a single multicolumn index serves as an index not only on the combination of columns but also on any initial subset of that combination, but not on any other subset. Therefore, if this is a multicolumn index, then the order columns are specified in matters. (Again, this is true for MySQL. I don’t know what other dialects it might be true in.)


So my questions are:



  1. Which documentation page is correct: does the indexes: [foo, bar] syntax create a single multicolumn index or two indexes?

  2. Can the incorrect documentation page be emended?

  3. If indexes: [foo, bar] creates a multicolumn index, then is there a way to create separate indexes on various columns?



  • If there isn’t a way, then can there be, please? (This is an important feature request, because a multicolumn index, as I mentioned, is useful only when using an initial subset of its columns. Again, this is true of MySQL.)



  1. If indexes: [foo, bar] creates separate indexes, then is there a way to create a multicolumn index?



  • If there isn’t a way, then can there be, please?


I propose a fairly mnemonic syntax for specifying a multicolumn index on foo and bar in that order, a multicolumn index on baz and quux in that order, and separate indexes on xyzzy and mumble, as follows: indexes: [foo, bar], [baz, quux], xyzzy, mumble


11 replies

Userlevel 6
Badge
indexes: [foo,bar]

creates two separate indexes on two separate columns.


Your suggestion


indexes: [[foo,baz],bar]

Is a great one. We should look into supporting it @mtoy .

Userlevel 3

I’ve clarified the behavior of indexes in both the Derived Table doc and indexes doc.

I know this is an old thread, but is there a preferred workaround for this? I can add indexes manually, but wont those get killed for derived tables that get rebuilt automatically?

Userlevel 4
Badge

@ifightcrime you are correct that manually created indexes will be overwritten when the derived table is rebuilt. There isn’t a preferred workaround at this time. You could concat the desired columns together and use that as an index, but then the index won’t be used if you filter or join on just one of the fields instead of the index.


We appreciate you posting here though - it gives us an idea of who wants this feature. As Lloyd said, this is a great idea and we’re looking into supporting it.

Thanks for the reply @sam. I suppose one option is to have three indexes, two for individual fields (if even needed), and one for the combined. I’ll mess around with that and see what I can come up with. If that doesn’t work I think we’ll just pull this out into a cron or something on one of our servers.

Userlevel 4
Badge

Sounds reasonable - let us know if you come up with something that works really well for your use case!

+1 for us on including the option to add a composite index for PDTs

Userlevel 1

Thanks, Noah! I’ve let our product team know that this is something you’d like to see!

We are also stuck with a MySQL database for one of our products. Adding a composite index increases the speed of a join by a factor of 1000 for one of our derived tables. So, yes, +1! for the addition of composite indices.

+1 for composite indices on PDT.

I realize this is an old thread but I just saw it today and thought I’d provide a work around to be able to create multi column indexes in looker pdts… 

 

Looker gives us the ability to define pdts using the create_process syntax. (https://docs.looker.com/reference/view-params/create_process)

 

We do this quite often.  The pdt engine has it’s issues at times but they seem to be constantly working to improve things.

 

Initially we created a pdt and then altered the table and created indexes on it but lately I’ve found it much more efficient to create the table and then pump the data for the pdt into it in subsequent steps.  here’s a trivial example: (using MySQL 5.7.x)

 

  derived_table: {
    create_process: {
 

      sql_step:
      -- create base fact transaction data table
        CREATE TABLE ${SQL_TABLE_NAME} (
          `description` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
          `user_id` int(11) NOT NULL DEFAULT '0',
          `company_id` int(11) NOT NULL DEFAULT '0',
          `transaction_id` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
           `price` decimal(14,4) DEFAULT 0,
          `posted_at` datetime DEFAULT NULL,

          PRIMARY KEY (description, transaction_id),

          KEY idx_user_posted (user_id, posted_at)
        ) ENGINE=InnoDB DEFAULT CHARSET=latin1;;

      sql_step:
        INSERT INTO ${SQL_TABLE_NAME}
          SELECT
            tbl1.description, tbl2.user_id, tbl2.company_id,
            tbl2.transaction_id,
            tbl2.price / 100 as price,
            tbl2.posted_at

          FROM my_basic_fact_tbl tbl1
            inner join fact_data_details as tbl2 ON tbl2.company_id = tbl1.company_id and tbl2.transaction_id = tbl1.transaction_id

            ;;   

}
    datagroup_trigger: a_defined_datagroup
  }
 

 

This is an attempt to dumb down some existing code to show the technique.  no attempt at db design etc..  but you should be able to get the idea from this.  Your mileage may vary but we use this a lot these days.  I hope it’s helpful

 

Reply