Aggregate based on a column value (SQL subselect)

  • 18 October 2016
  • 3 replies

We have a table which pulls in rows of data each time our continuous integration server runs tests. It imports ~500 rows for each CI run, one for each source code file, and for each one shows the test coverage metrics ‘# lines tested’, and ‘# lines in file’, and file category (things like ‘model’, ‘controller’)

Schema is like:

  • timestamp

  • file_name

  • num_tested_lines

  • num_relevant_lines

  • file_category

What I’m trying to do is show a snapshot of ‘% coverage by category’

However, I can’t figure out how to structure things so I’m just looking at the “most recent version” of each file.

In plain SQL, to figure out coverage for controllers, I would write:

SELECT 100.0*SUM(num_tested_lines) / SUM(num_relevant_lines)
FROM code_coverage_reports
WHERE file_category = 'controller'
AND timestamp = (SELECT max(timestamp) from code_coverage_reports)

It’s the timestamp filtering that I can’t figure out in Looker. What I’m asking semantically is “when doing the aggregation, only consider the last set of matching rows”. I feel like there’s a WINDOW trick for this, but I haven’t been able to figure it out.

Any pointers?

3 replies

Userlevel 5

Hi @wkoffel, one thing you might try is putting that subselect inside a yesno dimension. Something like this example:

Then we can query it in an explore like so:

With the yesno dimension made, you could apply it as a filter to the measures the same way you were applying the file_category = 'controller' logic.

Thanks @sam, I ended up going with your suggestion, I hadn’t considered actually just putting the sub-select in there, but makes sense.

Userlevel 5

Nice, glad to hear it worked for you @wkoffel!