BigQuery Table Sampling: LookML Syntax Explanation

  • 4 August 2021
  • 0 replies
  • 35 views

Userlevel 2
  • Looker Staff
  • 12 replies

Table sampling is a feature that enables users to query random subsets of data from a large BigQuery tables, which may be useful in a data science / machine learning workflow (you can read about BigQuery Machine Learning (BQML) in Looker here). 

 

As of August 2021, it seems that TABLESAMPLE SYSTEM does not support aliasing table. The following query:
 

SELECT * FROM dataset.my_table TABLESAMPLE SYSTEM (10 PERCENT) AS mytable

will result in this error: 

​​​​​​​notFound: Not found: Table dataset.my_table TABLESAMPLE SYSTEM (10 PERCENT) was not found in location US


With that being said, we would need to make a derived table to get around with the alias issue. Inside the view for the derived table, I have added a parameter to help with adjusting the value for selection condition: For example, if we put in 10 in the percent filter in the explore front end, the generated SQL would be `TABLESAMPLE SYSTEM (10 PERCENT)`, and BigQuery will return 10% of the data.

 

explore: tablesamplet_system {}

view: tablesamplet_system {

derived_table: {
sql: SELECT * FROM bigquery-public-data.google_analytics_sample.ga_sessions_20170801 TABLESAMPLE SYSTEM ({%parameter percent %} PERCENT) ;;
}

parameter: percent {
type: number
}

dimension: visitId {
type: number
primary_key: yes
}

}

 

Final result: 

 


Please share if you have cool use cases on table sampling!​​​​​​​


0 replies

Be the first to reply!

Reply