Do you want to enable your Looker users to leverage large language models on your text data?
This analytic block walks you through how to leverage the ML.GENERATE_TEXT function in LookML to perform generative natural language tasks on text data stored in BigQuery. Example language tasks include the following:
The full block can be found here, and the following example is built on the CFPB Consumer Complaint Database public dataset.
constant: big_query_model_name {
value: "YOUR_PROJECT_NAME.DATASET_NAME.MODEL_NAME"
}
constant: generate_text_table_name {
value: "bigquery-public-data.cfpb_complaints.complaint_database"
}
constant: generate_text_primary_key {
value: "complaint_id"
}
constant: generate_text_text_field {
value: "consumer_complaint_narrative"
}
The manifest file contains the constants used in the LookML project. Replace the big_query_model_name with the model you have created in step 1.
view: generate_text {
derived_table: {
sql:
SELECT
@{generate_text_primary_key}
, prompt
, ml_generate_text_result
FROM ML.GENERATE_TEXT(
MODEL `@{big_query_model_name}`, (
SELECT
@{generate_text_primary_key}
, CONCAT({% parameter prompt_input %},': """', @{generate_text_text_field}, '"""') AS prompt
FROM @{generate_text_table_name} AS model_query
WHERE TRUE
{% if complaint_database.company_name._is_filtered %} AND {% condition complaint_database.company_name %} model_query.company_name {% endcondition %} {% endif %}
...
),
STRUCT(
{% if max_output_tokens._parameter_value > 1024 or max_output_tokens._parameter_value < 1 %} 50 {% else %} {% parameter max_output_tokens %} {% endif %} AS max_output_tokens
, {% if temperature._parameter_value > 1 or temperature._parameter_value < 0 %} 1.0 {% else %} {% parameter temperature %} {% endif %} AS temperature
, {% if top_k._parameter_value > 40 or top_k._parameter_value < 1 %} 40 {% else %} {% parameter top_k %} {% endif %} AS top_k
, {% if top_p._parameter_value > 1 or top_p._parameter_value < 0 %} 1.0 {% else %} {% parameter top_p %} {% endif %} AS top_p
)
)
;;
}
This derived table shows you the SQL pattern to leverage the ML.GENERATE_TEXT function in LookML and take user input as parameters for the text prompt and function arguments.
The liquid _is_filtered conditions are used to mirror any filters on the base table to ensure that the Vertex AI APIs are only run on the final result set for performance reasons.
There are five parameters defined on this view:
prompt_input - This is where users can ask their question. The text will be prepended to the text field that is being analyzed by the model. Prompt design can strongly affect the responses returned by the model. For more information, see Overview of text prompt design.
max_output_tokens - This sets the maximum number of tokens that the model outputs. Specify a lower value for shorter responses and a higher value for longer responses.
temperature - This is used for sampling during the response generation, which occurs when top_k and top_p are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results.
top_k - This changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses.
top_p - This changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses.
Review the arguments section for a full description of each parameter.
For the model results, the ML.GENERATE_TEXT function outputs a column labeled ml_generate_text_result in the following JSON format:
{
"predictions": [
{
"citationMetadata": { "citations": [] },
"content": "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt...",
"safetyAttributes": { "blocked": false, "categories": [], "scores": [] }
}
]
}
The content key contains the text result of the function, and we can extract that value in LookML using the JSON_VALUE function:
dimension: text_result {
type: string
sql: JSON_VALUE(${TABLE}.ml_generate_text_result, '$.predictions[0].content') ;;
html:<div style="white-space:pre">{{value}}</div>;;
}
connection: "YOUR_BQ_CONNECTION_NAME"
include: "./*.view.lkml"
datagroup: consumer_complaint_database_default_datagroup {
max_cache_age: "1 hour"
}
persist_with: consumer_complaint_database_default_datagroup
explore: complaint_database {
always_filter: {
filters: [
generate_text.prompt_input: "",
generate_text.max_output_tokens: "",
generate_text.temperature: "",
generate_text.top_k: "",
generate_text.top_p: ""
]
}
join: generate_text {
type: left_outer
relationship: one_to_one
sql_on: complaint_database.@{generate_text_primary_key} = generate_text.@{generate_text_primary_key} ;;
}
}
This is where we join our derived table to the base table. Replace the connection with the name for your BigQuery connection set up in Looker.
Now that we have the LookML files ready, we can query the dataset directly from the Looker Explore 🎉
Note that there is a quota of 60 requests per minute for the ML.GENERATE_TEXT function (i.e. ~1 minute per 60 rows), so keep result sets small in order to have performant queries.
You can easily adapt this block to your own data by: