I have log sink where i'm routing jobservice.jobcompleted to Pub/Sub. Idea of this is to check that all the jobs have labels defined. As you can't define labels for jobs that are run from the BQ UI i'm trying to figure out best way to ignore those runs. Only thing i have noticed from the data that could be used is the userAgent information.
Edit:
You can actually define the labels also on the UI side using the system-variable @@query_label eg.:
SET @@query_label = "environment:dev, application:test"; <query>;
But this then creates two jobservice.jobcompleted events where the other one contains the labels but other doesn't so this would trigger missing labels even when they are there and didn't find any sensible info to tie these events together. So i would rather skip everything coming from the UI.
Hi @joonvena1 ,
Here's a possible approach that combines both filtering and processing:
1. Pub/Sub Filtering:
Start by setting up a filter on your Pub/Sub subscription to exclude messages that likely originate from the UI. Here's an example you can adapt:
attributes.type="jobservice.jobcompleted" AND NOT attributes.userAgent CONTAINS "google-cloud-sdk" AND NOT attributes.userAgent CONTAINS "BigQuery Java"
2. Cloud Function:
Use a Cloud Function triggered by the filtered messages to perform more detailed checks:
Here's a Python example for the Cloud Function:
import base64
import json
import re
def process_job_completion(event, context):
message_data = base64.b64decode(event['data']).decode('utf-8')
message = json.loads(message_data)
user_agent = message.get('attributes', {}).get('userAgent', '')
job_id = message.get('attributes', {}).get('jobId', 'Unknown')
ui_patterns = [r'^Mozilla/', r'^Chrome/']
is_ui_job = any(re.match(pattern, user_agent) for pattern in ui_patterns)
labels = message.get('attributes', {}).get('labels', {})
has_labels = bool(labels)
if not has_labels and not is_ui_job:
print(f"Alert: Job {job_id} completed without labels and is not UI-originated.")
elif is_ui_job:
print(f"Info: Job {job_id} is from UI; label check is skipped.")
else:
print(f"OK: Job {job_id} has labels.")
return {
'jobId': job_id,
'userAgent': user_agent,
'isUiJob': is_ui_job,
'hasLabels': has_labels
}
Additional Considerations:
userAgent
strings, thus necessitating updates to your filter criteria.Hi @ms4446 thanks for this. This is something that i already have in place. Was just wondering if there would be some other attribute / way than using the userAgent information to distinct if the query is coming from UI. I currently have the processing done inside Cloud Run but your suggestion 1 makes more sense to just filter it on the Pub/Sub subscription level. Much easier to update this compared to the processing application.
Cant actually use the Pub/Sub filtering because it can only access the message attributes and not the actual data that is in the message. As the event is coming through the logging sink i can't affect the message attributes. Don't want to make another processing step that would insert the userAgent informaton to the message attributes so i'm going to go with the solution 2 and still utilize the Cloud Run.
Hi @joonvena1 ,
Given that Pub/Sub filtering isn’t viable because it only accesses message attributes and not the payload where your userAgent information is located, focusing on refining your existing Cloud Run solution is a sound strategy.
Since you’re interested in exploring alternatives to using the userAgent to distinguish UI-originated queries, here are a couple of additional ideas you might consider:
Log Analysis Enhancements:
Integration with IAM Policies:
Enhanced Cloud Logging Filters:
Using Audit Logs:
Each of these strategies has its own set of challenges and would require some experimentation to refine. However, they could potentially provide you with additional filters or mechanisms to accurately segment the jobs as you intend.
If these alternatives still don’t meet your needs, maintaining and enhancing the Cloud Run process to handle complex conditions and transformations, as you mentioned, is likely your best path forward.
User | Count |
---|---|
1 | |
1 | |
1 | |
1 | |
1 |