Looker UI response time testing with Python and Selenium

  • 24 December 2021
  • 2 replies
  • 64 views

Userlevel 2

We self-host Looker (“on-prem”) and have encountered several issues where Looker’s performance degraded into 50s response times per page load. We eventually identified the problem as being the bandwidth between the databases and our Looker instance. It was 2gbps and we needed around 15gbps.

Given we white-label Looker, our our company has decided to use the load time of our primary Explore as a proxy for our customer’s UX. In such, I’ve written some Python to load the Explore on repeat, collecting load times and saving them to a database.

This code does the following:

  1. Opens your Looker instance’s login page.
  2. Logs in.
  3. Enters OTP (2FA) code
  4. Starts a timer, loads your main Explore, waits until the “Run” button is ready, stops the timer.
  5. Saves the load times to a MySQL / MariaDB DB.
  6. Shuts down the browser session.

Python script

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Nov 20 09:48:54 2020

@author: C. Thomas Brittain

"""
####################
# Libraries
####################
import os

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

import pyotp # Handles 2FA
from sqlalchemy import create_engine # Stores results
from datetime import datetime
import pandas as pd
import json
from time import sleep

####################
# Parameters
####################

# Instance information.
CHROME_USER_NAME = "looker_performance"
LOOKER_URL = "YOUR_LOOKER_INSTANCE"
EMAIL = "YOUR@EMAIL.com"

# Paths
HOME = os.environ["HOME"]
CHROME_DRIVER_PATH = HOME + "/looker-ui-tests/chromedriver"

# Session specific.
TIMEOUT_SECS = 15 * 60
SLEEP_BETWEEN = 0
RENEW_SESSION_THRESH = 500

# Database info.
OUTPUT_TABLE_NAME = "looker_load_times"
HOST = os.environ["LOOKER_PERF_TEST_DB_HOSTNAME"]
DB_NAME = os.environ["LOOKER_PERF_TEST_DB_NAME"]
USER_NAME = os.environ["LOOKER_PERF_TEST_DB_USER_NAME"]
DB_PASS = os.environ["LOOKER_PERF_TEST_DB_PASSWORD"]

###################
# Login Info
###################
PASS = ""
OTP_KEY = ""
with open(f"{HOME}/.credentials/config.json", "r") as f:
tmp = json.load(f)
PASS = tmp["looker_ui_password"]
OTP_KEY = tmp["looker_otp_key"]

####################
# DB Connection
####################

MARIADB_CONNECTION = f"mysql+pymysql://{USER_NAME}:{DB_PASS}@{HOST}:3306/{DB_NAME}"
engine = create_engine(MARIADB_CONNECTION, echo=False)

####################
# Setup
####################

# Path
PROFILE_PATH = f"{HOME}/.config/google-chrome/{CHROME_USER_NAME}/"

# Major paths.
LOGIN_PATH = LOOKER_URL + "/login"
CONNECTION = "MODEL_NAME"
EXPLORE_NAME = "EXPLORE_NAME"
EXPLORE_TO_TEST = LOOKER_URL + f"/explore/{MODEL_NAME}/{EXPLORE_NAME}"

# Add Chrome Options.
options = webdriver.ChromeOptions()
options.add_argument("--ignore-certificate-errors")
options.add_argument("—-incognito")
options.add_argument("--headless")

###################
# Actions
###################


def check_for_login_page(browser: webdriver.Chrome):
wait_for_login_screen = 15
try:
WebDriverWait(browser, wait_for_login_screen).until(
EC.visibility_of_element_located(
(
By.XPATH,
"//form[@class='form-horizontal lk-simple-form ng-pristine ng-valid ng-scope ng-valid-email']",
)
)
)
return True
except TimeoutException:
return False


def get_2fa():
return pyotp.TOTP(OTP_KEY).now()


def enter_2fa(browser: webdriver.Chrome):
try:
# Enter OTP
otp_input_box = browser.find_element_by_name("totp")
otp_input_box.send_keys(get_2fa())

# Click trusted box.
browser.find_element_by_name("remember").click()

# Submit.
browser.find_element_by_xpath(
'//*[@id="lk-layout-simple-wrapper"]/form/div[2]/div/input'
).click()

return True
except:
return False


def looker_login(browser: webdriver.Chrome):

# Try to get Main Explore
browser.get(LOGIN_PATH)

try:
# Enter username
email_input_box = browser.find_element_by_id("login-email")
email_input_box.send_keys(EMAIL)

# Enter password
password_input_box = browser.find_element_by_id("login-password")
password_input_box.send_keys(PASS)

# Click remember box.
browser.find_element_by_name("remember").click()

# Submit
browser.find_element_by_id("login-submit").click()

return True
except:
return False


def load_main_explore(browser):

# If the test times out, ensure
seconds_to_fully_loaded = TIMEOUT_SECS

try:
start_time = datetime.now()
browser.get(EXPLORE_TO_TEST)

WebDriverWait(browser, TIMEOUT_SECS).until(
EC.visibility_of_element_located(
(By.XPATH, "//button[@class='btn ng-binding btn-default']")
)
)
except:
pass

fully_loaded_time = datetime.now()
return {
"start_time": start_time,
"end_time": fully_loaded_time,
"type": "load_time",
}

def initialize_browser() -> webdriver.Chrome:
browser = webdriver.Chrome(
executable_path=CHROME_DRIVER_PATH, options=options
)
try:
wait_for_login_screen = 15 * 60

if not looker_login(browser):
quit("Login failed")

if not enter_2fa(browser):
quit("Failed OTP.")

except Exception as e:
print(e)

return browser


##############################
# Main Explore Load Test
##############################

test_counter = 0

# Create the browser
browser = initialize_browser()
try:
while True:
####################
# Login
####################
if test_counter > RENEW_SESSION_THRESH:
# Refresh
browser.close()
browser = initialize_browser()

####################
# Test
####################
df = pd.DataFrame.from_records([load_main_explore(browser)])

####################
# Save
####################
df.to_sql(OUTPUT_TABLE_NAME, con=engine, if_exists="append", index_label="id")
sleep(SLEEP_BETWEEN)

test_counter += 1
except Exception as e:
print(e)
# Cleanup
browser.close()

A few setup notes.

This script is daemonized on a Red Hat (REHL) distribution box and is meant to run headlessly (no UI). The service contains most of the sensitive variables, such as DB password, etc. The script above should be stored at `/home/linux_user/looker-ui-tests/looker_ui_tests.py`. Or you will need to update the service file’s `ExecStart` to direct a different location.

Here is the service file:

[Unit]
Description=Looker UI Performance Test
After=network.target

[Service]
Environment="LOOKER_PERF_TEST_DB_HOSTNAME=DB_IP_HERE"
Environment="LOOKER_PERF_TEST_DB_USER_NAME=DB_USERNAME"
Environment="LOOKER_PERF_TEST_DB_PASSWORD=DB_PASSWORD"
Environment="LOOKER_PERF_TEST_DB_NAME=DB_NAME"
Type=simple
User=linux_user
ExecStart=/bin/python3 /home/linux_user/looker-ui-tests/looker_ui_tests.py
Restart=on-abort

[Install]
WantedBy=multi-user.target

It can be installed running

sudo nano /etc/systemd/system/looker_ui_tests.service
[paste service text from above, replacing passwords etc]
sudo systemctl start looker_ui_tests.service
sudo systemctl enable looker_ui_tests.service

You will also need to create a test Looker user. I would highly recommend this user’s credentials be limited, for security reasons.

The Python script depends on Python3 (sorry 2.7 diehards) and the following packages.

pip3 install selenium
pip3 install pyotp
pip3 install sqlalchemy
pip3 install pandas

The script itself will load the Looker user’s credentials from a JSON file at `/home/linux_user/.credentials/config.json`. It must contain two entries.

{
"looker_ui_password": "LOOKER_USER_PASSWORD",
"looker_otp_key": "LOOKER_OTP_KEY"
}

These would probably be better stored in the service file as environment variables. Or if your infrastructure has a more secure cloud solution, like AWS Secrets Manager, that’d be best.

To get the user’s OTP, I reset the test user’s 2FA and grabbed a picture of the QR Code when I logged in. I then used the following site:

To get the OTP key.

Also, selenium is reliant on a chromedriver. You will need to download it to your node running the script and change the constant in the script to point to the directory.

CHROME_DRIVER_PATH = HOME + "/looker-ui-tests/chromedriver"

Lastly, you will need a database for the result times to be stored. Here’s a MySQL snippet which is compatible.

CREATE DATABASE looker_performance;

CREATE TABLE looker_load_times (
id INT NOT NULL AUTO_INCREMENT KEY,
start_time TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00',
end_time TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00',
type TEXT
);

CREATE USER 'looker_performance'@localhost IDENTIFIED BY 'PASSWORD_HERE';
GRANT ALL PRIVILEGES ON looker_performance.* TO 'looker_performance'@'localhost';

FLUSH PRIVILEGES;

I’d provide the LookML for the model, but, well, that part I bet people here have covered. ;)

This code was written as MVP, it will have issues. Feel free to provide suggestions and I’ll update the sample as needed.


2 replies

Considering you’re a data analyst, it makes sense why you’d go with the selenium solution, rather than using puppeteer. (python, obviously :p)

Buuuuuuut I’d encourage you to look into using puppeteer for this use case. While Google maintains them both, puppeteer is newer, easily records a verbose execution timeline of each run (which probably would have saved you guys some time on the whole Bandwidth-Issue-Masquerading-as-a-CPU-Bottleneck fiasco), and gives closer to real world and real user results time-wise. Selenium works very well for bug squashing across different possible device/UA pairs; Puppeteer works EXCEPTIONALLY well on apps like looker and can give better fine grained data on how long each function of the process, on frontend and backend, took to execute, as if it were a real instance of Chromium on desktop running at whatever you set the viewport for (which matters for looker)

Edit: arg, i’d really like to edit this for clarity (and rambling), but I gotta get back to work. I hope the point is legible enough!

Userlevel 2

Hey @topato!

Actually, I’m a data engineer. I prefer TypeScript for solving such problems. Unfortunately, my fault here was ignorance. The Puppeter framework looks much more robust and easier to use than Selenium. Thank you greatly for sharing.

Cheers!

 

Reply