Data science & analytics
Everything from mean() to machine learning
I have some raw wishlist data in a simple form (timestamp, user_id, item_id, added), where the final item is just a +1 or -1. The data isn't that clean, of course, but that's effectively what I have :) I want to be able to visualize both the day-to-day changes and the running totals on a line chart. Having the running total is a bit complicated, and I can't figure out the best way to do it in Looker. In SQL, I would do something like:

SUM(added) OVER (PARTITION BY item_id ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

That would give me the running total; by selecting/grouping the days, I'd get one row per day/item combo, and the useful data columns would be delta (the total change for that day) and running_total (the running total up to that point). Can someone help me figure out how to accomplish this in Looker?
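One common approach is to push that same window function into a LookML derived table, then expose delta and running_total as fields. A minimal sketch, assuming a hypothetical wishlist_events source table (view and field names are made up for illustration):

```lookml
# Hypothetical sketch: a derived table precomputing the running total
# per item with the same window function as the SQL above.
view: wishlist_running_total {
  derived_table: {
    sql:
      SELECT
        timestamp,
        item_id,
        added,
        SUM(added) OVER (
          PARTITION BY item_id
          ORDER BY timestamp
          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS running_total
      FROM wishlist_events ;;
  }

  dimension_group: event {
    type: time
    timeframes: [date]
    sql: ${TABLE}.timestamp ;;
  }
  dimension: item_id {}

  # Day-to-day change: sum of the +1/-1 values in the grouped period.
  measure: delta {
    type: sum
    sql: ${TABLE}.added ;;
  }
  # End-of-period running total: the max running_total within the group.
  measure: total {
    type: max
    sql: ${TABLE}.running_total ;;
  }
}
```

Grouping by event_date and item_id in an Explore then yields one row per day/item combo with both measures, which can go straight onto a line chart.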
I am working with a report that has a date field tied to X. This date field can have multiple values, which creates multiple lines for X. My goal is to have a filter in place that pulls only the most recent date for the date field, resulting in only one line for X. For example, say X has date values 3/1/21, 3/4/21, and 3/7/21; in my report there are three rows for X, one for each date. What would be the best way to have the report pull only the most recent date (3/7/21)?
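One way to express "latest row per X" is a window function in a derived table, keeping only the first row per X when ordered by date descending. A sketch, with column and table names as placeholders:

```sql
-- Hypothetical sketch: keep only the most recent date per X.
-- Column/table names are assumptions for illustration.
SELECT x_id, date_field, other_columns
FROM (
  SELECT
    x_id,
    date_field,
    other_columns,
    ROW_NUMBER() OVER (PARTITION BY x_id ORDER BY date_field DESC) AS rn
  FROM report_table
) ranked
WHERE rn = 1
```

In Looker this could live in a derived table, or alternatively be approximated with a measure of type: max on the date and a measure filter, depending on what else the report needs.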
Hey everyone! I am wondering what's the best way to ingest forecast data, which is in CSV/XLS format, into Looker? I am totally in the dark here. Should it be one of these two formats: (1) I leave features such as region and vertical in their own columns, so that later I can combine relevant groupings in Looker; or (2) I combine all relevant features into row names. Or is it something completely different?
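For what it's worth, option (1), a "long" layout with one row per (date, region, vertical, value), is usually easiest for Looker to group and filter on. If the spreadsheet arrives "wide" (one column per region), it can be melted to long form before loading. A minimal pandas sketch with made-up column names:

```python
import pandas as pd

# Hypothetical wide-format forecast: one column per region.
wide = pd.DataFrame({
    "month": ["2021-01", "2021-02"],
    "EMEA": [100, 110],
    "APAC": [80, 95],
})

# Melt to long format: one row per (month, region), which Looker can
# then treat like any other dimension for grouping and filtering.
long = wide.melt(id_vars="month", var_name="region", value_name="forecast")
print(long)
```

The same idea extends to multiple feature columns (region, vertical, etc.) by listing them all in id_vars or melting in stages.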
Hello, how can I show arrows in the multiple-value comparison viz? I have a table like this, and for the "percent change month over month" column I would like to show the values with arrows like these. I know this is done easily in the single value comparison. Can we do the same thing for the multiple value comparison? Does anybody have a trick? Thank you!
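One trick that works per-cell in a table visualization is the LookML html parameter with a Liquid condition, rendering an arrow next to the value. A sketch, assuming hypothetical measure names:

```lookml
# Hypothetical sketch: up/down arrow via the html parameter (Liquid).
# this_month / last_month are made-up measure names for illustration.
measure: pct_change_mom {
  type: number
  sql: ${this_month} / NULLIF(${last_month}, 0) - 1 ;;
  value_format_name: percent_1
  html:
    {% if value >= 0 %}
      <span style="color: green;">&#9650; {{ rendered_value }}</span>
    {% else %}
      <span style="color: red;">&#9660; {{ rendered_value }}</span>
    {% endif %} ;;
}
```

Whether the arrows render depends on the visualization honoring HTML formatting, so this is worth testing in your specific viz before relying on it.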
We send hundreds of files a month to one single S3 bucket, and we'd like to avoid having to re-enter the S3 credentials each time we need to send an ad hoc file or set up a new schedule. It would also be good to have the same functionality for SFTP, for the same reason.
Make sure to sign up for this month's virtual events:
Dec. 8th: (Webinar) Modern BI and the Power of Data Experiences
Dec. 8-9: (Virtual Event) SaaStr Scale, with Zara Hawkins (Wells) presenting
Dec. 10th: (Webinar) Driving Successful Looker Adoption
Which use cases and topics pique your interest most? Respond to this post with the event you'll be attending!
Summary
Log (or machine) data can contain a wealth of information and can serve multiple use cases, from security analytics to IT operations and monitoring. However, it can often be very difficult to extract any meaning from this data due to its structure and sheer volume. Looker can be used to give log data meaning and make it easier for end users to extract insights from a dataset that is traditionally difficult to work with. A "brute force attack" is a common security analytics pattern used to detect potentially malicious activity: a user's attempted logins are consecutively denied several times before eventually succeeding. Looker's threshold-based alerts, schedules, and actions can be used to detect this activity, alert someone via email, Slack, or text, and also trigger other workflows like opening a support ticket.
Dataset
This should be set up on "access" or "audit" logs that contain information on users and attempted logins. In this example we are using GCP Audit Logs.
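The detection logic itself is simple to state: flag a user when N consecutive denied logins are followed by a success. In Looker this would typically be a derived table with window functions feeding a threshold alert, but the core pattern can be sketched in plain Python for illustration (event format and threshold are assumptions):

```python
# Minimal sketch of the "brute force" pattern: flag any user whose
# N consecutive denied logins are immediately followed by a success.

def brute_force_users(events, threshold=3):
    """events: list of (user, status) tuples in time order,
    where status is 'denied' or 'success'."""
    streak = {}    # current run of consecutive denials per user
    flagged = set()
    for user, status in events:
        if status == "denied":
            streak[user] = streak.get(user, 0) + 1
        else:  # success: check whether the denial streak tripped the threshold
            if streak.get(user, 0) >= threshold:
                flagged.add(user)
            streak[user] = 0
    return flagged

log = [
    ("alice", "denied"), ("alice", "denied"), ("alice", "denied"),
    ("alice", "success"),
    ("bob", "denied"), ("bob", "success"),
]
print(brute_force_users(log))  # alice tripped the threshold, bob did not
```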
Looking for Looker users using Looker to look at learning data 😉 I'm an educational research scientist who uses Looker to support our EdTech platform, and I'd love to find others who share similar use cases, creating a small community of sharers. Anyone have a suggestion? Are there similar 'groups' within the community? #LovetoConnect
Hi all! We’re getting a little more sophisticated with our data science work here at Zearn, and I thought I’d see if you all have some advice. Most of this work happens in Python or R using Jupyter notebooks based on data pulled from Looker, and I’m wondering how best to integrate Looker. In particular, I have two questions: Does anyone have best practices for pulling data into notebooks from Looker? One challenge we run into using the API is that some of the queries we want can take a very long time to run. If the data won’t change (e.g. we’re pulling a fixed time period), we usually just download the results from Looker into a CSV and use that, but I don’t love the way that decouples the data from the source. Hosting and sharing notebooks: It’s easy to link people to looks or explores in Looker when we want to share something in Looker, but obviously we can’t do this with notebooks. Are there tools folks like to do something similar with notebooks? Thanks in advance!
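On the first question, one lightweight pattern is to cache API pulls to CSV keyed by the query, so repeated notebook runs don't re-execute slow queries but the code still records where the data came from. A sketch; the `fetch` callable stands in for whatever Looker API call you use (e.g. the SDK's run_look), which is an assumption here:

```python
import hashlib
import os

def cached_pull(query_id, fetch, cache_dir=".looker_cache"):
    """Return CSV text for a query, calling `fetch(query_id)` only when
    no cached copy exists. `fetch` is a stand-in for your actual API
    call (e.g. the Looker SDK's run_look) -- an assumption here."""
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.sha256(str(query_id).encode()).hexdigest()[:16]
    path = os.path.join(cache_dir, f"{key}.csv")
    if os.path.exists(path):           # cache hit: skip the slow query
        with open(path) as f:
            return f.read()
    csv_text = fetch(query_id)         # cache miss: pull and persist
    with open(path, "w") as f:
        f.write(csv_text)
    return csv_text
```

Deleting the cache directory (or keying on a date range) forces a refresh, which keeps the notebook coupled to the source without paying the query cost on every run.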
Hi all, I’m looking for opinions on Fivetran vs Stitch, specifically focused on the ingestion piece. Has anyone had experience of both and can explain their preference? I’m interested in anything from customer service standards, roadmap transparency, robustness or anything else that has left anyone with data ingestion battle scars. No doubt people from both companies keep an eye on discourse but it would be great to keep this to a customer-only view point please. Both tools have lots of advantages as far as I can see so I’d love to get a feel for what this community thinks are the big differentiators. Cheers, Jon
Are there any plans on the roadmap to facilitate display of motion charts? Motion charts are usually scatter plots or bubble charts which have an additional time dimension. Each frame of the animation represents the metrics for the given objects at a particular time (e.g., date, week, month, year). The frames are advanced step-wise for each new time entry. Wikipedia has an entry for these ("Motion chart"): a motion chart is a dynamic bubble chart which allows efficient and interactive exploration and visualization of longitudinal multivariate data. I don't see any ot
I am looking to make interactive dashboards from data in a SQL database. I have a bit of experience with data visualization packages in Python (plotly, matplotlib, seaborn), but I feel these aren't as intuitive and quick for building full dashboards as Tableau and Spotfire, and those programs (desktop versions) seem to be Windows-only. Are there any good tools like these for Linux, or has anyone gotten good results using Wine with Windows data visualization tools?
Most companies have thought about using machine learning, but they fail at execution. Once you have Looker, you can apply machine learning to your data with the help of Google BigQuery. Google BigQuery is designed as an enterprise data warehouse (and, in my view, a better alternative than Snowflake, of course!). You can get more information at https://cloud.google.com/bigquery/. If you are a small or medium enterprise (SME) with less than 10 GB of data, you can run it for free. But anyway, back to the topic of machine learning. This is a video tutorial by Looker on creating a dashboard to identify customer churn or conversion using logistic and binary regression models. View the video at https://www.youtube.com/watch?v=IJfDOr5PGJ8&t=388s. If you're interested, you can read more about how you can use Looker with Google BigQuery at https://looker.com/blog/data-science-with-bigquery-machine-learning-looker. Oh, and an application I love to use is RapidMiner. It is an appl
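For context on what the video covers, the BigQuery ML side of a churn model is just SQL: train a logistic regression, then query its predictions like any table, which Looker can then model. A hedged sketch; dataset, model, and column names are made up for illustration:

```sql
-- Hypothetical BigQuery ML sketch: train a logistic regression to
-- predict churn. All dataset/column names here are assumptions.
CREATE OR REPLACE MODEL demo.churn_model
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT
  days_since_last_order,
  lifetime_orders,
  churned
FROM demo.customer_facts;

-- Score new rows; the result can be exposed to Looker as a view or
-- wrapped in a LookML derived table.
SELECT *
FROM ML.PREDICT(
  MODEL demo.churn_model,
  (SELECT days_since_last_order, lifetime_orders FROM demo.customer_facts));
```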
Unable to schedule "All Results" to be delivered with the option “and results changed since last run”
Unfortunately, we are not able to use datagroups to schedule data, as there is no option to control the time a report is issued when new data is loaded. This means our report period filter script will not have updated before the schedule issues the report, so reports are sent with old data. The dates we add data to the portal for certain clients are not always the same, so we can't just pick a set date or time to schedule data and need to rely on selecting the option "and results changed since last run". However, when you select this option you can't select "All Results" and are thus limited to the row limit of 5,000. We urgently need a solution to this, as we are now having to send these reports manually when we should be able to automate it, and some of our clients are unhappy because their scheduled reports were missing data.
Hi there, Could you suggest the best and easiest way to load one small dataset from DynamoDb into Looker? The dataset represented with one table of time-series data, up to 20 fields, not more than 100k rows in total. The data needs to be updated one or two times a day. And this doesn’t need to be a production-ready integration, we just need a quick and dirty way to get the data and play with it. If we decide that this data looks good in looker we will consider implementing a more robust integration.
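For a quick-and-dirty pull at that scale, a boto3 Scan with pagination into a CSV is usually enough, with the CSV then loaded into whatever warehouse Looker connects to. A sketch; the pagination helper only needs a callable with DynamoDB's Scan paging shape (Items / LastEvaluatedKey), and the table name is a made-up example:

```python
import csv

def scan_all(scan):
    """Collect every item from a paginated Scan-style callable.
    `scan(**kwargs)` must return {'Items': [...]} and, while more pages
    remain, 'LastEvaluatedKey' (DynamoDB's pagination token)."""
    items, kwargs = [], {}
    while True:
        page = scan(**kwargs)
        items.extend(page["Items"])
        if "LastEvaluatedKey" not in page:   # no more pages
            return items
        kwargs = {"ExclusiveStartKey": page["LastEvaluatedKey"]}

def write_csv(items, path, fields):
    """Dump the scanned items to a CSV the warehouse can ingest."""
    with open(path, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=fields)
        w.writeheader()
        w.writerows(items)

# Against real DynamoDB this would look like (table name is made up):
#   import boto3
#   table = boto3.resource("dynamodb").Table("forecast_timeseries")
#   items = scan_all(table.scan)
#   write_csv(items, "export.csv", fields=["ts", "v"])
```

Running this once or twice a day from a cron job covers the refresh requirement until a more robust integration is justified.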
@Brett_Sauve suggested this community might be interested in some of our work at PERTS. This blog post describes and gives sample code for a containerized web server in R for ETL work (or whatever else you like). Thanks for your interest and feedback 🙂 Medium, 11 Jul 18: "A Scalable Pure-R Web Server" (PERTStech). At PERTS we'd like to run custom ETL jobs in R, and enterprise products like Tableau and RStudio Connect are heavy, expensive, and (for…
So one of my favorite features of Looker is persistent derived tables and the data modeling functionality that comes with them; it allows us to be very nimble. However, one of the downsides of persistent derived tables is that every time one gets regenerated, unless you set a filter to constrain the time period, generation takes longer and longer as your data set grows. And a lot of the time, only a small portion of the PDT really needs to be updated (think a user_order_facts table or something like that). My question is: is there a way to re-compute only the parts of the derived table which have changed since the last time it was built? Or should I be looking more towards an ETL solution at this point? Thanks!
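Depending on your Looker version, incremental PDTs do roughly this: the table is rebuilt only for recent partitions rather than from scratch, keyed on a time column. A hedged LookML sketch with made-up table and column names:

```lookml
# Hypothetical sketch of an incremental PDT (newer Looker versions).
# increment_key names the time column that defines "new" data;
# increment_offset re-processes a few trailing periods so that
# late-arriving rows are still picked up.
view: user_order_facts {
  derived_table: {
    datagroup_trigger: nightly
    increment_key: "order_date"
    increment_offset: 3
    sql:
      SELECT
        user_id,
        DATE(created_at) AS order_date,
        COUNT(*) AS order_count
      FROM orders
      GROUP BY 1, 2 ;;
  }
}
```

If your instance predates incremental PDTs, the usual fallback is exactly what you suggest: move the heavy incremental logic into an ETL layer and point Looker at the resulting table.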