Got a question?
Ask your big data questions here.
Most Recent Date
I am working with a report that has a date field for X. This date field can have multiple values, which creates multiple lines for X. My goal is to put a filter in place that pulls only the most recent date for the date field, resulting in a single line for X. For example, say X has date values of 3/1/21, 3/4/21, and 3/7/21; my report then has three rows for X, one per date. What would be the best way to have the report pull only the most recent date (3/7/21)?
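One common pattern, sketched here outside Looker, is to keep only the row whose date equals the per-entity maximum. The table and column names below are hypothetical, and sqlite3 simply stands in for the report's database:

```python
import sqlite3

# Hypothetical report rows: entity X appears once per date value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (x TEXT, report_date TEXT, value INTEGER)")
conn.executemany("INSERT INTO report VALUES (?, ?, ?)", [
    ("X", "2021-03-01", 10),
    ("X", "2021-03-04", 20),
    ("X", "2021-03-07", 30),
])

# Keep only the row carrying each entity's most recent date.
rows = conn.execute("""
    SELECT x, report_date, value
    FROM report AS r
    WHERE report_date = (SELECT MAX(report_date) FROM report WHERE x = r.x)
""").fetchall()
print(rows)  # [('X', '2021-03-07', 30)]
```

In Looker the same idea often shows up as an "is most recent" yesno dimension built from a correlated subquery, or as a filter against a latest-date measure; which fits best depends on the model.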
Multiple values comparison with arrow
Hello, how can I show arrows in the multiple values comparison viz? I have a table like this; for the “percent change month over month” column, I would like to show the values with arrows like these. I know this is easily done in the single value comparison. Can the same thing be done for the multiple value comparison? Does anybody have a trick? Thank you!
How can I replicate a Sum with an Over clause?
I have some raw wishlist data in a simple form (timestamp, user_id, item_id, added), where the final field is just a +1 or -1. The data isn’t that clean, of course, but that’s effectively what I have :) I want to be able to visualize both the day-to-day changes and the running totals on a line chart. Getting the running total is a bit complicated, and I can’t figure out the best way to do it in Looker. In SQL, I would do something like:
SUM(added) OVER (PARTITION BY item_id ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
That would give me the running total; by selecting/grouping on days, I’d get one row per day/item combo, and the useful data rows would be delta (the total change that day) and running_total (the running total to date). Can someone help me figure out how to accomplish this in Looker?
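For anyone wanting to check what that window clause produces, here is a minimal sqlite3 sketch of the same SUM ... OVER expression; the table and values are made up. In Looker itself, a derived table wrapping this SQL, or the running_total table calculation, are the usual routes, but treat those as pointers rather than a recipe:

```python
import sqlite3

# Hypothetical wishlist events; sqlite3 stands in for the real warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wishlist (ts TEXT, user_id INTEGER, item_id INTEGER, added INTEGER)")
conn.executemany("INSERT INTO wishlist VALUES (?, ?, ?, ?)", [
    ("2021-03-01", 1, 1, 1),
    ("2021-03-02", 2, 1, 1),
    ("2021-03-03", 1, 1, -1),
])

# Same window clause as in the question: a per-item running total over time.
rows = conn.execute("""
    SELECT item_id, ts, added,
           SUM(added) OVER (PARTITION BY item_id ORDER BY ts
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS running_total
    FROM wishlist
    ORDER BY item_id, ts
""").fetchall()
# rows now carry the running total per item: 1, 2, then back down to 1.
```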
Jupyter Notebooks and Looker
Hi all! We’re getting a little more sophisticated with our data science work here at Zearn, and I thought I’d see if you all have some advice. Most of this work happens in Python or R using Jupyter notebooks, based on data pulled from Looker, and I’m wondering how best to integrate Looker. In particular, I have two questions:
1. Pulling data into notebooks: does anyone have best practices for pulling data into notebooks from Looker? One challenge we run into using the API is that some of the queries we want can take a very long time to run. If the data won’t change (e.g. we’re pulling a fixed time period), we usually just download the results from Looker as a CSV and use that, but I don’t love the way that decouples the data from the source.
2. Hosting and sharing notebooks: it’s easy to link people to Looks or Explores when we want to share something in Looker, but obviously we can’t do this with notebooks. Are there tools folks like for doing something similar with notebooks?
Thanks in advance!
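One middle ground between re-running slow API queries and loose CSV downloads is a small on-disk cache keyed by the query definition, so the source of the data stays in code. Everything below is a hypothetical sketch: fetch_fn stands in for whatever actually hits the Looker API (e.g. a wrapper around the SDK's inline-query call) and is not a real SDK function:

```python
import hashlib
import json
import os
import tempfile

CACHE_DIR = os.path.join(tempfile.gettempdir(), "looker_cache")

def cached_query(query_body, fetch_fn, cache_dir=CACHE_DIR):
    """Fetch a query result once, then reuse the on-disk copy.

    fetch_fn is a stand-in for the real Looker API call; query_body is any
    JSON-serializable description of the query.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Key the cache on the query definition, so a changed query refetches.
    key = hashlib.sha256(
        json.dumps(query_body, sort_keys=True).encode()).hexdigest()
    path = os.path.join(cache_dir, key + ".json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    result = fetch_fn(query_body)
    with open(path, "w") as f:
        json.dump(result, f)
    return result
```

The notebook then always expresses *which* query it depends on, while slow queries only run once per cache lifetime; deleting the cache directory forces a refresh.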
Segment vs Snowplow
ETL Tool Recommendations
I am thinking about getting a commercial ETL tool. We have a bunch of data sources (Facebook, Salesforce, NetSuite etc.). They are important and right now we have jobs scheduled in all kinds of places that are owned by multiple employees. In an effort to operationalize this, I am looking at some ETL tools. Do you guys have any recommendations? Does anyone know of “the Looker of ETL tools?” Thanks, Lucas
I’ve begun exploring the creation of a transformed database (data warehouse) for use with Looker. Right now we just have Looker hooked up to a replica of our production DB, which has worked great in the short term but is showing some limitations. In order not to go into this too blindly, I’ve been doing some reading on data warehouse best practices, including ‘The Data Warehouse Toolkit’, which from what I can gather is one of the must-read books in this space. In this book they are quite adamant about creating a ‘Date’ table instead of using date fields directly in your fact tables. Benefits include being able to assign dimensions to dates, such as ‘holiday’, ‘quarter’, ‘weekend’, etc., that aren’t readily accessible in SQL. Of course, Looker does have some helpful date/time tools allowing easy grouping by month/week etc. So, as I begin down this road, I wonder how important these classical warehouse approaches are when using a tool like Looker, as it seems much of what Looker does…
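As a sketch of what such a date table buys you beyond Looker's built-in timeframes, here is a minimal generator for date-dimension rows. The column names and the holiday set are illustrative only, not a recommended schema:

```python
from datetime import date, timedelta

def date_dimension(start, end, holidays=frozenset()):
    """Yield one attribute row per calendar day (columns are illustrative)."""
    d = start
    while d <= end:
        yield {
            "date": d.isoformat(),
            "quarter": (d.month - 1) // 3 + 1,
            "is_weekend": d.weekday() >= 5,   # Saturday or Sunday
            "is_holiday": d in holidays,
        }
        d += timedelta(days=1)

rows = list(date_dimension(date(2021, 1, 1), date(2021, 1, 3),
                           holidays={date(2021, 1, 1)}))
```

Attributes like "is_holiday" are exactly the ones a date table carries that plain SQL date functions (and Looker's dimension_group timeframes) cannot derive on their own; joining fact tables to one materialized table like this is the Kimball approach the book describes.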
Are there any plans on the roadmap to facilitate display of motion charts? Motion charts are usually scatter plots or bubble charts with an additional time dimension. Each frame of the animation represents the metrics for the given objects at a particular time (e.g., date, week, month, year), and the frames are advanced step-wise for each new time entry. Wikipedia has an entry for these (en.wikipedia.org, “Motion chart”): “A motion chart is a dynamic bubble chart which allows efficient and interactive exploration and visualization of longitudinal multivariate data. Motion charts provide mechanisms for mapping ordinal, nominal and quantitative variables onto time, 2D coordinate axes, size, colors, glyphs and appearance characteristics, which facilitate the interactive display of multidimensional and temporal data.” I don’t see any ot…
Designing a Data Warehouse - what would a BI solution recommend?
At Looker we are often asked about best practices when it comes to designing a new data warehouse. Typically this happens just as companies are moving to MPP, and perhaps even column-oriented, databases, so it is clear from the start that replicating the design of an operational database is not appropriate. It is also clear that there is not a single answer: a Google search turns up millions of results. That said, here are a few rules of thumb that you can apply as you focus on building your analytical data warehouse to work with Looker:
- simplicity (aka shortest path)
- performance
- single copy of data
- transparent EL process
Shortest Path
You should not need to use mapping tables or entity–attribute–value tables to get to a value. The path between any two dimensions in one SELECT query should not involve more than a couple of joins. Typically, “long path” designs arise from storing original data in NoSQL format. Because there is very little analytical value derived from performing s…
Fivetran vs Stitch for data ingestion
Hi all, I’m looking for opinions on Fivetran vs Stitch, specifically focused on the ingestion piece. Has anyone had experience of both and can explain their preference? I’m interested in anything from customer service standards, roadmap transparency, robustness or anything else that has left anyone with data ingestion battle scars. No doubt people from both companies keep an eye on discourse but it would be great to keep this to a customer-only view point please. Both tools have lots of advantages as far as I can see so I’d love to get a feel for what this community thinks are the big differentiators. Cheers, Jon
Hi there, could you suggest the best and easiest way to load one small dataset from DynamoDB into Looker? The dataset is a single table of time-series data, up to 20 fields and no more than 100k rows in total. The data needs to be updated once or twice a day. This doesn’t need to be a production-ready integration; we just need a quick and dirty way to get the data and play with it. If we decide the data looks good in Looker, we will consider implementing a more robust integration.
So, one of my favorite features of Looker is persistent derived tables and the data modeling functionality that comes with them: they allow us to be very nimble. However, one downside of persistent derived tables is that, unless you set a filter to constrain the time period, each regeneration takes longer and longer as your data set grows. Yet a lot of the time only a small portion of the PDT actually needs updating (think of a user_order_facts table or something like that). My question is: is there a way to recompute only the parts of the derived table that have changed since the last build? Or should I be looking toward an ETL solution at this point? Thanks!
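The idea behind that kind of incremental rebuild can be sketched outside Looker: detect which partitions (here, users) have new source rows since the last build and recompute only those. Everything below is a hypothetical stand-in using sqlite3, not Looker's PDT machinery; Looker has since added incremental PDTs (increment_key) that implement a similar notion natively, though check the docs for the exact semantics:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (user_id INTEGER, amount REAL, created_at TEXT);
CREATE TABLE user_order_facts (user_id INTEGER PRIMARY KEY, total REAL);
""")

def refresh_facts(conn, since):
    # Recompute facts only for users with new orders since the last build.
    changed = [r[0] for r in conn.execute(
        "SELECT DISTINCT user_id FROM orders WHERE created_at > ?", (since,))]
    for uid in changed:
        total = conn.execute(
            "SELECT SUM(amount) FROM orders WHERE user_id = ?",
            (uid,)).fetchone()[0]
        conn.execute(
            "INSERT OR REPLACE INTO user_order_facts VALUES (?, ?)",
            (uid, total))
    return changed

conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 10.0, "2021-03-01"), (2, 5.0, "2021-03-01")])
refresh_facts(conn, "2021-01-01")            # full build: users 1 and 2
conn.execute("INSERT INTO orders VALUES (1, 2.5, '2021-03-05')")
changed = refresh_facts(conn, "2021-03-01")  # incremental: only user 1
```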
Unable to schedule “All Results” to be delivered with the option “and results changed since last run”
Unfortunately, we are not able to use datagroups to schedule data, as there is no option to control the time a report is issued when new data is loaded. This means our report-period filter script will not have updated before the schedule issues the report, so the reports are sent with old data. The dates on which we add data to the portal for certain clients are not always the same, so we can’t just pick a set date or time to schedule data; we need to rely on selecting the option “and results changed since last run”. However, when you select this option you can’t select “All Results” and are thus limited to the row limit of 5,000. We urgently need a solution, as we are now having to send these reports manually when we should be able to automate them, and some of our clients are unhappy because their scheduled reports were missing data.
Formula to multiply values in 2 columns
Hello, I’m trying to add a calculation to multiply values in 2 columns, but the calculation is showing this error. I’m using the “Table Calculation” field under the “Custom Fields” section. I’m trying to add a 3rd column that multiplies values in the “Bill Code Code” field and the “Visit Unit Qty Sum” field. Is this possible?
BigQuery daily snapshots
Continuing the discussion from ETL Tool Recommendations: Hi @lloydtabb, thanks for the info! I saw this post as well: “LookML for BigQuery (Deprecated) — Deprecated: much of this post is about Google Legacy SQL; we recommend using Google Standard SQL. As of 3.34, Looker has improved support for Google’s BigQuery. BigQuery takes a higher degree of tuning in order to work with Looker. For example, you’ll need a little more awareness of data sizes in order to build your model. Query Size Estimator: when running a query in a typical database, your query takes longer to return as the amount of data you query go…” The TABLE_DATE_RANGE feature makes me wonder what the best approach is. My first thought was just to append a current-date column each day to the snapshot and keep appending to the same table, but your post here seems to suggest it is better to create a new table each day with the snapshot date in the t…
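For what it's worth, the single-table approach can be sketched like this: each day's snapshot carries a snapshot_date column, and a date-range predicate plays the role TABLE_DATE_RANGE plays for day-sharded tables. sqlite3, the table, and its values are all stand-ins for BigQuery here (in BigQuery, partitioning the table on that date column gives similar per-day pruning):

```python
import sqlite3

# Hypothetical daily snapshots appended to one table, tagged by snapshot_date.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_snapshots (snapshot_date TEXT, item_id INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO item_snapshots VALUES (?, ?, ?)", [
    ("2021-03-01", 1, 5),
    ("2021-03-02", 1, 7),
    ("2021-03-03", 1, 9),
])

# Select a window of snapshot days, as TABLE_DATE_RANGE would across shards.
rows = conn.execute("""
    SELECT snapshot_date, qty
    FROM item_snapshots
    WHERE snapshot_date BETWEEN '2021-03-02' AND '2021-03-03'
    ORDER BY snapshot_date
""").fetchall()
print(rows)  # [('2021-03-02', 7), ('2021-03-03', 9)]
```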
On Columnar Formats and Encoding
Introduction
Encoding is an important concept in columnar databases, like Redshift and Vertica, as well as in database technologies that can ingest columnar file formats like Parquet or ORC. Particularly in the case of Redshift and Vertica—both of which allow one to declare explicit column encodings during table creation—this is a key concept to grasp. In this article, we will cover (i) how columnar data differs from traditional, row-based RDBMS storage; (ii) how column encoding works, generally; then we’ll move on to discuss (iii) the different encoding algorithms and (iv) when to use them.
Row Storage
Most databases store data on disk in sequential blocks. When data are queried, the disk is scanned and the data are retrieved. With a traditional RDBMS (e.g., MySQL or PostgreSQL), data are stored in rows—multiple rows for each block. Consider the table events:
id | user_id | event_type | ip | created_at | uri | country_code
1 | 1 | search | 122.303.444 | … | /search/products/filters=tshirts | US
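To make the encoding idea concrete before the algorithm discussion, here is a minimal run-length encoding sketch of the kind a columnar engine might apply to a low-cardinality column such as country_code. This is illustrative only; real engines operate on compressed binary pages, not Python lists:

```python
def rle_encode(column):
    """Run-length encode a column into (value, run_length) pairs."""
    out = []
    for v in column:
        if out and out[-1][0] == v:
            # Extend the current run instead of storing the value again.
            out[-1] = (v, out[-1][1] + 1)
        else:
            out.append((v, 1))
    return out

print(rle_encode(["US", "US", "US", "GB", "GB", "US"]))
# [('US', 3), ('GB', 2), ('US', 1)]
```

Because a column stores like values together, runs are long and the encoded form is much smaller than the raw column, which is precisely why columnar storage and encoding reinforce each other.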
Data Visualization tools for Linux?
I am looking to make interactive dashboards from data in a SQL database. I have a bit of experience with data visualization packages in Python (plotly, matplotlib, seaborn), but I feel these aren’t as intuitive and quick for building full dashboards as Tableau and Spotfire, and those programs (desktop versions) seem to be Windows-only. Are there any good tools like these out there for Linux, or has anyone gotten good results using Wine with Windows data visualization tools?
Is there a way to change the column name dynamically based on filter setting?
Hello there, just wondering if there is a way to change column names dynamically based on different filter settings. For example, below are the table calculation columns I created. The “GRS vs LW” column is the percent change between this month’s (7/2022) revenue and last month’s (6/2022) revenue (or whichever month users decide to filter to). Is there a way to change “GRS vs LW” to “GRS vs 6/2022”, where 6/2022 changes dynamically based on the filter that users set? Thanks for the help!