Got a question?
Ask your big data questions here.
ETL Tool Recommendations
I am thinking about getting a commercial ETL tool. We have a bunch of data sources (Facebook, Salesforce, NetSuite, etc.). They are important, and right now we have jobs scheduled in all kinds of places, owned by multiple employees. In an effort to operationalize this, I am looking at some ETL tools. Do you guys have any recommendations? Does anyone know of “the Looker of ETL tools”? Thanks, Lucas
Are there any plans on the roadmap to facilitate display of motion charts? Motion charts are usually scatter plots or bubble charts with an additional time dimension. Each frame of the animation represents the metrics for the given objects at a particular time (e.g., date, week, month, year), and the frames are advanced step-wise for each new time entry. Wikipedia has an entry for these (en.wikipedia.org, “Motion chart”): “A motion chart is a dynamic bubble chart which allows efficient and interactive exploration and visualization of longitudinal multivariate data. Motion charts provide mechanisms for mapping ordinal, nominal and quantitative variables onto time, 2D coordinate axes, size, colors, glyphs and appearance characteristics, which facilitate the interactive display of multidimensional and temporal data. In general, charts, graphs and plots provide the means for summarizing quantitative and qualitative da…” I don’t see any ot…
Unable to schedule "All Results" to be delivered with the option “and results changed since last run”
Unfortunately, we are not able to use datagroups to schedule data, because there is no option to control the time a report is issued when new data is loaded. This means our report-period filter script will not have updated before the schedule issues the report, so reports are sent with old data. The dates we add data to the portal for certain clients are not always the same, so we can’t just pick a set date or time for the schedule; we need to rely on the option “and results changed since last run”. However, when you select that option you can’t select “All Results” and are thus limited to the row limit of 5,000. We urgently need a solution: we are now having to send these reports manually when we should be able to automate this, and some of our clients are unhappy because their scheduled reports were missing data.
One of my favorite features of Looker is persistent derived tables and the data modeling functionality that comes with them - they allow us to be very nimble. However, one downside of persistent derived tables is that unless you set a filter to constrain the time period, each regeneration takes longer and longer as your data set grows. A lot of the time, only a small portion of the PDT really needs to be updated (think a user_order_facts table or something like that). My question is: is there a way to only re-compute the parts of the derived table that have changed since the last time it was built? Or should I be looking more towards an ETL solution at this point? Thanks!
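For illustration, the "only rebuild what changed" idea can be sketched outside Looker. This is a hypothetical pattern, not a Looker feature: recompute facts only for users with orders after a watermark, and carry everyone else's row over unchanged. Table and column names (user_order_facts-style schema) are assumptions.

```python
import pandas as pd

def rebuild_user_order_facts(existing_facts, orders, since):
    """Incrementally refresh a user_order_facts-style table.

    Only users with orders after the `since` watermark are recomputed
    (from their full order history); all other rows are carried over
    unchanged. Column names here are illustrative.
    """
    changed_users = orders.loc[orders["created_at"] > since, "user_id"].unique()
    # Recompute facts from scratch, but only for the users that changed.
    fresh = (orders[orders["user_id"].isin(changed_users)]
             .groupby("user_id")
             .agg(order_count=("order_id", "size"),
                  lifetime_revenue=("amount", "sum"))
             .reset_index())
    # Keep the stale-but-still-valid rows for everyone else.
    untouched = existing_facts[~existing_facts["user_id"].isin(changed_users)]
    return pd.concat([untouched, fresh], ignore_index=True)
```

In a warehouse this same pattern would be a DELETE of the changed keys followed by an INSERT of the recomputed rows, typically run by an ETL scheduler rather than by the PDT regenerator.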
Jupyter Notebooks and Looker
Hi all! We’re getting a little more sophisticated with our data science work here at Zearn, and I thought I’d see if you all have some advice. Most of this work happens in Python or R using Jupyter notebooks based on data pulled from Looker, and I’m wondering how best to integrate Looker. In particular, I have two questions: 1) Pulling data into notebooks: does anyone have best practices for pulling data into notebooks from Looker? One challenge we run into using the API is that some of the queries we want can take a very long time to run. If the data won’t change (e.g., we’re pulling a fixed time period), we usually just download the results from Looker into a CSV and use that, but I don’t love the way that decouples the data from the source. 2) Hosting and sharing notebooks: it’s easy to link people to Looks or Explores when we want to share something in Looker, but obviously we can’t do this with notebooks. Are there tools folks like for doing something similar with notebooks? Thanks in advance!
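One middle ground between re-running slow API queries and loose CSV files is a small local cache keyed by the Look being pulled. A minimal sketch, with the fetcher injectable so the caching logic itself is independent of a live Looker instance (hooking it to the real SDK is up to your setup):

```python
import io
from pathlib import Path

import pandas as pd

def cached_look(look_id, fetch, cache_dir="look_cache"):
    """Return a Look's results as a DataFrame, caching the CSV locally.

    `fetch` is any callable mapping a look id to a CSV string -- for
    example, a thin wrapper around the Looker API's "run look" call.
    The slow query only runs the first time; later calls hit the cache,
    and the cache file name records which Look the data came from.
    """
    cache = Path(cache_dir)
    cache.mkdir(exist_ok=True)
    path = cache / f"look_{look_id}.csv"
    if not path.exists():
        path.write_text(fetch(look_id))
    return pd.read_csv(io.StringIO(path.read_text()))
```

Deleting the cache directory forces a fresh pull, so the notebook stays reproducible while still being traceable back to the source Look.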
I’ve begun exploring the creation of a transformed database (data warehouse) for use with Looker. Right now we just have Looker hooked up to a replica of our production DB, which has worked great in the short term but is showing some limitations. In order not to go into this too blindly, I’ve been doing some reading on data warehouse best practices, including ‘The Data Warehouse Toolkit’, which from what I can gather is one of the must-read books in this space. The book is quite adamant about creating a ‘Date’ table instead of using date fields directly in your fact tables. Benefits include being able to assign dimensions to dates, such as ‘holiday’, ‘quarter’, or ‘weekend’, that aren’t readily accessible in SQL. Of course, Looker does have some helpful date/time tools allowing easy grouping by month, week, etc. So, as I begin down this road, I wonder how important these classical warehouse approaches are when using a tool like Looker, as it seems much of what Looker does…
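For readers unfamiliar with the pattern, a Kimball-style date dimension is just one row per calendar day with precomputed attributes. A minimal sketch (column names are illustrative; a real warehouse would add fiscal periods, holiday flags, and so on):

```python
import pandas as pd

def build_date_dim(start, end):
    """Build a minimal Kimball-style date dimension table.

    One row per day, with attributes that are awkward to derive in raw
    SQL at query time. Fact tables then join to this on date_key.
    """
    dates = pd.date_range(start, end, freq="D")
    return pd.DataFrame({
        "date_key": dates.strftime("%Y%m%d").astype(int),  # surrogate key
        "date": dates,
        "year": dates.year,
        "quarter": dates.quarter,
        "month_name": dates.strftime("%B"),
        "day_of_week": dates.day_name(),
        "is_weekend": dates.dayofweek >= 5,
    })
```

Looker's dimension groups cover the easy cases (month, week, quarter) directly in LookML, so the date table mainly earns its keep for business-specific attributes like holidays or fiscal calendars.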
How can I replicate a Sum with an Over clause?
I have some raw wishlist data in a simple form (timestamp, user_id, item_id, added), where the final item is just a +1 or -1. The data isn’t that clean, of course, but that’s effectively what I have :) I want to be able to visualize both the day-to-day changes and the running totals on a line chart. Having the running total is a bit complicated, and I can’t figure out the best way to do it in Looker. In SQL, I would do something like: SUM(added) OVER (PARTITION BY item_id ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), and that would give me the running total. By selecting/grouping the days, I’d get one row per day/item combo, and the useful data rows would be delta, to tell me the total change, and running_total, to tell me the running total. Can someone help me figure out how to accomplish this in Looker?
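To make the intended output concrete, here is the same computation sketched in pandas rather than SQL, with made-up sample events: the daily delta is a grouped sum, and the running total is a cumulative sum partitioned by item, mirroring the window function above.

```python
import pandas as pd

# Raw wishlist events: added is +1 (item added) or -1 (item removed).
events = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-03", "2024-01-03"]),
    "item_id": [1, 1, 1, 1, 1],
    "added": [1, 1, -1, 1, 1],
})

# One row per day/item: delta is the net change that day.
daily = (events
         .groupby([events["timestamp"].dt.date, "item_id"])["added"]
         .sum()
         .rename("delta")
         .reset_index())

# running_total is the cumulative sum within each item, ordered by day --
# the pandas analogue of SUM(added) OVER (PARTITION BY item_id ORDER BY timestamp).
daily["running_total"] = daily.groupby("item_id")["delta"].cumsum()
```

In a Looker chart both series come from the same grouped result: delta as the grouped measure, and the running total layered on top of it.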
Filtering dates 'in the past [X]' - Looker inserts empty string into CURRENT_DATE('')
Hey all, I’m trying to figure out a weird error I’m getting. When I filter my dates with ‘is in the past 5 months’, I get the error `Query execution failed: - Invalid empty time zone`, but when I use a date range for the filter, there is no issue. I looked into the SQL code behind the scenes, and it seems like Looker is using CURRENT_DATE(‘’) with an empty string for the ‘is in the past [x]’ filter. Has anyone experienced this? Any suggestions for still using the ‘is in the past [x]’ filter without hitting this issue? Thanks! Joe
Segment vs Snowplow
Formula to multiply values in 2 columns
Hello, I’m trying to add a calculation to multiply the values in two columns, but the calculation is showing this error. I’m using the “Table Calculation” field under the “Custom Fields” section. I’m trying to add a third column that multiplies the values in the “Bill Code Code” field and the “Visit Unit Qty Sum” field. Is this possible?
Designing a Data Warehouse - what would a BI solution recommend?
At Looker we are often asked about best practices for designing a new data warehouse. Typically this happens just as companies are moving to MPP, and perhaps even column-oriented, databases, so it is clear from the start that replicating the design of an operational database is not appropriate. It is also clear that there is no single answer - a Google search turns up millions of results. That said, here are a few rules of thumb you can apply as you focus on building your analytical data warehouse to work with Looker: simplicity (aka shortest path), performance, a single copy of the data, and a transparent EL process. Shortest path: you should not need to use mapping tables or entity-attribute-value tables to get to a value. The path between any two dimensions in one SELECT query should not involve more than a couple of joins. Typically, “long path” designs arise from storing original data in a NoSQL format. Because there is very little analytical value derived from performing s…
Fivetran vs Stitch for data ingestion
Hi all, I’m looking for opinions on Fivetran vs Stitch, specifically focused on the ingestion piece. Has anyone had experience with both who can explain their preference? I’m interested in anything from customer service standards, roadmap transparency, and robustness, to anything else that has left anyone with data ingestion battle scars. No doubt people from both companies keep an eye on Discourse, but it would be great to keep this to a customer-only viewpoint, please. Both tools have lots of advantages as far as I can see, so I’d love to get a feel for what this community thinks the big differentiators are. Cheers, Jon
Suggestions for moving Looker data to Salesforce?
Does anyone have recommendations for a tool we can use to sync Looker data to Salesforce? Note that we’d like to actually sync data from Looker to Salesforce rather than embedding a Looker dashboard within Salesforce which I know is also possible. We’re hoping there is a tool out there that can handle this without too much engineering effort using the Looker API or the soon to be released Looker webhooks.
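Whatever tool ends up moving the data, the core transform is mapping Looker result columns onto Salesforce fields. A minimal sketch of that step, with entirely hypothetical column and field names (the surrounding fetch-from-Looker and push-to-Salesforce calls depend on your tooling):

```python
import csv
import io

def looker_csv_to_sf_records(csv_text, id_column, field_map):
    """Turn a Looker CSV export into Salesforce-style update records.

    `field_map` maps Looker column names to Salesforce field API names;
    `id_column` names the Looker column holding the Salesforce record Id.
    All names here are illustrative -- substitute your own Look's fields.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rec = {"Id": row[id_column]}
        for looker_col, sf_field in field_map.items():
            rec[sf_field] = row[looker_col]
        records.append(rec)
    return records
```

The resulting list of dicts is the shape that bulk-update style Salesforce clients generally expect, so a scheduled job could pull a Look via the API, run it through a mapping like this, and push the records up.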
Hi there, could you suggest the best and easiest way to load one small dataset from DynamoDB into Looker? The dataset is a single table of time-series data, with up to 20 fields and no more than 100k rows in total. The data needs to be updated once or twice a day. This doesn’t need to be a production-ready integration - we just need a quick and dirty way to get the data and play with it. If we decide the data looks good in Looker, we will consider implementing a more robust integration.
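For a dataset this small, one quick-and-dirty route is a full table scan once or twice a day, dumped into whatever SQL database Looker is already connected to. A sketch of the scan step, written against the boto3-style paginated `scan()` interface (the table name and downstream load are up to you):

```python
def scan_all(table, **scan_kwargs):
    """Scan an entire DynamoDB table, following pagination.

    `table` is anything with a boto3-style .scan() that returns 'Items'
    and, while more pages remain, 'LastEvaluatedKey' -- for example,
    boto3.resource('dynamodb').Table('my_table'). At ~100k rows a full
    scan per refresh is cheap enough for an exploratory load.
    """
    items = []
    resp = table.scan(**scan_kwargs)
    items.extend(resp["Items"])
    # DynamoDB returns at most 1 MB per scan page; keep going until
    # no LastEvaluatedKey is returned.
    while "LastEvaluatedKey" in resp:
        resp = table.scan(ExclusiveStartKey=resp["LastEvaluatedKey"], **scan_kwargs)
        items.extend(resp["Items"])
    return items
```

From there, writing the items to CSV and bulk-loading them into the warehouse on a cron schedule gives Looker something to model without building a real pipeline.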
Your company's data team organization
I had asked a question earlier this year surveying everyone here about your company’s data stack, and I thought it was a fruitful conversation. As my company Payoff has matured, I’ve realized more and more that something that is as important as, or possibly more important than, the stack itself is how the data team is structured across the company. And when I say data team, I’m referring to the full data stack and the roles people play in building out the infrastructure, building data integrations, and of course analyzing the data itself. So how is your data team structured? Does it have a clear delineation between data engineers (who generate the data in the application, aka the website), DBAs (those who work on the data pipes or ETL), and data analysts/scientists (those who derive business value from the data)? Or do you have an organization where the line between DBAs and data scientists is blurred, kind of like Stitch Fix does it - their motto is “engineers shouldn’t write ETL”. To kick things off, I thought I’d share how my company is st…
BigQuery daily snapshots
Continuing the discussion from “ETL Tool Recommendations”: Hi @lloydtabb, thanks for the info! I saw this post as well - “LookML For BigQuery (Deprecated)” (deprecated because much of that post is about Google Legacy SQL; Google Standard SQL is now recommended): “As of 3.34, Looker has improved support for Google’s BigQuery. BigQuery takes a higher degree of tuning in order to work with Looker. For example, you’ll need a little more awareness of data sizes in order to build your model. Query Size Estimator: when running a query in a typical database, your query takes longer to return as the amount of data you query go…” - and the TABLE_DATE_RANGE feature makes me wonder what the best approach is. My first thought was just to append a current-date column each day to the snapshot and keep appending to the same table, but your post here seems to suggest it is better to create a new table each day with the snapshot date in the t…