Got a question?
Ask your big data questions here.
We send hundreds of files a month to a single S3 bucket, and we’d like to avoid having to re-enter the S3 credentials each time we need to send an ad hoc file or set up a new schedule. It would also be good to have the same functionality for SFTP, for the same reason.
Introduction

Encoding is an important concept in columnar databases, like Redshift and Vertica, as well as database technologies that can ingest columnar file formats like Parquet or ORC. Particularly in the case of Redshift and Vertica—both of which allow one to declare explicit column encodings during table creation—this is a key concept to grasp. In this article, we will cover (i) how columnar data differs from traditional, row-based RDBMS storage; (ii) how column encoding works, generally; then we’ll move on to discuss (iii) the different encoding algorithms and (iv) when to use them.

Row Storage

Most databases store data on disk in sequential blocks. When data are queried, the disk is scanned and the data are retrieved. With a traditional RDBMS (e.g., MySQL or PostgreSQL), data are stored in rows—multiple rows per block. Consider the table events:

id | user_id | event_type | ip          | created_at | uri                               | country_code
1  | 1       | search     | 122.303.444 | …          | /search/products/filters=tshirts  | US
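To make the encoding idea concrete, here is a minimal sketch (not from the original article; the function names are illustrative) of run-length encoding, one of the algorithms columnar stores apply when a column contains long runs of repeated values:

```python
from itertools import groupby

def rle_encode(column):
    """Run-length encode a column: store (value, run_length) pairs
    instead of repeating the same value on disk."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

# A low-cardinality column (e.g., country_code) compresses well:
column = ["US"] * 4 + ["GB"] * 2 + ["US"] * 3
encoded = rle_encode(column)
print(encoded)  # [('US', 4), ('GB', 2), ('US', 3)]
assert rle_decode(encoded) == column
```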
Looking for looker-users using Looker to look at learning data 😉 I’m an educational research scientist who uses Looker to support our EdTech platform, and I’d love to find others who share similar use cases, creating a small community of sharers. Anyone have a suggestion? Are there similar ‘groups’ within the community? #LovetoConnect
Hi all! We’re getting a little more sophisticated with our data science work here at Zearn, and I thought I’d see if you all have some advice. Most of this work happens in Python or R using Jupyter notebooks based on data pulled from Looker, and I’m wondering how best to integrate Looker. In particular, I have two questions:

1. Pulling data into notebooks: Does anyone have best practices for pulling data into notebooks from Looker? One challenge we run into using the API is that some of the queries we want can take a very long time to run. If the data won’t change (e.g., we’re pulling a fixed time period), we usually just download the results from Looker into a CSV and use that, but I don’t love the way that decouples the data from the source.
2. Hosting and sharing notebooks: It’s easy to link people to Looks or Explores when we want to share something in Looker, but obviously we can’t do this with notebooks. Are there tools folks like for doing something similar with notebooks?

Thanks in advance!
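One common pattern for the first question (a sketch, assuming the official looker_sdk Python package and credentials configured in looker.ini or environment variables; the Look ID is hypothetical) is to run a saved Look through the API and load the CSV result into pandas, so the notebook stays coupled to the Looker definition:

```python
import io

import looker_sdk
import pandas as pd

# Reads base_url/client_id/client_secret from looker.ini or env vars,
# per the SDK's standard configuration.
sdk = looker_sdk.init40()

def look_to_dataframe(look_id: str) -> pd.DataFrame:
    """Run a saved Look via the Looker API and return it as a DataFrame."""
    csv_text = sdk.run_look(look_id=look_id, result_format="csv")
    return pd.read_csv(io.StringIO(csv_text))

df = look_to_dataframe("42")  # hypothetical Look ID
print(df.head())
```

For the very long-running queries, caching the resulting DataFrame to disk (e.g., parquet keyed by Look ID) keeps reruns fast while still recording which Look the data came from.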
So one of my favorite features of Looker is persistent derived tables and the data modeling functionality that comes with them: they allow us to be very nimble. However, one downside of using persistent derived tables is that, unless you set a filter to constrain the time period, each regeneration takes longer and longer as your data set grows. And a lot of the time, only a small portion of the PDT actually needs to be updated (think a user_order_facts table or something like that). My question is: is there a way to only re-compute the parts of the derived table which have changed since the last time it was built? Or should I be looking more towards an ETL solution at this point? Thanks!
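One workaround, if you do move this into your own ETL, is an incremental merge keyed off a watermark. A sketch (the DSN, table names, and `created_at` watermark column are hypothetical; run from whatever job scheduler you use):

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@warehouse/db")  # hypothetical DSN
watermark = "2018-01-01 00:00:00"  # last successful build time, tracked by the ETL

# Only users with new orders since the watermark need their facts recomputed.
delete_stale = text("""
    DELETE FROM user_order_facts
    WHERE user_id IN (SELECT user_id FROM orders WHERE created_at > :ts)
""")
insert_fresh = text("""
    INSERT INTO user_order_facts (user_id, order_count, latest_order)
    SELECT user_id, COUNT(*), MAX(created_at)
    FROM orders
    WHERE user_id IN (SELECT user_id FROM orders WHERE created_at > :ts)
    GROUP BY user_id
""")

with engine.begin() as conn:  # one transaction: delete then reinsert changed users
    conn.execute(delete_stale, {"ts": watermark})
    conn.execute(insert_fresh, {"ts": watermark})
```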
Hi all, I’m looking for opinions on Fivetran vs Stitch, specifically focused on the ingestion piece. Has anyone had experience with both who can explain their preference? I’m interested in anything from customer service standards, roadmap transparency, robustness, or anything else that has left anyone with data ingestion battle scars. No doubt people from both companies keep an eye on Discourse, but it would be great to keep this to a customer-only viewpoint, please. Both tools have lots of advantages as far as I can see, so I’d love to get a feel for what this community thinks the big differentiators are. Cheers, Jon
I am looking to make interactive dashboards from data in a SQL database. I have a bit of experience with data visualization packages in Python (plotly, matplotlib, seaborn), but I feel these aren’t as intuitive and quick for building full dashboards as Tableau and Spotfire, and those programs (desktop versions) seem to be Windows-only. Are there any good tools like these out there for Linux, or has anyone gotten good results from using Wine with Windows data visualization tools?
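One Linux-friendly option worth evaluating is Plotly Dash, which builds interactive dashboards in pure Python. A minimal sketch (the connection string, table, and column names are hypothetical):

```python
import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html
from sqlalchemy import create_engine

# Pull data from the SQL database (hypothetical DSN and table).
engine = create_engine("postgresql://user:pass@localhost/analytics")
df = pd.read_sql(
    "SELECT country_code, COUNT(*) AS events FROM events GROUP BY country_code",
    engine,
)

app = Dash(__name__)
app.layout = html.Div([
    html.H1("Events by country"),
    dcc.Graph(figure=px.bar(df, x="country_code", y="events")),
])

if __name__ == "__main__":
    app.run(debug=True)  # serves the dashboard at http://127.0.0.1:8050
```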
Many companies have thought about using machine learning, but they fail at execution. Once you have Looker, you can bring machine learning to your data with the help of Google BigQuery. Google BigQuery is designed as an enterprise data warehouse. Well, a better alternative than Snowflake of course! You can get more information at https://cloud.google.com/bigquery/. If you are a small-to-medium enterprise (SME) with less than 10GB of data, you can run it for free. But anyway! Back to the topic of machine learning. This is a video tutorial by Looker on creating a dashboard to identify customer churn or conversion using logistic and binary regression models. View the video at https://www.youtube.com/watch?v=IJfDOr5PGJ8&t=388s. If you’re interested, you can read more about how you can use Looker with Google BigQuery over at https://looker.com/blog/data-science-with-bigquery-machine-learning-looker. Oh. And an application I love to use is RapidMiner. It is an appl
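For anyone who wants to try this from Python rather than the Looker UI, here is a sketch (assuming the google-cloud-bigquery client library; the dataset, table, and column names are hypothetical) of training a BigQuery ML logistic regression model for churn:

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

# Train a logistic regression churn model directly in BigQuery ML.
# `mydataset.users` and its columns are hypothetical.
train_sql = """
CREATE OR REPLACE MODEL `mydataset.churn_model`
OPTIONS (model_type = 'logistic_reg') AS
SELECT
  visits_last_30d,
  orders_last_30d,
  days_since_signup,
  churned AS label  -- BigQuery ML expects the target column named `label`
FROM `mydataset.users`
"""
client.query(train_sql).result()  # block until training finishes

# Score new users with the trained model.
predict_sql = """
SELECT user_id, predicted_label
FROM ML.PREDICT(MODEL `mydataset.churn_model`,
                (SELECT * FROM `mydataset.users_to_score`))
"""
for row in client.query(predict_sql).result():
    print(row.user_id, row.predicted_label)
```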
Unable to schedule “All Results” to be delivered with the option “and results changed since last run”
Unfortunately, we are not able to use Datagroups to schedule data, as there is no option to control the time a report is issued when new data is loaded. This means that our report-period filter script will not have updated before the schedule issues the report, so the reports are sent with old data. The dates we add data to the portal for certain clients are not always the same, so we can’t just pick a set date or time to schedule data and need to rely on selecting the option “and results changed since last run”; however, when you select this option you can’t select “All Results” and are thus limited to the row limit of 5,000. We urgently need a solution to this, as we are now having to manually send these reports when we should be able to automate them, and some of our clients are unhappy because their scheduled reports were missing data.
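Until there is a proper fix, one workaround is to have the ETL itself trigger the delivery as soon as the data load finishes, so no change detection or fixed send time is needed. A sketch, assuming the looker_sdk Python package; the exact model fields can vary across Looker API versions, and the Look ID and address are hypothetical:

```python
import looker_sdk
from looker_sdk import models40 as models

sdk = looker_sdk.init40()  # credentials from looker.ini or env vars

def send_report_after_load(look_id: str, recipient: str) -> None:
    """Fire a one-off scheduled-plan run as soon as the ETL load completes,
    instead of relying on 'results changed since last run'."""
    plan = models.WriteScheduledPlan(
        name="post-load client report",
        look_id=look_id,
        require_change=False,  # send unconditionally; the ETL decides when
        scheduled_plan_destination=[
            models.ScheduledPlanDestination(
                type="email",
                address=recipient,
                format="csv_zip",  # zipped CSV, commonly used for full-result sends
            )
        ],
    )
    sdk.scheduled_plan_run_once(plan)

# called by the ETL job right after new data lands
send_report_after_load("123", "client@example.com")
```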
Hi there, could you suggest the best and easiest way to load one small dataset from DynamoDB into Looker? The dataset is a single table of time-series data, up to 20 fields and not more than 100k rows in total. The data needs to be updated once or twice a day. This doesn’t need to be a production-ready integration; we just need a quick and dirty way to get the data and play with it. If we decide that this data looks good in Looker, we will consider implementing a more robust integration.
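A quick-and-dirty approach (a sketch, assuming boto3 and a warehouse Looker is already connected to; the table name and DSN are hypothetical) is to scan the DynamoDB table into pandas and dump it wholesale into a SQL database Looker can model, rerun on a cron once or twice a day:

```python
import boto3
import pandas as pd
from sqlalchemy import create_engine

def scan_dynamodb_table(table_name: str) -> list[dict]:
    """Full scan of a small DynamoDB table, following pagination."""
    table = boto3.resource("dynamodb").Table(table_name)
    items, kwargs = [], {}
    while True:
        page = table.scan(**kwargs)
        items.extend(page["Items"])
        if "LastEvaluatedKey" not in page:
            return items
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

# ~100k rows / ~20 fields is small enough to scan and replace wholesale.
df = pd.DataFrame(scan_dynamodb_table("my_timeseries"))  # hypothetical table
engine = create_engine("postgresql://user:pass@warehouse/db")  # hypothetical DSN
df.to_sql("dynamo_timeseries", engine, if_exists="replace", index=False)
```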
@Brett_Sauve suggested this community might be interested in some of our work at PERTS. This blog post describes and gives sample code for a containerized web server in R for ETL work (or whatever else you like): “A Scalable Pure-R Web Server” (PERTStech on Medium, 11 Jul 18). Thanks for your interest and feedback 🙂
Does anyone have recommendations for a tool we can use to sync Looker data to Salesforce? Note that we’d like to actually sync data from Looker to Salesforce, rather than embedding a Looker dashboard within Salesforce, which I know is also possible. We’re hoping there is a tool out there that can handle this without too much engineering effort, using the Looker API or the soon-to-be-released Looker webhooks.
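In the meantime, a lightweight sync is possible with the API alone. A sketch, assuming the looker_sdk and simple_salesforce packages; the Look ID, Salesforce object, and external-ID field are hypothetical:

```python
import csv
import io

import looker_sdk
from simple_salesforce import Salesforce

sdk = looker_sdk.init40()  # Looker credentials from looker.ini or env vars
sf = Salesforce(
    username="user@example.com",
    password="password",
    security_token="token",
)

def sync_look_to_salesforce(look_id: str) -> None:
    """Pull a Look's results and bulk-upsert them into Salesforce."""
    csv_text = sdk.run_look(look_id=look_id, result_format="csv")
    records = list(csv.DictReader(io.StringIO(csv_text)))
    # Upsert on a hypothetical external-ID field so reruns are idempotent.
    sf.bulk.Account.upsert(records, "Looker_Id__c")

sync_look_to_salesforce("42")
```

Note that the Look’s column names would need to match the Salesforce field API names (or be renamed before the upsert).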
Hello, is there a place where I can find more information about how to send file exports from Oracle Responsys to a Looker customer’s instance? The use case is to allow marketing data exports to flow from Responsys Interact into the customer’s Looker DB, where they can be developed into their reporting model. Specifically, is it possible to have an SFTP folder associated directly with the Looker instance? Or, more generally, what are the suggested ways to build third-party connections in Looker? Are there specific partners or solutions one would use for integrating Looker with their ESP (email service provider) or marketing platform? Thanks in advance, any advice much appreciated! -Nicole H
I asked a question earlier this year surveying everyone here about your company’s data stack, and it was a fruitful conversation. As my company Payoff has matured, I’ve realized more and more that something that is as important, or possibly more important, is how the data team is structured across the company. And when I say data team, I’m referring to the full data stack and the roles people play in building out the infrastructure, building data integrations, and of course analyzing the data itself. So how is your data team structured? Does it have a clear delineation between data engineers (those who generate the data in the application, aka the website), DBAs (those who work on the data pipes or ETL), and data analysts/scientists (those who derive business value from the data)? Or do you have an organization where the line is blurred between DBAs and data scientists, kind of like Stitch Fix does it: their motto is engineers shouldn’t write ETL. To kick things off I thought I’d share how my company is st
Hello, I was wondering how often the data from Salesforce updates in Looker. I know it is supposed to be every 24 hours, but when I pull the data and compare it to Looker, it is off by more than 48 hours. Can anyone help me understand if there is something I am missing in the refresh that is supposed to occur every night (or every 24 hours)? Thanks, Parijat