What Google Cloud product supports high concurrency and provides low latency for Analytical queries?

I am new to Google Cloud Platform, so this is a question for those who have experience with the GCP suite of products. I am trying to figure out which Google Cloud product would be the best fit for my needs. Here are the requirements I am trying to meet:


1. Store data in Google Cloud Platform.
2. Support potentially thousands of concurrent users running analytical queries.
3. The data volumes would be moderate (as far as Google is concerned): tens of GB, potentially up to hundreds of GB.
4. The data will be updated a few times a day.
5. Perhaps my biggest goal is the following: I am trying to avoid having to build a traditional data warehouse with a star schema and an ETL process. I have experience with these projects, and they end up being very costly, error-prone, hard to troubleshoot, and rigid (hard to deploy changes). Instead, I am hoping to just copy my OLTP tables to GCP and query them from there, relying on GCP horsepower.


I appreciate any feedback.


It sure feels like you are describing the Google BigQuery product offering. If you search for "BigQuery" you'll find more references than you could ever want. At the highest level, it's a SQL-compliant analytics database optimized for queries.

@sql_guy if you want to get hands-on with BigQuery, you can take the learning challenge.

At our company we do the same, we use CloudSQL for the 'business data', i.e. things we want to store. But all the analytical stuff is done with BigQuery (BQ).

To make working with BQ easy, we do two things: 1) we load all user events into BQ, and 2) overnight we create/update an aggregated view in BQ, based on our 'core' data from CloudSQL.

The latter makes querying easier: instead of getting back only the IDs of things, you can also output, for example, the title of a 'thing', because that information is already at hand in BQ and easy to join. Hope this helps!


Thank you for sharing your experience! Could you tell me how much data you keep in BQ and how many concurrent users you support? I am asking because my use case needs to support hundreds of concurrent connections, and I see that the BQ limit is 100 concurrent connections. That is not going to work for us, but I wonder if there is a way around it.
https://cloud.google.com/bigquery/quotas

 

In our software we have a screen/section that shows graphs based on BQ data. For every user visiting those pages, our backend server queries BQ, thus only needing 1 connection. This works really well, but I must say we don't have that many people simultaneously using these screens.

In your situation I would use the same approach: make sure that the backend server (being just 1 connection) connects on behalf of all application users, and that your server caches results to minimize the impact (and costs) on BQ. That way the limit applies to backend servers rather than end users, so in principle up to 100 applications could be using BQ.
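
To make the caching idea a bit more concrete, here is a minimal sketch in Python of a backend-side cache sitting in front of BQ. It is only an illustration, not what we run in production: the class name `CachedQueryRunner`, the 5-minute TTL, and the `fake_backend` function are all made up for the example. In a real backend the `run_query` callable would wrap your BigQuery client call (for instance, something like `lambda sql: list(client.query(sql).result())` with the `google-cloud-bigquery` library), and you would probably want a lock or a shared cache if several threads or servers are involved.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class CachedQueryRunner:
    """Serve repeated analytical queries from memory for a short time.

    Hypothetical helper: `run_query` is whatever function actually talks
    to the warehouse (assumed here, not part of any Google library).
    """

    def __init__(
        self,
        run_query: Callable[[str], List[Any]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps the SQL text to (time the result was stored, result rows).
        self._cache: Dict[str, Tuple[float, List[Any]]] = {}

    def query(self, sql: str) -> List[Any]:
        now = self._clock()
        hit = self._cache.get(sql)
        if hit is not None and now - hit[0] < self._ttl:
            # Fresh enough: answer without touching the warehouse.
            return hit[1]
        rows = self._run_query(sql)
        self._cache[sql] = (now, rows)
        return rows


# Stand-in for the real warehouse call so the sketch runs on its own.
calls: List[str] = []


def fake_backend(sql: str) -> List[Any]:
    calls.append(sql)
    return [("page_views", 42)]


runner = CachedQueryRunner(fake_backend, ttl_seconds=60)
first = runner.query("SELECT metric, total FROM daily_stats")
second = runner.query("SELECT metric, total FROM daily_stats")
print(f"backend calls: {len(calls)}")
```

Since your data is only updated a few times a day, a TTL of several minutes (or clearing the cache right after each load) should keep results fresh enough while many users viewing the same dashboard trigger only one BQ query.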

Thank you again for sharing your experience!