Datastream in Project-B, Cloud SQL in Project-A

Hey y'all

I'm trying to configure Datastream to stream data from Cloud SQL to BigQuery.

Our Cloud SQL instance is in Project-A, and Datastream is in Project-B.
Project-A has vpc-A, which is peered with vpc-B in Project-B.
vpc-A runs the Cloud SQL Auth Proxy (cloudsql_auth_proxy). From vpc-B (Project-B) I can ping it and log in to the database through it using psql, but I can't connect to the Auth Proxy from Datastream. I need to connect Datastream (Project-B) to Cloud SQL (Project-A) using a private connection.
Thank you!


Hi @realsharip,

Welcome to Google Cloud Community!

On Google Cloud Platform, you can use VPC Network Peering to connect the VPCs of Project-A and Project-B. This will allow Datastream in Project-B to access the Cloud SQL instance in Project-A over a private connection.
  1. Go to the VPC peering page in the Google Cloud Console: https://console.cloud.google.com/networking/peering
  2. Click "Create peering connection".
  3. Select vpc-A (in Project-A) as your VPC network, and vpc-B in Project-B as the peered VPC network.
  4. Click on "Create"
  5. Repeat these steps in Project-B to peer vpc-B with vpc-A; the peering only becomes active once it has been created from both sides. Then configure the necessary firewall rules to allow communication between the two VPCs.
  6. On the Cloud SQL instance in Project-A, create a database user for Datastream with the privileges it needs to replicate the source tables.
  7. Create a Datastream connection profile in Project-B that uses this user's credentials and the Cloud SQL instance's private IP address.
  8. Make sure that the firewall rules for both VPCs allow traffic on the database port (for example, 5432 for PostgreSQL).
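The firewall check in the last step can be done from a VM on the other side of the peering with a plain TCP connection test, which exercises routing and firewall rules without needing database credentials. This is a minimal sketch using only Python's standard library; it is a generic connectivity check, not part of Datastream, and a success from a VM does not by itself prove that Datastream (which connects from its own network) can reach the same endpoint.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within `timeout` seconds.

    A refused connection, a timeout (often a sign that a firewall rule is silently
    dropping the packets), or an unreachable network all return False.
    """
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, from a VM in vpc-B you could call `tcp_port_open("10.20.0.5", 5432)`, where the IP address is a hypothetical placeholder for the endpoint you want to test.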
Please double-check these steps against the Google Cloud documentation, since they can vary depending on the service version and your specific use case.
 
Thank you.

Actually, Datastream is a serverless product, so it isn't physically in Project-B's VPC-B.
And because VPC peering isn't transitive, the path Datastream > VPC-B > VPC-A > Cloud SQL doesn't work. To get around this, you'll need to either:
a) set up an additional reverse proxy in VPC-B that forwards traffic to the Auth Proxy in VPC-A (which then connects to Cloud SQL)
or
b) set up a Shared VPC, peer Datastream with it, and put the reverse proxy there (I think in this case you don't need the separate Auth Proxy and can point the proxy directly at the Cloud SQL database?)
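To make the non-transitivity point concrete, here is a toy Python model (not a Google Cloud API). Each network lists only its direct peerings, and routed traffic may cross at most one of them. The network names are hypothetical, and the model assumes Datastream's private connection peers only with vpc-B. A proxy terminates the TCP connection and opens a new one, so with a proxy at each intermediate hop only each individual leg needs a direct peering, which is why the proxy-based options work.

```python
from typing import Sequence

# Toy model of non-transitive VPC peering (illustrative only, not a Google Cloud API).
# Hypothetical names: "datastream-net" is Datastream's network, assumed to peer only
# with vpc-B; "cloud-sql" stands for the instance as reached from vpc-A.
PEERINGS = {
    frozenset({"datastream-net", "vpc-B"}),
    frozenset({"vpc-B", "vpc-A"}),
    frozenset({"vpc-A", "cloud-sql"}),
}


def can_reach(src: str, dst: str) -> bool:
    """Routed traffic stays inside one network or crosses exactly one direct
    peering; it is never forwarded across a second peering."""
    return src == dst or frozenset({src, dst}) in PEERINGS


def proxied_path_works(hops: Sequence[str]) -> bool:
    """With a proxy at every intermediate hop, each leg is a separate TCP
    connection, so only consecutive hops need to be directly reachable."""
    return all(can_reach(a, b) for a, b in zip(hops, hops[1:]))


print(can_reach("vpc-B", "vpc-A"))           # True: psql from VPC-B works
print(can_reach("datastream-net", "vpc-A"))  # False: two peerings away
# Option (a): reverse proxy in vpc-B, Auth Proxy in vpc-A.
print(proxied_path_works(["datastream-net", "vpc-B", "vpc-A", "cloud-sql"]))  # True
```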

In either case, make sure that Datastream is pointing to the proxy's IP, not the database's.
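For option (a), the reverse proxy in VPC-B only has to accept TCP connections and relay the bytes to the Auth Proxy in VPC-A; it doesn't need to understand the PostgreSQL protocol. Below is a minimal asyncio sketch, not production code (no TLS, logging, or connection limits), and a purpose-built TCP proxy such as HAProxy or Nginx's stream module is the more usual choice. The target IP 10.0.0.10 and port 5432 are hypothetical placeholders for the same Auth Proxy endpoint that psql already reaches from VPC-B. Datastream's connection profile would then use this VM's internal IP in VPC-B.

```python
import asyncio

# Hypothetical values: the target must be the Auth Proxy endpoint in vpc-A that
# is already reachable from vpc-B (the one psql uses).
LISTEN_HOST = "0.0.0.0"      # accept connections on all interfaces of the VM in vpc-B
LISTEN_PORT = 5432
TARGET_HOST = "10.0.0.10"    # hypothetical internal IP of the Auth Proxy VM in vpc-A
TARGET_PORT = 5432


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes in one direction until EOF or a connection error, then close the other end."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass
    finally:
        writer.close()


async def handle(client_reader: asyncio.StreamReader, client_writer: asyncio.StreamWriter,
                 target_host: str, target_port: int) -> None:
    """Open a new connection to the target for each client and relay traffic both ways."""
    try:
        up_reader, up_writer = await asyncio.open_connection(target_host, target_port)
    except OSError:
        client_writer.close()  # target unreachable: drop the client connection
        return
    await asyncio.gather(pipe(client_reader, up_writer), pipe(up_reader, client_writer))


async def serve(listen_host: str, listen_port: int, target_host: str, target_port: int):
    """Start listening; every accepted connection is relayed to target_host:target_port."""
    return await asyncio.start_server(
        lambda r, w: handle(r, w, target_host, target_port), listen_host, listen_port
    )


async def main() -> None:
    server = await serve(LISTEN_HOST, LISTEN_PORT, TARGET_HOST, TARGET_PORT)
    async with server:
        await server.serve_forever()
```

On the VM in VPC-B, start it with `asyncio.run(main())`. Opening a fresh upstream connection per client is what lets each leg rely on a single, direct peering.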

(DISCLAIMER: I'm not a networking expert, so I might be missing something)