Cloud Data Fusion - Spark program 'phase-1' failed with error: The TCP/IP connection to the host...

Hello guys,

I get this error when trying to run an ingest pipeline from on-prem SQL to BigQuery:

"Spark program 'phase-1' failed with error: The TCP/IP connection to the host xx.xx.xx, port 1433 has failed. Error: "connect timed out. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".. Please check the system logs for more details.

I did a "connectivity test" in GCP where the destination was reachable. However when I try to ping the SQL server it cant reach from a Linux instance in the same network. I cannot reach it using Dataproc (SSH) either.

When looking at all completed Dataproc ingest jobs, they all have succeeded.

What can be the issue?

 

 

 

 

0 2 313
2 REPLIES 2

I have also checked that the TCP connections to the port are not blocked by a firewall.

Here are several steps and considerations to help troubleshoot and resolve the issue:

  1. Direct Port Connectivity Tests:

    • Utilize Telnet or Netcat on the Linux instance to confirm connectivity to the SQL Server's TCP port 1433 (e.g., nc -zv xx.xx.xx 1433). Successful tests indicate open network paths and ports, while failures suggest network or firewall issues requiring attention.
    • Conduct connectivity tests from both Linux and Windows machines within the same network to identify any OS-specific discrepancies.
  2. SQL Server Configuration and Firewall:

    • TCP/IP Protocols: Ensure TCP/IP is enabled in the SQL Server Configuration Manager and that the server is not limited to named pipes connections.
    • Browser Service: For named instances, confirm the SQL Server Browser Service is operational.
    • Firewalls: Conduct a detailed review of firewall configurations, both on-premises and within GCP, to guarantee that inbound traffic from GCP IPs to the SQL Server on port 1433 is permitted. Pay special attention to the Windows Firewall settings on the SQL Server.
    • IP and Port Configuration: Verify that the SQL Server is set to listen on the correct IP address and port, particularly in multi-network interface or instance environments.
    • Security Features: Check for any IP address whitelisting or SQL Server security settings that might block connections from specific sources.
  3. Networking and Security:

    • Routing & VPN/Interconnect: Examine the network routing between GCP and your on-premises network for any misconfigurations in VPN, VPC peering, or Cloud Interconnect setups that could impede traffic.
    • NSGs/ACLs: Confirm that Network Security Groups and Access Control Lists within GCP are configured to allow the necessary traffic.
    • Subnet Overlaps: Ensure there are no overlapping subnets between the GCP VPC and the on-premises network that could cause routing conflicts.
    • Network Changes: Look into any recent network configuration changes that might align with the start of the connectivity issues.
  4. Understanding Successful Dataproc Jobs:

    • Configuration Analysis: Conduct a thorough comparison of network and SQL Server connection configurations between successful Dataproc jobs and the failing Data Fusion pipeline, including authentication methods and credentials.
    • Logs & Metrics: Delve into Dataproc job logs for details on specific network paths, ports, and settings that facilitated success.
  5. Additional Investigation Points:

    • DNS: Verify DNS settings in both GCP and on-premises networks to ensure the SQL Server hostname resolves correctly across all necessary locations.
    • Proxies and SSL/TLS: Identify any proxies that might affect traffic from GCP and check for SSL/TLS inspection or termination features that could interfere with secure connections.
    • Dynamic Ports: If SQL Server is configured for dynamic ports, pinpoint the current active port and adjust the Data Fusion setup accordingly. Consider setting SQL Server to use a static port for easier management.