What happens when a node in a cluster fails?

jonallen · 07-07-2021

Knowledge Drop

Last tested: Jan 10, 2020

Do the other nodes recognize the failed node?

Indirectly, yes. The nodes aren't watching each other directly (they don't ping one another), but the scheduling engine detects a node failure from the node heartbeat table in the internal database and restarts or redistributes the scheduled jobs accordingly.
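To make the heartbeat idea concrete, here is a minimal Python sketch of how a scheduler could spot a dead node from stale heartbeat timestamps and move its running jobs to live nodes. This is an illustration only, not Looker's actual implementation: the `Job` class, the `find_dead_nodes` and `redistribute` functions, the in-memory `heartbeats` dict standing in for the heartbeat table, and the 30-second timeout are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed threshold for the example; not a documented Looker value.
HEARTBEAT_TIMEOUT_S = 30.0


@dataclass
class Job:
    job_id: str
    node_id: str
    status: str = "running"


def find_dead_nodes(heartbeats, now, timeout=HEARTBEAT_TIMEOUT_S):
    """Return the nodes whose last heartbeat is older than `timeout` seconds."""
    return {node for node, last_seen in heartbeats.items() if now - last_seen > timeout}


def redistribute(jobs, heartbeats, now, timeout=HEARTBEAT_TIMEOUT_S):
    """Reassign running jobs from dead nodes to live nodes, round-robin."""
    dead = find_dead_nodes(heartbeats, now, timeout)
    live = sorted(set(heartbeats) - dead)
    if not live:
        return []  # nothing healthy to move work onto
    moved = []
    for job in jobs:
        if job.status == "running" and job.node_id in dead:
            job.node_id = live[len(moved) % len(live)]
            job.status = "restarted"
            moved.append(job)
    return moved


# Hypothetical heartbeat table: node id -> last heartbeat time (seconds).
heartbeats = {"node-a": 100.0, "node-b": 95.0, "node-c": 40.0}
jobs = [Job("j1", "node-c"), Job("j2", "node-a"), Job("j3", "node-c")]
moved = redistribute(jobs, heartbeats, now=105.0)
```

The point of the sketch is that no node ever contacts another node: each one only writes its own timestamp, and the failure is inferred by whoever reads the shared table.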

When one of the Looker nodes fails, do the active Looker nodes start accepting requests that were sent to the failed node?

The load balancer stops sending front-end requests to the failed node once it fails to receive a response from it. The scheduler detects work that failed because a node disappeared and restarts that work on an active node.
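The load balancer side of this can be sketched in a few lines of Python. This is a simplified, passive check written for illustration: the `Balancer` class, its `route` method, and the `fake_send` stand-in are hypothetical names, and a real load balancer would typically also run periodic health checks and add a node back once it recovers.

```python
class Balancer:
    """Round-robin router that drops a node once it fails to respond."""

    def __init__(self, nodes, send):
        self.healthy = list(nodes)
        self.send = send  # callable(node, request) -> response
        self._next = 0

    def route(self, request):
        while self.healthy:
            node = self.healthy[self._next % len(self.healthy)]
            try:
                response = self.send(node, request)
            except ConnectionError:
                # No response: stop routing to this node and try the next one.
                self.healthy.remove(node)
                continue
            self._next += 1
            return response
        raise RuntimeError("no healthy nodes available")


# Hypothetical transport for the example: "n2" is down and never answers.
down = {"n2"}


def fake_send(node, request):
    if node in down:
        raise ConnectionError(f"{node} did not respond")
    return f"{node}:{request}"


lb = Balancer(["n1", "n2", "n3"], fake_send)
results = [lb.route("r") for _ in range(3)]
```

In this sketch the request that hit the failed node is retried on the next healthy node. Whether an in-flight front-end request is retried in a real deployment depends on how the load balancer is configured.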

This content is subject to limited support.