DNS network incident
Incident Report for Gandi.net


On July 16th, between 13:26 UTC and 13:48 UTC, we experienced an incident on our LiveDNS platform.

Two DNS nodes went down.

Customer impact

Queries to those two DNS nodes failed with a timeout.

Root cause analysis

Why did two DNS nodes go down?

  • A software bug triggered a fault, which caused the DNS server to stop.

Why did it have such an impact?

There are three problems:

First one:

We do anycasting on our DNS infrastructure, for redundancy.

Anycasting means that we announce our DNS nodes' addresses in BGP.

But if the DNS server on a node crashes, that node should stop announcing its route. The failsafe meant to handle this case failed.
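As an illustration of what such a failsafe has to decide, here is a minimal sketch. It is not our actual implementation: the class name `AnnounceGate` and the `fall`/`rise` thresholds are hypothetical, and the sketch only covers the decision logic. The DNS probe itself and the mechanism that tells the BGP daemon to announce or withdraw the route are left out.

```python
from dataclasses import dataclass


@dataclass
class AnnounceGate:
    """Decides whether a node should keep announcing its anycast route."""

    fall: int = 3           # assumed: failed probes in a row before withdrawing
    rise: int = 2           # assumed: good probes in a row before re-announcing
    announced: bool = True  # current state of the route
    _streak: int = 0        # consecutive probes that disagree with the state

    def observe(self, probe_ok: bool) -> str:
        """Feed one probe result; return 'announce', 'withdraw' or 'hold'."""
        if probe_ok == self.announced:
            # The probe agrees with the current state: nothing to change.
            self._streak = 0
            return "hold"
        self._streak += 1
        threshold = self.rise if probe_ok else self.fall
        if self._streak < threshold:
            return "hold"
        # Enough consecutive disagreeing probes: flip the state.
        self.announced = probe_ok
        self._streak = 0
        return "announce" if probe_ok else "withdraw"


gate = AnnounceGate()
probes = [True, False, False, False, False, True, True]
actions = [gate.observe(ok) for ok in probes]
# actions == ['hold', 'hold', 'hold', 'withdraw', 'hold', 'hold', 'announce']
```

The thresholds avoid flapping on a single lost probe. The important property is that a crashed DNS server eventually produces a "withdraw", so that traffic is routed to another node.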

Second one:

A fault in our configuration made these two nodes announce more IP addresses than expected. The nodes are supposed to announce these anycasted IP addresses:

ns-206-a.gandi.net. -

ns-64-b.gandi.net. -

ns-110-c.gandi.net. -

Domain names that use LiveDNS are served by resolvers named like so:




In normal circumstances, if ns-x-a.gandi.net can't answer the request, ns-x-b.gandi.net and then ns-x-c.gandi.net will be tried next.

But if all three are announced by the same broken node, all queries and retries will fail (the redundancy doesn't work).
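The failure mode can be sketched with a small simulation. This is a simplified model under stated assumptions: it takes the point of view of a single client, for whom each name server address is routed to exactly one node, and it treats the letter suffix (a, b, c) as the cluster. The node names (`node-1` and so on) and function names are hypothetical.

```python
def first_answering_server(name_servers, served_by, down_nodes):
    """Try each name server in order; return the first whose node is up."""
    for ns in name_servers:
        if served_by[ns] not in down_nodes:
            return ns
    return None  # every attempt timed out


def nodes_spanning_clusters(served_by):
    """Return nodes that answer for name servers of more than one cluster.

    The cluster is assumed to be the letter in ns-<n>-<letter>.gandi.net.
    """
    clusters_per_node = {}
    for ns, node in served_by.items():
        letter = ns.split(".")[0].rsplit("-", 1)[1]
        clusters_per_node.setdefault(node, set()).add(letter)
    return sorted(n for n, c in clusters_per_node.items() if len(c) > 1)


name_servers = ["ns-206-a.gandi.net.", "ns-64-b.gandi.net.", "ns-110-c.gandi.net."]

# Intended layout: each name server is reached on a different node.
healthy = dict(zip(name_servers, ["node-1", "node-2", "node-3"]))
# Faulty layout: one node announces all three addresses.
faulty = dict.fromkeys(name_servers, "node-1")

print(first_answering_server(name_servers, healthy, {"node-1"}))  # ns-64-b.gandi.net.
print(first_answering_server(name_servers, faulty, {"node-1"}))   # None
print(nodes_spanning_clusters(faulty))                            # ['node-1']
```

With the intended layout, losing one node costs one retry. With the faulty layout, the retries all land on the same broken node, and a check like `nodes_spanning_clusters` is one way such a misconfiguration could be detected before an outage.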

Third one:

Before the DNS incident, we were dealing with an internal network incident.

This incident generated a lot of noise in our monitoring systems, which led us to misinterpret the alerts triggered on the DNS servers: we thought they were false positives.

This led to the DNS incident lasting longer than it should have, because of the time it took us to re-evaluate the LiveDNS monitoring alerts.


  • We will fix the way we disable a node when the DNS server is not operating correctly.
  • We will fix our node configuration and set up monitoring to make sure that no single node announces several clusters.
  • We will rework our monitoring and our internal training/organisation.
  • We have already tracked down the bug in our DNS server software, and a fix is currently being deployed.
Posted Jul 16, 2020 - 16:36 UTC

This incident has been resolved.
Posted Jul 16, 2020 - 13:49 UTC
Impact for FR customers
Posted Jul 16, 2020 - 13:26 UTC
This incident affected: DNS (LiveDNS, {abc}.dns.gandi.net).