What happens when you update your DNS (jvns.ca)
255 points by kiyanwang on June 22, 2020 | 85 comments



One of my favorite revelations about network tracing tools (things like `traceroute` and `dig +trace`), one that might not be obvious to people like me who work higher up in the stack, is that the data they provide isn't usually made available during "normal" usage. Packets don't just phone home and tell you where they've been. Something else is going on.

When you send a DNS query to a recursive server like your ISP's or something like 1.1.1.1, you make a single DNS query and get back a single response, because the recursive DNS server handles all the different queries that Julia outlines in the post. As the client, we have no idea what steps just happened in the background.

But when you run `dig +trace`, dig is actually pretending to be a recursive name server, making all those queries itself instead of letting a real recursive name server do the work. It's a fun hack, but that means it's not always 100% accurate to what's going on in the real world. [0]

[0] https://serverfault.com/questions/482913/is-dig-trace-always...
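
Concretely (any name and resolver will do):

    dig @1.1.1.1 example.com A    # one question, one answer; the recursion is invisible
    dig +trace example.com A     # dig itself walks root -> .com -> example.com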


Traceroute’s trick is amusing. It abuses the TTL field, sending out packets with too-low TTLs and waiting to see who complains about them. When layers reveal themselves they are doing it voluntarily, and those wise to the game can choose not to participate, or to troll it.

https://www.theregister.com/2013/02/15/star_wars_traceroute/
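
For the curious, a minimal sketch of the trick in Python (the raw ICMP receive needs root, and a real traceroute handles ports and sequencing more carefully; hostname illustrative):

    import socket

    def traceroute(dest, max_hops=30, port=33434):
        dest_ip = socket.gethostbyname(dest)
        for ttl in range(1, max_hops + 1):
            # Raw socket to hear the ICMP "time exceeded" complaints (needs root)
            recv = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
            recv.settimeout(2)
            send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            # The trick: deliberately send the probe with a too-low TTL
            send.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
            send.sendto(b"", (dest_ip, port))
            try:
                _, addr = recv.recvfrom(512)
                print(ttl, addr[0])
                if addr[0] == dest_ip:  # port-unreachable from the target: done
                    break
            except socket.timeout:
                print(ttl, "*")  # this hop chose not to participate
            finally:
                send.close()
                recv.close()

    traceroute("example.com")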


Those are IP packet TTLs. They're also useful in some cases for diagnosing mysterious RSTs and SYNs, and in some cases it's worth setting them purposefully low if there is route flapping.


The IP TTL field is unrelated to the topic - DNS and IP sit on different OSI layers. Also, the traceroute command does not abuse this field; it's just the way the IP protocol works. E.g., eBGP sets TTL to 1 by default, to prevent establishing relationships with neighbors further than one hop away.


It's not abusing the TTL; it's working as designed.


AFAIK the design purpose of TTL is to prevent an infinite loop in case the route contains a cycle.


Just to add to the discussion: the "what's happening in the background", more specifically, is your operating system's stub resolver.

So when you ask for www.amazon.com it ends up making multiple DNS lookups, as www.amazon.com is a CNAME record.

Nothing about this CNAME lookup gets passed back up the stack to your application; you just get that end-result: the IP address.

    host www.amazon.com
    www.amazon.com is an alias for tp.47cf2c8c9-frontier.amazon.com.
    tp.47cf2c8c9-frontier.amazon.com is an alias for www.amazon.com.edgekey.net.
    www.amazon.com.edgekey.net is an alias for e15316.e22.akamaiedge.net.
    e15316.e22.akamaiedge.net has address 23.204.68.114


Am I the only one who finds it strange that Amazon uses Akamai instead of Cloudfront?


It varies. For example, for www.amazon.com, I get www.amazon.com -> tp.47cf2c8c9-frontier.amazon.com. -> d3ag4hukkh62yn.cloudfront.net.


Why is that?

(My first thought was, wow, surprised they outsource that.)


I'd wager at least part of it is the same reason many companies use CloudFront: a separate point of failure in case of downtime on the AWS side, someone else shouldering DDoS.


Pure speculation, but probably the same reason they ran Oracle for so long -- not many CDN providers existed when amazon.com launched. AFAIK Akamai still has one of the biggest edge networks of all the providers.


I asked the same question 7 days ago: https://news.ycombinator.com/item?id=23537135


Yup, and to complicate matters more those resolvers you're talking to may be talking to more caching resolvers.

For a given application server it might be:

- check local dns caching resolver

- check local network caching resolver

- if internal domain, check local authoritative resolver

- if public domain check isp resolver

- recursively resolve from there


In particular, one of your ISP's (or their ISP's) DNS servers may be caching a record for longer than it's supposed to and will return incorrect, expired data.

The other possibility is different IPs being returned by a DNS server based on where a query is coming from, e.g. a CDN. If you're in location A and your ISP's DNS server is in location B, the CDN's DNS server may return a different IP based on whether the request is coming from A or B. ECS [0] is supposed to mitigate this, but may or may not be used.

[0] https://en.wikipedia.org/wiki/EDNS_Client_Subnet


> In particular, one of your ISP's (or their ISP's) DNS servers may be caching a record for longer than it's supposed to and will return incorrect, expired data.

It's disturbing how many clients we'll see hitting an old IP address for 30 days after a change.


My favorite dig trick is asking caching resolvers with +norecurse to see what's in the cache. Of course some resolvers don't honor it, just like they don't support ANY.
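
For example (resolver and name illustrative):

    dig @8.8.8.8 example.com A +norecurse

If the record isn't cached, you'll typically get back an empty answer instead of a freshly resolved one.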


I thought DNS was a UDP protocol where you fire off a packet and use the first response that comes back. It's not always one to one.


UDP is stateless but DNS is definitely 1:1 in that there is a notion of a query and its reply. This relationship is made even clearer when you consider DoH.


dig +trace takes one path; there are also tools like dnstrace that attempt to show all the paths: https://github.com/rs/dnstrace

Still there can be caches that don’t quite agree as the other comment mentions.


Glad to see this. One of my (stupid) pet peeves is people that say "You have to wait for the DNS to propagate". DNS does not propagate. What you're actually waiting for is the cache TTL to expire so those name servers that have cached it have to query the real answer again, thus getting the newly pushed information. Of course it appears exactly like it "takes time to propagate", which is why it's actually a pretty sound description of what's happening, and thus why it's a stupid pet peeve. Pointless rant ends.


The change is propagating through the network, but it’s not a push like most would assume based on the wording


I respect and agree with your rant. CTO and Co-Founder of DNSFilter here.

When we get a complaint that our recursive servers are returning an 'old IP' as if we did something wrong... I start by explaining that they likely didn't properly lower their TTL prior to making a change.

Sadly we'll never win this battle; so at some point in the future we'll need to join the likes of Google in providing a page to force cache expiration of domains at customers' request :/

https://developers.google.com/speed/public-dns/cache


Oh! I always thought the TTL was the delay in "propagating" the DNS change, not an expiration time for lookups. Learn something new everyday... Good thing that's not my day job.


Don't forget negative caching. Windows famously fucks up here. A DNS lookup these days is minute in the grand scheme of things, and yet Windows still insists on caching a failed lookup for five minutes.

So you fire up cmd.exe and issue ifconfig /releasedns, ..., ipconfig /?, ipconfig /flushdns and then you go back to pinging the bloody address instead of using nslookup because you learned from another idle/jaded sysadmin to use ping as a shortcut to querying DNS, instead of actually querying what the DNS servers respond with.

Obviously, a better thing to do when checking your DNS entries is dig out ... dig.

DNS changes _do_ propagate: from the one you edited to the others via zone transfers and the like (primary to secondary etc) and thence to caching resolvers.


Resolvers are supposed to cache negative results according to the SOA record of the zone.
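
For example (domain illustrative), the last field of the SOA record - the MINIMUM - is what RFC 2308 uses to bound the negative-caching TTL:

    dig example.com SOA +noall +answer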


I'll admit that I have not looked too far ... (Google) ... https://tools.ietf.org/html/rfc2308:

"Negative caching was an optional part of the DNS specification and deals with the caching of the non-existence of an RRset [RFC2181] or domain name."

Optional.

Now, I know how DNS works and so do you but your average sysadmin wants instant results as they thrash around trying to get something to work. You see the same effect with firewalls. If state tables don't get flushed after a change then the change will seem to be ineffective. Been there and done that. States often die after five minutes and so do Windows DNS fuck ups.

That is why the master says: "wait longer" followed by "turn it off and on again" or "go out and smoke a fag" or "get me a coffee" or whatever. When your arse is on fire in IT, learn when to trust your judgement and take five.


Yes, I'm annoyed about this too.

The most egregious case I've seen was an Amiga site. The site went down, and for several days it reported that users would need to wait for the updated records to propagate, and lots of loyal fans were insisting anybody who couldn't read the site was just being too impatient.

What was actually wrong? They wrote their new IP address as a DNS name in their DNS configuration rather than as an IP address. Once they fixed that it began working and they acted as though that was just because now it had successfully propagated.

On the other hand propagation is a thing when it comes to distributing modified DNS records to multiple notionally authoritative DNS servers.

This can be a problem for using Let's Encrypt dns-01 challenges for example, especially with a third party DNS provider.

Suppose you write a TXT record to pass dns-01 and get a wildcard certificate for your domain example.com. You submit it to your provider's weird custom API and it says OK. Unfortunately, when you do this all it really did was write the updated TXT record to a text file on an SFTP server. Each of the provider's (say) three authoritative DNS servers (mango, lime, kiwi) checks this server every five minutes, downloads any updated files and begins serving the new answers.

Still they said OK, so you call Let's Encrypt and say you're ready to pass the challenge. Let's Encrypt calls authoritative server kiwi, which has never seen this TXT record and you fail the challenge.

So you check DNS - your cache infrastructure calls lime, which has updated and gives the correct answer, it seems like everything is fine, so you report a bug with Let's Encrypt. But nothing was wrong on their side.

Now, unlike typical "DNS propagation" myths the times for authoritative servers are usually minutes and can be only seconds for a sensible design (SFTP servers is not a sensible design) so you can just add a nice generous allowance of time and it'll usually work. But clearly the Right Thing™ is to have an API that actually confirms the authoritative servers are updated before returning OK.
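
Lacking such an API, a client-side approximation is possible: poll each authoritative server directly until the record is visible everywhere. A sketch with dnspython (the server names reuse the hypothetical mango/lime/kiwi from above):

    import dns.resolver  # pip install dnspython

    AUTHORITATIVES = ["mango.example-dns.net", "lime.example-dns.net", "kiwi.example-dns.net"]

    def challenge_visible_everywhere(name, token):
        for server in AUTHORITATIVES:
            ip = dns.resolver.resolve(server, "A")[0].to_text()
            r = dns.resolver.Resolver(configure=False)
            r.nameservers = [ip]  # ask this authoritative directly, bypassing all caches
            try:
                answers = r.resolve(name, "TXT")
            except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
                return False
            if not any(token in rr.to_text() for rr in answers):
                return False
        return True  # only now tell Let's Encrypt to validate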


> So you check DNS - your cache infrastructure calls lime, which has updated and gives the correct answer, it seems like everything is fine

Been there, done that, got burned. If you're mucking around with DNS records that are going to be verified by someone else, never trust your local lookups, always try to verify through a fresh third party that everything resolves correctly before you submit.

Because oops, something went wrong, the verification hit a wildcard entry with a cache time of days, and now you have to wait that long before trying again, because that entry isn't budging from other resolvers' caches...


Yeah, flushing the cache in your own recursive resolver doesn't flush them all over the internet.


Throw off the chains of victimhood and run your own caching resolver: flush the cache anytime you want!


"Propagate" just means "spread". Your latest record certainly does "propagate" in that it displaces the previous version of the record in resolvers' caches.


People use it in the context of "I've made the change now the change has to be pushed out to everyone" but that's not true. Everyone else has to pull the change.

It's the idea that it spreads out slowly from the centre to the far reaches that annoys me - which is what most people seem to be suggesting when they refer to "DNS Propagating".

It's not one big push, it's millions of tiny pulls.


Old guy here: maybe people were confusing DNS with the WINS service that was helping to propagate ("replicate") name/server changes 20 years ago?


https://howdns.works is one of my favorite educational booklets on the subject. Not as in depth as many other resources, but highly amusing and fairly sticky.


This is brilliant, thanks.


There is a very elegant way to update your DNS if you are running djbdns: You can optionally specify date ranges for every record![1] The server will automatically adjust the TTL. By having two records with different time ranges you can switch IP addresses at an exact moment.

The timestamps are provided in DJB's TAI64 format; use something like https://github.com/hinnerk/py-tai64 to convert them.

[1] config file spec at https://cr.yp.to/djbdns/tinydns-data.html
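
For instance, switching www from one address to another at a hypothetical TAI64 timestamp (the value below is illustrative):

    # ttl=0 plus a timestamp means "serve until then", with the TTL auto-shrinking
    +www.example.com:1.2.3.4:0:4000000060d75f00
    # ttl omitted plus a timestamp means "start serving at that time"
    +www.example.com:5.6.7.8::4000000060d75f00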


DNS infrastructure is really interesting. I did a bit of a deep dive on it a few months ago, culminating in running my own authoritative name servers [0] for a while.

[0]: https://www.joshmcguigan.com/blog/run-your-own-dns-servers/


One neat way of retaining that control is running your own SOA(s), but getting robust secondaries and listing those in WHOIS so that they take all of the wild queries. Then you just work with your little SOA and everything just propagates as necessary and you don't get hammered.


This reminds me that I wish DNS had some way to define a load balancing algorithm for clients to use, so browsers could make load balancing decisions. This would eliminate the need for virtual IP addresses, having to pass originating subnet information up recursive queries, having to remove faulty VIPs (or hosts) from DNS, etc.

It is baffling to me that inside the datacenter, I can control the balancing strategy for every service-to-service transaction, but for the end user's browser, all I can do is some L3 hacks to make two routers appear as one (for failover purposes). L3 balancing would be completely unnecessary if I could just program the user agent to go to the right host, after all. The end result is unnecessary cost and complexity multiplied over a billion websites.


That is a use of SRV records [1]; however, it was not accepted into the HTTP protocol specification. I bring it up every time there is a new protocol version, but I am too lazy to write an RFC addendum for it and hope that someone else will. Existing protocols may not be modified in this manner once ratified. Maybe HTTP/4.0? /s

Some applications use SRV records for load balancing. Many VoIP and video conferencing apps do this. There is a better list on Wikipedia.

    _service._proto.name. TTL class SRV priority weight port target.
[1] - https://en.wikipedia.org/wiki/SRV_record
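
For instance, a hypothetical VoIP record following that template:

    _sip._tcp.example.com. 86400 IN SRV 10 60 5060 sipserver.example.com.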


Yeah, I always liked SRV records. It seems that they proved inadequate for gRPC balancing, so there are new experiments in progress (mostly xDS).


FWIW, there is an IETF draft that may be suitable for addressing this: https://datatracker.ietf.org/doc/draft-ietf-dnsop-svcb-https...


Aha, this sounds like exactly what I was looking for!


> [...] I wish DNS had some way to define a load balancing algorithm for clients to use, so browsers could make load balancing decisions.

There's actually the germ of an interesting idea in that statement. If I'm going to go to the trouble, let's say, of running a local TCP forwarder (good for the whole device), can I run a packet sniffer at the same time and watch netflows and edit the responses I return to the device based on what I see performance-wise concerning those flows?

Expert me says that web sites are loaded with too much cruft and since the far end terminations are spread far and wide, there's not enough opportunity to apply that learning in any practical sense. But I could be wrong. (https://github.com/m3047/shodohflo)


Why doesn't a stub resolver do this, and try (plain) TCP and DoT (port 852) opportunistically while it's at it: you know, good for the whole device?


> port 852

Typo, meant 853.


FWIW, you can edit your own comments for up to two hours.


As DNS-over-HTTPS spreads I expect to see a lot of innovation in DNS including stapling load-distribution directions along with the RRs.


How would it be better than round robin DNS with low TTL?


Basically, it affords you the ability to cache for longer and still end up with users able to go to your website.

Right now, you can try resolving common hosts, and you will see that they often provide several hosts in response to a lookup. What the browser does with those IPs is up to the browser, the standard does not define what to do. What the administrator that sets up that record wants is "send to whichever one of these seems healthy", and some browsers do do that. Other browsers just pick one at random and report failure, so your redundancy makes the system more likely to break.

What I want is a way to define what to do in this case. Maybe you want to try them all in parallel and pick the first to respond (at the TCP connection level). Maybe you want to try them sequentially. Maybe you want to open a connection to all of them and send 1/n requests to each. Right now, there is no way to know what the service intends, so the browser has to guess. And each one guesses differently.

(You will notice that people like Google and Cloudflare skillfully respond with only one record with a 5 minute TTL. That is so the behavior of the browser is well defined, but it also eats their entire year of 99.999% uptime with one bad reply. Your systems had better be very reliable if DNS issues can eat a year's worth of error budget.)


> (You will notice that people like Google and Cloudflare skillfully respond with only one record with a 5 minute TTL. That is so the behavior of the browser is well defined, but it also eats their entire year of 99.999% uptime with one bad reply. Your systems had better be very reliable if DNS issues can eat a year's worth of error budget.)

This chapter in the Google SRE book explains how our load balancing DNS works:

https://landing.google.com/sre/sre-book/chapters/load-balanc...

Source: my team runs this service


I skimmed through; not a bad idea: instead of using a reverse proxy, you are basically doing a poor man's multicast by letting many servers answer a request. And instead of rewriting the packets, you encapsulate, which should be lighter and faster.

It might be a little more resilient than even a very minimal nginx, but more than that, I think it must give you more control over what happens when a packet is not "answered" after some set amount of time - you write off whoever should have been the answerer, then resend that same packet to another server. Keep a buffer of packets, scrape them from the buffer when ACK'ed by the answerer, resend them to another answerer if not ACK'ed after some set amount of time.

Am I guessing correctly?

It seems a bit overcomplicated for normal use cases, but adequate for a large scale like Google's.


The design you propose is stateful, and if you read the chapter closely, you can see we spend a lot of effort to make things stateless.

The main thing I wanted to respond to in this thread about a single bad server destroying your yearly SLO is described in the first paragraph in the section on load balancing at the virtual IP address.


Sorry I couldn't find a clear rationale in the link. Why does Google prefer a stateless load balancer? Is it infeasible to maintain state at that scale?


Sorry, I didn't read the document that closely. It was a bit too long.

Overall, virtual IPs are still an interesting solution.



> What the administrator that sets up that record wants is "send to whichever one of these seems healthy"

In the rrDNS, remove the A records of hosts that fail tests or whose load is too high.

> Maybe you want to try them all in parallel and pick the first to respond (at the TCP connection level).

Something a geoIP setup at your DNS server can do; certainly not as good as doing it in the client, but it should be decent enough.

> Your systems had better be very reliable if DNS issues can eat a year's worth of error budget

Or, if you aren't Google or Cloudflare, use a 30 to 60s TTL in rrDNS, with health checks to selectively remove IPs that fail, on pools splitting your servers by region with geoIP - this way, if 1/10 of your east coast servers fail, nobody from APAC will be impacted, and only 1/10th of your US east users, and only for the TTL (I'm abstracting away ISPs that cache for too long, but you already mitigate a lot of the problem there).

I can see how it would be easier to handle that in the browser, but you may already be able to do that with some JS to estimate the latency, then store the result in a cookie that causes a reload to www.eastcoast.yoursite.com if the user sticks to www.yoursite.com, or, after the user returns home, goes to www.apac.yoursite.com while new measurements say "not optimal" and updates the cookie.


I am kind of OK with this solution, and is in fact my plan to roll out HTTP/3 for my personal sites. I wrote https://github.com/jrockway/nodedns to update a DNS record to contain the IP addresses of all schedulable nodes in my cluster. I can then serve HTTP/3 on a well-known port and it is probable that many requests will reach me successfully. (I had to do this because my cloud provider's load balancer doesn't support UDP, and I don't have access to "floating IPs"; basically my node IPs change whenever the cluster topology needs to change.)

I don't really like it because it still means a minute of downtime when the topology does change. I would prefer telling the browser what strategy to use to try a new node, rather than relying on heuristics and defaults.


Unless you're using DoH, in most cases the browser is using the device's stub resolver just like all other apps on the device.


DNS has its own load balancing at several levels (and several different kinds):

Nameserver records (NS records) used to locate a resource are served by other nameservers. NS records are chosen from among those offered in response to a query (RRs), and should all be tried if necessary to elicit a response. The algorithm isn't strictly specified and some nameservers will shuffle the order in which they return RRs in their answers, some won't assuming the stub resolver or app will do it. The foregoing also applies to A and AAAA records (returning IP addresses for names), and this has long been used as a quick and easy form of load balancing/failover, except that it doesn't really failover very well unless your app is coded to try all of the different answers (and the stub resolver returns them to your app).

Nameservers querying other nameservers (caching/recursive resolvers) are supposed to compile metrics on response times when they make upstream requests and pick the fastest upstreams once they learn them.

Stub resolvers (running on your device) typically query nameservers in the order you specified them in your network config, but not always.

From the foregoing, you can probably see that running a caching/recursive resolver close to your devices is supposed to be desirable, by design.

So far, so far. ;-)

As specified, and it's never been changed, DNS tries UDP first. "Ok", you think, "that must mean it will try TCP" - but that's not actually true: it only tries TCP if it receives a UDP response with TC=1 (flagged as truncated). But if there's a UDP frag and it doesn't get all the frags, or it never gets a UDP response at all, it /never/ tries TCP.
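
That behavior is easy to reproduce with dnspython (a sketch; resolver IP illustrative):

    import dns.flags
    import dns.message
    import dns.query  # pip install dnspython

    def resolve_udp_first(name, server="8.8.8.8"):
        query = dns.message.make_query(name, "A")
        response = dns.query.udp(query, server, timeout=3)  # always UDP first
        if response.flags & dns.flags.TC:  # only an explicit TC=1 triggers the TCP retry
            response = dns.query.tcp(query, server, timeout=3)
        return response  # a lost or fragmented UDP reply just times out; no TCP fallback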

You're mixing two very different environments above: 1) a datacenter with (let's just assume) VPCs and 2) a web browser.

In case #2 I'll match your ante and raise you an overloaded segment which is dropping UDP packets, in which case stuff may fail to resolve at all. Oh look, I drew a wildcard: traditionally browsers have utilized the device's stub resolver, but since they've pushed ahead with DoH they've had to implement their own. People think I'm a DNS expert (what do they know?) and I guess conventional wisdom amongst myself and my peers is that UDP should perform better than TCP, but anecdotally people are claiming that DoH and DoT perform better for them than their stub resolver. "Must be your ISP messing with you" says someone, "yeah right, that's gotta be it". Me: "did you try running your own local resolver?" them: "wuut?"

So here's where I confess that the experts aren't always right, because I run my own local resolver and I have the same problem: when the streaming media devices are running, DNS resolution on the wifi-connected laptop sucks, and if I run a TCP forwarder it starts working! (https://github.com/m3047/tcp_only_forwarder)

Now to case #1, the datacenter. I hope you're running your own authoritative and caching server, and you should read about views in your server config guide; using EDNS to pass subnet info is a kludge. If you're writing datacenter apps, you should consider doing your own resolution and using TCP (try the forwarder, I dare you), and provisioning accordingly (because DNS servers assume most requests will come in via UDP).

If you want load balancing "you know, like nginx" I've got news for you: BIND comes with instructions for configuring nginx as a reverse TCP proxy. Oh! Looks like I've got a straight in a single suit: nginx provides SSL termination so I've got DoT for free!


I am not really talking about load balancing the DNS traffic, I'm talking about interpreting the response of the DNS query. (The reliability at the network level seems to be handled by moving everything to DNS-over-HTTPS or something, and is a debate for another day.)

For example, consider the case where you resolve ycombinator.com. You get:

    ycombinator.com.        59      IN      A       13.225.214.21
    ycombinator.com.        59      IN      A       13.225.214.51
    ycombinator.com.        59      IN      A       13.225.214.81
    ycombinator.com.        59      IN      A       13.225.214.73
Which of those hosts should I open a TCP connection to to begin speaking TLS/ALPN/HTTP2? The standard doesn't say. I would like a standard that says what to do. (The more interesting case is, say I pick 13.225.214.21 at random. It doesn't respond. What do I do now? Tell the user ycombinator.com is down? Try another one? All of this could be defined by a standard ;)


Perfect example. :-) There's not enough information to make a considered response, unless you've got a history of opening TCP connections to them to base a decision on.

Don't get me wrong, I think stub resolver logic is stuck in the 1980s!

If your app or device doesn't have such a history, and no way to obtain it, then maybe the server can do it based on what it knows about its history with IP addresses "close" to yours (the EDNS kludge).

> It doesn't respond. What do I do now? Tell the user ycombinator.com is down? Try another one? All of this could be defined by a standard

I would argue the DNS is clear about this from its own behavior: it tries another one.

Although it's not clear from `pydoc3 socket.create_connection`, it's pretty clear from https://docs.python.org/3/library/socket.html#creating-socke... that socket.create_connection() will "...try to connect to all possible addresses in turn until a connection succeeds."

So I would say that the correct action would be to try all possible addresses until one succeeds.
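
A minimal sketch of that loop, which is essentially what create_connection() does internally:

    import socket

    def connect_any(host, port, timeout=3):
        last_err = None
        # getaddrinfo surfaces every A/AAAA record the resolver returned
        for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
                host, port, type=socket.SOCK_STREAM):
            s = socket.socket(family, socktype, proto)
            s.settimeout(timeout)
            try:
                s.connect(sockaddr)
                return s  # first address that accepts the connection wins
            except OSError as err:
                last_err = err
                s.close()
        raise last_err  # every address failed: only now is the site "down"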


> Which of those hosts should I open a TCP connection to to begin speaking TLS/ALPN/HTTP2? The standard doesn't say. I would like a standard that says what to do.

Well, there was an RFC (found it, RFC 3484) that told you to pick the one closest to your network (which wouldn't make a difference in this case, unless you were in say 13.225.214.0/27 or so). But that's not actually helpful, because given two destination IPs, one in the same /8 as me, and one not, I don't have any information that would help me determine which is a better choice.

From experience, most browsers will try a couple IPs before showing an error message, but that's not standard. If you have a fancy authoritative server, a lot of traffic, and a bunch of server IPs, you can get OK balancing by telling some clients some IPs and some clients other IPs; but it depends on having enough diversity in recursive servers; if all of your users are coming from one mobile ISP, chances are you won't get a lot of balancing.

(And I'm sure you already know all this :)

Better to have clients with a bit of intelligence. :)


I think the main problem with great ideas like this is that some clients will do a really bad job at implementing the spec correctly and one of those clients will be the default browser on a very popular OS or device.


Isn't that what SRV records are for? If there's one for _http specifying ycombinator.com as the name, then any of those IP addresses should accept a connection on port 80 speaking HTTP. Without independent names, they should all be treated equally and your app (like a browser) gets to try just one or all of them.

If you're talking about subprotocols/versions of HTTP like HTTP2 then you can define subservices, so you could have _http2._http. But no one has proposed that yet :)

Of course with anycast, multiple A records can be redundant :)


Another cool thing about DNS - you only need the IP of a single root server to be able to get the nameservers, IPs and everything else to resolve any name in any TLD. The hierarchical nature of DNS is very neat to see in action. I built a toy DNS checker tool[0] while I was learning about it to get a more visual overview, and it ended up being one of my tools I still use every few days to verify a domain is properly delegated if I suspect it has issues.

[0] https://r-1.ch/r1dns/ (and yes it has a lot of bugs, don't break it :)).
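
For example, starting from a.root-servers.net (198.41.0.4) and following the referrals by hand (192.5.6.30 is a.gtld-servers.net; the last step's IP comes out of the previous answer):

    dig @198.41.0.4 news.ycombinator.com A    # root: referral to the .com servers
    dig @192.5.6.30 news.ycombinator.com A    # .com: referral to ycombinator.com's NS
    dig @<NS IP from above> news.ycombinator.com A    # authoritative answer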


Off topic: you wouldn't be the R1ch of old Something Awful forums fame, would you? I used to run a waffleimages mirror :)


Yup, that's me! The SA forums inspired a lot of my random side projects, the TF2 server integration stuff was especially fun (got me into reverse engineering Valve's game .so to fix bugs). I miss those days when I had all the free time in the world :).


"What I’d expect to happen in practice when updating a DNS record with a 5 minute TTL is that a large percentage of clients will move over to the new IPs quickly (like within 15 minutes),"

That's not true. The vast majority of clients will move to the new IP within the TTL, within 5min (not 15min). Then there will be some stragglers that slowly update over the next hours/days (typically poorly written bots)

Source : my own experience updating a site with 500k hits per month and sniffing and watching network traffic at the 3 endpoints: DNS, old IP, new IP.


Or any proxy using the default nginx configuration, which caches DNS resolution for upstream blocks at first use and never invalidates it until the config is reloaded or nginx is restarted.
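
For reference, a sketch of the usual workaround (resolver address and names illustrative): using a variable in proxy_pass forces nginx to re-resolve at runtime via the resolver directive instead of pinning the IP at startup:

    resolver 127.0.0.53 valid=30s;        # re-resolve at most every 30 seconds
    set $backend "app.internal.example";  # a variable defeats the startup-time caching
    proxy_pass http://$backend:8080;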


How distributed is DNS these days in practice, compared to 10 or 20 years ago?

If the major internet powers agreed to stop serving responses to DNS servers they detected weren't respecting reasonable TTL's, do you think they could "bully" the industry into tightening things up? (Kind of how Google and others compelled the web toward widespread HTTPS)


I'm not sure why people don't run their own caching server instead of stubs. Or at least for the LAN? You don't have to use your ISPs DNS servers, unless they are evil and capture port 53.


Recursive DNS servers can also throw you off the scent a bit by giving you an answer that is not the same as the authoritative server.

I've seen 8.8.8.8 return something other than NXDOMAIN for some domains that do not exist

Cloudflare will not honour DNS ANY requests.

Knowing how to query the authoritative nameservers is a handy tool for debugging.


Agreed. There's a lot of 'magic' that goes into running a quality recursive resolver, not least of which is EDNS0 and EDNS Client Subnet - which intentionally returns different answers based on the requester's source IP -- in most cases so that the most-optimal CDN location is returned.

Test with:

dig @ns1.google.com www.google.es +subnet=193.8.172.75/24

dig @ns1.google.com www.google.es +subnet=157.88.0.0/16

Note how you get different IPs returned.


Here's a pretty clear demo of different results around the world: https://wheresitup.com/demo/results/5ef1403cb8e31e3fb3298503


As someone newly trying to learn DNS, I don't _use_ 8.8.8.8 personally, so I was confused at first about why they kept offering it up on the page. It might help to say 'Google DNS' along with the 8.8.8.8 in the first reference to it.


1.1.1.1 is Cloudflare. 8.8.8.8 is Google. 9.9.9.9 is PCH/IBM as Quad9. They all also offer IPv6.


I have an idea: because DNS requests are made from a server close to the user, the TLD should use a GEO table in order to give the two closest DNS servers. Kinda like anycast without having to configure routing/BGP sessions.


A lot of DNS infrastructure is anycast.


Yes. But most domains, e.g. websites, don't have anycast. And anycast is expensive if you just have a private web site or blog. And anycast services have poor coverage; it's only Cloudflare that has decent coverage, but they only offer proper DNS service to enterprise customers.


> But most domains, e.g. websites, don't have anycast.

Are you talking about the (mathematical) "domain" in the DNS specs, or the popular domain i.e. the web server?

The latter is arguably true, in which case the geoip proposition is moot: there is only one web server. Maybe you mean the web server has multiple addresses instead of being anycast. Ok, yes, that happens; and some DNS servers do use geoip to tailor replies to try and hand out the closest address. Here is an excerpt from the BIND ARM:

"By default, if a DNS query includes an EDNS Client Subnet (ECS) option which encodes a non-zero address prefix, then GeoIP ACLs will be matched against that address prefix. Otherwise,they are matched against the source address of the query"

Regarding the former, does anyone have info on how many DNS providers use anycast? I think a lot; or maybe I should say that a lot of domains are hosted on anycast, the DNS isn't as distributed as it used to be. If you're using DNS as a distributed key/value store, I hope you're doing a better job thinking about externalities (leakage) than e.g. the antivirus companies in terms of locating authoritatives and how you update them opaquely.

Personally I think stub resolvers are stuck in the 1980s. They could do a lot more by monitoring traffic health and editing DNS replies. Due to peering arrangements you could be in the same IX as someone else but that might not be the best route. Traceroute, SYN exchanges, (IP) TTLs might be better signals for determining the health of a particular path. I'd never thought about it until this thread started, maybe the stub resolver could use netflow analysis to inform editing the responses it returns to the applications.


DNS getting less distributed is a problem, as public DNS services generally do not hold a cache for long. They also give up if they're unlucky and try a DNS server that is down!

So my case for top-level (TLD) GeoIP: I have many DNS servers for my web addresses/domains: three in the EU and two in the US. The problem is that when the TLD servers send the list of DNS servers, it's randomized. Instead I want them to return the list in GEO order (and also network-health order), so that the recursive resolver asks the best/closest DNS server. The worst-case scenario is a recursive resolver in the EU trying a DNS server in the US that happens to be down, and then giving up. The best-case scenario is that it tries the closest server in the EU.

Trying to solve my problem, I've tried the top 10 DNS providers (ranked by uptime and query speed), which use anycast. Only two could be used as secondaries/slaves, and both of them took over two days to propagate an update (they did not use the TTL). The reason I need fast updates is that Let's Encrypt requires DNS challenges for wildcard domain SSL/TLS certificates.

About anycast use: the root servers have been using anycast for a while now. Some TLDs use anycast, I think (haven't actually checked). Most web hosts and ISPs do not use anycast. ISPs, however, have their DNS servers very close to the end users and they are good at caching, which is the second reason why I'm against DNS centralization. Querying, for example, 8.8.8.8 is often 10x slower than using the ISP's DNS (assuming the ISP has the query cached).

Anycast, although proven to work nicely for the root servers, which makes them harder to DDoS, doesn't actually work that great. I argue they could just list the servers in GEO order instead of configuring BGP routes. When I evaluated the "top 10" anycast DNS providers, sometimes my amateur setup got lucky (e.g. the test server in the US got the US IP first and vice versa) and thus beat the anycast network in query performance/latency.


> The worst-case scenario is a recursive resolver in the EU trying a DNS server in the US that happens to be down, and then giving up.

The recursive resolving algorithm for caching servers is actually addressed in the RFCs; it /should/ be trying all of them and using its own findings to prefer the best-performing one(s). But it doesn't know about anycast; if a recursive was switching between anycast nodes (with the same address), that would imply that routes were flapping. :-(

Interesting data points.

I think the elephant in the room is the Universal Terrestrial Radio Access Network (UTRAN) a.k.a. "mobile" and I don't work with provisioning much so all I can say is that I suspect that if mobile is your concern you just prostrate yourself to the UTRAN masters and co-locate wherever they tell you to.

SSL/TLS cert management is a fiasco in my opinion, it's a shame that DNSSEC hasn't achieved market dominance so that if you own a domain you can automatically sign certs for it yourself. (Then we wouldn't need CA lists in browsers and OSes either.)


The title "update your DNS" looks ambiguous.



