c-ares DNS misconfiguration on Windows Consumption causes mongodb+srv:// and dns.resolve*() failures #830

Description

@TsuyoshiUshio

Problem

On Windows Consumption plan (Dynamic SKU), the Node.js dns.resolve*() family of functions (which use the c-ares library internally) can fail with ECONNREFUSED 127.0.0.1:53 or ETIMEOUT errors. This affects any application that relies on SRV record lookups, most notably mongodb+srv:// connection strings used with MongoDB Atlas and Cosmos DB MongoDB API.

Root Cause

Node.js has two completely separate DNS resolution paths:

| Path | Mechanism | Sandbox-aware? |
|---|---|---|
| dns.lookup() | OS getaddrinfo() | Yes: the Consumption sandbox detours getaddrinfo() to Azure DNS (168.63.129.16) |
| dns.resolve*() / dns.resolveSrv() | c-ares library (raw UDP) | No: c-ares reads DNS servers directly from the Windows registry |

A known c-ares bug introduces a character-width mismatch (sizeof(TCHAR) vs sizeof(WCHAR)) when reading the DhcpNameServer / NameServer entries from the Windows registry. On Consumption plan VMs, where the DHCP DNS suffix list is empty, this causes c-ares to:

  1. Misread the empty DHCP suffix string due to the character-width bug
  2. Discard the valid Azure DNS server address (168.63.129.16)
  3. Fall back to 127.0.0.1:53 (localhost, where no DNS server is listening)

As a result, any dns.resolveSrv() call (e.g., MongoDB SRV connection discovery) fails immediately with ECONNREFUSED or times out.

Why most apps are unaffected

Most Node.js applications never trigger the c-ares code path. HTTP clients, regular database drivers, and net.connect() all use dns.lookup() (getaddrinfo), which is correctly sandboxed. Only applications that use one of the following are affected:

  • mongodb+srv:// connection strings (triggers dns.resolveSrv())
  • dns.resolve(), dns.resolve4(), dns.resolveSrv(), etc. called directly
  • Libraries that create dns.Resolver instances

Kusto telemetry across the entire East US fleet shows only ~8 apps with querySrv errors in a 24-hour window, which is consistent with a narrow blast radius.
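
To check whether a particular failure matches this issue, the error object itself can be inspected. The sketch below assumes that Node.js DNS errors carry `code` and `syscall` properties, with c-ares-backed calls reporting a syscall such as `querySrv` (the same string the telemetry query matches) and dns.lookup() reporting `getaddrinfo`. The helper name `looksLikeCaresMisconfig` is hypothetical.

```javascript
// Error codes this issue reports for the c-ares path.
const SUSPECT_CODES = new Set(['ECONNREFUSED', 'ETIMEOUT']);

// Hypothetical helper: returns true when an error looks like a failed
// dns.resolve*() query rather than a dns.lookup() (getaddrinfo) failure
// or a refused TCP connection (syscall 'connect').
function looksLikeCaresMisconfig(err) {
    if (!err || typeof err.syscall !== 'string') {
        return false;
    }
    // c-ares-backed calls use syscall names that start with 'query'.
    return err.syscall.startsWith('query') && SUSPECT_CODES.has(err.code);
}

// Illustrative error shapes only; these objects are made up for the example.
console.log(looksLikeCaresMisconfig({ code: 'ECONNREFUSED', syscall: 'querySrv' })); // prints true
console.log(looksLikeCaresMisconfig({ code: 'ENOTFOUND', syscall: 'getaddrinfo' })); // prints false
```

A match is only a hint: a genuinely unreachable DNS server would produce the same error shape, so the result is best combined with a look at dns.getServers().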

Mitigation

const dns = require('dns');
dns.setServers(['168.63.129.16']);
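
When applying the workaround, recording the server list that was in effect beforehand makes it easier to confirm that the app really was in the broken state. This is a minimal sketch: `overrideDnsServers` is a hypothetical helper name, and the call is assumed to run before the first SRV lookup (for example, before the MongoDB client connects).

```javascript
const dns = require('dns');

// Azure DNS address cited in this issue.
const AZURE_DNS = '168.63.129.16';

// Hypothetical helper: replaces the server list used by dns.resolve*()
// and returns the list that was in effect before the change.
function overrideDnsServers(servers = [AZURE_DNS]) {
    const previous = dns.getServers();
    dns.setServers(servers);
    return previous;
}

const serversBefore = overrideDnsServers();
console.log('c-ares servers before:', serversBefore, 'after:', dns.getServers());
```

On an affected instance the "before" value would be expected to show 127.0.0.1, per the symptoms described above.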

Upstream fix status

  • c-ares fix: c-ares/c-ares#1064 — merged
  • Node.js backport: nodejs/node#61453 — merged, backported to v20.20.1, v24.14.0, v25.6.1
  • Current platform versions: Node 20.20.2, 22.22.3, 24.16.0 — all include the fix
  • Fix completeness: nodejs/node#62326 reports the fix may be incomplete — users still observe ECONNREFUSED and dns.getServers() returning 127.0.0.1 on patched versions (v24.14.0, v25.8.1)

Because the upstream fix is potentially incomplete, upgrading Node.js versions alone does not reliably resolve the issue.
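
Since the Node.js version alone is not a reliable indicator, a bug report or support case is more useful when it records what the runtime actually observes. The following sketch gathers that state in one object; `collectDnsDiagnostics` is a hypothetical name, and `WEBSITE_SKU` is the environment variable used in the implementation sketch in this issue.

```javascript
const dns = require('dns');

// Hypothetical helper: snapshot of the values relevant to this issue.
function collectDnsDiagnostics() {
    return {
        nodeVersion: process.version,
        // Version of the c-ares library bundled with this Node.js build.
        caresVersion: process.versions.ares,
        platform: process.platform,
        sku: process.env.WEBSITE_SKU || null,
        // Servers the c-ares path (dns.resolve*()) will query.
        caresServers: dns.getServers(),
    };
}

console.log(JSON.stringify(collectDnsDiagnostics(), null, 2));
```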

Proposed Solution

Add a DNS server guard to the worker entry point (nodejsWorker.ts) that detects and corrects the c-ares misconfiguration before any user code runs or any gRPC connection is established.

Implementation sketch

// Fix c-ares DNS server misconfiguration on Windows Consumption sandbox.
// See: https://github.com/c-ares/c-ares/issues/1056
//      https://github.com/nodejs/node/issues/62326
if (process.platform === 'win32' && process.env.WEBSITE_SKU === 'Dynamic') {
    try {
        const dns = require('dns');
        const servers = dns.getServers();
        if (servers.length === 0 || (servers.length === 1 && servers[0] === '127.0.0.1')) {
            dns.setServers(['168.63.129.16']);
        }
    } catch (_) {
        // Best-effort: do not crash if DNS module fails
    }
}
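
To unit test the guard on non-Windows build agents, the same check could be factored into a function that receives its inputs as parameters. This is a sketch under that assumption, not the worker's actual code; `fixCaresServers`, `fakeDns`, and the parameter names are hypothetical.

```javascript
const AZURE_DNS = '168.63.129.16';

// Hypothetical refactoring of the guard. Returns true when the server
// list was overridden, false otherwise.
function fixCaresServers(dnsModule, platform, env) {
    if (platform !== 'win32' || env.WEBSITE_SKU !== 'Dynamic') {
        return false;
    }
    try {
        const servers = dnsModule.getServers();
        // Same "broken" definition as the sketch above: empty, or only 127.0.0.1.
        const broken = servers.length === 0 ||
            (servers.length === 1 && servers[0] === '127.0.0.1');
        if (broken) {
            dnsModule.setServers([AZURE_DNS]);
        }
        return broken;
    } catch (_) {
        // Best-effort: never let the guard crash the worker.
        return false;
    }
}

// Test double standing in for the dns module.
function fakeDns(initial) {
    let servers = initial;
    return {
        getServers: () => servers,
        setServers: (next) => { servers = next; },
    };
}

const fake = fakeDns(['127.0.0.1']);
fixCaresServers(fake, 'win32', { WEBSITE_SKU: 'Dynamic' });
console.log(fake.getServers()); // prints [ '168.63.129.16' ]
```

In the worker, the call site would pass the real values, for example fixCaresServers(require('dns'), process.platform, process.env).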

Why this approach

| Aspect | Detail |
|---|---|
| Scope | Only activates on Windows Consumption (WEBSITE_SKU === "Dynamic"). Dedicated, Premium, ASE, and Linux are unaffected |
| Safety | Checks dns.getServers() first; only overrides if the current value is broken (empty or 127.0.0.1) |
| Timing | Executes before gRPC setup, before user code, and before library imports (the earliest possible point) |
| Code size | ~10 lines |
| Side effects | None. dns.setServers() only affects the global c-ares channel; dns.lookup() behavior is unchanged |
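
One point that may merit verification on an affected instance: the Node.js documentation describes each dns.Resolver instance as holding its own server list, and resolver.setServers() as not affecting other resolvers. Whether resolvers that libraries create after the guard runs pick up the corrected list is not established in this issue. The sketch below only demonstrates the per-instance isolation with two local resolvers; it makes no network queries, and `demoResolverIsolation` is a hypothetical name.

```javascript
const { Resolver } = require('dns');

// Hypothetical demo: changing one resolver's servers leaves another untouched.
function demoResolverIsolation() {
    const first = new Resolver();
    const second = new Resolver();
    const secondBefore = second.getServers();
    first.setServers(['168.63.129.16']);
    return {
        first: first.getServers(),
        secondBefore,
        secondAfter: second.getServers(),
    };
}

const isolation = demoResolverIsolation();
console.log('first resolver:', isolation.first);
console.log('second resolver unchanged:',
    JSON.stringify(isolation.secondBefore) === JSON.stringify(isolation.secondAfter));
```

Running the same comparison between dns.getServers() and a freshly created Resolver on an affected Windows Consumption instance would show whether the "Libraries that create dns.Resolver instances" case is covered by the guard.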

Alternatives considered

| Alternative | Why not |
|---|---|
| Wait for upstream c-ares fix | Fix is incomplete per nodejs/node#62326; timeline unknown |
| Patch the sandbox to intercept c-ares UDP | Requires changes to the site container; complex with large blast radius |
| Monkey-patch dns.resolveSrv() | Fragile, version-dependent, maintenance burden |
| Customer-side dns.setServers() workaround | Requires every affected customer to discover and apply the fix; not scalable |

Impact

  • Current: Multiple CRIs filed, affecting customers using mongodb+srv:// on Windows Consumption
  • Fleet-wide: ~8 apps affected in East US alone in any 24-hour window (Kusto: FunctionsLogs | where Summary has "querySrv")
  • Customer experience: Complete inability to connect to MongoDB Atlas / Cosmos DB MongoDB API when using SRV connection strings
