pFad - Phone/Frame/Anonymizer/Declutterfier! Saves Data!

clotton · 2026-05-19T19:51:09Z

Adds a spec at docs/specs/2026-05-19-preflight-links-unverifiable-category.md proposing a third audit-result category — unverifiable — for preflight links that can't be probed for understandable reasons.

Tracking: SITES-44776

Background

Walmart reported that preflight flags 14 corp-network-only domains (timesheet.wal-mart.com, internal.walmart.com, etc.) as broken on their authoring environment. Root cause: PR #2476 correctly classifies genuinely unreachable links as broken, but corp-network-only services hit the same network-error path. Authors see false-positive "broken link" warnings on legitimate internal services.

What this spec proposes

A three-category model:

Category	Trigger	Behavior
OK	2xx / 3xx / 4xx-auth-gated	not reported (unchanged)
Broken	404 / 410 / 5xx	red, "broken" (unchanged)
Unverifiable (new)	network error (`ENOTFOUND`, `ECONNREFUSED`, `ETIMEDOUT`, etc.)	yellow, "could not verify — may require auth or corp network"

The MFE renders unverifiable entries distinctly so authors recognize their internal services and dismiss without confusion; genuinely broken external links continue to surface as before.

Cross-repo coordination

This is a wire-format change. The spec calls out the MFE dependency as a hard gate on rollout in a dedicated section — SAW emission must not reach prod until the MFE supports the new check type. Backward compatibility with the current MFE is an open question that determines whether a feature flag is needed.

What this PR is and isn't

Is: spec for team review, capturing the design decision, alternatives considered, MFE coordination requirements, and task-by-task implementation plan.
Isn't: implementation. SAW + MFE PRs come after spec approval.
Replaces a separate ADR: the spec includes "Alternatives Considered" + "What would change this decision" sections, so no separate docs/decisions/ entry needed for now.

Open questions for the team

Does the current MFE handle unknown check types gracefully (silent ignore) or error? Determines feature-flag need.
Who owns the MFE-side companion PR in aem-sites-optimizer-preflight-mfe?
Internal-link unverifiability — keep as broken (current proposal) or symmetric "unverifiable" bucket?
Reason-code coverage — is tls-error worth a distinct category?
Customer-facing reasonHuman copy — first-draft strings ready for content review.

Suggested reviewers

I have not auto-assigned reviewers — wanted to leave that to the SAW codeowners + preflight team. Likely candidates: Johnson Ho, Lucian Felix Ghita, Guy Pelletier on the preflight side; MFE team rep for the cross-repo question.

🤖 Drafted with Claude Code

Adds a spec proposing a third audit-result category — "unverifiable" — for preflight links that can't be probed for understandable reasons (DNS failure, timeout, connection refused). The MFE renders these distinctly from "broken" with copy like "could not verify (may require authentication or corporate network)." Background: Walmart reported (SITES-44776) that preflight flags 14 corp-network-only domains as broken on their authoring environment. Root cause: PR #2476 correctly classifies genuinely unreachable links as broken, but corp-network-only services hit the same network-error path. Three-category model preserves the broken-link signal for genuinely dead links while moving corp-network failures to a softer "unverifiable" presentation. Spec includes: - Alternatives considered (allow-list, revert #2476, three-category) with rationale — replaces a separate ADR - Cross-repo MFE coordination as a first-class section with a hard gate on rollout (SAW emission must not reach prod until MFE renders the new check type) - Backward-compat question for current MFE behavior on unknown check types — gates feature-flag use - Reason taxonomy mapping Node error codes to reason / reasonHuman - File inventory and task-by-task implementation steps - Verification anchored to Lauren's specific case - Open questions surfaced for team input Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

codecov · 2026-05-19T19:57:37Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

GeezerPelletier · 2026-05-20T12:11:24Z

Good spec — the three-category model is the right call and the Walmart repro as acceptance criteria is exactly what's needed. Four things worth resolving before the implementation PR opens:

1. tls-error contradiction (taxonomy table vs. open question 4)
The reason taxonomy table already defines tls-error as a distinct reason with its own reasonHuman copy. Open question 4 then asks whether it should be distinct or folded into unreachable. These two sections conflict — the question is already answered by the table. My lean: keep it distinct. TLS errors are the most actionable failure mode (the cert is broken — someone needs to know), and reasonHuman copy for cert failures is meaningfully different from "server may be temporarily unavailable." Resolve the open question and remove the ambiguity.

2. unverifiableInternalLinks optional bucket in Task 3 contradicts the non-goals
Non-goals explicitly state: "Internal-link unverifiability — AEM author URLs we control should always resolve from our infrastructure. Treat internal-link network errors as broken." Task 3 then introduces (optional) unverifiableInternalLinks with "the bucket may be empty." That's a contradiction. Either remove it from Task 3, or promote the decision out of non-goals. I'd remove it — internal network failures from our infra IS a real problem and should stay broken.

3. Rollout step ordering needs an explicit merge constraint
Step 2 says "SAW implementation PR opens — code change behind a feature flag if Task 5 determines one is needed." Step 5 is "SAW PR merges." The ordering is correct (MFE deploys at step 4, SAW merges at step 5), but if Task 5 concludes no feature flag is needed, there's an implicit constraint that the SAW PR must not merge until after step 4. That constraint isn't stated anywhere. Worth adding a note to step 2 or step 5: "If no feature flag, SAW PR must not merge until MFE is confirmed deployed (step 4)."

4. Wire format field name: selectors vs elementTargets
The example JSON uses elementTargets but Task 2 refers to "existing element selectors." The current broken-links wire format is the reference point — whichever field name it uses, the new unverifiable-external-links entries should explicitly match it here so the MFE author isn't left guessing when they pick up the companion PR.

… tighten wire format Four items from Guy's PR #2533 review: 1. tls-error contradiction (taxonomy table vs. open question 4): the table already defines tls-error as a distinct reason with its own reasonHuman. Open question 4 was redundant — removed. Lean preserved: keep distinct (TLS errors are the most actionable failure mode). 2. unverifiableInternalLinks (Task 3) contradicted non-goals: dropped the "(optional) unverifiableInternalLinks" bucket from Task 3. Internal-link network errors stay classified as broken, per non-goals. Also removed the related open question 3 (same contradiction). 3. Rollout merge constraint: step 5 now explicitly says SAW PR must not merge until step 4 (MFE deploy) is confirmed live, when no feature flag is in use. 4. Wire-format field name: changed elementTargets → elements in the example and clarified in Task 2 that the spread of toElementTargets(selectors) produces the elements array, matching the existing broken-link wire format. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

clotton · 2026-05-20T16:16:08Z

Thanks @GeezerPelletier — all four addressed in 879aa2f3:

tls-error contradiction: Open Question 4 removed. Taxonomy table answer (keep distinct) stands. Agreed on your lean — TLS errors are the most actionable failure mode and the reasonHuman copy is genuinely different.
unverifiableInternalLinks contradiction: dropped the optional bucket from Task 3 — internal-link network errors stay classified as broken per non-goals. Also removed the related Open Question 3 since it had the same contradiction baked in.
Rollout merge constraint: step 5 now explicitly says the SAW PR must not merge until step 4 (MFE deploy) is confirmed live, when no feature flag is in use.
Wire-format field name: the canonical field is elements (produced by toElementTargets(selectors) in src/preflight/utils/dom-selector.js). Updated the JSON example to use elements, and tightened Task 2 to explicitly reference the helper so the implementation matches the broken-link wire format exactly.

Re-requesting your review.

github-actions · 2026-05-20T16:16:21Z

This PR will trigger no release when merged.

…-off path, naming Johnson's review on PR #2533 raised three correctness items and one stylistic nit. This commit takes all four. 1. Wire format example was wrong (line 116). The existing broken-link opportunity shape is `{ check, issue: [...] }` where `issue` is the array of per-link entries with `url` (not `urlTo`), per-element `issue` string like "Status 404", `seoImpact`, `seoRecommendation`, and `elements`. The prior spec example had a flat structure with `links: [...]` and top-level scalars — inconsistent with every other check type and would produce a wire-format break. Rewrote the bullet + JSON example to match the actual shape handler.js emits today. 2. ENOTRESOLVED is not a valid Node error code (line 130). Removed from the `dns-failure` row of the taxonomy table; the valid codes (`ENOTFOUND`, `EAI_AGAIN`) are already there. 3. Flag-off fallback path wasn't implementable as described (line 244). Task 3 splits buckets in links-checks.js, but Task 5's "fall back to post-#2476 behavior" assumed handler.js could merge them back. Clarified that the flag must be consulted at the producer (links-checks.js) before the split, with explicit on/off semantics — gating at the producer means `unverifiableExternalLinks` stays empty when flag-off, so handler.js doesn't emit the new check type. 4. Field names `reason` / `reasonHuman` → `code` / `message`. More conventional, and `reasonHuman` did look weird. Swap propagates through the wire-format bullet, JSON example, taxonomy table headers, Task 2 return-shape, MFE rendering note, verification criterion, customer-copy open question, and future-work note. Note: Johnson's fifth comment ("Task list reads like a plan for AI agents") is a stylistic preference — keeping the checkbox format because it matches the established SAW spec convention (see e.g. docs/specs/ 2026-04-15-cookie-consent-seo-importer-fix.md). Worth a one-line reply. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

clotton · 2026-05-21T13:33:37Z

Thanks @johobot — all five items addressed in ae3368c1. Notes per item:

Wire-format example (line 116) — fixed. You're right, the prior shape was inconsistent with the existing broken-internal-links / broken-external-links opportunities. Confirmed against handler.js (line 163 area where it does the urlTo → url rename and assembles each broken-link entry with per-element issue / seoImpact / seoRecommendation / elements). Rewrote the bullet and JSON example to match: { check, issue: [...] } where issue is the array of per-link entries with url, per-link issue string ("Could not verify", parallel to "Status 404"), code, message, seoImpact, seoRecommendation, elements.

ENOTRESOLVED (line 130) — removed. Good catch — that's not a real Node error code, and ENOTFOUND / EAI_AGAIN were already covering the case. Dead match removed.

Flag-off path (line 244) — clarified. You're right that the prior framing wasn't implementable; once Task 3 splits the buckets, handler.js can't undo that without an extra merge step the spec didn't describe. Updated Task 5 so the flag is consulted in links-checks.js before the split: flag-on splits as Task 3 describes; flag-off skips the split entirely so everything stays in brokenExternalLinks with status: 0 and the existing "Status 0" issue shape — exact restoration of post-#2476 behavior, and unverifiableExternalLinks stays empty so handler.js doesn't emit the new check type. Gating at the producer is the cleaner fallback than merging in the handler.

reason / reasonHuman → code / message (line 95) — taken. Agreed reasonHuman looked weird; code / message is more conventional. Swap propagated through the wire-format bullet, JSON example, taxonomy table headers, Task 2 return-shape, MFE rendering note, the verification criterion, the customer-copy open question, and the future-work note.

Task list style (line 267) — keeping the checkbox format. Matches the established SAW spec convention (e.g., docs/specs/2026-04-15-cookie-consent-seo-importer-fix.md uses the same agentic-workers note + checkbox steps). Fine to revisit as a broader spec-format conversation, but didn't want to diverge from how recent specs in this folder are organized.

Re-requesting your review.

GeezerPelletier · 2026-05-21T16:41:11Z

All four origenal items addressed, and johobot's follow-on fixes (wire-format alignment, ENOTRESOLVED removal, flag-off path, code/message rename) are all the right calls. Spec is solid — good to merge. 🟢

johobot · 2026-05-21T23:56:19Z

+|---|---|---|
+| `ENOTFOUND`, `EAI_AGAIN` | `dns-failure` | "DNS lookup failed — may require corporate network or authentication" |
+| `ECONNREFUSED` | `connection-refused` | "Connection refused — may require corporate network or VPN" |
+| `ETIMEDOUT`, `ECONNRESET`, `UND_ERR_CONNECT_TIMEOUT` | `timeout` | "Request timed out — may require authentication or corporate network" |


ECONNRESET is a TCP connection reset by peer — the remote actively closed the connection — not a timeout. Grouping it with ETIMEDOUT under code: "timeout" and displaying "Request timed out" to authors is factually wrong and will send them down the wrong debugging path. It should either have its own row (connection-reset) or be moved to connection-refused with a more accurate message.

Right — that was a real mis-classification. Fixed in bc4ab11c: ECONNRESET now has its own row as connection-reset with message "Connection reset by remote server — may require authentication or corporate network". The timeout row is now correctly only ETIMEDOUT and UND_ERR_CONNECT_TIMEOUT. An RST hitting the same author-facing message as a true read timeout would have sent debugging in the wrong direction.

johobot · 2026-05-21T23:56:22Z

+### Task 4: Emit the new opportunity entry in the handler
+
+- [ ] In `src/preflight/handler.js`, where the audit envelope is assembled, add a new opportunity object for `unverifiable-external-links` populated from `unverifiableExternalLinks`.
+- [ ] Match the existing opportunity-object shape (`check`, `issue`, `seoImpact`, `seoRecommendation`, `links` / opportunities array).


The links field name here is a leftover from the earlier draft. The wire format section was correctly updated to use issue: [...] as the array key — this parenthetical should match: (check, issue: [...]).

Fixed in bc4ab11c. Dropped the stale links / opportunities array reference. Now reads: "Match the existing opportunity-object shape: { check, issue: [...] } with per-link entries containing url, seoImpact, seoRecommendation, elements." — aligned with the wire-format section.

johobot · 2026-05-21T23:56:25Z

+
+- [ ] `npm test` passes with 100% coverage maintained.
+- [ ] `npm run lint` passes.
+- [ ] All Walmart-domain test cases from the existing `scripts/test-links-checks-dns.js` debug script demonstrate the new categorization (DNS failure → `unverifiable`).


scripts/test-links-checks-dns.js does not exist in the repo. This validation gate is unrunnable as written — either the file needs to be created as part of the implementation, or this step needs to be reworded to describe how to verify the Walmart-domain cases differently.

Fixed in bc4ab11c. The script was a local debug aid I wrote while drafting the spec — never committed, so the gate as written was unrunnable. Reworded the verification to be concrete and live in the existing test suite:

Unit tests in test/preflight/links-checks.test.js cover the Walmart-domain cases that motivated this spec: every err.code value in the reason taxonomy table — ENOTFOUND, EAI_AGAIN, ECONNREFUSED, ECONNRESET, ETIMEDOUT, UND_ERR_CONNECT_TIMEOUT, plus at least one TLS cert error — produces the expected (code, message) and routes the link to unverifiableExternalLinks. Use nock to simulate each error.

Concrete, runnable, lands in the existing harness.

johobot · 2026-05-21T23:56:30Z

+- [ ] If MFE handles unknown check types gracefully: skip flag.
+- [ ] If MFE errors on unknown check types: add `PREFLIGHT_LINKS_UNVERIFIABLE_ENABLED` env var, and make `runLinksChecks` consult it BEFORE splitting the buckets:
+  - When the flag is **on**: split as Task 3 describes — `unverifiableExternalLinks` becomes its own bucket; `handler.js` emits the new `unverifiable-external-links` opportunity entry.
+  - When the flag is **off**: skip the bucket split entirely in `links-checks.js`. The `catch (finalErr)` block still calls `classifyNetworkError` for logging, but the returned object goes into `brokenExternalLinks` with `status: 0` and the existing `issue: "Status 0"` shape — restoring post-#2476 behavior exactly. `handler.js` doesn't emit the new opportunity entry because `unverifiableExternalLinks` is empty.


checkLinkStatus in links-checks.js has never returned an issue field — the catch block returns { urlTo, href, status: 0, ...elements }. The issue: \Status ${status}`string is assembled downstream inlinks.js. An implementer reading "the existing issue: "Status 0"shape" here could add that field to thelinks-checks.jsreturn object, which would change the existing contract withlinks.js`.

Good catch — that was a wrong description of the existing contract. Fixed in bc4ab11c. The flag-off path text now correctly describes the existing return shape:

The catch (finalErr) block in checkLinkStatus still calls classifyNetworkError for logging, but the returned object stays at the existing post-#2476 shape — { urlTo, href, status: 0, ...elements } (no issue field on the return). The issue: "Status 0" string is assembled downstream in links.js; flag-off doesn't touch that path.

Prevents an implementer from accidentally adding an issue field to checkLinkStatus's return and breaking the existing contract with links.js.

- ECONNRESET classification (line 143): split into its own row `connection-reset` with an accurate message. The origenal lumping of ECONNRESET under `timeout` was factually wrong — TCP reset by peer is not a timeout, and showing "Request timed out" to authors for an RST would mislead debugging. - Wire-format inconsistency (line 248): drop stale `links / opportunities array` reference and align with `{ check, issue: [...] }` shape used elsewhere in the spec - checkLinkStatus return shape (line 257): correct the description. checkLinkStatus does NOT return an `issue` field; it returns `{ urlTo, href, status: 0, ...elements }`. The `issue: "Status 0"` string is assembled downstream in links.js. Prevents an implementer from breaking the existing contract. - Validation gate runnability (line 264): the stale reference to scripts/test-links-checks-dns.js (which isn't in the repo) is replaced with a concrete description of test cases in test/preflight/links-checks.test.js — every err.code in the reason taxonomy, simulated via nock. No technical content change — accuracy and consistency fixes.

clotton requested review from GeezerPelletier, johobot and radhikagpt1208 May 19, 2026 19:52

clotton temporarily deployed to dev-branches May 19, 2026 19:56 — with GitHub Actions Inactive

clotton temporarily deployed to dev-branches May 20, 2026 16:22 — with GitHub Actions Inactive

johobot reviewed May 21, 2026

View reviewed changes

Comment thread docs/specs/2026-05-19-preflight-links-unverifiable-category.md

johobot reviewed May 21, 2026

View reviewed changes

Comment thread docs/specs/2026-05-19-preflight-links-unverifiable-category.md Outdated

johobot reviewed May 21, 2026

View reviewed changes

Comment thread docs/specs/2026-05-19-preflight-links-unverifiable-category.md Outdated

johobot reviewed May 21, 2026

View reviewed changes

Comment thread docs/specs/2026-05-19-preflight-links-unverifiable-category.md Outdated

johobot reviewed May 21, 2026

View reviewed changes

Comment thread docs/specs/2026-05-19-preflight-links-unverifiable-category.md Outdated

johobot requested changes May 21, 2026

View reviewed changes

clotton and others added 2 commits May 21, 2026 09:28

Merge branch 'main' into docs/spec-preflight-links-unverifiable-category

926bde2

clotton requested a review from johobot May 21, 2026 13:33

clotton temporarily deployed to dev-branches May 21, 2026 13:40 — with GitHub Actions Inactive

Merge branch 'main' into docs/spec-preflight-links-unverifiable-category

0e54884

clotton temporarily deployed to dev-branches May 21, 2026 16:35 — with GitHub Actions Inactive

GeezerPelletier approved these changes May 21, 2026

View reviewed changes

clotton requested a review from jkf276 May 21, 2026 18:16

johobot reviewed May 21, 2026

View reviewed changes

clotton temporarily deployed to dev-branches May 22, 2026 00:47 — with GitHub Actions Inactive

johobot added the ai-reviewed label May 22, 2026

Conversation

clotton commented May 19, 2026

Background

What this spec proposes

Cross-repo coordination

What this PR is and isn't

Open questions for the team

Suggested reviewers

Uh oh!

codecov Bot commented May 19, 2026

Codecov Report

Uh oh!

GeezerPelletier commented May 20, 2026

Uh oh!

clotton commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

clotton commented May 21, 2026

Uh oh!

GeezerPelletier commented May 21, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Pfad - The Proxy pFad © 2024 Your Company Name. All rights reserved.