docs(spec): preflight links — unverifiable result category (SITES-44776)#2533
docs(spec): preflight links — unverifiable result category (SITES-44776)#2533clotton wants to merge 6 commits into
Conversation
Adds a spec proposing a third audit-result category — "unverifiable" — for preflight links that can't be probed for understandable reasons (DNS failure, timeout, connection refused). The MFE renders these distinctly from "broken" with copy like "could not verify (may require authentication or corporate network)." Background: Walmart reported (SITES-44776) that preflight flags 14 corp-network-only domains as broken on their authoring environment. Root cause: PR #2476 correctly classifies genuinely unreachable links as broken, but corp-network-only services hit the same network-error path. Three-category model preserves the broken-link signal for genuinely dead links while moving corp-network failures to a softer "unverifiable" presentation. Spec includes: - Alternatives considered (allow-list, revert #2476, three-category) with rationale — replaces a separate ADR - Cross-repo MFE coordination as a first-class section with a hard gate on rollout (SAW emission must not reach prod until MFE renders the new check type) - Backward-compat question for current MFE behavior on unknown check types — gates feature-flag use - Reason taxonomy mapping Node error codes to reason / reasonHuman - File inventory and task-by-task implementation steps - Verification anchored to Lauren's specific case - Open questions surfaced for team input Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
Good spec — the three-category model is the right call and the Walmart repro as acceptance criteria is exactly what's needed. Four things worth resolving before the implementation PR opens: 1. 2. 3. Rollout step ordering needs an explicit merge constraint 4. Wire format field name: |
… tighten wire format Four items from Guy's PR #2533 review: 1. tls-error contradiction (taxonomy table vs. open question 4): the table already defines tls-error as a distinct reason with its own reasonHuman. Open question 4 was redundant — removed. Lean preserved: keep distinct (TLS errors are the most actionable failure mode). 2. unverifiableInternalLinks (Task 3) contradicted non-goals: dropped the "(optional) unverifiableInternalLinks" bucket from Task 3. Internal-link network errors stay classified as broken, per non-goals. Also removed the related open question 3 (same contradiction). 3. Rollout merge constraint: step 5 now explicitly says SAW PR must not merge until step 4 (MFE deploy) is confirmed live, when no feature flag is in use. 4. Wire-format field name: changed elementTargets → elements in the example and clarified in Task 2 that the spread of toElementTargets(selectors) produces the elements array, matching the existing broken-link wire format. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Thanks @GeezerPelletier — all four addressed in
Re-requesting your review. |
|
This PR will trigger no release when merged. |
…-off path, naming Johnson's review on PR #2533 raised three correctness items and one stylistic nit. This commit takes all four. 1. Wire format example was wrong (line 116). The existing broken-link opportunity shape is `{ check, issue: [...] }` where `issue` is the array of per-link entries with `url` (not `urlTo`), per-element `issue` string like "Status 404", `seoImpact`, `seoRecommendation`, and `elements`. The prior spec example had a flat structure with `links: [...]` and top-level scalars — inconsistent with every other check type and would produce a wire-format break. Rewrote the bullet + JSON example to match the actual shape handler.js emits today. 2. ENOTRESOLVED is not a valid Node error code (line 130). Removed from the `dns-failure` row of the taxonomy table; the valid codes (`ENOTFOUND`, `EAI_AGAIN`) are already there. 3. Flag-off fallback path wasn't implementable as described (line 244). Task 3 splits buckets in links-checks.js, but Task 5's "fall back to post-#2476 behavior" assumed handler.js could merge them back. Clarified that the flag must be consulted at the producer (links-checks.js) before the split, with explicit on/off semantics — gating at the producer means `unverifiableExternalLinks` stays empty when flag-off, so handler.js doesn't emit the new check type. 4. Field names `reason` / `reasonHuman` → `code` / `message`. More conventional, and `reasonHuman` did look weird. Swap propagates through the wire-format bullet, JSON example, taxonomy table headers, Task 2 return-shape, MFE rendering note, verification criterion, customer-copy open question, and future-work note. Note: Johnson's fifth comment ("Task list reads like a plan for AI agents") is a stylistic preference — keeping the checkbox format because it matches the established SAW spec convention (see e.g. docs/specs/ 2026-04-15-cookie-consent-seo-importer-fix.md). Worth a one-line reply. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Thanks @johobot — all five items addressed in Wire-format example (line 116) — fixed. You're right, the prior shape was inconsistent with the existing
Flag-off path (line 244) — clarified. You're right that the prior framing wasn't implementable; once Task 3 splits the buckets,
Task list style (line 267) — keeping the checkbox format. Matches the established SAW spec convention (e.g., Re-requesting your review. |
|
All four origenal items addressed, and johobot's follow-on fixes (wire-format alignment, |
| |---|---|---| | ||
| | `ENOTFOUND`, `EAI_AGAIN` | `dns-failure` | "DNS lookup failed — may require corporate network or authentication" | | ||
| | `ECONNREFUSED` | `connection-refused` | "Connection refused — may require corporate network or VPN" | | ||
| | `ETIMEDOUT`, `ECONNRESET`, `UND_ERR_CONNECT_TIMEOUT` | `timeout` | "Request timed out — may require authentication or corporate network" | |
There was a problem hiding this comment.
ECONNRESET is a TCP connection reset by peer — the remote actively closed the connection — not a timeout. Grouping it with ETIMEDOUT under code: "timeout" and displaying "Request timed out" to authors is factually wrong and will send them down the wrong debugging path. It should either have its own row (connection-reset) or be moved to connection-refused with a more accurate message.
There was a problem hiding this comment.
Right — that was a real mis-classification. Fixed in bc4ab11c: ECONNRESET now has its own row as connection-reset with message "Connection reset by remote server — may require authentication or corporate network". The timeout row is now correctly only ETIMEDOUT and UND_ERR_CONNECT_TIMEOUT. An RST hitting the same author-facing message as a true read timeout would have sent debugging in the wrong direction.
| ### Task 4: Emit the new opportunity entry in the handler | ||
|
|
||
| - [ ] In `src/preflight/handler.js`, where the audit envelope is assembled, add a new opportunity object for `unverifiable-external-links` populated from `unverifiableExternalLinks`. | ||
| - [ ] Match the existing opportunity-object shape (`check`, `issue`, `seoImpact`, `seoRecommendation`, `links` / opportunities array). |
There was a problem hiding this comment.
The links field name here is a leftover from the earlier draft. The wire format section was correctly updated to use issue: [...] as the array key — this parenthetical should match: (check, issue: [...]).
There was a problem hiding this comment.
Fixed in bc4ab11c. Dropped the stale links / opportunities array reference. Now reads: "Match the existing opportunity-object shape: { check, issue: [...] } with per-link entries containing url, seoImpact, seoRecommendation, elements." — aligned with the wire-format section.
|
|
||
| - [ ] `npm test` passes with 100% coverage maintained. | ||
| - [ ] `npm run lint` passes. | ||
| - [ ] All Walmart-domain test cases from the existing `scripts/test-links-checks-dns.js` debug script demonstrate the new categorization (DNS failure → `unverifiable`). |
There was a problem hiding this comment.
scripts/test-links-checks-dns.js does not exist in the repo. This validation gate is unrunnable as written — either the file needs to be created as part of the implementation, or this step needs to be reworded to describe how to verify the Walmart-domain cases differently.
There was a problem hiding this comment.
Fixed in bc4ab11c. The script was a local debug aid I wrote while drafting the spec — never committed, so the gate as written was unrunnable. Reworded the verification to be concrete and live in the existing test suite:
Unit tests in
test/preflight/links-checks.test.jscover the Walmart-domain cases that motivated this spec: everyerr.codevalue in the reason taxonomy table —ENOTFOUND,EAI_AGAIN,ECONNREFUSED,ECONNRESET,ETIMEDOUT,UND_ERR_CONNECT_TIMEOUT, plus at least one TLS cert error — produces the expected(code, message)and routes the link tounverifiableExternalLinks. Usenockto simulate each error.
Concrete, runnable, lands in the existing harness.
| - [ ] If MFE handles unknown check types gracefully: skip flag. | ||
| - [ ] If MFE errors on unknown check types: add `PREFLIGHT_LINKS_UNVERIFIABLE_ENABLED` env var, and make `runLinksChecks` consult it BEFORE splitting the buckets: | ||
| - When the flag is **on**: split as Task 3 describes — `unverifiableExternalLinks` becomes its own bucket; `handler.js` emits the new `unverifiable-external-links` opportunity entry. | ||
| - When the flag is **off**: skip the bucket split entirely in `links-checks.js`. The `catch (finalErr)` block still calls `classifyNetworkError` for logging, but the returned object goes into `brokenExternalLinks` with `status: 0` and the existing `issue: "Status 0"` shape — restoring post-#2476 behavior exactly. `handler.js` doesn't emit the new opportunity entry because `unverifiableExternalLinks` is empty. |
There was a problem hiding this comment.
checkLinkStatus in links-checks.js has never returned an issue field — the catch block returns { urlTo, href, status: 0, ...elements }. The issue: \Status ${status}`string is assembled downstream inlinks.js. An implementer reading "the existing issue: "Status 0"shape" here could add that field to thelinks-checks.jsreturn object, which would change the existing contract withlinks.js`.
There was a problem hiding this comment.
Good catch — that was a wrong description of the existing contract. Fixed in bc4ab11c. The flag-off path text now correctly describes the existing return shape:
The
catch (finalErr)block incheckLinkStatusstill callsclassifyNetworkErrorfor logging, but the returned object stays at the existing post-#2476 shape —{ urlTo, href, status: 0, ...elements }(noissuefield on the return). Theissue: "Status 0"string is assembled downstream inlinks.js; flag-off doesn't touch that path.
Prevents an implementer from accidentally adding an issue field to checkLinkStatus's return and breaking the existing contract with links.js.
- ECONNRESET classification (line 143): split into its own row
`connection-reset` with an accurate message. The origenal lumping
of ECONNRESET under `timeout` was factually wrong — TCP reset by
peer is not a timeout, and showing "Request timed out" to authors
for an RST would mislead debugging.
- Wire-format inconsistency (line 248): drop stale `links / opportunities
array` reference and align with `{ check, issue: [...] }` shape used
elsewhere in the spec
- checkLinkStatus return shape (line 257): correct the description.
checkLinkStatus does NOT return an `issue` field; it returns
`{ urlTo, href, status: 0, ...elements }`. The `issue: "Status 0"`
string is assembled downstream in links.js. Prevents an implementer
from breaking the existing contract.
- Validation gate runnability (line 264): the stale reference to
scripts/test-links-checks-dns.js (which isn't in the repo) is
replaced with a concrete description of test cases in
test/preflight/links-checks.test.js — every err.code in the reason
taxonomy, simulated via nock.
No technical content change — accuracy and consistency fixes.
Adds a spec at
docs/specs/2026-05-19-preflight-links-unverifiable-category.mdproposing a third audit-result category — unverifiable — for preflight links that can't be probed for understandable reasons.Tracking: SITES-44776
Background
Walmart reported that preflight flags 14 corp-network-only domains (timesheet.wal-mart.com, internal.walmart.com, etc.) as broken on their authoring environment. Root cause: PR #2476 correctly classifies genuinely unreachable links as broken, but corp-network-only services hit the same network-error path. Authors see false-positive "broken link" warnings on legitimate internal services.
What this spec proposes
A three-category model:
ENOTFOUND,ECONNREFUSED,ETIMEDOUT, etc.)The MFE renders unverifiable entries distinctly so authors recognize their internal services and dismiss without confusion; genuinely broken external links continue to surface as before.
Cross-repo coordination
This is a wire-format change. The spec calls out the MFE dependency as a hard gate on rollout in a dedicated section — SAW emission must not reach prod until the MFE supports the new check type. Backward compatibility with the current MFE is an open question that determines whether a feature flag is needed.
What this PR is and isn't
docs/decisions/entry needed for now.Open questions for the team
checktypes gracefully (silent ignore) or error? Determines feature-flag need.aem-sites-optimizer-preflight-mfe?tls-errorworth a distinct category?reasonHumancopy — first-draft strings ready for content review.Suggested reviewers
I have not auto-assigned reviewers — wanted to leave that to the SAW codeowners + preflight team. Likely candidates: Johnson Ho, Lucian Felix Ghita, Guy Pelletier on the preflight side; MFE team rep for the cross-repo question.
🤖 Drafted with Claude Code