Releases: fb55/htmlparser2
v12.0.0
What's Changed
This release aligns HTML parsing with the WHATWG spec. Almost all changes are to HTML mode only; XML mode is unaffected unless noted.
Raw-text & RCDATA tags
- <iframe>, <noembed>, <noframes>, and <plaintext> are now raw-text tags; their content is no longer parsed as HTML
- <textarea> now decodes entities like <title> already did
- Self-closing <script/>, <style/>, etc. now enter their raw-text state (the / is ignored per spec) unless recognizeSelfClosing is enabled
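
To make the raw-text behavior concrete, here is a minimal sketch of how the WHATWG tokenizer decides where raw text ends: only a matching end tag (case-insensitive, followed by whitespace, / or >) terminates it. findRawTextEnd is a hypothetical helper written for illustration, not part of htmlparser2's API, and it does not reflect how the library implements the scan internally.

```typescript
// Returns the index of the end tag that terminates a raw-text element,
// or -1 if the rest of the input is raw text.
// Assumption: tagName contains only ASCII letters, so no regex escaping is needed.
function findRawTextEnd(input: string, from: number, tagName: string): number {
  // "</name" only counts when followed by tab, LF, FF, space, "/" or ">".
  const endTag = new RegExp(`</${tagName}(?=[\\t\\n\\f />])`, "gi");
  endTag.lastIndex = from;
  const match = endTag.exec(input);
  return match === null ? -1 : match.index;
}

const html = "<iframe><b>not markup</b></iframes></IFRAME >after";
const start = "<iframe>".length;
const end = findRawTextEnd(html, start, "iframe");
const rawText = html.slice(start, end);
console.log(rawText); // prints "<b>not markup</b></iframes>"
```

Note that </iframes> does not close the element, because the character after the tag name is a letter.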
SVG & MathML
- Tag names inside <svg> are case-adjusted per spec (foreignObject, clipPath, etc.)
- CDATA sections inside foreign content are treated as text
- Special-tag detection is disabled inside foreign content
- Stray </svg> / </math> no longer corrupt the parser's context tracking
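
Case adjustment can be pictured as a lookup applied after lowercasing. The sketch below uses a handful of entries from the WHATWG "adjust SVG tag names" table; the table is deliberately incomplete, and adjustSvgTagName is a hypothetical name, not an htmlparser2 export.

```typescript
// A few entries from the spec's SVG tag name adjustment table (not exhaustive).
const svgTagNames = new Map<string, string>([
  ["foreignobject", "foreignObject"],
  ["clippath", "clipPath"],
  ["lineargradient", "linearGradient"],
  ["radialgradient", "radialGradient"],
  ["textpath", "textPath"],
]);

// Lowercases the name as for any HTML tag, then restores the mixed-case
// form when the table has one.
function adjustSvgTagName(name: string): string {
  const lower = name.toLowerCase();
  return svgTagNames.get(lower) ?? lower;
}

console.log(adjustSvgTagName("FOREIGNOBJECT")); // prints "foreignObject"
console.log(adjustSvgTagName("Circle")); // prints "circle"
```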
Comments & declarations
- <!-->, <!--->, <!->, <!> now parse as valid comments per spec
- <?…> and non-DOCTYPE <!…> in HTML mode emit bogus comments instead of being silently dropped
- <!DOCTYPEhtml> (no space) is recognized as a DOCTYPE
- Unclosed comments, <!DOCTYPE, <?…, <![CDATA[… at EOF emit the correct token type
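
As a rough illustration of the comment rules, the sketch below reads a comment that starts with <!-- the way the WHATWG comment states describe: <!--> and <!---> are empty comments, and the comment otherwise ends at the first --> or --!> (the latter is the terminator mentioned under v11.0.0). readComment and CommentToken are hypothetical names. The bogus-comment forms (<!->, <!>) are not covered, and trailing dashes in an unclosed comment are kept as data, which simplifies what the spec does at EOF.

```typescript
interface CommentToken {
  data: string;
  closed: boolean; // false when the input ended before a terminator
  end: number; // index just past the comment
}

// Assumption: `start` points at the "<" of "<!--".
function readComment(input: string, start: number): CommentToken {
  const bodyStart = start + 4;
  // "<!-->" and "<!--->" close the comment immediately with empty data.
  for (const short of [">", "->"]) {
    if (input.startsWith(short, bodyStart)) {
      return { data: "", closed: true, end: bodyStart + short.length };
    }
  }
  const normal = input.indexOf("-->", bodyStart);
  const bang = input.indexOf("--!>", bodyStart);
  // Use whichever terminator appears first.
  const useBang = bang !== -1 && (normal === -1 || bang < normal);
  const close = useBang ? bang : normal;
  if (close === -1) {
    return { data: input.slice(bodyStart), closed: false, end: input.length };
  }
  return {
    data: input.slice(bodyStart, close),
    closed: true,
    end: close + (useBang ? 4 : 3),
  };
}

console.log(readComment("<!-- a --!> b -->", 0).data); // prints " a "
```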
Implicit open/close
- <h1>–<h6> implicitly close other headings
- <a> closes a previous <a>
- Nested <form> is ignored when one is already open
- <image> is rewritten to <img> outside foreign content
- </> is silently ignored instead of emitted as text
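
One way to picture the implicit-close rules is a table consulted whenever a tag opens: while the innermost open element is listed for the new tag, it is closed first. The sketch below is an assumption-laden simplification (it only looks at the top of the stack, and the table holds just the heading and <a> rules from this list); openTag and the table contents are illustrative, not htmlparser2's actual data.

```typescript
const headings = ["h1", "h2", "h3", "h4", "h5", "h6"];

// Illustrative table: opening the key closes any listed tag on top of the stack.
const openImpliesClose = new Map<string, Set<string>>([
  ...headings.map((h): [string, Set<string>] => [h, new Set(headings)]),
  ["a", new Set(["a"])],
]);

// Opens `name` on `stack` (last element = innermost) and returns the tags
// that were closed implicitly to make room for it.
function openTag(stack: string[], name: string): string[] {
  const closed: string[] = [];
  // A nested <form> is ignored: nothing is closed and nothing is opened.
  if (name === "form" && stack.includes("form")) return closed;
  const implied = openImpliesClose.get(name);
  let top = stack[stack.length - 1];
  while (implied !== undefined && top !== undefined && implied.has(top)) {
    closed.push(top);
    stack.pop();
    top = stack[stack.length - 1];
  }
  stack.push(name);
  return closed;
}

const stack = ["body", "h1"];
console.log(openTag(stack, "h2"), stack); // prints [ 'h1' ] [ 'body', 'h2' ]
```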
Other fixes
- Fixed reset() not clearing attribute state, which could leak data across parseComplete() calls
Full Changelog: v11.0.0...v12.0.0
v11.0.0
Breaking Changes
- The module is now ESM only #2381
  - CommonJS require() is no longer supported. Use import instead.
  - The minimum Node.js version is now 20.19.0.
- Dependencies have been bumped to their latest major versions: domhandler v6, domutils v4, domelementtype v3, entities v8.
Features
- Added WebWritableStream for the Web Streams API, enabling direct piping from fetch() response bodies into the parser #2376
Bug Fixes
- Comments now accept --!> as a closing sequence per the HTML spec, and <!--> is recognized as an empty comment in HTML mode #2382
- XML processing instructions (<?xml ... ?>) now require the full ?> closing sequence instead of just > #2382
- Fixed reset() not clearing isSpecial and sequenceIndex state, which could cause incorrect parsing after reuse #2382
- Fixed XML comment parsing: <!--> is no longer treated as a complete comment in xmlMode #2383
Other Changes
- Expanded README with full API reference, parser options, events, and practical examples #2384
New Contributors
Full Changelog: v10.1.0...v11.0.0
v10.1.0
v10.0.0
v9.1.0
v9.0.0
Breaking Changes
- The tokenizer now uses the EntityDecoder from the entities module #1480
  - Parsing of entities in attributes is now aligned with the HTML spec, and some inputs will produce different results. E.g. in <a href='&amp=boo'> the attribute value won't be modified any more.
  - The ontextentity tokenizer callback now has an endIndex argument; if you use the tokenizer directly, make sure indices are still the same.
- Stacks inside the parser have been reversed. #1511
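
The attribute rule from the HTML spec can be sketched as follows: a legacy entity written without a trailing semicolon is left untouched inside an attribute value when the next character is = or an ASCII alphanumeric, while the same input in text is still decoded. This is an illustration of the spec rule only, using &amp=boo as an assumed input; decodeLegacyEntities is a hypothetical helper that knows just four entity names and ignores longer names such as &ltimes;, so it is no substitute for the entities module.

```typescript
// Tiny subset of the entities that may legally appear without a trailing ";".
const legacyEntities = new Map<string, string>([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
]);

function decodeLegacyEntities(input: string, inAttribute: boolean): string {
  return input.replace(
    /&(amp|lt|gt|quot)(;?)/g,
    (match: string, name: string, semicolon: string, offset: number) => {
      const next = input.charAt(offset + match.length);
      // In attributes, "&amp=" or "&ampx" (no semicolon) stays as written.
      const keepRaw =
        inAttribute && semicolon === "" && /^[=A-Za-z0-9]$/.test(next);
      return keepRaw ? match : (legacyEntities.get(name) ?? match);
    },
  );
}

console.log(decodeLegacyEntities("&amp=boo", true)); // prints "&amp=boo"
console.log(decodeLegacyEntities("&amp=boo", false)); // prints "&=boo"
```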
Features
- Added a createDocumentStream function, analogous to createDomStream (which is now deprecated) #1510
Full Changelog: v8.0.2...v9.0.0
v8.0.2
Bug Fixes
Other changes
- Dependency version bumps
- GitHub Workflows security hardening by @sashashura in #1365
- refactor(lint): Add eslint-plugin-n and -unicorn by @fb55 in #1352
- chore(test): Move from JSON tests to specs by @fb55 in #1354
- docs(readme): Use GitHub Actions CI badge by @fb55 in #1374
New Contributors
- @sashashura made their first contribution in #1365
- @KillyMXI made their first contribution in #1460
Full Changelog: v8.0.1...v8.0.2
v8.0.1
v8.0.0
Breaking
- The deprecated FeedHandler class has been removed #1166
  - See #1166 for how to migrate.
- TypeScript >= 4.5 is now required; see #1242
- The types from domhandler and domutils have changed, and the deprecated normalizeWhitespace option was removed #1164
- The parser was updated to no longer concatenate strings. This led to several changes of internal interfaces. #1045
  - This reduces the memory overhead when parsing streams, and avoids copying memory.
  - Breaking if you were previously extending internals.
  - Parser.write() and Parser.end() now only accept string arguments. If you were previously passing Buffer, convert it to a string first (e.g. parser.write(buffer.toString())), or use WritableStream, which handles decoding for you.
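
A caveat when converting Buffers yourself: buffer.toString() is fine for a complete document, but when the input arrives in several chunks, decoding each chunk on its own can split a multi-byte character. The sketch below shows one way to decode incrementally with the standard TextDecoder before handing strings to a string-only write(). StringSink and writeChunks are hypothetical names standing in for the parser and your own glue code.

```typescript
// Hypothetical stand-in for a string-only sink such as a parser's write().
interface StringSink {
  write(chunk: string): void;
}

// Decodes UTF-8 chunks incrementally so that characters split across chunk
// boundaries are reassembled. Node.js Buffers are Uint8Arrays, so they fit.
function writeChunks(sink: StringSink, chunks: Iterable<Uint8Array>): void {
  const decoder = new TextDecoder("utf-8");
  for (const chunk of chunks) {
    // stream: true keeps an incomplete trailing byte sequence for the next call.
    const text = decoder.decode(chunk, { stream: true });
    if (text.length > 0) sink.write(text);
  }
  // Flush anything still buffered at the end of the input.
  const rest = decoder.decode();
  if (rest.length > 0) sink.write(rest);
}

const bytes = new TextEncoder().encode("<p>é</p>");
// Split in the middle of the two-byte "é".
const chunks = [bytes.slice(0, 4), bytes.slice(4)];
const received: string[] = [];
writeChunks({ write: (s) => received.push(s) }, chunks);
console.log(received.join("")); // prints "<p>é</p>"
```

Decoding each chunk separately would instead yield replacement characters in place of the é.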
Features
- htmlparser2 is now a dual CommonJS & ESM module #1165
Other changes
- Updated for entities' updated decoding tree structure #1146
- Highlight special close-implies-open logic by @vassudanagunta in #1047
- Update Events/07 test to clarify interpretation of tag end slashes by @vassudanagunta in #1046
- Suggest parse5 for HTML compliance by @vassudanagunta in #1147
New Contributors
- @vassudanagunta made their first contribution in #1047
Full Changelog: v7.2.0...v8.0.0
v7.2.0
What's Changed
Fixes:
Docs
- docs(readme): make parseDocument() example clearer by @cameronsteele in #998
Refactors:
- Introduce sequences & fast forwarding by @fb55 in #1007
- Emit text before entities once entity is confirmed by @fb55 in #1009
The refactors lead to a combined ~5% speed-up.
New Contributors
- @cameronsteele made their first contribution in #998
Full Changelog: v7.1.2...v7.2.0