NASM lexer fixes #3059
Conversation
…ew test cases for preprocessor and instruction parsing
Hi, and thanks for being upfront about the origin of this code. You're right that people are cautious about AI contributions, especially if - as it appears from your note - you're submitting the AI output without your own review. One thing I see at a glance is that both the old and new code include regexes that should nowadays be generated by feeding a word list to
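The reviewer's point about generating such regexes from a word list (Pygments ships a `words()` helper for this, presumably what the truncated sentence refers to) can be sketched in plain stdlib Python. This is an illustrative re-implementation of the idea, not Pygments' actual code:

```python
# Illustrative stdlib sketch; Pygments itself provides a words() helper
# that generates such patterns properly (this is not its real code).
import re

def words_pattern(wordlist, suffix=r"\b"):
    # Sort longest-first so e.g. 'spl' is tried before its prefix 'sp'.
    escaped = sorted((re.escape(w) for w in wordlist), key=len, reverse=True)
    return "(" + "|".join(escaped) + ")" + suffix

registers = ["sp", "spl", "bp", "bpl"]
pat = re.compile(words_pattern(registers))

print(bool(pat.match("spl,")))     # True: whole register matched
print(bool(pat.match("sprintf")))  # False: suffix blocks the partial match
```

The benefit over a hand-written alternation is that the word list stays readable and the longest-match ordering is handled mechanically.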
…andling; improve regex patterns for clarity
Done :) I guess my point with all of this is that AI can be very useful for improving code, if you give the right model the right prompt.
It would also be good (besides self-review) to teach Claude how to run the tests/read the contribution guidelines, because
NASM Lexer Fixes
Issues Fixed
- `sp` matched inside `sprintf@plt`; fixed with a `(?![a-zA-Z0-9_])` negative lookahead on register patterns; `<symbol@plt>` is now tokenized as `Comment.Special`
- `%define` not recognized when indented; fixed with a `^\s*` anchor plus an explicit directive list
- Added missing registers: `rip`, `eip`, `ip`, `sil`, `dil`, `bpl`, `spl`, `rflags`, `eflags`, `flags`, `mxcsr`, `gdtr`, `ldtr`, `idtr`, `bnd0`–`bnd3`, `cr5`–`cr15`, `k0`–`k7` (AVX-512 opmask)
- Added missing directives: `ALIGNB`, `FLOAT`, `INCBIN`, `ISTRUC`, `IEND`, `AT`

New Test Snippets
- `tests/snippets/nasm/registers_extended.txt` — extended register tests
- `tests/snippets/nasm/preproc_indented.txt` — indented preprocessor directives
- `tests/snippets/nasm/objdump_plt.txt` — `<symbol@plt>` disassembly annotations
- `tests/snippets/nasm/directives_extended.txt` — extended assembler directives

Results
Original prompt (verbatim)
Pygments Lexer: NASM (Netwide Assembler)
Task
Fix the existing NASM (Netwide Assembler) lexer in Pygments. Work inside my local fork of the pygments/pygments repo on a separate branch.
Official references
MANDATORY: Before writing or modifying the lexer, you MUST fetch and read every
URL in this list. This is not background reading — it is a required prerequisite
step. Fetch each page, extract the keywords or function names, and verify them
against the lexer before declaring any work complete.
Pygments references
- `pygments/lexers/asm.py`
- `pygments/lexers/sql.py`

Phase 1: Setup and audit
1. Confirm you're in the root of a Pygments repo checkout (look for `pygments/lexers/`, `tests/`, `setup.py`).
2. Run `git checkout -b fix/nasm main` to create a dedicated branch.
3. Set up a venv: `python -m venv venv && source venv/bin/activate && pip install -e ".[dev]"`.
4. Run `tox -e py` to confirm the existing test suite passes.
5. Establish a baseline — run the existing lexer against a sample and count Error tokens:
Read the existing lexer end-to-end. Understand the current states, token patterns, and keyword sets.
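A baseline script for the audit step might look like the following sketch. It assumes Pygments is importable, and the sample source is invented for illustration; the error count it prints depends on the lexer version in the checkout:

```python
# Hypothetical baseline: count Token.Error occurrences when lexing a
# sample with the current NasmLexer (assumes Pygments is installed).
from pygments.lexers.asm import NasmLexer
from pygments.token import Error

sample = """\
section .text
    %define MAX 10
global _start
_start:
    mov eax, 60
    syscall
"""

tokens = list(NasmLexer().get_tokens(sample))
# `ttype in Error` also catches subtypes of Token.Error.
errors = [value for ttype, value in tokens if ttype in Error]
print(f"{len(errors)} Error tokens")
```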
Known issues to fix
- `<sprintf@plt>` causes `sp` inside `sprintf` to be tokenized as a register.
- `%define` preceded by whitespace produces Error tokens.
- `%` directives may not be covered.

Phase 1: Research
Before writing any code, fetch and read the official references listed above.
Do not invent or assume any syntax elements. If something is ambiguous in the docs, web-search to verify before including it.
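The first two known issues can be reproduced with plain stdlib regexes. The patterns below are simplified stand-ins, not the literal rules from `asm.py`:

```python
import re

# Issue 1: a bare alternation lets 'sp' match inside 'sprintf'.
buggy_reg = re.compile(r"(sp|bp)")
fixed_reg = re.compile(r"(sp|bp)(?![a-zA-Z0-9_])")
print(bool(buggy_reg.match("sprintf")))  # True  -- the bug
print(bool(fixed_reg.match("sprintf")))  # False -- lookahead blocks it

# Issue 2: anchoring at column 0 misses indented preprocessor lines.
buggy_pre = re.compile(r"^%define", re.M)
fixed_pre = re.compile(r"^\s*%define", re.M)
line = "    %define MAX 10"
print(bool(buggy_pre.match(line)))  # False -- indented %define missed
print(bool(fixed_pre.match(line)))  # True
```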
Phase 2: Fix the lexer
Apply fixes to the existing lexer file.
Review the existing lexer at `pygments/lexers/asm.py` (the `NasmLexer` class) and fix:

- `sp` matching inside longer words (e.g., `sprintf`). Fix by using word boundary anchors or negative lookahead.
- `%define` and other preprocessor directives must be recognized even when preceded by whitespace, not just at column 0.
- `<symbol@plt>` patterns and hex address prefixes.

After each fix, run the tests to confirm no regressions:
Phase 3: Expand tests
Review and expand the existing test snippets in
`tests/snippets/nasm/`. Add snippets that cover the syntax that was previously broken.

Each snippet file is a `.txt` file containing source code. Run:

This auto-populates expected tokens. Review them for correctness, then check them in.
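For reference, a populated snippet file pairs the input with a golden token section. The layout below is reconstructed from memory of the Pygments snippet format and may differ in detail; the token types shown are placeholders:

```text
---input---
%define MAX 10

---tokens---
'%define'     Comment.Preproc
' '           Text.Whitespace
...
```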
Phase 4: Test and iterate
This is the critical phase. Use
`pygmentize` as the feedback loop.

Run `tox -e py`. Fix any failures.

Test your lexer on the example file and count Error tokens:
If there are Error tokens, identify the unmatched text:
For each Error token:
a. Identify what syntax element the unmatched text represents.
b. Web-search the official docs to confirm the syntax is valid.
c. Fix the lexer rule.
d. Re-run `tox -e py -- tests/snippets/nasm/` to confirm no regressions.
e. Re-test with `pygmentize` to verify the Error is gone.

Repeat until the Error token count is zero.
Run the full test suite one more time:
`tox -e py`.

Visually inspect the HTML output for sanity:
Confirm that keywords, functions, operators, strings, numbers, and comments are each highlighted distinctly.
Phase 5: Finalize
- Run `tox -e py` one final time — full pass, zero failures.
- Run `git diff --stat`. You should have these files:
  - `pygments/lexers/asm.py` (the fixes)
  - `tests/snippets/nasm/` (new or updated test snippets)
  - `tests/examplefiles/nasm/` (expanded example)
- `git add -A && git commit -m "Fix NASM (Netwide Assembler) lexer: <summarize fixes>"`.

Constraints (applies to all phases)
- …(`sql.py` and the lexer development guide) for patterns. Use `words()`, `bygroups()`, `include()`, and `default()` helpers appropriately.
- `tox` passing is necessary but not sufficient — you must also have zero `Token.Error` in both test snippets and example files.
- `tox -e py` passes AND the Error token count is zero.
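As an illustration of the helpers the constraints name, here is a toy lexer using `words()` and `bygroups()`. It assumes Pygments is importable; the rules are deliberately minimal and are not the real `NasmLexer` rules:

```python
from pygments.lexer import RegexLexer, words, bygroups
from pygments.token import Keyword, Name, Punctuation, Whitespace

class ToyAsmLexer(RegexLexer):
    """Toy demonstration of words()/bygroups(); not real NASM rules."""
    name = 'ToyAsm'
    tokens = {
        'root': [
            # words() builds one safe alternation from the mnemonic list.
            (words(('mov', 'add', 'sub'), suffix=r'\b'), Keyword),
            # bygroups() splits one match into label-name + colon tokens.
            (r'([A-Za-z_]\w*)(:)', bygroups(Name.Label, Punctuation)),
            (r'[A-Za-z_]\w*', Name),
            (r'\s+', Whitespace),
        ],
    }

toks = list(ToyAsmLexer().get_tokens('start: mov eax'))
print(toks)
```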