
URL: http://github.com/pygments/pygments/pull/3059

NASM lexer fixes by seanthegeek · Pull Request #3059 · pygments/pygments · GitHub

NASM lexer fixes #3059

Open
seanthegeek wants to merge 2 commits into pygments:master from seanthegeek:fix/nasm

Conversation


@seanthegeek seanthegeek commented Mar 11, 2026

NASM Lexer Fixes

Note: This patch was written by Claude Sonnet 4.6 via the Claude Code CLI. I know the Pygments project may be cautious about LLM-generated contributions, and I'd genuinely welcome feedback on the quality of this work — both the code itself and how well it follows Pygments conventions. I'm using this as a real-world test of how well Claude handles a non-trivial open-source contribution task given a detailed prompt. Any review comments, even harsh ones, are appreciated.

Issues Fixed

  • #1231 — sp matched inside sprintf@plt: added a negative lookahead (?![a-zA-Z0-9_]) to the register patterns; <symbol@plt> is now tokenized as Comment.Special
  • #728 — %define not recognized when indented: the preprocessor directive rule now uses a ^\s* anchor with an explicit directive list
  • Missing registers — added rip, eip, ip, sil, dil, bpl, spl, rflags, eflags, flags, mxcsr, gdtr, ldtr, idtr, bnd0–bnd3, cr5–cr15, and k0–k7 (AVX-512 opmask registers)
  • Missing directives — added ALIGNB, FLOAT, INCBIN, ISTRUC, IEND, AT
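The two regex changes above can be sketched in isolation with plain re. (The register names and directive list below are small illustrative subsets, not the actual lexer rules.)

```python
import re

# Negative lookahead: a register name must not be followed by an
# identifier character, so "sp" no longer matches inside "sprintf".
REGISTER = re.compile(r'\b(sp|bp|ax)(?![a-zA-Z0-9_])')

assert REGISTER.search('push sp') is not None
assert REGISTER.search('call sprintf') is None

# Indented preprocessor directives: anchor on optional leading
# whitespace instead of requiring column 0 (re.MULTILINE makes ^
# match at the start of every line).
DIRECTIVE = re.compile(r'^\s*(%define|%macro|%if)\b', re.MULTILINE)

assert DIRECTIVE.search('    %define WIDTH 64') is not None
assert DIRECTIVE.search('mov eax, 1') is None
```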

New Test Snippets

  • tests/snippets/nasm/registers_extended.txt — extended register tests
  • tests/snippets/nasm/preproc_indented.txt — indented preprocessor directives
  • tests/snippets/nasm/objdump_plt.txt — <symbol@plt> disassembly annotations
  • tests/snippets/nasm/directives_extended.txt — extended assembler directives

Results

  • Error token count: 0
  • pytest: 5209 passed, 16 skipped (4 new tests added)
  • ruff: clean

Original prompt (verbatim)

Pygments Lexer: NASM (Netwide Assembler)

Task

Fix the existing NASM (Netwide Assembler) lexer in Pygments. Work inside my local fork of the pygments/pygments repo on a separate branch.

Official references

MANDATORY: Before writing or modifying the lexer, you MUST fetch and read every URL in this list. This is not background reading — it is a required prerequisite step. Fetch each page, extract the keywords or function names, and verify them against the lexer before declaring any work complete.

Pygments references

Phase 1: Setup and audit

  1. Confirm you're in the root of a Pygments repo checkout (look for pygments/lexers/, tests/, setup.py).

  2. Run git checkout -b fix/nasm main to create a dedicated branch.

  3. Set up a venv: python -m venv venv && source venv/bin/activate && pip install -e ".[dev]".

  4. Run tox -e py to confirm the existing test suite passes.

  5. Establish a baseline — run the existing lexer against a sample and count Error tokens:

    echo '<sample code>' | python -m pygments -l nasm -f html | grep -o 'class="err"' | wc -l
  6. Read the existing lexer end-to-end. Understand the current states, token patterns, and keyword sets.

Known issues to fix

Phase 1: Research

Before writing any code, fetch and read the official references listed above.

Do not invent or assume any syntax elements. If something is ambiguous in the docs, web-search to verify before including it.

Phase 2: Fix the lexer

Apply fixes to the existing lexer file.

Review the existing lexer at pygments/lexers/asm.py (the NasmLexer class) and fix:

  1. Register matching greediness: The lexer matches register names like sp inside longer words (e.g., sprintf). Fix by using word boundary anchors or negative lookahead.
  2. Macro whitespace: %define and other preprocessor directives must be recognized even when preceded by whitespace, not just at column 0.
  3. Missing registers: Audit and add any missing x86-64 extended registers, AVX-512 registers, mask registers.
  4. Missing directives: Ensure all NASM preprocessor and assembler directives are covered.
  5. Disassembly compatibility: Consider gracefully handling <symbol@plt> patterns and hex address prefixes.

After each fix, run the tests to confirm no regressions:

tox -e py -- tests/snippets/nasm/

Phase 3: Expand tests

Review and expand the existing test snippets in tests/snippets/nasm/. Add snippets that cover the syntax that was previously broken.

Each snippet file is a .txt file containing source code. Run:

tox -- --update-goldens tests/snippets/nasm/new_test.txt

This auto-populates expected tokens. Review them for correctness, then check them in.

Phase 4: Test and iterate

This is the critical phase. Use pygmentize as the feedback loop.

  1. Run tox -e py. Fix any failures.

  2. Test your lexer on the example file and count Error tokens:

    python -m pygments -l nasm -f html tests/examplefiles/nasm/* | grep -o 'class="err"' | wc -l
  3. If there are Error tokens, identify the unmatched text:

    python -m pygments -l nasm -f testcase tests/examplefiles/nasm/* | grep "Token.Error"
  4. For each Error token:
    a. Identify what syntax element the unmatched text represents.
    b. Web-search the official docs to confirm the syntax is valid.
    c. Fix the lexer rule.
    d. Re-run tox -e py -- tests/snippets/nasm/ to confirm no regressions.
    e. Re-test with pygmentize to verify the Error is gone.

  5. Repeat until the Error token count is zero.

  6. Run the full test suite one more time: tox -e py.

  7. Visually inspect the HTML output for sanity:

    python -m pygments -l nasm -f html -O full,style=monokai tests/examplefiles/nasm/* > /tmp/preview.html
    open /tmp/preview.html  # or xdg-open on Linux

    Confirm that keywords, functions, operators, strings, numbers, and comments are each highlighted distinctly.

Phase 5: Finalize

  1. Run tox -e py one final time — full pass, zero failures.
  2. Review the diff: git diff --stat. You should have these files:
    • pygments/lexers/asm.py (the fixes)
    • tests/snippets/nasm/ (new or updated test snippets)
    • Possibly tests/examplefiles/nasm/ (expanded example)
  3. Commit: git add -A && git commit -m "Fix NASM (Netwide Assembler) lexer: <summarize fixes>".
  4. Report what you've done: list the keyword count, function count, token types used, and confirm zero Error tokens.

Constraints (applies to all phases)

  • No hallucinated syntax. Every keyword, function, operator, and language construct must come from the official documentation listed above. If you're unsure, web-search the docs before adding it.
  • Follow Pygments conventions exactly. Read existing lexers (especially sql.py and the lexer development guide) for patterns. Use words(), bygroups(), include(), and default() helpers appropriately.
  • Python code must include type hints and pass ruff linter checks.
  • The Error token count is the ground truth. tox passing is necessary but not sufficient — you must also have zero Token.Error in both test snippets and example files.
  • Iterate until clean. Do not declare the task complete until both tox -e py passes AND the Error token count is zero.

…ew test cases for preprocessor and instruction parsing
@seanthegeek seanthegeek changed the title NASM lexer fixes ASM lexer fixes Mar 11, 2026
@seanthegeek seanthegeek changed the title ASM lexer fixes NASM lexer fixes Mar 11, 2026
@birkenfeld (Member) commented

Hi, and thanks for being upfront about the origin of this code.

You're right about people being cautious about AI contributions, especially if — as it appears from your note — you're submitting the AI output without your own review.

One thing I see at a glance is that both the old and new code include regexes that should nowadays be generated by feeding a word list to words().
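For readers unfamiliar with the helper: pygments.lexer.words() takes a sequence of literal words plus optional prefix/suffix strings and generates the alternation regex for you, so a hand-maintained pattern becomes a plain word list. A rough stand-in for the idea (simplified — the real words() also builds an optimized trie-style pattern rather than a flat alternation):

```python
import re

def words_sketch(word_list, prefix='', suffix=''):
    # Simplified stand-in for pygments.lexer.words(): escape each
    # literal word and join them into one alternation, longest first
    # so longer names win over shorter prefixes of themselves.
    alternation = '|'.join(
        re.escape(w) for w in sorted(word_list, key=len, reverse=True))
    return prefix + '(' + alternation + ')' + suffix

# Register names kept as data, with the negative-lookahead suffix
# from the fix in this PR.
pattern = re.compile(
    words_sketch(('eax', 'ax', 'sp'),
                 prefix=r'\b', suffix=r'(?![a-zA-Z0-9_])'))

assert pattern.search('mov eax, 1') is not None
assert pattern.search('call sprintf') is None
```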

@seanthegeek (Author) commented

Done :)

I guess my point with all of this is that AI can be very useful for improving code, if you give the right model the right prompt.

@Anteru (Collaborator) commented Mar 30, 2026

It would also be good (besides self-review) to teach Claude how to run the checks and read the contribution guidelines, because tox -e check locally would have caught the CI failure before submission. Not sure why that failed, though. How did Claude run the tests but not run the checks?

