URL: http://github.com/pygments/pygments/pull/3051

Fix: C/C++ lexer support for preprocessor multiline comment #3051

Open
gluesmith2021 wants to merge 2 commits into pygments:master from gluesmith2021:CLexer_Support_preproc_multiline_comment
gluesmith2021 commented Feb 27, 2026

Issue

The current behavior of the C/C++ lexers is to treat everything that follows an #include <file> directive as a comment, even when it is not written as a proper C/C++ comment. This makes sense, since a compiler would simply ignore such text, so it is a comment for practical purposes.

However, when the trailing text is a proper multi-line comment, the lexer processes the second line (and any subsequent lines) in the statement state instead of continuing the comment, and emits various incorrect tokens depending on the comment content.

For example:

#include <file1.h>   // Correct: rest of the line, including leading whitespaces, is tokenized as a comment
#include <file2.h>   Correct: this is returned as a comment too, even without the leading double-slash.
#include <file3.h>   /* Incorrect: this line returned as a comment, but
                                 this line gets processed as a statement */

The last example (file3.h) yields:

'#'           Comment.Preproc
'include'     Comment.Preproc
' '           Text.Whitespace
'<file3.h>'   Comment.PreprocFile
'   /* Incorrect: this line returned as a comment, but' Comment.Single
'\n'          Comment.Preproc

'                                 ' Text.Whitespace
'this'        Name
' '           Text.Whitespace
'line'        Name
' '           Text.Whitespace
'gets'        Name
' '           Text.Whitespace
'processed'   Name
' '           Text.Whitespace
'as'          Name
' '           Text.Whitespace
'a'           Name
' '           Text.Whitespace
'statement'   Name
' '           Text.Whitespace
'*'           Operator
'/'           Operator
'\n'          Text.Whitespace

After such a failure, the lexer may also remain in the statement state for the rest of the program, and thus fail to identify functions as Function tokens (it emits the more generic Name token instead).

Expected Behavior

The full comment is returned by the lexer as a single token, as with comments appearing elsewhere in the code, and the lexer internally returns to the root state afterwards.

Proposed Changes

Fix: parse /* ... */ as a Comment.Multiline

  • Also handle a corner case of valid C/C++ code: text between the include directive and the proper comment is included in the comment, as it is ignored by the compiler as well. For example:
    #include <file.h>  Some text here /* followed by a multi-line
                                    comment is valid code */
    returns "Some text here /* followed by a multi-line\n comment is valid code */" as a Comment.Multiline
  • Merge the "file" and <file> cases into a single regex to avoid a combinatorial duplication of rules
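
The two points above can be illustrated with a small standalone sketch. It uses only Python's `re` module, not Pygments, and the pattern, the group names, and the `lex_include` helper are all hypothetical: they are not the regex or rules from this pull request. It assumes that text following the closing `*/` is left for later rules to handle.

```python
import re

# Hypothetical pattern for illustration; it is not the regex from this pull request.
INCLUDE_RE = re.compile(
    r"""
    (?P<hash>\#)(?P<directive>include)(?P<gap>[ \t]+)
    (?P<file>"[^"\n]*"|<[^>\n]*>)        # "file" and <file> share one alternation
    (?P<ws>[ \t]*)                       # whitespace before any trailing text
    (?:
        # Optional same-line text, then a /* ... */ comment that may span lines.
        (?P<multi>(?:(?!/\*|//)[^\n])*/\*[\s\S]*?\*/)
      |
        # Otherwise the rest of the line, with or without a leading //.
        (?P<single>[^\n]*)
    )
    """,
    re.VERBOSE,
)


def lex_include(text):
    """Return (token_name, value) pairs for an #include at the start of text."""
    m = INCLUDE_RE.match(text)
    if m is None:
        return []
    tokens = [
        ("Comment.Preproc", m.group("hash")),
        ("Comment.Preproc", m.group("directive")),
        ("Text.Whitespace", m.group("gap")),
        ("Comment.PreprocFile", m.group("file")),
    ]
    if m.group("ws"):
        tokens.append(("Text.Whitespace", m.group("ws")))
    if m.group("multi") is not None:
        tokens.append(("Comment.Multiline", m.group("multi")))
    elif m.group("single"):
        tokens.append(("Comment.Single", m.group("single")))
    return tokens


example = (
    "#include <file.h>  Some text here /* followed by a multi-line\n"
    "    comment is valid code */\n"
)
for token in lex_include(example):
    print(token)
```

Excluding `//` from the text allowed before `/*` keeps a line such as `// see /* here` on the single-line path, and an unterminated `/*` also falls back to the single-line alternative.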

Other consistency change:

  • To align with Comment tokens from elsewhere in the code, leading whitespace (i.e. between #include <file> and the comment) is now returned as a single Text.Whitespace token, followed by Comment.Single (or Comment.Multiline).
    • This makes little difference when coloring or formatting text, since it is only whitespace, but it can be less confusing for a Filter or other automated token processing.
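
As a rough illustration of why the split can matter to automated processing, consider a hypothetical consumer that extracts the body of `//` comments. The token streams below are hand-written approximations using plain string names rather than real Pygments token types, modeled on the token dump above, and `single_line_comment_bodies` is an invented helper, not part of Pygments.

```python
def single_line_comment_bodies(tokens):
    """Return the text of each // comment with the marker removed."""
    bodies = []
    for token_name, value in tokens:
        # A naive check that expects the comment marker at the start of the value.
        if token_name == "Comment.Single" and value.startswith("//"):
            bodies.append(value[2:].strip())
    return bodies


# Before the change: leading whitespace is fused into the comment token.
old_stream = [
    ("Comment.PreprocFile", "<file1.h>"),
    ("Comment.Single", "   // note"),
]

# After the change: whitespace is its own token, then the comment.
new_stream = [
    ("Comment.PreprocFile", "<file1.h>"),
    ("Text.Whitespace", "   "),
    ("Comment.Single", "// note"),
]

print(single_line_comment_bodies(old_stream))
print(single_line_comment_bodies(new_stream))
```

With the fused token the naive `startswith("//")` check misses the comment, while the split stream lets the same check succeed without any extra whitespace handling.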

gluesmith2021 changed the title from "C lexer support preproc multiline comment" to "C/C++ lexer support preproc multiline comment" Feb 27, 2026
gluesmith2021 changed the title from "C/C++ lexer support preproc multiline comment" to "Fix: C/C++ lexer support for preprocessor multiline comment" Feb 27, 2026