Fix: C/C++ lexer support for preprocessor multiline comment #3051
Open
gluesmith2021 wants to merge 2 commits into pygments:master
Issue
The current behavior of the C/C++ lexers is to treat everything that follows an `#include <file>` directive as a comment, even when it is not written as a proper C/C++ comment. This makes sense, since a compiler would simply ignore such text, so it is a comment for practical purposes.

But when the text is a proper multi-line comment, the lexer processes the second line (and any further lines) in the `statement` state instead of as a continuing comment, and emits various incorrect tokens, depending on the comment content.

For example:
The last line yields:
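The tokens emitted for an input of this shape can be inspected with a short script. This is a minimal sketch, assuming Pygments is installed; the `dump_tokens` helper and the sample source string are hypothetical and are not taken from the original report. It only relies on `get_tokens()`, which yields `(token type, value)` pairs.

```python
def dump_tokens(source, lexer):
    """Return one 'TokenType repr(value)' string per token emitted by lexer."""
    return [f"{ttype} {value!r}" for ttype, value in lexer.get_tokens(source)]


if __name__ == "__main__":
    # Hypothetical sample: a multi-line comment that starts on the #include line.
    sample = (
        "#include <stdio.h> /* first line\n"
        "   second line */\n"
        "int main(void) { return 0; }\n"
    )
    try:
        from pygments.lexers import CLexer
    except ImportError:
        print("Pygments is not installed; nothing to show.")
    else:
        for line in dump_tokens(sample, CLexer()):
            print(line)
```

Running it before and after the change makes it easy to compare how the second comment line and the following function are tokenized.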
After such a failure, the lexer may also remain in the `statement` state for the rest of the program, so it fails to identify functions as `Function` tokens and emits the more generic `Name` token instead.

Expected Behavior
The full comment is returned from the lexer as a single token, as with comments appearing elsewhere in the code, and the lexer internally returns to the `root` state afterwards.

Proposed Changes
- Fix: parse `/* ... */` as a `Comment.Multiline`
- Parse `"Some text here /* followed by multi-line\n comment is valid code */"` as a `Comment.Multiline`
- Handle the `"file"` and `<file>` cases together in the regex to avoid combinatorial rule duplication

Other consistency change:
- As with `Comment` tokens from elsewhere in the code, leading whitespace (i.e. between `#include <file>` and the comment) is now returned as a single `Text.Whitespace` token, followed by `Comment.Single` (or `Multiline`). This keeps the token stream consistent for a `Filter` or other automated token processing.
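The rules listed above can be illustrated outside Pygments with a small standalone tokenizer. This is only a sketch built on the standard `re` module, not the regex from this PR: `INCLUDE_RE`, `lex_include_lines`, `strip_comments`, and the `Directive` and `Other` labels are hypothetical names, and the remaining token names are plain strings modeled on the ones mentioned above. It deliberately ignores edge cases such as `#include` inside string literals.

```python
import re

# One alternation covers both "file" and <file>, so the trailing-comment
# rules below are written once instead of once per include form.
INCLUDE_RE = re.compile(
    r'^(?P<directive>#[ \t]*include[ \t]*(?:"[^"\n]*"|<[^>\n]*>))'
    r'(?P<ws>[ \t]*)'
    r'(?:(?P<single>//[^\n]*)'                      # // comment to end of line
    r'|(?P<multi>(?:(?!/\*)[^\n])*/\*.*?\*/)'       # optional text, then /* ... */ (may span lines)
    r'|(?P<other>[^\n]+))?',                        # any other trailing text
    re.MULTILINE | re.DOTALL,
)


def lex_include_lines(source):
    """Split source into (label, text) pairs around #include directives."""
    tokens = []
    pos = 0
    for m in INCLUDE_RE.finditer(source):
        if m.start() > pos:
            tokens.append(("Other", source[pos:m.start()]))
        tokens.append(("Directive", m.group("directive")))
        if m.group("ws"):
            # Whitespace between the directive and the comment is its own token.
            tokens.append(("Text.Whitespace", m.group("ws")))
        if m.group("multi") is not None:
            tokens.append(("Comment.Multiline", m.group("multi")))
        elif m.group("single") is not None:
            tokens.append(("Comment.Single", m.group("single")))
        elif m.group("other") is not None:
            # Non-comment trailing text is still treated as a comment.
            tokens.append(("Comment.Single", m.group("other")))
        pos = m.end()  # scanning resumes after the comment, back in the "root" text
    if pos < len(source):
        tokens.append(("Other", source[pos:]))
    return tokens


def strip_comments(tokens):
    """Example of filter-style processing: drop every Comment.* token."""
    return "".join(text for label, text in tokens if not label.startswith("Comment"))


if __name__ == "__main__":
    sample = (
        "#include <stdio.h> /* first line\n"
        "   second line */\n"
        "int main(void) { return 0; }\n"
    )
    for label, text in lex_include_lines(sample):
        print(label, repr(text))
```

Because the whitespace and the comment are separate tokens, `strip_comments` can remove comments by type alone, without special handling for text that follows an `#include`.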