URL: http://github.com/python/cpython/commit/6715f91edcf6f379f666e18f57b8a0dcb724bf79

gh-102856: Python tokenizer implementation for PEP 701 (#104323) · python/cpython@6715f91 · GitHub

Commit 6715f91

gh-102856: Python tokenizer implementation for PEP 701 (#104323)
This commit replaces the Python implementation of the tokenize module with an implementation that reuses the real C tokenizer via a private extension module. The tokenize module now implements a compatibility layer that transforms tokens from the C tokenizer into Python tokenize tokens for backward compatibility.

Because the C tokenizer does not emit some tokens that the Python tokenizer provided (such as comments and non-semantic newlines), a new special mode has been added to the C tokenizer that is currently used only via the extension module exposing it to the Python layer. This new mode forces the C tokenizer to emit these extra tokens and to attach the metadata needed to match the old Python implementation.

Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
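A rough illustration of the backward compatibility the layer preserves, using only the tokenize module's documented API: COMMENT and NL tokens are still delivered to Python callers, even though they now originate in the C tokenizer's special mode.

```python
import io
import tokenize

# The Python-level tokenize module still yields COMMENT and NL tokens,
# even though they are now produced by the C tokenizer's extra-tokens mode
# rather than by a pure-Python scanner.
source = "x = 1  # a comment\n\ny = 2\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string))
```

The comment produces a COMMENT token and the blank line a non-semantic NL token, exactly as the old pure-Python tokenizer reported them.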
1 parent 3ed57e4 commit 6715f91

22 files changed: 426 additions and 376 deletions

Doc/library/token-list.inc

Lines changed: 4 additions & 0 deletions
Some generated files are not rendered by default.

Doc/library/token.rst

Lines changed: 2 additions & 0 deletions
@@ -50,11 +50,13 @@ The following token type values aren't used by the C tokenizer but are needed fo
 the :mod:`tokenize` module.
 
 .. data:: COMMENT
+   :noindex:
 
    Token value used to indicate a comment.
 
 
 .. data:: NL
+   :noindex:
 
    Token value used to indicate a non-terminating newline. The
    :data:`NEWLINE` token indicates the end of a logical line of Python code;
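The COMMENT and NL values documented above are importable from both the token and tokenize modules; a quick check, assuming nothing beyond the documented token module API:

```python
import token
import tokenize

# COMMENT and NL have long been exposed by the token module for tokenize.py's
# benefit; with this change they are emitted by the C tokenizer itself.
print(token.tok_name[token.COMMENT])  # COMMENT
print(token.tok_name[token.NL])       # NL
print(token.COMMENT == tokenize.COMMENT and token.NL == tokenize.NL)  # True
```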

Grammar/Tokens

Lines changed: 2 additions & 2 deletions
@@ -64,9 +64,9 @@ SOFT_KEYWORD
 FSTRING_START
 FSTRING_MIDDLE
 FSTRING_END
+COMMENT
+NL
 ERRORTOKEN
 
 # These aren't used by the C tokenizer but are needed for tokenize.py
-COMMENT
-NL
 ENCODING

Include/internal/pycore_global_objects_fini_generated.h

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default.

Include/internal/pycore_global_strings.h

Lines changed: 1 addition & 0 deletions
@@ -406,6 +406,7 @@ struct _Py_global_strings {
     STRUCT_FOR_ID(exception)
     STRUCT_FOR_ID(exp)
     STRUCT_FOR_ID(extend)
+    STRUCT_FOR_ID(extra_tokens)
     STRUCT_FOR_ID(facility)
     STRUCT_FOR_ID(factory)
     STRUCT_FOR_ID(false)

Include/internal/pycore_runtime_init_generated.h

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default.

Include/internal/pycore_token.h

Lines changed: 3 additions & 1 deletion
@@ -77,7 +77,9 @@ extern "C" {
 #define FSTRING_START 61
 #define FSTRING_MIDDLE 62
 #define FSTRING_END 63
-#define ERRORTOKEN 64
+#define COMMENT 64
+#define NL 65
+#define ERRORTOKEN 66
 #define N_TOKENS 68
 #define NT_OFFSET 256
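As this hunk shows, inserting COMMENT and NL before ERRORTOKEN shifts the numeric values of later tokens. Those #define values are mirrored in Lib/token.py, so a small sketch of the portable way to consume them is to go through the named constants and token.tok_name rather than hard-coded integers:

```python
import token

# The numeric token values are an implementation detail that can shift
# between releases (as they do in this commit); compare against the named
# constants instead of literal integers.
for name in ("COMMENT", "NL", "ERRORTOKEN"):
    value = getattr(token, name)
    print(name, value, token.tok_name[value])
```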

Include/internal/pycore_unicodeobject_generated.h

Lines changed: 3 additions & 0 deletions
Some generated files are not rendered by default.

Lib/inspect.py

Lines changed: 2 additions & 2 deletions
@@ -2187,15 +2187,15 @@ def _signature_strip_non_python_syntax(signature):
             if string == ',':
                 current_parameter += 1
 
-        if (type == ERRORTOKEN) and (string == '$'):
+        if (type == OP) and (string == '$'):
             assert self_parameter is None
             self_parameter = current_parameter
             continue
 
         add(string)
         if (string == ','):
             add(' ')
-    clean_signature = ''.join(text)
+    clean_signature = ''.join(text).strip()
     return clean_signature, self_parameter
 
 
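Argument Clinic signatures use `$` to mark where `self` ends, and the hunk above adapts inspect to how the new tokenizer classifies that character. A sketch of the difference (the signature string here is a made-up example; on interpreters with this change, 3.12+, `$` arrives as an OP token, while the old pure-Python tokenizer reported ERRORTOKEN):

```python
import io
import tokenize

# Tokenize a Clinic-style signature containing the '$' marker and see how
# the '$' character is classified by this interpreter's tokenizer.
sig = "($self, value)\n"
tokens = tokenize.generate_tokens(io.StringIO(sig).readline)
dollar = next(t for t in tokens if t.string == "$")
print(tokenize.tok_name[dollar.type])
```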

Lib/tabnanny.py

Lines changed: 10 additions & 0 deletions
@@ -107,6 +107,10 @@ def check(file):
         errprint("%r: Token Error: %s" % (file, msg))
         return
 
+    except SyntaxError as msg:
+        errprint("%r: Token Error: %s" % (file, msg))
+        return
+
     except IndentationError as msg:
         errprint("%r: Indentation Error: %s" % (file, msg))
         return
@@ -272,6 +276,12 @@ def format_witnesses(w):
     return prefix + " " + ', '.join(firsts)
 
 def process_tokens(tokens):
+    try:
+        _process_tokens(tokens)
+    except TabError as e:
+        raise NannyNag(e.lineno, e.msg, e.text)
+
+def _process_tokens(tokens):
     INDENT = tokenize.INDENT
     DEDENT = tokenize.DEDENT
     NEWLINE = tokenize.NEWLINE
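The effect of this wrapper can be sketched with an ambiguously indented snippet (a made-up input): the C-backed tokenizer raises TabError mid-stream, and process_tokens converts it into the NannyNag that tabnanny's reporting machinery expects.

```python
import io
import tabnanny
import tokenize

# One line indented with a tab, the next with eight spaces: equal at tab
# size 8 but unequal at tab size 1, so the indentation is ambiguous and
# tabnanny reports it as a NannyNag.
bad = "if True:\n\tx = 1\n        y = 2\n"
try:
    tabnanny.process_tokens(tokenize.generate_tokens(io.StringIO(bad).readline))
except tabnanny.NannyNag as nag:
    print("NannyNag at line", nag.get_lineno())
```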
