URL: http://github.com/python/cpython/commit/bb09ba679223666e01f8da780f97888a29d07131

python/cpython · Commit bb09ba6
gh-122291: Intern latin-1 one-byte strings at startup (GH-122303)
1 parent c086962 commit bb09ba6

2 files changed

Lines changed: 40 additions & 62 deletions


InternalDocs/string_interning.md

Lines changed: 31 additions & 35 deletions
@@ -8,51 +8,50 @@
 
 This is used to optimize dict and attribute lookups, among other things.
 
-Python uses three different mechanisms to intern strings:
+Python uses two different mechanisms to intern strings: singletons and
+dynamic interning.
 
-- Singleton strings marked in C source with `_Py_STR` and `_Py_ID` macros.
-These are statically allocated, and collected using `make regen-global-objects`
-(`Tools/build/generate_global_objects.py`), which generates code
-for declaration, initialization and finalization.
+## Singletons
 
-The difference between the two kinds is not important. (A `_Py_ID` string is
-a valid C name, with which we can refer to it; a `_Py_STR` may e.g. contain
-non-identifier characters, so it needs a separate C-compatible name.)
+The 256 possible one-character latin-1 strings, which can be retrieved with
+`_Py_LATIN1_CHR(c)`, are stored in statically allocated arrays,
+`_PyRuntime.static_objects.strings.ascii` and
+`_PyRuntime.static_objects.strings.latin1`.
 
-The empty string is in this category (as `_Py_STR(empty)`).
+Longer singleton strings are marked in C source with `_Py_ID` (if the string
+is a valid C identifier fragment) or `_Py_STR` (if it needs a separate
+C-compatible name.)
+These are also stored in statically allocated arrays.
+They are collected from CPython sources using `make regen-global-objects`
+(`Tools/build/generate_global_objects.py`), which generates code
+for declaration, initialization and finalization.
 
-These singletons are interned in a runtime-global lookup table,
-`_PyRuntime.cached_objects.interned_strings` (`INTERNED_STRINGS`),
-at runtime initialization.
+The empty string is one of the singletons: `_Py_STR(empty)`.
 
-- The 256 possible one-character latin-1 strings are singletons,
-which can be retrieved with `_Py_LATIN1_CHR(c)`, are stored in runtime-global
-arrays, `_PyRuntime.static_objects.strings.ascii` and
-`_PyRuntime.static_objects.strings.latin1`.
+The three sets of singletons (`_Py_LATIN1_CHR`, `_Py_ID`, `_Py_STR`)
+are disjoint.
+If you have such a singleton, it (and no other copy) will be interned.
 
-These are NOT interned at startup in the normal build.
-In the free-threaded build, they are; this avoids modifying the
-global lookup table after threads are started.
+These singletons are interned in a runtime-global lookup table,
+`_PyRuntime.cached_objects.interned_strings` (`INTERNED_STRINGS`),
+at runtime initialization, and immutable until it's torn down
+at runtime finalization.
+It is shared across threads and interpreters without any synchronization.
 
-Interning a one-char latin-1 string will always intern the corresponding
-singleton.
 
-- All other strings are allocated dynamically, and have their
-`_PyUnicode_STATE(s).statically_allocated` flag set to zero.
-When interned, such strings are added to an interpreter-wide dict,
-`PyInterpreterState.cached_objects.interned_strings`.
+## Dynamically allocated strings
 
-The key and value of each entry in this dict reference the same object.
+All other strings are allocated dynamically, and have their
+`_PyUnicode_STATE(s).statically_allocated` flag set to zero.
+When interned, such strings are added to an interpreter-wide dict,
+`PyInterpreterState.cached_objects.interned_strings`.
 
-The three sets of singletons (`_Py_STR`, `_Py_ID`, `_Py_LATIN1_CHR`)
-are disjoint.
-If you have such a singleton, it (and no other copy) will be interned.
+The key and value of each entry in this dict reference the same object.
 
 
 ## Immortality and reference counting
 
-Invariant: Every immortal string is interned, *except* the one-char latin-1
-singletons (which might but might not be interned).
+Invariant: Every immortal string is interned.
 
 In practice, this means that you must not use `_Py_SetImmortal` on
 a string. (If you know it's already immortal, don't immortalize it;
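The following toy model in C illustrates the two-tier layout described in this hunk. It is a hypothetical sketch, not CPython code. `ToyStr`, `ToyInterp`, `toy_intern`, the fixed capacity and the linear search are all assumptions made to keep it short. The toy checks the read-only singleton table first and falls back to a per-interpreter table. Because each entry maps a string to itself (key and value are the same object), one pointer per entry is enough.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy string; not CPython's PyUnicodeObject. */
typedef struct {
    const char *data;
    int interned;
} ToyStr;

/* Tier 1: singletons, interned at startup and read-only afterwards. */
static ToyStr toy_singletons[] = {
    {"", 1},          /* plays the role of _Py_STR(empty) */
    {"__name__", 1},  /* plays the role of a _Py_ID string */
};
#define TOY_N_SINGLETONS (sizeof toy_singletons / sizeof toy_singletons[0])

/* Tier 2: per-interpreter table.  Key and value are the same object,
   so one pointer per entry is enough in this sketch. */
#define TOY_INTERP_CAP 64
typedef struct {
    ToyStr *entries[TOY_INTERP_CAP];
    size_t count;
} ToyInterp;

/* Return the canonical object equal to s, interning s if it is new.
   Returns NULL when the toy table is full. */
static ToyStr *toy_intern(ToyInterp *interp, ToyStr *s)
{
    for (size_t i = 0; i < TOY_N_SINGLETONS; i++) {
        if (strcmp(toy_singletons[i].data, s->data) == 0) {
            return &toy_singletons[i];
        }
    }
    for (size_t i = 0; i < interp->count; i++) {
        if (strcmp(interp->entries[i]->data, s->data) == 0) {
            return interp->entries[i];
        }
    }
    if (interp->count == TOY_INTERP_CAP) {
        return NULL;
    }
    s->interned = 1;
    interp->entries[interp->count++] = s;
    return s;
}
```

Interning a second `"spam"` object returns the first one. Interning a fresh `"__name__"` returns the singleton and leaves the per-interpreter table untouched.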
@@ -115,8 +114,5 @@ The valid transitions between these states are:
 Using `_PyUnicode_InternStatic` on these is an error; the other cases
 don't change the state.
 
-- One-char latin-1 singletons can be interned (0 -> 3) using any interning
-function; after that the functions don't change the state.
-
-- Other statically allocated strings are interned (0 -> 3) at runtime init;
+- Singletons are interned (0 -> 3) at runtime init;
 after that all interning functions don't change the state.
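The singleton rule above, together with the dynamic-string rules that precede it, can be modeled as a pure transition function. This is a hypothetical sketch. The values 0 and 3 follow the `0 -> 3` notation in the text. The mortal (1) and immortal (2) states for dynamic strings, the `ToyCall` entry points and `toy_next_state` are assumptions for illustration, not CPython's API. A disallowed call returns `TOY_ERROR`; the diff's C code guards such cases with assertions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy interning states.  0 and 3 follow the document's "0 -> 3";
   1 (mortal) and 2 (immortal) for dynamic strings are assumptions. */
enum {
    TOY_NOT_INTERNED = 0,
    TOY_MORTAL = 1,
    TOY_IMMORTAL = 2,
    TOY_IMMORTAL_STATIC = 3,
    TOY_ERROR = -1
};

/* Hypothetical interning entry points. */
typedef enum { CALL_MORTAL, CALL_IMMORTAL, CALL_STATIC } ToyCall;

/* Return the state after `call`, or TOY_ERROR for a disallowed call. */
static int toy_next_state(int state, bool singleton, ToyCall call)
{
    if (singleton) {
        if (state == TOY_IMMORTAL_STATIC) {
            return state;                   /* already interned: no change */
        }
        /* 0 -> 3 only via the static path, i.e. at runtime init */
        if (state == TOY_NOT_INTERNED && call == CALL_STATIC) {
            return TOY_IMMORTAL_STATIC;
        }
        return TOY_ERROR;
    }
    switch (call) {
    case CALL_STATIC:
        return TOY_ERROR;                   /* static path is singleton-only */
    case CALL_MORTAL:
        return state == TOY_NOT_INTERNED ? TOY_MORTAL : state;
    case CALL_IMMORTAL:
        return TOY_IMMORTAL;                /* 0 -> 2, 1 -> 2, 2 stays 2 */
    }
    return TOY_ERROR;
}
```

Before this commit, a one-character singleton could also take the 0 -> 3 step through a mortal or immortal interning call. In the toy, that case now yields `TOY_ERROR`.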

Objects/unicodeobject.c

Lines changed: 9 additions & 27 deletions
@@ -325,22 +325,20 @@ init_global_interned_strings(PyInterpreterState *interp)
         return _PyStatus_ERR("failed to create global interned dict");
     }
 
-    /* Intern statically allocated string identifiers and deepfreeze strings.
+    /* Intern statically allocated string identifiers, deepfreeze strings,
+     * and one-byte latin-1 strings.
      * This must be done before any module initialization so that statically
      * allocated string identifiers are used instead of heap allocated strings.
      * Deepfreeze uses the interned identifiers if present to save space
      * else generates them and they are interned to speed up dict lookups.
     */
     _PyUnicode_InitStaticStrings(interp);
 
-#ifdef Py_GIL_DISABLED
-// In the free-threaded build, intern the 1-byte strings as well
     for (int i = 0; i < 256; i++) {
         PyObject *s = LATIN1(i);
         _PyUnicode_InternStatic(interp, &s);
         assert(s == LATIN1(i));
     }
-#endif
 #ifdef Py_DEBUG
     assert(_PyUnicode_CheckConsistency(&_Py_STR(empty), 1));
 
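Dropping the `#ifdef Py_GIL_DISABLED` guard means every build interns all 256 one-byte singletons once at startup. After that, the table is only read. The toy model below sketches that behavior. The names `ToyChar`, `toy_latin1`, `toy_global_table`, `toy_init_latin1` and `toy_lookup_latin1` are hypothetical, and indexing by byte value stands in for the real lookup table.

```c
#include <assert.h>

/* Hypothetical one-byte toy string. */
typedef struct {
    unsigned char ch;
    int state;              /* 0 = not interned, 3 = interned static */
} ToyChar;

/* Stand-in for the statically allocated latin-1 singleton arrays. */
static ToyChar toy_latin1[256];

/* Stand-in for the runtime-global lookup table (indexed by byte). */
static ToyChar *toy_global_table[256];

/* Runtime init: the loop now runs in every build, not only when
   Py_GIL_DISABLED is defined. */
static void toy_init_latin1(void)
{
    for (int i = 0; i < 256; i++) {
        toy_latin1[i].ch = (unsigned char)i;
        assert(toy_latin1[i].state == 0);   /* never interned twice */
        toy_latin1[i].state = 3;
        toy_global_table[i] = &toy_latin1[i];
    }
}

/* After init the table is only read, never written. */
static const ToyChar *toy_lookup_latin1(unsigned char c)
{
    return toy_global_table[c];
}
```

All writes happen before any lookup, so nothing modifies the table after threads could be running. The removed comment in the original code gave this as the reason for interning these strings early in the free-threaded build.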
@@ -15355,26 +15353,14 @@ intern_static(PyInterpreterState *interp, PyObject *s /* stolen */)
     assert(s != NULL);
     assert(_PyUnicode_CHECK(s));
     assert(_PyUnicode_STATE(s).statically_allocated);
-
-    switch (PyUnicode_CHECK_INTERNED(s)) {
-        case SSTATE_NOT_INTERNED:
-            break;
-        case SSTATE_INTERNED_IMMORTAL_STATIC:
-            return s;
-        default:
-            Py_FatalError("_PyUnicode_InternStatic called on wrong string");
-    }
+    assert(!PyUnicode_CHECK_INTERNED(s));
 
 #ifdef Py_DEBUG
     /* We must not add process-global interned string if there's already a
      * per-interpreter interned_dict, which might contain duplicates.
-     * Except "short string" singletons: those are special-cased. */
+     */
     PyObject *interned = get_interned_dict(interp);
-    assert(interned == NULL || unicode_is_singleton(s));
-#ifdef Py_GIL_DISABLED
-    // In the free-threaded build, don't allow even the short strings.
     assert(interned == NULL);
-#endif
 #endif
 
     /* Look in the global cache first. */
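The toy below illustrates the tightened preconditions of `intern_static`. It reports violations instead of asserting, so both checks can be exercised. Everything here (`ToyStr`, `ToyRuntime`, `toy_intern_static`, the 16-slot table) is hypothetical. It mirrors only the two rules in the hunk. First, a static string must not already be interned. Second, nothing may be added to the global table once a per-interpreter dict exists, because that dict might already hold a duplicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy string. */
typedef struct {
    const char *data;
    bool statically_allocated;
    bool interned;
} ToyStr;

/* Hypothetical runtime: a tiny global table plus a flag recording
   whether any per-interpreter dict has been created yet. */
typedef struct {
    bool interp_dict_exists;
    ToyStr *global[16];
    size_t count;
} ToyRuntime;

/* Toy version of the static-interning preconditions.  CPython asserts;
   this sketch returns false instead so the checks can be exercised. */
static bool toy_intern_static(ToyRuntime *rt, ToyStr *s)
{
    if (!s->statically_allocated || s->interned) {
        return false;             /* only never-interned static strings */
    }
    if (rt->interp_dict_exists) {
        return false;             /* a dict might already hold a duplicate */
    }
    if (rt->count == sizeof rt->global / sizeof rt->global[0]) {
        return false;             /* toy table is full */
    }
    rt->global[rt->count++] = s;
    s->interned = true;
    return true;
}
```

In the diff, the "not already interned" check is a plain `assert`. The per-interpreter-dict check sits inside `#ifdef Py_DEBUG`, so only debug builds perform it.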
@@ -15446,11 +15432,6 @@ intern_common(PyInterpreterState *interp, PyObject *s /* stolen */,
         return s;
     }
 
-    /* Handle statically allocated strings. */
-    if (_PyUnicode_STATE(s).statically_allocated) {
-        return intern_static(interp, s);
-    }
-
     /* Is it already interned? */
     switch (PyUnicode_CHECK_INTERNED(s)) {
         case SSTATE_NOT_INTERNED:
@@ -15467,6 +15448,9 @@ intern_common(PyInterpreterState *interp, PyObject *s /* stolen */,
             return s;
     }
 
+    /* Statically allocated strings must be already interned. */
+    assert(!_PyUnicode_STATE(s).statically_allocated);
+
 #if Py_GIL_DISABLED
     /* In the free-threaded build, all interned strings are immortal */
     immortalize = 1;
@@ -15477,13 +15461,11 @@ intern_common(PyInterpreterState *interp, PyObject *s /* stolen */,
         immortalize = 1;
     }
 
-    /* if it's a short string, get the singleton -- and intern it */
+    /* if it's a short string, get the singleton */
     if (PyUnicode_GET_LENGTH(s) == 1 &&
                 PyUnicode_KIND(s) == PyUnicode_1BYTE_KIND) {
         PyObject *r = LATIN1(*(unsigned char*)PyUnicode_DATA(s));
-        if (!PyUnicode_CHECK_INTERNED(r)) {
-            r = intern_static(interp, r);
-        }
+        assert(PyUnicode_CHECK_INTERNED(r));
         Py_DECREF(s);
         return r;
     }
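After this change, `intern_common` never interns a singleton lazily. Statically allocated strings take the early "already interned" return, and the one-character path simply swaps in the pre-interned latin-1 object. The toy below sketches that control flow under several assumptions. `ToyStr`, `toy_latin1`, `toy_init`, `toy_decref` and `toy_intern_common` are hypothetical. The payload is a plain `char` buffer, so every toy string is effectively 1-byte kind. The per-interpreter dict path is replaced by simply marking the string interned.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy string; not CPython's object layout. */
typedef struct {
    char data[8];
    size_t len;
    int refcnt;
    int interned;
    int statically_allocated;
} ToyStr;

/* Stand-in for the one-byte singletons, all interned at startup. */
static ToyStr toy_latin1[256];

static void toy_init(void)
{
    for (int i = 0; i < 256; i++) {
        toy_latin1[i].data[0] = (char)i;
        toy_latin1[i].len = 1;
        toy_latin1[i].refcnt = 1;
        toy_latin1[i].interned = 1;
        toy_latin1[i].statically_allocated = 1;
    }
}

/* Only ever called on heap-allocated toy strings in this sketch. */
static void toy_decref(ToyStr *s)
{
    if (--s->refcnt == 0) {
        free(s);
    }
}

/* Steals a reference to s; returns the canonical object. */
static ToyStr *toy_intern_common(ToyStr *s)
{
    if (s->interned) {
        return s;                  /* every static singleton exits here */
    }
    /* Statically allocated strings must be already interned. */
    assert(!s->statically_allocated);

    if (s->len == 1) {
        ToyStr *r = &toy_latin1[(unsigned char)s->data[0]];
        assert(r->interned);       /* interned at startup, never lazily */
        toy_decref(s);             /* singletons are treated as immortal, */
        return r;                  /* so no new reference is taken on r */
    }

    /* Toy stand-in for adding s to the per-interpreter dict. */
    s->interned = 1;
    return s;
}
```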
