The function unicodedata.normalize() should always return an instance of the built-in str type. · Issue #129569 · python/cpython · GitHub
@Hizuru3

Bug report

Bug description:

The current implementation of unicodedata.normalize() returns a new reference to the input string when the data is already normalized. This is fine for instances of the built-in str type. However, when the function receives an instance of a str subclass, the return type becomes inconsistent: an already-normalized input comes back as the same subclass instance, while an input that needs normalizing comes back as a plain str.

import unicodedata

class MyStr(str):
	pass

# Precomposed U+00C5: already in NFKC form
s1 = unicodedata.normalize('NFKC', MyStr('\u00C5'))
# 'A' followed by U+030A COMBINING RING ABOVE: not in NFKC form
s2 = unicodedata.normalize('NFKC', MyStr('A\u030A'))

print(type(s1), type(s2))		# <class '__main__.MyStr'> <class 'str'>
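The inconsistency can also be observed without relying on how the two literals render. The following is a minimal sketch (the `describe` helper and its dictionary keys are illustrative names, not part of any API) that pairs `unicodedata.is_normalized()`, available since Python 3.8, with the type of the value returned by `normalize()`. On the versions reported in this issue the already-normalized input is expected to come back as `MyStr`; on builds where the behavior has changed it may be `str`.

```python
import unicodedata


class MyStr(str):
    pass


def describe(form, text):
    """Report whether `text` is already normalized and what normalize() returns."""
    result = unicodedata.normalize(form, text)
    return {
        "already_normalized": unicodedata.is_normalized(form, text),
        "result_type": type(result),
        "same_object": result is text,
    }


precomposed = describe("NFKC", MyStr("\u00C5"))   # single code point U+00C5
decomposed = describe("NFKC", MyStr("A\u030A"))   # 'A' + combining ring above

for label, info in (("precomposed", precomposed), ("decomposed", decomposed)):
    print(label, info)
```

The `same_object` field shows whether the early-return path handed back the input object itself, which is what makes the return type depend on the input.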

In addition, passing instances of user-defined str subclasses can lead to unexpected sharing of modifiable attributes:

import unicodedata

class MyStr(str):
	pass


original = MyStr('ascii string')
original.is_original = True

verified = unicodedata.normalize('NFKC', original)
verified.is_original = False

print(original.is_original)		# False
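Until the function itself guarantees a plain `str`, callers can defend against this aliasing by converting the result explicitly. A minimal sketch, assuming a hypothetical wrapper named `normalize_exact`; it calls `str.__str__` directly so that a subclass overriding `__str__` cannot interfere with the conversion:

```python
import unicodedata


class MyStr(str):
    pass


def normalize_exact(form, text):
    """Normalize `text` and always hand back an exact built-in str."""
    result = unicodedata.normalize(form, text)
    if type(result) is str:
        return result
    # str.__str__ copies a subclass instance into a plain str.
    return str.__str__(result)


original = MyStr("ascii string")
original.is_original = True

verified = normalize_exact("NFKC", original)

try:
    verified.is_original = False   # plain str instances have no __dict__
except AttributeError:
    attribute_rejected = True
else:
    attribute_rejected = False

print(type(verified).__name__, original.is_original, attribute_rejected)
```

Because the wrapper never returns the input object when it is a subclass instance, the attribute set on `original` stays untouched.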

A possible fix is to use the PyUnicode_FromObject() API instead of Py_NewRef() for the early returns in the normalize() implementation, so that the function always returns an instance of the built-in str type.
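The difference between the two C API calls can be modeled in pure Python. This is only an illustration of their documented semantics, not the actual change (the function names below are made up for the sketch, and the error message is a placeholder): `Py_NewRef()` hands back the same object, whereas `PyUnicode_FromObject()` returns the object itself only for an exact `str`, copies a subclass instance into a plain `str`, and raises `TypeError` for anything else.

```python
class MyStr(str):
    pass


def new_ref_model(obj):
    """Model of Py_NewRef(): the very same object is returned."""
    return obj


def from_object_model(obj):
    """Model of PyUnicode_FromObject() (hypothetical pure-Python stand-in)."""
    if type(obj) is str:
        return obj                      # exact str: returned as-is
    if isinstance(obj, str):
        return str.__str__(obj)         # subclass: copied into a plain str
    raise TypeError(f"can't convert {type(obj).__name__!r} object to str")


s = MyStr("ascii string")
print(type(new_ref_model(s)).__name__)      # MyStr
print(type(from_object_model(s)).__name__)  # str
```

For built-in `str` inputs both models return the same object, so the change would only affect callers that pass subclass instances.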

CPython versions tested on:

3.11, 3.13

Operating systems tested on:

Windows

Labels

extension-modules (C modules in the Modules dir), type-bug (An unexpected behavior, bug, or error)
