Bug report
When running pygettext --docstrings file.py on Python 3.7 and above, the module docstring does not get extracted.
Reproduction steps:
- Create repro.py with the following contents (you can omit everything but the first three lines):
    """
    Module docstring
    """

    class X:
        """class docstring"""

        def method(self):
            """method docstring"""


    def function():
        """function docstring"""
- Try running:
    python pygettext.py --docstrings repro.py
- Look at the messages.pot that was created and see that it doesn't contain the module docstring:
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2022-08-06 00:54+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: pygettext.py 1.5\n"
#: repro.py:6
#, docstring
msgid "class docstring"
msgstr ""
#: repro.py:9
#, docstring
msgid "method docstring"
msgstr ""
#: repro.py:13
#, docstring
msgid "function docstring"
msgstr ""
The reason for this appears to be that pygettext doesn't account for token.ENCODING which was added in Python 3.7.
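The extra leading token can be seen with a short check. This is a minimal sketch, assuming the script reads the file in binary mode through tokenize.tokenize(), which is the call that yields an ENCODING token; the source bytes are the first three lines of repro.py.

```python
import io
import tokenize

# The first three lines of repro.py, as bytes.
source = b'"""\nModule docstring\n"""\n'

# tokenize.tokenize() takes a readline callable that returns bytes and
# always yields an ENCODING token before any token from the source itself.
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))
names = [tokenize.tok_name[tok.type] for tok in tokens]

# The first entry is "ENCODING"; the docstring's STRING token comes second.
print(names)
```

Because the ENCODING token is neither a STRING, a COMMENT, nor an NL, a check that only tolerates those types would stop looking for the module docstring before the docstring token arrives.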
A simple solution for this would be to skip tokenize.ENCODING here (cpython/Tools/i18n/pygettext.py, lines 338 to 340 in 29650fe):
    elif ttype not in (tokenize.COMMENT, tokenize.NL):
        self.__freshmodule = 0
    return
This actually reveals another bug, caused by the return on line 340: the module docstring check swallows one token without handling it. This means that for code like this:
    class X:
        """class docstring"""
pygettext will not extract the docstring of class X once the solution is applied, unless proper care is taken (the class keyword would be the swallowed token). I'm mentioning it so that the fix is tested with both of these cases.
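To make the two cases concrete, here is a minimal, self-contained sketch of the idea. It is not the pygettext code: extract_docstrings, SKIPPED, and the string state names are hypothetical stand-ins for the script's internal state machine, and details such as implicit string concatenation are ignored. It skips ENCODING while looking for the module docstring, and it lets the first "real" token fall through to normal handling instead of returning early.

```python
import ast
import io
import tokenize

# Token types that may precede a docstring without ending the search for it.
SKIPPED = (tokenize.COMMENT, tokenize.NL, tokenize.ENCODING)
OPENERS = ("(", "[", "{")
CLOSERS = (")", "]", "}")


def extract_docstrings(source):
    """Hypothetical, simplified model of docstring extraction (source is bytes)."""
    docstrings = []
    fresh_module = True   # still looking for the module docstring
    state = "waiting"     # "waiting" -> "header" -> "body"
    depth = 0             # bracket nesting inside a class/def header
    for tok in tokenize.tokenize(io.BytesIO(source).readline):
        if fresh_module:
            if tok.type == tokenize.STRING:
                docstrings.append(ast.literal_eval(tok.string))
                fresh_module = False
                continue  # the docstring token is fully handled
            if tok.type in SKIPPED:
                continue
            fresh_module = False
            # No early exit here: this token still gets normal handling below.
        if state == "body":
            if tok.type == tokenize.STRING:
                docstrings.append(ast.literal_eval(tok.string))
                state = "waiting"
                continue
            if tok.type in (tokenize.NEWLINE, tokenize.INDENT) + SKIPPED:
                continue
            state = "waiting"  # no docstring; re-examine this token below
        if state == "waiting":
            if tok.type == tokenize.NAME and tok.string in ("class", "def"):
                state, depth = "header", 0
        elif state == "header" and tok.type == tokenize.OP:
            if tok.string in OPENERS:
                depth += 1
            elif tok.string in CLOSERS:
                depth -= 1
            elif tok.string == ":" and depth == 0:
                state = "body"  # header finished, docstring may follow
    return docstrings


full = (b'"""\nModule docstring\n"""\n\nclass X:\n    """class docstring"""\n\n'
        b'    def method(self):\n        """method docstring"""\n\n\n'
        b'def function():\n    """function docstring"""\n')
class_only = b'class X:\n    """class docstring"""\n'

print(extract_docstrings(full))
print(extract_docstrings(class_only))
```

The two inputs correspond to the two cases in this report: a file that starts with a module docstring, and a file whose first statement is a class with its own docstring.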
Your environment
- CPython versions tested on: 3.7.13 (installed from deadsnakes ppa), 3.10.4 (default Python on my system)
- Operating system and architecture: Ubuntu 22.04 LTS
The pygettext.py script was taken directly from this repository; I'm not sure that my distro even has a package that ships it.