`pygettext --docstrings` doesn't actually extract the module docstring due to tokenize returning an ENCODING token (#95731)

# Bug report

When running `pygettext --docstrings file.py` on Python 3.7 and above, the module docstring does not get extracted.

Reproduction steps:
1. Create `repro.py` with the following contents (actually you can omit everything but the first three lines):
```py
"""
Module docstring
"""

class X:
    """class docstring"""

    def method(self):
        """method docstring"""


def function():
    """function docstring"""
```
2. Run: `python pygettext.py --docstrings repro.py`
3. Open the generated `messages.pot` and note that it doesn't contain the module docstring:
```
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2022-08-06 00:54+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: pygettext.py 1.5\n"


#: repro.py:6
#, docstring
msgid "class docstring"
msgstr ""

#: repro.py:9
#, docstring
msgid "method docstring"
msgstr ""

#: repro.py:13
#, docstring
msgid "function docstring"
msgstr ""
```

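The reason for this appears to be that pygettext doesn't account for the `ENCODING` token (`token.ENCODING`, added to the `token` module in Python 3.7), which `tokenize.tokenize()` always emits as its very first token. Because that token is neither a string nor a `COMMENT`/`NL` token, pygettext decides the module has no docstring before it ever reaches the real one.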
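This is easy to confirm with the standard library alone. The snippet below is a sketch that assumes pygettext reads files through `tokenize.tokenize()` (as the title implies); it tokenizes the first three lines of `repro.py` and prints the first few tokens:

```python
import io
import tokenize

# The first three lines of repro.py, as bytes: tokenize.tokenize() expects a bytes readline.
source = b'"""\nModule docstring\n"""\n'
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))

for tok in tokens[:3]:
    # First line printed: ENCODING 'utf-8'; the docstring STRING only comes second.
    print(tokenize.tok_name[tok.type], repr(tok.string))
```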

A simple fix would be to also skip `tokenize.ENCODING` in this check:
https://github.com/python/cpython/blob/29650fea9605bf1f48320487c6d5d6d70d97ad95/Tools/i18n/pygettext.py#L338-L340
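To illustrate the proposed fix, here is a hypothetical, heavily simplified model of that check. It is not pygettext's code: `module_docstring` and its `ignore_encoding` flag are invented for illustration, `ast.literal_eval` stands in for pygettext's own string evaluation, and the real code's extra checks on the string literal are left out.

```python
import ast
import io
import tokenize


def module_docstring(source: bytes, ignore_encoding: bool):
    """Hypothetical, simplified model of pygettext's module-docstring check."""
    ignored = {tokenize.COMMENT, tokenize.NL}  # what the current check lets through
    if ignore_encoding:
        ignored.add(tokenize.ENCODING)  # the proposed fix
    for tok in tokenize.tokenize(io.BytesIO(source).readline):
        if tok.type == tokenize.STRING:
            return ast.literal_eval(tok.string)  # stand-in for pygettext's own evaluation
        if tok.type not in ignored:
            return None  # corresponds to `self.__freshmodule = 0`
    return None


src = b'"""\nModule docstring\n"""\n'
print(module_docstring(src, ignore_encoding=False))       # None
print(repr(module_docstring(src, ignore_encoding=True)))  # '\nModule docstring\n'
```

Without `ENCODING` in the ignored set, the very first token ends the search; with it, the docstring is found.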

This actually reveals another bug, caused by the `return` on line 340: when the module-docstring check ends its search, it returns without handling the token that ended it, so pygettext swallows that token. This means that for code like this:
```py
class X:
    """class docstring"""
```
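`pygettext` will no longer extract the docstring of class X once the fix above is applied, unless proper care is taken. Today the swallowed token is the harmless `ENCODING` token; with `ENCODING` skipped, the swallowed token becomes the `class` keyword, so the class definition is never noticed. I'm mentioning this so that the fix gets tested against both cases.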
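The "proper care" part can be sketched with another hypothetical, much-simplified model rather than pygettext's real state machine. The names `extract_docstrings` and `fall_through` are invented here, `ENCODING` is already skipped (the first fix applied), and `fall_through=False` mimics the early `return` by dropping the token that ends the module-docstring search.

```python
import ast
import io
import tokenize

# Tokens allowed before the module docstring (ENCODING included: the first fix is applied).
SKIP_BEFORE_MODULE_DOC = {tokenize.ENCODING, tokenize.COMMENT, tokenize.NL}
# Tokens allowed between a "class ...:" / "def ...:" header and its docstring.
SKIP_BEFORE_SUITE_DOC = {tokenize.NEWLINE, tokenize.INDENT, tokenize.COMMENT, tokenize.NL}


def extract_docstrings(source: bytes, fall_through: bool) -> list:
    """Hypothetical, much-simplified model of pygettext's docstring states."""
    found = []
    fresh_module = True    # still looking for a module docstring
    in_header = False      # saw 'class'/'def', waiting for the header's ':'
    depth = 0              # bracket depth inside the header (skips annotation colons)
    expecting_doc = False  # saw the header's ':', waiting for a docstring
    for tok in tokenize.tokenize(io.BytesIO(source).readline):
        if fresh_module:
            if tok.type == tokenize.STRING:
                found.append(ast.literal_eval(tok.string))
                fresh_module = False
                continue
            if tok.type in SKIP_BEFORE_MODULE_DOC:
                continue
            fresh_module = False
            if not fall_through:
                continue  # like the early `return`: this token is dropped unseen
        if expecting_doc:
            if tok.type in SKIP_BEFORE_SUITE_DOC:
                continue
            expecting_doc = False
            if tok.type == tokenize.STRING:
                found.append(ast.literal_eval(tok.string))
                continue
        if tok.type == tokenize.NAME and tok.string in ("class", "def"):
            in_header = True
        elif in_header and tok.type == tokenize.OP:
            if tok.string in ("(", "[", "{"):
                depth += 1
            elif tok.string in (")", "]", "}"):
                depth -= 1
            elif tok.string == ":" and depth == 0:
                in_header = False
                expecting_doc = True
    return found


class_only = b'class X:\n    """class docstring"""\n'
print(extract_docstrings(class_only, fall_through=False))  # []
print(extract_docstrings(class_only, fall_through=True))   # ['class docstring']
```

With `fall_through=True`, the token that ends the search is handed on to the class/def detection instead of being dropped, so the docstring of `class X` is found again, and the full `repro.py` yields all four docstrings.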

# Your environment

- CPython versions tested on: 3.7.13 (installed from the deadsnakes PPA), 3.10.4 (default Python on my system)
- Operating system and architecture: Ubuntu 22.04 LTS

The `pygettext.py` script was taken directly from this repository; I'm not sure my distro even has a package that ships it.

For reference, lines 338-340 of the linked `pygettext.py`:

```py
elif ttype not in (tokenize.COMMENT, tokenize.NL):
    self.__freshmodule = 0
return
```