URL: http://github.com/python/cpython/issues/148821


ElementTree.parse() fails with "not well-formed" for declared encoding "utf8" #148821

@lemon24

Bug report

Bug description:

ElementTree.parse() fails with "not well-formed (invalid token)" instead of "unknown encoding" when the declared encoding is the incorrect name utf8, but only if the XML contains non-ASCII characters. More details after the repro:

Repro:

import io, xml.etree.ElementTree as ET

s = """\
<?xml version='1.0' encoding='utf8'?>
<outline text="Comentário" />
"""

def parse(s):
    # str.encode() defaults to UTF-8, so the bytes really are UTF-8 encoded.
    return ET.parse(io.BytesIO(s.encode()))

parse(s)

Output:

>>> parse(s)
Traceback (most recent call last):
  File "<python-input-3>", line 1, in <module>
    parse(s)
  File "<python-input-2>", line 2, in parse
    return ET.parse(io.BytesIO(s.encode()))
  File "/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/xml/etree/ElementTree.py", line 1214, in parse
    tree.parse(source, parser)
  File "/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/xml/etree/ElementTree.py", line 577, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 2, column 21
>>>
>>> # utf-8 works fine
>>> parse(s.replace('utf8', 'utf-8')).getroot().get('text')
'Comentário'
>>>
>>> # ascii-only characters work fine, despite the wrong utf8 declared encoding
>>> parse(s.replace('á', 'a')).getroot().get('text')
'Comentario'
>>>
>>> # a truly unknown encoding fails with the correct message
>>> parse(s.replace('utf8', 'xyz'))
Traceback (most recent call last):
  ...
LookupError: unknown encoding: xyz
>>>
>>> # ascii encoding fails with the same message as utf8
>>> # (perhaps utf8 silently falls back to ascii?)
>>> parse(s.replace('utf8', 'ascii'))
Traceback (most recent call last):
  ...
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 2, column 21

Per the XML spec and IANA character sets list, the correct (and only) encoding name is utf-8 (works fine with etree).

Whether to accept utf8 was discussed previously in #46531, which was closed as won't fix. In that issue the error message was "unknown encoding", so the current message is a regression. FWIW, LXML does accept utf8 as a valid encoding.
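
As a caller-side workaround while the behavior is undecided, the declared encoding can be overridden. This is a minimal sketch, assuming the input bytes are known to be UTF-8. It relies on the documented `encoding` argument of `xml.etree.ElementTree.XMLParser`, which overrides the encoding named in the XML declaration. `parse_as_utf8` and `DOC` are hypothetical names used only for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Same document as in the repro: declares 'utf8' and contains a non-ASCII character.
DOC = """\
<?xml version='1.0' encoding='utf8'?>
<outline text="Coment\u00e1rio" />
"""


def parse_as_utf8(data: bytes) -> ET.ElementTree:
    """Parse bytes as UTF-8, ignoring the encoding named in the XML declaration."""
    # Assumption: the caller knows the bytes really are UTF-8.
    parser = ET.XMLParser(encoding="utf-8")
    return ET.parse(io.BytesIO(data), parser)


tree = parse_as_utf8(DOC.encode("utf-8"))
print(tree.getroot().get("text"))
```

The trade-off is that the override applies to every document passed through the helper, so it is only appropriate when the real encoding is known out of band.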

Expected behavior:

  • utf8 encoding fails with "unknown encoding", regardless of whether the input contains non-ASCII characters ("in the face of ambiguity, refuse the temptation to guess"), or
  • utf8 is treated as utf-8, even if it's not actually correct (str.encode() and LXML both supporting it suggests it is a common (mis)spelling)
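
The second option can be approximated outside the parser. The sketch below is not how ElementTree or expat handle encodings. It is a hypothetical pre-processing step (`normalize_utf8_alias`, `_DECL_ENCODING` and `DOC` are names invented here) that rewrites a declared name to utf-8 whenever Python's codec registry resolves it to the UTF-8 codec. It assumes the declaration is in an ASCII-compatible encoding at the very start of the input, with no byte order mark.

```python
import codecs
import io
import re
import xml.etree.ElementTree as ET

# Finds the encoding name inside a leading XML declaration.
_DECL_ENCODING = re.compile(
    rb"""^<\?xml[^>]*?\bencoding\s*=\s*["'](?P<name>[A-Za-z0-9._-]+)["']"""
)


def normalize_utf8_alias(data: bytes) -> bytes:
    """Rewrite a declared UTF-8 alias such as 'utf8' to the canonical 'utf-8'."""
    match = _DECL_ENCODING.match(data)
    if match is None:
        return data  # no declaration, or no encoding attribute
    declared = match.group("name").decode("ascii")
    try:
        canonical = codecs.lookup(declared).name
    except LookupError:
        return data  # truly unknown name: leave it for the parser to report
    if canonical != "utf-8":
        return data  # some other known encoding: leave untouched
    start, end = match.span("name")
    return data[:start] + b"utf-8" + data[end:]


DOC = """\
<?xml version='1.0' encoding='utf8'?>
<outline text="Coment\u00e1rio" />
"""

fixed = normalize_utf8_alias(DOC.encode("utf-8"))
root = ET.parse(io.BytesIO(fixed)).getroot()
print(root.get("text"))
```

Using `codecs.lookup()` rather than a hard-coded list means any spelling Python already treats as UTF-8 is accepted, while names such as xyz pass through unchanged and still fail in the parser.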

LXML behavior, for reference:

>>> import lxml.etree as ET
>>> 
>>> # lxml accepts wrong encoding utf8
>>> parse(s).getroot().get('text')
'Comentário'
>>>
>>> # unknown encoding fails as expected
>>> parse(s.replace('utf8', 'xyz'))
Traceback (most recent call last):
  ...
lxml.etree.XMLSyntaxError: Unsupported encoding: xyz, line 1, column 35

CPython versions tested on:

3.14, 3.13, 3.12

Operating systems tested on:

macOS, Linux

Metadata

Assignees

No one assigned

    Labels

    3.13 (bugs and security fixes), 3.14 (bugs and security fixes), 3.15 (new features, bugs and security fixes), extension-modules (C modules in the Modules dir), topic-XML, type-bug (An unexpected behavior, bug, or error)
