Bug report
Bug description:
The INT opcode in pickle is the I character followed by an ASCII number and a newline. There are multiple comments asking if the base should be explicitly set to 10, or kept as 0. However, a discrepancy exists between pickle implementations:
_pickle.c uses strtol(s, &endptr, 0); with a base of 0, meaning 0xf would succeed
pickle.py uses int(data, 0) with a base of 0, meaning 0xf would succeed
pickletools.py uses read_decimalnl_short(), which calls int(s), meaning any non-decimal base would fail
This same inconsistency exists with the LONG opcode in _pickle.c, pickle.py, and pickletools.py.
This means an attempt to disassemble such a pickle bytestream with pickletools would fail, while the actual unpickling would proceed without error.
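To make the difference concrete, here is a minimal sketch that uses only the built-in `int()` with the two call styles quoted above. It does not call into pickle or pickletools, and the helper name `parse_both` is made up for illustration.

```python
def parse_both(text):
    """Parse text the two ways described above; a failed parse maps to None.

    'base 0' mirrors the int(data, 0) call in pickle.py, and 'default base'
    mirrors the plain int(s) call in pickletools.py. The helper itself is
    hypothetical and exists only for this illustration.
    """
    parsers = (
        ("base 0", lambda s: int(s, 0)),
        ("default base", int),
    )
    results = {}
    for label, parser in parsers:
        try:
            results[label] = parser(text)
        except ValueError:
            results[label] = None
    return results


# A hex literal is accepted with base 0 but rejected by the default base.
print(parse_both("0xf"))
# A plain decimal argument parses the same way under both calls.
print(parse_both("15"))
```

Only arguments that are not plain decimal expose the difference, which is probably why it goes unnoticed for pickles produced by the standard pickler.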
Personally, I don't really care whether all implementations are changed to base 10 or base 0 (save_long() only puts it in decimal form), but I think it should be consistent across all implementations. I'd submit a pull request for one way or the other, but I'm not sure which way you'd prefer it.
Also, as a note, the pickle bytestream b'I0001\n.' (INT with the argument 0001) fails in pickle.py because leading zeros in a number parsed with base 0 cause an error. No error is raised in _pickle.c, because it uses strtol, or in pickletools.py, because it does not specify base 0. If we keep base 0, that discrepancy between pickle.py and the other pickle implementations would stay, whereas if we change to base 10 (i.e., remove base 0), that inconsistency would also go away. For LONG, both pickle.py and _pickle.c fail with b'L0001L\n.', but pickletools.py has no problem displaying that number (since it has no base specified).
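The leading-zero case can be reproduced with `int()` alone. This is a sketch of the parsing behavior only, not of the unpickler code paths, and `try_int` is a hypothetical helper name. For comparison, C's strtol with base 0 treats a leading 0 as an octal prefix, which is why 0001 is accepted there.

```python
def try_int(text, base=None):
    """Return the parsed integer, or a string describing the ValueError.

    Hypothetical helper for illustration: base=None means the plain int(text)
    call (as in pickletools.py), otherwise int(text, base) is used.
    """
    try:
        return int(text) if base is None else int(text, base)
    except ValueError as exc:
        return f"ValueError: {exc}"


# No base given: leading zeros are accepted and the value is 1.
print(try_int("0001"))
# Base 0: the text must look like a Python integer literal, so this fails.
print(try_int("0001", 0))
# Explicit base 10: leading zeros are accepted again.
print(try_int("0001", 10))
```

With an explicit base of 10, the leading-zero argument parses the same way as the call without a base, which matches the observation above that moving to base 10 would remove this inconsistency.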
CPython versions tested on:
3.11
Operating systems tested on:
Linux