Every once in a while I feel guilty for not using Python 3, so I spin it up for a few rounds. My experience is usually:
- Start using Python 3
- oops, UnicodeDecodeError
- Go back to Python 2
Looks like I’m not the only one who has this frustration. Knowing when to use encode vs decode was always a frustrating exercise in trial and error. There are some good tips in the linked thread, which is worth a thorough read. A useful bit is this comment from redditor Fylwind, part of which is:
- `encode`: textual data to binary data.
- `decode`: binary data to textual data.

The term “encode” means a transformation from some high-level structure into bytes, hence in the context of strings it means converting text into binary data.
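To make the direction concrete, here is a minimal Python 3 sketch of the round trip (my own illustration, not part of the quoted comment). The sample string and the choice of UTF-8 are assumptions picked for the example.

```python
# Round trip between text (str) and binary data (bytes) in Python 3.
# The sample word and the UTF-8 codec are illustrative choices.
text = "café"                     # textual data
data = text.encode("utf-8")       # encode: text -> bytes
restored = data.decode("utf-8")   # decode: bytes -> text

print(type(data).__name__, data)  # bytes b'caf\xc3\xa9'
print(restored == text)           # True
```

The accented character becomes two bytes, which is why the length of `data` differs from the length of `text`.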
Q. What are the appropriate data types for textual data and binary data?
- In Python 3:
  - Textual data is `str`, written as `"foo"`.
  - Binary data is `bytes`, written as `b"foo"`.
  - The `encode` function only works on textual data, and the `decode` function only works on binary data.
- In Python 2:
  - Textual data is `unicode`, written as `u"foo"`. If `unicode_literals` is enabled, then it’s `"foo"`.
  - Binary data is `str` (alias: `bytes`), written as `"foo"`. If `unicode_literals` is enabled, then it’s `b"foo"`
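
The Python 3 rules in the list can be checked with a short sketch like the one below. It is only an illustration under Python 3; the sample word and the codecs are my own assumptions, not something from the thread.

```python
# Python 3 only: str holds text, bytes holds binary data.
text = "naïve"
data = b"na\xc3\xafve"  # the UTF-8 bytes for the same word

assert isinstance(text, str) and isinstance(data, bytes)

# Each type only offers the method that makes sense for it.
print(hasattr(text, "decode"))  # False: str has no decode method
print(hasattr(data, "encode"))  # False: bytes has no encode method

# Decoding with a codec that cannot interpret the bytes raises
# the familiar UnicodeDecodeError.
error_name = None
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__
print(error_name)  # UnicodeDecodeError

# Decoding with the codec the bytes were written in succeeds.
print(data.decode("utf-8") == text)  # True
```

In Python 2 the picture is blurrier because `str` is the binary type and still has both methods, which is one reason the trial and error is easier to fall into there.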