So “str” in Python 2 is now called “bytes,” and “unicode” in Python 2 is now called “str”.

Oh boy, that’s why I have been so confused working on both Py2 and Py3 on various projects.
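To make the renaming concrete, here is a minimal sketch in Python 3. The sample word and the variable names are just illustrations, not from any particular project.

```python
# Python 3: str holds Unicode code points, bytes holds raw encoded data.
text = "na\u00efve"           # "naive" with a diaeresis on the i (U+00EF)
data = text.encode("utf-8")   # the same text as UTF-8 bytes

print(type(text).__name__)    # str
print(type(data).__name__)    # bytes
print(len(text), len(data))   # 5 6 (U+00EF takes two bytes in UTF-8)
```

The two lengths differ because one counts code points and the other counts bytes, which is exactly the distinction that is easy to miss in Python 2.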

It’s 2020. From this point on, it’s all Py3. So all I/O is byte strings, and I try to keep text as unicode (str) inside Python: IO_byte_string.decode() -> unicode_string, and unicode_string.encode() -> IO_byte_string. So:

[code lang="python"]
with open(filename, 'rb') as f:
    byte_string = f.read()  # binary

# external knowledge: data encoded in utf-8
my_string = byte_string.decode('utf-8')

# my_string is a sequence of "code points"

# Output, say, using ISO 8859-1 (Latin-1)
output_byte_string = my_string.encode('latin-1')
[/code]
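The same decode/encode steps can be tried in memory, without a file, to see what actually changes at the byte level. This is only a sketch; the sample bytes are made up for illustration.

```python
# "cafe" with an acute accent on the e, as it would arrive from a UTF-8 file.
utf8_bytes = b"caf\xc3\xa9"

text = utf8_bytes.decode("utf-8")       # bytes -> str (code points)
latin1_bytes = text.encode("latin-1")   # str -> bytes in Latin-1

print(len(text))       # 4
print(latin1_bytes)    # b'caf\xe9'
```

The text is the same four code points throughout; only its byte representation differs between the two encodings.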

Or let open() do the decoding and encoding in text mode:

[code lang="python"]
# external knowledge: data encoded in utf-8
with open(filename, 'r', encoding='utf-8') as f:
    my_string = f.read()  # code points

# Output, say, using ISO 8859-1 (Latin-1)
with open(filename2, 'w', encoding='latin-1') as f:
    f.write(my_string)
[/code]
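One caveat with the Latin-1 output step: Latin-1 covers only code points U+0000 to U+00FF, so encoding fails for anything outside that range. A small sketch of what happens, using a made-up sample string:

```python
# The euro sign (U+20AC) has no representation in Latin-1.
text = "price: 5\u20ac"

try:
    text.encode("latin-1")
    failed = False
except UnicodeEncodeError:
    failed = True

# errors="replace" substitutes "?" for anything that cannot be encoded.
lossy = text.encode("latin-1", errors="replace")

print(failed)   # True
print(lossy)    # b'price: 5?'
```

Whether to fail loudly or replace characters depends on what the consumer of the output can tolerate; open() accepts the same errors= argument in text mode.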

ref:

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

https://nedbatchelder.com/text/unipain.html
