So “str” in Python 2 is now called “bytes,” and “unicode” in Python 2 is now called “str”.
Oh boy, that’s why I have been so confused working on both Py2 and Py3 on various projects.
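A quick way to see the rename in a Python 3 interpreter. The sample word "café" is just an illustration.

```python
# Python 3: a b prefix makes a bytes literal; a plain literal is str (Unicode text).
raw = b"caf\xc3\xa9"   # what Python 2 called str: a sequence of bytes (UTF-8 here)
text = "caf\u00e9"     # what Python 2 called unicode: a sequence of code points

print(type(raw).__name__)   # bytes
print(type(text).__name__)  # str
print(len(raw), len(text))  # 5 4  (é takes two bytes in UTF-8 but is one code point)
print(raw[0], text[0])      # 99 c (indexing bytes gives an int, indexing str gives a str)
print(raw.decode("utf-8") == text)  # True
```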
It’s the year 2020. From this point on, it’s all Py3. So I/O deals in byte strings, and I try to keep unicode/str inside Python: IO_byte_string.decode() -> unicode_string when data comes in, and unicode_string.encode() -> IO_byte_string when it goes out. So:
```python
with open(filename, 'rb') as f:
    byte_string = f.read()  # binary: raw bytes

# external knowledge: data encoded in utf-8
my_string = byte_string.decode('utf-8')  # my_string is a sequence of "code points"

# Output, say, using ISO-8859-1 (Latin-1)
output_byte_string = my_string.encode('latin-1')
```
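The same decode-then-encode flow can be checked without any files. This is a minimal sketch: the helper name utf8_to_latin1 is hypothetical, and it assumes the input really is UTF-8. It also shows the catch with Latin-1 as an output encoding: it only covers U+0000 to U+00FF.

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Hypothetical helper: decode UTF-8 bytes to str, then re-encode as Latin-1."""
    text = data.decode("utf-8")     # bytes -> str (code points)
    return text.encode("latin-1")   # str -> bytes in the output encoding

utf8_bytes = "caf\u00e9".encode("utf-8")
print(utf8_bytes)                  # b'caf\xc3\xa9'
print(utf8_to_latin1(utf8_bytes))  # b'caf\xe9'

# The euro sign (U+20AC) is outside Latin-1, so strict encoding fails.
try:
    utf8_to_latin1("\u20ac".encode("utf-8"))
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)

# errors="replace" trades strictness for a lossy '?' substitute.
print("5 \u20ac".encode("latin-1", errors="replace"))  # b'5 ?'
```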
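Alternatively, open() can do the decoding and encoding itself:

```python
# external knowledge: data encoded in utf-8
with open(filename, 'r', encoding='utf-8') as f:
    my_string = f.read()  # already decoded to code points

# Output, say, using ISO-8859-1 (Latin-1); the file must be opened for writing ('w')
with open(filename2, 'w', encoding='latin-1') as f:
    f.write(my_string)
```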
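To confirm that the text-mode version really writes Latin-1 bytes, here is a self-contained sketch that uses a temporary directory in place of filename and filename2. The file names and the convert_file helper are hypothetical; the sample text has no newline so the output bytes don't depend on the platform's line endings.

```python
import os
import tempfile

def convert_file(src: str, dst: str) -> None:
    """Hypothetical helper: re-encode a UTF-8 text file as Latin-1."""
    with open(src, "r", encoding="utf-8") as f:
        text = f.read()          # open() decodes bytes -> str
    with open(dst, "w", encoding="latin-1") as f:
        f.write(text)            # open() encodes str -> bytes

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.txt")
    dst = os.path.join(tmp, "out.txt")
    with open(src, "wb") as f:
        f.write("caf\u00e9".encode("utf-8"))  # 5 bytes of UTF-8
    convert_file(src, dst)
    with open(dst, "rb") as f:
        result = f.read()                     # read back the raw output bytes

print(result)  # b'caf\xe9'
```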