UnicodeEncodeError when redirecting Python output on Windows
While working on socialscan, I stumbled upon an unexpected error. When I redirected the Python script's output to a file, the following error appeared in the terminal:
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f4d6' in position 52: character maps to <undefined>
The script prints Unicode characters such as emoji, so at first glance this suggests that Python is using an encoding that cannot represent them. What was strange, though, was that the error occurred only when the output was redirected to a file, not when it was printed to the console.
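Whether a character can be written depends on the specific encoding in use, not on "Unicode support" in general. A minimal check, where `can_encode` is a hypothetical helper name, tests the character from the error message against two encodings; cp1252 is the Windows code page discussed below.

```python
import unicodedata

BOOK = "\U0001f4d6"  # the character from the error message


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        # Raised when the codec has no mapping for some character.
        return False
    return True


if __name__ == "__main__":
    print(unicodedata.name(BOOK))  # OPEN BOOK
    for enc in ("cp1252", "utf-8"):
        print(enc, can_encode(BOOK, enc))
    # cp1252 False
    # utf-8 True
```

cp1252 is a single-byte code page with room for at most 256 characters, so most emoji simply have no entry in it, while UTF-8 can encode every Unicode code point.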
Upon further investigation, it turned out that while Python uses UTF-8 when printing to the Windows console, it falls back to cp1252 when stdout is redirected to a file (cp1252 is the default code page on English installations of Windows). cp1252 has no mapping for characters such as U+1F4D6, so encoding the output fails with the error above.
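To see which encoding Python picked in each case, a small script can report it on stderr, which stays on screen even when stdout is redirected. This is a sketch; the file name `check_encoding.py` and the function `describe_stdout` are placeholders. Running it as `python check_encoding.py` and then as `python check_encoding.py > out.txt` should show the two different encodings described above, although the exact values depend on the Python version and on settings such as `PYTHONIOENCODING` or UTF-8 mode.

```python
# check_encoding.py (hypothetical file name)
import locale
import sys


def describe_stdout() -> str:
    """Summarize how Python will encode text written to stdout."""
    return (
        f"isatty={sys.stdout.isatty()} "
        f"stdout.encoding={sys.stdout.encoding} "
        # The locale's preferred encoding is what a redirected stdout normally uses.
        f"preferred={locale.getpreferredencoding(False)}"
    )


if __name__ == "__main__":
    # Report on stderr so the output is visible even with "> out.txt".
    print(describe_stdout(), file=sys.stderr)
```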
To fix this error, we can explicitly set the encoding of stdout to UTF-8 so that Python does not fall back to cp1252 when writing to a file. Since Python 3.7, you can do so with:
import sys
sys.stdout.reconfigure(encoding='utf-8')
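If the script may also run where stdout is already UTF-8, or where `sys.stdout` has been replaced (for example by a tool that captures output), a slightly more defensive variant only reconfigures when needed. This is a sketch: `ensure_utf8` is a hypothetical helper, not a standard library function, and it assumes the stream is an `io.TextIOWrapper`, which is what `sys.stdout` normally is.

```python
import codecs
import io
import sys


def ensure_utf8(stream) -> None:
    """Reconfigure a text stream to UTF-8 unless it already uses it.

    `ensure_utf8` is a hypothetical helper, not part of the standard library.
    """
    encoding = getattr(stream, "encoding", None)
    if encoding is None or not hasattr(stream, "reconfigure"):
        # e.g. io.StringIO, or a replacement object that is not a TextIOWrapper
        return
    # codecs.lookup normalizes aliases such as "UTF8" or "utf_8" to "utf-8".
    if codecs.lookup(encoding).name != "utf-8":
        stream.reconfigure(encoding="utf-8")


if __name__ == "__main__":
    ensure_utf8(sys.stdout)
    print("\U0001f4d6 survives redirection to a file")

    # Simulate a redirected stdout: a cp1252 text layer over an in-memory buffer.
    buf = io.BytesIO()
    fake_stdout = io.TextIOWrapper(buf, encoding="cp1252")
    ensure_utf8(fake_stdout)
    fake_stdout.write("\U0001f4d6")
    fake_stdout.flush()
    print(buf.getvalue())  # b'\xf0\x9f\x93\x96'
```

The in-memory simulation shows the effect without needing Windows: after reconfiguration, the same character that cp1252 rejects is written as its four UTF-8 bytes.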
(A primer on all things Unicode)