Insight 27 Feb 2024

Python tip: always specify your file encoding

If your code might ever run on multiple operating systems (both Windows and Linux, for example), make sure to specify the character encoding of the files you read and write. If a non-ASCII character (e.g. Ł or ✨) makes its way into your data, you'll likely wish you'd specified a character encoding. But why? What's the issue?

Trey Hunner

Curious about character encodings in Python? Watch this screencast.

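Well, on Windows, Python reads and writes text files using the system's locale encoding by default, which is commonly Windows-1252 (cp1252), while on most other machines it defaults to UTF-8 (utf-8). The same file can therefore be decoded differently depending on where your code runs.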
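To see why the mismatch matters, here is a small sketch that skips files entirely and works on bytes, so it behaves the same on any operating system. The variable names are just for illustration. It shows what happens when UTF-8 bytes are decoded as cp1252, and when a character that cp1252 cannot represent is encoded with it.

```python
# UTF-8 bytes decoded with the wrong codec turn into mojibake.
sparkles = "✨"
utf8_bytes = sparkles.encode("utf-8")   # b'\xe2\x9c\xa8'
garbled = utf8_bytes.decode("cp1252")   # each byte becomes its own character
print(garbled)                          # âœ¨

# Some characters have no cp1252 representation at all,
# so writing them with that encoding fails outright.
try:
    "Ł".encode("cp1252")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True
print(encode_failed)                    # True
```

The first case is the sneakier one: nothing raises an exception, the text is just silently wrong.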

In your cross-platform Python code, I recommend always specifying encoding="utf-8" when using the built-in open function.

>>> with open("file.txt", encoding="utf-8") as file:
...     text = file.read()
...
>>> text
'Łukasz and Pablo of core.py ✨🐍'
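The same advice applies when writing. As a minimal sketch (the temporary directory and the variable names are assumptions for the sake of a self-contained example), this writes the text with an explicit encoding and reads it back with the same one, so the round trip does not depend on the platform default:

```python
import tempfile
from pathlib import Path

text = "Łukasz and Pablo of core.py ✨🐍"

# A throwaway directory keeps the example from touching real files.
with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "file.txt"

    # Write with an explicit encoding...
    with open(path, mode="w", encoding="utf-8") as file:
        file.write(text)

    # ...and read with the same explicit encoding.
    with open(path, encoding="utf-8") as file:
        round_tripped = file.read()

print(round_tripped == text)  # True
```

Passing the same encoding on both sides is what makes the result predictable on Windows and Linux alike.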