Nelson's Weblog: tech / bad / dosBabel


Windows Codepages are Insane

Nelson Minar (nelson@monkey.org)

Who knew that Microsoft Windows had so many different encodings? There's CP1252, the almost-but-not-quite ISO-Latin-1 that is responsible for the evil breakage of "smart quotes" by encouraging web publishers to act like `0x93` is a valid way to represent a left double quote. At least it encodes É in a sensible place, `0xc9`.
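To make the CP1252 versus Latin-1 mismatch concrete, here is a small Python 3 sketch that decodes the same two bytes under both encodings. The helper name `describe` is made up for illustration; the codec names `cp1252` and `latin-1` are the ones Python's standard library ships.

```python
# The two bytes discussed above.
SMART_QUOTE_BYTE = b"\x93"
E_ACUTE_BYTE = b"\xc9"


def describe(byte, codec):
    """Return the code point (as U+XXXX) that a single byte decodes to under codec."""
    ch = byte.decode(codec)
    return "U+%04X" % ord(ch)


for codec in ("cp1252", "latin-1"):
    # 0x93 is a left double quotation mark in CP1252 but a C1 control
    # character in Latin-1; 0xC9 is E-acute in both.
    print(codec, describe(SMART_QUOTE_BYTE, codec), describe(E_ACUTE_BYTE, codec))
```

A page that sends byte `0x93` while claiming to be Latin-1 is therefore sending a control character, not a quote, which is why the quotes only look right to software that quietly assumes CP1252.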
But why stop at one codepage? There's also CP437, an ancient DOS codepage that is nothing like Latin-1 but contains Latin-1 characters like É at `0x90`. Yes, that's a different place than CP1252. Apparently both of these evil 20th century codepages are still coexisting on my 21st century Windows XP system.

I just dumped a bunch of MP3 files from my WinXP box to Linux and found the filenames hopelessly garbled. I finally guessed they're in CP437. I'm a bit surprised Samba didn't take care of it for me. Python to the rescue:

    # Python 2: decode the CP437 bytes to Unicode, then re-encode as Latin-1
    def cp437ToLatin1(s):
        return unicode(s, 'cp437').encode('latin-1')

tech • bad 2003-05-11 22:26 Z
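The one-liner above relies on the Python 2 `unicode` builtin. A rough Python 3 equivalent might look like the sketch below. The function names and the `errors` parameter are my own additions for illustration, not part of the original post. One caveat the sketch makes visible: CP437 also contains box-drawing and other characters with no Latin-1 equivalent, so a strict conversion can fail on some names.

```python
def cp437_to_latin1(raw: bytes, errors: str = "strict") -> bytes:
    """Reinterpret CP437-encoded bytes as Latin-1 bytes.

    With errors="strict", characters that exist in CP437 but not in
    Latin-1 (box-drawing characters, for example) raise UnicodeEncodeError.
    Pass errors="replace" to substitute "?" instead.
    """
    return raw.decode("cp437").encode("latin-1", errors)


def cp437_to_utf8(raw: bytes) -> bytes:
    """Alternative target: UTF-8 can represent every CP437 character."""
    return raw.decode("cp437").encode("utf-8")


# The byte 0x90 is E-acute in CP437; it should come out as 0xC9 in Latin-1.
print(cp437_to_latin1(b"\x90t\x82.mp3"))
```

Whether Latin-1 or UTF-8 is the right target depends on what the Linux side expects for filenames; the post used Latin-1, so that stays the default here.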