paulgorman.org


Mon Feb 27 08:24:38 EST 2017

Slept from around ten to six without waking.
High of forty-eight and mostly sunny today.

Decided to take the day off, just for the hell of it.

I need to understand a little more about character encoding.

https://paulgorman.org/technical/character_encoding_ascii_unicode.txt
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

> What do web browsers do if they don’t find any Content-Type, either in the http headers or the meta tag? Internet Explorer actually does something quite interesting: it tries to guess, based on the frequency in which various bytes appear in typical text in typical encodings of various languages, what language and encoding was used. Because the various old 8 bit code pages tended to put their national letters in different ranges between 128 and 255, and because every human language has a different characteristic histogram of letter usage, this actually has a chance of working. It’s truly weird, but it does seem to work often enough that naïve web-page writers who never knew they needed a Content-Type header look at their page in a web browser and it _looks ok_, until one day, they write something that doesn’t exactly conform to the letter-frequency-distribution of their native language, and Internet Explorer decides it’s Korean and displays it thusly, proving, I think, the point that Postel’s Law about being “conservative in what you emit and liberal in what you accept” is quite frankly not a good engineering principle.

Thirty minute walk before dark.
Shimmering gold and slate-purple sunset.

Watched several episodes of The Tunnel. Good.

Read the first chapter of The Three Musketeers.

Breakfast: coffee, carrots
Lunch: apple, almonds, cheese
Dinner: Philly cheesesteak
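
A note to self on the encoding-guessing trick in the quote above. This is only a toy sketch of the general idea, not how Internet Explorer (or any real browser) actually does it. It decodes the same bytes under a few candidate 8-bit code pages and picks the one whose non-ASCII characters look most like common letters of the language that code page was built for. The names guess_encoding, score, and PROFILES are made up for illustration, and the "common letter" sets are rough, hand-picked stand-ins for a real letter-frequency histogram.

```python
# Toy encoding guesser. Assumption: each candidate code page is paired with
# a small, hand-picked set of letters that tend to be frequent in the
# language it was designed for. A real detector would use measured
# frequency tables, not sets like these.
PROFILES = {
    "cp1251": set("оеаинтсрвл"),  # Russian (Cyrillic)
    "cp1253": set("αοιετσνηυρ"),  # Greek
    "cp1252": set("éèàêçùâôîû"),  # French accented letters
}


def score(data: bytes, encoding: str, common: set) -> float:
    """Fraction of the decoded non-ASCII characters found in `common`."""
    # errors="replace" keeps undefined bytes from raising an exception.
    text = data.decode(encoding, errors="replace").lower()
    high = [ch for ch in text if ord(ch) > 127]
    if not high:
        return 0.0
    return sum(ch in common for ch in high) / len(high)


def guess_encoding(data: bytes) -> str:
    """Return the candidate code page with the best score.

    Pure-ASCII input scores 0.0 everywhere, so the first candidate
    wins by default, which says nothing useful.
    """
    return max(PROFILES, key=lambda enc: score(data, enc, PROFILES[enc]))


if __name__ == "__main__":
    for sample, enc in [("Привет, мир", "cp1251"), ("café crème déjà", "cp1252")]:
        raw = sample.encode(enc)
        print(enc, "->", guess_encoding(raw))
```

The same byte means a different letter in each code page, so a short or unusual text can easily score higher under the wrong one. That is the failure the quote describes, and the reason to declare the encoding rather than rely on a guess.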
