Understanding Character Sets and Encodings

🎙 14 May 2014

Having recently been bitten by character encoding issues again, we thought it would be a good time to bring the topic up on the podcast. Starting from the beginning with ASCII, we move on to discuss how 8-bit machines gave rise to the ISO-8859-* standards. This leads us on to Unicode, whose goal is a single character-set standard that can support all of the world’s scripts. Finally, we discuss UTF-8, the de facto character encoding used on the web today, and the reasons why this is the case.
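For anyone who wants to see the progression from ASCII to ISO-8859-1 to UTF-8 in practice, here is a minimal Python sketch. It is not taken from the episode; the sample word and variable names are our own, chosen only for illustration. It encodes one string three ways and then shows the garbled output you get when UTF-8 bytes are decoded with the wrong character set.

```python
# Illustrative sample only: a word with one non-ASCII character.
text = "café"

# ASCII defines just 128 characters, so "é" cannot be encoded at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# ISO-8859-1 uses the extra 8th bit: every character is exactly one byte.
latin1_bytes = text.encode("iso-8859-1")

# UTF-8 keeps ASCII characters as single bytes and uses two bytes for "é".
utf8_bytes = text.encode("utf-8")

# Decoding UTF-8 bytes as ISO-8859-1 is the classic source of garbled text.
mojibake = utf8_bytes.decode("iso-8859-1")

print(ascii_ok)      # False
print(latin1_bytes)  # b'caf\xe9'
print(utf8_bytes)    # b'caf\xc3\xa9'
print(mojibake)      # cafÃ©
```

Note that pure ASCII text produces identical bytes in ASCII and UTF-8, which is one of the reasons UTF-8 was so easy to adopt on the web.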

Show Links

PhalconPHP
Team Pacific Rowers
Computerphile
phpwtf
wtfjs
Twitter - fabpot: php -r ‘echo in_array(“foo”, …
3v4l - EvAluate your code in our online PHP shell (100+ PHP versions)
Reversing a String in PHP
Reversing a Unicode String in PHP using UTF-16BE/LE
Portable UTF-8 in PHP
Lazy Load Enabled With AJAX Content
Foundation Version Control for Web Developers
Detecting UTF BOM - byte order mark
Unicode Character Table
Unicode - Wikipedia
Unicode <3 JavaScript - YouTube
Characters, Symbols and the Unicode Miracle - Computerphile - YouTube
Decode Unicode - Johannes Bergerhausen at TEDxVienna - YouTube
Pragmatic Unicode - YouTube
Punycode - Wikipedia
Understanding Unicode
Encool Tool - Generate Text with Symbols