Friday Fun Thread for January 9, 2026

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.

Fun Unicode story time.

In 2015, I went to North Korea to teach computer science. One of the things I taught was how to connect their computer systems to the internet, and one of the major challenges was the lack of compatibility between Unicode and their internal character sets.

In North Korea, when they use Unicode (which is rare actually), they use private-use code points for special characters for the Kims. The Kims are thought to be so special that their names are always written in a fancy calligraphy, and the Norks don't want to rely on HTML to provide this fancy calligraphy (because that might not always be available), and so they do this calligraphy at the font level. The "advantage" of this is that you can differentiate between an "ordinary" peasant Kim Il Sung (of which there actually were some) and "the" Kim Il Sung at the font level in every computer program. (The North Koreans didn't invent this idea, but rather borrowed it from how some Arabic encodings treat Muhammad and his sayings.) Anyway, this caused problems for US diplomats: when we received documents from the North and converted them to Unicode, every reference to any Kim appeared as a square box, and diplomats couldn't tell whether the document was talking about Kim Il Sung, Kim Jong Un, or Kim Jong Il. So I went to Korea to help sort this mess out.
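
For the curious, here is a minimal Python sketch of how you could at least spot those square boxes in a converted document. It relies on the fact that private-use characters have the Unicode general category "Co". The function name private_use_positions is mine, and U+E000 is only a stand-in: I'm not listing the actual code points the DPRK fonts use.

```python
import unicodedata


def private_use_positions(text):
    """Return (index, code point) pairs for every private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # "Co" = private use
    ]


# U+E000 is a placeholder private-use character, not a real DPRK assignment.
sample = "abc\ue000def"
print(private_use_positions(sample))
```

This only tells you where the unknown characters are, not which Kim they stand for. That still requires a mapping table from whoever assigned the code points.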

Below is (part of) a memo I wrote for the North's ministry of education that outlines some of the other problems that the North has had with the Unicode standard.


Technical Problems

The Committee for Standardization of the DPRK (CSK) submitted a memo to the Unicode Consortium in 1997 that lists three difficulties in working with Unicode in the DPRK. None of these problems have been fixed in the last 25 years. The problems are:

  1. The official name of the Korean language script in Unicode is "Hangul" (see Section 18.6 of the Unicode 14.0 standard). Hangul is the ROK's name for their script, and the DPRK prefers the name "Choseongul". The DPRK suggested that the name "Korean characters" be adopted as a politically neutral term.

  2. The DPRK and ROK use a different sorting order for their alphabets. The ROK order for consonants is

    ㄱ  ㄲ  ㄴ  ㄷ  ㄸ  ㄹ  ㅁ  ㅂ  ㅃ  ㅅ  ㅆ  ㅇ  ㅈ  ㅉ  ㅊ  ㅋ  ㅌ  ㅍ  ㅎ

    and the DPRK order is

    ㄱ  ㄴ  ㄷ  ㄹ  ㅁ  ㅂ  ㅅ  ㅈ  ㅊ  ㅋ  ㅌ  ㅍ  ㅎ  ㄲ  ㄸ  ㅃ  ㅆ  ㅉ  ㅇ

    For example, in the ROK, the word 까치 (magpie) comes alphabetically before the word 나비 (butterfly), but in the DPRK the word 나비 comes alphabetically before 까치.

    The Unicode standard orders Korean characters according to the ROK-ordering, and so by default all sorting done in any programming language will sort Korean words in the ROK-preferred way. A special extension called a collation algorithm is required to sort according to the DPRK-ordering.

    As of 2022, the current list of collation algorithms does not have an entry for the DPRK-dialect of Korean, and so no programming language can sort text alphabetically according to the DPRK-ordering out of the box; any such ordering has to be implemented by hand.

  3. The DPRK internally uses the KPS9566 character set. This character set contains several characters that the Unicode Consortium does not want to support. For example, it contains political characters representing the Workers Party of Korea, and 4 distinct versions of the character 김 (one for normal text, and one each for Kim Il Sung, Kim Jong Il, and Kim Jong Un).

    This lack of support for certain characters used by the DPRK prevents documents produced in the DPRK from being opened in tools like Microsoft Word, and even programming languages like Python and R cannot work with these documents. This lack of compatibility adds considerable friction to negotiations, since diplomats from the DPRK and the United States cannot easily exchange documents.
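
To make the sorting problem in item 2 concrete, here is a rough Python sketch of a hand-rolled sort key. It is not an official collation. It assumes that only the initial consonant is reordered (using the DPRK order listed above) and that vowels and final consonants keep their Unicode order, which is a simplification. The names dprk_key and DPRK_RANK are my own.

```python
# Initial consonants in the order Unicode uses for precomposed syllables (ROK order).
ROK_INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
# DPRK order as listed in item 2.
DPRK_INITIALS = "ㄱㄴㄷㄹㅁㅂㅅㅈㅊㅋㅌㅍㅎㄲㄸㅃㅆㅉㅇ"

# Map Unicode's initial-consonant index to its rank in the DPRK order.
DPRK_RANK = [DPRK_INITIALS.index(c) for c in ROK_INITIALS]

SYLLABLE_BASE = 0xAC00   # first precomposed Hangul syllable
SYLLABLE_COUNT = 11172   # 19 initials * 21 vowels * 28 finals
VOWELS = 21
FINALS = 28


def dprk_key(word):
    """Sort key that reorders initial consonants only (assumption, see above)."""
    key = []
    for ch in word:
        offset = ord(ch) - SYLLABLE_BASE
        if 0 <= offset < SYLLABLE_COUNT:
            # Decompose the syllable arithmetically into initial, vowel, final.
            initial, rest = divmod(offset, VOWELS * FINALS)
            vowel, final = divmod(rest, FINALS)
            key.append((1, DPRK_RANK[initial], vowel, final))
        else:
            # Non-Hangul characters sort before Hangul, by code point.
            key.append((0, ord(ch), 0, 0))
    return key


words = ["까치", "나비"]
rok_sorted = sorted(words)                 # default code point order (ROK)
dprk_sorted = sorted(words, key=dprk_key)  # 나비 now comes first
print(rok_sorted, dprk_sorted)
```

With the default sort, 까치 comes before 나비. With dprk_key the order flips, matching the magpie and butterfly example in item 2.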

There is at least one more problem with the Unicode standard for the DPRK not listed above:

  1. The current Unicode standard does not support transliteration of Korean into Latin characters using the DPRK's preferred Romanization system, and instead only supports the McCune–Reischauer system. Furthermore, transliterations into non-Latin alphabets are not supported at all, despite the importance of transliterating into Cyrillic. A 2018 UN report on romanization gives a good history of the many Romanization systems for Korean.

Historical Basis

The ROK has been actively and publicly developing their systems for encoding Korean text since the earliest days of the internet. KAIST first developed the KSC5601 encoding method in 1974, and actively worked with companies like IBM and Microsoft, and standards organizations in the US and Europe to ensure widespread support for this standard. The ROK issued an official Request for Comments (RFC) on the encoding in 1993 via RFC1557 to suggest that KSC5601 be the standard format for exchanging Korean emails. When the Unicode Consortium was first founded in 1991, ROK programmers were well positioned to contribute to the developing standard. They had the detailed technical knowledge of developing many of their own internal encodings, they had experience interacting with diverse technical committees, and they had the English communication skills for communicating in the Unicode Consortium's working language.

In contrast, the DPRK has severely lagged behind the ROK in this area. It's not known when the DPRK first developed their own Korean encoding, but the DPRK's KPS9566 encoding was first published internationally in 1997 and officially registered with the International Organization for Standardization (ISO) in 1998. It wasn't until August 1999 that the DPRK began discussions for enabling Unicode compatibility. The DPRK submitted an official statement to the Unicode Consortium outlining their difficulties adopting the Unicode standard (summarized above), but since they entered this discussion 8 years after it began, the technical decisions had already been made. In order to not break backwards compatibility, the Unicode Consortium issued a statement that they could not implement the changes requested by the DPRK.

Fun Fact: There are 7 emojis in the current Unicode standard that were added at the request of the DPRK. The DPRK originally suggested that the HOT BEVERAGE emoji ☕ should be called the HOT TEA emoji, but an American suggested the emoji be renamed so that Americans could use it to represent coffee. The DPRK delegation agreed, and so the emoji was renamed. This is an example of technical experts working on narrow technical problems being able to work together in a way that diplomats can't.


This is awesome