SuperMemo 2004: Bug in UTF-8 encoding

All users of SuperMemo 2004 are encouraged to update to SuperMemo 2004 Build 12.03 dated Sep 15, 2004 (or later). 

The update is particularly important for users who use languages other than English. Some text encoding strategies adopted by earlier versions will not be supported in the future. Updating at a later time may require conversion of some foreign language texts to retain compatibility with future versions. We sincerely apologize for this reversal.

If you do not care about the technical details explained below, see Recommendations.

Introduction

On September 7, 2004, three days after the release of SuperMemo 2004, we documented a bug in UTF-8 encoding in the final release of the program. This bug resulted in ambiguous interpretation of UTF-8 encoded texts stored in the SuperMemo registry. It would primarily affect students of Asian languages who used plain text components, as opposed to HTML components.

Symptoms

Collections upgraded from SuperMemo 2002 might display incorrect texts. In particular, short Asian texts in some encodings may be converted incorrectly in 10-20% of cases.

For example, when learning Japanese, the following text:

[image: sample Japanese text]

would be displayed as

[image: the same text after incorrect conversion]

Reasons

SuperMemo 2004 adopted UTF-8 encoding in the registry as the most efficient way of representing Unicode. However, some short MBCS strings used by plain text components from earlier versions of SuperMemo can form legitimate UTF-8 sequences and yield seemingly random texts on decoding.
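
The ambiguity can be reproduced outside SuperMemo. The following Python sketch is only an illustration, not SuperMemo's own code: it assumes Shift-JIS as the legacy MBCS encoding, and the helper name looks_like_utf8 is made up for this example. It shows a short string of half-width katakana whose Shift-JIS bytes also happen to be a valid UTF-8 sequence, while a kanji sample is rejected by a UTF-8 decoder.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without error (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Two half-width katakana characters stored as legacy Shift-JIS bytes (assumed encoding).
short_legacy = "\uff83\uff7d".encode("shift_jis")

# The same two bytes are also a well-formed UTF-8 sequence for one Latin letter,
# so a decoder cannot tell which reading was intended.
print(short_legacy.hex(), looks_like_utf8(short_legacy))
print(short_legacy.decode("utf-8") == "\u00fd")

# This kanji sample starts with a byte that UTF-8 does not allow as a lead byte,
# so it is not mistaken for UTF-8.
kanji_legacy = "\u65e5\u672c\u8a9e".encode("shift_jis")
print(looks_like_utf8(kanji_legacy))
```

Only some short strings collide in this way, which is in line with the partial error rate described under Symptoms.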

Solution

Updated SuperMemo 2004 will not attempt to convert legacy collections created with earlier SuperMemos. Instead, it will allow un-encoded texts in plain text components. This implies that both UTF-8 encoded and plain ANSI strings will coexist at the registry level. Users will be able to use old-style code page fonts with plain text components to create collections for learning languages (with the added benefit of being able to display non-Latin titles in the contents window, registries, browser, etc.). At the same time, users who prefer Unicode and HTML-based incremental reading will still be able to fully benefit from UTF-8 encoding at the registry level.
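
One way to picture the coexistence of the two encodings is a reader that picks the decoding from the kind of component that owns the string. The Python sketch below is hypothetical: the function decode_member, the component labels, and the choice of Windows-1252 as the ANSI code page are assumptions for illustration and do not describe SuperMemo's internal code.

```python
def decode_member(raw: bytes, component: str, ansi_codepage: str = "cp1252") -> str:
    """Decode stored bytes according to the owning component (hypothetical rule)."""
    if component == "plain":
        # Plain text components: bytes are taken as-is in the ANSI code page.
        return raw.decode(ansi_codepage)
    # Other components (HTML, RTF): bytes are treated as UTF-8 encoded Unicode.
    return raw.decode("utf-8")


# The same visible word, stored in two different ways.
print(decode_member(b"\xdf-amyloid", "plain"))
print(decode_member(b"\xc3\x9f-amyloid", "html"))
```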

Side effects

  1. Encoded strings: Users who created UTF-8 encoded material with plain text components between September 4 and the day of the update may need to convert some UTF-8 encoded texts back to ANSI. For example, after the update, if the answer component is a plain text component, the following UTF-8 encoded text:

Q: A set of [...] peptides are generated from -APP by proteases known as the - and gamma-secretases
A: -amyloid (A)

may show up as:

Q: A set of [...] peptides are generated from -APP by proteases known as the - and gamma-secretases
A: ß-amyloid (Aß)

Right-click over the encoded text, and choose Text : Convert : Decode UTF-8 to revert this change.

  2. Duplicate registry entries: Some confusion may result from awkward registry sorting when two encodings are used. For example, the word -amyloid will be coded one way if it is used by HTML or RTF components, and another way if it is used by plain text components or as plain text in HTML components. In HTML components, the HTML string ß-amyloid will be retrieved in Unicode and UTF-8 encoded. In plain text components, the simple ANSI string -amyloid will be stored in the registry. Those two member names will sort differently, but will appear identical as -amyloid in collection elements as well as in the member list of the registry window.
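
Both side effects can be imitated in a few lines of Python. This is a sketch under assumptions, not SuperMemo code: Windows-1252 stands in for the ANSI code page, ß-amyloid is used as the sample word, and plain byte-wise comparison stands in for registry sorting, whose actual rules are not described here.

```python
ANSI = "cp1252"          # assumed ANSI code page
word = "\u00df-amyloid"  # sample word with one non-ASCII character

utf8_bytes = word.encode("utf-8")
ansi_bytes = word.encode(ANSI)

# Encoded strings: UTF-8 bytes shown by a plain text component as if they were ANSI.
shown = utf8_bytes.decode(ANSI)
print(shown)

# Decoding UTF-8 again recovers the intended text.
restored = shown.encode(ANSI).decode("utf-8")
print(restored == word)

# Duplicate entries: the two stored forms are different byte strings, so they
# count as two members and sort apart, although both display as the same word.
print(utf8_bytes != ansi_bytes)
print(sorted([ansi_bytes, utf8_bytes]))
```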

Recommendations

It is highly recommended that all users of SuperMemo 2004 update to the newest version available. Here are the steps:

  1. In File : Level check Middle or Professional. This will provide access to the Help menu
  2. Use Help : About to see which version of SuperMemo you are using
  3. See the Date field. If the date is Sep 15, 2004 or later, no further action is necessary
  4. If the Date field indicates any date before Sep 15, 2004, continue reading below
  5. Write to shop(AT)supermemo(.)com to receive detailed update instructions
  6. If you happen to rerun the installer with update options, be sure not to overwrite the layouts.ini and supermemo.ini files