ASCII Output

If checked, this option forces the output to contain only ASCII characters*.

For example, the non-ASCII string Îñţérñåţîöñåļîžåţîờñ might be encoded in various languages as follows:
HTML  "&#206;&#241;&#355;&#233;r&#241;&#229;&#355;&#238;&#246;&#241;&#229;&#316;&#238;&#382;&#229;&#355;&#238;&#7901;&#241;"
JS    "\xce\xf1\u0163\xe9r\xf1\xe5\u0163\xee\xf6\xf1\xe5\u013c\xee\u017e\xe5\u0163\xee\u1edd\xf1"
CSS   "\ce\f1\163\e9 r\f1\e5\163\ee\f6\f1\e5\13c\ee\17e\e5\163\ee\1edd\f1"
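The escaping rules behind these three forms can be sketched in a few lines. This is a minimal illustration, not the Caja Web Tools implementation: the function names are hypothetical, the HTML form assumes decimal numeric character references, and the CSS form assumes a space is added after an escape whenever the next character is kept literally. Quotes, backslashes, and markup characters that are already ASCII are left untouched here, although a real encoder would also need to escape them.

```python
def _is_plain(cp: int) -> bool:
    """True for code points that are emitted literally: the range [1, 126]."""
    return 1 <= cp <= 126


def to_html_ascii(text: str) -> str:
    """Replace each non-plain character with a decimal character reference."""
    return "".join(ch if _is_plain(ord(ch)) else f"&#{ord(ch)};" for ch in text)


def to_js_ascii(text: str) -> str:
    r"""Replace each non-plain character with \xNN, \uNNNN, or a surrogate pair."""
    out = []
    for ch in text:
        cp = ord(ch)
        if _is_plain(cp):
            out.append(ch)
        elif cp <= 0xFF:
            out.append(f"\\x{cp:02x}")
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")
        else:
            # Code points above the BMP need two UTF-16 code units in JS.
            cp -= 0x10000
            high, low = 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)


def to_css_ascii(text: str) -> str:
    """Replace each non-plain character with a CSS hex escape.

    CSS consumes one whitespace character after a hex escape, so a space is
    added when the next character is literal; this keeps a following hex
    digit or space from being misread. NUL is not handled specially here.
    """
    out = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if _is_plain(cp):
            out.append(ch)
            continue
        escape = f"\\{cp:x}"
        following = text[i + 1:i + 2]
        if following and _is_plain(ord(following)):
            escape += " "
        out.append(escape)
    return "".join(out)
```

Applied to the sample string, `to_js_ascii` and `to_css_ascii` produce the JS and CSS forms shown above; the output of all three functions contains only code units in [1, 126].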

With this option, the output can be stored conveniently in systems that might not preserve character encodings or charset headers: 7-bit ASCII characters map directly to the Unicode code points of the same value, and text that contains only ASCII code points has identical UTF-8 and ASCII encodings.
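The following sketch illustrates why ASCII-only output survives a lost or wrong charset header. The helper names and the particular list of charsets are assumptions chosen for the illustration; they are not part of the tools.

```python
def bytes_by_charset(text, charsets=("ascii", "utf-8", "latin-1", "cp1252")):
    """Encode text under each charset and return {charset: bytes}."""
    return {name: text.encode(name) for name in charsets}


def is_charset_independent(text):
    """True if every listed charset produces the same bytes for text."""
    try:
        encodings = bytes_by_charset(text)
    except UnicodeEncodeError:
        return False  # at least one charset cannot represent the text at all
    return len(set(encodings.values())) == 1


# The escaped form is the same byte sequence under every listed charset,
# so a reader that guesses the wrong charset still recovers the same text.
escaped = r"\xce\xf1\u0163"
assert is_charset_independent(escaped)

# The raw form is not: UTF-8 bytes read back as Latin-1 give different text.
raw = "\xce\xf1\u0163"
assert not is_charset_independent(raw)
assert raw.encode("utf-8").decode("latin-1") != raw
```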

Latin text with a few non-Latin characters also often compresses better when the output uses only ASCII characters.


* Strictly speaking, ASCII is an encoding, so talking about "ASCII characters" is fudging terms. The Caja Web Tools ship content as UTF-8 encoded text, so when we say we restrict the output to ASCII, we mean that the output is correctly encoded text containing no code units outside the range [1, 126], which lies within the block that Unicode set aside for ASCII compatibility.

Code point 0 is defined in ASCII and has the same meaning in Unicode, but many systems have trouble with NUL bytes, so we make sure those do not appear literally in the output either. For example, many C and C++ programs treat the NUL character as a string terminator, so content containing NULs might be only partially interpreted. Some UTF-8 encoders also encode the NUL character as a two-byte sequence, so output containing NUL characters would not be both valid UTF-8 and ASCII.
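The definition above and both NUL problems can be checked with a short sketch. The function names are hypothetical. The text does not say which encoders emit a two-byte NUL; the bytes C0 80 used below are the form produced by Java's modified UTF-8 and are given only as an illustrative assumption.

```python
import ctypes


def is_restricted_ascii(data: bytes) -> bool:
    """True if every code unit (byte) lies in the range [1, 126]."""
    return all(1 <= unit <= 126 for unit in data)


def c_string_view(data: bytes) -> bytes:
    """Return what a reader of NUL-terminated C strings would see of data."""
    return ctypes.create_string_buffer(data).value


def is_strict_utf8(data: bytes) -> bool:
    """True if data decodes under a strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A literal NUL falls outside [1, 126] and truncates the C-string view.
assert not is_restricted_ascii(b"ab\x00cd")
assert c_string_view(b"ab\x00cd") == b"ab"

# The two-byte NUL form is neither restricted ASCII nor strict UTF-8.
assert not is_restricted_ascii(b"\xc0\x80")
assert not is_strict_utf8(b"\xc0\x80")
```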