I am editing HTML right now and have run into several problems.
a) It seems that the character encoding of the HTML file can only be specified in the save dialog (UTF-8, Unicode, ANSI).
a.1) Other encodings (e.g. European ones) are missing.
a.2) It would be nice if one could set a default encoding per file type. I use UTF-8 for all my HTML, but I often forget to switch to UTF-8 when saving.
b) Saving in UTF-8 and then running HTMLTidy with the UTF-8 input switch (input is UTF-8) produces the following output:
line 37 column 388 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00BB)
line 37 column 398 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00AB)
line 41 column 4283 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00BB)
line 41 column 4290 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00AB)
line 41 column 5582 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00BB)
line 41 column 5595 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00AB)
line 41 column 5714 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00BB)
line 41 column 5727 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00AB)
line 43 column 87 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00BB)
line 43 column 99 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00AB)
line 52 column 7472 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00BB)
line 52 column 7478 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00AB)
line 52 column 7544 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00BB)
line 52 column 7559 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00AB)
line 54 column 1923 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00BB)
line 54 column 1948 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00AB)
line 54 column 2465 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00BB)
line 54 column 2512 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00AB)
line 54 column 3792 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00BB)
line 54 column 3797 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00AB)
line 54 column 4827 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00BB)
line 54 column 4914 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00AB)
line 55 column 309 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00BB)
line 55 column 626 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+00AB)
line 59 column 4751 - Warning: Warning: replacing invalid UTF-8 bytes (char. code U+0097)
Info: Doctype given is "-//W3C//DTD HTML 4.01 Transitional//EN"
Info: Document content looks like HTML 4.01
50 warnings, 0 errors were found!
Character codes for UTF-8 must be in the range: U+0000 to U+10FFFF.
The definition of UTF-8 in Annex D of ISO/IEC 10646-1:2000 also
allows for the use of five- and six-byte sequences to encode
characters that are outside the range of the Unicode character set;
those five- and six-byte sequences are illegal for the use of
UTF-8 as a transformation of Unicode characters. ISO/IEC 10646
does not allow mapping of unpaired surrogates, nor U+FFFE and U+FFFF
(but it does allow other noncharacters). For more information please refer to
http://www.unicode.org/unicode and http://www.cl.cam.ac.uk/~mgk25/unicode.html
Also, in the final result Mozilla shows the special characters only as "?" substitutes, even when I manually select UTF-8 as the character encoding in the browser.
Am I doing something wrong?
Throughout the data flow I took great care to keep everything in UTF-8 (Zeus save, HTML char-encoding definition, browser encoding).
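To rule out the saved file itself as the culprit, a check along the following lines could be run on it. This is only a minimal sketch in Python: the function name `find_invalid_utf8` and the file name passed on the command line are assumptions made for illustration and are not part of Zeus or HTMLTidy. It lists every byte that cannot be decoded as UTF-8.

```python
import sys


def find_invalid_utf8(data: bytes):
    """Return (offset, byte) pairs for bytes that cannot be decoded as UTF-8."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            # Try to decode everything from the current position onwards.
            data[pos:].decode("utf-8")
            break  # The rest of the data is valid UTF-8.
        except UnicodeDecodeError as err:
            # err.start is relative to the slice, so add pos for the file offset.
            offset = pos + err.start
            bad.append((offset, data[offset]))
            # Skip the offending byte and keep scanning.
            pos = offset + 1
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_utf8.py page.html
    with open(sys.argv[1], "rb") as handle:
        for offset, byte in find_invalid_utf8(handle.read()):
            print(f"offset {offset}: invalid byte 0x{byte:02X}")
```

If this reports lone bytes such as 0xBB, 0xAB or 0x97, the file was probably written in a one-byte encoding (for example ANSI) rather than UTF-8, because in UTF-8 the character U+00BB is stored as the two bytes C2 BB, never as a single BB.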