Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Deleted: | ||||||||
< < | On this page:
| |||||||
Changed: | ||||||||
< < | Appendix B: Encode URLs With UTF8 | |||||||
> > | Appendix B: Encode URLs With UTF8 | |||||||
Use internationalised characters within WikiWords and attachment names
This topic addresses implemented UTF-8 support for URLs only. The overall plan for UTF-8 support for TWiki is described in TWiki:Codev.ProposedUTF8SupportForI18N | ||||||||
Added: | ||||||||
> > | On this page:
| |||||||
Current Status
To simplify use of internationalised characters within WikiWords and attachment names, TWiki now supports UTF-8 URLs, converting on-the-fly to virtually any character set, including ISO-8859-*, KOI8-R, EUC-JP, and so on. |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
On this page:
|
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
On this page:
| ||||||||
Changed: | ||||||||
< < | Appendix C: Encode URLs With UTF8 | |||||||
> > | Appendix B: Encode URLs With UTF8 | |||||||
Changed: | ||||||||
< < | This page addresses implemented UTF-8 support for URLs only. The overall plan for UTF-8 support for TWiki is described in TWiki:Codev.ProposedUTF8SupportForI18N | |||||||
> > | Use internationalised characters within WikiWords and attachment names
This topic addresses implemented UTF-8 support for URLs only. The overall plan for UTF-8 support for TWiki is described in TWiki:Codev.ProposedUTF8SupportForI18N | |||||||
Current Status | ||||||||
Line: 17 to 19 | ||||||||
| ||||||||
Changed: | ||||||||
< < | ISO-2022-*, HZ-* and other 'non-ASCII-safe' multi-byte character sets are now specifically excluded from use as the site character set, since they interfere with TWiki ML; however, many multi-byte character sets work fine, e.g. EUC-JP, GB2312, etc. | |||||||
> > | The following 'non-ASCII-safe' character encodings are now excluded from use as the site character set, since they interfere with TWiki markup: ISO-2022-*, HZ-*, Shift-JIS, MS-Kanji, GB2312, GBK, GB18030, Johab and UHC. However, many multi-byte character sets work fine, e.g. EUC-JP, EUC-KR, EUC-TW, and EUC-CN. In addition, UTF-8 can already be used, with some limitations, for East Asian languages where EUC character encodings are not acceptable - see TWiki:Codev.ProposedUTF8SupportForI18N | |||||||
Changed: | ||||||||
< < | It's now possible to override the site character set defined in the $siteLocale setting in TWiki.cfg - this enables you to have a slightly different spelling of the character set in the server locale (e.g. 'eucjp') and the HTTP header sent to the browser (e.g. 'euc-jp'). | |||||||
> > | It's now possible to override the site character set defined in the {SiteLocale} setting in configure - this enables you to have a slightly different spelling of the character set in the server locale (e.g. 'eucjp') and the HTTP header sent to the browser (e.g. 'euc-jp'). | |||||||
This feature should also support use of Mozilla Browser with TWiki:Codev.TWikiOnMainframe | ||||||||
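As an illustration of the charset override described above, the following is a minimal Python sketch, not TWiki's actual Perl code. The function names `charset_from_locale`, `header_charset` and `same_encoding`, and the example locale string `ja_JP.eucjp`, are assumptions made for this example. It shows the locale-style spelling ('eucjp') being replaced by the spelling sent in the HTTP header ('euc-jp'), and checks that both spellings name the same encoding.

```python
import codecs


def charset_from_locale(site_locale):
    """Return the charset part of a locale string such as 'ja_JP.eucjp'."""
    # Drop any '@modifier' suffix, then take whatever follows the first '.'.
    base = site_locale.split("@", 1)[0]
    _, dot, charset = base.partition(".")
    return charset if dot else ""


def header_charset(site_locale, charset_override=None):
    """Pick the charset name to send to the browser."""
    # An explicit override wins; otherwise fall back to the locale's spelling.
    return charset_override or charset_from_locale(site_locale)


def same_encoding(name_a, name_b):
    """True if both spellings resolve to the same Python codec."""
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name


if __name__ == "__main__":
    print(header_charset("ja_JP.eucjp"))            # spelling taken from the locale
    print(header_charset("ja_JP.eucjp", "euc-jp"))  # overridden spelling for the header
    print(same_encoding("eucjp", "euc-jp"))
```

Keeping the override separate from the locale means the server locale can stay in whatever spelling the operating system expects, while the browser receives the spelling it recognises.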
Line: 43 to 45 | ||||||||
The main point is that you can use TWiki with international characters in WikiWords without changing your browser setup from the default, and you can also still use TWiki using non-UTF-8 URLs. This works on any Perl version from 5.005_03 onwards and corresponds to Phase 1 of TWiki:Codev.ProposedUTF8SupportForI18N | ||||||||
Changed: | ||||||||
< < | UTF-8 URLs are automatically converted to the current $siteCharset (from the TWiki.cfg locale setting), using modules such as CPAN:Encode | |||||||
> > | UTF-8 URLs are automatically converted to the current {Site}{Charset}, using modules such as CPAN:Encode | |||||||
TWiki generates the whole page in the site charset, e.g. ISO-8859-1 or EUC-JP, but the browser dynamically UTF-8 encodes the attachment's URL when it's used. Since Apache serves attachment downloads without TWiki being involved, TWiki's code can't do its UTF-8 decoding trick. Instead, when generating the page, TWiki URL-encodes such URLs in the site charset (e.g. ISO-8859-1), which bypasses the browser's UTF-8 encoding and ensures that the URLs and filenames seen by Apache remain in the site charset. | ||||||||
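The two conversions described in this section can be sketched in Python as follows. This is an illustration only, not TWiki's Perl implementation (which uses modules such as CPAN:Encode). The function names `url_path_to_site_charset` and `attachment_url` are hypothetical, and the "try UTF-8 first, otherwise assume the site charset" rule is a simplifying assumption: some byte sequences are valid in both encodings, so a real implementation may need stricter checks.

```python
from urllib.parse import quote, unquote_to_bytes


def url_path_to_site_charset(path, site_charset):
    """Convert a percent-encoded URL path to bytes in the site charset."""
    raw = unquote_to_bytes(path)
    try:
        # Browsers that UTF-8 encode URLs produce valid UTF-8 here.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: bytes that are not valid UTF-8 are already in the
        # site charset, so they are passed through unchanged.
        return raw
    # Raises UnicodeEncodeError if a character has no site-charset form.
    return text.encode(site_charset)


def attachment_url(filename, site_charset):
    """Percent-encode an attachment name using the site charset bytes."""
    # Because every non-ASCII byte is already percent-encoded, the browser
    # has nothing left to re-encode as UTF-8 when the link is followed.
    return quote(filename, safe="/", encoding=site_charset)


if __name__ == "__main__":
    print(url_path_to_site_charset("/view/Main/Caf%C3%A9", "iso-8859-1"))
    print(url_path_to_site_charset("/view/Main/Caf%E9", "iso-8859-1"))
    print(attachment_url("Café.txt", "iso-8859-1"))
```

Both a UTF-8 encoded path and a path already in the site charset end up as the same site-charset bytes, which is why UTF-8 and non-UTF-8 URLs can coexist on one site.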
Line: 58 to 60 | ||||||||
For up-to-date information see TWiki:Codev.EncodeURLsWithUTF8 | ||||||||
Deleted: | ||||||||
< < | -- TWiki:Main.RichardDonkin -- TWiki:Main.MattWilkie -- TWiki:Main.PeterThoeny | |||||||
Added: | ||||||||
> > | Related Topics: AdminDocumentationCategory |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Added: | ||||||||
> > | On this page:
Appendix C: Encode URLs With UTF8
This page addresses implemented UTF-8 support for URLs only. The overall plan for UTF-8 support for TWiki is described in TWiki:Codev.ProposedUTF8SupportForI18N
Current Status
To simplify use of internationalised characters within WikiWords and attachment names, TWiki now supports UTF-8 URLs, converting on-the-fly to virtually any character set, including ISO-8859-*, KOI8-R, EUC-JP, and so on.
Support for UTF-8 URL encoding avoids having to configure the browser to turn off this encoding in URLs (the default in Internet Explorer, Opera Browser and some Mozilla Browser URLs) and enables support of browsers where only this mode is supported (e.g. Opera Browser for Symbian smartphones). A non-UTF-8 site character set (e.g. ISO-8859-*) is still used within TWiki, and in fact pages are stored and viewed entirely in the site character set - the browser dynamically converts URLs from the site character set into UTF-8, and TWiki converts them back again.
System requirements are updated as follows:
$siteLocale setting in TWiki.cfg - this enables you to have a slightly different spelling of the character set in the server locale (e.g. 'eucjp') and the HTTP header sent to the browser (e.g. 'euc-jp').
This feature should also support use of Mozilla Browser with TWiki:Codev.TWikiOnMainframe
Details of Implementation
URLs are not allowed to contain non-ASCII (8th bit set) characters: http://www.w3.org/TR/html4/appendix/notes.html#non-ascii-chars
$siteCharset (from the TWiki.cfg locale setting), using modules such as CPAN:Encode
Testing and Limitation
It should work with TWiki:Codev.TWikiOnMainframe
-- TWiki:Main.MattWilkie -- TWiki:Main.PeterThoeny |