get_html_translation_table() - PHP string functions
get_html_translation_table()
(PHP 4, PHP 5, PHP 7)
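Returns the translation table used by htmlspecialchars() and htmlentities()
Description
get_html_translation_table([int $table = HTML_SPECIALCHARS[, int $flags = ENT_COMPAT | ENT_HTML401[, string $encoding = 'UTF-8']]]) : array
get_html_translation_table() returns the translation table that htmlspecialchars() and htmlentities() use internally.
Note: Special characters can be encoded in several ways. For example, " can be encoded as &quot;, &#34; or &#x22;. get_html_translation_table() returns only the most commonly used form.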
Parameters
$table: Which table to return. Two constants, HTML_ENTITIES and HTML_SPECIALCHARS, let you specify the table you want.
$flags: A bitmask of one or more of the following flags, which specify which quotes the table will contain as well as which document type the table is for. The default is ENT_COMPAT | ENT_HTML401.
Constant Name | Description |
---|---|
ENT_COMPAT | Table will contain entities for double-quotes, but not for single-quotes. |
ENT_QUOTES | Table will contain entities for both double and single quotes. |
ENT_NOQUOTES | Table will neither contain entities for single quotes nor for double quotes. |
ENT_HTML401 | Table for HTML 4.01. |
ENT_XML1 | Table for XML 1. |
ENT_XHTML | Table for XHTML. |
ENT_HTML5 | Table for HTML 5. |
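The effect of the quote flags in the table above can be checked directly. The following is a minimal sketch, not part of the original page; the variable names are illustrative, and the flags are passed explicitly so the result does not depend on the default of the running PHP version.

```php
<?php
// Fetch the HTML_SPECIALCHARS table under each quote flag.
$compat   = get_html_translation_table(HTML_SPECIALCHARS, ENT_COMPAT | ENT_HTML401);
$quotes   = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_HTML401);
$noquotes = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES | ENT_HTML401);

// ENT_COMPAT: double quote present, single quote absent.
var_dump(isset($compat['"']), isset($compat["'"]));
// ENT_QUOTES: both quote characters present.
var_dump(isset($quotes['"']), isset($quotes["'"]));
// ENT_NOQUOTES: neither quote character present.
var_dump(isset($noquotes['"']), isset($noquotes["'"]));
```

Each call returns an array keyed by the raw character, so isset() on the quote characters is enough to see which flag was applied.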
$encoding: Encoding to use. If omitted, the default value for this argument is ISO-8859-1 in versions of PHP prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.
The following character sets are supported:
Charset | Aliases | Description |
---|---|---|
ISO-8859-1 | ISO8859-1 | Western European, Latin-1. |
ISO-8859-5 | ISO8859-5 | Little used Cyrillic charset (Latin/Cyrillic). |
ISO-8859-15 | ISO8859-15 | Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1). |
UTF-8 | | ASCII compatible multi-byte 8-bit Unicode. |
cp866 | ibm866, 866 | DOS-specific Cyrillic charset. Supported as of version 4.3.2. |
cp1251 | Windows-1251, win-1251, 1251 | Windows-specific Cyrillic charset. Supported as of version 4.3.2. |
cp1252 | Windows-1252, 1252 | Windows-specific charset for Western European. |
KOI8-R | koi8-ru, koi8r | Russian. Supported as of version 4.3.2. |
BIG5 | 950 | Traditional Chinese, mainly used in Taiwan. |
GB2312 | 936 | Simplified Chinese, national standard character set. |
BIG5-HKSCS | | Big5 with Hong Kong extensions, Traditional Chinese. |
Shift_JIS | SJIS, 932 | Japanese |
EUC-JP | EUCJP | Japanese |
MacRoman | | Charset that was used by Mac OS. |
'' | | An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended. |
Note: Any other character sets are not recognized. The default encoding will be used instead and a warning will be emitted.
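The $encoding argument determines how the keys of the returned array are encoded. As a small illustration (a sketch with illustrative variable names, not taken from the original page), the key for &eacute; is a single byte in the ISO-8859-1 table and two bytes in the UTF-8 table:

```php
<?php
$flags = ENT_COMPAT | ENT_HTML401;

// Same table and flags, two different encodings.
$latin1 = get_html_translation_table(HTML_ENTITIES, $flags, 'ISO-8859-1');
$utf8   = get_html_translation_table(HTML_ENTITIES, $flags, 'UTF-8');

// Look up the raw character that maps to the &eacute; entity.
$keyLatin1 = array_search('&eacute;', $latin1, true);
$keyUtf8   = array_search('&eacute;', $utf8, true);

// Show the raw bytes of each key as hexadecimal.
echo bin2hex($keyLatin1), "\n"; // e9
echo bin2hex($keyUtf8), "\n";   // c3a9
```

This is why a table fetched with one encoding should only be applied to strings in that same encoding.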
Return Values
Returns the translation table as an array.
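Because the array maps raw characters to entities, it can be passed straight to strtr(), and flipped with array_flip() for the reverse direction. A minimal sketch follows; the sample string and variable names are assumptions made for illustration.

```php
<?php
$flags = ENT_QUOTES | ENT_HTML401;
$table = get_html_translation_table(HTML_SPECIALCHARS, $flags, 'UTF-8');

$input = '<a href="x">Tom & Jerry</a>';

// strtr() with an array replaces each key once and never rescans its own
// output, so "&" is not double-encoded.
$encoded = strtr($input, $table);

// Flipping the table turns entities back into raw characters.
$decoded = strtr($encoded, array_flip($table));

echo $encoded, "\n";
echo $decoded, "\n";
```

For this input the encoded string matches what htmlspecialchars() produces with the same flags and encoding.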
Changelog
Version | Description |
---|---|
5.4.0 | The default value for the $encoding parameter was changed to UTF-8. |
5.4.0 | The constants ENT_HTML401, ENT_XML1, ENT_XHTML and ENT_HTML5 were added. |
5.3.4 | The $encoding parameter was added. |
Examples
Translation Table Example
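The listing for this example is not reproduced in this copy. A call along the following lines would produce an HTML 5 table that includes the single quote; the choice of ENT_QUOTES | ENT_HTML5 is an assumption, and the exact number of entries may differ between PHP versions.

```php
<?php
// Assumed flags: HTML 5 entity table, with both quote characters included.
$table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML5);

// Dump the whole array: each key is a raw character, each value its entity.
var_dump($table);
```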
The above example will output something similar to:
array(1510) {
  ["
"]=>
  string(9) "&NewLine;"
  ["!"]=>
  string(6) "&excl;"
  ["""]=>
  string(6) "&quot;"
  ["#"]=>
  string(5) "&num;"
  ["$"]=>
  string(8) "&dollar;"
  ["%"]=>
  string(8) "&percnt;"
  ["&"]=>
  string(5) "&amp;"
  ["'"]=>
  string(6) "&apos;"
  // ...
}
See Also
htmlspecialchars() - Convert special characters to HTML entities
htmlentities() - Convert characters to HTML entities
html_entity_decode() - Convert HTML entities to their corresponding characters
Be careful using get_html_translation_table() in a loop, as it's very slow.
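The advice above amounts to fetching the table once and reusing it inside the loop. A minimal sketch, with made-up sample strings and illustrative variable names:

```php
<?php
$rows = ['a < b', 'x & y', '"quoted"'];

// Fetch the table a single time, outside the loop.
$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_COMPAT | ENT_HTML401, 'UTF-8');

$escaped = [];
foreach ($rows as $row) {
    // Reuse the cached table for every row.
    $escaped[] = strtr($row, $table);
}

print_r($escaped);
```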
The fact that MS Word and some other sources use CP-1252, and that it is so close to Latin-1 ('ISO-8859-1'), causes a lot of confusion. What confused me the most was finding that MySQL uses CP-1252 by default. You may run into trouble if you find yourself tempted to do something like this: Don't do it. DON'T DO IT! You can use: or just convert directly: But your web page is probably encoded UTF-8, and you probably don't really want CP-1252 text flying around, so fix the character encoding first:
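The snippets from this note are not preserved here, so the following is only a sketch of the two approaches it describes, not the note author's code. It assumes the input bytes are CP-1252; the sample string is made up, and the mbstring extension is treated as optional and guarded with function_exists().

```php
<?php
// In CP-1252, "\x93" and "\x94" are curly double quotes and "\x95" is a bullet.
$input = "\x93Hello\x94 \x95 world";

// Approach 1: tell htmlentities() that the input is CP-1252.
$direct = htmlentities($input, ENT_NOQUOTES | ENT_HTML401, 'cp1252');
echo $direct, "\n";

// Approach 2: convert to UTF-8 first, then encode as usual.
// Requires mbstring, which may not be installed everywhere.
$viaUtf8 = null;
if (function_exists('mb_convert_encoding')) {
    $utf8    = mb_convert_encoding($input, 'UTF-8', 'Windows-1252');
    $viaUtf8 = htmlentities($utf8, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');
    echo $viaUtf8, "\n";
}
```

The second approach has the advantage that the rest of the application only ever sees UTF-8 text.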
Not sure what's going on here but I've run into a problem that others might face as well... returns the single quote ' as being equal to ' while returns it as being equal to ' I've had to do a specific string replacement for the time being... Not sure if it's an issue with the function or the array manipulation. -Pat
I wrote a quick little function for converting something like '&middot;' into '&#183;':
    $to_convert = '&middot;';
    $table = get_html_translation_table(HTML_ENTITIES);
    $equiv = '&#' . ord(array_search($to_convert, $table)) . ';';
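Since ord() only looks at the first byte of a string, the approach above relies on single-byte keys. A hedged variant that requests the ISO-8859-1 table explicitly is sketched below; the function name named_to_numeric_entity() is hypothetical, and the sketch covers only characters that exist in ISO-8859-1.

```php
<?php
// Hypothetical helper: turn a named entity into its numeric form.
function named_to_numeric_entity($entity)
{
    // Ask for single-byte keys so that ord() sees the whole character.
    $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'ISO-8859-1');

    $char = array_search($entity, $table, true);
    if ($char === false) {
        // Unknown entity: return it unchanged.
        return $entity;
    }
    return '&#' . ord($char) . ';';
}

echo named_to_numeric_entity('&middot;'), "\n"; // &#183;
echo named_to_numeric_entity('&eacute;'), "\n"; // &#233;
```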
To display the mapping on a webpage no matter what the server encoding is, this can be used:
    echo "\n";
    echo htmlentities(print_r((get_html_translation_table(HTML_SPECIALCHARS)), true));
    echo htmlentities(print_r((get_html_translation_table(HTML_ENTITIES)), true));
since get_html_translation_table() actually gives the special chars in iso-8859-1 (Latin-1) encoding. So to see the tables correctly using print_r(get_html_translation_table(HTML_ENTITIES)); your server needs to give an HTTP header as iso-8859-1, unless you use header() or manually set the browser's encoding setting to iso-8859-1. And you need to view the source of the page to see the mapping (except that the English version of IE 7 outputs the page source as iso-8859-1 anyway).

get_html_translation_table works only with the first 256 code positions. For higher positions, for example ф (a Cyrillic letter), it shows the same.

Without heavy scientific analysis, this seems to work as a quick fix to making text originating from a Microsoft Word document display as HTML:

htmlentities includes htmlspecialchars, so here's how to convert a UTF-8 string:
    htmlentities($string, ENT_QUOTES, 'UTF-8');

If you have troubles (like me) getting data from ISO-8859-1 encoded forms where users copy and paste from Word, this routine could be useful. It adds to the standard get_html_translation_table the codes of the characters that MS Word usually substitutes into typed text. Otherwise those characters would never be displayed correctly in HTML output.
    function get_html_translation_table_CP1252()
    {
        // Request the single-byte table explicitly; the default encoding
        // became UTF-8 in PHP 5.4.0, which would not match the chr() keys below.
        $trans = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'ISO-8859-1');
        $trans[chr(130)] = '&sbquo;';  // Single Low-9 Quotation Mark
        $trans[chr(131)] = '&fnof;';   // Latin Small Letter F With Hook
        $trans[chr(132)] = '&bdquo;';  // Double Low-9 Quotation Mark
        $trans[chr(133)] = '&hellip;'; // Horizontal Ellipsis
        $trans[chr(134)] = '&dagger;'; // Dagger
        $trans[chr(135)] = '&Dagger;'; // Double Dagger
        $trans[chr(136)] = '&circ;';   // Modifier Letter Circumflex Accent
        $trans[chr(137)] = '&permil;'; // Per Mille Sign
        $trans[chr(138)] = '&Scaron;'; // Latin Capital Letter S With Caron
        $trans[chr(139)] = '&lsaquo;'; // Single Left-Pointing Angle Quotation Mark
        $trans[chr(140)] = '&OElig;';  // Latin Capital Ligature OE
        $trans[chr(145)] = '&lsquo;';  // Left Single Quotation Mark
        $trans[chr(146)] = '&rsquo;';  // Right Single Quotation Mark
        $trans[chr(147)] = '&ldquo;';  // Left Double Quotation Mark
        $trans[chr(148)] = '&rdquo;';  // Right Double Quotation Mark
        $trans[chr(149)] = '&bull;';   // Bullet
        $trans[chr(150)] = '&ndash;';  // En Dash
        $trans[chr(151)] = '&mdash;';  // Em Dash
        $trans[chr(152)] = '&tilde;';  // Small Tilde
        $trans[chr(153)] = '&trade;';  // Trade Mark Sign
        $trans[chr(154)] = '&scaron;'; // Latin Small Letter S With Caron
        $trans[chr(155)] = '&rsaquo;'; // Single Right-Pointing Angle Quotation Mark
        $trans[chr(156)] = '&oelig;';  // Latin Small Ligature OE
        $trans[chr(159)] = '&Yuml;';   // Latin Capital Letter Y With Diaeresis
        ksort($trans);
        return $trans;
    }

If you want to display special HTML entities in a web browser, you can use the following code: If you don't, the key name of each element will appear to be the same as the element content itself, making it look mighty stupid. ;)

I found this useful in converting latin characters

If you want to decode all those &#123; symbols as well....
    function unhtmlentities($string)
    {
        $trans_tbl = get_html_translation_table(HTML_ENTITIES);
        $trans_tbl = array_flip($trans_tbl);
        $ret = strtr($string, $trans_tbl);
        // The /e modifier was removed in PHP 7, so use a callback instead.
        return preg_replace_callback('/\&\#([0-9]+)\;/', function ($m) {
            return chr((int) $m[1]);
        }, $ret);
    }

Alan's version didn't seem to work right.
If you're having the same problem consider using this slightly modified version instead:
    function unhtmlentities($string)
    {
        $trans_tbl = get_html_translation_table(HTML_ENTITIES);
        $trans_tbl = array_flip($trans_tbl);
        $ret = strtr($string, $trans_tbl);
        // The /e modifier was removed in PHP 7, so use a callback instead.
        return preg_replace_callback('/&#(\d+);/', function ($m) {
            return chr((int) $m[1]);
        }, $ret);
    }
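For comparison, html_entity_decode() (listed under See Also) decodes both named and numeric entities without a hand-built table. The sketch below is an alternative to the user-written functions above, not a reconstruction of them; the sample string is made up.

```php
<?php
$encoded = '&lt;p&gt;caf&eacute; &#123;x&#125; &middot;&lt;/p&gt;';

// Decode named and numeric entities in one call, producing UTF-8 output.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $decoded, "\n";
```

Unlike chr(), which only covers byte values 0 to 255, this also handles numeric entities above 255 when the target encoding is UTF-8.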