get_html_translation_table() - PHP string functions

Lele · 1 year ago (2023-11-21) · 24 views · #Technical articles

get_html_translation_table()

(PHP 4, PHP 5, PHP 7)

Returns the translation table used by htmlspecialchars() and htmlentities()

Description

get_html_translation_table([ int $table = HTML_SPECIALCHARS [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = 'UTF-8' ]]] ) : array

get_html_translation_table() returns the translation table that is used internally by htmlspecialchars() and htmlentities().

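Note:

Special characters can be encoded in several ways. For example, " can be encoded as &quot;, &#34; or &#x22;. get_html_translation_table() returns only the most commonly used form, which is the one produced by htmlspecialchars() and htmlentities().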

Parameters

$table

Which table to return. Two constants are available for this: HTML_ENTITIES and HTML_SPECIALCHARS.
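
As a rough illustration of the difference between the two tables, the sketch below requests both and compares them. The variable names are only illustrative, and the flags and encoding are passed explicitly so the result does not depend on the defaults of a particular PHP version.

```php
<?php
$flags = ENT_COMPAT | ENT_HTML401;

// HTML_SPECIALCHARS: only the characters that htmlspecialchars() converts.
$special = get_html_translation_table(HTML_SPECIALCHARS, $flags, 'UTF-8');

// HTML_ENTITIES: every character that htmlentities() converts.
$entities = get_html_translation_table(HTML_ENTITIES, $flags, 'UTF-8');

// With ENT_COMPAT the small table holds &, ", < and >.
print_r($special);

// The larger table also covers characters such as the copyright sign (U+00A9).
echo $entities["\u{A9}"], "\n";
echo count($special), ' vs ', count($entities), " entries\n";
```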

$flags

A bitmask of one or more of the following flags, which specify which quotes the table will contain as well as which document type the table is for. The default is ENT_COMPAT | ENT_HTML401.

Available $flags constants

Constant Name	Description
`ENT_COMPAT`	Table will contain entities for double-quotes, but not for single-quotes.
`ENT_QUOTES`	Table will contain entities for both double and single quotes.
`ENT_NOQUOTES`	Table will neither contain entities for single quotes nor for double quotes.
`ENT_HTML401`	Table for HTML 4.01.
`ENT_XML1`	Table for XML 1.
`ENT_XHTML`	Table for XHTML.
`ENT_HTML5`	Table for HTML 5.
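
The effect of the quote flags and the document type can be checked with a short sketch like the following. It only uses the constants from the table above; the variable names are illustrative.

```php
<?php
$compat   = get_html_translation_table(HTML_SPECIALCHARS, ENT_COMPAT | ENT_HTML401);
$quotes   = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_HTML401);
$noquotes = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES | ENT_HTML401);

// Each pair reports: has double quote?, has single quote?
var_dump(isset($compat['"']), isset($compat["'"]));     // double quotes only
var_dump(isset($quotes['"']), isset($quotes["'"]));     // both
var_dump(isset($noquotes['"']), isset($noquotes["'"])); // neither

// The document type changes which entity is chosen for the single quote.
$html5 = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_HTML5);
echo $quotes["'"], ' ', $html5["'"], "\n";
```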

$encoding

Encoding to use. If omitted, the default value for this argument is ISO-8859-1 in versions of PHP prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.

The following character sets are supported:

Supported charsets

Charset	Aliases	Description
ISO-8859-1	ISO8859-1	Western European, Latin-1.
ISO-8859-5	ISO8859-5	Little used Cyrillic charset (Latin/Cyrillic).
ISO-8859-15	ISO8859-15	Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8		ASCII compatible multi-byte 8-bit Unicode.
cp866	ibm866, 866	DOS-specific Cyrillic charset. Supported as of PHP 4.3.2.
cp1251	Windows-1251, win-1251, 1251	Windows-specific Cyrillic charset. Supported as of PHP 4.3.2.
cp1252	Windows-1252, 1252	Windows-specific charset for Western European.
KOI8-R	koi8-ru, koi8r	Russian. Supported as of PHP 4.3.2.
BIG5	950	Traditional Chinese, mainly used in Taiwan.
GB2312	936	Simplified Chinese, national standard character set.
BIG5-HKSCS		Traditional Chinese, Big5 with Hong Kong extensions.
Shift_JIS	SJIS, 932	Japanese
EUC-JP	EUCJP	Japanese
MacRoman		Charset that was used by Mac OS.
''		An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended.

Note: Any other character sets are not recognized. The default encoding will be used instead and a warning will be emitted.
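
The $encoding argument decides how the keys of the returned table are encoded. A minimal sketch, with illustrative variable names, that looks up the key for &eacute; under two encodings and prints its bytes:

```php
<?php
$flags = ENT_COMPAT | ENT_HTML401;

$utf8   = get_html_translation_table(HTML_ENTITIES, $flags, 'UTF-8');
$latin1 = get_html_translation_table(HTML_ENTITIES, $flags, 'ISO-8859-1');

// Find the key (the raw character) that maps to &eacute; in each table.
$keyUtf8   = array_search('&eacute;', $utf8);
$keyLatin1 = array_search('&eacute;', $latin1);

echo bin2hex($keyUtf8), "\n";   // c3a9 (two bytes in UTF-8)
echo bin2hex($keyLatin1), "\n"; // e9   (one byte in ISO-8859-1)
```

This is why a table requested in one encoding should not be applied to text stored in another.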

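Return Values

Returns the translation table as an array, with the original characters as the keys and the entities as the values.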
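
Because the array maps characters to entities, it can be passed straight to strtr(). The sketch below, using a made-up input string, shows that this gives the same result as htmlspecialchars() when the same flags and encoding are used.

```php
<?php
$flags = ENT_QUOTES | ENT_HTML401;
$table = get_html_translation_table(HTML_SPECIALCHARS, $flags, 'UTF-8');

$input = '<a href="x">Tom & Jerry</a>'; // hypothetical input

// strtr() with an array replaces each key by its value in a single pass.
$viaTable    = strtr($input, $table);
$viaFunction = htmlspecialchars($input, $flags, 'UTF-8');

echo $viaTable, "\n";
var_dump($viaTable === $viaFunction);
```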

Changelog

Version	Description
5.4.0	The default value for the $encoding parameter was changed to UTF-8.
5.4.0	The constants `ENT_HTML401`, `ENT_XML1`, `ENT_XHTML` and `ENT_HTML5` were added.
5.3.4	The $encoding parameter was added.

Examples

Translation Table Example
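
A call along the following lines produces a table of this kind. The flag combination is an assumption inferred from the output shown in this section (HTML 5 entity names such as &NewLine;, and an entry for the single quote), and the exact number of entries may differ between PHP versions.

```php
<?php
// ENT_HTML5 selects the large HTML 5 entity set; ENT_QUOTES adds the single quote.
$table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML5);
var_dump($table);
```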

The above example will output something similar to:

array(1510) {
  ["
"]=>
  string(9) "&NewLine;"
  ["!"]=>
  string(6) "&excl;"
  ["""]=>
  string(6) "&quot;"
  ["#"]=>
  string(5) "&num;"
  ["$"]=>
  string(8) "&dollar;"
  ["%"]=>
  string(8) "&percnt;"
  ["&"]=>
  string(5) "&amp;"
  ["'"]=>
  string(6) "&apos;"
  // ...
}

See Also

htmlspecialchars() - Convert special characters to HTML entities
htmlentities() - Convert all applicable characters to HTML entities
html_entity_decode() - Convert HTML entities to their corresponding characters

Be careful using get_html_translation_table() in a loop, as it's very slow.
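
One way to avoid that cost, sketched here with made-up input rows, is to build the table once before the loop and reuse it:

```php
<?php
$rows = ['a < b', 'c > d', 'e & f']; // hypothetical input

// Build the table once, outside the loop.
$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_HTML401, 'UTF-8');

$escaped = [];
foreach ($rows as $row) {
    // Reuse the cached table for every row.
    $escaped[] = strtr($row, $table);
}

print_r($escaped);
```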

The fact that MS Word and some other sources use CP-1252, and that it is so close to Latin-1 ('ISO-8859-1'), causes a lot of confusion. What confused me the most was finding that MySQL uses CP-1252 by default.
You may run into trouble if you find yourself tempted to do something like this:

Don't do it. DON'T DO IT!
You can use:

or just convert directly:
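
A minimal sketch of both options, assuming the intent is to pass the Windows code page through the $encoding parameter (cp1252 is one of the charsets listed in the table above). The input string is hypothetical.

```php
<?php
$flags = ENT_NOQUOTES | ENT_HTML401;

// Option 1: request the table for the Windows code page itself.
$table = get_html_translation_table(HTML_ENTITIES, $flags, 'cp1252');
echo $table[chr(149)], "\n"; // chr(149) is the CP-1252 bullet

// Option 2: let htmlentities() do the conversion directly.
$input  = 'Item ' . chr(149) . ' one'; // hypothetical CP-1252 input
$output = htmlentities($input, $flags, 'cp1252');
echo $output, "\n";
```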

But your web page is probably encoded UTF-8, and you probably don't really want CP-1252 text flying around, so fix the character encoding first:
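
A sketch of that approach, assuming the mbstring extension is available (iconv() would be an alternative) and using a made-up CP-1252 input string:

```php
<?php
// Hypothetical CP-1252 bytes: chr(149) is a bullet, chr(150) an en dash.
$input = 'Bullet: ' . chr(149) . ' dash: ' . chr(150);

// Convert to UTF-8 first...
$utf8 = mb_convert_encoding($input, 'UTF-8', 'Windows-1252');

// ...then escape as usual, now working entirely in UTF-8.
$output = htmlentities($utf8, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');
echo $output, "\n";
```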

Not sure what's going on here but I've run into a problem that others might face as well...

returns the single quote ' as being equal to &#39; while

returns it as being equal to &#039;
I've had to do a specific string replacement for the time being... Not sure if it's an issue with the function or the array manipulation.
-Pat
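
A quick way to check whether the table and the escaping function disagree on a given PHP version is to compare them directly. This is only a sketch with illustrative variable names; on current PHP versions both are expected to give the same entity when the same flags are used.

```php
<?php
$flags = ENT_QUOTES | ENT_HTML401;
$table = get_html_translation_table(HTML_SPECIALCHARS, $flags, 'UTF-8');

// Entity for the single quote according to the table...
$fromTable = $table["'"];

// ...and according to htmlspecialchars() with the same flags.
$fromFunction = htmlspecialchars("'", $flags, 'UTF-8');

var_dump($fromTable, $fromFunction, $fromTable === $fromFunction);
```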

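I wrote a quick little function for converting something like '&middot;' into '&#183;':
$to_convert = '&middot;';
// Request single-byte ISO-8859-1 keys so that ord() yields the code point.
$table = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');
$equiv = '&#'.ord(array_search($to_convert,$table)).';';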

To display the mapping on a web page no matter what the server encoding is, this can be used:
 echo "\n";
 echo htmlentities(print_r((get_html_translation_table(HTML_SPECIALCHARS)), true));
 echo htmlentities(print_r((get_html_translation_table(HTML_ENTITIES)), true));
This is needed because get_html_translation_table() gives the special characters in ISO-8859-1 (Latin-1) encoding when that is the default encoding (PHP versions before 5.4.0). To see the tables correctly using
 print_r(get_html_translation_table(HTML_ENTITIES));
the page must be served as ISO-8859-1, either through the server's HTTP header or a header() call, or the browser's encoding must be set to ISO-8859-1 manually. You also need to view the page source to see the mapping (the English version of IE 7 outputs the page source as ISO-8859-1 anyway).

get_html_translation_table() works only with the first 256 code positions. For higher positions, for example ф (a Cyrillic letter), the character is shown unchanged.

Without heavy scientific analysis, this seems to work as a quick fix for making text originating from a Microsoft Word document display as HTML:

htmlentities() includes htmlspecialchars(), so here's how to convert a UTF-8 string:
htmlentities($string, ENT_QUOTES, 'UTF-8');

If you have trouble (like me) getting data from ISO-8859-1 encoded forms where users copy and paste from Word, this routine could be useful.
It adds to the standard get_html_translation_table() output the codes of the characters that MS Word usually substitutes into typed text.
Otherwise those characters would never be displayed correctly in HTML output.
function get_html_translation_table_CP1252() {
  // Ask for single-byte ISO-8859-1 keys so the chr() keys added below
  // use the same encoding as the rest of the table.
  $trans = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');
  $trans[chr(130)] = '&sbquo;';  // Single Low-9 Quotation Mark
  $trans[chr(131)] = '&fnof;';  // Latin Small Letter F With Hook
  $trans[chr(132)] = '&bdquo;';  // Double Low-9 Quotation Mark
  $trans[chr(133)] = '&hellip;';  // Horizontal Ellipsis
  $trans[chr(134)] = '&dagger;';  // Dagger
  $trans[chr(135)] = '&Dagger;';  // Double Dagger
  $trans[chr(136)] = '&circ;';  // Modifier Letter Circumflex Accent
  $trans[chr(137)] = '&permil;';  // Per Mille Sign
  $trans[chr(138)] = '&Scaron;';  // Latin Capital Letter S With Caron
  $trans[chr(139)] = '&lsaquo;';  // Single Left-Pointing Angle Quotation Mark
  $trans[chr(140)] = '&OElig;';  // Latin Capital Ligature OE
  $trans[chr(145)] = '&lsquo;';  // Left Single Quotation Mark
  $trans[chr(146)] = '&rsquo;';  // Right Single Quotation Mark
  $trans[chr(147)] = '&ldquo;';  // Left Double Quotation Mark
  $trans[chr(148)] = '&rdquo;';  // Right Double Quotation Mark
  $trans[chr(149)] = '&bull;';  // Bullet
  $trans[chr(150)] = '&ndash;';  // En Dash
  $trans[chr(151)] = '&mdash;';  // Em Dash
  $trans[chr(152)] = '&tilde;';  // Small Tilde
  $trans[chr(153)] = '&trade;';  // Trade Mark Sign
  $trans[chr(154)] = '&scaron;';  // Latin Small Letter S With Caron
  $trans[chr(155)] = '&rsaquo;';  // Single Right-Pointing Angle Quotation Mark
  $trans[chr(156)] = '&oelig;';  // Latin Small Ligature OE
  $trans[chr(159)] = '&Yuml;';  // Latin Capital Letter Y With Diaeresis
  ksort($trans);
  return $trans;
}
If you want to display special HTML entities in a web browser, you can use the following code:

If you don't, the key name of each element will appear to be the same as the element content itself, making it look mighty stupid. ;)
I found this useful in converting Latin characters.

If you want to decode all those &#123; symbols as well...
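function unhtmlentities ($string) {
  $trans_tbl = get_html_translation_table (HTML_ENTITIES);
  $trans_tbl = array_flip ($trans_tbl);
  $ret = strtr ($string, $trans_tbl);
  // The /e modifier was removed in PHP 7.0, so use a callback instead.
  // Note that chr() only covers byte values 0-255.
  return preg_replace_callback('/&#([0-9]+);/',
    function ($m) { return chr((int) $m[1]); }, $ret);
}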

Alan's version didn't seem to work right. If you're having the same problem, consider using this slightly modified version instead:
function unhtmlentities ($string) {
  $trans_tbl = get_html_translation_table (HTML_ENTITIES);
  $trans_tbl = array_flip ($trans_tbl);
  $ret = strtr ($string, $trans_tbl);
  // The /e modifier was removed in PHP 7.0, so use a callback instead.
  // Note that chr() only covers byte values 0-255.
  return preg_replace_callback('/&#(\d+);/',
    function ($m) { return chr((int) $m[1]); }, $ret);
}
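
On current PHP versions the built-in html_entity_decode(), listed under See Also, handles both named and numeric entities, so a hand-written decoder is rarely needed. A minimal sketch with a made-up input string:

```php
<?php
$encoded = 'caf&eacute; &#123;x&#125; &lt;b&gt;'; // hypothetical input

// Decodes named entities and numeric references in one call.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $decoded, "\n"; // café {x} <b>
```

Unlike the chr()-based functions, this also works for code points above 255 when the target encoding is UTF-8.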
