html_entity_decode() - Convert HTML entities to their corresponding characters - PHP string functions

百变鹏仔, 2023-11-21, #Technical tips

html_entity_decode()

(PHP 4 >= 4.3.0, PHP 5, PHP 7)

Convert HTML entities to their corresponding characters

Description

html_entity_decode(string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = ini_get("default_charset")]]): string

html_entity_decode() is the opposite of htmlentities() in that it converts HTML entities in the $string to their corresponding characters.

More precisely, this function decodes all the entities (including all numeric entities) that a) are necessarily valid for the chosen document type (i.e., for XML, this function does not decode named entities that might be defined in some DTD) and b) whose character or characters are in the coded character set associated with the chosen encoding and are permitted in the chosen document type. All other entities are left as is.
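The rule above can be illustrated with a small sketch. The input string and the `&nosuchentity;` name are made up for illustration; the flags and encoding are passed explicitly so the result does not depend on the PHP version's defaults.

```php
<?php
// Illustrative input: a named HTML entity, an XML-predefined entity,
// a numeric entity, and a name that no entity table defines.
$input = '&eacute; &amp; &#233; &nosuchentity;';

// HTML 4.01: named entities from the HTML table are decoded.
$html = html_entity_decode($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// XML 1: only the predefined XML entities and numeric entities are decoded,
// so &eacute; is left as is.
$xml = html_entity_decode($input, ENT_QUOTES | ENT_XML1, 'UTF-8');

echo $html, "\n"; // é & é &nosuchentity;
echo $xml, "\n";  // &eacute; & é &nosuchentity;
```

In both cases the unknown entity is left untouched rather than removed.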

Parameters

$string

The input string.

$flags

A bitmask of one or more of the following flags, which specify how to handle quotes and which document type to use. The default is ENT_COMPAT | ENT_HTML401.

Available $flags constants

Constant Name	Description
`ENT_COMPAT`	Will convert double-quotes and leave single-quotes alone.
`ENT_QUOTES`	Will convert both double and single quotes.
`ENT_NOQUOTES`	Will leave both double and single quotes unconverted.
`ENT_HTML401`	Handle code as HTML 4.01.
`ENT_XML1`	Handle code as XML 1.
`ENT_XHTML`	Handle code as XHTML.
`ENT_HTML5`	Handle code as HTML 5.
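As a rough illustration of how the quote flags and the document type interact, the sketch below decodes one made-up input string with four flag combinations. Note that `&apos;` is not an HTML 4.01 named entity, so it is only decoded under a document type that defines it, such as ENT_HTML5.

```php
<?php
// Illustrative input: double quotes, numeric single quotes, and &apos;.
$input = '&quot;a&quot; &#039;b&#039; &apos;c&apos;';

$compat   = html_entity_decode($input, ENT_COMPAT | ENT_HTML401, 'UTF-8');
$quotes   = html_entity_decode($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$noquotes = html_entity_decode($input, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');
$html5    = html_entity_decode($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $compat, "\n";   // "a" &#039;b&#039; &apos;c&apos;
echo $quotes, "\n";   // "a" 'b' &apos;c&apos;
echo $noquotes, "\n"; // &quot;a&quot; &#039;b&#039; &apos;c&apos;
echo $html5, "\n";    // "a" 'b' 'c'
```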

$encoding

An optional argument defining the encoding used when converting characters.

If omitted, the default value of the $encoding varies depending on the PHP version in use. In PHP 5.6 and later, the default_charset configuration option is used as the default value. PHP 5.4 and 5.5 will use UTF-8 as the default. Earlier versions of PHP use ISO-8859-1.

Although this argument is technically optional, you are highly encouraged to specify the correct value for your code if you are using PHP 5.5 or earlier, or if your default_charset configuration option may be set incorrectly for the given input.
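A short sketch of why the encoding matters: the same entities yield different bytes depending on $encoding, and an entity whose character does not exist in the target character set is left undecoded. The input string is illustrative only.

```php
<?php
// Illustrative input: e-acute exists in both encodings, the Euro sign
// exists in UTF-8 but not in ISO-8859-1.
$input = '&eacute; &euro;';

$utf8   = html_entity_decode($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$latin1 = html_entity_decode($input, ENT_QUOTES | ENT_HTML401, 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // c3a920e282ac
echo bin2hex($latin1), "\n"; // e9 followed by the bytes of " &euro;"
```

Passing the encoding explicitly, as done here, avoids depending on the version-specific default described above.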

The following character sets are supported:

Supported charsets

Charset	Aliases	Description
ISO-8859-1	ISO8859-1	Western European, Latin-1.
ISO-8859-5	ISO8859-5	Little used Cyrillic charset (Latin/Cyrillic).
ISO-8859-15	ISO8859-15	Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8		ASCII compatible multi-byte 8-bit Unicode.
cp866	ibm866, 866	DOS-specific Cyrillic charset. Supported since 4.3.2.
cp1251	Windows-1251, win-1251, 1251	Windows-specific Cyrillic charset. Supported since 4.3.2.
cp1252	Windows-1252, 1252	Windows-specific charset for Western European.
KOI8-R	koi8-ru, koi8r	Russian. Supported since 4.3.2.
BIG5	950	Traditional Chinese, mainly used in Taiwan.
GB2312	936	Simplified Chinese, national standard character set.
BIG5-HKSCS		Big5 with Hong Kong extensions, Traditional Chinese.
Shift_JIS	SJIS, 932	Japanese
EUC-JP	EUCJP	Japanese
MacRoman		Charset that was used by Mac OS.
''		An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended.

Note: Any other character sets are not recognized. The default encoding will be used instead and a warning will be emitted.

Return Values

Returns the decoded string.

Changelog

Version	Description
5.6.0	The default value for the $encoding parameter was changed to be the value of the default_charset configuration option.
5.4.0	Default encoding changed from ISO-8859-1 to UTF-8.
5.4.0	The constants `ENT_HTML401`, `ENT_XML1`, `ENT_XHTML` and `ENT_HTML5` were added.

Examples

Decoding HTML entities
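The example code for this heading is missing from this copy. A minimal round-trip sketch, with an illustrative input string and explicit flags and encoding (so it does not rely on version-specific defaults), might look like this:

```php
<?php
$orig = "I'll \"walk\" the <b>dog</b> now";

// Encode: double quotes and angle brackets become entities (ENT_COMPAT
// leaves the single quote alone).
$encoded = htmlentities($orig, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// Decode with the same flags to get the original string back.
$decoded = html_entity_decode($encoded, ENT_COMPAT | ENT_HTML401, 'UTF-8');

echo $encoded, "\n"; // I'll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt; now
echo $decoded, "\n"; // I'll "walk" the <b>dog</b> now
```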

Notes

Note:

You might wonder why trim(html_entity_decode('&nbsp;')); doesn't reduce the string to an empty string. That's because the '&nbsp;' entity is not ASCII code 32 (which is stripped by trim()) but character code 160 (0xa0) in the ISO 8859-1 encoding (in UTF-8 it is the two-byte sequence 0xC2 0xA0).
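The behaviour described in this note can be reproduced as follows. The regular expression used to strip the non-breaking space is one possible workaround, not something the manual prescribes, and it assumes the string is valid UTF-8.

```php
<?php
$decoded = html_entity_decode('&nbsp;hello&nbsp;', ENT_QUOTES | ENT_HTML401, 'UTF-8');

// trim() only strips ASCII whitespace and NUL by default, so the
// non-breaking spaces (bytes 0xC2 0xA0 in UTF-8) survive.
$trimmed = trim($decoded);

// Assumed workaround: strip ordinary whitespace and U+00A0 at both ends,
// treating the string as UTF-8 via the /u modifier.
$clean = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $decoded);

var_dump($trimmed === $decoded); // bool(true)
var_dump($clean);                // string(5) "hello"
```

A byte-wise character list passed to trim() would also work for this input, but it can cut multi-byte characters that happen to end in the byte 0xA0, which is why the sketch uses a Unicode-aware pattern instead.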

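See Also

htmlentities() - Convert all applicable characters to HTML entities
htmlspecialchars() - Convert special characters to HTML entities
get_html_translation_table() - Returns the translation table used by htmlspecialchars and htmlentities
urldecode() - Decodes URL-encoded string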

If you need something that converts &#[0-9]+ entities to UTF-8, this is simple and works:
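The code that accompanied this comment is not present in this copy. As a stand-in, here is a minimal sketch of one way to do it; the function names `codepointToUtf8` and `decodeDecimalEntities` are hypothetical, and this is not the commenter's original code.

```php
<?php
// Hypothetical helper: encode a Unicode code point as a UTF-8 byte string.
function codepointToUtf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: replace decimal numeric entities (&#NNN;) with UTF-8.
function decodeDecimalEntities(string $text): string
{
    return preg_replace_callback(
        '/&#([0-9]+);/',
        static function (array $m): string {
            $cp = (int) $m[1];
            // Leave surrogates and out-of-range values untouched.
            if ($cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
                return $m[0];
            }
            return codepointToUtf8($cp);
        },
        $text
    );
}

echo decodeDecimalEntities('&#8364; 5 &#38; up'), "\n"; // € 5 & up
```

On current PHP versions, html_entity_decode() with 'UTF-8' as the encoding already decodes numeric entities, so a helper like this is mainly useful when only the numeric forms should be touched.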

Use the following to decode all entities:

I've checked these special entities:
- double quotes (&quot;)
- single quotes (&#039; and &apos;)
- non-printable characters (e.g. a line break written as a numeric entity)
With other $flags some or all won't be decoded.
It seems that ENT_XML1 and ENT_XHTML are identical when decoding.
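The call this comment refers to is missing from this copy. Assuming the intended flags were ENT_QUOTES | ENT_HTML5 (an assumption, since the original snippet is not available), a sketch that decodes quotes in their numeric and named forms plus a line-feed entity could be:

```php
<?php
// Illustrative input: named and numeric quote entities plus a line feed.
$input = '&quot;&#039;&apos;&#10;';

// Assumed flag combination: decode both quote types, HTML5 entity table.
$all = html_entity_decode($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// For comparison: HTML 4.01 has no &apos; entity, so it stays encoded.
$html401 = html_entity_decode($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');

var_dump($all === "\"''\n");          // bool(true)
var_dump($html401 === "\"'&apos;\n"); // bool(true)
```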

I wanted to use this function today and I found the documentation, especially about the flags, not particularly helpful.
Running the code below, for example, failed because the flag I used was the wrong one...
$string = 'Donna&#039;s Bakery';
$title = html_entity_decode($string, ENT_HTML401, 'UTF-8');
echo $title;
The correct flag to use in this case is ENT_QUOTES.
My understanding of the flag to use is the one that would correspond to the expected, converted outcome. So, ENT_QUOTES for a character that would be a single or double quote when converted... and so on.
Please help make the documentation a bit clearer.
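The behaviour in this comment follows from how the bitmask is built: ENT_HTML401 on its own carries no quote flag, so it behaves like ENT_NOQUOTES | ENT_HTML401 and quote entities are left alone. A sketch of the difference, using the string from the comment:

```php
<?php
$string = 'Donna&#039;s Bakery';

// Document-type flag only: no quote flag is set, so &#039; is not decoded.
$unchanged = html_entity_decode($string, ENT_HTML401, 'UTF-8');

// Quote flag and document-type flag combined with |.
$decoded = html_entity_decode($string, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $unchanged, "\n"; // Donna&#039;s Bakery
echo $decoded, "\n";   // Donna's Bakery
```

The quote flags (ENT_COMPAT, ENT_QUOTES, ENT_NOQUOTES) and the document-type flags (ENT_HTML401, ENT_XML1, ENT_XHTML, ENT_HTML5) are independent choices and are usually combined with the | operator.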

This functionality is now implemented in the PEAR package PHP_Compat.
More information about using this function without upgrading your version of PHP can be found on the below link:
http://pear.php.net/package/PHP_Compat

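The following function decodes named and numeric HTML entities and works on UTF-8. Requires iconv.
function decodeHtmlEnt($str) {
  // Decode named entities (and most numeric ones) first.
  $ret = html_entity_decode($str, ENT_COMPAT, 'UTF-8');
  $p2 = -1;
  for (;;) {
    // Find the next remaining numeric entity: "&#" ... ";"
    $p = strpos($ret, '&#', $p2 + 1);
    if ($p === false)
      break;
    $p2 = strpos($ret, ';', $p);
    if ($p2 === false)
      break;

    // Hexadecimal (&#x..;) or decimal (&#..;) code point
    if (strtolower(substr($ret, $p + 2, 1)) == 'x')
      $char = hexdec(substr($ret, $p + 3, $p2 - $p - 3));
    else
      $char = intval(substr($ret, $p + 2, $p2 - $p - 2));

    // Pack the code point as 4 big-endian bytes and convert it to UTF-8.
    $newchar = iconv(
      'UCS-4BE', 'UTF-8',
      chr(($char >> 24) & 0xFF) . chr(($char >> 16) & 0xFF) . chr(($char >> 8) & 0xFF) . chr($char & 0xFF)
    );
    // The source is cut off at this point; the rest is an assumed completion.
    if ($newchar === false)
      continue; // not convertible: leave the entity in place
    $ret = substr_replace($ret, $newchar, $p, $p2 - $p + 1);
    $p2 = $p + strlen($newchar) - 1; // resume right after the inserted character
  }
  return $ret;
}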
