
html_entity_decode() - Convert HTML entities to their corresponding characters - PHP string functions

百变鹏仔, 1 year ago (2023-11-21), 26 views, #Technical Tips


(PHP 4 >= 4.3.0, PHP 5, PHP 7)



html_entity_decode(string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = ini_get("default_charset") ]]): string

html_entity_decode() is the opposite of htmlentities() in that it converts HTML entities in the $string to their corresponding characters.

More precisely, this function decodes all the entities (including all numeric entities) that (a) are necessarily valid for the chosen document type (i.e., for XML, this function does not decode named entities that might be defined in some DTD) and (b) whose character or characters are in the coded character set associated with the chosen encoding and are permitted in the chosen document type. All other entities are left as is.
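
As a minimal sketch of conditions (a) and (b), the snippet below decodes the same input under different document types and encodings. The variable names are illustrative, and the flags are passed explicitly because the defaults depend on the PHP version.

```php
<?php
$flagsHtml = ENT_QUOTES | ENT_HTML401;
$flagsXml  = ENT_QUOTES | ENT_XML1;

// (a) &copy; is a named entity in HTML 4.01 but is not one of XML's predefined entities,
//     so under ENT_XML1 only the numeric form is decoded.
$asHtml = html_entity_decode('&copy; &#169;', $flagsHtml, 'UTF-8');
$asXml  = html_entity_decode('&copy; &#169;', $flagsXml, 'UTF-8');

// (b) The Euro sign can be represented in UTF-8 but not in ISO-8859-1,
//     so with ISO-8859-1 the entity is left as is.
$euroUtf8   = html_entity_decode('&euro;', $flagsHtml, 'UTF-8');
$euroLatin1 = html_entity_decode('&euro;', $flagsHtml, 'ISO-8859-1');

echo $asHtml, "\n", $asXml, "\n", $euroUtf8, "\n", $euroLatin1, "\n";
```
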



$string: The input string.


$flags: A bitmask of one or more of the following flags, which specify how to handle quotes and which document type to use. The default is ENT_COMPAT | ENT_HTML401.

Constant Name   Description
ENT_COMPAT      Will convert double-quotes and leave single-quotes alone.
ENT_QUOTES      Will convert both double and single quotes.
ENT_NOQUOTES    Will leave both double and single quotes unconverted.
ENT_HTML401     Handle code as HTML 4.01.
ENT_XML1        Handle code as XML 1.
ENT_XHTML       Handle code as XHTML.
ENT_HTML5       Handle code as HTML 5.
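
To make the three quote flags concrete, here is a short sketch that decodes one input string with each of them. The sample string and variable names are illustrative only.

```php
<?php
$input = '&quot;double&quot; and &#039;single&#039;';

$compat   = html_entity_decode($input, ENT_COMPAT | ENT_HTML401, 'UTF-8');
$quotes   = html_entity_decode($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$noQuotes = html_entity_decode($input, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');

echo $compat, "\n";   // "double" and &#039;single&#039;
echo $quotes, "\n";   // "double" and 'single'
echo $noQuotes, "\n"; // &quot;double&quot; and &#039;single&#039;
```
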

$encoding: An optional argument defining the encoding used when converting characters.

If omitted, the default value of $encoding varies depending on the PHP version in use. In PHP 5.6 and later, the default_charset configuration option is used as the default value. PHP 5.4 and 5.5 will use UTF-8 as the default. Earlier versions of PHP use ISO-8859-1.

Although this argument is technically optional, you are highly encouraged to specify the correct value for your code if you are using PHP 5.5 or earlier, or if your default_charset configuration option may be set incorrectly for the given input.
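
The effect of the encoding argument is easiest to see in the bytes that come back. The following sketch passes the encoding explicitly and prints the result in hexadecimal. The sample entities are arbitrary choices for illustration.

```php
<?php
// The configured default, which is what PHP 5.6 and later use when $encoding is omitted.
echo ini_get('default_charset'), "\n";

$flags = ENT_QUOTES | ENT_HTML401;

// The same entity decodes to different byte sequences depending on the target encoding.
$utf8   = bin2hex(html_entity_decode('&eacute;', $flags, 'UTF-8'));
$latin1 = bin2hex(html_entity_decode('&eacute;', $flags, 'ISO-8859-1'));

// U+0416 (Cyrillic capital letter Zhe) as a single byte in cp1251.
$cp1251 = bin2hex(html_entity_decode('&#1046;', $flags, 'cp1251'));

echo $utf8, "\n";   // c3a9
echo $latin1, "\n"; // e9
echo $cp1251, "\n"; // c6
```
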


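Charset       Aliases                        Description
ISO-8859-5    ISO8859-5                      Little used Cyrillic charset (Latin/Cyrillic).
ISO-8859-15   ISO8859-15                     Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8         -                              ASCII compatible multi-byte 8-bit Unicode.
cp866         ibm866, 866                    DOS-specific Cyrillic charset. This charset is supported in 4.3.2.
cp1251        Windows-1251, win-1251, 1251   Windows-specific Cyrillic charset. This charset is supported in 4.3.2.
cp1252        Windows-1252, 1252             Windows-specific charset for Western European.
KOI8-R        koi8-ru, koi8r                 Russian. This charset is supported in 4.3.2.
BIG5-HKSCS    -                              Big5 with Hong Kong extensions, Traditional Chinese.
Shift_JIS     SJIS, 932                      Japanese.
MacRoman      -                              Charset that was used by Mac OS.
''            -                              An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended.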



Returns the decoded string.


Version   Description
5.6.0     The default value for the $encoding parameter was changed to be the value of the default_charset configuration option.
5.4.0     Default encoding changed from ISO-8859-1 to UTF-8.
5.4.0     The constants ENT_HTML401, ENT_XML1, ENT_XHTML and ENT_HTML5 were added.


Decoding HTML entities
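
A minimal round-trip sketch under this heading: a string is encoded with htmlentities() and decoded again. The sample sentence and variable names are illustrative, and the flags and encoding are spelled out so the result does not depend on version-specific defaults.

```php
<?php
$orig = "I'll \"walk\" the <b>dog</b> now";

$encoded = htmlentities($orig, ENT_COMPAT | ENT_HTML401, 'UTF-8');
$decoded = html_entity_decode($encoded, ENT_COMPAT | ENT_HTML401, 'UTF-8');

echo $encoded, "\n"; // I'll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt; now
echo $decoded, "\n"; // I'll "walk" the <b>dog</b> now
```
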



You might wonder why trim(html_entity_decode('&nbsp;')); doesn't reduce the string to an empty string. That's because the '&nbsp;' entity is not ASCII code 32 (which is stripped by trim()) but code 160 (0xa0) in the ISO 8859-1 encoding, which was the default before PHP 5.4. In UTF-8 the same character is the byte pair 0xc2 0xa0.
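
The behaviour described in this note can be checked directly. The sketch below also shows one possible way to strip no-break spaces in UTF-8 text. The helper name trimWithNbsp is hypothetical, not part of PHP.

```php
<?php
$decoded = html_entity_decode('&nbsp;', ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo bin2hex($decoded), "\n"; // c2a0
// trim() leaves the no-break space in place, so the result is not empty.
$trimmed = trim($decoded);

// Hypothetical helper: trims ordinary whitespace and U+00A0 from both ends.
// The /u modifier makes the pattern operate on UTF-8 characters rather than bytes.
function trimWithNbsp(string $text): string
{
    return preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $text) ?? $text;
}

echo '[', trimWithNbsp("\u{00A0} hello \u{00A0}"), "]\n"; // [hello]
```
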


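  • htmlentities() - Convert all applicable characters to HTML entities
  • htmlspecialchars() - Convert special characters to HTML entities
  • get_html_translation_table() - Returns the translation table used by htmlspecialchars() and htmlentities()
  • urldecode() - Decodes a URL-encoded string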

If you need something that converts &#[0-9]+ entities to UTF-8, this is simple and works:
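
One way such a conversion might look is sketched below. This is not necessarily the code the note originally carried. The function name decodeDecimalEntities is hypothetical, and the sketch assumes the mbstring extension is available for mb_chr().

```php
<?php
// Hypothetical helper: replaces each decimal entity (&#NNN;) with its UTF-8 character.
function decodeDecimalEntities(string $input): string
{
    $result = preg_replace_callback(
        '/&#([0-9]+);/',
        static function (array $m): string {
            // mb_chr() returns false for values that are not valid code points;
            // in that case the entity is left untouched.
            $char = mb_chr((int) $m[1], 'UTF-8');
            return $char === false ? $m[0] : $char;
        },
        $input
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $input;
}

echo decodeDecimalEntities('Fovi&#269;'), "\n"; // Fovič
```

Named entities such as &amp; are deliberately left alone by this pattern, since it only matches the decimal numeric form.
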

Use the following to decode all entities:

I've checked these special entities:
- double quotes (&#34;)
- single quotes (&#39; and &apos;)
- non-printable chars (e.g. &#13;)
With other $flags some or all won't be decoded.
It seems that ENT_XML1 and ENT_XHTML are identical when decoding.
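
The note above names ENT_XML1 and ENT_XHTML, so the sketch below assumes a call that combines ENT_QUOTES with one of those document types. That combination is an assumption, not something this text states. It also compares the two document types on a named entity, where they appear to differ.

```php
<?php
$input = '&#34; &#39; &apos;';

$xml   = html_entity_decode($input, ENT_QUOTES | ENT_XML1, 'UTF-8');
$xhtml = html_entity_decode($input, ENT_QUOTES | ENT_XHTML, 'UTF-8');
$html4 = html_entity_decode($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $xml, "\n";   // " ' '
echo $xhtml, "\n"; // " ' '
echo $html4, "\n"; // " ' &apos;   (&apos; is not an HTML 4.01 entity)

// For named entities outside XML's predefined set, the two document types differ:
// ENT_XML1 leaves &eacute; alone, while ENT_XHTML decodes it.
$namedXml   = html_entity_decode('&eacute;', ENT_QUOTES | ENT_XML1, 'UTF-8');
$namedXhtml = html_entity_decode('&eacute;', ENT_QUOTES | ENT_XHTML, 'UTF-8');
echo $namedXml, ' ', $namedXhtml, "\n";
```
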

I wanted to use this function today and found the documentation, especially the part about the flags, not particularly helpful.
Running the code below, for example, failed because the flag I used was the wrong one:
$string = 'Donna&#039;s Bakery';
$title = html_entity_decode($string, ENT_HTML401, 'UTF-8');
echo $title;
The correct flag to use in this case is ENT_QUOTES.
My understanding is that the flag to use is the one that corresponds to the expected converted outcome: ENT_QUOTES for an entity that becomes a single or double quote when converted, and so on.
Please help make the documentation a bit clearer.
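
A likely reason the call in this note left the entity untouched is that the flags form a bitmask: ENT_HTML401 on its own has the value 0, the same as ENT_NOQUOTES, so no quote handling is selected at all. The following sketch illustrates that reading. The variable names are illustrative.

```php
<?php
// Quote flags and document-type flags occupy different bits and are meant to be combined.
printf("ENT_HTML401=%d ENT_NOQUOTES=%d ENT_COMPAT=%d ENT_QUOTES=%d\n",
    ENT_HTML401, ENT_NOQUOTES, ENT_COMPAT, ENT_QUOTES);

$string = 'Donna&#039;s Bakery';

// Document type only: equivalent to ENT_NOQUOTES | ENT_HTML401, quotes stay encoded.
$docTypeOnly = html_entity_decode($string, ENT_HTML401, 'UTF-8');

// Quote flag combined with the document type: the single quote is decoded.
$withQuotes = html_entity_decode($string, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $docTypeOnly, "\n"; // Donna&#039;s Bakery
echo $withQuotes, "\n";  // Donna's Bakery
```
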

This functionality is now implemented in the PEAR package PHP_Compat.
More information about using this function without upgrading your version of PHP can be found at the link below:
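
The following function decodes named and numeric HTML entities and works on UTF-8. Requires iconv.
function decodeHtmlEnt($str) {
  $ret = html_entity_decode($str, ENT_COMPAT, 'UTF-8');
  $p2 = -1;
  for (;;) {
    // Find the next numeric entity that html_entity_decode() left in place.
    $p = strpos($ret, '&#', $p2 + 1);
    if ($p === false) break;
    $p2 = strpos($ret, ';', $p);
    if ($p2 === false) break;
    // Hexadecimal (&#x..;) or decimal (&#..;) code point.
    if (substr($ret, $p + 2, 1) == 'x')
      $char = hexdec(substr($ret, $p + 3, $p2 - $p - 3));
    else
      $char = intval(substr($ret, $p + 2, $p2 - $p - 2));
    // Pack the code point as big-endian UCS-4 and let iconv produce UTF-8.
    $newchar = iconv('UCS-4', 'UTF-8', pack('N', $char));
    // The source listing is truncated from this point; the remaining lines are a reconstruction.
    if ($newchar === false) { $p2 = $p + 1; continue; } // not convertible: skip this entity
    $ret = substr_replace($ret, $newchar, $p, 1 + $p2 - $p);
    // Resume the search immediately after the inserted character.
    $p2 = $p + strlen($newchar) - 1;
  }
  return $ret;
}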


