html_entity_decode() - Convert HTML entities to their corresponding characters - PHP string functions
html_entity_decode()
(PHP 4 >= 4.3.0, PHP 5, PHP 7)
Convert HTML entities to their corresponding characters
Description
html_entity_decode(string $string[, int $flags = ENT_COMPAT | ENT_HTML401[, string $encoding = ini_get("default_charset")]]): string
html_entity_decode() is the opposite of htmlentities() in that it converts HTML entities in the $string to their corresponding characters.
More precisely, this function decodes all the entities (including all numeric entities) that a) are necessarily valid for the chosen document type (i.e., for XML, this function does not decode named entities that might be defined in some DTD) and b) whose character or characters are in the coded character set associated with the chosen encoding and are permitted in the chosen document type. All other entities are left as is.
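As a rough illustration of the two conditions above, the following sketch decodes the same input under three document types. The sample string is made up for this example, and the flags and encoding are passed explicitly so the result does not depend on version-specific defaults.

```php
<?php
// Hypothetical sample input: a named HTML entity, the XML-style
// apostrophe entity, and a numeric entity.
$input = '&eacute; &apos; &#233;';

// HTML 4.01 defines &eacute; but not &apos;, so &apos; is left as is.
$html401 = html_entity_decode($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// XML 1 only has its predefined entities, so &eacute; is left as is.
$xml1 = html_entity_decode($input, ENT_QUOTES | ENT_XML1, 'UTF-8');

// HTML 5 defines both named entities.
$html5 = html_entity_decode($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $html401, "\n"; // é &apos; é
echo $xml1, "\n";    // &eacute; ' é
echo $html5, "\n";   // é ' é
```

The numeric entity is decoded in all three cases, while each named entity is decoded only where the chosen document type defines it.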
Parameters
$string: The input string.
$flags: A bitmask of one or more of the following flags, which specify how to handle quotes and which document type to use. The default is ENT_COMPAT | ENT_HTML401.
Constant Name | Description |
---|---|
ENT_COMPAT | Will convert double-quotes and leave single-quotes alone. |
ENT_QUOTES | Will convert both double and single quotes. |
ENT_NOQUOTES | Will leave both double and single quotes unconverted. |
ENT_HTML401 | Handle code as HTML 4.01. |
ENT_XML1 | Handle code as XML 1. |
ENT_XHTML | Handle code as XHTML. |
ENT_HTML5 | Handle code as HTML 5. |
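The quote-handling flags in the table can be compared side by side. This is a minimal sketch with a made-up input string; the document type and encoding are fixed so that only the quote flag varies.

```php
<?php
// Hypothetical sample input containing both kinds of quote entities.
$input = '&quot;double&quot; and &#039;single&#039;';

$compat   = html_entity_decode($input, ENT_COMPAT | ENT_HTML401, 'UTF-8');
$quotes   = html_entity_decode($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$noquotes = html_entity_decode($input, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');

echo $compat, "\n";   // "double" and &#039;single&#039;
echo $quotes, "\n";   // "double" and 'single'
echo $noquotes, "\n"; // &quot;double&quot; and &#039;single&#039;
```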
$encoding: An optional argument defining the encoding used when converting characters.
If omitted, the default value of the $encoding varies depending on the PHP version in use. In PHP 5.6 and later, the default_charset configuration option is used as the default value. PHP 5.4 and 5.5 will use UTF-8 as the default. Earlier versions of PHP use ISO-8859-1.
Although this argument is technically optional, you are highly encouraged to specify the correct value for your code if you are using PHP 5.5 or earlier, or if your default_charset configuration option may be set incorrectly for the given input.
The following character sets are supported:
Charset | Aliases | Description |
---|---|---|
ISO-8859-1 | ISO8859-1 | Western European, Latin-1. |
ISO-8859-5 | ISO8859-5 | Little used Cyrillic charset (Latin/Cyrillic). |
ISO-8859-15 | ISO8859-15 | Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1). |
UTF-8 | | ASCII compatible multi-byte 8-bit Unicode. |
cp866 | ibm866, 866 | DOS-specific Cyrillic charset. This charset is supported in 4.3.2. |
cp1251 | Windows-1251, win-1251, 1251 | Windows-specific Cyrillic charset. This charset is supported in 4.3.2. |
cp1252 | Windows-1252, 1252 | Windows-specific charset for Western European. |
KOI8-R | koi8-ru, koi8r | Russian. This charset is supported in 4.3.2. |
BIG5 | 950 | Traditional Chinese, mainly used in Taiwan. |
GB2312 | 936 | Simplified Chinese, national standard character set. |
BIG5-HKSCS | | Traditional Chinese, Big5 with Hong Kong extensions. |
Shift_JIS | SJIS, 932 | Japanese |
EUC-JP | EUCJP | Japanese |
MacRoman | | Charset that was used by Mac OS. |
'' | | An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended. |
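To make the effect of $encoding concrete, the sketch below decodes the same entity into two target encodings and inspects the resulting bytes. It also shows an entity whose character does not exist in the target character set; the inputs are chosen for illustration only.

```php
<?php
// The same entity yields different bytes depending on the target encoding.
$latin1 = html_entity_decode('&eacute;', ENT_QUOTES | ENT_HTML401, 'ISO-8859-1');
$utf8   = html_entity_decode('&eacute;', ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo bin2hex($latin1), "\n"; // e9
echo bin2hex($utf8), "\n";   // c3a9

// The Euro sign is not part of ISO-8859-1, so the entity is left as is.
$euro = html_entity_decode('&euro;', ENT_QUOTES | ENT_HTML401, 'ISO-8859-1');
echo $euro, "\n"; // &euro;
```

Passing the encoding that matches the rest of the output avoids mixing single-byte and multi-byte representations of the same character.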
Return Values
Returns the decoded string.
Changelog
Version | Description |
---|---|
5.6.0 | The default value for the $encoding parameter was changed to be the value of the default_charset configuration option. |
5.4.0 | Default encoding changed from ISO-8859-1 to UTF-8. |
5.4.0 | The constants ENT_HTML401, ENT_XML1, ENT_XHTML and ENT_HTML5 were added. |
Examples
Decoding HTML entities
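The example code for this heading is missing from this copy of the page. The following is a minimal round-trip sketch, not necessarily the original example: a string is encoded with htmlentities() and decoded again. The sample text and variable names are assumptions, and the flags are passed explicitly so both calls agree on quote handling.

```php
<?php
// Hypothetical sample text with both quote styles and some markup.
$orig = "I'll \"walk\" the <b>dog</b> now";

$encoded = htmlentities($orig, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $encoded, "\n"; // I&#039;ll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt; now
echo $decoded, "\n"; // I'll "walk" the <b>dog</b> now
```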
Notes
Note:
You might wonder why trim(html_entity_decode('&nbsp;')); doesn't reduce the string to an empty string. That's because the '&nbsp;' entity is not ASCII code 32 (which is stripped by trim()) but character code 160 (0xa0) in ISO 8859-1, which is the byte sequence 0xC2 0xA0 in UTF-8.
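The behavior described in this note can be observed directly. The sketch below assumes UTF-8 output and uses a Unicode-aware regular expression as one possible way to strip the non-breaking space; other approaches exist.

```php
<?php
$decoded = html_entity_decode('&nbsp;', ENT_QUOTES | ENT_HTML401, 'UTF-8');

// In UTF-8 the non-breaking space is two bytes, neither of which
// is in trim()'s default character list.
echo bin2hex($decoded), "\n";    // c2a0
var_dump(trim($decoded) === ''); // bool(false)

// One option: strip ordinary whitespace and U+00A0 from both ends.
$stripped = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $decoded);
var_dump($stripped === '');      // bool(true)
```

A byte-based character list passed to trim() is avoided here because it could cut a multi-byte UTF-8 character in half.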
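See Also
htmlentities() - Convert characters to HTML entities
htmlspecialchars() - Convert special characters to HTML entities
get_html_translation_table() - Returns the translation table used by htmlspecialchars and htmlentities
urldecode() - Decodes a URL-encoded string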
If you need something that converts &#[0-9]+ entities to UTF-8, this is simple and works:
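The code that belonged to this note is not present in this copy. As a stand-in, here is a hedged sketch of the idea using preg_replace_callback(); the function names codepointToUtf8 and decodeDecimalEntities are hypothetical and not taken from the original note. On current PHP versions html_entity_decode() with a UTF-8 encoding already decodes numeric entities, so a helper like this is mainly of interest for illustration.

```php
<?php
// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
function codepointToUtf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: replace decimal references such as &#8364;
// with the corresponding UTF-8 bytes.
function decodeDecimalEntities(string $input): string
{
    return preg_replace_callback(
        '/&#([0-9]+);/',
        static function (array $m): string {
            $cp = (int) $m[1];
            // Leave surrogates and out-of-range values untouched.
            if ($cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
                return $m[0];
            }
            return codepointToUtf8($cp);
        },
        $input
    );
}

echo decodeDecimalEntities('Price: 5 &#8364;, caf&#233;'), "\n"; // Price: 5 €, café
```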
Use the following to decode all entities:
I've checked these special entities:
- double quotes (")
- single quotes (' and ')
- non-printable characters (e.g. )
With other $flags some or all won't be decoded. It seems that ENT_XML1 and ENT_XHTML are identical when decoding.
I wanted to use this function today and found the documentation, especially about the flags, not particularly helpful. Running the code below, for example, failed because I used the wrong flag: $string = 'Donna&#039;s Bakery'; $title = html_entity_decode($string, ENT_HTML401, 'UTF-8'); echo $title; The correct flag to use in this case is ENT_QUOTES. As I understand it, the flag to use is the one that corresponds to the expected converted outcome: ENT_QUOTES for an entity that becomes a single or double quote when converted, and so on. Please help make the documentation a bit clearer.
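One way to see why the call in this note fails is to look at $flags as a bitmask. ENT_HTML401 only selects the document type and has the same numeric value as ENT_NOQUOTES, so passing it alone replaces the default ENT_COMPAT | ENT_HTML401 with a mask that decodes no quotes at all. The sketch below assumes the apostrophe was written as &#039; in the note's string, since the exact entity is not preserved in this copy.

```php
<?php
// Assumed form of the note's input; the original entity is not preserved.
$string = 'Donna&#039;s Bakery';

// Both constants are 0, so ENT_HTML401 alone means "no quote decoding".
var_dump(ENT_HTML401 === ENT_NOQUOTES); // bool(true)

$unchanged = html_entity_decode($string, ENT_HTML401, 'UTF-8');
$decoded   = html_entity_decode($string, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $unchanged, "\n"; // Donna&#039;s Bakery
echo $decoded, "\n";   // Donna's Bakery
```

Combining one quote flag with one document-type flag using | keeps both choices explicit.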
This functionality is now implemented in the PEAR package PHP_Compat. More information about using this function without upgrading your version of PHP can be found on the below link: http://pear.php.net/package/PHP_Compat
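The following function decodes named and numeric HTML entities and works on UTF-8. Requires iconv.
function decodeHtmlEnt($str) {
    // Decode named entities first, then handle the numeric ones by hand.
    $ret = html_entity_decode($str, ENT_COMPAT, 'UTF-8');
    $p2 = -1;
    for (;;) {
        // Find the next numeric entity: "&#" up to the following ";".
        $p = strpos($ret, '&#', $p2 + 1);
        if ($p === FALSE) break;
        $p2 = strpos($ret, ';', $p);
        if ($p2 === FALSE) break;
        // "&#x..." is hexadecimal, "&#..." is decimal.
        if (substr($ret, $p + 2, 1) == 'x')
            $char = hexdec(substr($ret, $p + 3, $p2 - $p - 3));
        else
            $char = intval(substr($ret, $p + 2, $p2 - $p - 2));
        // Pack the code point as 4 big-endian bytes and convert to UTF-8.
        $newchar = iconv(
            'UCS-4BE', 'UTF-8',
            chr(($char >> 24) & 0xFF) . chr(($char >> 16) & 0xFF) . chr(($char >> 8) & 0xFF) . chr($char & 0xFF)
        );
        // The source is cut off at this point; the remaining lines are an assumed completion.
        $ret = substr_replace($ret, $newchar, $p, $p2 - $p + 1);
        $p2 = $p + strlen($newchar) - 1;
    }
    return $ret;
}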