htmlspecialchars() - Convert special characters to HTML entities - PHP string functions

乐乐 · 2023-11-21 · #Tech tips

htmlspecialchars()

(PHP 4, PHP 5, PHP 7)

Convert special characters to HTML entities

Description

htmlspecialchars(string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = ini_get("default_charset") [, bool $double_encode = TRUE ]]]) : string

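Certain characters have special significance in HTML and must be represented by HTML entities if they are to keep their literal meaning. This function returns the string with those conversions made. If you need every input substring that has an associated named entity to be translated, use htmlentities() instead.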

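If the input string and the final document share the same character encoding, this function is sufficient to prepare input for inclusion in most contexts of an HTML document. If, however, the input can contain characters that are not representable in the final document's encoding and you want to retain them (as numeric or named entities), both this function and htmlentities() (which only encodes substrings that have named entity equivalents) may be insufficient. In that case, use mb_encode_numericentity() instead.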

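Performed translations

Character	Replacement
& (ampersand)	&amp;
" (double quote)	&quot;, unless `ENT_NOQUOTES` is set
' (single quote)	&#039; (for `ENT_HTML401`) or &apos; (for `ENT_XML1`, `ENT_XHTML` or `ENT_HTML5`), but only when `ENT_QUOTES` is set
< (less than)	&lt;
> (greater than)	&gt;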
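
The replacements in the table can be checked with a short sketch. The sample string and variable names are made up for this illustration, and the flags and encoding are passed explicitly so the result does not depend on version-specific defaults.

```php
<?php
// Hypothetical sample input containing all five special characters.
$input = "Tom & \"Jerry\" <b>it's</b>";

// With ENT_QUOTES | ENT_HTML401 the single quote becomes &#039;
$html401 = htmlspecialchars($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');
echo $html401, PHP_EOL;
// Tom &amp; &quot;Jerry&quot; &lt;b&gt;it&#039;s&lt;/b&gt;

// With ENT_QUOTES | ENT_HTML5 the single quote becomes &apos;
$html5 = htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $html5, PHP_EOL;
// Tom &amp; &quot;Jerry&quot; &lt;b&gt;it&apos;s&lt;/b&gt;
```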

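Parameters

$string

The string being converted.

$flags

A bitmask of one or more of the following flags, which specify how to handle quotes, invalid code unit sequences and the document type used. The default is ENT_COMPAT | ENT_HTML401.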

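Available $flags constants

Constant name	Description
`ENT_COMPAT`	Converts double quotes and leaves single quotes alone.
`ENT_QUOTES`	Converts both double and single quotes.
`ENT_NOQUOTES`	Leaves both double and single quotes unconverted.
`ENT_IGNORE`	Silently discards invalid code unit sequences instead of returning an empty string. Using this flag is discouraged because it may have security implications.
`ENT_SUBSTITUTE`	Replaces invalid code unit sequences with the Unicode Replacement Character, U+FFFD (UTF-8) or &#xFFFD; (otherwise), instead of returning an empty string.
`ENT_DISALLOWED`	Replaces code points that are invalid for the given document type with the Unicode Replacement Character, U+FFFD (UTF-8) or &#xFFFD; (otherwise), instead of leaving them as is. This can be useful, for instance, to keep an XML document well-formed when it embeds external content.
`ENT_HTML401`	Handle code as HTML 4.01.
`ENT_XML1`	Handle code as XML 1.
`ENT_XHTML`	Handle code as XHTML.
`ENT_HTML5`	Handle code as HTML 5.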
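
To make the three quote-handling flags concrete, here is a small sketch that runs the same string through each of them. The sample string and variable names are invented for the illustration.

```php
<?php
// Hypothetical input mixing single and double quotes.
$s = "<a href='x' title=\"y\">";

$compat = htmlspecialchars($s, ENT_COMPAT, 'UTF-8');    // double quotes only
$quotes = htmlspecialchars($s, ENT_QUOTES, 'UTF-8');    // both kinds of quotes
$none   = htmlspecialchars($s, ENT_NOQUOTES, 'UTF-8');  // no quotes at all

echo $compat, PHP_EOL; // &lt;a href='x' title=&quot;y&quot;&gt;
echo $quotes, PHP_EOL; // &lt;a href=&#039;x&#039; title=&quot;y&quot;&gt;
echo $none, PHP_EOL;   // &lt;a href='x' title="y"&gt;
```

When the result goes inside an HTML attribute that may be delimited by single quotes, ENT_QUOTES is the only one of the three that escapes both delimiters.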

$encoding

An optional argument defining the encoding used when converting characters.

If omitted, the default value of $encoding varies depending on the PHP version in use. In PHP 5.6 and later, the default_charset configuration option is used as the default value. PHP 5.4 and 5.5 use UTF-8 as the default. Earlier versions of PHP use ISO-8859-1.

Although this argument is technically optional, you are highly encouraged to specify the correct value for your code if you are using PHP 5.5 or earlier, or if your default_charset configuration option may be set incorrectly for the given input.
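
One common way to follow this advice is to wrap the call so the flags and encoding are always stated. The sketch below assumes the application works in UTF-8; the helper name e() is hypothetical and not part of PHP, and the type declarations require PHP 7 or later.

```php
<?php
// Hypothetical helper: always escapes with explicit flags and encoding,
// so the result does not depend on the PHP version or on default_charset.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo '<p title="', e('5 > 3 & "quoted"'), '">', e("<script>alert('x')</script>"), '</p>', PHP_EOL;
// <p title="5 &gt; 3 &amp; &quot;quoted&quot;">&lt;script&gt;alert(&#039;x&#039;)&lt;/script&gt;</p>
```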

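For the purposes of this function, the encodings ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252 and KOI8-R are effectively equivalent, provided $string is valid in the chosen encoding, because the characters affected by htmlspecialchars() occupy the same positions in all of them.

The following character sets are supported:

Supported charsets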

Charset	Aliases	Description
ISO-8859-1	ISO8859-1	Western European, Latin-1.
ISO-8859-5	ISO8859-5	Little used Cyrillic charset (Latin/Cyrillic).
ISO-8859-15	ISO8859-15	Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8		ASCII compatible multi-byte 8-bit Unicode.
cp866	ibm866, 866	DOS-specific Cyrillic charset. This charset is supported in 4.3.2.
cp1251	Windows-1251, win-1251, 1251	Windows-specific Cyrillic charset. This charset is supported in 4.3.2.
cp1252	Windows-1252, 1252	Windows-specific charset for Western European.
KOI8-R	koi8-ru, koi8r	Russian. This charset is supported in 4.3.2.
BIG5	950	Traditional Chinese, mainly used in Taiwan.
GB2312	936	Simplified Chinese, national standard character set.
BIG5-HKSCS		Big5 with Hong Kong extensions, Traditional Chinese.
Shift_JIS	SJIS, 932	Japanese
EUC-JP	EUCJP	Japanese
MacRoman		Charset that was used by Mac OS.
''		An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended.

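Note: Any other character sets are not recognized. The default encoding will be used instead and a warning will be emitted.

$double_encode

When $double_encode is turned off, PHP will not encode existing HTML entities. The default is to convert everything.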
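
The effect of $double_encode is easiest to see on a string that already contains an entity. This is a minimal sketch with an invented sample string.

```php
<?php
// Hypothetical input: one existing entity (&amp;) plus a bare ampersand and a tag.
$s = 'Fish &amp; chips & <b>';

// Default ($double_encode = true): the existing &amp; is encoded again.
$twice = htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
echo $twice, PHP_EOL; // Fish &amp;amp; chips &amp; &lt;b&gt;

// $double_encode = false: the existing entity is left alone,
// while the bare ampersand and the tag are still escaped.
$once = htmlspecialchars($s, ENT_QUOTES, 'UTF-8', false);
echo $once, PHP_EOL;  // Fish &amp; chips &amp; &lt;b&gt;
```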

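Return Values

The converted string.

If $string contains an invalid code unit sequence within the given $encoding, an empty string is returned, unless either the ENT_IGNORE or the ENT_SUBSTITUTE flag is set.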
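
The empty-string behaviour can be reproduced with a single stray byte. In this sketch the input is deliberately not valid UTF-8; the sample string and variable names are assumptions made for the illustration.

```php
<?php
// \xE9 is "é" in ISO-8859-1, but on its own it is an invalid sequence in UTF-8.
$bad = "caf\xE9 <b>";

// Without ENT_IGNORE or ENT_SUBSTITUTE the whole result is an empty string.
$strict = htmlspecialchars($bad, ENT_QUOTES, 'UTF-8');
var_dump($strict === ''); // bool(true)

// With ENT_SUBSTITUTE the invalid byte is replaced by U+FFFD
// (bytes EF BF BD in UTF-8) and the rest of the string is escaped as usual.
$substituted = htmlspecialchars($bad, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
var_dump($substituted === "caf\xEF\xBF\xBD &lt;b&gt;"); // bool(true)
```

An empty return value from a non-empty input is therefore a useful signal that the data and the declared encoding do not match.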

Changelog

Version	Description
5.6.0	The default value for the $encoding parameter was changed to be the value of the default_charset configuration option.
5.4.0	The default value for the $encoding parameter was changed to UTF-8.
5.4.0	The constants `ENT_SUBSTITUTE`, `ENT_DISALLOWED`, `ENT_HTML401`, `ENT_XML1`, `ENT_XHTML` and `ENT_HTML5` were added.
5.3.0	The constant `ENT_IGNORE` was added.
5.2.3	The $double_encode parameter was added.

Examples

Example #1 htmlspecialchars() example
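
As a minimal sketch of the kind of call this example demonstrates (the input string and the explicit 'UTF-8' argument are choices made for this illustration):

```php
<?php
// Escape a small HTML fragment so it is displayed as text instead of being rendered.
$new = htmlspecialchars("<a href='test'>Test</a>", ENT_QUOTES, 'UTF-8');
echo $new, PHP_EOL; // &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt;
```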

Notes

Note:

This function does not translate anything beyond the characters listed above. For full entity translation, see htmlentities().
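
The difference from htmlentities() shows up as soon as the input contains characters that have named entities but are not among the special characters. This is a sketch with an invented sample; the non-ASCII characters are written as UTF-8 byte escapes so the result does not depend on how the script file is saved.

```php
<?php
// "café © <b>" as UTF-8 bytes: \xC3\xA9 is "é", \xC2\xA9 is "©".
$s = "caf\xC3\xA9 \xC2\xA9 <b>";

// htmlspecialchars() only touches the special characters.
$special = htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
var_dump($special === "caf\xC3\xA9 \xC2\xA9 &lt;b&gt;"); // bool(true)

// htmlentities() also converts every character that has a named entity.
$entities = htmlentities($s, ENT_QUOTES, 'UTF-8');
echo $entities, PHP_EOL; // caf&eacute; &copy; &lt;b&gt;
```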

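Note:

If the $flags value is ambiguous, the following rules apply:

When none of ENT_COMPAT, ENT_QUOTES, ENT_NOQUOTES is present, the default is ENT_COMPAT.
If more than one of ENT_COMPAT, ENT_QUOTES, ENT_NOQUOTES is present, ENT_QUOTES takes precedence, followed by ENT_COMPAT.
When none of ENT_HTML401, ENT_HTML5, ENT_XHTML, ENT_XML1 is present, the default is ENT_HTML401.
If more than one of ENT_HTML401, ENT_HTML5, ENT_XHTML, ENT_XML1 is present, ENT_HTML5 takes precedence, followed by ENT_XHTML, then ENT_HTML401.
If more than one of ENT_DISALLOWED, ENT_IGNORE, ENT_SUBSTITUTE is present, ENT_IGNORE takes precedence, followed by ENT_SUBSTITUTE.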

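See Also

get_html_translation_table() - Returns the translation table used by htmlspecialchars() and htmlentities()
htmlspecialchars_decode() - Converts special HTML entities back to characters
strip_tags() - Strips HTML and PHP tags from a string
htmlentities() - Converts all applicable characters to HTML entities
nl2br() - Inserts HTML line breaks before all newlines in a string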

As of PHP 5.4 they changed the default encoding from "ISO-8859-1" to "UTF-8". So if you get an empty result from htmlspecialchars or htmlentities where you have only passed the string, you can fix it by passing the encoding explicitly as the third argument.

On Linux you can find the scripts you need to fix with
grep -RlE "htmlspecialchars|htmlentities" /path/to/php/scripts/

Unfortunately, as far as I can tell, the PHP devs did not provide ANY way to set the default encoding used by htmlspecialchars() or htmlentities(), even though they changed the default encoding in PHP 5.4 (*golf clap for PHP devs*). To save someone the time of trying it, this does not work:

Either way, none of these solutions is good practice, and none is free of flaws. This function should simply never be used in such a fashion.
I hope this will prevent newbies from using this function incorrectly (as they apparently do).

Problem
In many legacy PHP products the function htmlspecialchars($string) is used to convert characters like <, > and quotes, and so on, to HTML entities. That avoids the interpretation of HTML tags and asymmetric quote situations.
Since PHP 5.4, htmlspecialchars($string) expects $string to be UTF-8 if no charset is given explicitly as the third parameter. Legacy products are mostly in Latin1 (alias ISO-8859-1), which makes htmlspecialchars(), htmlentities() and html_entity_decode() return empty strings if a special character, e.g. a German umlaut, is present in $string.
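
The problem described in this note can be sketched as follows. The sample string and variable names are assumptions made for the illustration, with the encoding passed explicitly in both calls so the outcome does not depend on the PHP version.

```php
<?php
// "Mädchen <b>" stored as ISO-8859-1: the umlaut is the single byte \xE4.
$latin1 = "M\xE4dchen <b>";

// Treated as UTF-8, \xE4 starts an invalid sequence, so the result is empty.
$asUtf8 = htmlspecialchars($latin1, ENT_COMPAT, 'UTF-8');
var_dump($asUtf8 === ''); // bool(true)

// Declaring the real encoding keeps the text and escapes only the tag.
$asLatin1 = htmlspecialchars($latin1, ENT_COMPAT, 'ISO-8859-1');
var_dump($asLatin1 === "M\xE4dchen &lt;b&gt;"); // bool(true)
```

Whether to pass 'ISO-8859-1' at each call site or to convert the data to UTF-8 once is a design decision for the legacy code base; the sketch only shows why the encoding has to match the data.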
