preg_replace() - PHP Regular Expressions (PCRE)

梵高 · 2023-11-21 · #TechTips


preg_replace()

(PHP 4, PHP 5, PHP 7)

Perform a regular expression search and replace

Description

preg_replace(mixed $pattern, mixed $replacement, mixed $subject [, int $limit = -1 [, int &$count ]]) : mixed

Searches $subject for matches to $pattern and replaces them with $replacement.

Parameters

$pattern

The pattern to search for. It can be either a string or an array of strings.

Several PCRE modifiers are also available.

$replacement

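The string or array of strings to replace with. If this parameter is a string and $pattern is an array, all patterns are replaced by that string. If both $pattern and $replacement are arrays, each $pattern is replaced by the corresponding element of $replacement. If $replacement has fewer elements than $pattern, the extra patterns are replaced by an empty string.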
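A minimal sketch of both array rules (the patterns and subject string are assumptions chosen for illustration):

```php
<?php
// Fewer replacements than patterns: '/c/' has no partner, so it is replaced by ''.
$paired = preg_replace(['/a/', '/b/', '/c/'], ['1', '2'], 'abc');
// A single string replacement is used for every pattern in the array.
$shared = preg_replace(['/a/', '/b/'], 'X', 'abc');
echo $paired, PHP_EOL; // 12
echo $shared, PHP_EOL; // XXc
```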

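$replacement may contain backreferences of the form \n or $n, the latter being the preferred syntax. Each such reference is replaced by the text captured by the n-th capturing subpattern. n can be from 0 to 99, and \0 or $0 refers to the text matched by the whole pattern. Capturing subpatterns are numbered by counting their opening parentheses from left to right, starting at 1. To use a backslash in $replacement, you must write four of them ("\\\\"; translator's note: this is first a PHP string literal, which after escaping yields two backslashes; the regex engine then treats those two as one literal backslash).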

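When working with a replacement pattern where a backreference is immediately followed by another digit (i.e. placing a literal number immediately after a matched pattern), you cannot use the familiar \1 notation for the backreference. \11, for example, would leave preg_replace() unable to tell whether you want the \1 backreference followed by a literal 1, or the \11 backreference followed by nothing. In this case the solution is to use ${1}1. This creates an isolated $1 backreference, followed by a literal 1.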

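When using the deprecated e modifier, this function escapes some characters (namely ', ", \ and NULL) and then performs the backreference substitution. Once this is done, make sure that the resolved backreferences do not cause syntax errors through single or double quotes (e.g. 'strlen(\'$1\')+strlen("$2")'). The result must conform to PHP's string syntax and to eval() syntax, because after the replacement is done, the engine evaluates the resulting string as PHP code with eval() and uses the return value as the actual replacement string.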

$subject

The string or array of strings to search and replace.

If $subject is an array, the search and replace is performed on every element of $subject, and the return value is an array as well.

$limit

The maximum number of replacements for each pattern in each $subject string. Defaults to -1 (no limit).
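A one-line sketch of $limit in action (the subject string is an assumption):

```php
<?php
// At most 2 replacements for this pattern; the remaining "o"s are left alone.
$limited = preg_replace('/o/', '0', 'foo boo', 2);
echo $limited, PHP_EOL; // f00 boo
```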

$count

If specified, this variable is filled with the number of replacements done.

Return Values

preg_replace() returns an array if $subject is an array, or a string otherwise.

If matches are found, the new $subject is returned; otherwise $subject is returned unchanged. NULL is returned if an error occurred.

Errors/Exceptions

As of PHP 5.5.0, an E_DEPRECATED error is emitted when the "e" modifier is passed; as of PHP 7.0.0, an E_WARNING is emitted instead and the "e" modifier has no effect.

Changelog

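Version	Description
7.0.0	Support for the /e modifier has been removed. Use preg_replace_callback() instead.
5.5.0	The /e modifier is deprecated. Use preg_replace_callback() instead. See the PREG_REPLACE_EVAL documentation for additional information about security risks.
5.1.0	Added the $count parameter.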

Examples

Using backreferences followed by numeric literals
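The code for this example is missing from this copy. A minimal sketch that produces the output shown below, assuming the input string 'April 15, 2003' and a pattern that captures month, day and year (both are assumptions):

```php
<?php
// Assumed input: a date written as "Month day, year".
$string = 'April 15, 2003';
// Group 1 = month, group 2 = day, group 3 = year.
$pattern = '/(\w+) (\d+), (\d+)/i';
// ${1} is group 1; the "1" right after it is a literal digit, not part of the reference.
$replacement = '${1}1,$3';
echo preg_replace($pattern, $replacement, $string), PHP_EOL;
```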

The above example will output:

April1,2003

Using indexed arrays with preg_replace()
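The code for this example is missing as well. In this sketch (the input string and array contents are assumptions chosen to match the output below), the replacements are added in reverse key order, which is why the result looks scrambled:

```php
<?php
$string = 'The quick brown fox jumps over the lazy dog.';
$patterns = array();
$patterns[0] = '/quick/';
$patterns[1] = '/brown/';
$patterns[2] = '/fox/';
// The replacements are inserted in reverse key order on purpose.
$replacements = array();
$replacements[2] = 'bear';
$replacements[1] = 'black';
$replacements[0] = 'slow';
// Pairing follows insertion order, not keys: quick -> bear, brown -> black, fox -> slow.
echo preg_replace($patterns, $replacements, $string), PHP_EOL;
```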

The above example will output:

The bear black slow jumps over the lazy dog.

By sorting the patterns and replacements by key with ksort(), we get the desired result.
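A self-contained sketch of that fix (the input string and array contents are assumptions that match the outputs shown in this section):

```php
<?php
$string = 'The quick brown fox jumps over the lazy dog.';
$patterns = [0 => '/quick/', 1 => '/brown/', 2 => '/fox/'];
$replacements = [2 => 'bear', 1 => 'black', 0 => 'slow'];
// Sort both arrays by key so element i of $patterns pairs with element i of $replacements.
ksort($patterns);
ksort($replacements);
echo preg_replace($patterns, $replacements, $string), PHP_EOL;
```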


The above example will output:

The slow black bear jumps over the lazy dog.

Replacing several values
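The code for this example is missing too. This sketch, with assumed patterns and input, turns the line {startDate} = 1999-5-27 into the output shown below:

```php
<?php
$patterns = array(
    // 1999-5-27  ->  5/27/1999
    '/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/',
    // "{startDate} ="  ->  "$startDate ="
    '/^\s*\{(\w+)\}\s*=/',
);
// In '$\1 =' the "$" is not followed by a digit or "{", so it stays a literal dollar sign.
$replace = array('\3/\4/\1\2', '$\1 =');
$result = preg_replace($patterns, $replace, '{startDate} = 1999-5-27');
echo $result, PHP_EOL;
```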

The above example will output:

$startDate = 5/27/1999

Stripping whitespace

This example strips excess whitespace from a string.
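A minimal sketch (the sample string is an assumption):

```php
<?php
$str = 'foo   o';
// Collapse every run of two or more whitespace characters into a single space.
$str = preg_replace('/\s\s+/', ' ', $str);
echo $str, PHP_EOL; // foo o
```

Note that a lone tab or newline is not touched by \s\s+; use /\s+/ if single whitespace characters should be normalized to a space as well.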

Using the $count parameter
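A sketch that produces the output shown below (the patterns and subject string are assumptions):

```php
<?php
$count = 0;
// Digits are replaced first, then whitespace, each with '*'; $count receives the total.
$result = preg_replace(array('/\d/', '/\s/'), '*', 'xp 4 to', -1, $count);
echo $result, PHP_EOL; // xp***to
echo $count, PHP_EOL;  // 3
```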

The above example will output:

xp***to
3

Notes

Note:

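When using arrays for $pattern and $replacement, the keys are processed in the order they appear in the array. This is not necessarily the same as the numerical index order. If you use indexes to identify which $pattern should be replaced by which $replacement, you should perform a ksort() on each array prior to calling preg_replace().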

See Also

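PCRE Patterns
preg_quote() - Quote regular expression characters
preg_filter() - Perform a regular expression search and replace
preg_match() - Perform a regular expression match
preg_replace_callback() - Perform a regular expression search and replace using a callback
preg_split() - Split string by a regular expression
preg_last_error() - Returns the error code of the last PCRE regex execution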

Because I search for this a lot:
The following should be escaped if you are trying to match that character
\ ^ . $ | ( ) [ ]
* + ? { } ,
Special Character Definitions
\ Quote the next metacharacter
^ Match the beginning of the line
. Match any character (except newline)
$ Match the end of the line (or before newline at the end)
| Alternation
() Grouping
[] Character class
* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
More Special Character Stuff
\t tab (HT, TAB)
\n newline (LF, NL)
\r return (CR)
\f form feed (FF)
\a alarm (bell) (BEL)
\e escape (think troff) (ESC)
\033 octal char (think of a PDP-11)
\x1B hex char
\c[ control char
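\l lowercase next char (think vi) (Perl only; not supported by PCRE)
\u uppercase next char (think vi) (Perl only; not supported by PCRE)
\L lowercase till \E (think vi) (Perl only; not supported by PCRE)
\U uppercase till \E (think vi) (Perl only; not supported by PCRE)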
\E end case modification (think vi)
\Q quote (disable) pattern metacharacters till \E
Even More Special Characters
\w Match a "word" character (alphanumeric plus "_")
\W Match a non-word character
\s Match a whitespace character
\S Match a non-whitespace character
\d Match a digit character
\D Match a non-digit character
\b Match a word boundary
\B Match a non-(word boundary)
\A Match only at beginning of string
\Z Match only at end of string, or before newline at the end
\z Match only at end of string
\G Match only at the first matching position in the subject (in Perl: where the previous m//g left off)

Post slug generator, for creating clean urls from titles.
It works with many languages.

Example: post_slug(' -Lo#&@rem IPSUM //dolor-/sit - amet-/-consectetur! 12 -- ')
will output: lorem-ipsum-dolor-sit-amet-consectetur-12

If you want to handle non-ASCII characters as well (European, Russian, Chinese, Japanese, Korean or whatever), just:
- use mb_internal_encoding('UTF-8');
- use preg_replace('`...`u', '...', $string) with the u (unicode) modifier
For further information, the complete list of preg_* modifiers can be found at:
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
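The slug function itself is not included here. A hypothetical sketch (not the commenter's code) that reproduces the example output, using the u modifier and \p{L}/\p{N} so letters and digits from any script survive; it falls back to strtolower() if the mbstring extension is not available:

```php
<?php
// Hypothetical slug helper, not the original commenter's function.
function post_slug(string $title): string
{
    // Lower-case the title; mb_strtolower() also handles non-ASCII letters.
    $slug = function_exists('mb_strtolower') ? mb_strtolower($title, 'UTF-8') : strtolower($title);
    // Drop everything except letters, digits, whitespace, slashes and hyphens.
    $slug = preg_replace('/[^\p{L}\p{N}\s\/-]+/u', '', $slug);
    // Turn each run of whitespace, slashes and hyphens into a single hyphen.
    $slug = preg_replace('/[\s\/-]+/u', '-', $slug);
    // Remove hyphens left at either end.
    return trim($slug, '-');
}

echo post_slug(' -Lo#&@rem IPSUM //dolor-/sit - amet-/-consectetur! 12 -- '), PHP_EOL;
```

Characters such as # & @ ! are deleted outright rather than turned into separators, which is why "Lo#&@rem" becomes "lorem".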

Note that in most cases it is much more efficient to use preg_replace_callback() with a named function or an anonymous function (a closure, available since PHP 5.3; the older create_function() is deprecated as of PHP 7.2 and removed in PHP 8.0) instead of the /e modifier. When preg_replace() is called with the /e modifier, the interpreter must parse the replacement string into PHP code once for every replacement made, while preg_replace_callback() uses a function that only needs to be parsed once.
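A minimal sketch of the callback style, doubling every number in an assumed sample string (the kind of job /e was once used for):

```php
<?php
$text = 'apples: 3, pears: 10';
// The closure is compiled once and called for every match; $m[0] is the whole match.
$result = preg_replace_callback('/\d+/', function (array $m): string {
    return (string) ((int) $m[0] * 2);
}, $text);
echo $result, PHP_EOL; // apples: 6, pears: 20
```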

It may be useful to note that if you pass an associative array as the $subject parameter, the keys are preserved.
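A quick check of that behaviour (the sample array is an assumption):

```php
<?php
$subjects = ['first' => 'a1', 'second' => 'b22'];
// Each element is processed separately; the string keys survive in the result.
$result = preg_replace('/\d/', '#', $subjects);
print_r($result);
```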

If you want to replace only the n-th occurrence of $pattern, you can use this function:

This outputs |aa|b|cc|dd is the 4th|e|ff|gg|kkk|
Backreferences are accepted in $replacement.
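The commenter's function is missing from this copy. A hypothetical stand-in, preg_replace_nth(), built on preg_replace_callback(); it expands only the $n and ${n} forms of backreferences (not \n), and the sample call assumes a subject that yields the output above:

```php
<?php
// Hypothetical helper: replace only the $nth match (1-based) of $pattern.
function preg_replace_nth(string $pattern, string $replacement, string $subject, int $nth): string
{
    $count = 0;
    return preg_replace_callback($pattern, function (array $m) use (&$count, $nth, $replacement): string {
        $count++;
        if ($count !== $nth) {
            return $m[0]; // not the n-th match: keep the original text
        }
        // Expand $n and ${n} in the replacement using this match's groups.
        return preg_replace_callback('/\$(?:(\d{1,2})|\{(\d{1,2})\})/', function (array $r) use ($m): string {
            $group = (int) ($r[1] !== '' ? $r[1] : $r[2]);
            return $m[$group] ?? '';
        }, $replacement);
    }, $subject);
}

echo preg_replace_nth('/[a-z]+/', '$0 is the 4th', '|aa|b|cc|dd|e|ff|gg|kkk|', 4), PHP_EOL;
```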

preg_replace (and other preg-functions) return null instead of a string when encountering problems you probably did not think about!
-------------------------
It may not be obvious to everybody that the function returns NULL if an error of any kind occurs. An error I happen to stumble upon quite often is the backtrack limit:
http://de.php.net/manual/de/pcre.configuration.php#ini.pcre.backtrack-limit
When parsing HTML documents, you may encounter documents longer than 100,000 characters, and these can cause certain regular expressions to fail because of the backtrack limit mentioned above.
An ungreedy regular expression ("U", http://de.php.net/manual/de/reference.pcre.pattern.modifiers.php) often does the job, but still: sometimes you just need a greedy regular expression working on long strings...
Since an unhandled NULL return value usually causes follow-up errors in the application, with unwanted and unforeseen consequences, I found the following solution quite helpful; at least it saves the application from crashing:
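A minimal sketch of such a safeguard; the wrapper name safe_preg_replace() is hypothetical, and falling back to the unchanged subject is just one possible policy:

```php
<?php
// Hypothetical wrapper: never hand NULL back to the caller.
function safe_preg_replace(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // preg_last_error() tells why, e.g. PREG_BACKTRACK_LIMIT_ERROR.
        // Logging or notification code would go here.
        return $subject;
    }
    return $result;
}

echo safe_preg_replace('/a+/', 'b', 'caaat'), PHP_EOL; // cbt
```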

You may, or should, also put a log message or the sending of an email into the if-condition, so that you are informed as soon as one of your regular expressions does not have the effect you intended.

[Editor's note: in this case it would be wise to rely on the preg_quote() function instead which was added for this specific purpose]
If your replacement string contains a dollar sign or a backslash, it may accidentally turn into a backreference! This will fix it.
I want to replace 'text' with '$12345', but this becomes a backreference to $12 (which doesn't exist), and then only the remaining '345' is printed. The function below returns a string in which the backreferences are escaped.
OUTPUT:
string(8) "some 345"
string(11) "some \12345"
string(8) "some 345"
string(11) "some $12345"
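A hypothetical helper with the same purpose (the name escape_backreferences() is an assumption, not the commenter's code) that reproduces the four lines of output above:

```php
<?php
// Hypothetical helper: make $ and \ literal in a preg_replace() replacement string.
function escape_backreferences(string $replacement): string
{
    // strtr() replaces both characters in a single pass, so added backslashes are not doubled again.
    return strtr($replacement, ['\\' => '\\\\', '$' => '\\$']);
}

var_dump(preg_replace('/text/', '\12345', 'some text'));
var_dump(preg_replace('/text/', escape_backreferences('\12345'), 'some text'));
var_dump(preg_replace('/text/', '$12345', 'some text'));
var_dump(preg_replace('/text/', escape_backreferences('$12345'), 'some text'));
```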

This code is meant to convert numeric HTML entities to UTF-8, and it does, with one small exception: it mishandles codes that start with a leading zero.
The reason is that code2utf() is called with the leading zero intact, exactly as the pattern captured it: code2utf(039).
And it does matter! PHP reads a number literal with a leading zero, such as 039, as an octal number.
Try 
Solution:
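The original solution code is not included in this copy. One way to avoid the problem (a sketch, not the commenter's code; both function names are hypothetical) is to use preg_replace_callback() and cast the captured digits with (int), which always reads the string as decimal. The encoder does not validate the code point range:

```php
<?php
// Hypothetical encoder for one Unicode code point as UTF-8 (no range validation).
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

function decode_numeric_entities(string $html): string
{
    return preg_replace_callback('/&#(\d+);/', function (array $m): string {
        // (int) "039" is 39: casting a string never treats a leading zero as octal.
        return codepoint_to_utf8((int) $m[1]);
    }, $html);
}

echo decode_numeric_entities('It&#039;s &#233;t&#233;'), PHP_EOL;
```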

There seems to be some confusion over how greediness works. For those familiar with Regular Expressions in other languages, particularly Perl: it works like you would expect, and as documented. Greedy by default, un-greedy if you follow a quantifier with a question mark.
There is a PHP/PCRE-specific U pattern modifier that flips the greediness, so that quantifiers are by default un-greedy, and become greedy if you follow the quantifier with a question mark: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
To make things clear, a series of examples:

Results in this:
Default, no ?: a bunch of stuff this that and more stuff with a second code block then extra at the end
Default, with ?: a bunch of stuff this that and more stuff with a second code block then extra at the end
U flag, no ?: a bunch of stuff this that and more stuff with a second code block then extra at the end
U flag, with ?: a bunch of stuff this that and more stuff with a second code block then extra at the end
As expected: greedy by default, ? inverts it to ungreedy. With the U flag, un-greedy by default, ? makes it greedy.
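A self-contained sketch of the same four cases, on an assumed sample string with two <b> elements:

```php
<?php
$text = 'a <b>bold</b> and <b>more</b> text';
$patterns = [
    'Default, no ?'   => '#<b>.*</b>#',
    'Default, with ?' => '#<b>.*?</b>#',
    'U flag, no ?'    => '#<b>.*</b>#U',
    'U flag, with ?'  => '#<b>.*?</b>#U',
];
$results = [];
foreach ($patterns as $label => $pattern) {
    // Replace each match with a marker so the extent of the match is visible.
    $results[$label] = preg_replace($pattern, '[X]', $text);
    echo $label, ': ', $results[$label], PHP_EOL;
}
```

The greedy cases swallow everything from the first <b> to the last </b> ("a [X] text"); the ungreedy cases stop at each </b> ("a [X] and [X] text").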

Warning: a common mistake when trying to remove all characters except numbers and letters from a string is to use a regex like preg_replace('[^A-Za-z0-9_]', '', ...) without delimiters. PHP then takes the square brackets themselves as the pattern delimiters, so the character class disappears and the pattern becomes ^A-Za-z0-9_, which does not match; the input, double quotes included, comes back unchanged:
echo preg_replace('[^A-Za-z0-9_]', '', 'D"usseldorfer H"auptstrasse')
D"usseldorfer H"auptstrasse
It is important not to forget the leading and trailing forward slash (the pattern delimiters) in the regex:
echo preg_replace('/[^A-Za-z0-9_]/', '', 'D"usseldorfer H"auptstrasse')
Dusseldorfer Hauptstrasse
PS: An alternative is preg_replace('/\W/', '', $t), which keeps all alphanumeric characters, including underscores.

Take care when you try to strip whitespace out of UTF-8 text. Using something like:

broke, in my case, the letter à, which is hex c3a0, because the byte a0 (a non-breaking space in Latin-1) was treated as whitespace. So use

to strip all spaces and tabs, or better, use a multibyte function like mb_ereg_replace.
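Both approaches sketched on an assumed sample string. With the u modifier the pattern works on whole UTF-8 characters, so the two bytes of "à" (c3 a0) are never split; the [ \t] class only touches ASCII spaces and tabs:

```php
<?php
$text = "voil\u{E0}  tout\t!";
// u modifier: \s is matched against characters, not individual bytes.
$unicodeSafe = preg_replace('/\s+/u', ' ', $text);
// Byte-wise alternative that only collapses spaces and tabs.
$spacesAndTabs = preg_replace('/[ \t]+/', ' ', $text);
echo $unicodeSafe, PHP_EOL;   // voilà tout !
echo $spacesAndTabs, PHP_EOL; // voilà tout !
```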

If there's a chance your replacement text contains any strings such as "$0.95", you'll need to escape those $n backreferences:
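A sketch with an assumed $0.95 price. Besides escaping, another way around the problem is preg_replace_callback(): its return value is inserted literally and is never scanned for backreferences:

```php
<?php
$price = '$0.95';
// Used directly, "$0" is read as a backreference to the whole match ("PRICE").
$broken = preg_replace('/PRICE/', $price, 'Only PRICE today');
// A callback's return value is inserted as-is, so no escaping is needed.
$fixed = preg_replace_callback('/PRICE/', function (array $m) use ($price): string {
    return $price;
}, 'Only PRICE today');
echo $broken, PHP_EOL; // Only PRICE.95 today
echo $fixed, PHP_EOL;  // Only $0.95 today
```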

Hi,
as I wasn't able to find another way to do this, I wrote a function converting any UTF-8 string into a correct NTFS filename (see http://en.wikipedia.org/wiki/Filename).

It converts all control characters and the filename characters reserved by Windows ('\/:*?"<>|') into an underscore.
This way you can safely create an NTFS filename out of any UTF-8 string.
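The function itself is not shown. A hypothetical sketch of the same idea (the name ntfs_filename() is an assumption). It deliberately omits the u modifier: bytes inside UTF-8 multi-byte sequences are always 0x80 or higher, so a byte-wise class over ASCII characters cannot damage them:

```php
<?php
// Hypothetical sketch, not the commenter's original function.
function ntfs_filename(string $name): string
{
    // Control characters (0x00-0x1F, 0x7F) and \ / : * ? " < > | become "_".
    return preg_replace('/[\x00-\x1F\x7F\\\\\/:*?"<>|]/', '_', $name);
}

echo ntfs_filename("report: 2024/05 \"final\"?.txt"), PHP_EOL; // report_ 2024_05 _final__.txt
```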

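$firstname = $_POST['campo'] ?? '';
// Keep only ASCII letters and digits. Calling htmlspecialchars() first would turn
// e.g. "<" into "&lt;" and leave the letters "lt" behind after filtering.
$firstname = preg_replace("/[^a-zA-Z0-9]/", "", $firstname, -1, $count_fn);
// $count_fn counts how many characters were removed.
// $firstname is the variable holding the (filtered) input.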

Using preg_replace() to keep only alphanumeric characters:
$info = "The Development of code . http://www.";
$info = preg_replace("/[^a-zA-Z0-9]+/", "", $info);
echo $info;
OUTPUTS: TheDevelopmentofcodehttpwww
This is good, workable code.

If your intention is to encode and decode mod_rewrite URLs and handle them with PHP and MySQL, this should work.
To convert to a URL:
$url = preg_replace('/[^A-Za-z0-9_-]+/', '-', $string);
To look up the URL value in MySQL, use the same expression but ignore the '-':
first transform the URL value in PHP with preg_replace(), then use the result with MySQL REGEXP:
$sql = "select * from table where fieldname_to_check REGEXP '".preg_replace("/-+/",'[^A-Za-z0-9_]+',$url)."'";
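A sketch of both transformations on an assumed title, without the database part. When the resulting pattern is sent to MySQL, binding it as a parameter of a prepared statement (e.g. with PDO) avoids the SQL injection risk of string concatenation:

```php
<?php
$title = 'Hello, World & PHP';
// Encode: every run of characters outside [A-Za-z0-9_-] becomes one hyphen.
$url = preg_replace('/[^A-Za-z0-9_-]+/', '-', $title);
// Decode for MySQL REGEXP: each hyphen run becomes a class matching the original separators.
$mysqlRegexp = preg_replace('/-+/', '[^A-Za-z0-9_]+', $url);
echo $url, PHP_EOL;         // Hello-World-PHP
echo $mysqlRegexp, PHP_EOL; // Hello[^A-Za-z0-9_]+World[^A-Za-z0-9_]+PHP
```

Note that a hyphen that was already in the original title is also turned into a separator class, so the round trip is approximate.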

There seems to be some unexpected behavior when using the /m modifier when the line terminators are in Windows (CRLF) or classic Mac (CR) format.
If you have a string like the one below and try to replace dots, the regex won't replace correctly:

By default, PHP's PCRE recognizes only LF as a newline, so with CRLFs or CRs the /m modifier appears not to work. Make sure to convert line endings to LFs (*nix format) in your input string.

Also worth noting is that you can use array_keys()/array_values() with preg_replace like:
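A minimal sketch (the map and subject are assumptions): the patterns are the keys, the replacements the values, so the pairing cannot drift apart:

```php
<?php
// One map instead of two parallel arrays.
$map = [
    '/quick/' => 'slow',
    '/brown/' => 'black',
    '/fox/'   => 'bear',
];
$result = preg_replace(array_keys($map), array_values($map), 'The quick brown fox');
echo $result, PHP_EOL; // The slow black bear
```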

Why is there no offset parameter for replacing in the string? It would be helpful.
For example:
mixed preg_replace (mixed $pattern, mixed $replacement, mixed $subject [, int $limit = -1 [, int & $count [, int $offset = 0]]]) 
1 $pattern
2 $replacement 
3 $subject
4 $limit
5 $count 
6 $offset
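preg_replace() has no such parameter (the signature above is a proposal; preg_match() does accept an $offset). A hypothetical workaround, preg_replace_from(), replaces only in the part of the string after a byte offset. Anchors such as ^ and \A, and lookbehinds, then see the tail as a separate string:

```php
<?php
// Hypothetical helper emulating the proposed $offset parameter (offset in bytes).
function preg_replace_from(string $pattern, string $replacement, string $subject, int $offset, int $limit = -1): string
{
    $head = substr($subject, 0, $offset);
    // (string) guards against substr() returning false on PHP 7 when $offset is past the end.
    $tail = (string) substr($subject, $offset);
    return $head . preg_replace($pattern, $replacement, $tail, $limit);
}

echo preg_replace_from('/a/', 'b', 'aaaa', 2), PHP_EOL; // aabb
```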

A variable can hold a huge quantity of data, but preg_replace() cannot always process it.
Example :

$head may contain the desired content or be empty, depending on the length of $data.
For this application, just add:
$data = substr($data, 0, 4096);
before using preg_replace, and it will work fine.

This function will strip all the HTML-like content from a string.
I know you can find a lot of similar code on the web, but this one is simple, fast and robust. Don't simply use built-in functions like strip_tags(); they don't work that well.
Be careful, however: this is not proper validation of a string. You should use additional functions like mysqli_real_escape_string() (or, better, prepared statements) and filter_var(), as well as custom tests, before putting a submission into your database.
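The commenter's function is not included here. A deliberately simple sketch of the same idea (the name strip_html_like() is hypothetical); like any regex-based approach it can be fooled, for example by a > inside an attribute value:

```php
<?php
// Hypothetical sketch, not the commenter's function.
function strip_html_like(string $s): string
{
    // Drop <script> and <style> elements together with their contents.
    $s = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', '', $s);
    // Drop any remaining tag-like sequences.
    return preg_replace('/<[^>]*>/', '', $s);
}

echo strip_html_like('<p>Hi <b>there</b></p><script>alert(1)</script>'), PHP_EOL; // Hi there
```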

Hope this helps someone else out there trying to do the same thing :)

If preg_replace() seems to return an empty string (it is actually returning NULL because of an error), take a look at these two ini settings:
pcre.backtrack_limit
pcre.recursion_limit
In current PHP versions pcre.backtrack_limit defaults to 1000000 and pcre.recursion_limit to 100000. If you process large buffers, try increasing these two values.
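A sketch of raising both limits for the current request with ini_set() and checking for the specific failure afterwards; the values and the sample buffer are illustrative assumptions, not recommendations:

```php
<?php
// Raise the limits for this request only.
ini_set('pcre.backtrack_limit', '5000000');
ini_set('pcre.recursion_limit', '500000');

$buffer = str_repeat('lorem ipsum ', 20000);
$result = preg_replace('/ipsum/', 'dolor', $buffer);
if ($result === null && preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR) {
    // Still too much backtracking: rethink the pattern or raise the limit further.
    $result = $buffer;
}
```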

To convert a string to an SEO-friendly form, do this:

This will print: this-is-the-string-to-be-made-seo-friendly

The function seofy() creates an SEO-friendly version of a string. Umlauts and other letters not contained in the ASCII character set are either reduced to their basic form (e.g. é becomes e and ú becomes u) or fully transliterated (e.g. ß becomes ss and ü becomes ue).
On the one hand, this works because preg_replace() can operate on Unicode (Unicode regular expressions, via the u modifier); on the other hand, an approximate transliteration is attempted with the PHP function iconv() and its TRANSLIT option.
Quoting php.net about iconv and TRANSLIT:
"If you append the character string //TRANSLIT to out_charset, transliteration is activated. This means that a character that cannot be displayed in the target character set can be approximated with one or more similar-looking characters.[…]"
Source:
https://blog.ueffing.net/post/2016/03/14/string-seo-optimieren-creating-seo-friendly-url/
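A hypothetical sketch along the lines described (not the function from the linked post). The transliteration result depends on the iconv implementation and the current locale; with glibc, a UTF-8 locale set via setlocale(LC_CTYPE, ...) may give better results, while other implementations may refuse //TRANSLIT, in which case the sketch falls back to the original text and simply drops non-ASCII letters:

```php
<?php
// Hypothetical sketch of seofy(); the original function is not shown here.
function seofy(string $text): string
{
    // Approximate non-ASCII letters with ASCII ones (implementation- and locale-dependent).
    $ascii = function_exists('iconv') ? iconv('UTF-8', 'ASCII//TRANSLIT', $text) : false;
    if ($ascii === false) {
        $ascii = $text; // conversion unavailable or refused: keep the original
    }
    $ascii = strtolower($ascii);
    // Collapse everything that is not a-z or 0-9 into single hyphens.
    $ascii = preg_replace('/[^a-z0-9]+/', '-', $ascii);
    return trim($ascii, '-');
}

echo seofy('This is the string to be made SEO friendly'), PHP_EOL;
```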

Hello there,
I would like to share a regex (PHP) snippet that I wrote (2012) for myself; it is also used in the Yireo ScriptMerge plugin for Joomla, marked as simple code.
It compresses JavaScript code and removes all comments from it. It also works with MooTools. It is fast (in comparison to other PHP solutions), does not damage the JavaScript itself, and resolves a lot of comment-removal issues.
//START Remove comments.
  $buffer = str_replace('/// ', '///', $buffer);    
  $buffer = str_replace(',//', ', //', $buffer);
  $buffer = str_replace('{//', '{ //', $buffer);
  $buffer = str_replace('}//', '} //', $buffer);
  $buffer = str_replace('*//*', '*/ /*', $buffer);
  $buffer = str_replace('/**/', '/* */', $buffer);
  $buffer = str_replace('*///', '*/ //', $buffer);
  $buffer = preg_replace("/\/\/.*\n\/\/.*\n/", "", $buffer);
  $buffer = preg_replace("/\s\/\/\".*/", "", $buffer);
  $buffer = preg_replace("/\/\/\n/", "\n", $buffer);
  $buffer = preg_replace("/\/\/\s.*.\n/", "\n \n", $buffer);
  $buffer = preg_replace('/\/\/w[^w].*/', '', $buffer);
  $buffer = preg_replace('/\/\/s[^s].*/', '', $buffer);
  $buffer = preg_replace('/\/\/\*\*\*.*/', '', $buffer);
  $buffer = preg_replace('/\/\/\*\s\*\s\*.*/', '', $buffer);
  $buffer = preg_replace('/[^\*]\/\/[*].*/', '', $buffer);
  $buffer = preg_replace('/([;])\/\/.*/', '$1', $buffer);
  $buffer = preg_replace('/((\r)|(\n)|(\R)|([^0]1)|([^\"]\s*\-))(\/\/)(.*)/', '$1', $buffer);
  $buffer = preg_replace("/([^\*])[\/]+\/\*.*[^a-zA-Z0-9\s\-=+\|!@#$%^&()`~\[\]{};:\'\",?]/", "$1", $buffer);
 $buffer = preg_replace("/\/\*/", "\n/*dddpp", $buffer);
 $buffer = preg_replace('/((\{\s*|:\s*)[\"\']\s*)(([^\{\};\"\']*)dddpp)/','$1$4', $buffer);
 $buffer = preg_replace("/\*\//", "xxxpp*/\n", $buffer);
 $buffer = preg_replace('/((\{\s*|:\s*|\[\s*)[\"\']\s*)(([^\};\"\']*)xxxpp)/','$1$4', $buffer);
 $buffer = preg_replace('/([\"\'])\s*\/\*/', '$1/*', $buffer);
 $buffer = preg_replace('/(\n)[^\'"]?\/\*dddpp.*?xxxpp\*\//s', '', $buffer);
 $buffer = preg_replace('/\n\/\*dddpp([^\s]*)/', '$1', $buffer);
 $buffer = preg_replace('/xxxpp\*\/\n([^\s]*)/', '*/$1', $buffer);
 $buffer = preg_replace('/xxxpp\*\/\n([\"])/', '$1', $buffer);
 $buffer = preg_replace('/(\*)\n*\s*(\/\*)\s*/', '$1$2$3', $buffer);
 $buffer = preg_replace('/(\*\/)\s*(\")/', '$1$2', $buffer);
 $buffer = preg_replace('/\/\*dddpp(\s*)/', '/*', $buffer);
 $buffer = preg_replace('/\n\s*\n/', "\n", $buffer);
 $buffer = preg_replace("/([^\'\"]\s*)(?!()).*/","$1", $buffer);
 $buffer = preg_replace('/([^\n\w\-=+\|!@#$%^&*()`~\[\]{};:\'",\/?\\\\])(\/\/)(.*)/', '$1', $buffer);
//END Remove comments.  
//START Remove all whitespaces
 $buffer = preg_replace('/\s+/', ' ', $buffer);
 $buffer = preg_replace('/\s*(?:(?=[=\-\+\|%&\*\)\[\]\{\};:\,\.\\!\@\#\^`~]))/', '', $buffer);
 $buffer = preg_replace('/(?:(?
